Convert mysql database from latin1 to utf8 the RIGHT way

You’ll see many blog posts around the interwebs stating that you can just dump a mysql database via mysqldump – globally replace “latin1” (or some other character set) in the dump file – and then import that into a utf8 database and it’ll just work. This appears, however, to be WRONG. It does not force mysql to convert the text, it only fools you into believing that that your latin1 characters have been converted. You have to actually convert the text yourself, the columns will just be unconverted latin1 sitting in a utf8 table.

One way to do this is to convert the column in question to binary and back again – assuming your database/table is set to utf8, this will force MySQL to convert the character set correctly.

Another – better – way is to just use iconv to convert during the dump process. This will convert latin1 characters to utf8 properly.

mysqldump --add-drop-table database_to_correct | replace CHARSET=latin1 CHARSET=utf8 | iconv -f latin1 -t utf8 | mysql database_to_correct

PLEASE correct me if I’m wrong – this seems like yet another mysql idiosyncrasy that shouldn’t exist.

7 thoughts on “Convert mysql database from latin1 to utf8 the RIGHT way

  1. Very Interesting solution it appends so Often that I dump MySql Data And get Weird Characters from the Latin encoding.

    Thanks for sharing

  2. Thanks for sharing.
    Im getting an “ERROR 1253 (42000) at line 199: COLLATION ‘latin1_general_ci’ is not valid for CHARACTER SET ‘utf8′” Do you know what to do?

    Thanks
    Felix

  3. Hey Dan,

    Just saw this and wanted to point out that since you dealt with this/posted this I have completely rewritten the WordPress Codex article about converting database charsets:

    https://codex.wordpress.org/Converting_Database_Character_Sets

    The article that was there was insanely bad and barely helpful. It had clearly been written by way too many authors who paid no attention to whether it made sense as a whole. It was also littered with links to plugins written when WP first changed the default charset (when conversion was a popular need), but all the plugins were totally non-functional and the links were just confusing.

    The new article takes the approach that you should write and test a SQL script that does the conversion. It takes time and needs to be custom for each site but is safe and effective and the guide is exthaustive in explaining the strange details of MySQL’s insane demands. Just wanted to make sure there was a link on this post in case anyone stumbles here looking for advice. I meant to post about my changes to the Codex page on my own blog but like all good ideas, these things sometimes never happen ;)

  4. Sweet! We’ve had need to do this for many apps beyond just WordPress, and those instructions look generic enough to be useful everywhere.

  5. This is so wrong, you cannot automatically use replace to change the strings!
    Imagine a blog of yours, that you will dump into a file, and you will loose data from the dump that you are converting because of converting in fact the contents of the posts!

  6. @Radek: That is the point, to modify characters forcefully so you can start with a fully UTF8 database. Iconv does a pretty good job depending on the state of your data.

Comments are closed.