- Make sure your DB is configured to support UTF-8. Configuration is DB-specific, so please see the documentation for your respective DB.
- Make sure your Ruby source code supports UTF-8. You might be surprised to find out that Ruby 1.9 treats your source files as US-ASCII by default. Take some time to learn about the magic encoding comment (# encoding: utf-8 at the top of the file).
- Make sure your regexes support UTF-8. Use POSIX bracket expressions like [[:alpha:]] or Unicode character properties like \p{L} instead of the shorthand classes \w, \s and \d, which only match ASCII characters.
- In Ruby 1.9, upcase and downcase only change ASCII letters, so they won't work for the rest of your UTF-8 strings, but there is a gem for that! Check out unicode_utils.
- If you want to compare Unicode strings in MySQL, have a look at collation in their documentation and know the difference between utf8_general_ci and utf8_bin. You might be surprised how loose the default matching is.
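
Here is a minimal sketch of the source-encoding and regex tips together. The sample string and the variable names are made up for illustration; only the regex behavior matters.

```ruby
# encoding: utf-8
# The magic comment above has to be the first line of the file (or the
# second, right after a shebang). Ruby 1.9 needs it to accept the
# non-ASCII string literal below.

word = "ᏣᎳᎩ" # illustrative sample: three Cherokee-script letters

# \w only matches ASCII word characters, so this does not match.
ascii_match = word =~ /\A\w+\z/

# POSIX bracket expressions and \p{...} properties are Unicode-aware,
# so both of these match from index 0.
posix_match    = word =~ /\A[[:alpha:]]+\z/
property_match = word =~ /\A\p{L}+\z/

puts ascii_match.inspect    # nil
puts posix_match.inspect    # 0
puts property_match.inspect # 0
```

If a search box silently drops queries in non-Latin scripts, a stray \w in a validation regex is a good first suspect.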
Tuesday, January 22, 2013
Tips for Easy UTF-8 Ruby Adventuring
Getting that search box working in Esperanto? Cherokee? Pull on your wading boots, because you're walking into deep waters. I can't make you an expert in UTF-8, but I can recommend that you know the following stuff before you venture forth:
Posted by Shlomo at 10:46 PM