centos l10n problem

Just about the time I believe the UTF-8 beast is in the cage, it escapes and runs amok.

This AM, I started to deploy an update to the webapp on EC2. It turned out that some of the static strings in the app contained UTF-8-encoded non-ASCII characters, and the Java compiler barfed on them. “The heck?” I thought. I had just compiled the app on my MacBook. I checked the usual suspects (Tomcat’s server.xml, JAVA_OPTS), but everything looked fine. However, when I looked at the source files on the server, they were indeed mangled.
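To see why the compiler choked, it helps to look at what happens when UTF-8 bytes are decoded as some other charset. The Java sketch below is illustrative only and is not taken from the app: the class and method names are hypothetical, and the sample string "café" is written with a `\u00e9` escape so that the file itself compiles regardless of the build machine's locale.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode a string as UTF-8, then decode those same bytes
// with a different charset, the way a tool running under a non-UTF-8 locale would.
public class EncodingMismatchDemo {

    // Encode text as UTF-8, then read the raw bytes back as `readAs`.
    // new String(byte[], Charset) never throws on bad input; it substitutes
    // the charset's replacement character (U+FFFD for these decoders).
    static String roundTrip(String text, Charset readAs) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readAs);
    }

    // Print every non-ASCII char as a Java-style hex escape so the output
    // does not depend on the terminal's own locale.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an acute e, written as an escape so this file stays pure ASCII.
        String original = "caf\u00e9";

        Charset[] readers = {
            StandardCharsets.UTF_8,      // correct: matches how the bytes were written
            StandardCharsets.US_ASCII,   // what the POSIX/C locale implies
            StandardCharsets.ISO_8859_1  // another common wrong guess
        };
        for (Charset cs : readers) {
            System.out.println(cs.name() + " -> " + escapeNonAscii(roundTrip(original, cs)));
        }
    }
}
```

Read back as UTF-8, the string survives intact (`caf\u00e9`). Read as US-ASCII, each of the two bytes that encode é becomes U+FFFD (`caf\ufffd\ufffd`); read as ISO-8859-1, they turn into the familiar mojibake "Ã©" (`caf\u00c3\u00a9`).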

Crap! Was this a bug in CVS? (Yes, we still use CVS.) Wait. What if I cut and paste the correct code from my Mac into the CentOS copy? No luck. It couldn’t be vi. Trusty old vi. Could CentOS be confused? Let’s look:

$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

What the…?

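I don’t know exactly what I did, but when I created my EC2 image, I must have omitted a step. With LANG and LC_ALL unset, every LC_* category falls back to the POSIX locale, whose character set is plain ASCII, so locale-aware tools such as javac (which defaults to the platform charset) treat the UTF-8 bytes as invalid. None of the web geniuses I googled had solved this exact problem, but it seems everyone flails about with the LANG environment variable.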

export LANG=en_US.UTF-8

That did the trick!

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
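A quick way to confirm what the Java toolchain sees on a given box, before and after setting LANG, is to ask the JVM directly. This is a hypothetical helper, not part of the app, and it assumes a JDK older than 18: since JDK 18 (JEP 400), UTF-8 is the default charset regardless of locale.

```java
import java.nio.charset.Charset;

// Hypothetical helper: print the locale-derived settings the JVM starts with.
// Before JDK 18, javac falls back to this default charset for source files
// unless it is given -encoding.
public class DefaultCharsetCheck {

    // True when the JVM's default charset is UTF-8.
    static boolean defaultIsUtf8() {
        return "UTF-8".equals(Charset.defaultCharset().name());
    }

    public static void main(String[] args) {
        System.out.println("LANG            = " + System.getenv("LANG"));
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset().name());
        if (!defaultIsUtf8()) {
            System.out.println("Default charset is not UTF-8: set LANG or pass -encoding UTF-8 to javac.");
        }
    }
}
```

Under the POSIX locale, a pre-18 JDK on Linux typically reports `file.encoding` as ANSI_X3.4-1968 (glibc's name for ASCII); after `export LANG=en_US.UTF-8` it should report UTF-8. Passing `-encoding UTF-8` to javac makes compilation independent of the shell locale altogether.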

A fresh CVS checkout and I was back in business. I don’t feel I completely understand CentOS localization configuration, but at least I’m aware of it now.

2 Comments:

  1. For better localization, I suggest using https://poeditor.com/. It hasn’t caused me any problems; it has worked great so far.

    Andy B

    2013.03.04
    03:19

  2. @Andy, I checked out ‘PO Editor’. It solves the problem of localizing an application (or managing the process thereof). I wrote this article to describe how I configured the development and production environments to handle UTF-8-encoded characters. Reflecting on the difference points to an error in my chosen title: this article should be entitled ‘centos i18n problem’, since the configuration merely enables the processing of multiple languages but does not in itself address l10n. Cheers, Kelly (红豹)

    kelly

    2013.03.04
    09:15
