MacRoman encoding creeps into Maven

You’d think in this day and age that modern operating systems, especially OS X, would handle UTF8 by default. Not so. My previous post, centos l10n problem, showed that CentOS sets its LANG locale to POSIX by default rather than UTF8.

Mac takes the lunacy one step further. Or should I say one step backwards in time.

I use maven2 as my build manager. Normally, I ignore the stream of info at the beginning of a build. Either it succeeds (yeah) or it fails. Either way, I’ve been more interested in seeing the end result: you know, those last few lines rather than the first few lines.

One day, I started tracking down all the warnings and errors which popped up during maven builds and tomcat startups. I noticed this one.

$ mvn -Pdevelopment clean compile package war:inplace
[INFO] Scanning for projects...

    <!-- snip -->

[WARNING] Using platform encoding (MacRoman actually) to copy↩
filtered resources, i.e. build is platform dependent!

    <!-- snip -->

[INFO] ----------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ----------------------------------------------------------------
[INFO] Total time: 8 seconds
[INFO] Finished at: Tue Apr 07 23:40:06 PDT 2009
[INFO] Final Memory: 26M/63M
[INFO] ----------------------------------------------------------------

If you’ve ever had to trace down all the UTF8 failure points in a system then you know this maxim: “Suffer not a UTF8 Failure to Live.” Once you have a failure point, Latin1 (or worse in this case–MacRoman) will leak into your database and rot your data like a cancer.
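To make the rot concrete, here is a minimal sketch of what a single failure point does to one accented character. It uses Latin1 (ISO-8859-1) rather than MacRoman, only because every JVM is guaranteed to ship Latin1 while MacRoman availability depends on the JDK build. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Maven or of the build shown above.
class MojibakeDemo {

    // Encode text as UTF-8, then decode those bytes as Latin-1.
    // This mimics one step in a pipeline that assumes the wrong encoding.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Render each char as a U+XXXX code point so the result can be
    // inspected without depending on the console's own encoding.
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";        // "cafe" with an acute e, 4 chars
        String once = misread(original);      // one bad hop: 5 chars
        String twice = misread(once);         // two bad hops: 7 chars

        System.out.println(codePoints(original));
        System.out.println(codePoints(once));
        System.out.println(codePoints(twice));
    }
}
```

One bad hop turns the single character U+00E9 into the pair U+00C3 U+00A9, and each further hop makes the string longer again. Plain ASCII text passes through unchanged, which is why this kind of failure can sit unnoticed for a long time.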

I really should hunt down the BSD system configuration equivalents to Linux, but here’s a solution that is quick and easy: add a project.build.sourceEncoding property and a project.reporting.outputEncoding property to the properties element of your pom.xml.

<project
  xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 ↩
http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>mywebapp</artifactId>
  <packaging>war</packaging>
  <version>1.21</version>
  <name>mywebapp</name>

  <properties>

    <project.build.sourceEncoding>
      UTF-8
    </project.build.sourceEncoding>

    <project.reporting.outputEncoding>
      UTF-8
    </project.reporting.outputEncoding>

  </properties>

    <!-- snip -->

</project>

Run maven again to verify the fix.

$ mvn -Pdevelopment clean compile package war:inplace
[INFO] Scanning for projects...

    <!-- snip -->

[INFO] Using 'UTF-8' encoding to copy filtered resources.

    <!-- snip -->

[INFO] ----------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ----------------------------------------------------------------
[INFO] Total time: 7 seconds
[INFO] Finished at: Tue Apr 07 23:48:17 PDT 2009
[INFO] Final Memory: 25M/60M
[INFO] ----------------------------------------------------------------
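As far as I can tell, the pom.xml properties only tell Maven's plugins which encoding to use. They do not change the JVM's own default charset, which is where the "platform encoding" in the original warning comes from. Here is a minimal sketch for checking what the JVM reports on your machine (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Maven.
class PlatformEncodingCheck {

    // Name of the charset the JVM falls back on when code does not
    // specify one explicitly (the "platform encoding").
    static String platformEncoding() {
        return Charset.defaultCharset().name();
    }

    // True if the given charset name (or alias, such as "utf8")
    // resolves to UTF-8.
    static boolean isUtf8(String charsetName) {
        return Charset.forName(charsetName).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = platformEncoding();
        System.out.println("JVM default charset: " + name);
        if (!isUtf8(name)) {
            System.out.println("Code that relies on the default charset is platform dependent here.");
        }
    }
}
```

What this prints depends on the OS locale, the JDK version, and any -Dfile.encoding option passed to the JVM, so treat it as a diagnostic rather than a fix.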

I really do want to understand the vagaries of OS X (relative to Linux) but I’m eternally short on time. I suspect that is our lot, all of us.

Whose woods these are I think I know.
His house is in the village though;
He will not see me stopping here
To watch his woods fill up with snow.

My little horse must think it queer
To stop without a farmhouse near
Between the woods and frozen lake
The darkest evening of the year.

He gives his harness bells a shake
To ask if there is some mistake.
The only other sound's the sweep
Of easy wind and downy flake.

The woods are lovely, dark and deep.
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.

                       --Robert Frost

[update 2010-08-08] I had not yet tried moving the sourceEncoding property from pom.xml to settings.xml but Gabriele’s comment motivated me to look.

I removed the properties from pom.xml and added the following to my settings.xml file.

<settings>

  <profiles>
    <profile>
      <id>profile-x</id>
      <properties>

        <project.build.sourceEncoding>
          UTF-8
        </project.build.sourceEncoding>

        <project.reporting.outputEncoding>
          UTF-8
        </project.reporting.outputEncoding>

      </properties>
    </profile>
  </profiles>

  <activeProfiles>
    <activeProfile>profile-x</activeProfile>
  </activeProfiles>

</settings>

I’m sure there are many ways to configure settings.xml and welcome suggestions. My settings.xml was originally written for me in 2006 by Aaron at a time when I was just learning java. I’ve honestly avoided messing with it since it wasn’t broken and the documentation back then was horrendous. The docs are no longer horrendous but I still find maven complicated. Essential but complicated.

Ref: maven documentation here and here.

12 Comments:

  1. Hi man, thanks for your article. It’s good.

    =)

    camus

    2009.06.06
    08:39

  2. Thank you for the helpful information.

    Aaron S

    2009.07.23
    13:09

  3. Perfect! This kind of stuff needs to make its way into a maven FAQ somewhere 😉

    Thanks!

    Steph Meslin-Weber

    2009.11.23
    03:28

  4. There are a few resources already describing that. I have collected a few of them here:
    http://www.martinahrer.at/blog/2007/06/01/maven2-site-encoding-problems/

    Martin Ahrer

    2009.12.07
    01:53

  5. @Martin Ahrer

    Great info (especially for anyone trying to get their IDE-to-Maven encoding set) at http://www.martinahrer.at/blog/2007/06/01/maven2-site-encoding-problems/ and references to even more material.

    kelly

    2009.12.07
    06:27

  6. Alternatively, you could use Ant to build projects. 🙂

    dan

    2010.02.01
    10:39

  7. You can put the config inside a <profile> in your ~/.m2/settings.xml that is active by default. This way the defaults are applied to all projects being built with Maven, and you don’t have to add it to all your pom.xml’s.

    Pavel

    2010.07.21
    01:57

  8. Pavel, I hadn’t considered putting the config in the settings.xml file. Thanks!

    kelly

    2010.07.21
    06:21

  9. Kelly and Pavel, will you show me how to set the encoding in settings.xml?

    Gabriele

    2010.08.08
    06:00

  10. Gabriele, I updated the post in response to your comment. It was something I’d been meaning to try.

    kelly

    2010.08.08
    08:13

  11. Great writeup, saved me some headache. Thanks!

    Tomer Gabel

    2012.05.13
    06:30

  12. You can also set it with a global setting like this:
    export MAVEN_OPTS=-Dfile.encoding=UTF-8

    cheers

    Dude Root

    2012.11.24
    09:50
