  • Warren Young
    Feb 23, 2006
      I finally got around to converting our Apache::ASP application so that
      it uses UTF-8 throughout, instead of Latin-1. I learned a few things
      that aren't discussed in the archives, so I'm setting them down here for
      others to find.

      1. It's best if you use newer Perls. 5.8.0 is adequate, but has known
      bugs in its Unicode handling. When run under 5.8.0, our program
      exhibits a double UTF-8 conversion in one circumstance, while the other
      screens show the data correctly. When the same program is run under
      5.8.5, all screens show the correct data. While it's theoretically
      possible to get Perl 5.6.x to cope with UTF-8 data, I don't recommend
      messing with it. A few years ago when I first tried using UTF-8, I was
      using 5.6 and had many problems with Perl smashing my data back to
      Latin-1 incorrectly.
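
      For anyone who hasn't seen a double UTF-8 conversion before, here is a
      minimal sketch of the effect, assuming only the core Encode module.
      The sample character and the hex_bytes helper are illustrative; they
      are not taken from the application described above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Render a byte string as space-separated hex, e.g. "C3 A9".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# One character: e-acute, code point U+00E9.
my $char = "\x{e9}";

# Correct path: a single encoding step.
my $once = encode('UTF-8', $char);

# Faulty path: the UTF-8 bytes are misread as Latin-1 characters and
# then encoded to UTF-8 a second time.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

print "encoded once:  ", hex_bytes($once),  "\n";   # C3 A9
print "encoded twice: ", hex_bytes($twice), "\n";   # C3 83 C2 A9
```

      In a browser the four-byte form shows up as two junk characters where
      one accented letter should be, which is usually the first visible
      symptom.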

      2. Also use the newest mod_perl you can. There are known Unicode bugs
      in mod_perl 1.99_09 and older.

      3. You must say "use utf8;" at the top of each ASP file. The pragma is
      lexically scoped, so it does not carry over from one file to another:
      if you use $Response->Include(), each included file also has to say
      "use utf8;". The same goes for any Perl modules you use, if you will be
      passing UTF-8 strings through them.
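
      A small sketch of what the pragma changes, assuming the file is saved
      as UTF-8. It is a standalone script rather than an ASP page, and the
      variable names are made up for the illustration.

```perl
use strict;
use warnings;

# This file itself must be saved as UTF-8 for the comparison to hold.

# With the pragma, the literal is read as one character.
my $with_pragma = do { use utf8; length("é") };

# Without it, Perl sees the two bytes that encode the character.
my $without_pragma = do { no utf8; length("é") };

print "with use utf8:    $with_pragma\n";      # 1
print "without use utf8: $without_pragma\n";   # 2
```

      Because the pragma only applies to the block or file it appears in,
      the two measurements can sit side by side in one script.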

      4. mod_perl doesn't set the LANG environment variable unless you ask it
      to. Perl 5.8.0 uses the locale environment variables (LANG among them)
      to decide whether to use UTF-8 by default; 5.8.1 and newer consult them
      only when asked to via the -C switch or PERL_UNICODE. I didn't find
      it to be necessary to ask mod_perl to set this variable in my program,
      but it can't hurt to do it. If nothing else, it's one less thing you
      have to blame if your pages aren't showing the right data. In your
      httpd.conf, right after "PerlModule Apache::ASP", say "PerlPassEnv
      LANG". This will pass your system's default value for LANG through to
      the mod_perl instances, and thus to Apache::ASP.
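
      One way to confirm that the variable actually arrived is to look at
      $ENV{LANG} from Perl. The sketch below is a standalone script;
      lang_is_utf8 is a hypothetical helper, and inside an ASP page you
      would presumably write the result to the page or the error log
      instead of printing it.

```perl
use strict;
use warnings;

# httpd.conf lines described above, shown here only as a reminder:
#   PerlModule Apache::ASP
#   PerlPassEnv LANG

# Hypothetical helper: true if a LANG-style value names a UTF-8 locale,
# such as "en_US.UTF-8" or "de_DE.utf8".
sub lang_is_utf8 {
    my ($lang) = @_;
    return 0 unless defined $lang;
    return $lang =~ /\.utf-?8\b/i ? 1 : 0;
}

my $lang = $ENV{LANG};
printf "LANG=%s (UTF-8 locale: %s)\n",
    defined $lang ? $lang : '(unset)',
    lang_is_utf8($lang) ? 'yes' : 'no';
```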

      5. Ensure that your data source is passing UTF-8 data correctly. In our
      program, the data comes in via an XML path, so we needed to inform the
      XML parser that the data is UTF-8. Otherwise, the XML parser assumes
      it's Latin-1, and you get a double UTF-8 conversion.
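
      How you tell the parser about the encoding depends on the parser, so
      that part isn't sketched here. What can be checked independently is
      whether the bytes coming from the data source are well-formed UTF-8 in
      the first place. The following is one possible check using the core
      Encode module; decode_utf8_strict is a hypothetical helper name and
      the sample strings are invented.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the decoded string if $bytes is well-formed
# UTF-8, or undef if it is not.
sub decode_utf8_strict {
    my ($bytes) = @_;
    return undef unless defined $bytes;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $text;
}

my $good = decode_utf8_strict("caf\xC3\xA9");   # valid UTF-8 bytes
my $bad  = decode_utf8_strict("caf\xE9");       # Latin-1 byte, invalid as UTF-8

print "good feed: ", (defined $good ? "accepted" : "rejected"), "\n";
print "bad feed:  ", (defined $bad  ? "accepted" : "rejected"), "\n";
```

      A feed that fails this check is most likely still Latin-1. One that
      passes but displays junk is a candidate for the double conversion
      described in item 1.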

      6. Finally, you need to tell the browser that the data is UTF-8. This
      is done with the Content-Type HTTP header, which you can set in a
      number of ways. I like to do it with the equivalent <meta> tag at the
      top of each file that will contain UTF-8 data:

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

      Alternatively, if all documents on your server should be treated as
      UTF-8, Apache's AddDefaultCharset configuration directive can declare
      all text/html and text/plain output as UTF-8.
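
      For the server-wide route, a minimal httpd.conf fragment might look
      like the one below. It assumes a stock Apache where the core
      AddDefaultCharset directive is available; where exactly it belongs in
      your configuration depends on your setup.

```apache
# Add "charset=utf-8" to text/plain and text/html responses that do not
# already specify a charset.
AddDefaultCharset utf-8
```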