
Re: Output character encoding (message 2313)

  • Arnon Weinberg
    Jun 5, 2012
      On 2012-06-05 05:55, Warren Young wrote:
      > There are several places where you set this, not just one, and they
      > all have to agree to guarantee correct output:
      >
      > DB -> back end -> Apache -> HTML -> Apache::ASP -> browser
      >
      > If they do not all agree, you can either get mixed encodings or
      > encoding ping-ponging.
      >
      > So, you have to check all the links in that chain:

      With my test cases (provided), I have narrowed the inconsistency down
      to Apache::ASP, since everything else in the chain is either not
      applicable or the same in both tests.
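      For anyone unfamiliar with the "ping-ponging" Warren mentions, here is
      a minimal Python sketch. It is only an illustration: Python stands in
      for the real components, and the function name ping_pong is
      hypothetical. One link writes UTF-8, the next misreads those bytes as
      Latin-1 and re-encodes them as UTF-8, so after one bad hand-off "café"
      shows up as "cafÃ©", and each further round garbles it more.

```python
# Hypothetical illustration of "encoding ping-ponging": each mismatched
# hand-off decodes UTF-8 bytes as Latin-1 and then re-encodes them as UTF-8.

def ping_pong(text: str, rounds: int = 1) -> str:
    """Simulate `rounds` mismatched hand-offs; return what a UTF-8 reader sees."""
    data = text.encode("utf-8")  # the first link writes correct UTF-8 bytes
    for _ in range(rounds):
        # the next link wrongly assumes Latin-1, then writes the result as UTF-8
        data = data.decode("latin-1").encode("utf-8")
    return data.decode("utf-8")  # the final reader decodes as UTF-8


if __name__ == "__main__":
    for n in range(3):
        print(n, repr(ping_pong("café", n)))
```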

      > - Apache has things like the "AddDefaultCharset" directive which play
      > into this.

      No, it doesn't apply here, since I'm not testing in a browser. For the
      record, though, when I use GET -e, I see the correct header in both tests:
      Content-Type: text/html; charset=ISO-8859-1

      > - For the Perl aspects, I recommend just reading the Perl manual
      > chapter on it: perldoc perlunicode. Perl's Unicode support is deep,
      > broad, and continually evolving[*]. You really must read your
      > particular version's docs to know exactly how it's going to behave.
      > There have been several breaking changes over the past decade or so.

      Perl is behaving as documented. Apache::ASP is giving me trouble.

      > - There are at least three ways to set the character encoding in your
      > HTML. RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML
      >
      > - And finally, it's possible to set a browser to ignore whatever it's
      > told by the HTTP server and the document, and force it to interpret
      > the data using some other character set.

      That's all true, but none of it matters: with mixed-encoding output,
      there is no character encoding I can select in the browser that
      decodes the whole page correctly.
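      To show why no browser setting can fix this, here is a small Python
      sketch (an illustration only; the fragment contents and the helper
      name try_decode are hypothetical). One part of the page is Latin-1 and
      another is UTF-8. Decoding as Latin-1 garbles the UTF-8 part
      ("naÃ¯ve"), and decoding as UTF-8 turns the Latin-1 "é" into a
      replacement character, so neither charset recovers "café naïve".

```python
# Hypothetical page assembled from a Latin-1 fragment and a UTF-8 fragment.
latin1_part = "café ".encode("latin-1")  # b'caf\xe9 '
utf8_part = "naïve".encode("utf-8")      # b'na\xc3\xafve'
page = latin1_part + utf8_part


def try_decode(data: bytes, charset: str) -> str:
    # errors="replace" substitutes U+FFFD for invalid byte sequences,
    # roughly what a browser displays for bytes it cannot decode
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    for charset in ("latin-1", "utf-8"):
        print(charset, repr(try_decode(page, charset)))
```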

      >
      >> Regular perl/CGI output defaults to ISO-8859-1 encoding,
      >
      > Really? I'd expect it to take the overall Perl default, which is
      > UTF-8 on most Unix type systems with Perl 5.6 onward on OSes
      > contemporary with that version of Perl. I would have expected that
      > you'd have to go out of your way to force a return to Latin-1.

      Yes, this comes straight from the manual (perldoc -f open):
      "... the default layer for the operating system (:raw on Unix, :crlf on
      Windows) is used."
      An output layer such as :utf8 must be set explicitly, since it is not
      the default. However, I have not figured out how to do this
      successfully within Apache::ASP.
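      As a rough illustration of why the default matters, the Python sketch
      below models (as an assumption, not Perl itself) what Perl's print does
      on a plain filehandle with no encoding layer: a string whose characters
      are all at or below U+00FF is written one byte per character, which is
      effectively Latin-1, while a string containing any wider character is
      written as UTF-8 along with a "Wide character in print" warning. Two
      prints can therefore put different encodings into the same output. The
      function names are hypothetical, and nothing here shows how
      Apache::ASP handles its output.

```python
# Rough model (an assumption, not Perl itself) of Perl's print on a plain
# filehandle with no encoding layer (the :raw default on Unix).

def perl_raw_print(s: str) -> bytes:
    if all(ord(ch) <= 0xFF for ch in s):
        # every character fits in one byte: output is effectively Latin-1
        return s.encode("latin-1")
    # a wide character is present: Perl writes the string's UTF-8 bytes
    # (and warns "Wide character in print")
    return s.encode("utf-8")


def perl_utf8_print(s: str) -> bytes:
    # models a handle with an explicit layer, e.g. binmode(STDOUT, ':encoding(UTF-8)')
    return s.encode("utf-8")


if __name__ == "__main__":
    # two prints to the same raw handle yield Latin-1 and UTF-8 bytes side by side
    print(perl_raw_print("café ") + perl_raw_print("€5"))
```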

      > It's 2012. Please, please, please abandon Latin-1. Everything speaks
      > UTF-8 these days, at the borders at least, even systems like Windows
      > and JavaScript where it isn't the native character set. It is safe to
      > consider UTF-8 the standard Unicode encoding online.

      This is part of an exercise to do just that. At the moment, we have
      many lines of legacy code still using Latin-1, and we are converting
      them to UTF-8 step by step. As the test cases show, however, the two
      do not play well together under Apache::ASP (though they coexist fine
      everywhere else). If anyone has suggestions on how to resolve this so
      that we can continue the conversion, they would be much appreciated.


      --
      -------------------------------------------------------------------------------
      Arnon Weinberg
      www.back2front.ca

