
2313 Re: Output character encoding

  • Arnon Weinberg
    Jun 5, 2012
      On 2012-06-05 05:55, Warren Young wrote:
      > There are several places where you set this, not just one, and they
      > all have to agree to guarantee correct output:
      > DB -> back end -> Apache -> HTML -> Apache::ASP -> browser
      > If they do not all agree, you can either get mixed encodings or
      > encoding ping-ponging.
      > So, you have to check all the links in that chain:

      With my test cases (provided) I have carefully narrowed down the
      inconsistency to Apache::ASP, since everything else is either not
      applicable or the same.

      > - Apache has things like the "AddDefaultCharset" directive which play
      > into this.

      No, it doesn't, since I'm not testing the browser. For the record
      though, when I use GET -e, I see the correct header in both tests:
      Content-Type: text/html; charset=ISO-8859-1

      > - For the Perl aspects, I recommend just reading the Perl manual
      > chapter on it: perldoc perlunicode. Perl's Unicode support is deep,
      > broad, and continually evolving[*]. You really must read your
      > particular version's docs to know exactly how it's going to behave.
      > There have been several breaking changes over the past decade or so.

      Perl is behaving as documented. Apache::ASP is giving me trouble.

      > - There are at least three ways to set the character encoding in your
      > HTML. RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML
      > - And finally, it's possible to set a browser to ignore whatever it's
      > told by the HTTP server and the document, and force it to interpret
      > the data using some other character set.

      That's all true, but none of it matters: with mixed-encoding output,
      there is no single character set I can select in the browser that
      decodes the whole page correctly.
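
To illustrate why no single charset can rescue mixed-encoding output, here is a minimal sketch using the core Encode module. It is not one of the test cases from this thread. The sample character and the codepoints() helper are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show a character string as a list of code points.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# One character, e-acute (U+00E9), encoded two different ways.
my $char   = "\x{E9}";
my $latin1 = encode('ISO-8859-1', $char);   # one byte:  E9
my $utf8   = encode('UTF-8',      $char);   # two bytes: C3 A9

# A body that mixes both encodings, like the output described above.
my $mixed = $latin1 . $utf8;

# Read as Latin-1: the first character is right, the UTF-8 pair is mojibake.
my $as_latin1 = decode('ISO-8859-1', $mixed);

# Read as UTF-8: the lone Latin-1 byte is malformed and gets replaced.
my $as_utf8 = decode('UTF-8', $mixed);

print "intended:    ", codepoints($char x 2), "\n";
print "as Latin-1:  ", codepoints($as_latin1), "\n";
print "as UTF-8:    ", codepoints($as_utf8), "\n";
```

Neither decoding gives back the two intended characters, so changing the browser's charset only moves the damage from one part of the page to the other.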

      >> Regular perl/CGI output defaults to ISO-8859-1 encoding,
      > Really? I'd expect it to take the overall Perl default, which is
      > UTF-8 on most Unix type systems with Perl 5.6 onward on OSes
      > contemporary with that version of Perl. I would have expected that
      > you'd have to go out of your way to force a return to Latin-1.

      Yes, this is right out of the manual (open):
      "... the default layer for the operating system (:raw on Unix, :crlf on
      Windows) is used."
      The :utf8 output layer encoding must be explicitly set, as it is not the
      default. However, I have not figured out how to do this successfully
      within Apache::ASP.
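
For plain Perl, outside Apache::ASP, the effect of the output layer can be seen in a small sketch like the one below. It writes the same string to an in-memory filehandle, once through :raw (the Unix default quoted above) and once through :encoding(UTF-8). The bytes_written() helper and the sample string are assumptions for illustration, and this does not show how to set the layer inside Apache::ASP.

```perl
use strict;
use warnings;

# Hypothetical helper: write $string through $layer into an in-memory
# file and return the resulting bytes as hex.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "open failed: $!";
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return join ' ', map { sprintf '%02X', ord } split //, $buffer;
}

my $text = "caf\x{E9}";   # character string ending in e-acute (U+00E9)

# With :raw, the character comes out as the single Latin-1 byte E9.
print ":raw              ", bytes_written(':raw', $text), "\n";

# With an explicit encoding layer, it comes out as UTF-8 (C3 A9).
print ":encoding(UTF-8)  ", bytes_written(':encoding(UTF-8)', $text), "\n";
```

On a real STDOUT the equivalent would be binmode(STDOUT, ':encoding(UTF-8)').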

      > It's 2012. Please, please, please abandon Latin-1. Everything speaks
      > UTF-8 these days, at the borders at least, even systems like Windows
      > and JavaScript where it isn't the native character set. It is safe to
      > consider UTF-8 the standard Unicode encoding online.

      This is part of an exercise to do just that. At the moment, we have
      many lines of legacy code still using Latin-1, and are converting them
      step-wise to use UTF-8. As the test cases show however, they do not
      play well together on Apache::ASP (though they are fine everywhere
      else). If anyone has any suggestions on how this can be resolved so
      that we can continue the conversion, that would be much appreciated.

      Arnon Weinberg

      To unsubscribe, e-mail: asp-unsubscribe@...
      For additional commands, e-mail: asp-help@...