
Output character encoding

  • Arnon Weinberg
    Message 1 of 9, Jun 5, 2012
      How can I set the output character encoding of Apache::ASP output?

      Regular perl/CGI output defaults to ISO-8859-1 encoding, and may be
      easily modified using the use open pragma or binmode() function. Here is
      my test script:

      # Latin-1.cgi: ##############

      #use open ( ":utf8", ":std" );
      #binmode ( STDOUT, ":encoding(ISO-8859-1)" );

      use CGI;
      print CGI::header();

      use Encode;

      print Encode::decode('ISO-8859-1',"\xE2"),
      Encode::decode('UTF-8',Encode::encode('UTF-8',"\xE2")),
      "\x{00E2}",
      chr(0x00E2);

      #############################

      >perl Latin-1.cgi
      Content-Type: text/html; charset=ISO-8859-1

      ââââ
      >perl Latin-1.cgi | tail -1 | hexdump
      0000000 e2e2 e2e2
      0000004

      This script correctly produces ISO-8859-1 encoded output.

      However, Apache::ASP appears to default to some strange mix of
      ISO-8859-1 and UTF-8 that I can't make sense of.
      Additionally, the use open pragma and binmode() function appear to have
      no effect:

      # Latin-1.rasp: #############

      <%
      #use open ( ":utf8", ":std" );
      #binmode ( STDOUT, ":encoding(ISO-8859-1)" );

      $::Response->{Charset} = "ISO-8859-1";

      use Encode;

      print Encode::decode('ISO-8859-1',"\xE2"),
      Encode::decode('UTF-8',Encode::encode('UTF-8',"\xE2")),
      "\x{00E2}",
      chr(0x00E2);
      %>

      #############################

      >asp-perl Latin-1.rasp
      Content-Type: text/html; charset=ISO-8859-1
      Content-Length: 6
      Cache-Control: private

      ââââ
      >asp-perl Latin-1.rasp | tail -1 | hexdump
      0000000 a2c3 a2c3 e2e2
      0000006

      For some reason, the first 2 test characters are UTF-8 encoded, and the
      last 2 are ISO-8859-1 encoded.
      How can I get the same results as the CGI script above?
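A quick way to see what separates the first two test values from the last two is to check Perl's internal UTF8 flag on each of them. This is a standalone sketch in plain Perl (no Apache::ASP involved) that uses Encode::is_utf8 purely as a diagnostic; the flag is an internal storage detail and, in principle, should not change what gets printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# The four test values from Latin-1.cgi / Latin-1.rasp.
my @tests = (
    decode('ISO-8859-1', "\xE2"),              # decoded string: UTF8 flag on
    decode('UTF-8', encode('UTF-8', "\xE2")),  # decoded string: UTF8 flag on
    "\x{00E2}",                                # code point < 256: flag off
    chr(0x00E2),                               # code point < 256: flag off
);

# All four compare equal as characters; only their internal storage differs.
for my $i (0 .. $#tests) {
    printf "%d: U+%04X, UTF8 flag %s\n",
        $i + 1, ord $tests[$i], is_utf8($tests[$i]) ? 'on' : 'off';
}
```

With a typical Perl the flag should be on for values 1 and 2 and off for 3 and 4, which lines up with which characters came out UTF-8 encoded in the asp-perl hexdump.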


      --
      -------------------------------------------------------------------------------
      Arnon Weinberg
      www.back2front.ca


      ---------------------------------------------------------------------
      To unsubscribe, e-mail: asp-unsubscribe@...
      For additional commands, e-mail: asp-help@...
    • Warren Young
      Message 2 of 9, Jun 5, 2012
        On 6/5/2012 3:02 AM, Arnon Weinberg wrote:
        >
        > How can I set the output character encoding of Apache::ASP output?

        There are several places where you set this, not just one, and they all
        have to agree to guarantee correct output:

        DB -> back end -> Apache -> HTML -> Apache::ASP -> browser

        If they do not all agree, you can either get mixed encodings or encoding
        ping-ponging.

        Ping-ponging is less common these days now that the world is settling on
        UTF-8. Back in the Perl 5.6/Apache 1.3/pre-Firefox days, I remember
        once chasing data through a system that stored data in the DB in
        Latin-1, which got translated to UTF-8 in the back-end daemon, which
        then sent it on to Apache and mod_perl, one of which smashed the data
        back to Latin-1 (never did nail that one down), before sending the data
        out to the browser which saw UTF-8 because Apache was configured to use
        that by default!

        So, you have to check all the links in that chain:

        - Your DB and any back-end daemon are up to you, since they're out of
        scope on this list.

        - Apache has things like the "AddDefaultCharset" directive which play
        into this.

        - For the Perl aspects, I recommend just reading the Perl manual chapter
        on it: perldoc perlunicode. Perl's Unicode support is deep, broad, and
        continually evolving[*]. You really must read your particular version's
        docs to know exactly how it's going to behave. There have been several
        breaking changes over the past decade or so.

        - There are at least three ways to set the character encoding in your
        HTML. RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML

        - And finally, it's possible to set a browser to ignore whatever it's
        told by the HTTP server and the document, and force it to interpret the
        data using some other character set.


        [*] Literally continuously. I happened to read through the Perl release
        notes from 5.8 onward last week, and I saw Unicode related changes in
        *every* major release, including the just-released 5.16!

        > Regular perl/CGI output defaults to ISO-8859-1 encoding,

        Really? I'd expect it to take the overall Perl default, which is UTF-8
        on most Unix type systems with Perl 5.6 onward on OSes contemporary with
        that version of Perl. I would have expected that you'd have to go out
        of your way to force a return to Latin-1.

        Now, if you're on a system where the native character set is still
        Latin-1, I'd understand that, but then you'd be running a 10-year-old
        box, wouldn't you? :)

        > How can I get the same results as the CGI script above?

        It's 2012. Please, please, please abandon Latin-1. Everything speaks
        UTF-8 these days, at the borders at least, even systems like Windows and
        JavaScript where it isn't the native character set. It is safe to
        consider UTF-8 the standard Unicode encoding online.

      • Arnon Weinberg
        Message 3 of 9, Jun 5, 2012
          On 2012-06-05 05:55, Warren Young wrote:
          > There are several places where you set this, not just one, and they
          > all have to agree to guarantee correct output:
          >
          > DB -> back end -> Apache -> HTML -> Apache::ASP -> browser
          >
          > If they do not all agree, you can either get mixed encodings or
          > encoding ping-ponging.
          >
          > So, you have to check all the links in that chain:

          With my test cases (provided) I have carefully narrowed down the
          inconsistency to Apache::ASP, since everything else is either not
          applicable or the same.

          > - Apache has things like the "AddDefaultCharset" directive which play
          > into this.

          No, it doesn't, since I'm not testing the browser. For the record
          though, when I use GET -e, I see the correct header in both tests:
          Content-Type: text/html; charset=ISO-8859-1

          > - For the Perl aspects, I recommend just reading the Perl manual
          > chapter on it: perldoc perlunicode. Perl's Unicode support is deep,
          > broad, and continually evolving[*]. You really must read your
          > particular version's docs to know exactly how it's going to behave.
          > There have been several breaking changes over the past decade or so.

          Perl is behaving as documented. Apache::ASP is giving me trouble.

          > - There are at least three ways to set the character encoding in your
          > HTML. RTFEE: https://en.wikipedia.org/wiki/Character_encodings_in_HTML
          >
          > - And finally, it's possible to set a browser to ignore whatever it's
          > told by the HTTP server and the document, and force it to interpret
          > the data using some other character set.

          That's all true, but none of it matters since with a mixed encoding
          output, there is no character set encoding that I can use on the browser
          to show a correct decoding.

          >
          >> Regular perl/CGI output defaults to ISO-8859-1 encoding,
          >
          > Really? I'd expect it to take the overall Perl default, which is
          > UTF-8 on most Unix type systems with Perl 5.6 onward on OSes
          > contemporary with that version of Perl. I would have expected that
          > you'd have to go out of your way to force a return to Latin-1.

          Yes, this is right out of the manual (open):
          "... the default layer for the operating system (:raw on Unix, :crlf on
          Windows) is used."
          The :utf8 output layer encoding must be explicitly set, as it is not the
          default. However, I have not figured out how to do this successfully
          within Apache::ASP.
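For comparison, here is a minimal plain-Perl sketch of the behaviour that the quoted open documentation implies. It uses in-memory filehandles (an assumption made only so it runs without a terminal or web server). With no encoding layer, characters below 256 are written as single Latin-1 bytes whether or not their UTF8 flag is set; an explicit :encoding(UTF-8) layer encodes all four values the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

my @tests = (
    decode('ISO-8859-1', "\xE2"),
    decode('UTF-8', encode('UTF-8', "\xE2")),
    "\x{00E2}",
    chr(0x00E2),
);

# No encoding layer (like plain STDOUT on Unix): each character < 256
# is written as one byte, so all four values come out as 0xE2.
open my $raw_fh, '>', \my $raw or die "cannot open in-memory handle: $!";
print {$raw_fh} @tests;
close $raw_fh;

# Explicit encoding layer: all four values are encoded identically.
open my $utf8_fh, '>:encoding(UTF-8)', \my $utf8
    or die "cannot open in-memory handle: $!";
print {$utf8_fh} @tests;
close $utf8_fh;

printf "raw:   %s\n", unpack('H*', $raw);    # e2e2e2e2
printf "UTF-8: %s\n", unpack('H*', $utf8);   # c3a2c3a2c3a2c3a2
```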

          > It's 2012. Please, please, please abandon Latin-1. Everything speaks
          > UTF-8 these days, at the borders at least, even systems like Windows
          > and JavaScript where it isn't the native character set. It is safe to
          > consider UTF-8 the standard Unicode encoding online.

          This is part of an exercise to do just that. At the moment, we have
          many lines of legacy code still using Latin-1, and are converting them
          step-wise to use UTF-8. As the test cases show however, they do not
          play well together on Apache::ASP (though they are fine everywhere
          else). If anyone has any suggestions on how this can be resolved so
          that we can continue the conversion, that would be much appreciated.


          --
          -------------------------------------------------------------------------------
          Arnon Weinberg
          www.back2front.ca


        • Thanos Chatziathanassiou
          Message 4 of 9, Jun 5, 2012
            > With my test cases (provided) I have carefully narrowed down the
            > inconsistency to Apache::ASP, since everything else is either not
            > applicable or the same.
            >

            Could you be a bit more specific on this?

            I've built many a site in international character sets and using
            Apache::ASP for well over a decade, so I can tell you that it works
            just fine with UTF-8 (and ISO-8859-[157] if that matters).
            Last problem was back in 2004 when Content-Length was incorrectly
            calculated.

            > No, it doesn't, since I'm not testing the browser. For the record
            > though, when I use GET -e, I see the correct header in both tests:
            > Content-Type: text/html; charset=ISO-8859-1

            That's as simple as
            ``$Response->{ContentType} = "text/html; charset=UTF-8";''
            It doesn't tell us anything about the actual encoding of the content.
            Bear in mind that your selected encoding might be insufficient to
            display the text you're feeding it.
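Both points can be illustrated in plain Perl (this sketch does not touch Apache::ASP): the charset in the header is only a label, while the bytes are whatever the string is encoded to. Encoding to a charset that cannot represent a character also loses it; with Encode's default CHECK behaviour the character is replaced by the encoding's substitution character, which for ISO-8859-1 is '?'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# U+00E2 (present in Latin-1) followed by U+263A WHITE SMILING FACE (absent).
my $text = "\x{00E2}\x{263A}";

my $latin1 = encode('ISO-8859-1', $text);  # unmappable U+263A becomes '?'
my $utf8   = encode('UTF-8', $text);       # every character survives

printf "ISO-8859-1: %s\n", unpack('H*', $latin1);  # e23f
printf "UTF-8:      %s\n", unpack('H*', $utf8);    # c3a2e298ba
```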

            > Yes, this is right out of the manual (open):
            > "... the default layer for the operating system (:raw on Unix, :crlf on
            > Windows) is used."
            > The :utf8 output layer encoding must be explicitly set, as it is not the
            > default. However, I have not figured out how to do this successfully
            > within Apache::ASP.

            How does file handling come into play here? Not that it's relevant, but
            it works quite the same way as outside of Apache::ASP.

            >
            > This is part of an exercise to do just that. At the moment, we have
            > many lines of legacy code still using Latin-1, and are converting them
            > step-wise to use UTF-8. As the test cases show however, they do not
            > play well together on Apache::ASP (though they are fine everywhere
            > else). If anyone has any suggestions on how this can be resolved so
            > that we can continue the conversion, that would be much appreciated.
            >
            >

            Have a look at Text::Iconv, iconv(1), iconv(3) and friends. Also, Encode.

            Best Regards,
            Thanos Chatziathanassiou

          • Arnon Weinberg
            Message 5 of 9, Jun 5, 2012
              On 2012-06-05 14:13, Thanos Chatziathanassiou wrote:
              >> With my test cases (provided) I have carefully narrowed down the
              >> inconsistency to Apache::ASP, since everything else is either not
              >> applicable or the same.
              >>
              > Could you be a bit more specific on this ?
              >

              Er, not sure how I can be more specific - the test cases are provided in
              my initial post
              (http://www.mail-archive.com/asp%40perl.apache.org/msg02662.html), they
              don't use a database, web server, or browser, so those can easily be
              eliminated as possible culprits. Ideally, the test cases should speak
              for themselves.

              > How does file handling come into play here ? Not that it's relevant but
              > it works quite the same way as outside of Apache::ASP.
              >

              I'm afraid it doesn't, as the test cases clearly demonstrate. Note:
              It's not "file handling", it's PerlIO, which refers to all I/O,
              including STDOUT.

              > Have a look at Text::Iconv, iconv(1), iconv(3) and friends. Also, Encode.

              iconv converts text files, not Perl code - that still requires wetware
              as far as I know. Encode is being used in the test cases, and clearly
              messes things up in Apache::ASP.


              --

              -------------------------------------------------------------------------------
              Arnon Weinberg
              www.back2front.ca



            • Josh Chamas
              Message 6 of 9, Jun 5, 2012
                On 6/5/12 2:02 AM, Arnon Weinberg wrote:
                >
                > How can I set the output character encoding of Apache::ASP output?
                > ...

                Hi Arnon, All,

                I have gone over the thread and been stumped on this for a while. Bottom line:
                it looks like Apache::ASP does not play well with Encode, and this seems to me
                to be around the PerlIO interactions and something not quite connecting right on
                a tied file handle. But I do not know how to solve this. :(

                To explain where there is some magic at play:

                Apache::ASP::Response does a "use bytes", which is there to deal with the output
                stream correctly; I believe this is around content length calculations. I think
                this is fine here, and turning it off makes things worse for these examples.

                Apache::ASP::Response is more importantly tied as a file handle when this code
                is run:

                tie *RESPONSE, 'Apache::ASP::Response', $self->{Response};
                select(RESPONSE);

                This is to allow for print to go to $Response->PRINT which aliases to
                $Response->Write. Fundamentally all output is going through $Response->Write at
                the end of the day including the script static content itself.
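A minimal standalone sketch of the same mechanism (tie a glob, select it, and let a plain print land in a PRINT method) may help separate the tie from everything else. The class name CaptureHandle is hypothetical and this is not Apache::ASP::Response's code; it only records each argument PRINT receives, which suggests that the tie call by itself hands over the original strings with their UTF8 flag intact.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical tied-handle class: stores every chunk passed to print.
package CaptureHandle;
sub TIEHANDLE { my ($class) = @_; return bless { chunks => [] }, $class }
sub PRINT     { my $self = shift; push @{ $self->{chunks} }, @_; return 1 }

package main;

tie *CAPTURE, 'CaptureHandle';
select(*CAPTURE);                   # plain print now goes to CaptureHandle::PRINT
print decode('ISO-8859-1', "\xE2"), "\xE2";
select(*STDOUT);                    # back to the real STDOUT

my $chunks = tied(*CAPTURE)->{chunks};
for my $chunk (@$chunks) {
    printf "ord %d, UTF8 flag %s\n", ord $chunk, is_utf8($chunk) ? 'on' : 'off';
}
```

If the flags survive here, the conversion is more likely happening in what the tied class does with the data (for example, how it appends to its buffer) than in the tie itself.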

                What I have found is that this will output the correct bytes in this Apache::ASP
                script:

                <% print STDOUT Encode::decode('ISO-8859-1',"\xE2"); %>

                as it bypasses the tied file handle layer to $Response, so we know perl is
                working at this point!

                but doing this is where we have a problem:

                <% print Encode::decode('ISO-8859-1',"\xE2"); %>

                and immediately in the Apache::ASP::Response::Write() method the data has
                already been converted incorrectly without any processing occurring. It's as if
                by merely going through the tied interface the data goes through some
                conversion process. I have played with various IO settings as in "open ..." and
                various "use" pragmas to no avail, but I am really shooting blind here on what
                could not be working.

                So the way I see it:

                Encoding Magic
                File handle tie Magic <--- data conversion
                Data to $Response->Write

                Encode and perltie seem to have some conflicting bits here.

                If there were some workaround here I would be glad to hear it but I seem to have
                exhausted my ability to troubleshoot this.

                Regards,

                Josh



                > # Latin-1.rasp: #############
                >
                > <%
                > #use open ( ":utf8", ":std" );
                > #binmode ( STDOUT, ":encoding(ISO-8859-1)" );
                >
                > $::Response->{Charset} = "ISO-8859-1";
                >
                > use Encode;
                >
                > print Encode::decode('ISO-8859-1',"\xE2"),
                > Encode::decode('UTF-8',Encode::encode('UTF-8',"\xE2")),
                > "\x{00E2}",
                > chr(0x00E2);
                > %>
                >
                > #############################
                >
                >>asp-perl Latin-1.rasp
                > Content-Type: text/html; charset=ISO-8859-1
                > Content-Length: 6
                > Cache-Control: private
                >
                > ââââ
                >>asp-perl Latin-1.rasp | tail -1 | hexdump
                > 0000000 a2c3 a2c3 e2e2
                > 0000006
                >
                > For some reason, the first 2 test characters are UTF-8 encoded, and the last 2
                > are ISO-8859-1 encoded.
                > How can I get the same results as the CGI script above?
                >
                >

              • Thanos Chatziathanassiou
                Message 7 of 9, Jun 6, 2012
                  Apologies Arnon, I got your original message with the problem
                  description after I had sent mine...

                  >
                  > To explain where there is some magic at play:
                  >
                  > Apache::ASP::Response does a "use bytes" which is to deal with the
                  > output stream correctly I believe this is around content length
                  > calculations. I think this is fine here, and turning this off makes
                  > things worse for these examples.
                  >
                  > Apache::ASP::Response is more importantly tied as a file handle when
                  > this code is run:
                  >
                  > tie *RESPONSE, 'Apache::ASP::Response', $self->{Response};
                  > select(RESPONSE);
                  >
                  > This is to allow for print to go to $Response->PRINT which aliases to
                  > $Response->Write. Fundamentally all output is going through
                  > $Response->Write at the end of the day including the script static
                  > content itself.
                  >
                  > What I have found is that this will output the correct bytes in this
                  > Apache::ASP script:
                  >
                  > <% print STDOUT Encode::decode('ISO-8859-1',"\xE2"); %>
                  >
                  > as it bypasses the tied file handle layer to $Response, so we know perl
                  > is working at this point!
                  >
                  > but doing this is where we have a problem:
                  >
                  > <% print Encode::decode('ISO-8859-1',"\xE2"); %>
                  >
                  > and immediately in the Apache::ASP::Response::Write() method the data
                  > has already been converted incorrectly without any processing
                  > occurring. Its as if by merely going through the tied interface that
                  > data goes through some conversion process. I have played with various
                  > IO settings as in "open ..." and various "use" pragmas to no avail but
                  > really shooting blind here on what could not be working.
                  >
                  > So the way I see it..
                  >

                  That rang a bell for me:
                  Read the section ``The UTF8 flag'' in Encode to see the problem.
                  ${$Response->{out}} contains a copy of the stuff you're sending to
                  $Response->Write(), AKA $Response->WriteRef() but without copying the
                  utf-8 flag.
                  You can make the example work by simply turning the utf8 flag
                  unconditionally on via ``Encode::_utf8_on(${$Response->{out}});''
                  after the print statements in Latin-1.rasp.
                  Of course, your data should either ALL have the utf8 flag on (eg via
                  Encode::decode) or ALL have it off, because ${$Response->{out}} can
                  either have it on or off but obviously not both.

                  > Encode and perltie seem to have some conflicting bits here.
                  >
                  > If there were some workaround here I would be glad to hear it but I seem
                  > to have exhausted my ability to troubleshoot this.

                  I'm not sure there is a generic solution, except perhaps to mess around
                  with ``is_utf8($$dataref)'' before appending it to $Response->{out} and
                  make sure that the same kind of data (either flag ON or flag OFF) is
                  appended to $Response->{out}.
                  See below for why this is a problem:

                  >
                  >> # Latin-1.rasp: #############
                  >>
                  >> <%
                  >> #use open ( ":utf8", ":std" );
                  >> #binmode ( STDOUT, ":encoding(ISO-8859-1)" );
                  >>
                  >> $::Response->{Charset} = "ISO-8859-1";
                  >>
                  >> use Encode;
                  >>
                  >> print Encode::decode('ISO-8859-1',"\xE2"),
                  >> Encode::decode('UTF-8',Encode::encode('UTF-8',"\xE2")),

                  #these will now work if
                  #Encode::_utf8_on(${$Response->{out}});
                  #is set because they have the flag themselves

                  >> "\x{00E2}",
                  >> chr(0x00E2);

                  #these, on the other hand, will not
                  #
                  #the opposite holds true for
                  #Encode::_utf8_off(${$Response->{out}});
                  #of course

                  >> %>

                  I'm sure we can design a ``proper'' solution but not without some
                  user-configurable settings and a bit of ugly code.
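One possible shape for such a solution, sketched under the assumption that the response charset is known before output is written (as with $Response->{Charset} in Latin-1.rasp): encode every chunk to octets as it is appended, so the buffer only ever holds bytes and the UTF8 flag stops mattering. append_output is a hypothetical helper, not an Apache::ASP API.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# Hypothetical helper (not part of Apache::ASP): encode each chunk to
# octets in the response charset before appending it to a byte buffer.
sub append_output {
    my ($buffer_ref, $charset, @chunks) = @_;
    $$buffer_ref .= encode($charset, $_) for @chunks;
    return;
}

my @tests = (
    decode('ISO-8859-1', "\xE2"),
    decode('UTF-8', encode('UTF-8', "\xE2")),
    "\x{00E2}",
    chr(0x00E2),
);

my $latin1_out = '';
append_output(\$latin1_out, 'ISO-8859-1', @tests);

my $utf8_out = '';
append_output(\$utf8_out, 'UTF-8', @tests);

printf "ISO-8859-1: %s\n", unpack('H*', $latin1_out);  # e2e2e2e2
printf "UTF-8:      %s\n", unpack('H*', $utf8_out);    # c3a2c3a2c3a2c3a2
```

The trade-off is that every chunk is treated as characters: legacy code that prints strings which are already encoded octets would have them encoded a second time, so that code would need to decode its data first.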

                  Best Regards,
                  Thanos Chatziathanassiou



                • Arnon Weinberg
                  Message 8 of 9, Jun 14, 2012
                    Thanks very much Josh for investigating this - it saved me some time
                    narrowing down the issue. Even still, I did spend quite a lot of time
                    working out a solution for my needs, and still I don't think it is
                    generalizable as-is. However, in case someone else wants to give it a
                    crack, I provide details below.

                    On 2012-06-05 19:30, Josh Chamas wrote:
                    > doing this is where we have a problem:
                    >
                    > <% print Encode::decode('ISO-8859-1',"\xE2"); %>
                    >
                    > and immediately in the Apache::ASP::Response::Write() method the data
                    > has already been converted incorrectly

                    The fact that such a simple use of Encode causes an issue is a little
                    surprising. Surely others are using Apache::ASP in multi-language
                    environments - is no one using Encode this way? How are others coping
                    with this limitation right now?

                    > Its as if by merely going through the tied interface that data goes
                    > through some conversion process.

                    Not quite, as the same results happen without a tie'd interface. The
                    "use bytes" pragma is what causes the conversion (see test script below).

                    > Apache::ASP::Response does a "use bytes" which is to deal with the
                    > output stream correctly I believe this is around content length
                    > calculations.
                    > I think this is fine here, and turning this off makes things worse for
                    > these examples.
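The claim that "use bytes", not the tie, causes the conversion can be checked in plain Perl. This minimal sketch (not Apache::ASP's code) appends the four Latin-1.rasp test values to a buffer inside a "use bytes" block and produces the same byte sequence as the asp-perl hexdump (hexdump prints 16-bit little-endian words, hence "a2c3 a2c3 e2e2"). The same appends without the pragma yield four identical characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode is_utf8);

# The four test values from Latin-1.rasp.
my @tests = (
    decode('ISO-8859-1', "\xE2"),
    decode('UTF-8', encode('UTF-8', "\xE2")),
    "\x{00E2}",
    chr(0x00E2),
);

# Hex dump of a string, one entry per character.
sub hex_of { return join ' ', map { sprintf('%02x', ord($_)) } split(//, $_[0]) }

# Byte semantics: a flagged value contributes its internal UTF-8 bytes
# (C3 A2), an unflagged one contributes its single Latin-1 byte (E2).
my $with_bytes = '';
{
    use bytes;
    $with_bytes .= $_ for @tests;
}

# Character semantics: all four values are the same character.
my $without_bytes = '';
$without_bytes .= $_ for @tests;

print "use bytes: ", hex_of($with_bytes), "\n";     # c3 a2 c3 a2 e2 e2
print "no bytes:  ", hex_of($without_bytes), "\n";  # e2 e2 e2 e2
```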

                    It looks like "use bytes" is now strongly discouraged (perldoc bytes says
                    it should only be used for debugging) and should indeed be removed. The
                    documentation doesn't mention any trivial substitute.
                    However, this pragma mostly just overrides some built-in functions with
                    byte-oriented versions. So I made the following changes to Response.pm:
                    - changed use bytes => no bytes (just import the namespace)
                    - changed all occurrences of length() => bytes::length()
                    This resolved the mixed-encoding issue originally posted, but introduced
                    a new (more manageable) issue.
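The length() to bytes::length() substitution can be seen in isolation with this small sketch. `use bytes ()` loads the bytes:: functions without switching on byte semantics. Note that bytes::length measures Perl's internal storage, which matches what is actually sent only when the output bytes equal the internal bytes; measuring the encoded octets avoids that assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();          # load bytes::length without enabling byte semantics
use Encode qw(decode encode);

my $flagged = decode('ISO-8859-1', "\xE2");   # one character, stored as C3 A2
my $plain   = "\xE2";                          # one character, stored as E2

my @lengths = (
    length($flagged),                          # 1 character
    bytes::length($flagged),                   # 2 internal bytes
    bytes::length($plain),                     # 1 internal byte
    length(encode('ISO-8859-1', $flagged)),    # 1 byte actually sent as Latin-1
);
print "@lengths\n";                            # 1 2 1 1
```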

                    For debugging purposes, I peeked at the "UTF-8 flag" (Perl's internal
                    flag that indicates that a string is stored internally as UTF-8). This
                    flag should be transparent in principle, but it helped make sense of the
                    behaviour of Apache::ASP.
                    Results of testing are summarized as follows:

                    1. Testing Perl/CGI, asp-perl, and Apache::ASP, all 3 give the same
                    results with the "use bytes" pragma turned on:
                    - For any string with the UTF-8 flag off, output is correctly encoded.
                    - Any string with the flag on is (double-)encoded as UTF-8, regardless
                    of the actual output encoding.
                    2. Testing Perl/CGI and asp-perl with "no bytes" produces correct results:
                    - The UTF-8 flag does not affect output - it is correctly encoded in
                    every case.
                    - However, an interesting test case is that of the double-encoding
                    problem (see http://ahinea.com/en/tech/perl-unicode-struggle.html). This
                    case is indicative of bad code, so is not a concern here, but it
                    illustrates how a tie'd filehandle differs from plain STDOUT. In this
                    case, a single "wide character" double-encodes the entire output (with
                    buffering on, this can be the entire page), instead of just the string.
                    - These test cases are demonstrated by the script below.
                    3. Testing Apache::ASP with "no bytes" produces different results from
                    the command-line (asp-perl) version, as well as different results from
                    Perl/CGI running on Apache. This suggests an interaction effect between
                    Apache and Apache::ASP (both are required to produce these results).
                    - With the UTF-8 flag off, output is correctly encoded as before.
                    - However, with "no bytes", Apache::ASP, and the UTF-8 flag on, the
                    entire output is double-encoded. This result is similar to the
                    double-encoding problem in the previous test case, except that it
                    doesn't require a "wide character" - any string with the UTF-8 flag on
                    will do.

                    This test script demonstrates all but the last test case:

                    #!/usr/bin/perl
                    use strict;

                    use Encode;

                    foreach my $mode ( "STDOUT", "tie_use_bytes", "tie_no_bytes" )
                    {
                        print "$mode: ";
                        my $tied = $mode ne "STDOUT";

                        # Route plain print through the tie class under test.
                        if ( $tied ) { tie *FH, $mode; select ( FH ); }

                        print "\x{263a}",
                              Encode::decode('ISO-8859-1',"\xE2"),
                              "\xE2";
                        print "\n";

                        # CLOSE flushes the buffered output to the real STDOUT.
                        if ( $tied ) { close ( FH ); select ( STDOUT ); untie *FH; }
                    }

                    # Buffers output under byte semantics.
                    package tie_use_bytes;
                    use bytes;

                    sub TIEHANDLE { bless {}, shift; }
                    sub PRINT { shift()->{out} .= join ( $,, @_ ); }
                    sub CLOSE { print STDOUT delete ( shift()->{out} ); }

                    # Buffers output under character semantics.
                    package tie_no_bytes;
                    no bytes;

                    sub TIEHANDLE { bless {}, shift; }
                    sub PRINT { shift()->{out} .= join ( $,, @_ ); }
                    sub CLOSE { print STDOUT delete ( shift()->{out} ); }

                    # Output: ##################

                    Wide character in print at ...
                    STDOUT: ☺ââ
                        # STDOUT output is correct in all cases
                    tie_use_bytes: ☺ââ
                        # with "use bytes", the UTF-8-flagged 2nd character is double-encoded
                    Wide character in print at ...
                    tie_no_bytes: ☺ââ
                        # with "no bytes", the output is correct, but a "wide character"
                        # double-encodes the entire string because of the way the tie'd
                        # file handle is implemented

                    #########################

                    By the way, if it's getting difficult to wrap your head around this,
                    you're not alone.

                    At this point, I peeked at the $Response->{out} data buffer, and could
                    see that it was encoded correctly. However, the output from Apache (when
                    the UTF-8 flag is on) was not correct, suggesting that Apache is doing
                    something to encode the string in this case.
                    I decided therefore to address the problem by turning off the UTF-8
                    flag. The most fault-tolerant method I managed to come up with to do
                    this was the following:

                    ${$Response->{BinaryRef}}
                    = Encode::encode ( 'ISO-8859-1', ${$Response->{BinaryRef}},
                    sub{ Encode::encode ( 'UTF-8', chr ( shift() ) ) } )
                    if ! grep ( /^utf8$/, PerlIO::get_layers ( STDOUT ) );

                    which can go at the top of the $Response->Flush() method, or in
                    global.asa/Script_OnFlush().
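The core of that expression is Encode's coderef CHECK: when a character cannot be mapped, Encode calls the coderef with its ordinal and splices the return value into the output. A minimal standalone sketch of just that part (the helper name latin1_with_utf8_fallback is hypothetical, and the PerlIO::get_layers test on STDOUT is left out):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Encode to ISO-8859-1; for any character Latin-1 cannot represent,
# the coderef receives its ordinal and returns that character's UTF-8 bytes.
sub latin1_with_utf8_fallback {
    my ($text) = @_;
    return encode('ISO-8859-1', $text, sub { encode('UTF-8', chr shift) });
}

my $octets = latin1_with_utf8_fallback("\x{00E2}\x{263A}");
printf "%s\n", unpack('H*', $octets);   # e2e298ba
```

Characters outside Latin-1 still end up as UTF-8 bytes inside an ISO-8859-1 response, so this is a tolerant fallback for such characters rather than a way to display them correctly.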

                    With this solution I can now modify Apache::ASP's output encoding (eg,
                    using binmode ( STDOUT );), as originally desired, and the output
                    appears correct in all my test cases.


                    --
                    -------------------------------------------------------------------------------
                    Arnon Weinberg
                    www.back2front.ca


                  • Warren Young
                    Message 9 of 9, Jul 2, 2012
                      On 6/5/2012 5:30 PM, Josh Chamas wrote:
                      > On 6/5/12 2:02 AM, Arnon Weinberg wrote:
                      >>
                      >> How can I set the output character encoding of Apache::ASP output?
                      >
                      > I have gone over the thread and been stumped on this for a while.

                      This answer by Tom Christiansen (yes, the guy who wrote that one book)
                      may shed some light: http://goo.gl/miOFU

                      Here I thought all the Unicode tweaks after 5.8 were minor things, that
                      it was all but finished a decade ago.

                      Then later, reading chromatic's Modern Perl, he only grudgingly allows
                      that 5.12 might be tolerable for some of his Unicode example code, and
                      recommends 5.14 instead.
