
Output formatting problem (text encoding?)

  • karl
    Message 1 of 4 , Jul 19, 2004
      Hello,

      I'm a total newbie to Perl/Apache::ASP, but I seem to have gotten
      things to work on a WinXP setup.

      I'm currently having a problem with the formatting of output. I have
      text output coming from a database, and ' (apostrophes) are shown in
      the browser (IE6) as ? (question marks). The weird thing is that if I
      save the output as an HTML file and open it in the browser,
      everything looks fine. The only thing I can figure out is that the
      original output shows up as encoded Unicode (UTF-8) in the browser;
      after I save it and reopen it, things look fine and it shows as
      being encoded as Western European (ISO).

      Note that on an IIS/ASP setup, the equivalent output shows up
      correctly and with Western European (ISO) encoding. The only physical
      difference I can find between the output generated by Apache::ASP
      and IIS/ASP is that the Apache::ASP output has Unix-style LF line
      endings and the IIS/ASP output has DOS/Windows-style CRLF line
      endings. However, I'm pretty sure that this isn't the problem,
      because when I save the output from Apache::ASP and reopen it in the
      browser, things look fine and it still has Unix-style LF line endings.

      So, to make a long story short, I'm trying to figure out how to get
      Apache::ASP to output (correct encoding?) so that the text looks
      correct.

      Thanks for any help!

      -Karl


      ---------------------------------------------------------------------
      To unsubscribe, e-mail: asp-unsubscribe@...
      For additional commands, e-mail: asp-help@...
    • karl
      Message 2 of 4 , Jul 20, 2004
        Never mind... I realized that this was an Apache issue. I fixed the
        problem by changing the AddDefaultCharset directive to ISO-8859-1.

        Thanks anyway!

      • Warren Young
        Message 3 of 4 , Jul 20, 2004
          karl wrote:
          > I have
          > text output coming from a database and ' (apostrophes) are shown in
          > the browser (IE6) as ? (question marks).

          There's apostrophes and there are apostrophes. There's ASCII code 39,
          there's Windows code page 1252 code 146, there's Unicode code
          <mumble>.... The question is, which of these codes are in your
          database? You must know the answer to that question before you can
          decide how to proceed.
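
A minimal sketch of why the exact code matters, written in Python rather than Perl purely for illustration. It assumes the database holds the Windows code page 1252 "curly" apostrophe (byte 146, which corresponds to Unicode U+2019); what is really stored in the database from this thread is unknown. The helper name `decode_as` is hypothetical.

```python
# Two characters that both look like an apostrophe on screen.
ascii_apostrophe = "'"        # ASCII code 39
curly_apostrophe = "\u2019"   # RIGHT SINGLE QUOTATION MARK

# Windows code page 1252 stores the curly one as the single byte 146 (0x92).
cp1252_bytes = curly_apostrophe.encode("cp1252")


def decode_as(raw: bytes, charset: str) -> str:
    """Decode raw bytes under the given charset label.

    Invalid sequences become U+FFFD, the replacement character that
    browsers typically draw as a question mark or a box.
    """
    return raw.decode(charset, errors="replace")


print(ord(ascii_apostrophe))                        # 39
print(list(cp1252_bytes))                           # [146]
print(ascii(decode_as(cp1252_bytes, "cp1252")))     # '\u2019'
print(ascii(decode_as(cp1252_bytes, "utf-8")))      # '\ufffd'
print(ascii(decode_as(cp1252_bytes, "iso-8859-1"))) # '\x92'
```

Byte 146 on its own is not valid UTF-8, so a page labelled UTF-8 cannot show it as an apostrophe. Strict ISO-8859-1 maps the byte to a control character, but browsers commonly treat that label as code page 1252, which would be consistent with the saved file looking fine.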

          Character code handling in the database/Apache::ASP/Perl5/Apache/browser
          chain is stranger than you probably expect. Here's a post I wrote a few
          months back detailing two chains I've personally observed:

          http://www.mail-archive.com/asp@.../msg01952.html

          Notice that I saw two rather different translation chains on my two test
          systems! Your particular configuration is quite different from either
          of mine, so it could give yet a third path.

          > The only thing I can figure out is that
          > original output shows up as encoded Unicode (UTF-8) in the browser;

          Don't guess, find out.

          The way I did the analysis to make that post I linked to, I dumped the
          text in question to a file at several places along the I/O chain, then I
          examined each file. You should also use a network sniffer to see what
          the HTTP headers and HTML data are without the browser getting in the
          way. There's a good list of sniffers in the Winsock Programmer's FAQ,
          if you don't have one already:

          http://tangentsoft.net/wskfaq/
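
The dump-and-examine approach can be sketched roughly as follows, again in Python for illustration only. The stage labels and sample bytes are made up; in practice the bytes would come from files written at each point in the chain or from a sniffer capture. `describe_bytes` and `dump_stage` are hypothetical helper names.

```python
from typing import List


def describe_bytes(raw: bytes) -> List[str]:
    """List every non-ASCII byte with its offset, decimal and hex value."""
    return [
        f"offset {i}: byte {b} (0x{b:02x})"
        for i, b in enumerate(raw)
        if b > 127
    ]


def dump_stage(label: str, raw: bytes) -> str:
    """Format one stage of the I/O chain for side-by-side comparison."""
    lines = describe_bytes(raw) or ["(pure ASCII)"]
    return "\n".join([f"== {label} =="] + ["  " + line for line in lines])


# Hypothetical captures of the same text, "It's" with a curly apostrophe.
from_database = b"It\x92s"           # one byte: code page 1252
on_the_wire = b"It\xe2\x80\x99s"     # three bytes: UTF-8

print(dump_stage("database", from_database))
print(dump_stage("HTTP body", on_the_wire))
```

Comparing the byte values between stages shows where a conversion happened: one byte above 127 suggests a single-byte code page, while a run of two or three such bytes suggests UTF-8.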

          I think you'll find, as I did, that your characters are being translated
          back and forth between ISO 8859-x and Unicode multiple times, and that
          the last step isn't being done correctly.

          That last step is critical because of the high probability that the
          intermediate transformations are all lossless in your situation. All
          you have to do is communicate to the browser what the final character
          encoding is. In my particular situation, I had to change an Apache
          setting to make it send a header informing the browser that the
          character encoding was UTF-8. The browser was then able to display the
          web page correctly, never mind that the data was stored as ISO 8859-1
          (Latin-1) in the database, and translated back and forth several times
          along the path.
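
A small sketch of that point, in Python for illustration: intermediate conversions between Latin-1 and Unicode lose nothing, and the result is readable only if the label given to the browser matches the bytes actually sent. The sample text and the `browser_view` helper are hypothetical stand-ins, not anything taken from the systems discussed here.

```python
# Hypothetical database content stored as ISO 8859-1 (Latin-1).
original = "caf\u00e9".encode("iso-8859-1")   # b'caf\xe9'

# Intermediate hops: Latin-1 -> Unicode -> UTF-8 -> Unicode -> Latin-1.
text = original.decode("iso-8859-1")
as_utf8 = text.encode("utf-8")                # b'caf\xc3\xa9'
round_trip = as_utf8.decode("utf-8").encode("iso-8859-1")
lossless = round_trip == original             # True: nothing was lost


def browser_view(body: bytes, declared_charset: str) -> str:
    """Decode the body the way a browser would for the declared charset."""
    return body.decode(declared_charset, errors="replace")


print(lossless)                                    # True
print(ascii(browser_view(as_utf8, "utf-8")))       # 'caf\xe9'
print(ascii(browser_view(as_utf8, "iso-8859-1")))  # 'caf\xc3\xa9'
print(ascii(browser_view(original, "utf-8")))      # 'caf\ufffd'
```

Only the first view is the intended text. The second is UTF-8 bytes read as Latin-1 (two junk characters), and the third is Latin-1 bytes read as UTF-8 (a replacement character), which is the kind of mismatch that shows up as a question mark.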

          > The only physical
          > difference I can find between the output generated by Apache::ASP
          > and IIS/ASP is that the Apache::ASP has Unix style LF line-endings
          > and the IIS/ASP has DOS/Windows style CRLF line-endings.

          I'll bet you didn't compare the HTTP headers. Different web servers,
          hence different headers, hence different browser interpretation.
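
Once the raw headers are captured from each server, the charset each one declares can be pulled out and compared. A minimal sketch in Python using the standard library's `email.message.Message`; the two header values below are illustrative examples, not captures from the servers in this thread, and `charset_from_content_type` is a hypothetical helper name.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "unknown") -> str:
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_content_charset()
    return charset if charset else default


# Illustrative header values only.
with_charset = "text/html; charset=UTF-8"
without_charset = "text/html"

print(charset_from_content_type(with_charset))     # utf-8
print(charset_from_content_type(without_charset))  # unknown
```

When the header carries no charset, the browser falls back to other hints or its own default, which is one way two servers can send identical bodies and still render differently.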

        • karl
          Message 4 of 4 , Jul 20, 2004
            Thanks for your help Warren. I wrote my last message before seeing
            yours. I can see now that it can be confusing to track all the text
            encoding changes, but that it is only the last one that generally
            matters (assuming lossless conversion).

            Before I discovered that the AddDefaultCharset Apache directive
            would solve my problem, I found a stopgap solution of setting
            $Response->{Charset} in my script.

            Thanks again!
