problem parsing "special" characters

  • Orlando Andico
    Message 1 of 5 , Jul 28 2:11 AM
      I'm finding that SOAP::Lite in general, and XML::Parser in particular,
      fail to parse returned XML when the returned string contains "special"
      characters (e.g. cedilla, enye, accented characters).

      For example, I am using the Babelfish translation service, and get a
      return string like this (running from the command line):

      vous êtes un moron répugnant

      but from a web page (via mod_perl and Apache::ASP) I get:

      vous êtes un moron répugnant

      XML parsing failed: unclosed token (Line: 11, Character: 0)

      I've tried "use utf8", but to no avail.

      I also sometimes get mismatched XML tags when the return value
      contains an enye (ñ, the N with a tilde on top of it).
    • Orlando Andico
      Message 2 of 5 , Jul 28 6:16 AM
        --- In soaplite@yahoogroups.com, "Orlando Andico" <orly_andico@y...>
        wrote:
        ..
        > but from a web page (via mod_perl and Apache::ASP) i get:
        >
        > vous êtes un moron répugnant
        >
        > XML parsing failed: unclosed token (Line: 11, Character: 0)
        >
        > i've tried "use utf8" but to no avail.
        >
        > i also get mismatched XML tags sometimes when the return value
        > contains an enye (the N with a curly thing on top of it).

        I also tried updating to Expat-1.95-6, but the problem doesn't go away,
        and I get "mismatched tag" errors in other instances. It really does
        seem that the "special" characters are converted to <XX>, where XX is a
        pair of hex digits, and this screws up XML::Parser.
      • Igor Korolev
        Message 3 of 5 , Jul 28 7:01 AM
          Those two hex pairs are probably the two bytes of the UTF-8
          encoding of each of your characters.

          Try setting the encoding of your web page to the proper charset.

          -----Original Message-----
          From: Orlando Andico [mailto:orly_andico@...]
          Sent: Monday, July 28, 2003 8:16 AM
          To: soaplite@yahoogroups.com
          Subject: [soaplite] Re: problem parsing "special" characters


          --- In soaplite@yahoogroups.com, "Orlando Andico" <orly_andico@y...>
          wrote:
          ..
          > but from a web page (via mod_perl and Apache::ASP) i get:
          >
          > vous êtes un moron répugnant
          >
          > XML parsing failed: unclosed token (Line: 11, Character: 0)
          >
          > i've tried "use utf8" but to no avail.
          >
          > i also get mismatched XML tags sometimes when the return value
          > contains an enye (the N with a curly thing on top of it).

          I also tried updating to Expat-1.95-6 but the problem doesn't go away.
          I also get "mismatched tag" errors in other instances. it really does
          seem that the "special" characters are converted to <XX> where XX is a
          couple of hex digits, and this screws up XML::Parser.



        • Orlando Andico
          Message 4 of 5 , Jul 28 6:55 PM
            --- In soaplite@yahoogroups.com, "Igor Korolev" <IgorK@D...> wrote:
            > These 2 characters probably represent 2 bytes of utf8 encoding
            > of your characters.
            >
            > Try to set encoding of your web page to proper locale.

            That is not the whole problem.
            I rewrote the program as a small script and redirected its
            output to a file. The output

            vous êtes un moron répugnant

            actually looks like this:

            vous <C3><AA>tes un moron r<C3><A9>pugnant

            It's just that Gnome-Terminal is UTF-8-aware and renders it
            properly. Opera apparently does not, even after I switched it to
            UTF-8. Also, the extra brackets in there look like XML tags to
            XML::Parser, which is why XML::Parser returns a "mismatched tag"
            error.

            Someone on this list suggested using the Encode module to convert
            the UTF-8 to ISO-8859-1. But that only helps when XML::Parser
            doesn't fail, since I can then format the output properly. In some
            of my other Perl scripts, the SOAP call itself FAILS due to
            exceptions thrown by XML::Parser, and in that case the Encode trick
            obviously won't work.

            Also, the latest version of Encode requires Perl 5.7.3, while I'm
            using 5.6.1. I cannot upgrade at this time because it's a
            production machine running mod_perl and a host of other Perl
            applications.
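
For reference, the <C3><AA> and <C3><A9> sequences above match the UTF-8 byte pairs for ê (U+00EA) and é (U+00E9). The sketch below reproduces that rendering from a raw byte string using only core Perl, so it should also run on 5.6.1. It assumes the <XX> notation comes from the tool used to view the file (pagers such as less show undisplayable bytes this way) rather than from literal angle brackets in the data; the variable names are illustrative.

```perl
use strict;
use warnings;

# Raw UTF-8 bytes as they might arrive in the SOAP response. Hex escapes are
# used so the example does not depend on the encoding of this source file.
my $bytes = "vous \xC3\xAAtes un moron r\xC3\xA9pugnant";

# Render every non-ASCII byte as <XX>, the way a byte-oriented viewer would.
(my $visible = $bytes) =~ s/([\x80-\xFF])/sprintf("<%02X>", ord($1))/ge;

print "$visible\n";   # prints: vous <C3><AA>tes un moron r<C3><A9>pugnant
```

If the same substitution applied to the actual response shows only C3 xx pairs like these, the data is plain UTF-8 and the remaining question is which layer fails to treat it as such.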
          • Orlando Andico
            Message 5 of 5 , Jul 28 7:57 PM
              --- In soaplite@yahoogroups.com, "Orlando Andico" <orly_andico@y...>
              wrote:
              ..
              > Someone on this list suggested I use the Encode module to convert the
              > UTF8 to ISO8859-1. But that will only fix the problem if XML::Parser
              > doesn't fail (I can format the output properly). But in some of my
              > other Perl scripts, the SOAP call itself FAILS due to exceptions
              > thrown by XML::Parser. Obviously in this case the Encode trick won't
              > work.

              Fixed it. XML::Parser::Lite, which comes with SOAP::Lite, doesn't
              have the problems that XML::Parser has.
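
The post does not say how the switch to XML::Parser::Lite was made. One way, shown here only as a sketch, is the $SOAP::Constants::DO_NOT_USE_XML_PARSER flag described in the SOAP::Lite documentation, which tells the library to skip XML::Parser and use its bundled XML::Parser::Lite instead. The namespace and endpoint below are hypothetical placeholders, and whether the flag is available depends on the installed SOAP::Lite version.

```perl
use strict;
use warnings;
use SOAP::Lite;

# Skip XML::Parser even if it is installed, so that SOAP::Lite falls back
# to XML::Parser::Lite. Set this before the first call is made.
$SOAP::Constants::DO_NOT_USE_XML_PARSER = 1;

# Hypothetical namespace and endpoint, for illustration only.
my $soap = SOAP::Lite
    ->uri('urn:Example/Translate')
    ->proxy('http://localhost/soap/translate');
```

Because the flag is a package global, under mod_perl it would affect every SOAP::Lite client running in the same interpreter.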