Method calls failing due to malformed utf-8?

  • Roger
    Message 1 of 4, Dec 14, 2004
      Hi Group - boy am I glad I've found you guys!

      I'm interfacing to a third party web service using SOAP::Lite and
      have run into a problem with my method calls failing on certain
      requests.

      The code fragment I use is like this;

      my $method = SOAP::Data->name('GetInformation')
          ->attr({xmlns => 'http://someservice/soap/'});

      my $result = $soap->call($method => @params);

      and it is this method call that is sometimes failing - depending on
      the parameters I pass. Oh the 3rd party is a .NET service but I
      believe I have accommodated that correctly as the system seems to
      work OK most of the time.

      The error I get is;

      reference to invalid character number at line 1, column 5267, byte
      5267 at E:/Perl/site/lib/XML/Parser.pm line 187

      The XML returned begins like this;

      <?xml version="1.0" encoding="utf-8"?><soap:Envelope
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body>

      I am assuming - perhaps rashly - that this is because the XML
      document contains a character which isn't UTF-8?

      I note that in the area where the parser complains, I have a line
      that looks like this;

      <Title>Eternity Ring &#x3; 12,990</Title>

      Firstly, is my diagnosis correct? I'm guessing that &#x3; isn't a
      valid UTF-8 character - or am I wrong here. I'm a bit out of my
      depth.

      Secondly (if my assumption is true) - whilst I can report this to
      the service provider and hope they clean their data, in the meantime
      is there a workaround for this that allows me to relax the rules
      that the SOAP::Lite parser is enforcing? At present I just put an
      eval block around the method call to trap the error and stop
      additional processing of this item - but I'd really prefer to be
      able to process the data anyway.

      Any clues, links, advice much appreciated.

      Many Thanks
      Roger
      UK

      PS. My system is Win2003 server / Perl 5.8.3 build 809 / SOAP::Lite
      0.55
    • Eric Promislow
      Message 2 of 4, Dec 14, 2004
        It's failing because the character with value 3 is an invalid XML character,
        even if it's encoded as a character reference.

        From the XML spec:

        Character Range
        [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
        [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character,
        excluding the surrogate blocks, FFFE, and FFFF. */

        Fine, you say, I'll just use a character reference to represent a character
        outside this range. Not so fast:

        4.1 Character and Entity References

        [Definition: A character reference refers to a specific character in
        the ISO/IEC 10646 character set, for example one not directly
        accessible from available input devices.]
        Character Reference
        [66] CharRef ::= '&#' [0-9]+ ';'
        | '&#x' [0-9a-fA-F]+ ';' [WFC: Legal Character]

        Well-formedness constraint: Legal Character

        Characters referred to using character references MUST match the
        production for Char.

        This means that there is no way to represent characters that do not fall in the
        range specified in production [2] within XML. You need to use an external
        encoding, like base64 (utf-8 would be an internal encoding, as the parser
        is processing it).
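
        As a rough illustration of the two productions quoted above, here is a
        minimal Perl sketch that checks a code point against production [2] and
        lists the numeric character references in a document that violate the
        Legal Character constraint. The names is_xml_char and illegal_char_refs
        are made up for this example; they are not part of SOAP::Lite or
        XML::Parser.

        ```perl
        use strict;
        use warnings;

        # True if $cp (a Unicode code point) matches production [2] Char of XML 1.0.
        sub is_xml_char {
            my ($cp) = @_;
            return 1 if $cp == 0x9 || $cp == 0xA || $cp == 0xD;
            return 1 if $cp >= 0x20    && $cp <= 0xD7FF;
            return 1 if $cp >= 0xE000  && $cp <= 0xFFFD;
            return 1 if $cp >= 0x10000 && $cp <= 0x10FFFF;
            return 0;
        }

        # Hypothetical helper: return every numeric character reference in $xml
        # whose target is not a legal XML 1.0 character.
        sub illegal_char_refs {
            my ($xml) = @_;
            my @bad;
            while ($xml =~ /&#(x[0-9a-fA-F]+|[0-9]+);/g) {
                my $ref = $1;                                   # e.g. "x3" or "31"
                my $cp  = $ref =~ /^x(.+)/ ? hex($1) : $ref + 0;
                push @bad, "&#$ref;" unless is_xml_char($cp);
            }
            return @bad;
        }

        # The line from the original post contains one offending reference.
        print join(', ', illegal_char_refs('<Title>Eternity Ring &#x3; 12,990</Title>')), "\n";
        ```

        Running a check like this over the raw response is one way to confirm
        which references the parser is objecting to before deciding how to
        handle them.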

        Unfortunately this breaks the loosely coupled nature of SOAP.

        One solution is to tell SOAP::Lite to use a parser that allows non-xml-chars
        if encoded as character references, like the .NET 1.0 and 1.1 parsers
        do (apparently this will be fixed in Whidbey).

        Another is to grab the code before SOAP hands it to the parser for
        deserializing, and do a pass over it, something like:

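        # Escape the "&" of numeric references to control characters (below 0x20)
        $data =~ s/&(#x0*1?[0-9a-fA-F];)/&amp;$1/g;  # hex: &#x0; to &#x1F;
        $data =~ s/&(#0*[12]?\d;)/&amp;$1/g;         # decimal: &#0; to &#29;
        $data =~ s/&(#0*3[01];)/&amp;$1/g;           # decimal: &#30; and &#31;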

        and then post-process this code after the XML parser returns it to
        your application.
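
        The pre-process / post-process round trip described above could be
        sketched roughly as follows. This is a variation under stated
        assumptions, not a tested SOAP::Lite recipe: the function names
        escape_control_refs and restore_control_refs are invented for the
        example, and unlike the three substitutions above it leaves references
        to tab, line feed and carriage return alone, since those are legal.

        ```perl
        use strict;
        use warnings;

        # True for code points below 0x20 that XML 1.0 does not allow.
        sub is_illegal_control {
            my ($cp) = @_;
            return $cp < 0x20 && $cp != 0x9 && $cp != 0xA && $cp != 0xD;
        }

        # Before parsing: rewrite "&#x3;" as "&amp;#x3;" so the parser sees
        # ordinary text instead of an illegal character reference.
        sub escape_control_refs {
            my ($xml) = @_;
            $xml =~ s{&(#(?:x([0-9a-fA-F]+)|([0-9]+));)}{
                my $cp = defined $2 ? hex($2) : $3;
                is_illegal_control($cp) ? "&amp;$1" : "&$1";
            }ge;
            return $xml;
        }

        # After parsing: the text now holds the literal string "&#x3;";
        # turn it back into the control character itself.
        sub restore_control_refs {
            my ($text) = @_;
            $text =~ s{&(#(?:x([0-9a-fA-F]+)|([0-9]+));)}{
                my $cp = defined $2 ? hex($2) : $3;
                is_illegal_control($cp) ? chr($cp) : "&$1";
            }ge;
            return $text;
        }

        my $safe = escape_control_refs('<Title>Eternity Ring &#x3; 12,990</Title>');
        print "$safe\n";    # the reference is now plain text for the parser
        ```

        One caveat with this sketch: if the service ever sends the literal text
        "&#x3;" on purpose (already written as "&amp;#x3;"), the restore step
        cannot tell it apart from an escaped reference, so it may be safer to
        strip or replace the character instead of restoring it.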

        - Eric (making my annual foray into this list).
      • eric-amick@comcast.net
        Message 3 of 4, Dec 15, 2004

          > I am assuming - perhaps rashly - that this is because the XML
          > document contains a character which isn't UTF-8?
          >
          > I note that in the area where the parser complains, I have a
          > line that looks like this;
          >
          > <Title>Eternity Ring &#x3; 12,990</Title>
          >
          > Firstly, is my diagnosis correct? I'm guessing that &#x3; isn't a
          > valid UTF-8 character - or am I wrong here. I'm a bit out of my
          > depth.

          You're close. It's perfectly valid Unicode (UTF-8 is just a method of encoding Unicode); the problem is that XML 1.0 does not allow that character, so a conforming XML processor has to reject it.


          > Secondly (if my assumption is true) - whilst I can report this to
          > the service provider and hope they clean their data, in the meantime
          > is there a workaround for this that allows me to relax the rules
          > that the soap::lite parser is enforcing. At present I just put an
          > eval block around the method call to trap the error and stop
          > additional processing of this item - but I'd really prefer to be
          > able to process the data anyway.

          You could probably modify SOAP::Lite to allow all characters in the range \x00-\x1f, but I have no idea if it would break anything else.
           
          --
          Eric Amick
          Columbia, MD
        • Roger
          Message 4 of 4, Dec 15, 2004
            Ooops, I posted this reply about 8 hours ago but think I
            accidentally posted it to the author in error. Apologies to Eric and
            the group. Here's what I meant to say!

            <eric.promislow@g...> wrote:

            > Another [solution] is to grab the code before SOAP hands it to the
            > parser for deserializing, and do a pass over it, something like:
            >
            > $data =~ s/&(#x0*1?[0-9a-fA-F];)/&amp;$1/g;
            > $data =~ s/&(#0*[12]?\d;)/&amp;$1/g;
            > $data =~ s/&(#0*3[01];)/&amp;$1/g;
            >
            > and then post-process this code after the XML parser returns it to
            > your application.
            >

            Yes, I like this plan - but I fear it is beyond my SOAP::Lite
            knowledge. From my googling exploits, I imagine it has something to
            do with SOAP::Transport::HTTP::Client ... but does anyone know HOW I
            can intercept the XML as Eric suggests and then pass on the
            translated version to the parser?

            Any clues appreciated.

            Roger
            London, UK