
Message 4252: Re: Method calls failing due to malformed utf-8?

  • Eric Promislow
    Dec 14 4:37 PM
      It's failing because the character with value 3 is an invalid XML character,
      even if it's encoded as a character reference.
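      This also answers the diagnosis in the quoted question below. U+0003 is perfectly valid UTF-8, so the failure comes from an XML rule, not from an encoding error. The following is a minimal sketch using Perl's core Encode module (core since Perl 5.8):

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# U+0003 (END OF TEXT) is an ordinary code point as far as UTF-8 is concerned:
# it encodes to the single byte 0x03.
my $bytes = encode('UTF-8', chr(3));
printf "UTF-8 bytes: %s\n", join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# FB_CROAK makes decode() die on malformed input. With CHECK set, decode()
# may consume its input in place, so hand it a copy.
my $copy  = $bytes;
my $chars = decode('UTF-8', $copy, FB_CROAK);
print "decodes cleanly as UTF-8\n" if $chars eq chr(3);
```

The decode does not die, so the bytes are well-formed UTF-8. XML::Parser rejects the document only because XML forbids that code point.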

      From the XML spec:

      Character Range
      [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
      [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character,
      excluding the surrogate blocks, FFFE, and FFFF. */
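      Production [2] translates directly into a predicate. The sketch below is in Perl, and the name is_xml_char is only illustrative:

```perl
use strict;
use warnings;

# True if code point $cp matches production [2] (Char) of XML 1.0.
sub is_xml_char {
    my ($cp) = @_;
    return $cp == 0x9 || $cp == 0xA || $cp == 0xD
        || ($cp >= 0x20    && $cp <= 0xD7FF)
        || ($cp >= 0xE000  && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

# U+0003 is rejected, as are surrogates (U+D800) and U+FFFE.
for my $cp (0x3, 0x9, 0x20, 0xD800, 0xFFFE, 0x1F600) {
    printf "U+%04X %s\n", $cp, is_xml_char($cp) ? 'allowed' : 'NOT allowed';
}
```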

      Fine, you say, I'll just use a character reference to represent a character
      outside this range. Not so fast:

      4.1 Character and Entity References

      [Definition: A character reference refers to a specific character in
      the ISO/IEC 10646 character set, for example one not directly
      accessible from available input devices.]
      Character Reference
      [66] CharRef ::= '&#' [0-9]+ ';'
      | '&#x' [0-9a-fA-F]+ ';' [WFC: Legal Character]

      Well-formedness constraint: Legal Character

      Characters referred to using character references MUST match the
      production for Char.
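      Taken together, [66] and the Legal Character constraint mean that a parser must decode each reference and then apply the Char test. The sketch below is a rough Perl illustration that scans a string for references a conforming parser will refuse. The helper names are hypothetical, and the sample input is made up:

```perl
use strict;
use warnings;

# Production [2] Char, repeated so this example stands alone.
sub is_xml_char {
    my ($cp) = @_;
    return $cp == 0x9 || $cp == 0xA || $cp == 0xD
        || ($cp >= 0x20    && $cp <= 0xD7FF)
        || ($cp >= 0xE000  && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

# Find every CharRef (production [66]: &#NNN; or &#xHHH;) in $xml and
# return the ones that violate WFC: Legal Character.
sub illegal_char_refs {
    my ($xml) = @_;
    my @bad;
    while ($xml =~ /(&#(?:x([0-9a-fA-F]+)|([0-9]+));)/g) {
        my ($ref, $hex, $dec) = ($1, $2, $3);
        my $cp = defined $hex ? hex($hex) : $dec;
        push @bad, $ref unless is_xml_char($cp);
    }
    return @bad;
}

# Hypothetical input: a tab (legal), U+0003 and U+001F (illegal), 'A' (legal).
my @found = illegal_char_refs('<T>a&#9;b&#x3;c&#31;d&#65;</T>');
print "illegal references: @found\n";
```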

      This means there is no way to represent, within XML, characters outside the
      range specified in production [2]. You need an additional encoding on top of
      XML, such as base64. UTF-8 won't help, because it is the internal encoding:
      the one the parser itself is decoding.
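      The following sketch shows the base64 route in Perl with the core MIME::Base64 module. The sample value is hypothetical. The encoded form uses only A-Z, a-z, 0-9, '+', '/' and '=', all of which are legal XML characters. The receiver still has to know that the field must be decoded:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical field value containing U+0003, which XML cannot carry.
my $raw = "before\x03after";

# The second argument '' suppresses the line breaks encode_base64 adds by default.
my $b64 = encode_base64($raw, '');
print "$b64\n";

# The other end must decode explicitly to recover the original bytes.
my $back = decode_base64($b64);
print "round-trip ok\n" if $back eq $raw;
```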

      Unfortunately this breaks the loosely coupled nature of SOAP: both ends now
      have to agree on that extra encoding.

      One solution is to tell SOAP::Lite to use a parser that accepts non-XML
      characters when they are encoded as character references, as the .NET 1.0
      and 1.1 parsers do (apparently this will be fixed in Whidbey).

      Another is to grab the raw XML before SOAP::Lite hands it to the parser for
      deserializing, and make a pass over it, something like:

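      # Escape the '&' of references to U+0000..U+001F so the parser passes
      # them through as literal text instead of rejecting them.
      $data =~ s/&(#x0*1?[0-9a-fA-F];)/&amp;$1/g;   # &#x0; .. &#x1F;
      $data =~ s/&(#0*[12]?\d;)/&amp;$1/g;          # &#0;  .. &#29;
      $data =~ s/&(#0*3[01];)/&amp;$1/g;            # &#30; and &#31;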

      and then post-process the text after the XML parser returns it to your
      application, turning the now-literal references back into characters.
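      Below is a minimal end-to-end sketch of both passes in Perl. The helper names escape_control_refs and restore_control_refs are hypothetical. The sample fragment is made up. No particular SOAP::Lite hook for running the first pass is assumed:

```perl
use strict;
use warnings;

# Pass 1, before parsing: escape the '&' of any reference to U+0000..U+001F,
# so the parser sees plain text such as "&#3;" instead of an illegal reference.
# The legal &#9;, &#10; and &#13; are caught too, which is harmless because
# pass 2 turns them back into the same characters.
sub escape_control_refs {
    my ($xml) = @_;
    $xml =~ s/&(#x0*1?[0-9a-fA-F];)/&amp;$1/g;   # &#x0; .. &#x1F;
    $xml =~ s/&(#0*[12]?[0-9];)/&amp;$1/g;       # &#0;  .. &#29;
    $xml =~ s/&(#0*3[01];)/&amp;$1/g;            # &#30; and &#31;
    return $xml;
}

# Pass 2, after parsing: the parser has decoded "&amp;#3;" to the literal
# text "&#3;"; convert such text back into the control character.
sub restore_control_refs {
    my ($text) = @_;
    $text =~ s/&#x0*(1?[0-9a-fA-F]);/chr(hex $1)/ge;
    $text =~ s/&#0*([12]?[0-9]|3[01]);/chr($1)/ge;
    return $text;
}

# Hypothetical fragment: U+0003, U+001F, and a legal 'A' written as &#65;.
my $escaped = escape_control_refs('<T>A&#x3;B&#31;C&#65;</T>');
print "$escaped\n";    # <T>A&amp;#x3;B&amp;#31;C&#65;</T>

# Text a parser would then return for <T>; print the code points.
my $restored = restore_control_refs('A&#x3;B&#31;CA');
printf "%vd\n", $restored;
```

One caveat: if the service ever sends a genuine literal "&#3;" as text (serialized as "&amp;#3;"), pass 2 will also turn it into U+0003.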

      - Eric (making my annual foray into this list).

      > Hi Group - boy am I glad I've found you guys!

      > I'm interfacing to a third party web service using SOAP::Lite and
      > have run into a problem with my method calls failing on certain
      > requests.

      > The code fragment I use is like this:

      > my $method = SOAP::Data->name('GetInformation')
      >     ->attr({xmlns => 'http://someservice/soap/'});

      > my $result = $soap->call($method => @params);

      > and it is this method call that is sometimes failing, depending on
      > the parameters I pass. Oh, the third party is a .NET service, but I
      > believe I have accommodated that correctly, as the system seems to
      > work OK most of the time.

      > The error I get is:

      > reference to invalid character number at line 1, column 5267, byte
      > 5267 at E:/Perl/site/lib/XML/Parser.pm line 187

      > The XML returned begins like this:

      > <?xml version="1.0" encoding="utf-8"?><soap:Envelope
      > xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
      > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      > xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body>

      > I am assuming - perhaps rashly - that this is because the XML
      > document contains a character which isn't UTF-8?

      > I note that in the area where the parser complains, I have a line
      > that looks like this:

      > <Title>Eternity Ring 12,990</Title>

      > Firstly, is my diagnosis correct? I'm guessing that the odd character in
      > that title isn't a valid UTF-8 character, or am I wrong here? I'm a bit
      > out of my depth.

      > Secondly (if my assumption is true), whilst I can report this to
      > the service provider and hope they clean their data, is there in the
      > meantime a workaround that lets me relax the rules the SOAP::Lite
      > parser is enforcing? At present I just put an eval block around the
      > method call to trap the error and stop additional processing of this
      > item, but I'd really prefer to be able to process the data anyway.

      > Any clues, links, advice much appreciated.

      > Many Thanks
      > Roger
      > UK

      > PS. My system is Win2003 Server / Perl 5.8.3 build 809 / SOAP::Lite
      > 0.55