Re: [soaplite] Useful module to use with SOAP::Lite

  • Pierre Denis
    Message 1 of 4 , Nov 5, 2003
      Hi Duncan,


      > > Using SOAP::Lite intensively (thanks Paul!), I had to deal with two
      > > minor issues:
      > >
      > > - SOAP calls return utf8 strings, even if the content doesn't have any
      > > utf8 encoded characters. This is a "feature" of the XML parser. This can
      > > be annoying when you put this utf8 data in an HTML template and specify
      > > an iso-* encoding in the HTML header. The result will be a utf8 page
      > > rendered as iso-* content, which is messy. Most XML-related modules
      > > produce the same "effect".
      >
      > Not sure that I understand the problem that you are describing. A utf8
      > string that has no multi-byte sequences is identical to an ASCII string
      > as it is limited to 7 bits and should then be compatible with ISO-8859
      > encodings.
      >

      If you concatenate a utf8-flagged string with a non-utf8-flagged string,
      the result will be a utf8-flagged string. The problem happens when you
      display an HTML page and specify an iso-xxxx charset because you
      expect your content to be iso-xxxx. If you mix strings from SOAP::Lite
      (flagged utf8, even if they don't contain any utf8 characters) with some
      other content (non-utf8), the result is utf8 content.

      The browser will receive a utf8-encoded HTML page with an HTML header
      telling it that the page is actually iso-xxxx encoded. The result is bad.
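
The upgrade described above can be observed with a few lines of Perl. This is only a sketch under stated assumptions: `utf8::upgrade` stands in for whatever the XML parser does to the strings it returns, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Stand-in for a string returned by the XML parser: ASCII-only content,
# but with Perl's internal utf8 flag switched on (assumption: utf8::upgrade
# is used here only to simulate that state).
my $from_soap = "Name: ";
utf8::upgrade($from_soap);

# A local Latin-1 byte string: the e-acute is the single byte 0xE9, flag off.
my $local = "caf\xE9";

# Concatenating the two upgrades the whole result to the utf8 representation.
my $page = $from_soap . $local;

# Length of the internal buffer in bytes, read under "use bytes".
my $internal_bytes = do { use bytes; length $page };

printf "soap flagged:  %s\n", utf8::is_utf8($from_soap) ? "yes" : "no";
printf "local flagged: %s\n", utf8::is_utf8($local)     ? "yes" : "no";
printf "page flagged:  %s\n", utf8::is_utf8($page)      ? "yes" : "no";
printf "characters: %d, internal bytes: %d\n", length($page), $internal_bytes;
```

The page has 10 characters but 11 internal bytes, because the e-acute is now stored as two bytes. Whether those two bytes actually reach the browser depends on how the page is written out (I/O layers, or modules that read the internal buffer directly), so treat this as an illustration of the flag behaviour rather than of any particular template system.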



      > > - When you send data structures containing blessed references (objects),
      > > the encoded XML is different than if the data was not blessed. This may
      > > have some undesirable side effects with non-Perl SOAP implementations.
      > >
      > > So, here is a new module Data::Structure::Util:
      > > http://search.cpan.org/~pdenis/Data-Structure-Util-0.02/lib/Data/Structure/Util.pm
      > > Among other things, it lets you encode/decode any string to/from
      > > utf8 within a data structure. It also lets you remove the blessing
      > > from any reference within the structure.
      >
      > Looking at the POD for the module it is not clear what the utf8_on() and
      > off() routines do. Do they simply set or clear the utf8 flag attached to
      > each scalar? Or do they do more than that?

      I should update the doc then.
      They do more than that: they actually call the Perl API to encode/decode
      the string (if possible). The flag is set or unset as a result of the
      transformation.
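
To make "convert if possible, and let the flag follow" concrete, here is a rough pure-Perl sketch. It is not the module's code: `downgrade_strings` is a hypothetical helper, and it relies on the core `utf8::downgrade` function with its fail-ok argument instead of the XS calls the module makes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Structure::Util): walk hashes and
# arrays and convert each plain string in place to its single-byte form.
# The utf8 flag is cleared as a side effect of a successful conversion.
sub downgrade_strings {
    my $type = ref $_[0];
    if ($type eq 'HASH') {
        downgrade_strings($_) for values %{ $_[0] };
    }
    elsif ($type eq 'ARRAY') {
        downgrade_strings($_) for @{ $_[0] };
    }
    elsif (!$type && defined $_[0]) {
        # Second argument true: leave the string untouched instead of
        # dying when it holds characters above 0xFF.
        utf8::downgrade($_[0], 1);
    }
    return;
}

my $flagged = "caf\xE9";
utf8::upgrade($flagged);            # simulate a flagged string
my $wide = "\x{263A}";              # cannot fit in a single byte

my $data = { name => $flagged, list => [ $flagged, $wide ] };
downgrade_strings($data);

printf "name flagged: %s\n", utf8::is_utf8($data->{name})    ? "yes" : "no";
printf "wide flagged: %s\n", utf8::is_utf8($data->{list}[1]) ? "yes" : "no";
```

Strings that fit in single bytes lose the flag, and the wide character keeps it, which matches the "if possible" caveat. Blessed references are skipped in this sketch because `ref` returns the class name for them.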

      Regards
    • duncan_cameron2002@yahoo.co.uk
      Message 2 of 4 , Nov 5, 2003
        At 21:02:31 on 2003-11-05 Randy J. Ray <rjray@...> wrote:

        >> Not sure that I understand the problem that you are describing. A utf8
        >> string that has no multi-byte sequences is identical to an ASCII string
        >> as it is limited to 7 bits and should then be compatible with ISO-8859
        >> encodings.
        >
        >UTF-8 and ISO-8859-1 overlap, but are not identical.
        >
        That's right, but Pierre was referring to utf-8 encoded strings that
        don't have any utf-8 characters. I took that to mean only single-byte
        character sequences, in effect 7-bit characters.

        >The problem is not with the parser module, it's with the XML specification.
        >The XML spec says that a document's encoding defaults to UTF-8 in the absence
        >of an explicit declaration.
        >
        >Most parsers take this a step further and convert the text nodes they return
        >to the application to UTF-8 even when the document is explicitly encoded
        >otherwise. While this can be convenient, it's also consistent and predictable
        >behavior. In an application I recently wrote with XML::Parser (that did not
        >involve SOAP or XML-RPC, but did have to deal with encoding issues), I used
        >the "use bytes" pragma, and was able to do what I needed with the text data,
        >with no problems.

        I am still not convinced by Pierre's approach of messing with the
        utf-8 flag on a variable. I think that explicitly converting the
        utf-8 output from SOAP::Lite into whatever encoding he wants for his
        web page is cleaner and safer.
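
The explicit conversion suggested here could look like the following sketch, which uses the core Encode module. The target charset ISO-8859-1 and the variable names are assumptions for the example, and `utf8::upgrade` again only simulates flagged text coming back from SOAP::Lite.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for flagged text returned by a SOAP call.
my $from_soap = "caf\xE9";
utf8::upgrade($from_soap);

# Convert the character string to ISO-8859-1 octets before it goes
# into the page. encode() returns a byte string with the flag off.
my $octets = encode('ISO-8859-1', $from_soap);

printf "octets: %d, flagged: %s\n",
    length($octets), utf8::is_utf8($octets) ? "yes" : "no";
```

With the default settings, characters that have no ISO-8859-1 equivalent are replaced by a substitution character instead of raising an error, so a stricter check argument may be worth passing to `encode` when silent data loss matters.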

        Regards
        Duncan