
Re: [soaplite] Useful module to use with SOAP::Lite

  • Duncan Cameron
    Message 1 of 4 , Nov 5, 2003
      ----- Original Message -----
      From: "Pierre Denis" <pdenis@...>
      To: <soaplite@yahoogroups.com>
      Sent: Wednesday, November 05, 2003 11:28 AM
      Subject: [soaplite] Useful module to use with SOAP::Lite


      > Using SOAP::Lite intensively (thanks Paul!), I had to deal with two
      > minor issues:
      >
      > - soap calls return utf8 strings, even if the content doesn't have any
      > utf8 encoded characters. This is a "feature" of the XML parser. This can
      > be annoying when you put this utf8 data in an HTML template and specify
      > an iso-* encoding in the HTML header. The result will be a utf8 page
      > rendered as iso-* content, which is messy. Most XML related modules
      > produce the same "effect".

      Not sure that I understand the problem that you are describing. A utf8
      string that has no multi-byte sequences is identical to an ASCII string
      as it is limited to 7 bits and should then be compatible with ISO-8859
      encodings.
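
The point above can be checked at the byte level. The following is a minimal sketch in Python, used here only for illustration since the thread concerns Perl; the sample strings and the helper name `same_bytes` are assumptions, not anything from the original posts. It compares the UTF-8 and ISO-8859-1 encodings of a 7-bit string and of a string containing one accented character.

```python
# Sample strings are illustrative assumptions, not data from the thread.
ascii_text = "Hello, SOAP"
accented_text = "caf\u00e9"  # contains U+00E9, which is outside the 7-bit range


def same_bytes(text: str) -> bool:
    """Return True if text encodes to identical bytes in UTF-8 and ISO-8859-1."""
    return text.encode("utf-8") == text.encode("iso-8859-1")


print(same_bytes(ascii_text))     # 7-bit text: the two encodings coincide
print(same_bytes(accented_text))  # non-ASCII text: the two encodings differ
```

As long as every character is 7-bit, the bytes are the same in both encodings. Whether a Perl scalar carries the utf8 flag is a separate matter that this sketch does not model.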

      > - When you send data structures containing blessed references (objects),
      > the encoded XML is different than if the data was not blessed. This may
      > have some undesirable side effects with non perl SOAP implementations.
      >
      > So, here is a new module Data::Structure::Util:
      >
      > http://search.cpan.org/~pdenis/Data-Structure-Util-0.02/lib/Data/Structure/Util.pm
      > Among other things, it can encode/decode any string to/from
      > utf8 within a data structure. It can also remove the blessing on
      > any reference within the structure.

      Looking at the POD for the module it is not clear what the utf8_on() and
      off() routines do. Do they simply set or clear the utf8 flag attached to
      each scalar? Or do they do more than that?

      Regards
      Duncan
    • Pierre Denis
      Message 2 of 4 , Nov 5, 2003
        Hi Duncan,


        > > Using SOAP::Lite intensively (thanks Paul!), I had to deal with two
        > > minor issues:
        > >
        > > - soap calls return utf8 strings, even if the content doesn't have any
        > > utf8 encoded characters. This is a "feature" of the XML parser. This can
        > > be annoying when you put this utf8 data in an HTML template and specify
        > > an iso-* encoding in the HTML header. The result will be a utf8 page
        > > rendered as iso-* content, which is messy. Most XML related modules
        > > produce the same "effect".
        >
        > Not sure that I understand the problem that you are describing. A utf8
        > string that has no multi-byte sequences is identical to an ASCII string
        > as it is limited to 7 bits and should then be compatible with ISO-8859
        > encodings.
        >

        If you concatenate a utf8 flagged string with a non utf8 flagged string,
        the result will be a utf8 flagged string. The problem happens when you
        display an html page and you specify a charset iso-xxxx because you
        expect your content to be iso-xxxx. If you mix strings from SOAP::Lite
        (flagged utf8 - even if they don't contain any utf8 characters) with some
        other content (non utf8), the result is utf8 content.

        The browser will receive a utf8 encoded html page with an html header
        telling it that the page is actually iso-xxxx encoded. The result is bad.
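
The mismatch described above can be reproduced in a few lines. This is a minimal sketch in Python, for illustration only; the sample text and the helper name `render_as_latin1` are assumptions. It decodes UTF-8 bytes as ISO-8859-1, which is roughly what a browser does when the declared charset does not match the bytes it receives.

```python
def render_as_latin1(utf8_page: bytes) -> str:
    """Decode bytes the way a client would if told the page is ISO-8859-1."""
    return utf8_page.decode("iso-8859-1")


# The page body is really UTF-8: U+00E9 becomes the two bytes 0xC3 0xA9.
page = "caf\u00e9".encode("utf-8")

# Each of those two bytes is shown as its own ISO-8859-1 character.
shown = render_as_latin1(page)
print(shown)
```

The single accented character comes out as two unrelated characters, which is the "messy" rendering the post refers to.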



        > > - When you send data structures containing blessed references (objects),
        > > the encoded XML is different than if the data was not blessed. This may
        > > have some undesirable side effects with non perl SOAP implementations.
        > >
        > > So, here is a new module Data::Structure::Util:
        > >
        > > http://search.cpan.org/~pdenis/Data-Structure-Util-0.02/lib/Data/Structure/Util.pm
        > > Among other things, it can encode/decode any string to/from
        > > utf8 within a data structure. It can also remove the blessing on
        > > any reference within the structure.
        >
        > Looking at the POD for the module it is not clear what the utf8_on() and
        > off() routines do. Do they simply set or clear the utf8 flag attached to
        > each scalar? Or do they do more than that?

        I should update the doc then.
        They do more than that, they actually call the perl API to encode/decode (if possible) the string.
        The flag will be set/unset as the result of the transformation.

        Regards
      • duncan_cameron2002@yahoo.co.uk
        Message 3 of 4 , Nov 5, 2003
          At 21:02:31 on 2003-11-05 Randy J. Ray <rjray@...> wrote:

          >> Not sure that I understand the problem that you are describing. A utf8
          >> string that has no multi-byte sequences is identical to an ASCII string
          >> as it is limited to 7 bits and should then be compatible with ISO-8859
          >> encodings.
          >
          >UTF-8 and ISO-8859-1 overlap, but are not identical.
          >
          That's right but Pierre was referring to utf-8 encoded strings that
          don't have any utf-8 characters. I took that to mean only single byte
          character sequences, in effect 7 bit characters.

          >The problem is not with the parser module, it's with the XML specification.
          >The XML spec says that a document's encoding defaults to UTF-8 in the absence
          >of an explicit declaration.
          >
          >Most parsers take this a step further and convert the text nodes they return
          >to the application to UTF-8 even when the document is explicitly encoded
          >otherwise. While this can be inconvenient, it's also consistent and predictable
          >behavior. In an application I recently wrote with XML::Parser (that did not
          >involve SOAP or XML-RPC, but did have to deal with encoding issues), I used
          >the "use bytes" pragma, and was able to do what I needed with the text data,
          >with no problems.

          I am still not convinced by Pierre's approach of messing with the
          utf-8 flag on a variable. I think that explicitly converting the
          utf-8 output from SOAP::Lite into whatever encoding he wants for his
          web page is cleaner and safer.
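
The explicit-conversion approach suggested above might look like the following. This is a minimal sketch in Python, for illustration only; the helper name `to_page_bytes`, the sample text, and the choice to emit numeric character references for unrepresentable characters are all assumptions, not something proposed in the thread. In Perl, the Encode module's `encode()` function plays a comparable role.

```python
def to_page_bytes(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode text for output in the page's declared charset.

    Characters the charset cannot represent are written as numeric
    character references, so the output stays valid for that charset.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


# U+00E9 exists in ISO-8859-1; U+20AC (euro sign) does not.
body = to_page_bytes("caf\u00e9 \u20ac")
print(body)
```

Converting once, at the point of output, keeps the bytes sent to the browser consistent with the charset declared in the header, whatever mix of strings went into the page.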

          Regards
          Duncan