Useful module to use with SOAP::Lite

  • Pierre Denis
    Message 1 of 4, Nov 5, 2003
      Using SOAP::Lite intensively (thanks Paul!), I had to deal with two
      minor issues:

      - soap calls return utf8 strings, even if the content doesn't have any
      utf8 encoded characters. This is a "feature" of the XML parser. This can
      be annoying when you put this utf8 data in an HTML template and specify
      an iso-* encoding in the HTML header. The result will be a utf8 page
      rendered as iso-* content, which is messy. Most XML-related modules
      produce the same "effect".

      - When you send data structures containing blessed references (objects),
      the encoded XML is different than if the data was not blessed. This may
      have some undesirable side effects with non-Perl SOAP implementations.

      So, here is a new module Data::Structure::Util:
      http://search.cpan.org/~pdenis/Data-Structure-Util-0.02/lib/Data/Structure/Util.pm
      Among other things, it lets you encode/decode any string to/from utf8
      within a data structure. It can also remove the blessing from any
      reference within the structure.
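
      As a rough illustration of what such a recursive walk involves, here is a
      minimal sketch in core Perl. It is not the module's implementation:
      plain_copy is a hypothetical helper that returns a cleaned copy rather
      than changing the data in place, and it assumes that only hashes and
      arrays need to be traversed.

      ```perl
      use strict;
      use warnings;
      use Scalar::Util qw(reftype);

      # Hypothetical helper (not part of Data::Structure::Util): returns a deep copy
      # of $data in which hashes and arrays are no longer blessed and utf8-flagged
      # strings are downgraded to native bytes whenever that is possible.
      sub plain_copy {
          my ($data) = @_;
          my $type = reftype($data) || '';

          if ($type eq 'HASH') {
              my %copy;
              $copy{$_} = plain_copy($data->{$_}) for keys %$data;
              return \%copy;
          }
          if ($type eq 'ARRAY') {
              return [ map { plain_copy($_) } @$data ];
          }
          return $data if ref $data;    # other reference types are passed through as-is

          my $copy = $data;
          # Second argument 1 means: leave strings with characters above 0xFF untouched.
          utf8::downgrade($copy, 1) if defined $copy;
          return $copy;
      }

      # Example: a blessed object holding an ASCII string that carries the utf8 flag.
      my $name = "Pierre";
      utf8::upgrade($name);
      my $object = bless { name => $name, tags => [ 'soap', 'xml' ] }, 'My::Record';
      my $plain  = plain_copy($object);
      ```

      Here $plain is an ordinary hash reference whose strings no longer carry
      the utf8 flag, while $object itself is left unchanged.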

      Enjoy!

      --
      Pierre Denis
      Development Manager
      Fotango
      +44 (0)20 7251 7021
    • Duncan Cameron
      Message 2 of 4, Nov 5, 2003
        ----- Original Message -----
        From: "Pierre Denis" <pdenis@...>
        To: <soaplite@yahoogroups.com>
        Sent: Wednesday, November 05, 2003 11:28 AM
        Subject: [soaplite] Useful module to use with SOAP::Lite


        > Using SOAP::Lite intensively (thanks Paul!), I had to deal with two
        > minor issues:
        >
        > - soap calls return utf8 strings, even if the content doesn't have any
        > utf8 encoded characters. This is a "feature" of the XML parser. This can
        > be annoying when you put this utf8 data in an HTML template and specify
        > an iso-* encoding in the HTML header. The result will be a utf8 page
        > rendered as iso-* content, which is messy. Most XML-related modules
        > produce the same "effect".

        Not sure that I understand the problem that you are describing. A utf8
        string that has no multi-byte sequences is identical to an ASCII string
        as it is limited to 7 bits and should then be compatible with ISO-8859
        encodings.
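
        The byte-level point can be checked with the core Encode module. This is
        a small sketch with made-up sample strings: it compares the bytes that
        the two encodings produce for 7-bit text and for text with one accented
        character.

        ```perl
        use strict;
        use warnings;
        use Encode qw(encode);

        my $ascii    = "plain text";     # only 7-bit characters
        my $accented = "caf\x{e9}";      # U+00E9, e with acute accent

        # 7-bit text produces the same bytes in both encodings ...
        my $ascii_same = encode('UTF-8', $ascii) eq encode('ISO-8859-1', $ascii);

        # ... but one accented character is enough to make the byte sequences differ:
        # UTF-8 needs two bytes (0xC3 0xA9), ISO-8859-1 a single byte (0xE9).
        my $utf8_len   = length(encode('UTF-8', $accented));
        my $latin1_len = length(encode('ISO-8859-1', $accented));

        printf "ASCII identical: %s, UTF-8 bytes: %d, ISO-8859-1 bytes: %d\n",
            ($ascii_same ? 'yes' : 'no'), $utf8_len, $latin1_len;
        ```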

        > - When you send data structures containing blessed references (objects),
        > the encoded XML is different than if the data was not blessed. This may
        > have some undesirable side effects with non-Perl SOAP implementations.
        >
        > So, here is a new module Data::Structure::Util:
        > http://search.cpan.org/~pdenis/Data-Structure-Util-0.02/lib/Data/Structure/Util.pm
        > Among other things, it lets you encode/decode any string to/from utf8
        > within a data structure. It can also remove the blessing from any
        > reference within the structure.

        Looking at the POD for the module it is not clear what the utf8_on() and
        off() routines do. Do they simply set or clear the utf8 flag attached to
        each scalar? Or do they do more than that?

        Regards
        Duncan
      • Pierre Denis
        Message 3 of 4, Nov 5, 2003
          Hi Duncan,


          > > Using SOAP::Lite intensively (thanks Paul!), I had to deal with two
          > > minor issues:
          > >
          > > - soap calls return utf8 strings, even if the content doesn't have any
          > > utf8 encoded characters. This is a "feature" of the XML parser. This can
          > > be annoying when you put this utf8 data in an HTML template and specify
          > > an iso-* encoding in the HTML header. The result will be a utf8 page
          > > rendered as iso-* content, which is messy. Most XML-related modules
          > > produce the same "effect".
          >
          > Not sure that I understand the problem that you are describing. A utf8
          > string that has no multi-byte sequences is identical to an ASCII string
          > as it is limited to 7 bits and should then be compatible with ISO-8859
          > encodings.
          >

          If you concatenate a utf8-flagged string with a non-utf8-flagged string,
          the result will be a utf8-flagged string. The problem appears when you
          display an HTML page and specify an iso-xxxx charset because you
          expect your content to be iso-xxxx. If you mix strings from SOAP::Lite
          (flagged utf8, even if they don't contain any utf8 characters) with some
          other (non-utf8) content, the result is utf8 content.

          The browser will receive a utf8-encoded HTML page with an HTML header
          telling it that the page is actually iso-xxxx encoded. The result is bad.
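
          A minimal sketch of this effect, using only core Perl. The sample
          strings are made up, utf8::upgrade stands in for whatever the XML
          parser does to flag its output, and the page is assumed to be
          written out through an explicit Encode::encode call; what actually
          reaches the browser in a given setup also depends on the Perl
          version and on the output filehandle layers.

          ```perl
          use strict;
          use warnings;
          use Encode qw(encode decode);

          # Stand-in for a value coming back from the XML parser: pure ASCII, utf8 flag on.
          my $from_soap = "Hello, ";
          utf8::upgrade($from_soap);

          # Stand-in for local content: an unflagged string holding "café" in ISO-8859-1.
          my $local = "caf\xE9";

          my $page = $from_soap . $local;
          my $flag_spread = utf8::is_utf8($page) ? 1 : 0;   # the flag propagates to the result

          # The same 11 characters give different bytes depending on the output encoding.
          my $as_latin1 = encode('ISO-8859-1', $page);      # 11 bytes, matches an iso-8859-1 header
          my $as_utf8   = encode('UTF-8', $page);           # 12 bytes, the accent is now 0xC3 0xA9

          # What a browser shows if it reads the UTF-8 bytes as ISO-8859-1: "cafÃ©".
          my $garbled = decode('ISO-8859-1', $as_utf8);
          ```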



          > > - When you send data structures containing blessed references (objects),
          > > the encoded XML is different than if the data was not blessed. This may
          > > have some undesirable side effects with non-Perl SOAP implementations.
          > >
          > > So, here is a new module Data::Structure::Util:
          > > http://search.cpan.org/~pdenis/Data-Structure-Util-0.02/lib/Data/Structure/Util.pm
          > > Among other things, it lets you encode/decode any string to/from utf8
          > > within a data structure. It can also remove the blessing from any
          > > reference within the structure.
          >
          > Looking at the POD for the module it is not clear what the utf8_on() and
          > off() routines do. Do they simply set or clear the utf8 flag attached to
          > each scalar? Or do they do more than that?

          I should update the doc then.
          They do more than that: they actually call the Perl API to encode/decode
          the string (if possible). The flag is set or unset as a result of the
          transformation.
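
          The difference between converting a string and merely flipping its
          flag can be sketched with core functions. The thread does not say
          which Perl API calls the module makes, so utf8::downgrade and
          Encode::_utf8_off are used here only as stand-ins for the two
          behaviours.

          ```perl
          use strict;
          use warnings;
          use Encode ();

          my $converted = "caf\x{e9}";
          utf8::upgrade($converted);        # internal bytes are now 0x63 0x61 0x66 0xC3 0xA9
          my $flag_only = $converted;       # independent copy with the same internal bytes

          # Conversion: the internal buffer is re-encoded, the flag is cleared as a result,
          # and the string still holds the same 4 characters.
          utf8::downgrade($converted);

          # Flag flip only: the buffer is left alone, so its 5 bytes now read as 5 characters.
          Encode::_utf8_off($flag_only);
          ```

          After this, $converted still compares equal to "caf\xE9", whereas
          $flag_only has become the five-character string "caf\xC3\xA9".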

          Regards
        • duncan_cameron2002@yahoo.co.uk
          Message 4 of 4, Nov 5, 2003
            At 21:02:31 on 2003-11-05 Randy J. Ray <rjray@...> wrote:

            >> Not sure that I understand the problem that you are describing. A utf8
            >> string that has no multi-byte sequences is identical to an ASCII string
            >> as it is limited to 7 bits and should then be compatible with ISO-8859
            >> encodings.
            >
            >UTF-8 and ISO-8859-1 overlap, but are not identical.
            >
            That's right but Pierre was referring to utf-8 encoded strings that
            don't have any utf-8 characters. I took that to mean only single byte
            character sequences, in effect 7 bit characters.

            >The problem is not with the parser module, it's with the XML specification.
            >The XML spec says that a document's encoding defaults to UTF-8 in the absence
            >of an explicit declaration.
            >
            >Most parsers take this a step further and convert the text nodes they return
            >to the application to UTF-8 even when the document is explicitly encoded
            >otherwise. While this can be convenient, it's also consistent and predictable
            >behavior. In an application I recently wrote with XML::Parser (that did not
            >involve SOAP or XML-RPC, but did have to deal with encoding issues), I used
            >the "use bytes" pragma, and was able to do what I needed with the text data,
            >with no problems.

            I am still not convinced by Pierre's approach of messing with the
            utf-8 flag on a variable. I think that explicitly converting the
            utf-8 output from SOAP::Lite into whatever encoding he wants for his
            web page is cleaner and safer.
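
            A minimal sketch of such an explicit conversion with the core
            Encode module. The sample text and the choice of ISO-8859-1 as
            the page encoding are assumptions for illustration; the
            FB_HTMLCREF fallback is one way to handle characters that the
            target encoding cannot represent.

            ```perl
            use strict;
            use warnings;
            use Encode qw(encode);

            # Stand-in for text returned by SOAP::Lite: a character string containing an
            # accented letter and a euro sign (U+20AC), which ISO-8859-1 cannot represent.
            my $text = "caf\x{e9} costs \x{20ac}5";

            # Convert explicitly to the page encoding. FB_HTMLCREF replaces characters
            # missing from the target encoding with HTML numeric character references.
            my $html_bytes = encode('ISO-8859-1', $text, Encode::FB_HTMLCREF());
            ```

            The accented letter becomes the single byte 0xE9 and the euro
            sign becomes the reference &#8364;, so the output stays valid
            for a page declared as ISO-8859-1.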

            Regards
            Duncan