904 Re: [soaplite] Base64 and Cyrillic

  • Paul Kulchenko
    Oct 12, 2001
      Hi, Sergei!

      It really depends on what you're trying to do (and how).

      Here is some information that can be helpful.

      First, if you do NOT specify a type, a string that contains Cyrillic
      characters might be encoded as base64. That doesn't hurt by itself,
      but on the other side (after decoding) the string keeps its original
      encoding, which might differ from that of the rest of the message
      (utf8 for a SOAP::Lite server).

      Second, you can always specify the encoding using the ->encoding()
      method on serializer or SOAP::Lite objects, as described in the
      cookbook
      (http://cookbook.soaplite.com/#internationalization and encoding).

      There is also a recommendation there on how to transcode your data
      from iso-8859-1 into utf8:

      (my $utf = '������') =~ tr/\0-\x{ff}//CU;

      which works only with Perl 5.6.0. For 5.6.1 and later (and it works
      for 5.6.0 too) you need to use pack:

      my $utf8 = pack('U*', unpack('C*', '������'));

      (there are other options, such as Unicode::String, Unicode::Map8,
      Encode, and others)
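
      As a hedged sketch of the Encode option mentioned above (Encode
      ships with Perl 5.8 and later; the sample bytes and variable names
      are made up for illustration), transcoding iso-8859-1 bytes into
      utf8 could look like this:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: three iso-8859-1 bytes (e-acute, n-tilde, u-umlaut).
my $latin1 = "\xE9\xF1\xFC";

# Step 1: turn the iso-8859-1 bytes into Perl characters.
my $chars = decode('iso-8859-1', $latin1);

# Step 2: turn the characters into utf8-encoded bytes.
my $utf8 = encode('UTF-8', $chars);

printf "latin1: %d bytes, utf8: %d bytes\n", length($latin1), length($utf8);
```

      Each of the three sample characters takes two bytes in utf8, so the
      encoded string is twice as long as the input.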

      There is a problem with using *REAL* utf8 in 5.6.1 and later: you
      may NOT be able to properly send a message that has utf8 characters
      in it. Here is why. LWP and other low-level libraries determine the
      length of a string with the length() function, which counts *chars*
      rather than *bytes* for strings with multibyte characters (as in
      your case). But on the wire we're sending *bytes*, regardless of the
      encoding used. As a result, the specified Content-Length is smaller
      than it should be (it counts chars instead of bytes), and there is
      no way to change that at a high level ("use bytes" has lexical
      scope).

      After a long discussion on the p5p list, where I was arguing that it
      should be done at the LWP level or lower and Gisle was arguing that
      it should be done at mine ;) or that I shouldn't be sending utf8 at
      all, I decided to downgrade utf8 strings to bytes for all HTTP
      transports, so the low-level libraries will work fine. I still think
      that it shouldn't be done at my level, and you can switch it off if
      you like, but that's how it is now, and it will be included in the
      next version. There is no such problem with Perl 5.6.0, where
      length() counts bytes. If you're curious about the implementation
      details, it's done with

      $envelope = pack('C0A*', $envelope);

      but only if the character length does not equal the byte length, so
      there is no penalty in most cases.
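
      A minimal sketch of the mismatch described above, assuming Perl 5.8
      or later. Here utf8::encode() is used as a stand-in for the
      pack('C0A*', ...) trick, since the behavior of C0 in pack templates
      changed in later Perl releases; the sample string and variable names
      are hypothetical:

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length() without enabling the pragma

# Hypothetical envelope fragment: six Cyrillic characters.
my $envelope = "\x{43F}\x{440}\x{438}\x{432}\x{435}\x{442}";

my $char_len = length($envelope);   # counts characters

# Downgrade a copy to utf8-encoded bytes, but only when the
# character length differs from the byte length.
my $bytes = $envelope;
utf8::encode($bytes) if length($bytes) != bytes::length($bytes);

my $byte_len = length($bytes);      # now counts bytes, as sent on the wire

print "chars: $char_len, bytes: $byte_len\n";
```

      A Content-Length computed from the first number would be too small
      for this string; the second number is what actually goes on the
      wire.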

      To complete the picture, the utf8 that XML::Parser returns is *not*
      real utf8 from Perl's perspective. The strings are not marked as
      *utf8* (at least not in the current version), which means that they
      are not handled as utf8 strings. That is not a problem if you don't
      use utf8 characters, but you do. In your case the strings must be
      promoted to utf8 strings with:

      $string = pack('U0A*', $string);

      or something similar.

      I do NOT include this functionality for two reasons: first,
      XML::Parser should do that (I hope at some point it will), and
      second, if you do it using the Encode interface, there is no
      performance penalty in upgrading strings to utf8 (the string IS
      already utf8-encoded, just not marked as such), compared to the pack
      method.
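
      A minimal sketch of such a promotion on a modern Perl (5.8 or
      later), assuming Encode::decode() as the tool. Note that decode()
      validates and copies the data, so it is not the zero-cost marking
      described above; the sample bytes and variable names are
      hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical parser output: utf8-encoded bytes for six Cyrillic
# characters, not yet marked as a character string.
my $raw = "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

# Promote the bytes to a Perl character string.
my $string = decode('UTF-8', $raw);

printf "before: %d, after: %d\n", length($raw), length($string);
```

      After the promotion, length(), regular expressions and string
      comparison see six characters instead of twelve bytes.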

      > What information have I to provide to understand what is wrong?
      As Aaron pointed out, debug info will be helpful, as well as
      information about what you *expect* to get.

      Let me know if you have any comments, questions or suggestions. Thank
      you.

      Best wishes, Paul.

      --- Sergei Dolmatov <sergei@...> wrote:
      > Greetings!
      >
      > I test SOAP::Lite for a few days - and got that it's really cool
      > tool (and looks like it's more simple that Delphi implementation ;)
      >
      > But today got weird problem - when I return string with cyrillic
      > symbols it sometimes encodes it not properly.
      > So I have three functions that returns cyrillic strings - two of
      > them works correct (looks like always).
      >
      > What information have I to provide to understand what is wrong?
      >
      > --
      > Best regards,
      > Sergei Dolmatov.