Re: [soaplite] Soap and encoding of non ASCII literals

  • cedric.boufflers
    Message 1 of 6 , Jun 16, 2005
      Duncan Cameron wrote:
      Hi Cédric
      
      I don't fully understand what problem you are having but there is a
      basic misunderstanding in your explanation above.
      
      By definition XML uses the Unicode character codes.  ISO-8859-x and
      UTF-8 are ways of encoding physical XML documents, but when the
      documents are parsed the result will be in Unicode. It just happens
      that perl uses UTF-8 to hold its internal strings (sometimes). So you
      shouldn't have to do anything special to get strings encoded as UTF-8.
      
      
      You say

      > And in the XML the literal is encoded as "c&#xE9;dric", which is an
      > ISO-8859-1 XML entity encoding.
          
      This is a correct use of a numeric entity to refer to the Unicode
      character point 0x00E9. You would need to use a numeric entity if the
      XML was being constructed in ISO-8859-1, which wouldn't normally be the
      case with perl.
      
      You say

      > This can be solved by enforcing UTF-8 encoding of string literals in
      > Java. Thus I would have in the frame "c&#xXX;&#xXX;dric", which is a
      > correct UTF-8 encoded string and would be seen as UTF-8 in my Perl
      > web service as well.
          
      This does look as if you are misunderstanding the encoding. You seem to
      be trying to encode the two byte UTF-8 representation of the Unicode
      point 0x00E9. In effect this is double encoding, which is incorrect.
      
      In summary
      
      using a numeric entity &#xE9; is valid for both UTF-8 and ISO-8859-1
      
      if you are using UTF-8, then the é character encoded as two bytes is
      correct. Depending on your perl version, this should be automatic. Your
      Java client would need to explicitly convert to UTF-8 (something like
      str.getBytes("UTF-8")).
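The summary above can be illustrated with a minimal Perl sketch. It assumes only the core Encode module; the helper name hex_octets is hypothetical and exists for display only. It shows the single code point U+00E9 as UTF-8 octets, as a Latin-1 octet, and as a numeric entity, and then the double-encoded form that results from writing one entity per UTF-8 byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
my $char = "\x{E9}";

# Hypothetical display helper: octets as space-separated hex values.
sub hex_octets {
    my ($octets) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $octets);
}

my $utf8   = hex_octets(encode('UTF-8', $char));        # two octets
my $latin1 = hex_octets(encode('iso-8859-1', $char));   # one octet

# A numeric entity names the code point, not any byte encoding,
# so it is the same whatever encoding the XML document uses.
my $entity = sprintf('&#x%X;', ord($char));

# The mistake: one entity per UTF-8 octet. An XML parser reads this
# back as two separate characters, U+00C3 and U+00A9.
my $double = join '', map { sprintf('&#x%X;', ord($_)) }
                      split(//, encode('UTF-8', $char));

print "UTF-8 octets:   $utf8\n";
print "Latin-1 octets: $latin1\n";
print "Entity:         $entity\n";
print "Double-encoded: $double\n";
```

The last line is the shape of the "c&#xXX;&#xXX;dric" proposal: each entity is individually well-formed, but together they denote two characters rather than one.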
      
        

      Hello Duncan,

      So if I understand correctly, I might be trying to double-encode the strings.

      What led me to do that, and may have misled me, is the error I was getting from the encoding method:

      My method call was:

      use Encode::Encoder qw/encoder/;
      encoder($string)->bytes('UTF-8')->iso_8859_1;

      This gave me the error "\xE9 does not map to UTF-8", which is why I thought that é was not a valid UTF-8 code.

      So this might not be a SOAP problem but a problem with the encoder method? Do you have any idea why it refuses to read the string as a UTF-8 one?

      I'm sorry, because the more I learn about encoding, the more confused I seem to get ;)

      Actually, my goal is the following:

      My web service has to write to a database encoded in Latin-1. So I have to encode the UTF-8 string to Latin-1, otherwise the data is not stored correctly in the database. What would be the proper way to ensure that, whatever SOAP client is used (Java, Delphi, Perl, PHP, ...), I get a UTF-8 string that I can encode to Latin-1 in the Perl web service?

      If this problem is not SOAP::Lite related, can you point me to a list where I could get help with it? :)

      Note : Perl is 5.8 and is running under Apache1.3/mod_perl.

      Thanks a lot for your help and explanations,

      Best Regards,
      Cédric


      Regards
      
      Duncan
      
        

      -- 
      ---------------------------------------------------------------------
      BOUFFLERS Cédric : cedric.boufflers@...
      ---------------------------------------------------------------------
      NordNet - 111 Rue de Croix - 59510 Hem - France
      tél : +33 3 20 66 55 55 - fax : +33 3 20 66 55 59
      ---------------------------------------------------------------------
      http://www.securitoo.com/
      http://www.nordnet.fr/
      http://www.lerelaisinternet.com/
      ---------------------------------------------------------------------
      
    • Duncan Cameron
      Message 2 of 6 , Jun 16, 2005
        At 2005-06-16, 14:05:55 you wrote:
        >
        >Hello Duncan,
        >
        >So if I understand correctly, I might be trying to double-encode the
        >strings.
        >
        >What led me to do that, and may have misled me, is the error I was
        >getting from the encoding method:
        >
        >My method call was:
        >
        >use Encode::Encoder qw/encoder/;
        >encoder($string)->bytes('UTF-8')->iso_8859_1;
        >
        >This gave me the error "\xE9 does not map to UTF-8", which is why I
        >thought that é was not a valid UTF-8 code.
        >
        >So this might not be a SOAP problem but a problem with the encoder
        >method? Do you have any idea why it refuses to read the string as a
        >UTF-8 one?
        >
        >I'm sorry, because the more I learn about encoding, the more confused
        >I seem to get ;)
        >
        >Actually, my goal is the following:
        >
        >My web service has to write to a database encoded in Latin-1. So I
        >have to encode the UTF-8 string to Latin-1, otherwise the data is not
        >stored correctly in the database. What would be the proper way to
        >ensure that, whatever SOAP client is used (Java, Delphi, Perl, PHP,
        >...), I get a UTF-8 string that I can encode to Latin-1 in the Perl
        >web service?
        >
        >If this problem is not SOAP::Lite related, can you point me to a list
        >where I could get help with it? :)
        >
        >Note : Perl is 5.8 and is running under Apache1.3/mod_perl.
        >
        >Thanks a lot for your help and explanations,
        >
        >Best Regards,
        >Cédric
        >
        My understanding is that all the parameters passed to your server class
        will be marked as UTF-8 (because they have been through the XML
        parser), so you should be able to convert a string to 8859-1 in this
        way:

        my $octets = encode("iso-8859-1", $string, 1);

        this should throw an error if $string contains characters that are not
        in 8859-1, so you will need to handle that event within an eval.
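A minimal sketch of that suggestion, assuming the core Encode module. The helper name to_latin1 is hypothetical, and returning undef on failure is one possible policy among several (substituting a replacement character would be another).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns ISO-8859-1 octets, or undef when $string
# contains a character that ISO-8859-1 cannot represent.
sub to_latin1 {
    my ($string) = @_;

    # With a CHECK value set, encode() may modify its source argument,
    # so it is given the local copy. FB_CROAK (numeric value 1) makes
    # encode() die on an unmappable character; eval catches that.
    my $octets = eval { encode('iso-8859-1', $string, Encode::FB_CROAK) };

    return $octets;    # undef if encode() died
}

my $ok  = to_latin1("c\x{E9}dric");    # e-acute exists in Latin-1
my $bad = to_latin1("\x{20AC}5");      # the euro sign does not

print defined $ok  ? "converted\n" : "failed\n";
print defined $bad ? "converted\n" : "failed\n";
```

The error text from the failed call is left in $@ after the eval, should the service want to log it or report it in a SOAP fault.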

        Regards

        Duncan




      • cedric.boufflers
        Message 3 of 6 , Jun 24, 2005
          Hello Duncan and the list readers,

          I have been doing some experiments. I have written a simple web
          service in Perl:

          This is my method :

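          sub get_Champs
          {
              my $class    = shift;
              my $envelope = pop;

              # Read the Champ parameter from the SOAP envelope.
              my $champ = $envelope->valueof("//get_Champs/Champ");

              use Data::HexDump;

              # Return a hex dump of the string as the service received it.
              return SOAP::Data->name('result' => HexDump($champ));
          }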

          It just returns a hexadecimal dump of the string received.

          I called it with a standard Java client:


          *- First Test*
          System.out.println(wsenc.get_Champs("cédric"));

          In response I had :
                    00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF

          00000000  63 E9 64 72 69 63                                  c.dric

          So the accented character is encoded as a single byte here, and Perl
          does not seem to treat the string as UTF-8 in this case.


          *- Second Test*
          System.out.println(wsenc.get_Champs(new String("cédric".getBytes("UTF-8"))));

          In response I had :
                    00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF

          00000000  63 C3 A9 64 72 69 63                               c..dric

          In this case the accented character is encoded as two bytes. Is that
          because I am doing double encoding here? On the other hand, in this
          case Perl sees it as a UTF-8 string.
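The two dumps can be reproduced in plain Perl, which may help answer the question. This is a sketch under an assumption: that the Java client's platform default charset was ISO-8859-1 or similar, so that new String(bytes), which decodes with the default charset, turned each UTF-8 octet into its own character. The helper name hex_chars is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical display helper: one hex value per character of the string.
sub hex_chars {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $s);
}

my $name = "c\x{E9}dric";    # six characters, e-acute is U+00E9

# First test: the string arrives intact, one value per character.
my $first = hex_chars($name);

# Second test (assumed mechanism): the UTF-8 octets are re-read as if
# each octet were a Latin-1 character, giving seven characters.
my $mangled = decode('iso-8859-1', encode('UTF-8', $name));
my $second  = hex_chars($mangled);

# Reversing the two steps recovers the original six characters.
my $repaired = decode('UTF-8', encode('iso-8859-1', $mangled));

print "first:    $first\n";
print "second:   $second\n";
print "repaired: ", hex_chars($repaired), "\n";
```

Read this way, the first dump (63 E9 ...) is the correct one: it shows the code point U+00E9, not a Latin-1 byte. The second dump matches a string that was double-encoded before it left the client.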

          How could I force Perl or SOAP::Lite to always handle the string as
          UTF-8?

          I have tried adding these lines:

          use POSIX qw(locale_h);
          setlocale(LC_CTYPE, "en_US.UTF-8");

          But it changes nothing, and the default locale of the computer is
          already:
          LANG=en_US.UTF-8
          LANGVAR=en_US.UTF-8

          Nothing I have tried makes a difference.

          Best Regards,
          And thank you for your help.

          Cédric

          Note :
          SOAP::Lite is 0.60
          Perl is perl5 (revision 5.0 version 8 subversion 0).



