  • Fernando Munoz
    Jan 29, 2003
      Thanks Phillip, that solves the problem. I also managed to find a less
      elegant but equally effective solution myself. It operates on the string and
      passes the result to a second scalar, which ends up as a string of bytes:

      my ($description, $value) = split(":", $biblio[$n]);  # these are UTF8
      $value = sprintf("%4.2f", $value);  # here $value goes back to a string of bytes
      my $lstring = length($description);
      # here $newdesc holds $description as a string of bytes
      my $newdesc = substr($description, 0, $lstring);

      After this the digests are all different and correct. It is not elegant but

      Thanks again.

      I'm guessing you'll have to somehow "cast" the UTF8 strings so that
      they're interpreted byte-by-byte, rather than character-by-character.

      Maybe try "use utf8;" and then pass utf8::encode($str) instead of $str
      to the MD5 function.
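
      A minimal sketch of that suggestion, assuming the goal is to digest the
      UTF-8 byte encoding of each string. The sample strings and the helper name
      md5_hex_utf8 are hypothetical and not from the original script. As far as I
      know, utf8::encode() converts its argument in place and does not return the
      encoded string, so the sketch uses Encode::encode_utf8() for the
      pass-it-straight-in case and shows the in-place form separately:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical sample data: "\x{263A}" has an ordinal above 255,
# so these are character strings that MD5 cannot take directly.
my @descriptions = ("caf\x{e9} \x{263A}", "na\x{ef}ve \x{263A}");

# Hypothetical helper: digest the UTF-8 bytes of a character string.
sub md5_hex_utf8 {
    my ($chars) = @_;
    return md5_hex(encode_utf8($chars));
}

for my $desc (@descriptions) {
    print md5_hex_utf8($desc), "\n";
}

# Alternative: utf8::encode() works in place, so encode a copy
# and pass the copy (now a string of bytes) to md5_hex().
my $copy = $descriptions[0];
utf8::encode($copy);
print md5_hex($copy), "\n";
```

      Both forms feed the same bytes to MD5, so the last line should match the
      first digest printed by the loop.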

      On Wed, Jan 29, 2003 at 09:50:13AM -0800, Fernando Munoz wrote:
      > Well, there's no error logging that I can refer to, but when you try
      > to hexdec these strings (the ones coming in UTF8) no matter how
      > different the strings are, they always return the same digest.
      > Searching around, I found this note:
      > "Perl 5.8 support Unicode characters in strings. Since the MD5
      > algorithm is only defined for strings of bytes, it can not be used
      > on strings that contains chars with ordinal number above 255. The
      > MD5 functions and methods will croak if you try to feed them such
      > input data:"
      > in the documentation for Digest::MD5
      > (http://search.cpan.org/author/GAAS/Digest-MD5/MD5.pm).
