
mod_perl unicode, cgi and binmode

  • peter pilsl
    Message 1 of 4 , Sep 27, 2004
      I need to process and output data delivered via a web browser using the
      CGI interface.
      To deal with "real" Unicode data I set the whole STDIN and STDOUT to
      utf8 with binmode (as recommended at
      http://www.perldoc.com/perl5.8.0/pod/perluniintro.html); my script would
      not work otherwise.
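To illustrate what such a layer does, here is a minimal sketch that is not tied to CGI. It uses in-memory filehandles in place of the real STDIN and STDOUT, which is an assumption made only so the example runs standalone. A UTF-8 layer decodes bytes into characters on input and encodes characters back into bytes on output. The example uses `:encoding(UTF-8)`, the validating variant of `:utf8`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Raw UTF-8 bytes for U+263A (WHITE SMILING FACE), as a browser would send them.
my $bytes = "\xe2\x98\xba\n";

# Input: reading through a decoding layer yields one character, not three bytes.
open my $in, '<', \$bytes or die "cannot open in-memory input: $!";
binmode $in, ':encoding(UTF-8)';
my $line = <$in>;
chomp $line;
close $in;

# Output: writing the character through an encoding layer yields three bytes again.
my $out_buf = '';
open my $out, '>', \$out_buf or die "cannot open in-memory output: $!";
binmode $out, ':encoding(UTF-8)';
print {$out} $line;
close $out;

printf "decoded length: %d, code point: %04x\n", length($line), ord($line);
printf "encoded length: %d\n", length($out_buf);
```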

      While this works perfectly in a standard CGI environment, it does not work
      under mod_perl: Perl reads the input from the CGI form but does not
      decode it as Unicode.


      I set up a simple script that reads lines from a text area and prints
      them out sorted (sort order according to the German locale).

      As long as you only enter "standard" western characters like A-Z, everything
      is fine, but as soon as you enter German umlauts, special Spanish
      characters or anything similar, the script produces garbage under mod_perl.

      mod_perl:
      http://www.goldfisch.at/mod_perl/unicodetest7.pl

      standard-cgi:
      http://www.customers.goldfisch.at/cgi-bin/unicodetest7.pl

      Perl is 5.8.5, mod_perl is the latest 1.99_16, and Apache is 2.0.51.

      If somebody can show me a way to read Unicode without using binmode, I
      would be very glad too. I didn't manage to get "real" Unicode without it.

      thnx a lot,
      peter

      ---------------unicodetest7.pl-------------------------------------
      #!/usr/local/bin/perl -w
      use CGI;
      use strict;

      use POSIX qw(locale_h);
      use locale;
      setlocale(LC_COLLATE, "de_AT");

      binmode(STDOUT, ":utf8");
      binmode(STDIN, ":utf8");

      my $query = CGI->new;
      my $charset = 'UTF-8';
      $CGI::XHTML = 0;
      print $query->header(-charset => $charset),
            $query->start_html(-title => 'Unicodetest');
      print "cgi-version = ", $CGI::VERSION, " \x{263a}", "<br><br>\n";

      if ($query->param('submit')) {
          print "your input sorted : <br><br>";

          my $si = $query->param('unicode');
          $si =~ s/\r//g;
          # --- the following is to fix some unresolved CGI-problem:
          # rebuild the string one character at a time
          my $sin = '';
          foreach (0 .. length($si) - 1) {
              $sin .= chr(ord(substr($si, $_, 1)));
          }
          $si = $sin;
          #----

          foreach (sort(split(/\n/, $si))) {
              s/\r|\n//g;
              print $_;
              print "  (length=", length($_), ")";
              print "  ";
              # dump the code point of every character in hex
              foreach my $i (0 .. length($_) - 1) {
                  printf("%04x ", ord(substr($_, $i, 1)));
              }
              print "<br>\n";
          }
      }

      print '<br><br>enter your unicode-testtext here :
      ', $query->start_multipart_form,
        $query->textarea(-name => 'unicode', -rows => 10, -columns => 100),
        "\n<br>\n",
        $query->submit(-name => 'submit', -value => 'proceed'), "\n",
        $query->endform, "\n";
      print $query->end_html;
      ----------------------------
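The script's per-character hex dump is what makes the garbage visible. The standalone sketch below applies the same `%04x` dump to undecoded UTF-8 bytes and to a decoded string; the helper name `codepoints` is made up for this example. An undecoded "ä" shows up as two code points, `00c3 00a4`, while the decoded string holds the single code point `00e4`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: same "%04x" format as the test script's dump.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf '%04x', ord } split //, $s;
}

my $raw     = "\xc3\xa4";              # UTF-8 bytes for "ä", undecoded
my $decoded = decode('UTF-8', $raw);   # one character, U+00E4

print codepoints($raw), "\n";       # 00c3 00a4
print codepoints($decoded), "\n";   # 00e4
```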





      --
      mag. peter pilsl
      goldfisch.at
      IT-management
      tel +43 699 1 3574035
      fax +43 699 4 3574035
      pilsl@...

      --
      Report problems: http://perl.apache.org/bugs/
      Mail list info: http://perl.apache.org/maillist/modperl.html
      List etiquette: http://perl.apache.org/maillist/email-etiquette.html
    • Markus Wichitill
      Message 2 of 4 , Sep 27, 2004
        peter pilsl wrote:
        > I need to process and output data delivered via a webbrowser using the
        > CGI-interface.
        > To deal with "real" unicode-data I set the whole STDIN and STDOUT to
        > utf8 with binmode (as recommended at
        > http://www.perldoc.com/perl5.8.0/pod/perluniintro.html. My script would
        > not work otherwise)
        >
        > While this works perfect in a standard CGI-environment it does not work
        > under mod_perl. Perl reads the input from the CGI-form and does not read
        > it as unicode.

        STDIN is not used with mod_perl. I'd say: don't use CGI::param() directly;
        use your own param wrapper function(s) that call Encode::decode_utf8() or
        utf8::decode() on the returned values. Wrapper functions are useful anyway
        for untainting input or for supporting more than one CGI input module (like
        Apache::Request in addition to CGI.pm).

        Simplified example:

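        sub param
        {
            my $str = undef;
            # MODPERL is assumed to be a constant defined elsewhere, e.g.
            #   use constant MODPERL => exists $ENV{MOD_PERL};
            # $apr (Apache::Request) and $cgi (CGI.pm) are assumed globals.
            if (MODPERL) { $str = $apr->param(shift()) }
            else         { $str = $cgi->param(shift()) }
            utf8::decode($str) if defined $str;   # missing params stay undef
            return $str;
        }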
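Here is a self-contained variant of this wrapper. It is a sketch under a few assumptions. The query object is passed in explicitly instead of living in a global. `FakeQuery` is a hypothetical stand-in for CGI.pm or Apache::Request, both of which offer a `param` method. `Encode::decode('UTF-8', ...)` replaces `utf8::decode()`: the former substitutes U+FFFD for malformed bytes, while the latter leaves an invalid string unchanged and returns false.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for CGI.pm / Apache::Request: returns raw bytes.
package FakeQuery;
sub new   { my ($class, %params) = @_; return bless {%params}, $class }
sub param { my ($self, $name) = @_; return $self->{$name} }

package main;

# Wrapper: fetch one parameter and decode it from UTF-8 bytes to characters.
# Works with any object that has a param($name) method.
sub decoded_param {
    my ($query, $name) = @_;
    my $raw = $query->param($name);
    return undef unless defined $raw;    # missing parameter stays undef
    return Encode::decode('UTF-8', $raw);
}

my $q = FakeQuery->new(unicode => "\xc3\xa4pfel");   # "äpfel" as UTF-8 bytes
my $value = decoded_param($q, 'unicode');
printf "%d bytes in, %d characters out\n",
    length($q->param('unicode')), length($value);
```

Because the wrapper only relies on a `param` method, the same function can sit in front of either input module.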

      • Stas Bekman
        Message 3 of 4 , Sep 27, 2004
          Markus Wichitill wrote:
          > peter pilsl wrote:
          >
          >> I need to process and output data delivered via a webbrowser using the
          >> CGI-interface.
          >> To deal with "real" unicode-data I set the whole STDIN and STDOUT to
          >> utf8 with binmode (as recommended at
          >> http://www.perldoc.com/perl5.8.0/pod/perluniintro.html. My script
          >> would not work otherwise)
          >>
          >> While this works perfect in a standard CGI-environment it does not
          >> work under mod_perl. Perl reads the input from the CGI-form and does
          >> not read it as unicode.
          >
          >
          > STDIN is not used with mod_perl.

          It depends on how you write your program. When you don't qualify your read
          and print calls with $r, you do use STDIN, though mod_perl overrides
          it and makes the qualified $r->read() calls behind the scenes (via the
          perlio layer); essentially mod_cgi and mod_perl do exactly the same
          thing in the end. If you set binmode inside your script, I think it
          should work just fine, since the perlio layer subclasses
          PerlIOBase_binmode, which is supposed to do the right thing. You can find
          a few examples of its usage in the modperl test suite (in the source
          package); just grep for 'binmode'.
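Perl's built-in `PerlIO::get_layers` lists the layers on a handle, so it can confirm whether a `binmode` call actually took effect. The sketch below runs on an ordinary in-memory handle. What the list looks like for STDIN or STDOUT inside mod_perl depends on the server setup and is not shown here.

```perl
use strict;
use warnings;

my $buf = '';
open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";

my @before = PerlIO::get_layers($fh);
binmode $fh, ':utf8';
my @after = PerlIO::get_layers($fh);

# The ":utf8" flag is reported as an extra "utf8" entry in the layer list.
print "before: @before\n";
print "after:  @after\n";
my $has_utf8 = grep { $_ eq 'utf8' } @after;
print $has_utf8 ? "utf8 layer active\n" : "no utf8 layer\n";
```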

          > I'd say, don't use CGI::param()
          > directly, use your own param wrapper function(s) that call
          > Encode::decode_utf8() or utf8::decode() for the returned values. Wrapper
          > functions are useful anyway for untainting input or supporting more than
          > one CGI input module (like Apache::Request in addition to CGI.pm).
          >
          > Simplified example:
          >
          > sub param
          > {
          > my $str = undef;
          > if (MODPERL) { $str = $apr->param(shift()) }
          > else { $str = $cgi->param(shift()) }
          > utf8::decode($str);
          > return $str;
          > }

          Boris (CC'ed) has started a similar discussion on the modperl dev list,
          which has now been redirected to the apreq list (the home of Apache::Request).
          Boris is going to post the details on how to make Apache::Request handle
          Unicode/UTF-8 transparently for the users. I can't see the post yet, but it
          should appear soon. Subscribe via apreq-dev-subscribe@...
          if you want to take part in that discussion.

          --
          __________________________________________________________________
          Stas Bekman JAm_pH ------> Just Another mod_perl Hacker
          http://stason.org/ mod_perl Guide ---> http://perl.apache.org
          mailto:stas@... http://use.perl.org http://apacheweek.com
          http://modperlbook.org http://apache.org http://ticketmaster.com

        • Markus Wichitill
          Message 4 of 4 , Sep 27, 2004
            Stas Bekman wrote:
            >> STDIN is not used with mod_perl.
            >
            > It depends on how you write your program. When you don't qualify your
            > read and print calls with $r, then you do use STDIN, though mod_perl
            > overrides it, and does the qualified $r->read() calls behind the scenes
            > (via the perlio layer), but essentially mod_cgi and mod_perl do exactly
            > the same thing at the end. If you turn the binmode inside your script, I
            > think it should work just fine, since the perlio layer subclasses
            > PerlIOBase_binmode, which is supposed to do the right thing.

            Yes, I was just talking about his example, which uses CGI.pm, which in turn
            gets its input from $r->args and $r->read under mod_perl, so binmode(STDIN)
            won't help.
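The same point can be seen at the protocol level. For a GET request or an urlencoded POST, form data arrives as percent-escaped UTF-8 bytes. Unescaping them produces a byte string regardless of which layers sit on STDIN. The sketch below shows the unescaping step with a made-up query string and a hand-rolled parser that is for illustration only; real code would use CGI.pm or Apache::Request. The test script's multipart form is encoded differently, but it carries raw bytes as well.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up urlencoded form data: "äpfel" sent by a browser using UTF-8.
my $query_string = 'unicode=%C3%A4pfel';

my %params;
for my $pair (split /[&;]/, $query_string) {
    my ($key, $value) = split /=/, $pair, 2;
    $value = '' unless defined $value;
    for ($key, $value) {
        tr/+/ /;                                 # '+' encodes a space
        s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # %XX -> one byte
    }
    $params{$key} = $value;
}

# After unescaping, the value is still a byte string: 6 bytes for 5 characters.
my $bytes = $params{unicode};
my $chars = decode('UTF-8', $bytes);
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```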

            > Boris (CC'ed) has started a similar discussion on the modperl dev list,
            > which is now redirected to the apreq list (the home of Apache::Request),
            > Boris is going to post the details on how to make Apache::Request handle
            > unicode/utf8 transparently for the users. I can't see the post yet, but
            > it should happen soon. subscribe to the
            > apreq-dev-subscribe@... if you want to take part in that
            > discussion.

            I've seen the discussion, but I'm not really interested in UTF-8 support for
            either APR::Table or Apache::Request, since I don't put my own strings in
            APR tables and I use param wrapper functions anyway. Calling utf8::decode()
            for a few parameters is no big deal. I'm more interested in the UTF-8
            support that's hopefully coming with DBI 1.44, since rewriting hundreds of
            strings in hashes fetched via DBI would be much more of a performance issue.
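For comparison, the manual approach for database results amounts to decoding every string value of every fetched row. The sketch below works on a plain hash reference that stands in for a row from DBI's `fetchrow_hashref`. No database is involved, and the column names and values are made up.

```perl
use strict;
use warnings;

# Made-up row as a driver might return it: UTF-8 bytes, not characters.
my $row = { name => "M\xc3\xbcller", city => "Wien", note => undef };

# Decode each defined value in place; foreach aliases $value to the hash values.
for my $value (values %$row) {
    next unless defined $value;
    utf8::decode($value)
        or warn "value is not valid UTF-8, left unchanged\n";
}

printf "name has %d characters\n", length($row->{name});
```

With hundreds of rows, this loop runs once per value, which is the cost that built-in driver support would avoid.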
