Re: [soaplite] xml::parser::lite and entities

  • Paul Kulchenko
    Message 1 of 3 , Nov 28, 2001
      Hi, Mathieu!

      I seriously thought about it. Shouldn't be difficult, but the problem
      is that it's too late to do that on application level, it should be
      done by parser. Consider two cases:

      escaped &

      <a><![CDATA[&]]></a>

      and

      <a>&amp;</a>

      In the first case the application will get the string '&', which
      doesn't need to be unescaped, and in the second case it'll be the
      string '&amp;', which needs to be processed. How will the application
      know which one to process? Either both or none. I'm thinking about
      solving the problem at the parser level, but haven't found a nice
      solution yet. I do have, though, another version of a regexp-based
      parser (XML::ReParser) that doesn't have this limitation and will be
      able to decode entities. There is also XML::Parser::PurePerl (a SAX
      parser from Matt Sergeant) that should be possible to use with
      SOAP::Lite through the XML::SAX::Expat module (I didn't test it though).
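
To make the ambiguity concrete, here is a minimal sketch in Python (not Perl, and not the XML::Parser::Lite code). The helper `naive_text` is hypothetical: it stands in for any extraction that strips markup without decoding entities. The CDATA section here holds the literal text `&amp;` rather than a bare `&`, an assumption chosen because it is the case where decoding afterwards gives a wrong answer.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

CDATA_DOC = "<a><![CDATA[&amp;]]></a>"  # content is the five literal characters &amp;
ENTITY_DOC = "<a>&amp;</a>"             # content is one escaped ampersand


def naive_text(doc):
    """Hypothetical non-decoding extraction: strip the <a> tags and CDATA markers."""
    inner = re.sub(r"^<a>|</a>$", "", doc)
    return re.sub(r"^<!\[CDATA\[|\]\]>$", "", inner)


def parsed_text(doc):
    """Let a full XML parser resolve entities and CDATA while parsing."""
    return ET.fromstring(doc).text


# After naive extraction both documents yield the same string, so the
# application can no longer tell which one still needs decoding.
print(naive_text(CDATA_DOC) == naive_text(ENTITY_DOC))

# Decoding "both" damages the CDATA content; the parser keeps them apart.
print(repr(unescape(naive_text(CDATA_DOC))), repr(parsed_text(CDATA_DOC)))
print(repr(unescape(naive_text(ENTITY_DOC))), repr(parsed_text(ENTITY_DOC)))
```

Only the parser still knows, at the moment it reads the text, whether it came from a CDATA section, which is why the decoding has to happen there.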

      Considering this, I would rather finish and release XML::ReParser, which
      will work as expected, than spend time on an incomplete solution.

      > I'm willing to do the actual code modification...
      Thank you for your kind offer! Any experience with DTD processing?
      ;) This part (as well as a couple of other things) is missing from
      XML::ReParser.

      Best wishes, Paul.

      --- mrdamnfrenchy@... wrote:
      > Along with the variable $SOAP::Constants::DO_NOT_USE_XML_PARSER,
      > which forces the use of XML::Parser::Lite, would it make sense to
      > have an option DECODE_ENTITIES_AFTER_PARSING or something like
      > that?
      >
      > This would run HTML::Entities's decode_entities on each string that
      > is returned.
      >
      > This could be an option of XML::Parser::Lite instead of a
      > SOAP::Lite
      > option.
      >
      > I'm willing to do the actual code modification...
      >
      > -Mathieu
    • Mathieu Longtin
      Message 2 of 3 , Nov 29, 2001
        My issue with the application level decoding is that when I
        pass a string to SOAP, I don't encode it. I let SOAP::Lite
        decide what to do with it. On the receiving end, SOAP::Lite
        should probably decode what it encoded.

        So, I'm proposing to change the as_string method in
        SOAP::XMLSchemaSOAP1_1::Deserializer. It would decode &lt;, &amp;
        and the \0xD (if the variable is set).

        From your code, it looks like you encode strings no matter
        what, even if & and < are encoded already. So it would make
        sense to reverse that on the reading end. I just don't know how
        compatible that is with other SOAP implementations. Do they
        all encode & and <?
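
As a rough illustration of "decode what was encoded", the following Python sketch shows an unconditional encode step and its exact inverse. It is not the SOAP::Lite code: the function names are hypothetical, and writing the carriage return as `&#xD;` is an assumption made for the example. The ordering is the important part: `&` is escaped first when encoding and restored last when decoding.

```python
def encode_text(s):
    """Escape unconditionally, as described above: '&' first, then the rest."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace("\r", "&#xD;")


def decode_text(s):
    """Exact inverse of encode_text: restore '&' last."""
    return s.replace("&#xD;", "\r").replace("&lt;", "<").replace("&amp;", "&")


def decode_text_wrong_order(s):
    """Restoring '&' first turns an escaped '&lt;' back into markup by mistake."""
    return s.replace("&amp;", "&").replace("&lt;", "<").replace("&#xD;", "\r")


original = "a < b && c &lt; d\r\n"   # already contains an escaped-looking '&lt;'
wire = encode_text(original)

print(decode_text(wire) == original)               # round trip is lossless
print(decode_text_wrong_order(wire) == original)   # this one is not
```

Because the sender escapes everything, including strings that already look escaped, the receiver can apply the inverse blindly. That holds only if no part of the text arrived in a CDATA section, which is the case the parser-level argument is about.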

        Speaking of which, could I not just override
        SOAP::XMLSchemaSOAP1_1::Deserializer::as_string in my
        application?

        My problem is that I send large strings (> 1MB) using
        SOAP::Lite, and I'm bogged down by XML::Parser's insistence
        on breaking lines apart. XML::Parser::Lite takes 0.3 secs
        where XML::Parser takes over five minutes (I never fully
        timed it, it's just ridiculous).

        As far as DTD parsing, I did a bit of it, parsing
        dictionary entries, but I have no clue about UDDI or WSDL,
        which is probably why you need DTD parsing.

        -Mathieu
