xml::parser::lite and entities

  • mrdamnfrenchy@yahoo.com
    Message 1 of 3, Nov 28, 2001
      Along with the variable $SOAP::Constants::DO_NOT_USE_XML_PARSER,
      which forces the use of XML::Parser::Lite, would it make sense to
      have an option DECODE_ENTITIES_AFTER_PARSING or something like that?

      This would run HTML::Entities's decode_entities on each string that
      is returned.

      This could be an option of XML::Parser::Lite instead of a SOAP::Lite
      option.

      I'm willing to do the actual code modification...

      -Mathieu
    • Paul Kulchenko
      Message 2 of 3, Nov 28, 2001
        Hi, Mathieu!

        I seriously thought about it. Shouldn't be difficult, but the problem
        is that it's too late to do that on application level, it should be
        done by parser. Consider two cases:

        escaped &

        <a><![CDATA[&]]></a>

        and

        <a>&amp;</a>

        In the first case the application will get the string '&', which
        doesn't need to be unescaped, and in the second case it'll be the
        string '&amp;', which needs to be processed. How will the application
        know which one to process? Either both or none. I'm thinking about solving problem on
        Parser level, but didn't find nice solution yet. I do have though
        another version of regexp-based parser (XML::ReParser) that doesn't
        have this limitation and will be able to decode entities. There is
        also XML::Parser::PurePerl (SAX parser from Matt Sergeant) that
        should be possible to use with SOAP::Lite with XML::SAX::Expat module
        (I didn't test it though).
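
The ambiguity Paul describes can be sketched in a few lines. This is an illustration in Python rather than Perl, and `naive_text` is a hypothetical stand-in for a parser that does not decode entities, not the actual XML::Parser::Lite code. It also assumes a slightly different CDATA payload than the one in the post (a literal `&amp;`), so that the collision is visible: both documents yield the same string from the naive parser, while a full parser tells them apart.

```python
import re
import xml.etree.ElementTree as ET

# Assumption for illustration: the CDATA section carries the literal text "&amp;".
CDATA_DOC = "<a><![CDATA[&amp;]]></a>"   # literal "&amp;", nothing to unescape
ENTITY_DOC = "<a>&amp;</a>"              # an escaped "&"

def naive_text(doc):
    """Hypothetical parser that returns element text without decoding entities."""
    inner = re.fullmatch(r"<a>(.*)</a>", doc, re.S).group(1)
    cdata = re.fullmatch(r"<!\[CDATA\[(.*)\]\]>", inner, re.S)
    # CDATA content is returned as-is; so is ordinary text (no entity decoding).
    return cdata.group(1) if cdata else inner

naive_cdata = naive_text(CDATA_DOC)
naive_entity = naive_text(ENTITY_DOC)

# A full parser decodes entities itself, so the two cases stay distinct.
full_cdata = ET.fromstring(CDATA_DOC).text
full_entity = ET.fromstring(ENTITY_DOC).text

print(repr(naive_cdata), repr(naive_entity))  # identical strings
print(repr(full_cdata), repr(full_entity))    # different strings
```

Once the naive parser has handed back two identical strings, no later decoding step can know that only the second one should be unescaped, which is why the decision has to be made inside the parser.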

        Considering this, I would rather finish and release XML::ReParser,
        which will make this work as expected, than spend time on an incomplete solution.

        > I'm willing to do the actual code modification...
        Thank you for your kind offer! Any experience with DTD processing?
        ;) This part (as well as a couple of other things) is missing from
        XML::ReParser.

        Best wishes, Paul.

      • Mathieu Longtin
        Message 3 of 3, Nov 29, 2001
          My issue with the application level decoding is that when I
          pass a string to SOAP, I don't encode it. I let SOAP::Lite
          decide what to do with it. On the receiving end, SOAP::Lite
          should probably decode what it encoded.

          So, I'm proposing to change the as_string method in
          SOAP::XMLSchemaSOAP1_1::Deserializer. It would decode the escaped
          forms of <, & and the \0xD (if the variable is set).

          From your code, it looks like you encode string no matter
          what, even if & and < are encoded already. So it would make
          sense to reverse that on the reading end. I just don't know how
          compatible that is with other SOAP implementations. Do they
          all encode & and <?
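
The idea of reversing on the reading end exactly what was encoded on the sending end can be sketched as a symmetric pair of functions. This is a Python illustration, not SOAP::Lite's code: the function names are hypothetical, and the `&#xD;` form used for the carriage return is an assumption. The point it shows is that the decoder must undo the replacements in the opposite order, with `&amp;` last.

```python
def encode_text(s):
    # "&" is escaped first; otherwise the "&" introduced by "&lt;" would be escaped again.
    return s.replace("&", "&amp;").replace("<", "&lt;").replace("\r", "&#xD;")

def decode_text(s):
    # Reverse order of encode_text: "&amp;" is decoded last.
    return s.replace("&#xD;", "\r").replace("&lt;", "<").replace("&amp;", "&")

def decode_wrong_order(s):
    # Decoding "&amp;" first turns an encoded "&amp;lt;" into "&lt;" and then into "<".
    return s.replace("&amp;", "&").replace("&lt;", "<").replace("&#xD;", "\r")

samples = ["a < b & c", "already &lt; encoded &amp; text", "line one\rline two"]
for original in samples:
    wire = encode_text(original)
    print(repr(wire), decode_text(wire) == original)
```

Because the sender encodes unconditionally, a string that already contains `&lt;` goes over the wire as `&amp;lt;`, and only the order used in `decode_text` restores it unchanged.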

          Speaking of which, could I not just override
          SOAP::XMLSchemaSOAP1_1::Deserializer::as_string in my
          application?

          My problem is that I send large strings (> 1MB) using
          SOAP::Lite, and I'm bogged down by XML::Parser's insistence
          on breaking lines apart. XML::Parser::Lite takes 0.3 secs
          where XML::Parser takes over five minutes (I never fully
          timed it, it's just ridiculous).
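
Expat-based parsers may deliver the text of one element in many separate callbacks, which is one plausible reading of "breaking lines apart". The sketch below uses Python's expat binding rather than Perl's XML::Parser, and whether this is what actually causes the slowdown described above is an assumption. It collects the chunks in a list and joins them once, and compares the number of callbacks with and without the binding's `buffer_text` option.

```python
import xml.parsers.expat

def parse_text(doc, buffer_text=False):
    """Return the list of character-data chunks expat reports for doc."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text          # ask expat binding to merge adjacent chunks
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)                   # True marks the final piece of input
    return chunks

# A multi-line payload, much smaller than the strings discussed in the thread.
payload = "\n".join("line %d" % i for i in range(1000))
doc = "<a>" + payload + "</a>"

unbuffered = parse_text(doc)
buffered = parse_text(doc, buffer_text=True)

# Join once at the end instead of appending to a growing string per callback.
text = "".join(unbuffered)
print(len(unbuffered), len(buffered), text == payload)
```

Appending to a string on every callback can become expensive for very large payloads, so gathering the pieces and joining them once is the usual way to keep the cost linear.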

          As far as DTD parsing, I did a bit of it, parsing
          dictionary entries, but I have no clue about UDDI or WSDL,
          which is probably why you need DTD parsing.

          -Mathieu
