Loading ...
Sorry, an error occurred while loading the content.

Re: [radio-dev] Double-decoding bug

Expand Messages
  • Jake Savin
    I did some investigation, and it looks like the fix should be in xml.rss.decodeString. It calls xml.entitydecode with flAlphaEntities as true, which causes
    Message 1 of 10 , Sep 19, 2002
    • 0 Attachment
      I did some investigation, and it looks like the fix should be in
      xml.rss.decodeString. It calls xml.entitydecode with flAlphaEntities as
      true, which causes & < and > to be decoded a second time.
      (They were already decoded once by xml.complie.)

      I think the fix is as follows: In xml.rss.decodeString, change the
      following lines:

      s = string.replaceAll (s, "'", "'"); //work around a bug in
      xml.entityDecode
      s = xml.entitydecode (s, true, true)

      ...to:

      s = string.replaceAll (s, "'", "'");
      s = string.replaceAll (s, """, "\"");
      s = xml.entitydecode (s, false, true) //false -- don't decode &
      > and <

      Note the additional decoding of " to the quote character. While
      xml.compile does decode < > and &, it is not aware of the
      " and ' entities.

      On the other hand, xml.entityDecode was taking care of " for us
      (but not ') when called with flAlphaEntities as true, but as
      Matthew noted, that caused the double-decoding bug.

      So -- now that we're not passing flAlphaEntities as true, we have to
      decode " in addition to ' in xml.rss.decodeString.

      As far as I can tell, the above change will fix the double-decoding
      bug. I have not tested it yet.

      -Jake

      On Wednesday, September 18, 2002, at 05:28 PM, Dave Winer wrote:

      > Jake, I'm interested in nabbing the double-decoding problem in the
      > aggregator. Check this out.
      >
      > http://radio.weblogs.com/0105501/2002/06/29.html#a52
      >
      > He seems to have a pretty good idea where the problem is. Right in the
      > code
      > I'm working on.
      >
      > Any thoughts??
      >
      > Dave
      >
      >
      > ------------------------ Yahoo! Groups Sponsor
      > ---------------------~-->
      > Plan to Sell a Home?
      > http://us.click.yahoo.com/J2SnNA/y.lEAA/MVfIAA/nhFolB/TM
      > ---------------------------------------------------------------------
      > ~->
      >
      > To unsubscribe from this group, send an email to:
      > radio-dev-unsubscribe@yahoogroups.com
      >
      >
      >
      > Your use of Yahoo! Groups is subject to
      > http://docs.yahoo.com/info/terms/
      >
      >
    • Andre Radke
      ... Coincidentally, I recently stumbled into that same encoding hell while trying to send RPI messages via Jabber using Frontier/Radio on both ends. My test
      Message 2 of 10 , Sep 20, 2002
      • 0 Attachment
        At 11:43h -0400 19. Sep 2002, Jeremy Bowers wrote:
        >When I was working on the encoding issues for tcp.im, I found that you
        >can basically boil down the "correct encoding" question to two test cases:
        >
        >1. Does "&amp;" work correctly? [..]
        >
        >2. Does "<somecrappytag></somecrappytag>" work correctly? [..]
        >
        >Basically, it all boils down to that.
        >
        >Thought boiling down the test cases might help. I spent several hours in
        >encoding hell the night I got this right for RPI; hopefully someone else
        >can benefit from that ;-)

        Coincidentally, I recently stumbled into that same encoding hell while trying to send RPI messages via Jabber using Frontier/Radio on both ends.

        My test setup consisted of Frontier 8.0.5 for OS X on the sending end and Radio 8.0.6 for OS X on the receiving end. Both machines connected to jabberd 1.4.2 running on an OS X machine on my LAN.

        Here's the script I used for testing per Jeremy's two questions from above:

        tcp.im.rpi.client ("jabber", "noodles@...", "examples.msg", {"<a>&amp;</a>"})

        After running the script, nothing showed up in the About Radio window on the receiving machine. According to the instant messaging log, that was because none of the responders handled the message. So I set user.im.jabber.prefs.flDebug to true to enable debugging and added a bit of code to the condition script of the RPI responder to capture the message. Then I ran the script again. Here is the bit of XML that went over the wire:

        <message from="spicy@.../Frontier" to="noodles@..." type="chat">
        <body>&amp;lt;?xml version=&quot;1.0&quot;?&amp;gt;&amp;lt;methodCall&amp;gt;&amp;lt;methodName&amp;gt;examples.msg&amp;lt;/methodName&amp;gt;&amp;lt;params&amp;gt;&amp;lt;param&amp;gt;&amp;lt;value&amp;gt;&amp;lt;string&amp;gt;PGE+JmFtcDthbXA7PC9hPg==&amp;lt;/string&amp;gt;&amp;lt;/value&amp;gt;&amp;lt;/param&amp;gt;&amp;lt;/params&amp;gt;&amp;lt;/methodCall&amp;gt;</body>
        </message>

        On the receiving end, this is what the message payload looked like when tcp.im.builtinResponders.rpi.condition got a shot at it:

        <?xml version="1.0"?><methodCall><methodName>examples.msg</methodName><params><param><value><string>PGE+JmFtcDthbXA7PC9hPg==</string></value></param></params></methodCall>

        Since the condition script expects the XML tags to be fully decoded, the RPI responder did not get to handle the message. [*]


        At first, I assumed that the decoding logic on the receiving end must be broken but my current theory is that the encoding logic on the sending end is actually to blame.

        Specifically, tcp.im.builtinDrivers.jabber.code.messages.message builds the message as an XML table and adds the payload after entity encoding it. Then it passes the address of the XML table in to tcp.im.builtinDrivers.jabber.code.writeXML as the first parameter. This script accepts an XML table or the address of an XML table or a valid XML string as its first parameter. An XML string will just be pushed out onto the wire unmodified while XML tables are first run through xml.decompile which applies a second round of entity encoding internally. This second round of entity encoding is obviously not being expected by the receiving end. It is also unneccessary as far as I can tell.


        My proposed fix is to change the second parameter of the xml.entityEncode call in tcp.im.builtinDrivers.jabber.code.messages.message from true to false:

        text = xml.entityEncode (text, false)

        After making this change and running my test script again, this is what went out over the wire:

        <message from="spicy@.../Frontier" to="noodles@..." type="chat">
        <body><?xml version="1.0"?><methodCall><methodName>examples.msg</methodName><params><param><value><string>PGE+JmFtcDthbXA7PC9hPg==</string></value></param></params></methodCall></body>
        </message>

        This looked much more sensible to me. On the receiving end, the message payload looked like this when the condition script for the RPI responder got called:

        <?xml version="1.0"?><methodCall><methodName>examples.msg</methodName><params><param><value><string>PGE+JmFtcDthbXA7PC9hPg==</string></value></param></params></methodCall>

        The condition script recognized this as an RPI message and the responder did its thing. Problem solved.

        I have not discovered any downsides to my proposed fix, so I am very interested to hear the resident expert's opinion on this. Jeremy?

        -Andre




        [*] At this point, tcp.im.server on the receiving machine responds with a "Sorry, I don't understand that." message. However, when the original sender receives this human-readable error message, there is no responder that deals with the message and it responds with the same message in turn. Henceforth, the two machines happily exchange a continuous stream of messages saying that they don't understand each other. I commented out the last line of code in tcp.im.server during testing to avoid this.



        --
        Andre Radke + mailto:lists@... + http://www.spicynoodles.net/
      • Jeremy Bowers
        ... This works on my system. Also, you actually *aren t* testing encoding when you do that. In RPI, all strings are base64 encoded before going on to the IM
        Message 3 of 10 , Sep 22, 2002
        • 0 Attachment
          On Fri, 2002-09-20 at 16:18, Andre Radke wrote:
          > Coincidentally, I recently stumbled into that same encoding hell while trying to send RPI messages via Jabber using Frontier/Radio on both ends.
          >
          > My test setup consisted of Frontier 8.0.5 for OS X on the sending end and Radio 8.0.6 for OS X on the receiving end. Both machines connected to jabberd 1.4.2 running on an OS X machine on my LAN.
          >
          > Here's the script I used for testing per Jeremy's two questions from above:
          >
          > tcp.im.rpi.client ("jabber", "noodles@...", "examples.msg", {"<a>&amp;</a>"})
          >
          > After running the script, nothing showed up in the About Radio window on the receiving machine. According to the instant messaging log, that was because none of the responders handled the message. So I set user.im.jabber.prefs.flDebug to true to enable debugging and added a bit of code to the condition script of the RPI responder to capture the message. Then I ran the script again. Here is the bit of XML that went over the wire:

          This works on my system.

          Also, you actually *aren't* testing encoding when you do that. In RPI,
          all strings are base64 encoded before going on to the IM network. AIM
          sends HTML over the wire, so the server feels free to re-format your
          HTML at will, including adding \n and the like. This is ok for
          human-to-human IM, which is essentially whitespace insensitive, but many
          obvious uses of RPI require the RPI system to guarentee that the strings
          are sent byte-for-byte. (file names, Frontier addresses or things that
          eventually end up as addresses like RSS channel names in
          aggregatorData.services, etc.)

          Thus, all strings are base64 encoded to avoid this problem. This is the
          one change to the XML-RPC spec I had to make, other then the big obvious
          one of not returning anything.

          > <message from="spicy@.../Frontier" to="noodles@..." type="chat">
          > <body>&amp;lt;?xml version=&quot;1.0&quot;?&amp;gt;&amp;lt;methodCall&amp;gt;&amp;lt;methodName&amp;gt;examples.msg&amp;lt;/methodName&amp;gt;&amp;lt;params&amp;gt;&amp;lt;param&amp;gt;&amp;lt;value&amp;gt;&amp;lt;string&amp;gt;PGE+JmFtcDthbXA7PC9hPg==&amp;lt;/string&amp;gt;&amp;lt;/value&amp;gt;&amp;lt;/param&amp;gt;&amp;lt;/params&amp;gt;&amp;lt;/methodCall&amp;gt;</body>
          > </message>
          >
          > On the receiving end, this is what the message payload looked like when tcp.im.builtinResponders.rpi.condition got a shot at it:
          >
          > <?xm...

          When I put a dialog.alert(message) at the top of
          user.im.responders.rpi.condition, I see correct XML.

          I know I have the updated tcp.im, because I can see comments from AR and
          JES in my code.

          > My proposed fix is to change the second parameter of the xml.entityEncode call in tcp.im.builtinDrivers.jabber.code.messages.message from true to false:
          >
          > text = xml.entityEncode (text, false)
          >
          > After making this change and running my test script again, this is what went out over the wire:
          >
          > ...
          >
          > I have not discovered any downsides to my proposed fix, so I am very interested to hear the resident expert's opinion on this. Jeremy?

          There's two problems here. One is that RPI is working on my system, and
          not on yours, when they should be identical. This difference is a
          problem.

          The second problem is related to one you identify. It does seem to me
          now that Jabber is going one time too many through decoding and
          encoding. In theory, the fix is to *remove* the xml.entityEncode call
          from messages.message, and also to remove

          body = tcp.im.builtinDrivers.jabber.code.xmlAlphaDecode(body)
          body = xml.entityDecode(body)

          from handlers.message. In theory, this will not affect the rest of the
          tcp.im system because it's strictly internal to Jabber.

          The problem is the XML verbs are giving me problems. If I send a message
          from my human Jabber client like this:

          ''

          The XML comes in like this:

          <body>&apos;'</body>

          correctly, but xml.compile leaves me with

          ''

          as the contents of the body tag, which has lost information and there's
          nothing I can do about it in my code. So in reality, the fix doesn't
          work at all. I've added '' to the testing suite on my system.

          I don't think there's an easy fix for this. Either I'll need fixed xml
          verbs or I'll need to pass around the original text of the message so I
          can extract it with string operations. Either one is a serious task.

          > [*] At this point, tcp.im.server on the receiving machine responds with a "Sorry, I don't understand that." message. However, when the original sender receives this human-readable error message, there is no responder that deals with the message and it responds with the same message in turn. Henceforth, the two machines happily exchange a continuous stream of messages saying that they don't understand each other. I commented out the last line of code in tcp.im.server during testing to avoid this.

          I think we ought to add a special responder that looks for this message
          and eats it silently. I had this while testing but took it out for some
          reason a long time ago.
        • Jeremy Bowers
          ... Then I d recommend that be done. ... Radio Userland 8.0.8 on PC, communicating with itself. (I see no particular value in having two machines for Jabber
          Message 4 of 10 , Sep 22, 2002
          • 0 Attachment
            On Sun, 2002-09-22 at 16:57, Andre Radke wrote:
            > At 8:25h -0500 22. Sep 2002, Jeremy Bowers wrote:
            > >In RPI, all strings are base64 encoded before going on to the IM network.
            >
            > Based on the code at tcp.im.rpi.encode, this does not seem to be true
            > for strings contained in lists or tables though. That could be dealt
            > with by adding an optional boolean parameter to the
            > xml.coercions.frontierValueToTaggedText kernel value that would allow
            > you to turn on base64-encoding for all strings.

            Then I'd recommend that be done.

            > >There's two problems here. One is that RPI is working on my system, and
            > >not on yours, when they should be identical. This difference is a
            > >problem.
            >
            > Could you describe your test setup so that I can try to reproduce it here?

            Radio Userland 8.0.8 on PC, communicating with itself. (I see no
            particular value in having two machines for Jabber testing because as
            long as you cycle out to the server, it should all be the same. I hope
            that's not the difference.)

            There shouldn't be any OS differences, either. I don't see any recent
            changenotes in Frontier for XML verb updates... I really don't know
            where the difference is.

            Later I'll try a fresh install myself, though I know I didn't make any
            changes to the tcp.im code to make it work so that shouldn't matter...
            then, ONE of these things that "shouldn't matter" is going to...

            > The problem is that the text of that message is actually
            > user-configurable since it is stored at user.im.prefs.sorryMessage,
            > so there's currently no way for the receiving end to reliably detect
            > it. IMHO, it makes more sense to just not reply with an error message
            > at all.

            Ah yes, that's why I removed it, I think. ;-)

            And thanks for taking a look at this stuff. I appreciate it.
          • Andre Radke
            ... It still doesn t on mine. (see below) ... Heh, good point. ... Based on the code at tcp.im.rpi.encode, this does not seem to be true for strings contained
            Message 5 of 10 , Sep 22, 2002
            • 0 Attachment
              At 8:25h -0500 22. Sep 2002, Jeremy Bowers wrote:
              >On Fri, 2002-09-20 at 16:18, Andre Radke wrote:
              > > Here's the script I used for testing per Jeremy's two questions from
              >> above:
              >>
              >> tcp.im.rpi.client ("jabber", "noodles@...",
              > > "examples.msg", {"<a>&amp;</a>"})
              >[..]
              >This works on my system.

              It still doesn't on mine. (see below)

              >Also, you actually *aren't* testing encoding when you do that.

              Heh, good point.

              >In RPI, all strings are base64 encoded before going on to the IM network.

              Based on the code at tcp.im.rpi.encode, this does not seem to be true
              for strings contained in lists or tables though. That could be dealt
              with by adding an optional boolean parameter to the
              xml.coercions.frontierValueToTaggedText kernel value that would allow
              you to turn on base64-encoding for all strings.

              >There's two problems here. One is that RPI is working on my system, and
              >not on yours, when they should be identical. This difference is a
              >problem.

              Okay, I did some more testing. I did fresh installs of Frontier 9.0
              on two OS X machines and got the latest Frontier.root updates.
              Running my test script, I still get the same result as described in
              my previous message, i.e. one round of entity decoding remains to be
              done on the message body when it gets to
              tcp.im.builtinResponders.rpi.condition and therefore the RPI
              responders doesn't process the message.

              Could you describe your test setup so that I can try to reproduce it here?


              > > [*] At this point, tcp.im.server on the receiving machine responds
              >> with a "Sorry, I don't understand that." message. However, when the
              >> original sender receives this human-readable error message, there is no
              >> responder that deals with the message and it responds with the same
              >> message in turn. Henceforth, the two machines happily exchange a
              >> continuous stream of messages saying that they don't understand each
              >> other. I commented out the last line of code in tcp.im.server during
              >> testing to avoid this.
              >
              >I think we ought to add a special responder that looks for this message
              >and eats it silently.

              The problem is that the text of that message is actually
              user-configurable since it is stored at user.im.prefs.sorryMessage,
              so there's currently no way for the receiving end to reliably detect
              it. IMHO, it makes more sense to just not reply with an error message
              at all.

              -Andre



              --
              Andre Radke + mailto:lists@... + http://www.spicynoodles.net/
            • Jeremy Bowers
              ... Thanks for the info. In the meantime, lets all pretend I made a principled decision with the Jabber encoding/decoding. It has the virtue of possibly being
              Message 6 of 10 , Sep 22, 2002
              • 0 Attachment
                On Sun, 2002-09-22 at 22:05, Jake Savin wrote:
                > ps. In the longer run, there may be some things we'll want to fix in
                > the kernel's XML compiler/decompiler through the use of optional
                > parameters, but right now I'm focusing any work I do at that level on
                > fixing crashing bugs.

                Thanks for the info.

                In the meantime, lets all pretend I made a principled decision with the
                Jabber encoding/decoding. It has the virtue of possibly being the truth.
                ;-)

                At least with the current setup, A: Jabber never crashes the stream and
                B: Most people *don't* ship around crazy XML stuff like &amp;, so it
                works as correctly as possible. I think A in particular was a priority
                for me.

                I hope that works for you, Andre.
              • Jake Savin
                Hi Andre and Jeremy, I did a little digging, and it turns out that I had made a fix to tcp.rmi.encode so that strings in sub-tables would be base64-encoded,
                Message 7 of 10 , Sep 22, 2002
                • 0 Attachment
                  Hi Andre and Jeremy,

                  I did a little digging, and it turns out that I had made a fix to
                  tcp.rmi.encode so that strings in sub-tables would be base64-encoded,
                  and that this fix had been released for Radio, but not for Frontier. I
                  released the part for Frontier, so after updating Frontier.root, you
                  should see that strings inside tables will be properly encoded in RPI
                  requests.

                  In fact, this fix probably skirts the entity-encoding/decoding issue
                  altogether, since base64 strings won't include any of the problematic
                  characters that cause the problem in the first place.

                  Please let me know if this fixes the problem.

                  Thanks,
                  -Jake

                  ps. In the longer run, there may be some things we'll want to fix in
                  the kernel's XML compiler/decompiler through the use of optional
                  parameters, but right now I'm focusing any work I do at that level on
                  fixing crashing bugs.

                  On Sunday, September 22, 2002, at 02:57 PM, Andre Radke wrote:

                  >> In RPI, all strings are base64 encoded before going on to the IM
                  >> network.
                  >
                  > Based on the code at tcp.im.rpi.encode, this does not seem to be true
                  > for strings contained in lists or tables though. That could be dealt
                  > with by adding an optional boolean parameter to the
                  > xml.coercions.frontierValueToTaggedText kernel value that would allow
                  > you to turn on base64-encoding for all strings.
                  >
                  >> There's two problems here. One is that RPI is working on my system,
                  >> and
                  >> not on yours, when they should be identical. This difference is a
                  >> problem.
                  >
                  > Okay, I did some more testing. I did fresh installs of Frontier 9.0
                  > on two OS X machines and got the latest Frontier.root updates.
                  > Running my test script, I still get the same result as described in
                  > my previous message, i.e. one round of entity decoding remains to be
                  > done on the message body when it gets to
                  > tcp.im.builtinResponders.rpi.condition and therefore the RPI
                  > responders doesn't process the message.
                • Andre Radke
                  ... Right, further testing didn t turn up any significant differences between platforms or between using two machines vs. just one machine talking to itself. I
                  Message 8 of 10 , Sep 23, 2002
                  • 0 Attachment
                    At 14:39 -0500 22. Sep 2002, Jeremy Bowers wrote:
                    >On Sun, 2002-09-22 at 16:57, Andre Radke wrote:
                    > > At 8:25h -0500 22. Sep 2002, Jeremy Bowers wrote:
                    > > >There's two problems here. One is that RPI is working on my system, and
                    >> >not on yours, when they should be identical. This difference is a
                    >> >problem.
                    >>
                    >> Could you describe your test setup so that I can try to reproduce it here?
                    >
                    >Radio Userland 8.0.8 on PC, communicating with itself. (I see no
                    >particular value in having two machines for Jabber testing because as
                    >long as you cycle out to the server, it should all be the same. I hope
                    >that's not the difference.)

                    Right, further testing didn't turn up any significant differences
                    between platforms or between using two machines vs. just one machine
                    talking to itself.

                    I have found though that sending RPI messages from Radio works
                    correctly while sending RPI messages from Frontier does not,
                    regardless of whether the receiving end is running Frontier or Radio
                    in both cases.

                    Based on this observation, I think I have identified the cause for
                    the problems I encountered:

                    The versions of xml.entityEncode shipping with Radio and Frontier are
                    out of sync, specifically the code that is governed by the optional
                    flAlphaEntities flag.

                    From the Radio version of xml.entityEncode:

                    if flAlphaEntities {
                    s = string.replaceall (s, "&", "&");
                    s = string.replaceall (s, "<", "<");
                    s = string.replaceall (s, ">", ">");
                    s = string.replaceall (s, "\"", """);
                    s = string.replaceall (s, "'", "'")}

                    From the Frontier version of xml.entityEncode:

                    if flAlphaEntities {
                    s = string.replaceall (s, "<", "<");
                    s = string.replaceall (s, ">", ">");
                    s = string.replaceall (s, "&", "&");
                    s = string.replaceall (s, "\"", """)}

                    However, both apps ship with identical versions of xml.entityDecode:

                    if flAlphaEntities {
                    s = string.replaceall (s, "<", "<");
                    s = string.replaceall (s, ">", ">");
                    s = string.replaceall (s, "&", "&");
                    s = string.replaceall (s, """, "\"")}

                    In theory, the version of xml.entityEncode shipping with Radio is the
                    correct one, i.e. ampersands should be encoded before all other
                    entities.

                    If xml.entityDecode and xml.entityEncode are supposed to be inverse
                    operations of each other, it follows that xml.entityDecode needs to
                    be changed for both Radio and Frontier.

                    In practice however, I suspect that the Jabber code may not be the
                    only thing that currently relies on this encoding/decoding
                    inconsistency (bug?), so any official update of these verbs needs to
                    be carefully considered.

                    What I haven't tested is whether there is a similar asymmetry between
                    xml.compile and xml.decompile that would explain why this
                    inconsistency has not been discovered before.

                    HTH,

                    -Andre



                    --
                    Andre Radke + mailto:lists@... + http://www.spicynoodles.net/
                  Your message has been successfully submitted and would be delivered to recipients shortly.