Code that lexes differently in ES3 vs ES3.1

  • David-Sarah Hopwood
    Message 1 of 13, Feb 9, 2009
      Consider the following JavaScript source:

      [ /[/]/ /foo]/ + bar

      According to the ES3 spec, this is interpreted as:

      [ new RegExp("[") ] / new RegExp("foo]") + bar

      According to the ES3.1 draft spec, it is interpreted as:

      [ new RegExp("[\/]") / foo ] / +bar

      Apparently, Firefox and IE7 were lexing regexp literals in the way
      ES3.1 specifies. I had considered re-allowing regexp literals in
      Jacaranda 0.4, but this has scared me off doing so -- the potential
      for lexical confusion attacks is just too great.

      --
      David-Sarah Hopwood ⚥
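To make the difference between the two interpretations concrete, here is a minimal sketch of the two lexing rules. It assumes the only relevant difference is whether an unescaped `/` inside `[...]` can end the literal. The function name `scanRegExpLiteral` and its `classAware` parameter are hypothetical, flags are not consumed, and only `\n` and `\r` are treated as line terminators.

```javascript
// Sketch only: find the end of a regexp literal that starts at src[start] === '/'.
// classAware = false: ES3 rule, the first unescaped '/' ends the body.
// classAware = true:  ES3.1 draft rule, '/' may appear unescaped inside [...].
// Returns the index just past the closing '/', or -1 if the literal is unterminated.
function scanRegExpLiteral(src, start, classAware) {
  let inClass = false;
  for (let i = start + 1; i < src.length; i++) {
    const c = src[i];
    if (c === '\n' || c === '\r') return -1; // a literal cannot span lines
    if (c === '\\') {
      // Backslash sequence: the next character is part of the body.
      i++;
      if (i >= src.length || src[i] === '\n' || src[i] === '\r') return -1;
      continue;
    }
    if (classAware && c === '[') inClass = true;
    else if (classAware && c === ']') inClass = false;
    else if (c === '/' && !inClass) return i + 1;
  }
  return -1;
}

const src = "[ /[/]/ /foo]/ + bar";
const es3End = scanRegExpLiteral(src, 2, false);
const es31End = scanRegExpLiteral(src, 2, true);
console.log(src.slice(2, es3End));  // "/[/"
console.log(src.slice(2, es31End)); // "/[/]/"
```

Under the ES3 rule the first literal is `/[/`, so the following `]` closes the array; under the ES3.1 rule the literal is `/[/]/` and the array stays open.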
    • Marcel Laverdet
      Message 2 of 13, Feb 9, 2009

        From what I remember this started out as a bug in IE and then Firefox followed suit for compatibility which left the other browsers with no choice. I can't find the original bug but `/[/]/` only started parsing in FF1.5, in FF1.0 it would throw a syntax error.

        You could throw out any malformed regexp literals (any that differ between ES3 \ ES3.1) which is a fairly small subset and you would be ok.


        On Feb 9, 2009, at 9:16 AM, David-Sarah Hopwood wrote:

        Consider the following JavaScript source:

        [ /[/]/ /foo]/ + bar

        According to the ES3 spec, this is interpreted as:

        [ new RegExp("[") ] / new RegExp("foo]") + bar

        According to the ES3.1 draft spec, it is interpreted as:

        [ new RegExp("[\/]") / foo ] / +bar

        Apparently, Firefox and IE7 were lexing regexp literals in the way
        ES3.1 specifies. I had considered re-allowing regexp literals in
        Jacaranda 0.4, but this has scared me off doing so -- the potential
        for lexical confusion attacks is just too great.

        -- 
        David-Sarah Hopwood ⚥


      • Brendan Eich
        Message 3 of 13, Feb 9, 2009
          On Feb 9, 2009, at 9:42 AM, Marcel Laverdet wrote:

          > From what I remember this started out as a bug in IE and then
          > Firefox followed suit for compatibility which left the other
          > browsers with no choice.

          No, other browsers followed suit first.


          > I can't find the original bug but `/[/]/` only started parsing in
          > FF1.5, in FF1.0 it would throw a syntax error.

          https://bugzilla.mozilla.org/show_bug.cgi?id=309840

          Quoting from comment 0:

          Description From Jesse Ruderman 2005-09-23 19:33:25 PST

          Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b5) Gecko/20050923 Firefox/1.4

          Firefox gives an "unterminated character class" error on the regular
          expression /[/]/. Safari and Opera 8.5 accept it.

          I noticed this because a regular expression containing [/] appears in
          http://msdn.microsoft.com/workshop/code/browdata.js, which was mentioned in
          bug 309695 comment 9.

          /be
        • Douglas Crockford
          Message 4 of 13, Feb 9, 2009
            David-Sarah Hopwood wrote:
            > Consider the following JavaScript source:
            >
            > [ /[/]/ /foo]/ + bar
            >
            > According to the ES3 spec, this is interpreted as:
            >
            > [ new RegExp("[") ] / new RegExp("foo]") + bar
            >
            > According to the ES3.1 draft spec, it is interpreted as:
            >
            > [ new RegExp("[\/]") / foo ] / +bar
            >
            > Apparently, Firefox and IE7 were lexing regexp literals in the way
            > ES3.1 specifies. I had considered re-allowing regexp literals in
            > Jacaranda 0.4, but this has scared me off doing so -- the potential
            > for lexical confusion attacks is just too great.

            ADsafe rejects [ /[/]/ /foo]/ + bar. Just because ECMAScript says it's ok doesn't
            mean that ADsafe must. ADsafe insists that all internal / must have \.
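One way a verifier could enforce a rule of this kind is sketched below. This is a hypothetical check, not ADsafe's actual code: it tracks character classes while scanning and rejects any unescaped `/` found inside one, so every accepted literal ends at the same place under both the ES3 and ES3.1 rules. The name `scanStrictRegExpLiteral` is invented for illustration, and only `\n` and `\r` are treated as line terminators.

```javascript
// Hypothetical conservative scanner (not ADsafe's implementation).
// Accepts a regexp literal starting at src[start] === '/' only when no
// unescaped '/' occurs inside a character class. Returns the index just
// past the closing '/', or throws a SyntaxError.
function scanStrictRegExpLiteral(src, start) {
  let inClass = false;
  for (let i = start + 1; i < src.length; i++) {
    const c = src[i];
    if (c === '\n' || c === '\r') break; // a literal cannot span lines
    if (c === '\\') {
      // Backslash sequence: skip the escaped character.
      i++;
      if (src[i] === '\n' || src[i] === '\r') break;
      continue;
    }
    if (c === '[') inClass = true;
    else if (c === ']') inClass = false;
    else if (c === '/') {
      if (inClass) {
        throw new SyntaxError("unescaped '/' inside a character class at index " + i);
      }
      return i + 1;
    }
  }
  throw new SyntaxError('unterminated regexp literal');
}

console.log(scanStrictRegExpLiteral("/[\\/]/ + x", 0)); // 6

try {
  scanStrictRegExpLiteral("[ /[/]/ /foo]/ + bar", 2);
} catch (e) {
  console.log(e.message); // reports the '/' at index 4
}
```

Because the ambiguous input is rejected outright, the verifier never has to guess which of the two lexing rules the browser will apply.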
          • Mike Samuel
            Message 5 of 13, Feb 9, 2009
              2009/2/9 Douglas Crockford <douglas@...>
              >
              > David-Sarah Hopwood wrote:
              > > Consider the following JavaScript source:
              > >
              > > [ /[/]/ /foo]/ + bar
              > >
              > > According to the ES3 spec, this is interpreted as:
              > >
              > > [ new RegExp("[") ] / new RegExp("foo]") + bar
              > >
              > > According to the ES3.1 draft spec, it is interpreted as:
              > >
              > > [ new RegExp("[\/]") / foo ] / +bar
              > >
              > > Apparently, Firefox and IE7 were lexing regexp literals in the way
              > > ES3.1 specifies. I had considered re-allowing regexp literals in
              > > Jacaranda 0.4, but this has scared me off doing so -- the potential
              > > for lexical confusion attacks is just too great.
              >
              > ADsafe rejects [ /[/]/ /foo]/ + bar. Just because ECMAScript says it's ok doesn't
              > mean that ADsafe must. ADsafe insists that all internal / must have \.

              Cajita disallows regex literals, but Valija uses the ES3.1 rule for
              lexing regexes and rewrites
              [ /[/]/ /foo]/ + bar
              to
              [ (new RegExp('[\\/]', '')) / foo ] / +bar
              though I think we do something to make sure it will always use the
              builtin RegExp ctor, again using the ES3.1 semantics.

              We rely on the builtin RegExp ctor to be callable with string
              arguments without introducing a breach, / after a close bracket token
              to always be interpreted as a div operator, and our normalization for
              string escapes to produce a string literal where the browser agrees
              with us where the string ends.
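The rewriting step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Caja's actual code: the function name `toRegExpCtorCall` is hypothetical, the body is assumed to have already been lexed with the ES3.1 rule, and only backslashes and single quotes are escaped when building the string literal.

```javascript
// Sketch only (not the Valija implementation): turn a regexp literal's body
// and flags into source text for an equivalent RegExp constructor call.
function toRegExpCtorCall(body, flags) {
  if (!/^[gim]*$/.test(flags)) {
    throw new Error('unexpected regexp flags: ' + flags);
  }
  // Step 1: escape every bare '/' so the pattern reads the same under
  // either lexing rule.
  let pattern = '';
  for (let i = 0; i < body.length; i++) {
    const c = body[i];
    if (c === '\\' && i + 1 < body.length) {
      pattern += c + body[++i]; // copy an existing backslash sequence as is
    } else if (c === '/') {
      pattern += '\\/';
    } else {
      pattern += c;
    }
  }
  // Step 2: quote the pattern as a single-quoted string literal.
  const quoted = pattern.replace(/[\\']/g, '\\$&');
  return "(new RegExp('" + quoted + "', '" + flags + "'))";
}

console.log(toRegExpCtorCall('[/]', '')); // (new RegExp('[\\/]', ''))
```

The output no longer contains a regexp literal at all, so the only lexical question left for the browser is where the string literal ends.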
            • Marcel Laverdet
              Message 6 of 13, Feb 9, 2009

                My apologies.

                On Feb 9, 2009, at 10:54 AM, Brendan Eich wrote:
                No, other browsers followed suit first.

              • Brendan Eich
                Message 7 of 13, Feb 10, 2009
                  No need to apologize, and I did not aim to blame Opera or Safari in citing the record. This was not a situation where anyone fielding a browser compatible enough to get market share could break the de-facto standard that IE set in the wake of Netscape going down (Netscape 4 was based on early SpiderMonkey code, which did not allow / unescaped in a character class).

                  /be

                  On Feb 9, 2009, at 11:31 PM, Marcel Laverdet wrote:



                  My apologies.

                  On Feb 9, 2009, at 10:54 AM, Brendan Eich wrote:
                  No, other browsers followed suit first.



                • David-Sarah Hopwood
                  Message 8 of 13, Feb 10, 2009
                    Marcel Laverdet wrote:
                    >
                    > From what I remember this started out as a bug in IE and then Firefox
                    > followed suit for compatibility which left the other browsers with no
                    > choice. I can't find the original bug but `/[/]/` only started parsing
                    > in FF1.5, in FF1.0 it would throw a syntax error.
                    >
                    > You could throw out any malformed regexp literals (any that differ
                    > between ES3 \ ES3.1) which is a fairly small subset and you would be ok.

                    I could, if I knew that there were no more bugs like this. Note that
                    lexical confusion attacks of this kind can easily be turned into complete
                    breaks of a subset implementation:

                    [ /[/]/ /alert('toast')]/ + 1

                    Verifier sees valid, harmless code:
                    [ new RegExp("[") ] / new RegExp("alert('toast')]") + 1

                    Browser runs exploit code:
                    [ new RegExp("[\/]") / alert('toast') ] / +1

                    Since there's no way that I could reliably have known about the IE lexer
                    bug, it's just too risky.

                    Anyone know of other bugs where common JS implementations lex or parse
                    valid ES3 code with a different meaning than specified? (The only one
                    I can think of right now is \v in IE, but at least that doesn't result
                    in a parse with a different structure.)

                    --
                    David-Sarah Hopwood ⚥
                  • David-Sarah Hopwood
                    Message 9 of 13, Feb 10, 2009
                      Brendan Eich wrote:
                      > On Feb 9, 2009, at 9:42 AM, Marcel Laverdet wrote:
                      >
                      >> From what I remember this started out as a bug in IE and then
                      >> Firefox followed suit for compatibility which left the other
                      >> browsers with no choice.
                      >
                      > No, other browsers followed suit first.
                      >
                      >> I can't find the original bug but `/[/]/` only started parsing in
                      >> FF1.5, in FF1.0 it would throw a syntax error.
                      >
                      > https://bugzilla.mozilla.org/show_bug.cgi?id=309840

                      <https://bugzilla.mozilla.org/show_bug.cgi?id=309840#c12>

                      # This fixes a highly dup'ed IE compatibility bug. It's an extension
                      # to ECMA syntax that's allowed by Section 16. I'm approving it so
                      # that we can get it into 1.8b5 / Firefox 1.5b2.

                      As the example in my first post demonstrated, it is absolutely not
                      correct that this was an allowed Section 16 extension. That section
                      allows lexical extensions only if a program does not match the lexical
                      grammar in the spec (in this case using the ES3 definition of
                      RegularExpressionLiteral), and allows regexp syntax extensions only
                      if the resulting RegularExpressionBody does not match Pattern.

                      In fact this just makes me even more worried: it seems that Section 16
                      is being misinterpreted in a way that prevents independently developed
                      parsers, implemented strictly from the spec, from being able to match the
                      parsing behaviour of browsers on syntactically valid ES3 code. Is this
                      just a one-off mistake, or is Section 16 consistently being interpreted
                      too loosely?

                      --
                      David-Sarah Hopwood ⚥
                    • David-Sarah Hopwood
                      Message 10 of 13, Feb 10, 2009
                        Douglas Crockford wrote:
                        > David-Sarah Hopwood wrote:
                        >> Consider the following JavaScript source:
                        >>
                        >> [ /[/]/ /foo]/ + bar
                        >>
                        >> According to the ES3 spec, this is interpreted as:
                        >>
                        >> [ new RegExp("[") ] / new RegExp("foo]") + bar
                        >>
                        >> According to the ES3.1 draft spec, it is interpreted as:
                        >>
                        >> [ new RegExp("[\/]") / foo ] / +bar
                        >>
                        >> Apparently, Firefox and IE7 were lexing regexp literals in the way
                        >> ES3.1 specifies. I had considered re-allowing regexp literals in
                        >> Jacaranda 0.4, but this has scared me off doing so -- the potential
                        >> for lexical confusion attacks is just too great.
                        >
                        > ADsafe rejects [ /[/]/ /foo]/ + bar. Just because ECMAScript says it's ok doesn't
                        > mean that ADsafe must. ADsafe insists that all internal / must have \.

                        I'm confused -- how does it know that the middle '/' in "/[/]/" is
                        "internal"? Is it lexing according to the intersection of Pattern
                        from section 15.10.1, and RegularExpressionBody?

                        --
                        David-Sarah Hopwood ⚥
                      • Brendan Eich
                        Message 11 of 13, Feb 10, 2009
                          On Feb 10, 2009, at 6:34 AM, David-Sarah Hopwood wrote:
                          > Brendan Eich wrote:
                          > > On Feb 9, 2009, at 9:42 AM, Marcel Laverdet wrote:
                          > >
                          > >> From what I remember this started out as a bug in IE and then
                          > >> Firefox followed suit for compatibility which left the other
                          > >> browsers with no choice.
                          > >
                          > > No, other browsers followed suit first.
                          > >
                          > >> I can't find the original bug but `/[/]/` only started parsing in
                          > >> FF1.5, in FF1.0 it would throw a syntax error.
                          > >
                          > > https://bugzilla.mozilla.org/show_bug.cgi?id=309840
                          >
                          > <https://bugzilla.mozilla.org/show_bug.cgi?id=309840#c12>
                          >
                          > # This fixes a highly dup'ed IE compatibility bug. It's an extension
                          > # to ECMA syntax that's allowed by Section 16. I'm approving it so
                          > # that we can get it into 1.8b5 / Firefox 1.5b2.
                          >
                          > As the example in my first post demonstrated, it is absolutely not
                          > correct that this was an allowed Section 16 extension.
                          >
                          You're right, but so what? The IE bug and monopoly combined to create
                          a de-facto standard. Appealing to the de-jure standard does you no
                          good, and correcting my 2005-era misunderstanding (you've corrected it
                          more recently in es-discuss) does not change the de-facto standard
                          trumping the de-jure one.

                          > In fact this just makes me even more worried: it seems that Section 16
                          > is being misinterpreted in a way that prevents independently developed
                          > parsers, implemented strictly from the spec, from being able to match the
                          > parsing behaviour of browsers on syntactically valid ES3 code. Is this
                          > just a one-off mistake, or is Section 16 consistently being interpreted
                          > too loosely?
                          >

                          This has nothing to do with Section 16 or my former misunderstanding
                          of it, and everything to do with IE forcing a de-facto standard. As
                          far as I know, no one at Microsoft added the bug allowing unescaped /
                          in a character class by arguing based on a misinterpretation of
                          Section 16. I think you are barking up the wrong tree.

                          /be
                          >
                          >
                          > --
                          > David-Sarah Hopwood ⚥
                          >
                          >
                          >
                        • Mike Samuel
                          Message 12 of 13, Feb 10, 2009
                            2009/2/10 David-Sarah Hopwood <david.hopwood@...>:
                            > Marcel Laverdet wrote:
                            >>
                            >> From what I remember this started out as a bug in IE and then Firefox
                            >> followed suit for compatibility which left the other browsers with no
                            >> choice. I can't find the original bug but `/[/]/` only started parsing
                            >> in FF1.5, in FF1.0 it would throw a syntax error.
                            >>
                            >> You could throw out any malformed regexp literals (any that differ
                            >> between ES3 \ ES3.1) which is a fairly small subset and you would be ok.
                            >
                            > I could, if I knew that there were no more bugs like this. Note that
                            > lexical confusion attacks of this kind can easily be turned into complete
                            > breaks of a subset implementation:
                            >
                            > [ /[/]/ /alert('toast')]/ + 1
                            >
                            > Verifier sees valid, harmless code:
                            > [ new RegExp("[") ] / new RegExp("alert('toast')]") + 1
                            >
                            > Browser runs exploit code:
                            > [ new RegExp("[\/]") / alert('toast') ] / +1
                            >
                            > Since there's no way that I could reliably have known about the IE lexer
                            > bug, it's just too risky.
                            >
                            > Anyone know of other bugs where common JS implementations lex or parse
                            > valid ES3 code with a different meaning than specified? (The only one
                            > I can think of right now is \v in IE, but at least that doesn't result
                            > in a parse with a different structure.)

                            Plenty. But I suspect you know of them. There are conditional
                            compilation comments /* @cc_on */,
                            and there's the newlines in block comments thing return /*
                            */ foo();
                            and there are format control characters between pairs like */ and \".
                            There are other tricks you can do with \u escapes in identifiers and NUL
                            and BOM characters in source.
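For readers unfamiliar with the block-comment case: per the ECMAScript spec, a multi-line comment that contains a line terminator counts as a line terminator for automatic semicolon insertion, so `return` followed by such a comment returns nothing. A small sketch of what a spec-conforming engine does (the function names are made up for illustration):

```javascript
// The comment below contains a newline, so a conforming engine treats it as a
// line terminator and inserts a semicolon right after `return`.
function viaMultiLineComment() {
  return /*
  */ 42;
}

// Same shape, but the comment stays on one line, so no semicolon is inserted.
function viaSingleLineComment() {
  return /* no newline here */ 42;
}

console.log(viaMultiLineComment());  // undefined
console.log(viaSingleLineComment()); // 42
```

An engine that ignores the newline inside the comment would return 42 from both functions, which is the kind of disagreement a verifier has to worry about.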



                            > --
                            > David-Sarah Hopwood ⚥
                          • Brendan Eich
                            Message 13 of 13, Feb 10, 2009
                              On Feb 10, 2009, at 6:36 PM, Mike Samuel wrote:
                              > and there's the newlines in block comments thing return /*
                              > */ foo();
                              >

                              Fixed in Firefox 3.1 beta nightlies:

                              https://bugzilla.mozilla.org/show_bug.cgi?id=475834

                              We could push the fix back into a 3.0.x maintenance release if it
                              would help. Anyone with https://bugzilla.mozilla.org editbugs
                              permission who wants this, feel free to nominate the patch for approval.

                              /be