Re: Comparing ASTs of the two Java1.5 grammars

  • atripp54321
    Message 1 of 6 , Oct 31, 2004
      Gack!

      Yahoo took away the indentation on my AST trees.
      If you're reading the parent post at groups.yahoo.com
      and you don't see indentation, just click "reply"
      and you'll see the proper indentation.

      Andy
    • Michael Studman
      Message 2 of 6 , Nov 1, 2004
        Hi Andy.

        Thanks for giving my grammar a test drive!

        It seems strange that annotations aren't stored in the AST - that was
        not my intention at all (and I think it's very important that they are
        there). I will check out why that is happening and get back to the
        group.

        2a) also seems to be a bug. Again, I'll investigate and get back to you.

        Regards,
        Michael

        > -----Original Message-----
        > From: atripp54321 [mailto:atripp@...]
        > Sent: 25 October 2004 20:26
        > To: antlr-interest@yahoogroups.com
        > Subject: [antlr-interest] Comparing ASTs of the two Java1.5 grammars
        >
        >
        >
        > I went to update my JavaEmitter code for the new JDK1.5 grammar,
        > and I see we actually have two JDK1.5 grammars listed at antlr.org:
        > one by Michael Studman and another by Michael Stahl.
        > My code depends on the "shape" of the Java AST produced
        > by the grammar, and I'm sure eventually one of these two will
        > need to be chosen to be included with ANTLR as the "official" java.g.
        >
        > So I tried out these two grammars on the
        > various new 1.5 features, and here are my notes on
        > the ASTs that each of these grammars produces.
        > For reference, here's the Sun proposed Java 1.5 grammar:
        > http://java.sun.com/docs/books/jls/jls-proposed-changes.html
        >
        > 1) Annotations
        > Neither grammar stores annotations in the AST.
        > This seems right to me, as we don't store comments in the AST either.
        > Anyone who's annoyed that comments are not stored in the AST
        > will now be even more annoyed :)
        >
        > 2) Generics:
        > Given this code:
        > public Vector(Collection<? extends E> c) {
        >
        > Studman's produces this:
        > TYPE
        > IDENT Collection
        > TYPE_ARGUMENTS
        > TYPE_ARGUMENT
        > WILDCARD_TYPE
        > TYPE_UPPER_BOUNDS
        > IDENT E
        >
        > And Stahl's produces this:
        > TYPE
        > IDENT Collection
        > TYPE_ARGS
        > WILDCARD
        > LITERAL_extends
        > TYPE
        > IDENT E
        > TYPE_ARGS
        >
        > a) One places the TYPE subtree as a child of IDENT, the other as a sibling.
        > I prefer Stahl's...seems strange for IDENT to have a child.
        > b) Studman's has the extra TYPE_ARGUMENT node, which I prefer.
        > c) The two trees are different under WILDCARD_TYPE. I prefer Studman's
        > but I'd rename "TYPE_UPPER_BOUNDS" to "TYPE_EXTENDS" (and
        > "TYPE_LOWER_BOUNDS" to "TYPE_SUPER").
        > d) That extra TYPE_ARGS at the end of Stahl's shouldn't be there (I think)
        >
        > 2) For-each loop:
        > Given this code:
        > for (Integer i : integers) {
        > }
        >
        > Studman's produces this:
        > LITERAL_for
        > FOR_EACH_CLAUSE
        > PARAMETER_DEF
        > MODIFIERS
        > TYPE
        > IDENT Integer
        > IDENT i
        > EXPR
        > IDENT integers
        > SLIST
        >
        > And Stahl's produces this:
        > LITERAL_for
        > PARAMETER_DEF
        > MODIFIERS
        > TYPE
        > IDENT Integer
        > TYPE_ARGS
        > IDENT i
        > EXPR
        > IDENT integers
        > SLIST
        >
        > I prefer Studman's with the "FOR_EACH_CLAUSE" node which parallels the
        > "FOR_INIT", "FOR_CONDITION", and "FOR_ITERATOR" nodes in the old "for" syntax.
        >
        > 3) Enums:
        > Given this code:
        > enum Rank2 implements whatever {ONE, TWO, THREE}
        > Studman's produces this:
        > ENUM_DEF
        > MODIFIERS
        > IDENT Rank2
        > IMPLEMENTS_CLAUSE
        > IDENT whatever
        > OBJBLOCK
        > ENUM_CONSTANT_DEF
        > ANNOTATIONS
        > IDENT ONE
        > ENUM_CONSTANT_DEF
        > ANNOTATIONS
        > IDENT TWO
        > ENUM_CONSTANT_DEF
        > ANNOTATIONS
        > IDENT THREE
        >
        > Stahl's failed with "unexpected token" exception.
        >
        > Given a full enum definition, Studman's produced an AST that's identical
        > to a class definition, but with ENUM_DEF in place of CLASS_DEF.
        > Stahl's failed on this one too.
        >
        > 4) Varargs:
        > Given this code:
        > void test(int i, String... strings)
        >
        > Studman's produces this:
        > PARAMETERS
        > PARAMETER_DEF
        > MODIFIERS
        > TYPE
        > LITERAL_int
        > IDENT i
        > VARIABLE_PARAMETER_DEF
        > MODIFIERS
        > TYPE
        > IDENT String
        > IDENT strings
        >
        > And Stahl's produces this:
        > PARAMETERS
        > PARAMETER_DEF
        > MODIFIERS
        > TYPE
        > LITERAL_int
        > IDENT i
        > PARAMETER_DEF
        > MODIFIERS
        > TYPE
        > IDENT String
        > TYPE_ARGS
        > ELLIPSIS
        > IDENT strings
        >
        > I prefer Studman's AST with the explicit VARIABLE_PARAMETER_DEF node.
        >
        > 5) Static imports:
        > Given this code:
        > import static java.lang.Math.PI;
        >
        > Studman's produces this:
        > STATIC_IMPORT
        > DOT
        > DOT
        > DOT
        > IDENT java
        > IDENT lang
        > IDENT Math
        > IDENT PI
        >
        > And Stahl's produces this:
        > IMPORT
        > LITERAL_static
        > DOT
        > DOT
        > DOT
        > IDENT java
        > IDENT lang
        > IDENT Math
        > IDENT PI
        >
        > I prefer Studman's STATIC_IMPORT. The issue here is whether a "static import"
        > is just an "import" that happens to have a "static" modifier
        > (as when a variable is static),
        > or whether it's a new type of thing (in the way that a "static block"
        > differs from a regular block).
        >
        > Summary:
        > Given that these two both correctly parse Java 1.5 code (which they seem
        > to except for the enum problem noted above), choosing one of these to
        > be the "official" java.g comes down to which produces a "better" AST.
        > I've listed the differences and it looks to me like Studman's ASTs
        > are more consistent with the ASTs we get today.
        >
        > And of course, some guru should look closely at the grammar to make
        > sure that it matches the "official" grammar in the JLS, add comments as
        > needed, make sure token names are consistent, etc.
        >
        > Andy
      • Michael Studman
        Message 3 of 6 , Nov 2, 2004
          Hi Andy.

          I've checked my grammar and as far as I can tell annotations are
          included in the AST.

          In most places ANNOTATION nodes are placed under the MODIFIERS node
          (since in Java 5 annotations are considered a special type of
          modifier). For package definitions I use an additional node called
          ANNOTATIONS to contain ANNOTATIONs as other types of modifiers can't
          occur at this point.

          Please let me know if we're seeing the same tree!

          Michael.

        • Michael Stahl
          Message 4 of 6 , Nov 9, 2004
            On Mon, 25 Oct 2004 20:25:45 +0000, atripp54321 wrote:
            > I went to update my JavaEmitter code for the new JDK1.5 grammar,
            > and I see we actually have two JDK1.5 grammars listed at antlr.org:
            > one by Michael Studman and another by Michael Stahl.
            > My code depends on the "shape" of the Java AST produced
            > by the grammar, and I'm sure eventually one of these two will
            > need to be chosen to be included with ANTLR as the "official" java.g.
            >
            > So I tried out these two grammars on the
            > various new 1.5 features, and here are my notes on
            > the ASTs that each of these grammars produces.
            > For reference, here's the Sun proposed Java 1.5 grammar:
            > http://java.sun.com/docs/books/jls/jls-proposed-changes.html
            >
            > 1) Annotations
            > Neither grammar stores annotations in the AST.
            > This seems right to me, as we don't store comments in the AST either.
            > Anyone who's annoyed that comments are not stored in the AST
            > will now be even more annoyed :)

            hm, i believe my grammar should not throw annotations away ever.
            that would be a bug.
            do you have a testcase?

            > 2) Generics:
            > Given this code:
            > public Vector(Collection<? extends E> c) {
            >
            > Studman's produces this:
            > TYPE
            > IDENT Collection
            > TYPE_ARGUMENTS
            > TYPE_ARGUMENT
            > WILDCARD_TYPE
            > TYPE_UPPER_BOUNDS
            > IDENT E
            >
            > And Stahl's produces this:
            > TYPE
            > IDENT Collection
            > TYPE_ARGS
            > WILDCARD
            > LITERAL_extends
            > TYPE
            > IDENT E
            > TYPE_ARGS
            >
            > a) One places the TYPE subtree as a child of IDENT, the other as a sibling.
            > I prefer Stahl's...seems strange for IDENT to have a child.
            > b) Studman's has the extra TYPE_ARGUMENT node, which I prefer.

            mine would have a TYPE if the argument were not a WILDCARD,
            maybe i should have called it WILDCARD_ARG...
            i would say that the extra TYPE_ARGUMENT is superfluous in this case,
            since you can only have exactly one TYPE or exactly one WILDCARD
            within it anyway.

            > c) The two trees are different under WILDCARD_TYPE. I prefer Studman's
            > but I'd rename "TYPE_UPPER_BOUNDS" to "TYPE_EXTENDS" (and
            > "TYPE_LOWER_BOUNDS" to "TYPE_SUPER").
            > d) That extra TYPE_ARGS at the end of Stahl's shouldn't be there (I think)

            that's not a bug, that's a feature :)
            my TYPE nodes always come with a TYPE_ARGS nested within, even if
            there aren't any type args. i thought it makes more sense this way,
            it is similar to e.g. MODIFIERS.
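
The always-present TYPE_ARGS node described above can be sketched as follows. This is a minimal illustration, not code from either grammar: `TypeNode` and `TypeArgsDemo` are hypothetical classes (they are not ANTLR's AST interface), and the exact tree shape is assumed from the listings quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST node for illustration; not ANTLR's AST interface.
final class TypeNode {
    final String name;
    final List<TypeNode> children = new ArrayList<>();

    TypeNode(String name, TypeNode... kids) {
        this.name = name;
        for (TypeNode kid : kids) {
            children.add(kid);
        }
    }

    // Returns the first direct child with the given name, or null if absent.
    TypeNode child(String childName) {
        for (TypeNode c : children) {
            if (c.name.equals(childName)) {
                return c;
            }
        }
        return null;
    }
}

final class TypeArgsDemo {
    // Assumes every TYPE node carries a TYPE_ARGS child (possibly empty),
    // so the walker needs no null check or special case for plain types.
    static int typeArgCount(TypeNode type) {
        return type.child("TYPE_ARGS").children.size();
    }

    public static void main(String[] args) {
        TypeNode plain = new TypeNode("TYPE",
                new TypeNode("IDENT"), new TypeNode("TYPE_ARGS"));
        TypeNode generic = new TypeNode("TYPE",
                new TypeNode("IDENT"),
                new TypeNode("TYPE_ARGS", new TypeNode("WILDCARD")));
        System.out.println(typeArgCount(plain));
        System.out.println(typeArgCount(generic));
    }
}
```

The trade-off is the one debated in the thread: an empty node costs a little tree size but lets every consumer treat TYPE uniformly, much as an empty MODIFIERS node does.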

            > 2) For-each loop:
            > Given this code:
            > for (Integer i : integers) {
            > }
            >
            > Studman's produces this:
            > LITERAL_for
            > FOR_EACH_CLAUSE
            > PARAMETER_DEF
            > MODIFIERS
            > TYPE
            > IDENT Integer
            > IDENT i
            > EXPR
            > IDENT integers
            > SLIST
            >
            > And Stahl's produces this:
            > LITERAL_for
            > PARAMETER_DEF
            > MODIFIERS
            > TYPE
            > IDENT Integer
            > TYPE_ARGS
            > IDENT i
            > EXPR
            > IDENT integers
            > SLIST
            >
            > I prefer Studman's with the "FOR_EACH_CLAUSE" node which parallels the
            > "FOR_INIT", "FOR_CONDITION", and "FOR_ITERATOR" nodes in the old "for" syntax.

            oh, i just noticed that i have forgotten this.
            my whitespace-preserving parser puts an ENHANCED_FOR there, right
            where the FOR_EACH_CLAUSE goes, but the one i put up on antlr.org
            does not.

            > 3) Enums:
            > Given this code:
            > enum Rank2 implements whatever {ONE, TWO, THREE}
            > Studman's produces this:
            > ENUM_DEF
            > MODIFIERS
            > IDENT Rank2
            > IMPLEMENTS_CLAUSE
            > IDENT whatever
            > OBJBLOCK
            > ENUM_CONSTANT_DEF
            > ANNOTATIONS
            > IDENT ONE
            > ENUM_CONSTANT_DEF
            > ANNOTATIONS
            > IDENT TWO
            > ENUM_CONSTANT_DEF
            > ANNOTATIONS
            > IDENT THREE
            >
            > Stahl's failed with "unexpected token" exception.
            >
            > Given a full enum definition, Studman's produced an AST that's identical
            > to a class definition, but with ENUM_DEF in place of CLASS_DEF.
            > Stahl's failed on this one too.

            oh, that would be because you forgot to turn on the enum keyword
            in the lexer. it is off by default, as it is not backwards compatible
            with java 1.4 code. just call the enableEnum() method of the lexer
            and try again.
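
The opt-in keyword behaviour described above can be illustrated with a toy lexer. This is only a sketch under stated assumptions: `ToyLexer` and the token name `LITERAL_enum` are hypothetical, and only the method name `enableEnum()` comes from the message. It is not the lexer generated from Stahl's grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer; only the enableEnum() name comes from the message above.
final class ToyLexer {
    // Off by default so Java 1.4 code that uses "enum" as an identifier still lexes.
    private boolean enumEnabled = false;

    void enableEnum() {
        enumEnabled = true;
    }

    // Classifies whitespace-separated words; a real lexer does far more.
    List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String word : source.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (enumEnabled && word.equals("enum")) {
                tokens.add("LITERAL_enum"); // hypothetical token name
            } else {
                tokens.add("IDENT " + word);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        ToyLexer legacy = new ToyLexer();
        System.out.println(legacy.tokenize("enum Rank2"));

        ToyLexer java5 = new ToyLexer();
        java5.enableEnum();
        System.out.println(java5.tokenize("enum Rank2"));
    }
}
```

With the flag off, `enum` is just another identifier, which would explain an "unexpected token" error when the parser then meets an enum declaration.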

            > 4) Varargs:
            > Given this code:
            > void test(int i, String... strings)
            >
            > Studman's produces this:
            > PARAMETERS
            > PARAMETER_DEF
            > MODIFIERS
            > TYPE
            > LITERAL_int
            > IDENT i
            > VARIABLE_PARAMETER_DEF
            > MODIFIERS
            > TYPE
            > IDENT String
            > IDENT strings
            >
            > And Stahl's produces this:
            > PARAMETERS
            > PARAMETER_DEF
            > MODIFIERS
            > TYPE
            > LITERAL_int
            > IDENT i
            > PARAMETER_DEF
            > MODIFIERS
            > TYPE
            > IDENT String
            > TYPE_ARGS
            > ELLIPSIS
            > IDENT strings
            >
            > I prefer Studman's AST with the explicit VARIABLE_PARAMETER_DEF node.

            hm... i think my ELLIPSIS node there sucks, no idea why i put it
            there :)

            > 5) Static imports:
            > Given this code:
            > import static java.lang.Math.PI;
            >
            > Studman's produces this:
            > STATIC_IMPORT
            > DOT
            > DOT
            > DOT
            > IDENT java
            > IDENT lang
            > IDENT Math
            > IDENT PI
            >
            > And Stahl's produces this:
            > IMPORT
            > LITERAL_static
            > DOT
            > DOT
            > DOT
            > IDENT java
            > IDENT lang
            > IDENT Math
            > IDENT PI
            >
            > I prefer Studman's STATIC_IMPORT. The issue here is whether a "static import"
            > is just an "import" that happens to have a "static" modifier
            > (as when a variable is static),
            > or whether it's a new type of thing (in the way that a "static block"
            > differs from a regular block).

            hm, that's what i was asking myself...
            will java 1.6 have a "import private", to do away with that
            pesky information hiding?
            or maybe "import final", for when you're _really_ sure you need
            something? "import volatile" when you're not so sure?
            oh, sorry i am blathering nonsense.
            of course, there will be no java 1.6, they'll call it java 6.0 instead.
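
Since the archive stripped the indentation from the tree listings, here is a sketch of the shape the flattened DOT listing for java.lang.Math.PI is consistent with: a left-nested tree printed in preorder. This is an assumption about the tree shape, not output from either grammar, and `DotNode` is a hypothetical class rather than ANTLR's AST type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration; not ANTLR's AST classes.
final class DotNode {
    final String label;
    final DotNode left;
    final DotNode right;

    DotNode(String label, DotNode left, DotNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    static DotNode ident(String name) {
        return new DotNode("IDENT " + name, null, null);
    }

    // Builds DOT(DOT(DOT(java, lang), Math), PI) from "java.lang.Math.PI".
    static DotNode fromQualifiedName(String qualifiedName) {
        String[] parts = qualifiedName.split("\\.");
        DotNode tree = ident(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            tree = new DotNode("DOT", tree, ident(parts[i]));
        }
        return tree;
    }

    // Collects labels in preorder, indenting two spaces per depth level.
    static void preorder(DotNode node, int depth, List<String> out) {
        if (node == null) {
            return;
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        out.add(line.append(node.label).toString());
        preorder(node.left, depth + 1, out);
        preorder(node.right, depth + 1, out);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        preorder(fromQualifiedName("java.lang.Math.PI"), 0, lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Ignoring indentation, the preorder labels come out in the same order as the quoted listing: three DOT nodes, then java, lang, Math, PI.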

            > Summary:
            > Given that these two both correctly parse Java 1.5 code (which they seem
            > to except for the enum problem noted above), choosing one of these to
            > be the "official" java.g comes down to which produces a "better" AST.
            > I've listed the differences and it looks to me like Studman's ASTs
            > are more consistent with the ASTs we get today.
            >
            > And of course, some guru should look closely at the grammar to make
            > sure that it matches the "official" grammar in the JLS, add comments as
            > needed, make sure token names are consistent, etc.

            i have already checked that my grammar matches the p-f-d (or was it
            f-p-d?) of the relevant jsrs that were published in july/august
            iirc. excepting some things that could be done in the parser, but
            which are better checked in a semantic pass imho.
            i hope they haven't changed the syntax yet again since then...

            thanks for looking at things :)

            michael stahl