Re: Tokens and context
- One minor addendum to Monty's comments: also look at the
"qualifiedID" rule in antlr.g.
The one problem with the "qualifiedID" type of solution is that it
ignores whitespace: it cannot tell a.b.c.d from a . b.c.d, even though
the two are not the same. For your
problem, you may prefer to have a "STAR_IDENTIFIER" in addition to
STAR and IDENTIFIER: "*" followed by whitespace is a "STAR", while
"*"(<identifier>)? or <identifier>("*"(<identifier2>)?)+ is a
STAR_IDENTIFIER. ANTLR lexers can make the distinction, and probably
only two lexer rules are affected.
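To make the STAR / STAR_IDENTIFIER split concrete, here is a minimal hand-written sketch in Python rather than ANTLR grammar syntax. Only STAR, IDENTIFIER and STAR_IDENTIFIER come from the description above; the WS and PUNCT token names, the identifier pattern, and the tokenize helper are assumptions made for illustration.

```python
import re

# Assumed identifier shape; the real language may allow other characters.
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

# Order matters: Python tries the alternatives left to right.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    # "*" followed by whitespace is a plain STAR.
    ("STAR", r"\*(?=\s)"),
    # "*"(<identifier>)?  or  <identifier>("*"(<identifier2>)?)+
    ("STAR_IDENTIFIER", rf"\*(?:{IDENT})?|{IDENT}(?:\*(?:{IDENT})?)+"),
    ("IDENTIFIER", IDENT),
    ("PUNCT", r"[.();]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token type, text) pairs, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With these rules, tokenize("S* x") and tokenize("S * x") give different token streams, which is the whitespace sensitivity that a parser-level rule cannot express.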
- Thanks so much for sending this link along. That chapter rocks (I loved the maze metaphor). For the time being, I'm going to stick with using "any" as a token (I'm quite flexible in the design of my language).
If I have cycles later to revisit this problem, I'll see if I can figure out how to deal with "*" in a cleaner fashion. Using a TokenStreamFilter just might be the ticket as well; I can replace earlier "*"s with the correct tokens.
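The filter idea can be sketched without any ANTLR classes as a plain Python generator that sits between lexer and parser. This is only an illustration under stated assumptions: the lexer is assumed to emit every "*" as a context-free STAR token, the output token names ANY and NAME_WILDCARD are hypothetical, and the rule "a leading star is the modifier wildcard" is a deliberately crude stand-in for whatever context the real language provides.

```python
def retag_stars(tokens):
    """Yield (type, text) pairs, replacing each STAR by a context-specific type.

    Assumed rule: a STAR that is the very first token stands in the
    accessibility-modifier slot and becomes ANY; any later STAR is taken
    to be part of a name and becomes NAME_WILDCARD.
    """
    seen_any_token = False
    for kind, text in tokens:
        if kind == "STAR":
            kind = "NAME_WILDCARD" if seen_any_token else "ANY"
        seen_any_token = True
        yield kind, text
```

The parser then never sees a bare STAR, so its grammar needs no predicates for this particular ambiguity; the cost is that the context rule lives outside the grammar.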
From: mzukowski@... [mailto:mzukowski@...]
Sent: Mon 7/15/2002 5:16 PM
Subject: RE: [antlr-interest] Tokens and context
As with everything else in antlr, think about how you would do this by hand,
especially after reading Ter's chapter on "Building Translators by Hand".
Antlr doesn't have a good way of wildcarding things like that. It can deal
with ambiguous keywords but it takes some work, mostly with syntactic
predicates. See http://www.jguru.com/faq/view.jsp?EID=140.
It will mostly come down to dealing with the ambiguities. '*' can't morph
into the appropriate token by context; antlr just can't deal with it like
that. But you can put in syntactic predicates to eliminate the ambiguities.
You'll have to hoist them by hand though.
If you have enough context you could write a filter that changes '*' into
the right thing via a TokenStreamFilter. See
How complete is your language and what kind of performance do you need? For
small examples that will be typed into a search panel as given you could
iterate through parsing with '*' as identifier, modifier, expression, etc.
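For short queries, the brute-force approach just described might look like the following Python sketch. Everything here is an assumption for illustration: the token type names, the two candidate interpretations, and especially the accepts function, which is a toy stand-in for a real parser and only recognizes the shape MODIFIER type name "(" ")" ";".

```python
import re
from itertools import product

# Hypothetical interpretations to try for each "*".
INTERPRETATIONS = ("MODIFIER", "IDENTIFIER")

# Toy grammar over token types: modifier, return type, dotted name, "()", ";".
SHAPE = re.compile(r"MODIFIER IDENTIFIER IDENTIFIER( DOT IDENTIFIER)* LPAREN RPAREN SEMI")


def accepts(tokens):
    """Stand-in for a real parse attempt: True if the token types fit SHAPE."""
    return SHAPE.fullmatch(" ".join(kind for kind, _ in tokens)) is not None


def readings(tokens):
    """Yield every way of assigning an interpretation to each STAR token."""
    star_positions = [i for i, (kind, _) in enumerate(tokens) if kind == "STAR"]
    for choice in product(INTERPRETATIONS, repeat=len(star_positions)):
        candidate = list(tokens)
        for position, kind in zip(star_positions, choice):
            candidate[position] = (kind, "*")
        yield candidate


def valid_readings(tokens):
    """Return the readings that the (toy) parser accepts."""
    return [candidate for candidate in readings(tokens) if accepts(candidate)]
```

The number of readings grows exponentially with the number of stars, so this only suits the small search-panel inputs mentioned above.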
> -----Original Message-----
> From: John Lam [mailto:jlam@...]
> Sent: Monday, July 15, 2002 12:16 PM
> To: Antlrfirstname.lastname@example.org
> Subject: [antlr-interest] Tokens and context
> I've been scratching my head over this one. Here's the
> general problem:
> I want to be able to use "*" as a wildcarding character. The
> problem is
> that it can be treated as either a token or as part of an identifier.
> public void System.Foo();
> I would also like to find this method using the following expressions:
> 1. * void System.Foo();
> 2. public void *();
> 3. public void System.*();
> 4. public void S*.*();
> The problem is that * is treated as a token in case 1 and as part of the
> identifier in cases 2-4.
> Is there a general solution to this problem? Currently I'm using a
> really crufty hack that involves creating a new token ("any") to
> represent * in the accessibility modifiers.