Re: [json] Stoppable SAX-like interface for streaming input of JSON text

  • Fang Yidong
    Message 1 of 12 , Feb 5, 2009
      On Wed, Feb 4, 2009 at 11:28 PM, Fang Yidong <fangyidong@yahoo. com.cn> wrote:

      > > Well, if I am right, the parser in your example is essentially a lexer, with slightly higher abstraction.

      > Not really. You may want to read a bit on Stax API for Java: I assume
      > you are familiar with SAX API.
      > Both operate at roughly same level of abstraction, and handle parser
      > tasks of ensuring proper structure of the data format regarding
      > nesting (ordering of parsed tokens).


      I did go through JSR 173 before comparing. I mean, the StAX parser in your example is similar to a lexer such as org.json.simple.parser.Yylex:

      Yylex lexer = new Yylex(in);
      Yytoken token;
      while((token = lexer.yylex()) != null){
      ...
      }

      But your StAX parser keeps state, so the user can check whether the current token is a field name, while the lexer does not. That is why I called it a higher abstraction (over a lexer). But apart from the field name, the other tokens (start/end of an object, start/end of an array) are similar to the ones that a lexer returns.
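
To illustrate the difference, here is a minimal sketch of the extra state a pull parser keeps on top of a lexer. It is not code from JSON.simple or from any StAX-like library: `FieldNameTagger`, the label strings, and the pre-split token input are all hypothetical, and no error recovery is attempted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: turns raw lexer tokens into labelled events.
class FieldNameTagger {
    // A string token in key position becomes FIELD_NAME, otherwise VALUE.
    static List<String> tag(String... rawTokens) {
        Deque<Boolean> inObject = new ArrayDeque<>(); // one entry per open container
        boolean expectKey = false;                    // true when the next string is a key
        List<String> out = new ArrayList<>();
        for (String t : rawTokens) {
            switch (t) {
                case "{":
                    inObject.push(true);
                    expectKey = true;
                    out.add("START_OBJECT");
                    break;
                case "[":
                    inObject.push(false);
                    expectKey = false;
                    out.add("START_ARRAY");
                    break;
                case "}":
                case "]":
                    // The stack, not the token, says which container is closing.
                    out.add(inObject.pop() ? "END_OBJECT" : "END_ARRAY");
                    expectKey = false;
                    break;
                case ":":
                    expectKey = false;
                    break;
                case ",":
                    // After a comma, a key follows only if we are inside an object.
                    expectKey = !inObject.isEmpty() && inObject.peek();
                    break;
                default:
                    out.add(expectKey ? "FIELD_NAME(" + t + ")" : "VALUE(" + t + ")");
            }
        }
        return out;
    }
}
```

A plain lexer would report `id` and `7` in `{ id : 7 }` as the same kind of token; the small stack above is what lets the caller tell them apart.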

      Although it's quite low level, I agree it's good to standardize it and StAX has its advantages in some applications.

      > > It's true that it's convenient to control in simple case. But in a slightly more complex scenario, such as retrieving data in some desired location (for example,
      > > '/store/book[1]/title' in XPath expression), I don't think the code using a SAX(-like) parser is much more complex than using a StAX(-like) parser.

      > That is debatable (I think event-handling approach is inherently less
      > intuitive personally, others disagree).
      > Path expressions can obviously run on either push or pull mode, so I
      > would agree in that convenience factor is not as big as when comparing
      > to higher abstractions (tree model, data binding).

      > But I was just pointing out that the example does not show much benefits.
      > So it would be good to showcase something where it does actually help?

      > Or perhaps it's just that json.simple exposes push interface and this
      > is an incremental improvement to the current way of doing this?



      > > Besides easier to pipeline, a SAX(-like) parser requires smaller memory
      > > footprint and is faster, and the stoppable SAX-like interface introduced by

      > No and no. Why would this be the case? Both are streaming, need to
      > keep limited amount of state. These are not true for xml pull vs push
      > parsers, and I haven't observed this with json parsers either
      > (including cases where a single parser exposes both types of
      > interfaces).
      > My experience has been that in this regard approaches are roughly
      > equivalent, differences are between parsers, not between APIs.

      I agree. I was using the data from JSON.simple's implementation to compare with a StAX parser.

      > > JSON.simple avoids the drawback that a traditional SAX parser requires the
      > > entire document to be parsed to get a simple data.

      > Right. That is a potentially useful incremental improvement. The
      > example pointed to made it sound like there were other benefits, or
      > that it was an optimal way of handling the task.
      > For what it's worth, push interface could be used to build a simple
      > solution too, just stop on match, don't worry about nesting etc. Or,
      > pull alternative changed to handle additional constraints.



      > > I think different applications require different abstraction levels. JSON.simple's

      > But there is no different abstraction level here. Push (SAX) and pull
      > streaming interfaces work at same level of abstraction. Higher levels
      > would be tree models and data binding, both of which can be
      > implemented on top of either of these lower level approaches.

      I mean users may choose among DOM-like, SAX-like and StAX-like parsers.


      > > stoppable SAX-like interface provides a new option to the user.
      > > It's your choice of adopting it or not.

      > Sure. But as part of advocating that, isn't it good to show why would
      > it make sense to use it?

      > As to stoppability of push parsing, the usual method so far has been
      > to throw an exception from the event handler. Would that not have
      > worked?

      Not really. Here stoppable also means resumable. That is, the user can pause at a point, do other work, and then either resume parsing or stop. Please refer to the example for details:

      http://code.google.com/p/json-simple/wiki/DecodingExamples#Example_5_-_Stoppable_SAX-like_content_handler
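
For readers who do not want to follow the link, the pause/resume idea can be sketched in a few self-contained lines. This is not JSON.simple's actual API: `PausableHandler`, `ResumablePushParser` and the pre-split token array are hypothetical stand-ins, chosen only to show that the parser (not the handler) remembers where it stopped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler: returning false asks the parser to pause.
interface PausableHandler {
    boolean onToken(String token);
}

// Toy push "parser" over pre-split tokens; it keeps its position between calls.
class ResumablePushParser {
    private final String[] tokens;
    private int pos = 0;

    ResumablePushParser(String[] tokens) {
        this.tokens = tokens;
    }

    // Pushes tokens to the handler until it returns false or the input ends.
    // Returns true once all input has been consumed, false if paused.
    boolean parse(PausableHandler handler) {
        while (pos < tokens.length) {
            if (!handler.onToken(tokens[pos++])) {
                return false; // paused; the next call continues from pos
            }
        }
        return true;
    }
}

class ResumeDemo {
    // Logs every token seen, pausing right after the "id" token.
    static List<String> run() {
        ResumablePushParser parser = new ResumablePushParser(
                new String[] {"{", "id", "1", "name", "x", "}"});
        List<String> log = new ArrayList<>();
        PausableHandler pauseOnId = t -> {
            log.add(t);
            return !t.equals("id");
        };
        parser.parse(pauseOnId);  // stops after "id"
        log.add("<paused>");      // the caller is free to do other work here
        parser.parse(pauseOnId);  // picks up again at "1"
        return log;
    }
}
```

Throwing an exception from the handler would also stop at `id`, but the position would be lost unless the parser were written to preserve it; returning a flag makes resuming the normal case.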

      Actually, JSR 173 (jsr173_07.pdf) argues it has advantages over SAX because:

      One drawback to the SAX API is that the programmer must keep track of the current
      state of the document in the code each time they process an XML document and thus cannot
      iteratively process it. Another drawback to SAX is that the entire document needs to be
      parsed at one time.

      The first drawback is partly true but in complex scenarios, a user with StAX parser may also need to keep track of states such as nesting levels, parent-child relationships and so on.

      The purpose of JSON.simple's stoppable SAX-like interface is to help relieve such issues.

      Yidong Fang


    • Tatu Saloranta
      Message 2 of 12 , Feb 6, 2009
        On Thu, Feb 5, 2009 at 4:52 PM, Fang Yidong <fangyidong@...> wrote:
        >
        > On Wed, Feb 4, 2009 at 11:28 PM, Fang Yidong <fangyidong@yahoo. com.cn> wrote:
        >
        >> > Well, if I am right, the parser in your example is essentially a lexer, with slightly higher abstraction.

        >> Not really. You may want to read a bit on Stax API for Java: I assume
        >
        > I did go through JSR 173 before comparing. I mean, the StAX parser in your example is similar to a lexer such as org.json.simple.parser.Yylex:

        Ah ok. Maybe I misread your comment: if you meant it's lower level of
        abstraction than a tree model, yes.
        I meant to say that SAX(-like) API is at similar level of abstraction
        as Stax(-like).
        ...
        > I mean users may choose among DOM-like, SAX-like and StAX-like parsers.

        Ok yes, that makes sense.

        (although DOM is technically not a parser but a tree model built on
        top of that, but many users still call it a parser)

        >> > stoppable SAX-like interface provides a new option to the user.
        >
        >> As to stoppability of push parsing, the usual method so far has been
        >> to throw an exception from the event handler. Would that not have
        >> worked?
        >
        > Not really. Here stoppable also means resumable. That is, the user can pause at a point, do other work, and then either resume parsing or stop. Please refer to the example for details:

        Ok. That makes more sense then, thank you for pointing this out.

        ...
        > Actually, JSR 173(jsr173_07.pdf) argues it has advantages over SAX because:
        >
        > One drawback to the SAX API is that the programmer must keep track of the
        ...
        > iteratively process it. Another drawback to SAX is that the entire document needs to be
        > parsed at one time.
        >
        > The first drawback is partly true but in complex scenarios, a user with StAX parser
        > may also need to keep track of states such as nesting levels, parent-child
        > relationships and so on.

        Maybe, but not necessarily, because this information is implicit
        within call stack (except for having to track end markers).
        That is, it's a recursive-descent kind of approach where you know
        where you came from, usually without additional tracking of location.
        Code branches based on constructs encountered.
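
A minimal sketch of what that looks like, assuming a pull-style token source (here just an `Iterator<String>` over pre-split tokens; `RecursiveDepth` and its methods are made-up names, not part of any library). Each open container corresponds to one active method call, so no separate nesting counter is kept:

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical example: nesting is tracked by the Java call stack.
class RecursiveDepth {
    // Called right after an opening "{" or "[" has been pulled.
    // Returns the deepest nesting inside this container, counting itself as 1.
    static int depthOfContainer(Iterator<String> tokens) {
        int deepestChild = 0;
        while (tokens.hasNext()) {
            String t = tokens.next();
            if (t.equals("{") || t.equals("[")) {
                // Recurse: the call stack now remembers the parent for us.
                deepestChild = Math.max(deepestChild, depthOfContainer(tokens));
            } else if (t.equals("}") || t.equals("]")) {
                return 1 + deepestChild; // end marker: hand control back to the parent
            }
            // scalar tokens need no handling in this sketch
        }
        throw new IllegalStateException("unbalanced input");
    }

    // Returns the maximum nesting depth of the given token sequence.
    static int maxDepth(String... tokens) {
        Iterator<String> it = Arrays.asList(tokens).iterator();
        int deepest = 0;
        while (it.hasNext()) {
            String t = it.next();
            if (t.equals("{") || t.equals("[")) {
                deepest = Math.max(deepest, depthOfContainer(it));
            }
        }
        return deepest;
    }
}
```

The only thing the code has to watch for explicitly is the end marker, which is where each call returns.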

        > The purpose of JSON.simple's stoppable SAX-like interface is to help relieve such
        > issues.

        Ok.

        -+ Tatu +-
      • Fang Yidong
        Message 3 of 12 , Feb 6, 2009
          > > The first drawback is partly true but in complex scenarios, a user with StAX parser
          > > may also need to keep track of states such as nesting levels, parent-child
          > > relationships and so on.

          > Maybe, but not necessarily, because this information is implicit
          > within call stack (except for having to track end markers).
          > That is, it's a recursive-descent kind of approach where you know
          > where you came from, usually without additional tracking of location.
          > Code branches based on constructs encountered.


          Yes, it's convenient. But I think it may result in a call-stack-based processor instead of a heap-based one, right? The former will cause stack overflow issues at deep nesting levels. Here's a heap-based processor for building an object graph with the SAX-like interface:

          http://code.google.com/p/json-simple/wiki/DecodingExamples#Example_6_-_Build_whole_object_graph_on_top_of_SAX-like_content



        • Mark Joseph
          Message 4 of 12 , Feb 7, 2009
            I was reading over the StAX specification and BEA provides
            licenses to the API, but that license prevents
            sublicenses. This means I as a vendor cannot provide my
            own implementation and license that to customers. So if
            I am reading that right, what is the point of that
            standard?
            We at P6R provide JSON and XML tools (among others), but
            if the standard has restrictions on it then it's not a real
            standard that we can use.

            Mark
            P6R, Inc


            On Thu, 5 Feb 2009 15:28:41 +0800 (CST)
            Fang Yidong <fangyidong@...> wrote:
            > Well, if I am right, the parser in your example is
            >essentially a lexer, with slightly higher abstraction.
            >
            > It's true that it's convenient to control in simple
            >case. But in a slightly more complex scenario, such as
            >retrieving data in some desired location (for example,
            >'/store/book[1]/title' in XPath expression), I don't
            >think the code using a SAX(-like) parser is much more
            >complex than using a StAX(-like) parser.
            >
            > Besides easier to pipeline, a SAX(-like) parser requires
            >smaller memory footprint and is faster, and the stoppable
            >SAX-like interface introduced by JSON.simple avoids the
            >drawback that a traditional SAX parser requires the
            >entire document to be parsed to get a simple data.
            >
            > I think different applications require different
            >abstraction levels. JSON.simple's stoppable SAX-like
            >interface provides a new option to the user. It's your
            >choice of adopting it or not.
            > From: Tatu Saloranta <tsaloranta@...>
            > Subject: Re: [json] Stoppable SAX-like interface for
            >streaming input of JSON text
            > To: json@yahoogroups.com
            > Date: Thursday, Feb 5, 2009, 1:22 AM
            >
            > On Tue, Feb 3, 2009 at 8:09 PM, Fang Yidong
            ><fangyidong@yahoo. com.cn> wrote:
            >
            >> JSON.simple introduces a simplified and stoppable
            >>SAX-like content handler to process JSON text stream.
            >>Please take a look if you are interested in it:
            >
            > If you are interested in application code controlling parsing, why not
            > just use Stax(-like) pull interface? Code example given would be quite
            > a bit simpler with "pull" approach; essentially little more than
            > recursive descent, or with some interfaces, linear iteration like:
            >
            > ---
            > JsonParser jp = factory.createJsonParser(input);
            > JsonToken t;
            >
            > // advance until the "id" field name is reached
            > while ((t = jp.nextToken()) != null) {
            >   if (t == JsonToken.FIELD_NAME && "id".equals(jp.getCurrentName())) {
            >     break;
            >   }
            > }
            > if (t != null) { // get value for the field
            >   t = jp.nextToken();
            >   System.out.println("found id, value: " + jp.getText());
            > }
            > ---
            >
            > And you could obviously build simpler abstractions for
            > matching on top of this.
            >
            > The main benefit of push-interface like SAX is that it is easier to
            > pipeline multiple processing stages. Otherwise it is rather cumbersome
            > and inconvenient way to process data that naturally comes in
            > well-defined and structured order.
            >
            > I am asking because oftentimes xml/json/whatever parser writers use
            > SAX-like approaches without knowing that it's only one way to slice and
            > dice data, and often not the best.
            >
            > -+ Tatu +-

            -------------------------
            Mark Joseph, Ph.D.
            President and Secretary
            P6R, Inc.
            http://www.p6r.com
            408-205-0361
            Fax: 831-476-7490
            Skype: markjoseph_sc
            IM: (Yahoo) mjoseph8888
            (AIM) mjoseph8888
          • Tatu Saloranta
            Message 5 of 12 , Feb 7, 2009
              On Sat, Feb 7, 2009 at 4:25 PM, Mark Joseph <mark@...> wrote:
              > I was reading over the StAX specification and BEA provides
              > licenses to the API, but that license prevents
              > sublicenses. This means I as a vendor cannot provide my
              > own implementation and license that to customers. So if

              I don't see why you would need a license to implement an API.
              Generally licensing governs usage of API itself, distributing it, modifying etc.
              None of those are usually needed, because Stax is part of JDK 1.6.
              Or you point users to download API jar itself from whoever can provide it.

              Also: whatever stax specs download bundle claims is probably incorrect.

              But yes, clearly BEA screwed up licensing mentions and other parts.

              > I am reading that right what is the point of that
              > standard?
              > We at P6R provide JSON and XML tools (among others), but
              > if the standard has restrictions on it then it's not a real
              > standard that we can use.

              Just to be clear: Stax API itself has little to do with Json. It is a
              Java xml processing API, and would be of little help for Json. There's
              no point in trying to implement it, due to fundamental differences
              between xml and json data formats.

              But similar style ("pull parsing") is useful.

              -+ Tatu +-
            • Tatu Saloranta
              Message 6 of 12 , Feb 7, 2009
                On Fri, Feb 6, 2009 at 5:33 PM, Fang Yidong <fangyidong@...> wrote:
                >
                >> Maybe, but not necessarily, because this information is implicit
                >> within call stack (except for having to track end markers).
                >
                >> That is, it's a recursive-descent kind of approach where you know
                >> where you came from, usually without additional tracking of location.
                >> Code branches based on constructs encountered.
                >
                > Yes, it's convenient. But I think it may result in a call stack based processor instead of a heap based
                > one, right? The former will cause stack overflow issues in a deep nesting level. Here's a heap

                Yes, if your document has nesting level of about million or so. :-D
                So I don't think that is a practical concern.

                If it happens to be, then one can construct explicit stack, similar to
                how one has to do it with SAX-like interfaces.
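
As a rough sketch of the explicit-stack variant (hypothetical names, arrays only, pre-split tokens instead of a real parser), the open containers live in a heap-allocated `Deque`, so nesting depth is limited by heap size rather than by the thread's call stack:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical example: builds nested lists iteratively, without recursion.
class ExplicitStackBuilder {
    // Accepts "[" / "]" / scalar tokens and returns the list of top-level values.
    static List<Object> build(String... tokens) {
        Deque<List<Object>> open = new ArrayDeque<>(); // containers still open
        List<Object> root = new ArrayList<>();         // holds the top-level values
        open.push(root);
        for (String t : tokens) {
            if (t.equals("[")) {
                List<Object> child = new ArrayList<>();
                open.peek().add(child); // attach to the current parent
                open.push(child);       // the child becomes the current container
            } else if (t.equals("]")) {
                if (open.size() == 1) {
                    throw new IllegalStateException("unbalanced ']'");
                }
                open.pop();             // back to the parent
            } else {
                open.peek().add(t);     // scalar value
            }
        }
        if (open.size() != 1) {
            throw new IllegalStateException("unclosed '['");
        }
        return root;
    }
}
```

The same loop body works whether tokens are pulled in a loop, as here, or pushed into handler callbacks; only who drives the iteration changes.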

                > based processor for building object graph with SAX-like interface:
                > http://code.google.com/p/json-simple/wiki/DecodingExamples#Example_6_-_Build_whole_object_graph_on_top_of_SAX-like_content

                Right: that builds "poor man's object binding", List/Map/primitive
                structure from Json.
                Most Json parsers offer that functionality via API, so it need not be
                built from low-level components (json.org and others).
                Code with pull API would be quite similar, although one could choose
                between recursion and iteration with explicit stack.

                -+ Tatu +-
              • Mark Joseph
                Message 7 of 12 , Feb 7, 2009
                  Ah, I am sorry I was not clear: we provide the JSON and XML
                  tools to C++ users, not Java.
                  http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/




                  On Sat, 7 Feb 2009 19:10:21 -0800
                  Tatu Saloranta <tsaloranta@...> wrote:
                  > On Sat, Feb 7, 2009 at 4:25 PM, Mark Joseph
                  ><mark@...> wrote:
                  >> I was reading over the StAX specification and BEA
                  >>provides
                  >> licenses to the API, but that license prevents
                  >> sublicenses. This means I as a vendor cannot provide my
                  >> own implementation and license that to customers. So
                  >>if
                  >
                  > I don't see why you would need a license to implement an
                  >API.
                  > Generally licensing governs usage of API itself,
                  >distributing it, modifying etc.
                  > None of those are usually needed, because Stax is part
                  >of JDK 1.6.
                  > Or you point users to download API jar itself from
                  >whoever can provide it.
                  >
                  > Also: whatever stax specs download bundle claims is
                  >probably incorrect.
                  >
                  > But yes, clearly BEA screwed up licensing mentions and
                  >other parts.
                  >
                  >> I am reading that right what is the point of that
                  >> standard?
                  >> We at P6R provide JSON and XML tools (among others),
                  >>but
                  >> if the standard has restrictions on it then it's not a
                  >>real
                  >> standard that we can use.
                  >
                  > Just to be clear: Stax API itself has little to do with
                  >Json. It is a
                  > Java xml processing API, and would be of little help for
                  >Json. There's
                  > no point in trying to implement it, due to fundamental
                  >differences
                  > between xml and json data formats.
                  >
                  > But similar style ("pull parsing") is useful.
                  >
                  > -+ Tatu +-

                • Tatu Saloranta
                  Message 8 of 12 , Feb 7, 2009
                    On Sat, Feb 7, 2009 at 8:35 PM, Mark Joseph <mark@...> wrote:
                    > Ah I am sorry I was not clear we provide the JSON and XML
                    > tools to C++ users not Java.
                    > http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/

                    Ok, that explains it. I shouldn't have assumed it's for Java either.
                    And it is true that for products that cover both xml and json, it is
                    advantageous to use same or similar interfaces too. There are some
                    java libraries that do something similar, such as jettison that
                    exposes json through java xml interfaces (stax in this case).

                    -+ Tatu +-