Re: [json] Stoppable SAX-like interface for streaming input of JSON text

  • Fang Yidong
    Message 1 of 12, Feb 4, 2009
      Well, if I am right, the parser in your example is essentially a lexer, with slightly higher abstraction.

      It's true that it's convenient to control in simple cases. But in a slightly more complex scenario, such as retrieving data at some desired location (for example, '/store/book[1]/title' as an XPath expression), I don't think the code using a SAX(-like) parser is much more complex than the code using a StAX(-like) parser.

      Besides being easier to pipeline, a SAX(-like) parser requires a smaller memory footprint and is faster, and the stoppable SAX-like interface introduced by JSON.simple avoids the drawback that a traditional SAX parser has to parse the entire document to get a single piece of data.

      I think different applications require different abstraction levels. JSON.simple's stoppable SAX-like interface provides a new option to the user. Whether to adopt it is your choice.
      From: Tatu Saloranta <tsaloranta@...>
      Subject: Re: [json] Stoppable SAX-like interface for streaming input of JSON text
      To: json@yahoogroups.com
      Date: Thursday, February 5, 2009, 1:22 AM

      On Tue, Feb 3, 2009 at 8:09 PM, Fang Yidong <fangyidong@yahoo.com.cn> wrote:
      > JSON.simple introduces a simplified and stoppable SAX-like content handler to process JSON text stream. Please take a look if you are interested in it:
      >

      If you are interested in application code controlling parsing, why not
      just use a Stax(-like) pull interface? The code example given would be quite
      a bit simpler with the "pull" approach; essentially little more than
      recursive descent, or with some interfaces, linear iteration like:

      ---
      JsonParser jp = factory.createJsonParser(input);
      JsonToken t;

      // Pull tokens until the "id" field name is reached (or input ends)
      while ((t = jp.nextToken()) != null) {
        // the current field name is asked of the parser, not of the token
        if (t == JsonToken.FIELD_NAME && "id".equals(jp.getCurrentName())) {
          break;
        }
      }
      if (t != null) { // get value for the field
        t = jp.nextToken();
        System.out.println("found id, value: "+jp.getText());
      }
      ---

      And you could obviously build simpler abstractions for matching on top of this.

      The main benefit of a push interface like SAX is that it is easier to
      pipeline multiple processing stages. Otherwise it is a rather cumbersome
      and inconvenient way to process data that naturally comes in
      well-defined and structured order.

      I am asking because oftentimes xml/json/whatever parser writers use
      SAX-like approaches without knowing that it's only one way to slice and
      dice data, and often not the best.

      -+ Tatu +-
    • Tatu Saloranta
      Message 2 of 12, Feb 5, 2009
        On Wed, Feb 4, 2009 at 11:28 PM, Fang Yidong <fangyidong@...> wrote:
        > Well, if I am right, the parser in your example is essentially a lexer, with slightly higher abstraction.

        Not really. You may want to read a bit on Stax API for Java: I assume
        you are familiar with SAX API.
        Both operate at roughly same level of abstraction, and handle parser
        tasks of ensuring proper structure of the data format regarding
        nesting (ordering of parsed tokens).

        > It's true that it's convenient to control in simple case. But in a slightly more complex scenario, such as retrieving data in some desired location (for example,
        > '/store/book[1]/title' in XPath expression), I don't think the code using a SAX(-like) parser is much more complex than using a StAX(-like) parser.

        That is debatable (I think event-handling approach is inherently less
        intuitive personally, others disagree).
        Path expressions can obviously run on either push or pull mode, so I
        would agree in that convenience factor is not as big as when comparing
        to higher abstractions (tree model, data binding).

        But I was just pointing out that the example does not show much benefit.
        So it would be good to showcase something where it does actually help?

        Or perhaps it's just that json.simple exposes push interface and this
        is an incremental improvement to the current way of doing this?

        > Besides easier to pipeline, a SAX(-like) parser requires smaller memory
        > footprint and is faster, and the stoppable SAX-like interface introduced by

        No and no. Why would this be the case? Both are streaming and need to
        keep only a limited amount of state. These claims are not true for xml pull vs push
        parsers, and I haven't observed this with json parsers either
        (including cases where a single parser exposes both types of
        interfaces).
        My experience has been that in this regard the approaches are roughly
        equivalent; differences are between parsers, not between APIs.

        > JSON.simple avoids the drawback that a traditional SAX parser requires the
        > entire document to be parsed to get a simple data.

        Right. That is a potentially useful incremental improvement. The
        example pointed to made it sound like there were other benefits, or
        that it was an optimal way of handling the task.
        For what it's worth, a push interface could be used to build a simple
        solution too: just stop on match, don't worry about nesting etc. Or
        the pull alternative could be changed to handle additional constraints.

        > I think different applications require different abstraction levels. JSON.simple's

        But there is no different abstraction level here. Push (SAX) and pull
        streaming interfaces work at same level of abstraction. Higher levels
        would be tree models and data binding, both of which can be
        implemented on top of either of these lower level approaches.

        > stoppable SAX-like interface provides a new option to the user.
        > It's your choice of adopting it or not.

        Sure. But as part of advocating that, isn't it good to show why
        it would make sense to use it?

        As to stoppability of push parsing, the usual method so far has been
        to throw an exception from the event handler. Would that not have
        worked?
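
A minimal sketch of the throw-to-stop technique, assuming a toy push source rather than any real parser. `KeyValueHandler`, `StopParsing`, `IdFinder` and `ThrowToStop` are hypothetical names made up for illustration, and well-formed "key=value" strings stand in for parser events:

```java
import java.util.List;

// Hypothetical push-style callback; not the API of any real library.
interface KeyValueHandler {
    void entry(String key, String value);
}

// Unchecked exception used purely as a "stop now" signal.
class StopParsing extends RuntimeException {
    final String value;

    StopParsing(String value) {
        super("stop requested");
        this.value = value;
    }
}

// Handler that aborts the run as soon as it sees the "id" key.
class IdFinder implements KeyValueHandler {
    int seen = 0; // number of events delivered, including the matching one

    public void entry(String key, String value) {
        seen++;
        if ("id".equals(key)) {
            throw new StopParsing(value);
        }
    }
}

class ThrowToStop {
    // Stand-in for a push parser: walks "key=value" pairs and calls the handler.
    static void push(List<String> pairs, KeyValueHandler handler) {
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            handler.entry(pair.substring(0, eq), pair.substring(eq + 1));
        }
    }

    // Returns the value stored under "id", or null if the input has none.
    static String findId(List<String> pairs, IdFinder finder) {
        try {
            push(pairs, finder);
            return null;
        } catch (StopParsing stop) {
            return stop.value;
        }
    }
}
```

`findId` returns as soon as the handler throws, so events after the match are never delivered. The source in this sketch keeps no position, so it can stop but cannot continue later.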

        -+ Tatu +-
      • Fang Yidong
        Message 3 of 12, Feb 5, 2009
          On Wed, Feb 4, 2009 at 11:28 PM, Fang Yidong <fangyidong@yahoo.com.cn> wrote:
          > > Well, if I am right, the parser in your example is essentially a lexer, with slightly higher abstraction.

          > Not really. You may want to read a bit on Stax API for Java: I assume
          > you are familiar with SAX API.
          > Both operate at roughly same level of abstraction, and handle parser
          > tasks of ensuring proper structure of the data format regarding
          > nesting (ordering of parsed tokens).

          I did go through JSR 173 before comparing. I mean, the StAX parser in your example is similar to a lexer such as org.json.simple.parser.Yylex:

          Yylex lexer = new Yylex(in);
          Yytoken token;
          while((token = lexer.yylex()) != null){
          ...
          }

          But your StAX parser keeps state so that the user is able to check whether the current token is a field name, while the lexer does not. That is why I said it's a higher abstraction (over a lexer). But apart from the field name, the other tokens (start/end of an object, start/end of an array) are similar to the ones that a lexer returns.

          Although it's quite low level, I agree it's good to standardize it and StAX has its advantages in some applications.

          > > It's true that it's convenient to control in simple case. But in a slightly more complex scenario, such as retrieving data in some desired location (for example,
          > > '/store/book[1]/title' in XPath expression), I don't think the code using a SAX(-like) parser is much more complex than using a StAX(-like) parser.

          > That is debatable (I think event-handling approach is inherently less
          > intuitive personally, others disagree).
          > Path expressions can obviously run on either push or pull mode, so I
          > would agree in that convenience factor is not as big as when comparing
          > to higher abstractions (tree model, data binding).

          > But I was just pointing out that the example does not show much benefits.
          > So it would be good to showcase something where it does actually help?

          > Or perhaps it's just that json.simple exposes push interface and this
          > is an incremental improvement to the current way of doing this?

          > > Besides easier to pipeline, a SAX(-like) parser requires smaller memory
          > > footprint and is faster, and the stoppable SAX-like interface introduced by

          > No and no. Why would this be the case? Both are streaming, need to
          > keep limited amount of state. These are not true for xml pull vs push
          > parsers, and I haven't observed this with json parsers either
          > (including cases where a single parsers exposes both types of
          > interfaces).
          > My experience has been that in this regard approaches are roughly
          > equivalent, differences are between parsers, not between APIs.

          I agree. I was using the data from JSON.simple's implementation to compare with a StAX parser.

          > > JSON.simple avoids the drawback that a traditional SAX parser requires the
          > > entire document to be parsed to get a simple data.

          > Right. That is a potentially useful incremental improvement. The
          > example pointed to made it sound like there were other benefits, or
          > that it was an optimal way of handling the task.
          > For what it's worth, push interface could be used to build a simple
          > solution too, just stop on match, don't worry about nesting etc. Or,
          > pull alternative changed to handle additional constraints.

          > > I think different applications require different abstraction levels. JSON.simple's

          > But there is no different abstraction level here. Push (SAX) and pull
          > streaming interfaces work at same level of abstraction. Higher levels
          > would be tree models and data binding, both of which can be
          > implemented on top of either of these lower level approaches.

          I mean users may choose among DOM-like, SAX-like and StAX-like parsers.


          > > stoppable SAX-like interface provides a new option to the user.
          > > It's your choice of adopting it or not.

          > Sure. But as part of advocating that, isn't it good to show why would
          > it make sense to use it?

          > As to stoppability of push parsing, the usual method so far has been
          > to throw an exception from the event handler. Would that not have
          > worked?

          Not really. Here stoppable also means resumable. That is, the user can pause at some point, do other work, and then either resume parsing or stop. Please refer to the example for details:

          http://code.google.com/p/json-simple/wiki/DecodingExamples#Example_5_-_Stoppable_SAX-like_content_handler
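
The pause-and-resume idea could be sketched roughly as follows. This is not JSON.simple's code: `PausableHandler`, `ResumablePush` and `ResumeDemo` are hypothetical names, and "key=value" strings stand in for parser events. The only point illustrated is that the source keeps its position, so a handler returning false pauses the run and a later call picks up where it left off:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler: returning false asks the source to pause.
interface PausableHandler {
    boolean entry(String key, String value);
}

// Stand-in for a resumable push parser over "key=value" pairs.
class ResumablePush {
    private final List<String> pairs;
    private int next = 0; // position survives between calls, which makes resuming possible

    ResumablePush(List<String> pairs) {
        this.pairs = pairs;
    }

    boolean isDone() {
        return next >= pairs.size();
    }

    // Delivers events until the handler returns false or the input ends.
    void run(PausableHandler handler) {
        while (next < pairs.size()) {
            String pair = pairs.get(next++);
            int eq = pair.indexOf('=');
            if (!handler.entry(pair.substring(0, eq), pair.substring(eq + 1))) {
                return; // paused; a later run() continues with the following pair
            }
        }
    }
}

class ResumeDemo {
    // Collects every value stored under "id", pausing after each match.
    static List<String> collectIds(List<String> pairs) {
        final List<String> found = new ArrayList<String>();
        ResumablePush source = new ResumablePush(pairs);
        PausableHandler handler = new PausableHandler() {
            public boolean entry(String key, String value) {
                if (!"id".equals(key)) {
                    return true; // not interesting, keep going
                }
                found.add(value);
                return false; // hand control back to the caller
            }
        };
        while (!source.isDone()) {
            source.run(handler);
            // the caller is free to do other work here before resuming
        }
        return found;
    }
}
```

Calling `ResumeDemo.collectIds` on the pairs "id=1", "x=2", "id=3" pauses twice and returns both id values in order.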

          Actually, JSR 173(jsr173_07.pdf) argues it has advantages over SAX because:

          One drawback to the SAX API is that the programmer must keep track of the current
          state of the document in the code each time they process an XML document and thus cannot
          iteratively process it. Another drawback to SAX is that the entire document needs to be
          parsed at one time.

          The first drawback is partly true but in complex scenarios, a user with StAX parser may also need to keep track of states such as nesting levels, parent-child relationships and so on.

          The purpose of JSON.simple's stoppable SAX-like interface is to help relieve such issues.

          Yidong Fang


        • Tatu Saloranta
          Message 4 of 12, Feb 6, 2009
            On Thu, Feb 5, 2009 at 4:52 PM, Fang Yidong <fangyidong@...> wrote:
            >
            > On Wed, Feb 4, 2009 at 11:28 PM, Fang Yidong <fangyidong@yahoo. com.cn> wrote:
            >
            >> > Well, if I am right, the parser in your example is essentially a lexer, with slightly higher abstraction.

            >> Not really. You may want to read a bit on Stax API for Java: I assume
            >
            > I did go through JSR 173 before comparing. I mean, the StAX parser in your example is similar to a lexer such as org.json.simple.parser.Yylex:

            Ah ok. Maybe I misread your comment: if you meant it's a lower level of
            abstraction than a tree model, yes.
            I meant to say that a SAX(-like) API is at a similar level of abstraction
            as a Stax(-like) one.
            ...
            > I mean users may choose among DOM-like, SAX-like and StAX-like parsers.

            Ok yes, that makes sense.

            (although DOM is technically not a parser but a tree model built on
            top of that, but many users still call it a parser)

            >> > stoppable SAX-like interface provides a new option to the user.
            >
            >> As to stoppability of push parsing, the usual method so far has been
            >> to throw an exception from the event handler. Would that not have
            >> worked?
            >
            > Not really. Here stoppable also means it's resumable. That is, the user can pause at a point, doing other works, and then resume parsing or stop. Please refer the example for detail:

            Ok. That makes more sense then, thank you for pointing this out.

            ...
            > Actually, JSR 173(jsr173_07.pdf) argues it has advantages over SAX because:
            >
            > One drawback to the SAX API is that the programmer must keep track of the
            ...
            > iteratively process it. Another drawback to SAX is that the entire document needs to be
            > parsed at one time.
            >
            > The first drawback is partly true but in complex scenarios, a user with StAX parser
            > may also need to keep track of states such as nesting levels, parent-child
            > relationships and so on.

            Maybe, but not necessarily, because this information is implicit
            within the call stack (except for having to track end markers).
            That is, it's a recursive-descent kind of approach where you know
            where you came from, usually without additional tracking of location.
            Code branches based on constructs encountered.
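
A rough sketch of what that recursive-descent style can look like, assuming pre-tokenized input instead of a real pull parser. Tokens are plain strings ("{", "}", "[", "]", and anything else as a key or scalar), and `RecursiveReader` is a hypothetical name. Which container is currently open is never stored anywhere; it is implied by which method is executing:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecursiveReader {
    // Reads one value whose first token has already been pulled.
    static Object readValue(String first, Iterator<String> in) {
        if ("{".equals(first)) {
            return readObject(in);
        }
        if ("[".equals(first)) {
            return readArray(in);
        }
        return first; // scalar
    }

    // Called after "{": alternates key, value until the matching "}".
    static Map<String, Object> readObject(Iterator<String> in) {
        Map<String, Object> map = new LinkedHashMap<String, Object>();
        for (String key = in.next(); !"}".equals(key); key = in.next()) {
            map.put(key, readValue(in.next(), in));
        }
        return map;
    }

    // Called after "[": reads values until the matching "]".
    static List<Object> readArray(Iterator<String> in) {
        List<Object> list = new ArrayList<Object>();
        for (String t = in.next(); !"]".equals(t); t = in.next()) {
            list.add(readValue(t, in));
        }
        return list;
    }
}
```

The only end markers the code checks are "}" and "]". Truncated input surfaces as a `NoSuchElementException` from the iterator, and each nesting level costs one Java stack frame.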

            > The purpose of JSON.simple's stoppable SAX-like interface is to help relieve such
            > issues.

            Ok.

            -+ Tatu +-
          • Fang Yidong
            Message 5 of 12, Feb 6, 2009
              > > The first drawback is partly true but in complex scenarios, a user with StAX parser
              > > may also need to keep track of states such as nesting levels, parent-child
              > > relationships and so on.

              > Maybe, but not necessarily, because this information if implicit
              > within call stack (except for having to track end markers).
              > That is, it's a recursive-descent kind of approach where you know
              > where you came from, usually without additional tracking of location.
              > Code branches based on constructs encountered.

              Yes, it's convenient. But I think it may result in a call-stack-based processor instead of a heap-based one, right? The former will cause stack overflow issues at deep nesting levels. Here's a heap-based processor for building an object graph with the SAX-like interface:

              http://code.google.com/p/json-simple/wiki/DecodingExamples#Example_6_-_Build_whole_object_graph_on_top_of_SAX-like_content



            • Mark Joseph
              Message 6 of 12, Feb 7, 2009
                I was reading over the StAX specification, and BEA provides
                licenses to the API, but that license prevents
                sublicenses. This means I as a vendor cannot provide my
                own implementation and license that to customers. So if
                I am reading that right, what is the point of that
                standard?
                We at P6R provide JSON and XML tools (among others), but
                if the standard has restrictions on it then it's not a real
                standard that we can use.

                Mark
                P6R, Inc



                -------------------------
                Mark Joseph, Ph.D.
                President and Secretary
                P6R, Inc.
                http://www.p6r.com
                408-205-0361
                Fax: 831-476-7490
                Skype: markjoseph_sc
                IM: (Yahoo) mjoseph8888
                (AIM) mjoseph8888
              • Tatu Saloranta
                Message 7 of 12, Feb 7, 2009
                  On Sat, Feb 7, 2009 at 4:25 PM, Mark Joseph <mark@...> wrote:
                  > I was reading over the StAX specification and BEA provides
                  > licenses to the API, but that license prevents
                  > sublicenses. This means I as a vendor cannot provide my
                  > own implementation and license that to customers. So if

                  I don't see why you would need a license to implement an API.
                  Generally licensing governs usage of the API itself, distributing it, modifying it etc.
                  None of those are usually needed, because Stax is part of JDK 1.6.
                  Or you point users to download the API jar itself from whoever can provide it.

                  Also: whatever the Stax specs download bundle claims is probably incorrect.

                  But yes, clearly BEA screwed up the licensing mentions and other parts.

                  > I am reading that right what is the point of that
                  > standard?
                  > We at P6R provide JSON and XML tools (amoung others), but
                  > if the standard has restrictions on it then its not a real
                  > standard that we can use.

                  Just to be clear: Stax API itself has little to do with Json. It is a
                  Java xml processing API, and would be of little help for Json. There's
                  no point in trying to implement it, due to fundamental differences
                  between xml and json data formats.

                  But similar style ("pull parsing") is useful.

                  -+ Tatu +-
                • Tatu Saloranta
                  Message 8 of 12, Feb 7, 2009
                    On Fri, Feb 6, 2009 at 5:33 PM, Fang Yidong <fangyidong@...> wrote:
                    >
                    >> Maybe, but not necessarily, because this information if implicit
                    >> within call stack (except for having to track end markers).
                    >
                    >> That is, it's a recursive-descent kind of approach where you know
                    >> where you came from, usually without additional tracking of location.
                    >> Code branches based on constructs encountered.
                    >
                    > Yes, it's convenient. But I think it may result in a call stack based processor instead of a heap based
                    > one, right? The former will cause stack overflow issues in a deep nesting level. Here's a heap

                    Yes, if your document has a nesting level of about a million or so. :-D
                    So I don't think that is a practical concern.

                    If it happens to be, then one can construct an explicit stack, similar to
                    how one has to do it with SAX-like interfaces.

                    > based processor for building object graph with SAX-like interface:
                    > http://code.google.com/p/json-simple/wiki/DecodingExamples#Example_6_-_Build_whole_object_graph_on_top_of_SAX-like_content

                    Right: that builds "poor man's object binding", List/Map/primitive
                    structure from Json.
                    Most Json parsers offer that functionality via API, so it need not be
                    built from low-level components (json.org and others).
                    Code with pull API would be quite similar, although one could choose
                    between recursion and iteration with explicit stack.
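
A sketch of the iterative variant with an explicit stack, assuming pre-tokenized input in place of a real pull parser. Tokens are plain strings ("{", "}", "[", "]", and anything else as a key or scalar), and `OpenContainer` and `IterativeReader` are hypothetical names. It builds the same kind of List/Map structure without recursing:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One open container plus, for objects, the key waiting for its value.
class OpenContainer {
    final Object container; // a Map for "{", a List for "["
    String pendingKey;

    OpenContainer(Object container) {
        this.container = container;
    }
}

class IterativeReader {
    @SuppressWarnings("unchecked")
    static Object read(Iterator<String> in) {
        Deque<OpenContainer> stack = new ArrayDeque<OpenContainer>();
        while (in.hasNext()) {
            String t = in.next();
            OpenContainer top = stack.peek();
            // Inside an object with no key pending, any token other than "}" is the next key.
            if (top != null && top.container instanceof Map
                    && top.pendingKey == null && !"}".equals(t)) {
                top.pendingKey = t;
                continue;
            }
            Object value;
            if ("{".equals(t)) {
                stack.push(new OpenContainer(new LinkedHashMap<String, Object>()));
                continue;
            } else if ("[".equals(t)) {
                stack.push(new OpenContainer(new ArrayList<Object>()));
                continue;
            } else if ("}".equals(t) || "]".equals(t)) {
                value = stack.pop().container; // container is complete
            } else {
                value = t; // scalar
            }
            OpenContainer parent = stack.peek();
            if (parent == null) {
                return value; // top-level value finished
            } else if (parent.container instanceof Map) {
                ((Map<String, Object>) parent.container).put(parent.pendingKey, value);
                parent.pendingKey = null;
            } else {
                ((List<Object>) parent.container).add(value);
            }
        }
        throw new IllegalStateException("input ended before the value was complete");
    }
}
```

Because open containers live in an `ArrayDeque` on the heap, nesting depth is bounded by available memory rather than by thread stack size. The price is the bookkeeping (`pendingKey`, the parent lookup) that recursion would otherwise keep implicit.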

                    -+ Tatu +-
                  • Mark Joseph
                    Message 9 of 12, Feb 7, 2009
                      Ah, I am sorry I was not clear: we provide the JSON and XML
                      tools to C++ users, not Java.
                      http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/


                      -------------------------
                      Mark Joseph, Ph.D.
                      President and Secretary
                      P6R, Inc.
                      http://www.p6r.com
                      408-205-0361
                      Fax: 831-476-7490
                      Skype: markjoseph_sc
                      IM: (Yahoo) mjoseph8888
                      (AIM) mjoseph8888
                    • Tatu Saloranta
                      Message 10 of 12, Feb 7, 2009
                        On Sat, Feb 7, 2009 at 8:35 PM, Mark Joseph <mark@...> wrote:
                        > Ah I am sorry I was not clear we provide the JSON and XML
                        > tools to C++ users not Java.
                        > http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/

                        Ok that explains it. I shouldn't have assumed it's for Java either.
                        And it is true that for products that cover both xml and json, it is
                        advantageous to use same or similar interfaces too. There are some
                        java libraries that do something similar, such as jettison that
                        exposes json through java xml interfaces (stax in this case).

                        -+ Tatu +-