Re: JSX and string serialization

  • Brendan Macmillan
    Message 1 of 4 , Mar 1, 2004
      Hi Bent,

      I think Xrayrivet might suit your needs better. It is based on the JSX
      engine, but does not alias Strings. The XML is also much simpler. I would
      appreciate if you would try it out, and tell me what problems you have. I
      can then iterate quickly, to make it work for you. How does that sound?
      http://www.jsx.org/xrayrivet/xrayrivet.html

      It does not mention your usage, but it sounds like a good match.

      Warning: It is *very* alpha, but it's easy for me to improve it quickly
      because the completeness of JSX is right there under the hood. It's more a
      question of what to do first.


      JSX:
      You are right that Strings can't form circular references, but there can be
      multiple references to the same String (as you saw). JSX needs to work this
      way to be correct for serialization: Strings are aliased so that == works
      correctly if there are multiple references to the same String.

      BTW: here's how to tell if references are circular: keep a stack of the
      objects above you (from root to the current node). A reference is circular
      if and only if it refers to an object in that stack.
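A minimal sketch of the stack-based check described above. The `Node` class and the method names are hypothetical stand-ins for an arbitrary object graph; this is not how JSX itself is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CycleCheck {
    // Hypothetical graph node: each node holds references to other nodes.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static boolean hasCircularReference(Node root) {
        return visit(root, new ArrayDeque<>());
    }

    // 'path' holds the objects from the root down to the current node.
    private static boolean visit(Node node, Deque<Node> path) {
        for (Node ancestor : path) {
            if (ancestor == node) return true;   // identity, not equals()
        }
        path.push(node);
        for (Node ref : node.refs) {
            if (visit(ref, path)) return true;
        }
        path.pop();                              // leaving this node
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), shared = new Node("shared");
        a.refs.add(b);
        a.refs.add(shared);
        b.refs.add(shared);                      // multiple reference, not circular
        System.out.println(hasCircularReference(a));
        b.refs.add(a);                           // b now points back up the stack
        System.out.println(hasCircularReference(a));
    }
}
```

Note that an object reached twice through different paths (like `shared`) is a multiple reference but not a circular one, because it is never on the stack when it is reached again.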

      But I'm more interested in making useful products; and I think your usage of
      JSX is a popular one.


      Cheers,
      Brendan

      > Hello,
      >
      > I have tried out JSX2 and found the following minor annoyance. While I
      > realize that it works like this because of issues caused by circular
      > data structures (and for all I know, perhaps JOS _requires_ it to be
      > like this), I still wanted to mention it.
      >
      > If I serialize an object graph that has a number of empty strings in
      > it, then the first string will be serialized with its value and the
      > rest will just idref it. This proves inconvenient when I later want to
      > edit the XML files and input real strings. In particular, if I want to
      > edit the one string that everyone else is referencing, I have to
      > remember to move the reference string to somewhere else or all other
      > strings will also have changed.
      >
      > It would have been much more edit-friendly if all strings were always
      > serialized with their full values regardless of whether or not other
      > strings in the object graph happen to have the same value.
      >
      > While the same argumentation holds for all other data types as well, I
      > imagine it is non-trivial to determine if any given object would cause
      > a loop. Strings, however, are guaranteed not to cause loops since they
      > don't actually refer to other objects and they can't be subclassed to
      > do so either.
      >
      > Again, I'm not sure if this approach would break other aspects of
      > serialization. That's for you to know and for me to speculate over :-)
      >
      > Cheers
      > Bent D
      > --
      > Bent Dalager - bcd@... - http://www.pvv.org/~bcd
      > powered by emacs
      >
    • Brendan Macmillan
      Message 2 of 4 , Mar 2, 2004
        Hi Bent,

        Note: I've cc'ed this to the list, because I think others may be interested.

        > > I think Xrayrivet might suit your needs better. It is based on the JSX
        > > engine, but does not alias Strings. The XML is also much simpler. I would
        > > appreciate if you would try it out, and tell me what problems you have. I
        > > can then iterate quickly, to make it work for you. How does that sound?
        > > http://www.jsx.org/xrayrivet/xrayrivet.html
        >
        > I gave it a quick test run and found that I need more mappings than
        > what it currently has. Specifically, it failed on primitives.

        It should work on primitives (unless within an array - it doesn't handle
        arrays at the moment).

        > I am using JSX as a component in a development tool we're using
        > internally, in which we work with moderately complex data structures
        > (they're basically deeply nested structures generated from IDL struct
        > declarations). As it is, it will at least need to support primitives,
        > nested objects and nested arrays for it to be of any immediate use. I
        > don't think our already pressed developers will have much patience
        > with xrayrivet if it proves to have a lot of problems. (I'd say that
        > we could try it out in a quiet period, but there aren't any :-)

        It doesn't support arrays at the moment.
        Primitives should be OK - unless, of course, they are within an array :-)
        Nested objects should be OK, as well as primitive values.

        I need to ask: do you have *any* cyclic or multiple references (apart from
        the Strings), that you need preserved?

        The goal of the project is to be able to map any Java object graph to any
        XML document. Most XML documents don't provide a way to represent such
        references, and introducing some mechanism (such as JSX's approach) would
        be an error in terms of the XML document.

        Arrays present two more problems for this goal:
        (1). Arrays have a runtime length - but this can't always be recorded
        explicitly, because many XML documents don't have an explicit length for
        lists. In general, you also can't solve this by storing the length in a
        separate mapping (or binding) document, because it can vary at runtime. The
        "obvious" solution is to record the runtime length implicitly, in terms of
        the components of the array. Just count them.

        It's a little bit of work to implement this, because you have to do it for
        each primitive type separately (mostly cut and paste though, simulating
        generics).
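A rough sketch of the "just count them" idea, assuming the array components have already been read from the XML as a list of text values. The class and method names are hypothetical, and only two of the primitive types are shown; the others would be near-identical copies, as the message says.

```java
import java.util.Arrays;
import java.util.List;

class ImplicitLength {
    // The array length is never stored; it is implied by the number of items.
    static int[] toIntArray(List<String> items) {
        int[] result = new int[items.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = Integer.parseInt(items.get(i));
        }
        return result;
    }

    // Same algorithm, cut and pasted for another primitive type.
    static double[] toDoubleArray(List<String> items) {
        double[] result = new double[items.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = Double.parseDouble(items.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ints = toIntArray(Arrays.asList("1", "2", "3"));
        System.out.println(ints.length);
    }
}
```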

        (2). Null values are needed by arrays of objects (for example, as <null/>) -
        but many XML documents don't use a null element. The truth is, to map to
        such a document, any null values found in the objects would be an error,
        because there is nothing to map them to. Unfortunately, arrays of objects
        quite commonly have an unused portion, of trailing nulls.
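One possible policy for the trailing-null case, sketched here as an assumption rather than anything xrayrivet is stated to do: drop the unused tail before writing, and treat any remaining null as an error because the target XML has nothing to map it to. The names are hypothetical.

```java
import java.util.Arrays;

class TrailingNulls {
    // Returns a copy without the trailing nulls; rejects nulls elsewhere.
    static Object[] trimTrailingNulls(Object[] array) {
        int end = array.length;
        while (end > 0 && array[end - 1] == null) {
            end--;
        }
        for (int i = 0; i < end; i++) {
            if (array[i] == null) {
                throw new IllegalArgumentException(
                        "null at index " + i + " has no XML representation");
            }
        }
        return Arrays.copyOf(array, end);
    }

    public static void main(String[] args) {
        Object[] trimmed = trimTrailingNulls(new Object[] {"a", "b", null, null});
        System.out.println(trimmed.length);
    }
}
```

The trade-off is that the original capacity of the array is lost on a round trip, which only matters if the application relies on it.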

        Of course, these considerations don't apply to your case, because you aren't
        mapping to a target XML document. You just want to be able to enter Strings
        by hand (IIUC)


        > If, on the other hand, there is hope for getting it up to speed
        > relatively easily, it is a somewhat more promising proposition. I am
        > sure they _will_ appreciate the ease-of-edit they might be getting
        > once it's ready for prime time. I've just finished a basic GUI-based
        > editor for these data structures, though, and if they fall in love
        > with that (I can only hope :-), they might not see the benefit of
        > xrayrivet.
        >
        > I would personally like to have the "just use emacs" fallback though,
        > so I'll try to pitch it to them and see what they say. An
        > easier-to-edit XML format would be very handy after all.
        >
        > How will licensing work for xrayrivet?

        I'm not sure about this at the moment, but it would probably be the same as
        for JSX.


        > > It does not mention your usage, but it sounds like a good match.
        > >
        > > Warning: It is *very* alpha, but it's easy for me to improve it quickly
        > > because the completeness of JSX is right there under the hood. It's more a
        > > question of what to do first.
        >
        > Is it "just" a question of writing the XSL scripts for it?

        No - there is a declarative mapping that you write once, and which is used
        for mapping in both directions (with XSL, you'd have to write two scripts).
        Plus, the mapping is specifically for Java and XML, so it is much simpler
        for this specific task. It's "XML databinding".

        > Using XSL
        > to morph JSX's output _does_ immediately strike me as a good idea, but
        > I am somewhat wary of the complexity that might be involved in the XSL
        > scripts. How readable do they become?

        You can do it, but they aren't very readable. It depends on what you need to
        do. There are example scripts in the JSX manual (towards the end) for
        evolving classes; and other examples for XML databinding on the front page
        (www.jsx.org). You can get some kind of a sense of the complexity.

        > > JSX:
        > > You are right that Strings can't form circular references, but there can be
        > > multiple references to the same String (as you saw). JSX needs to work this
        > > way to be correct for serialization: Strings are aliased so that == works
        > > correctly if there are multiple references to the same String.
        >
        > I can see that you don't want the serialize->deserialize cycle to
        > break the == operator. Many data structures may rely on it after
        > all. Reading between the lines (and extrapolating a bit), I am
        > guessing that I may get a long way if I put new String("") into my
        > data structures rather than just "" ... I take it the refids only get
        > inserted if string1==string2 and not necessarily if
        > string1.equals(string2) ? (I build default instances of the IDL
        > structs myself using reflection, so I control what initially goes into
        > them.)

        Yes. I almost sent you a follow up last night, suggesting that; I'm glad my
        explanation was clear enough for you to be able to put it to use right away.
        :-)
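To illustrate the difference between `==` and `equals` that the exchange above relies on, here is a small sketch. The `distinctIdentities` helper is hypothetical; it only mimics how an identity-based serializer would see the objects and is not JSX code.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class StringIdentity {
    // Counts objects by identity (==), ignoring equals().
    static int distinctIdentities(Object... objects) {
        Map<Object, Boolean> seen = new IdentityHashMap<>();
        for (Object o : objects) {
            seen.put(o, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // String literals are interned, so every "" is the same object.
        System.out.println(distinctIdentities("", "", ""));            // 1
        // new String("") always creates a separate object.
        System.out.println(distinctIdentities(
                new String(""), new String(""), new String("")));      // 3
        // The separate objects are still equal by value.
        System.out.println(new String("").equals(new String("")));     // true
    }
}
```

With three separate `new String("")` instances there is nothing for an identity-based serializer to alias, so each one would be written out with its own value.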

        > I suppose I should just try it and see what happens :-)


        Bottom line: if you do fit within the target goal above, then I estimate
        1-2 weeks maximum until it's ready to go. But if you need some references
        (cyclic or multiple), then it's a conflict with the above goal, and much as
        I regret it, I can't do it as part of this particular project.

        So, let me know. :-)


        Cheers,
        Brendan
      • Brendan Macmillan
        Message 3 of 4 , Mar 3, 2004
          Hi Bent,

          I'll reply to your other comments in a separate email.

          > > Perhaps a low-risk way for you to proceed is for neither of us to commit to
          > > anything? If I implement something that you need, you could check it (which
          > > I think is pretty quick?), iterating around this loop until it does what you
          > > want.
          >
          > Yes, this seems quite doable.

          Cool.

          > > Questions:
          > > 1. Why do you have arrays instead of collections? (are they of primitives?)
          >
          > I have a number of IDL files defining structs containing, among other
          > things, IDL sequences. These get converted into Java classes by an
          > IDL-to-Java compiler that we have no control over. The end result uses
          > arrays and not collections to represent sequences (this may be
          > required by the CORBA-to-Java mapping specification for all I know). I
          > cannot change these classes. It is these IDL-originated structures
          > that we want to edit in order to build arbitrary CORBA objects to send
          > across the network for testing purposes.
          >
          > The arrays we use can hold primitives and they can hold objects.
          >
          > The primary reason I am using JSX in the first place is that the IDL
          > compiler doesn't support making the generated classes Serializable and
          > since I can't change the resulting code myself (well, I could, but it
          > would be a nightmare) I needed something that could serialize any old
          > object.

          Interesting, thanks for the background!

          JSX is non-intrusive, which is a great strength when you have to (or prefer
          to) not change existing code.

          > As we discussed previously, I believe that I can relax that particular
          > requirement to "any old object with a non-cyclic member hierarchy".
          >
          > As a matter of interest, if you _do_ pass a cyclic hierarchy to
          > xrayrivet, how will it react? Will it identify the problem and throw
          > an exception?

          Hmm... it is driven by JSX internally, so that these would be passed as
          references. At the moment, xrayrivet just ignores references, but for a
          final implementation, it should throw an exception (which would be
          switchable on/off). IOW, this is polish, which is easy to deal with later.


          > > 2. Do you have nulls in your object arrays? (even trailing)?
          >
          > There certainly can be. While our CORBA implementation doesn't support
          > sending null values, there can be null values in the structures while
          > the developer is building them and he might very well decide he wants
          > to save such an unfinished structure to file to continue work on it
          > later.

          Just to be 100% clear (and pedantic, because it makes a big difference
          later): in structures, yes; but would there be nulls in *arrays*?

          > Now, truth be told, I am somewhat ambivalent about letting the
          > developer put nulls into the structures since it would be a mistake to
          > have them there when trying to send the object over CORBA (and that is
          > the whole point after all). While we may decide to remove this
          > possibility in the future (after the developers have some experience
          > with using the tool), it will likely stay in for at least several
          > weeks.

          So you only need it for development. OK - we'll see how this goes.


          Additional Requirements Summary:
          - arrays of primitives
          - arrays of Objects
          - structs with nulls in them
          - arrays of Objects, with nulls in them (?)


          Cheers,
          Brendan
        • Brendan Macmillan
          Message 4 of 4 , Mar 3, 2004
            Hi Bent,

            > > BTW: The microsoft C# guy criticises Java's generics for being
            > > inefficient in this way IIRC, but probably in many cases it just
            > > doesn't matter. I mean, if you really want efficiency, use C. But
            > > computers are just absurdly fast these days, so it usually makes
            > > no discernable difference.
            >
            > I expect he's criticising the implicit casting that goes on in Java
            > generics. In theory, casting is expensive, but I'm not convinced that
            > this is the case in a single-inheritance system such as Java. Checking
            > the correctness of a cast in Java should really be quite cheap if
            > you're clever about it.

            It's from the Artima article; he mentions casting, and also the
            inefficiency of autoboxing:

            # Anders Hejlsberg:
            # For example, with Java generics, you don't actually get any of
            # the execution efficiency that I talked about, because when you
            # compile a generic class in Java, the compiler takes away the
            # type parameter and substitutes Object everywhere. So the
            # compiled image for List<T> is like a List where you use
            # the type Object everywhere. Of course, if you now try
            # to make a List<int>, you get boxing of all the ints.
            # So there's a bunch of overhead there.
            http://www.artima.com/intv/generics2.html
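A small sketch of the two effects the quote describes, as they appear in Java versions that ship with generics (Java 5 and later): the type parameter is erased at runtime, and `int` values stored in a list are boxed to `Integer`. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // After erasure both lists have the same runtime class.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        System.out.println(sameRuntimeClass(ints, strings));  // true

        ints.add(42);                     // the int is boxed into an Integer
        Object stored = ints.get(0);
        System.out.println(stored instanceof Integer);        // true
    }
}
```

Java has no `List<int>`; the closest equivalent is `List<Integer>`, which is where the boxing overhead comes from.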


            > Anyway, as you say, one really has to measure to find out for sure if
            > it's an issue for any particular application. In your case, just
            > copying the algorithm multiple times is probably a more efficient
            > approach than profiling both solutions and then choosing one :-)

            :-) Yes, I think so. I kind of like the possibility of discovering I'm
            wrong, when I do profile it in future.


            > > JSX and xrayrivet would be sold as separate products, so if you wanted both,
            > > you would need two licenses. If you only wanted JSX or xrayrivet, then it
            > > would be one license.
            >
            > As a practical issue, if you have xrayrivet won't you also effectively
            > have JSX bundled with it? How would you prevent xrayrivet users from
            > calling the JSX API directly? Does Java offer a solution for this in
            > its security mechanisms (I don't think so, but am not entirely up to
            > date on JAR features) or would you make a custom JSX (with everything
            > having package scope instead of public, for instance) for bundling
            > that was effectively uncallable from the outside?

            I had the idea of making a single hidden version of JSX, that was
            common to both, and having an extra wrapper class for JSX, with
            public methods, that would only be present in the "JSX" jar.
            This makes it easy to control at configuration time in ant. But it
            does add an extra layer of complexity. Now that you raise it,
            I think a runtime check is simpler and efficient (it's only checked
            once per object graph). But I haven't given it much thought yet.

            Thanks very much for thinking on this! :-)

            hehehe I estimate that open source, without any worry at all
            about security and business-related issues, is at least three
            times easier than commercial software. If you aren't making
            a reusable component, then it is three times easier again. And if
            you forget ease of use, then it's yet another three times easier
            (as esr claims is often the case). By this reckoning, such an
            open source project is 27 times easier than commercial software.


            Cheers,
            Brendan