Re: [JSX] JSX on Java 1.7: issue with java.util.Locale

  • Brendan Macmillan
    Message 1 of 12 , Oct 4, 2012
      Hi Joël,

      Thanks for being so understanding, and having a specific time frame.
      That's helpful for scheduling.

      I thought it would be useful to record the problem and cause below, to
      help clarify it.
      You might find it interesting - but please don't feel any pressure to
      read it. It's long.


      I'd like to explain the issue, in these steps:
      1. the evolution of Locale between Java 1.6 and 1.7
      2. how JOS handles that class evolution
      3. how JSX handled it differently from JOS - that is, the bug
      4. finally, why this problem hasn't come up before in the 10
      commercial years of JSX

      1. Class evolution of Locale:
      Two fields were added to Locale (script and extensions) - but not in
      the way you would expect...

      The old Locale had 4 serializable fields (that is, non-static and
      non-transient).

      But the new Locale doesn't have any such fields! Instead, it defines 6
      serializable fields, like this:
      private static final ObjectStreamField[] serialPersistentFields = {
          new ObjectStreamField("language", String.class),
          new ObjectStreamField("country", String.class),
          new ObjectStreamField("variant", String.class),
          new ObjectStreamField("hashcode", int.class),
          new ObjectStreamField("script", String.class), // <-- new
          new ObjectStreamField("extensions", String.class), // <-- new
      };

      I think of these as "virtual fields" (meaning fake, faux, pretend),
      but that's not an official name. They are serialized by JOS and JSX
      exactly as if they were actual fields, so you can't tell they are
      virtual just by looking at the XML. They enable a class to pretend to
      have different fields. They are a compatibility layer, enabling you to
      change the actual fields of a class, while still making it act as if it
      had the old fields. In other words, the serialized form is a kind of
      public API or interface; these virtual fields give you a loose
      coupling between that public interface and the implementation details.
      It's a kind of "information hiding", as in encapsulation, Abstract
      Data Types, and Parnas' paper on module decomposition.
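
      A minimal sketch of the virtual-field idea, assuming a hypothetical
      class PackedPoint (not from the JDK or JSX). Its serialized form
      declares "x" and "y" through spf, while its only real instance field
      is a transient long. The custom writeObject/readObject pair maps
      between the two, in the same way Locale maps its virtual fields onto
      BaseLocale.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Hypothetical class: the serialized form declares "x" and "y",
// but the only real instance field is a transient packed long.
class PackedPoint implements Serializable {
    private static final long serialVersionUID = 1L;

    // The "virtual" fields that make up the serialized form.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("x", int.class),
        new ObjectStreamField("y", int.class),
    };

    // Actual storage; transient, so it is never serialized directly.
    private transient long packed;

    PackedPoint(int x, int y) { packed = pack(x, y); }

    int x() { return (int) (packed >> 32); }
    int y() { return (int) packed; }

    private static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    // Write the virtual fields by name.
    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("x", x());
        fields.put("y", y());
        out.writeFields();
    }

    // Read the virtual fields by name and rebuild the real storage.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        // The second argument is the default used if the stream lacks the field.
        packed = pack(fields.get("x", 0), fields.get("y", 0));
    }
}

class VirtualFieldsDemo {
    // Serialize to a byte array and read it straight back.
    static PackedPoint roundTrip(PackedPoint p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PackedPoint) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PackedPoint copy = roundTrip(new PackedPoint(3, -4));
        System.out.println(copy.x() + "," + copy.y());
    }
}
```

      The internal representation could later change again (say, to two
      ints) without touching the serialized form, as long as spf and the
      two custom methods keep presenting "x" and "y".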

      [ BTW: So, if Locale lacks these fields, how does it store its
      information? It uses a BaseLocale object (sun.util.locale.BaseLocale)
      in a transient field, so it's not serialized. Being in the "sun"
      hierarchy, I thought source wasn't available, but here's the OpenJDK
      source:
      http://www.docjar.com/html/api/sun/util/locale/BaseLocale.java.html ]

      2. JOS and class evolution
      The problem occurs when XML serialized from the 1.6 Locale, is
      deserialized into the 1.7 Locale.

      Although the new Locale doesn't have any actual instance fields, it
      has specified its virtual fields with spf (above). That is, Locale is
      pretending that it has those 6 fields listed above. Note that these 6
      fields are stating what fields Locale has - not necessarily the fields
      in the serialized data.

      Before looking at evolution, let's consider the simpler case, where
      all 6 expected fields were serialized. How can Locale deserialize
      them, when it doesn't actually have those fields, to deserialize those
      values into? It has a special "readObject" method that JOS calls, in
      which these lines read in the serialized values:
      ObjectInputStream.GetField fields = in.readFields();
      String language = (String)fields.get("language", "");
      String script = (String)fields.get("script", "");
      String country = (String)fields.get("country", "");
      String variant = (String)fields.get("variant", "");
      String extStr = (String)fields.get("extensions", "");

      For example, the first line get()s the "language" value from the
      serialized data, and puts it into a local variable, also named
      "language".

      The second argument of "get()" is a default value, which "get()" will
      return if that field was not present in the serialized data.

      [ BTW: You might notice there are only 5 fields here, not 6 as in the
      spf. Although both new fields are present ("script" and "extensions"),
      "hashcode" is not. This is because the 1.7 Locale doesn't use
      "hashcode", and so just ignores it. This does no harm, because
      "readFields()" has already parsed all the serialized fields into
      "fields" (which acts like a Map). So the get()s are just getting from a
      Map, and not parsing, thus it doesn't matter whether it read all or
      any of the values. (The reason "hashcode" is in spf is so it is
      written out in serialization. This enables old versions of Locale to
      get the data they need - it's maintaining the contract of the public
      API, for back-compatibility). ]

      Now let's consider evolution. In this case, the evolution is that the
      new Locale has two fields ("script" and "extensions") that aren't
      present in the serialized data. It is exactly as you would expect:
      "get()" instead returns the default value of its second argument.

      3. JSX's bug
      JSX implements most of this correctly, so that "get()" will retrieve
      the serialized fields. It also implements the default correctly: if
      the requested field isn't present in the serialized data, the default
      is returned instead. The problem is in an extra check JSX does.

      In this situation, the JOS specification requires that if you attempt
      to "get()" a field that is not in the spf, an exception is thrown.
      Although this is a bit fussy, it does ensure that the spf accurately
      represents the fields that you can read - it enforces the field API.
      JSX also checks this correctly.

      The bug is one step further back: JSX wrongly determines what the
      fields are. From my comments, I thought that all fields in the spf had
      to *also* be actual instance fields - which, having read this far,
      you'll know is incorrect. JSX explicitly tested each virtual
      field in spf, and if it was not an actual field, JSX ignored it.

      So, when the 1.7 Locale tries to "get()" the serialized "script"
      field, JSX sees that it's not present, but before returning the
      default value, it checks that "script" is in spf. But, alas, because
      "script" is not an actual field, it was ignored - and so the check
      fails, and the exception is thrown.
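
      The failure can be modelled in a few lines. This is only a sketch of
      the logic described above, not JSX or JDK source: the class
      FieldLookupSketch, its method names, and the use of plain Sets and
      Maps in place of the real stream machinery are all assumptions made
      for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the lookup behind get(name, default).
// None of these names are taken from JSX or the JDK.
class FieldLookupSketch {
    // Returns the streamed value if present. Otherwise returns the default,
    // but only after checking that the name is a declared field.
    static Object get(Set<String> declared, Map<String, Object> stream,
                      String name, Object defaultValue) {
        if (stream.containsKey(name)) {
            return stream.get(name);
        }
        if (!declared.contains(name)) {
            throw new IllegalArgumentException("no such field: " + name);
        }
        return defaultValue;
    }

    // Buggy rule: keep only the spf names that are also actual instance fields.
    static Set<String> declaredBuggy(List<String> spf, Set<String> actualFields) {
        Set<String> result = new LinkedHashSet<>(spf);
        result.retainAll(actualFields);
        return result;
    }

    // Corrected rule: every name listed in spf counts as a declared field.
    static Set<String> declaredFixed(List<String> spf) {
        return new LinkedHashSet<>(spf);
    }
}
```

      With the six spf names from Locale, no matching actual fields, and a
      stream that holds only the four old fields, asking for "script"
      throws under the buggy rule and returns the default under the
      corrected one. Asking for "language" succeeds under both, which fits
      the observation that only cross-version data triggers the problem.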

      "How could I make such a mistake?" you ask. Well, I think it's because
      spf are also used in another way: in simple default deserialization,
      when you don't have any special custom code, spf is used to determine
      which subset of actual fields have values read into them. It is an
      alternative to transient: instead of marking which fields should not
      be serialized with transient, you specify which fields should be
      serialized. This is convenient if you have many instance fields but
      only a few are serialized. In that case, JSX's check is sensible.
      However, the way JOS seems to work is that it just tries to do simple
      default deserialization. If a field is not present in the serialized
      data, it just ignores the issue. It's the same behaviour as when spf
      is not used, and the evolution is a simple adding of actual fields.
      When trying to deserialize, those fields are not present in the
      serialized data, but JOS just silently moves on, leaving the instance
      field at its default value for its type (e.g. null for reference
      fields; 0 for ints etc).
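
      The "alternative to transient" use of spf can be sketched like this,
      assuming a hypothetical class Account with three real fields, of
      which spf lists only two. No custom writeObject/readObject is
      defined, so plain default serialization applies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

// Hypothetical class: three real fields, but spf selects only two of them.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("owner", String.class),
        new ObjectStreamField("balance", int.class),
    };

    String owner;
    int balance;
    String cachedLabel; // not transient, yet skipped because it is not in spf

    Account(String owner, int balance) {
        this.owner = owner;
        this.balance = balance;
        this.cachedLabel = owner + ":" + balance;
    }
}

class SubsetDemo {
    // Serialize to a byte array and read it straight back.
    static Account roundTrip(Account a) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("ann", 10));
        System.out.println(copy.owner + " " + copy.balance + " " + copy.cachedLabel);
    }
}
```

      After the round trip, "owner" and "balance" carry their values, while
      "cachedLabel" is left at null, the default for a reference field. In
      this usage every spf entry does match an actual field, which is the
      case the extra check was written for.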

      4. Why hasn't this bug come up before?
      It seems to be a rare case. There's no bug for class evolution where
      spf isn't used. And there's no bug for class evolution where spf is
      used, but the fields it specifies have not changed. This is an
      important use of spf: the specification talks about spf being used to
      keep the serial format the same, despite changes in the internal
      implementation. That's also how it is illustrated in the
      specification's example (File.java on the last page). This is
      information hiding, where the public interface of an API doesn't
      change, but the implementation does.

      The bug only occurs when spf is used, and the fields it specifies
      differ from those that were serialized - that is, it's an evolution of
      the "API" or interface that spf defines, rather than only an evolution
      of the internal details. This is reasonable - it's analogous to a
      public API having new methods added to it. It's certainly a valid and
      important use of the JOS specification that JSX should handle; it's
      just that it doesn't seem to be that commonly used.

      Locale in Java 1.7 might be the only class to use spf to evolve its
      fields, and so trigger this bug.

      Note that JSX has no problem deserializing to a 1.7 Locale, when it
      was also serialized by 1.7 Locale - the bug only affects data
      serialized from a previous version of Locale. Thus, transient
      persistence and transmission to another copy of the same version would
      not have triggered this bug.

      Still, JSX has been used commercially for 10 years (since Oct 2002),
      so it does seem a little surprising that it hasn't come up before.