Encoding problem!

  • mattsns
    Message 1 of 5 , May 19, 2004
      Hi!
      I have a problem with the Transfer API. Whatever encoding I specify,
      retrieveDocument always returns a UTF-8 encoded XML document.

      configprops.setProperty("ParserUtilsClass",
      "org.xmlmiddleware.xmlutils.external.ParserUtilsXerces");
      configprops.setProperty("Encoding", "ISO-8859-1");
      ...
      trans = new Transfer(configprops);
      trans.setDatabaseProperties(dbprops);
      xmlString = trans.retrieveDocument(configprops, mapFile.toString(),
      ftrFile.toString(), params);

      I don't have this problem when using the command line Transfer tool.

      Thanks for your help.
    • xmldbms
      Message 2 of 5 , May 22, 2004
        The problem is that you are calling the version of retrieveDocument
        that returns a String. In Java, Strings are always Unicode. Therefore,
        this method ignores the value of the encoding property.

        What you need to do instead is call the version of retrieveDocument
        that writes the XML document to a file. (This explains why Transfer
        works when called from the command line -- it calls the correct
        version of retrieveDocument.)

        Note that writing the String to a file and handling the encoding
        yourself will not work. The problem is that the encoding declaration
        in the XML document will be incorrect.
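
A minimal sketch of this point, using only the Java standard library and no XML-DBMS calls. The class name, method names, and sample document are made up for illustration. A String carries no byte encoding until it is converted to bytes, so if the bytes are written as ISO-8859-1 while the declaration still says UTF-8, a reader that trusts the declaration misdecodes the accented characters:

```java
import java.nio.charset.StandardCharsets;

final class DeclarationMismatchDemo {
    // Hypothetical sample document; its declaration claims UTF-8.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><nom>\u00e9t\u00e9</nom>";

    /** Encodes as ISO-8859-1 but decodes as UTF-8, as a reader trusting the declaration would. */
    static String writeLatin1ReadUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    /** Encodes and decodes with the same charset, so the text survives unchanged. */
    static String roundTripUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Matching encodings: the text is preserved.
        System.out.println(XML.equals(roundTripUtf8(XML)));
        // Mismatched encodings: the accented characters are damaged.
        System.out.println(XML.equals(writeLatin1ReadUtf8(XML)));
    }
}
```

The same reasoning applies in the other direction: the bytes and the encoding declaration have to agree, whichever encoding is chosen.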

        -- Ron

      • mattsns
        Message 3 of 5 , May 25, 2004
          Actually, I don't really care how things are encoded, since the
          retrieved XML data are used directly in a Java GUI (a DOM tree is
          constructed from the XML and then mapped to a JTree). The problem is
          that the data are in French, with characters such as éèà, and those
          characters always come out badly encoded. According to the encoding
          declaration returned by the retrieveDocument method, the String is
          UTF-8, which should handle accents and special characters.

          I've tried converting the String to various encodings, but without
          success; the characters seem to be corrupted from the start.

          Could it come from the database itself? It is an MS Access 97 database,
          but if it works when transferring to a file, it should also work with
          a String, right? Moreover, it works when inserting a document.

          I think Java stores Strings in a slightly different Unicode form than
          the standard one. Could the problem come from there?

          Finally, I can add that the String is returned via RMI, but I don't
          think that is the problem, as it worked the same (bad) way before I
          used RMI.

          Thanx,
          Matt's


        • mattsns
          Message 4 of 5 , May 25, 2004
            I've solved my encoding problem.

            Here's what I've done:

            - As Ron suggested, I use the retrieveDocument(toFile) method instead
            of retrieveDocument(toString). I've got a temporary file with the
            right encoding.
            - I read this temporary file to build a nicely-encoded String.
            - I delete the temporary file.
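
The read-and-delete part of these steps can be sketched as follows, using only the Java standard library. The class and method names are hypothetical, and the file is written by hand here as a stand-in for the one retrieveDocument would produce. The charset passed in is assumed to match the encoding the file was written with:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileRoundTrip {
    /** Reads the whole file, decoding its bytes with the given charset, then deletes it. */
    static String readAndDelete(Path file, Charset charset) throws IOException {
        try {
            return new String(Files.readAllBytes(file), charset);
        } finally {
            // Remove the temporary file even if reading failed.
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file that the file-writing retrieveDocument would create.
        Path tmp = Files.createTempFile("xmldbms", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<nom>\u00e9\u00e8\u00e0</nom>";
        Files.write(tmp, xml.getBytes(StandardCharsets.ISO_8859_1));

        String back = readAndDelete(tmp, StandardCharsets.ISO_8859_1);
        System.out.println(back.equals(xml));
    }
}
```

If the file is only needed to build a DOM tree, handing it straight to an XML parser avoids the manual decoding, since the parser reads the encoding declaration itself.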

            But this method introduced a new problem: because the file is
            indented, I got many whitespace-only text nodes (text nodes with no
            name that contain only whitespace characters). To avoid this, I set
            outputFormat.setIndenting to false in the ParserUtilsXerces file, as
            described in this post:
            http://groups.yahoo.com/group/xml-dbms/message/3466
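
An alternative that leaves ParserUtilsXerces untouched is to drop the whitespace-only text nodes after parsing. This is a minimal sketch using the standard JAXP and DOM APIs; the class and method names are hypothetical, and it assumes whitespace-only text is never significant in the documents concerned:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class WhitespaceStripper {
    /** Recursively removes text nodes that contain only whitespace. */
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    /** Parses an XML string that has no encoding declaration, supplied as UTF-8 bytes. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Indented input, as a pretty-printing serializer would write it.
        Document doc = parse("<a>\n  <b>\u00e9t\u00e9</b>\n</a>");
        strip(doc);
        // Number of child nodes left under the root element.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```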

            It's a bit silly to write a file only to read and delete it, but it's
            the only working solution I see so far.

            What I still don't understand is why there was a problem in the
            first place, since UTF-8 should handle accented characters.

            Matt's

          • rpbourret@rpbourret.com
            Message 5 of 5 , May 25, 2004
              1) I don't know why UTF-8 would have problems with accented characters.

              Access stores text as single byte characters, so it should be able to handle
              accented characters. However, it appears you need to set the code page for
              Access through Tools/Options/General tab/Sort order field. (Not sure if this is
              exactly how it is done -- the Access documentation is not good.) You should make
              sure you can store/receive accented characters without using XML-DBMS just to
              verify this.

              I doubt RMI would cause this problem. Since RMI is designed to work with Java
              and Java is designed to use Unicode, it would be a very big design error if RMI
              couldn't handle accented characters.

              2) There is a better solution for your situation. Hopefully, it will solve the
              encoding problem as well. Since you read the XML as a DOM tree, then convert to
              a JTree, it is better for XML-DBMS to return a DOM tree to you directly. This
              means you won't have to write the XML to a file or reparse the XML from the file.
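
The DOM-to-JTree step on the application side might look like the sketch below. It uses only standard DOM and Swing classes and does not show the XML-DBMS call itself; the class name, method name, and hand-built sample document are hypothetical stand-ins for whatever Document the application actually receives:

```java
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class DomToTree {
    /** Builds a Swing tree node for a DOM node and, recursively, for its children. */
    static DefaultMutableTreeNode toTreeNode(Node node) {
        // Text nodes are labelled with their content, other nodes with their name.
        String label = node.getNodeType() == Node.TEXT_NODE
                ? node.getNodeValue()
                : node.getNodeName();
        DefaultMutableTreeNode treeNode = new DefaultMutableTreeNode(label);
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            treeNode.add(toTreeNode(c));
        }
        return treeNode;
    }

    public static void main(String[] args) throws Exception {
        // Hand-built stand-in for a Document obtained without any file round trip.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Node root = doc.appendChild(doc.createElement("ville"));
        root.appendChild(doc.createTextNode("Orl\u00e9ans"));

        DefaultMutableTreeNode top = toTreeNode(doc.getDocumentElement());
        System.out.println(top.getChildCount());
        // In a GUI, new javax.swing.JTree(top) would display this tree.
    }
}
```

Because the text never leaves Java Strings in this path, no byte encoding is involved between the DOM tree and the JTree.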

              To get a DOM tree from XML-DBMS, you will need to call DBMSToDOM directly,
              rather than having Transfer call it. For information on how to do this, see:

              a) The readme file.

              b) The JavaDocs for DBMSToDOM, FilterCompiler, etc.

              c) The code in Transfer. Most of the code in Transfer is pretty generic, but you
              should be able to pull out the pieces you need to build your application.

              -- Ron
