Loading ...
Sorry, an error occurred while loading the content.

Possible memory leak in EncodingFactory ?

Expand Messages
  • chiss14
    Hello everyone, I ran into a problem using JayBird v2.0.1 with the embedded version of FireBird. It could very well be that I m not using the right technique
    Message 1 of 23 , Jun 2, 2006
    • 0 Attachment
      Hello everyone,

      I ran into a problem using JayBird v2.0.1 with the embedded version
      of FireBird. It could very well be that I'm not using the right
      technique to query the db, I'm still quite new with this.

      Anyway, a part of my application was doing a lot of plain SQL select
      and I was always running out of memory.

      Using NetBeans's profiler, I found that there was close to 7000
      instances of Encoding_cp1252 ... So I went to get the source code,
      found the beta 2.1.0 and traced my problem to the EncodingFactory
      class. It seems like its always returning new instances of
      Encoding_xxxx (which I guess in normal circumstances are getting
      cleaned up by the GC, but this wasn't happening in mine ;-) ).

      So to make a long story short, I looked what the Encoding_xxx classes
      were doing, assumed they only perform "translation" work, i.e. could be
      reused easily. So I did the following quick hack :

      private static HashMap encodingCache = new HashMap();

      public static Encoding getEncoding(String encoding){

      if (encoding == null) encoding = defaultEncoding;

      if (!encodingCache.containsKey(encoding))
      encodingCache.put(encoding,createEncoding(encoding));

      return (Encoding)encodingCache.get(encoding);

      //return createEncoding(encoding); }



      ...and this works like a charm! My (quite limited) tests looked ok, no
      adverse effects were observed. My memory usage is down (like, waaaaay
      down), which is all I wanted. Maybe this cache should go in a HashTable
      instead of an HashMap in a multithreaded context, I just don't know
      enough about JayBird's code.

      Any thoughts ?

      Francis



      [Non-text portions of this message have been removed]
    • Rick Debay
      Encoding_OneByte would have to be changed, probably for the better. Buffers are created that would be stepped on in a multithreaded environment, and they
      Message 2 of 23 , Jun 5, 2006
      • 0 Attachment
        Encoding_OneByte would have to be changed, probably for the better. Buffers are created that would be stepped on in a multithreaded environment, and they aren't needed. The method encodeToCharset(String) should be doing the work in the result buffer that's created, and decodeFromCharset(byte[]) should just create one on the stack.

        Oh, and the quick hack would need to be cleaned up:

        private static Map encodingCache = new HashMap();
        public static syncronized Encoding getEncoding(String encoding)
        {
        Encoding e;

        if ( encoding == null )
        {
        encoding = defaultEncoding;
        }

        e = encodingCache.get(encoding);
        if ( e == null )
        {
        e = createEncoding(encoding);
        encodingCache.put(encoding, e);
        }
        return e;

        //return createEncoding(encoding);
        }

        -----Original Message-----
        From: Firebird-Java@yahoogroups.com [mailto:Firebird-Java@yahoogroups.com] On Behalf Of chiss14
        Sent: Friday, June 02, 2006 3:26 PM
        To: Firebird-Java@yahoogroups.com
        Subject: [Firebird-Java] Possible memory leak in EncodingFactory ?


        Hello everyone,

        I ran into a problem using JayBird v2.0.1 with the embedded version of FireBird. It could very well be that I'm not using the right technique to query the db, I'm still quite new with this.

        Anyway, a part of my application was doing a lot of plain SQL select and I was always running out of memory.

        Using NetBeans's profiler, I found that there was close to 7000 instances of Encoding_cp1252 ... So I went to get the source code, found the beta 2.1.0 and traced my problem to the EncodingFactory class. It seems like its always returning new instances of Encoding_xxxx (which I guess in normal circumstances are getting cleaned up by the GC, but this wasn't happening in mine ;-) ).

        So to make a long story short, I looked what the Encoding_xxx classes were doing, assumed they only perform "translation" work, i.e. could be reused easily. So I did the following quick hack :

        private static HashMap encodingCache = new HashMap();

        public static Encoding getEncoding(String encoding){

        if (encoding == null) encoding = defaultEncoding;

        if (!encodingCache.containsKey(encoding))
        encodingCache.put(encoding,createEncoding(encoding));

        return (Encoding)encodingCache.get(encoding);

        //return createEncoding(encoding); }



        ...and this works like a charm! My (quite limited) tests looked ok, no adverse effects were observed. My memory usage is down (like, waaaaay down), which is all I wanted. Maybe this cache should go in a HashTable instead of an HashMap in a multithreaded context, I just don't know enough about JayBird's code.

        Any thoughts ?

        Francis



        [Non-text portions of this message have been removed]





        ------------------------ Yahoo! Groups Sponsor --------------------~--> Home is just a click away.  Make Yahoo! your home page now.
        http://us.click.yahoo.com/DHchtC/3FxNAA/yQLSAA/saFolB/TM
        --------------------------------------------------------------------~->


        Yahoo! Groups Links
      • Roman Rokytskyy
        Hi, ... Sorry for not replying with more precise information, but I remember that there was an attempt to add caching to the EncodingFactory, but the price for
        Message 3 of 23 , Jun 5, 2006
        • 0 Attachment
          Hi,

          > Encoding_OneByte would have to be changed, probably for the better.
          > Buffers are created that would be stepped on in a multithreaded
          > environment, and they aren't needed. The method encodeToCharset(String)
          > should be doing the work in the result buffer that's created, and
          > decodeFromCharset(byte[]) should just create one on the stack.

          Sorry for not replying with more precise information, but I remember that
          there was an attempt to add caching to the EncodingFactory, but the price
          for it was either instable or slow work in multithreaded env. and the fix
          was rolled back.

          Please check the CVS history for more details.

          Roman
        • Rick Debay
          Well, the memory problem could also be solved by removing the buffers (bufferB byte[128] and bufferC char[128]) from Encoding_OneByte. None of the classes in
          Message 4 of 23 , Jun 5, 2006
          • 0 Attachment
            Well, the memory problem could also be solved by removing the buffers (bufferB byte[128] and bufferC char[128]) from Encoding_OneByte.

            None of the classes in the hierarchy have a constructor, so the only overhead is standard object creation. So the question is if Concurrent.synchronizedMap's get() is more expensive than creation.
            Since there is no difference between the objects, there is no need to synchronize the whole function. If N threads all race to create a Cp1252 object, it matters not a bit which one ultimately puts in in the Map.
            Heck, then you could use Class.forName to create the encoder, and no one would need to maintain that huge switch statement.

            BTW, I don't have CVS access, I was using Koders.com to look at the code.

            -----Original Message-----
            From: Firebird-Java@yahoogroups.com [mailto:Firebird-Java@yahoogroups.com] On Behalf Of Roman Rokytskyy
            Sent: Monday, June 05, 2006 3:13 PM
            To: Firebird-Java@yahoogroups.com
            Subject: Re: [Firebird-Java] Possible memory leak in EncodingFactory ?

            Hi,

            > Encoding_OneByte would have to be changed, probably for the better.
            > Buffers are created that would be stepped on in a multithreaded
            > environment, and they aren't needed. The method
            > encodeToCharset(String) should be doing the work in the result buffer
            > that's created, and
            > decodeFromCharset(byte[]) should just create one on the stack.

            Sorry for not replying with more precise information, but I remember that there was an attempt to add caching to the EncodingFactory, but the price for it was either instable or slow work in multithreaded env. and the fix was rolled back.

            Please check the CVS history for more details.

            Roman



            ------------------------ Yahoo! Groups Sponsor --------------------~--> Everything you need is one click away.  Make Yahoo! your home page now.
            http://us.click.yahoo.com/AHchtC/4FxNAA/yQLSAA/saFolB/TM
            --------------------------------------------------------------------~->


            Yahoo! Groups Links
          • Roman Rokytskyy
            ... You can always use the viewcvs service from SourceForge.net: See the comment to the version 1.10 - removed encoder caching as it leads to stream
            Message 5 of 23 , Jun 5, 2006
            • 0 Attachment
              > Well, the memory problem could also be solved by removing the buffers
              > (bufferB byte[128] and bufferC char[128]) from Encoding_OneByte.
              >
              > None of the classes in the hierarchy have a constructor, so the only
              > overhead is standard object creation. So the question is if
              > Concurrent.synchronizedMap's get() is more expensive than creation. Since
              > there is no difference between the objects, there is no need to
              > synchronize the whole function. If N threads all race to create a Cp1252
              > object, it matters not a bit which one ultimately puts in in the Map.
              > Heck, then you could use Class.forName to create the encoder, and no one
              > would need to maintain that huge switch statement.
              >
              > BTW, I don't have CVS access, I was using Koders.com to look at the code.

              You can always use the viewcvs service from SourceForge.net:


              See the comment to the version 1.10 - "removed encoder caching as it leads
              to stream corruption". The reason for this is the way bufferB and bufferC
              are used - they are used to keep intermediate results of computation. This
              prevents caching of encoders, since each object has a state. I think this
              can be improved, but did not check further yet.

              If someone comes with a patch that solves this problem in an efficient
              manner - I will be more than grateful for the fix.

              Roman
            • Roman Rokytskyy
              (Sorry for repost - forgot the URL) ... You can always use the viewcvs service from SourceForge.net: http://firebird.cvs.sourceforge.net/firebird/client-java/
              Message 6 of 23 , Jun 5, 2006
              • 0 Attachment
                (Sorry for repost - forgot the URL)

                > Well, the memory problem could also be solved by removing the buffers
                > (bufferB byte[128] and bufferC char[128]) from Encoding_OneByte.
                >
                > None of the classes in the hierarchy have a constructor, so the only
                > overhead is standard object creation. So the question is if
                > Concurrent.synchronizedMap's get() is more expensive than creation. Since
                > there is no difference between the objects, there is no need to
                > synchronize the whole function. If N threads all race to create a Cp1252
                > object, it matters not a bit which one ultimately puts in in the Map.
                > Heck, then you could use Class.forName to create the encoder, and no one
                > would need to maintain that huge switch statement.
                >
                > BTW, I don't have CVS access, I was using Koders.com to look at the code.

                You can always use the viewcvs service from SourceForge.net:

                http://firebird.cvs.sourceforge.net/firebird/client-java/

                See the comment to the version 1.10 - "removed encoder caching as it leads
                to stream corruption". The reason for this is the way bufferB and bufferC
                are used - they are used to keep intermediate results of computation. This
                prevents caching of encoders, since each object has a state. I think this
                can be improved, but did not check further yet.

                If someone comes with a patch that solves this problem in an efficient
                manner - I will be more than grateful for the fix.

                Roman
              • Rick Debay
                The buffers bufferB and bufferC are not used to store intermediate results. They are used as a work space for the encoding. In a previous post I suggested
                Message 7 of 23 , Jun 6, 2006
                • 0 Attachment
                  The buffers bufferB and bufferC are not used to store intermediate
                  results. They are used as a work space for the encoding.
                  In a previous post I suggested that bufferB be removed and the new
                  result array be used as the working space, and bufferC be allocated off
                  of the stack.
                  Getting rid of these instance variables makes the class follow the
                  flyweight pattern, so caching becomes a possibility (if testing supports
                  it).

                  // original code
                  public byte[] encodeToCharset(String str) {
                  if (bufferB.length < str.length())
                  bufferB = new byte[str.length()];
                  int length = encodeToCharset(str.toCharArray(), 0, str.length(),
                  bufferB);
                  byte[] result = new byte[length];
                  System.arraycopy(bufferB, 0, result, 0, length);
                  return result;
                  }

                  // simple change
                  public byte[] encodeToCharset(String str)
                  {
                  byte[] result = new byte[str.length];
                  /*int length =*/ encodeToCharset(str.toCharArray(), 0, str.length(),
                  result);
                  return result;
                  }

                  // original code
                  public String decodeFromCharset(byte[] in){
                  if (bufferC.length < in.length)
                  bufferC = new char[in.length];
                  int length = decodeFromCharset(in, 0, in.length, bufferC);
                  return new String(bufferC, 0, length);
                  }

                  // simple change
                  public String decodeFromCharset(byte[] in)
                  {
                  byte[] bufferC = new char[in.length];
                  /*int length =*/ decodeFromCharset(in, 0, in.length, bufferC);
                  return new String(bufferC, 0, length);
                  }


                  -----Original Message-----
                  From: Firebird-Java@yahoogroups.com
                  [mailto:Firebird-Java@yahoogroups.com] On Behalf Of Roman Rokytskyy
                  Sent: Tuesday, June 06, 2006 2:45 AM
                  To: Firebird-Java@yahoogroups.com
                  Subject: Re: [Firebird-Java] Possible memory leak in EncodingFactory ?

                  (Sorry for repost - forgot the URL)

                  > Well, the memory problem could also be solved by removing the buffers
                  > (bufferB byte[128] and bufferC char[128]) from Encoding_OneByte.
                  >
                  > None of the classes in the hierarchy have a constructor, so the only
                  > overhead is standard object creation. So the question is if
                  > Concurrent.synchronizedMap's get() is more expensive than creation.
                  > Since there is no difference between the objects, there is no need to
                  > synchronize the whole function. If N threads all race to create a
                  > Cp1252 object, it matters not a bit which one ultimately puts in in
                  the Map.
                  > Heck, then you could use Class.forName to create the encoder, and no
                  > one would need to maintain that huge switch statement.
                  >
                  > BTW, I don't have CVS access, I was using Koders.com to look at the
                  code.

                  You can always use the viewcvs service from SourceForge.net:

                  http://firebird.cvs.sourceforge.net/firebird/client-java/

                  See the comment to the version 1.10 - "removed encoder caching as it
                  leads to stream corruption". The reason for this is the way bufferB and
                  bufferC are used - they are used to keep intermediate results of
                  computation. This prevents caching of encoders, since each object has a
                  state. I think this can be improved, but did not check further yet.

                  If someone comes with a patch that solves this problem in an efficient
                  manner - I will be more than grateful for the fix.

                  Roman
                • Roman Rokytskyy
                  ... I will apply these changes to the CVS and test it with our AS3AP benchmark. Thanks! Roman
                  Message 8 of 23 , Jun 6, 2006
                  • 0 Attachment
                    > The buffers bufferB and bufferC are not used to store intermediate
                    > results. They are used as a work space for the encoding.
                    > In a previous post I suggested that bufferB be removed and the new
                    > result array be used as the working space, and bufferC be allocated off
                    > of the stack.
                    > Getting rid of these instance variables makes the class follow the
                    > flyweight pattern, so caching becomes a possibility (if testing supports
                    > it).

                    I will apply these changes to the CVS and test it with our AS3AP benchmark.

                    Thanks!
                    Roman
                  • Mathew Joseph
                    Hi There, I have been writing a program to read mails and store the content into a firebird database. It consumes a large amount of heap memory and the tool
                    Message 9 of 23 , Jun 14, 2006
                    • 0 Attachment
                      Hi There,
                      I have been writing a program to read mails and store the content
                      into a firebird database. It consumes a large amount of heap memory
                      and the tool that i am using shows me memory leak problems with the
                      Encoding_NotOnebyte.encodeTocharset function. I came across this
                      response from you and was wondering whether you have put changes into
                      the CVS regarding this problem.
                      Regds
                      Mathew


                      --- In Firebird-Java@yahoogroups.com, "Roman Rokytskyy"
                      <rrokytskyy@...> wrote:
                      >
                      > > The buffers bufferB and bufferC are not used to store intermediate
                      > > results. They are used as a work space for the encoding.
                      > > In a previous post I suggested that bufferB be removed and the new
                      > > result array be used as the working space, and bufferC be
                      allocated off
                      > > of the stack.
                      > > Getting rid of these instance variables makes the class follow the
                      > > flyweight pattern, so caching becomes a possibility (if testing
                      supports
                      > > it).
                      >
                      > I will apply these changes to the CVS and test it with our AS3AP
                      benchmark.
                      >
                      > Thanks!
                      > Roman
                      >
                    • Rick Debay
                      Here is the code you are writing about: No chance of a leak in this code, unless String.getBytes() allocates off of the heap and your garbage collection thread
                      Message 10 of 23 , Jun 15, 2006
                      • 0 Attachment
                        Here is the code you are writing about:

                        No chance of a leak in this code, unless String.getBytes() allocates off
                        of the heap and your garbage collection thread is falling behind.

                        public byte[] encodeToCharset(String in){
                        byte[] result = null;
                        try {
                        result = in.getBytes(encoding);
                        }
                        catch (UnsupportedEncodingException uee){
                        }
                        return result;
                        }

                        If the garbage collection thread falls behind, the new String() may
                        cause heap memory to grow.

                        public int encodeToCharset(char[] in, int off, int len, byte[] out){
                        byte[] by = null;
                        try {
                        by = new String(in).getBytes(encoding);
                        System.arraycopy(by, 0, out, 0, by.length);
                        }
                        catch (UnsupportedEncodingException uee){
                        }
                        return by.length;
                        }

                        If caching is ever implemented, the equals() and hash() methods could be
                        added to Encoding_NotOneByte and a new one would not need to be
                        generated every time a column needs to be encoded or decoded. I'd also
                        make the encoding final and add a getEncoding() method just because I'm
                        anal about stuff like that. However, it would only reduce your problem
                        slightly as the member variable that contains the charset name is very
                        small compared to the text you're encoding.

                        What charset are you encoding from/to? Is it a multi-byte character
                        set?

                        BTW, does anyone know how Jaybird is supposed to behave if an
                        UnsupportedEncodingException occurs? As far as I can tell, it'll
                        silently send an empty buffer to the database.

                        Rick DeBay

                        -----Original Message-----
                        From: Firebird-Java@yahoogroups.com
                        [mailto:Firebird-Java@yahoogroups.com] On Behalf Of Mathew Joseph
                        Sent: Thursday, June 15, 2006 12:36 AM
                        To: Firebird-Java@yahoogroups.com
                        Subject: [Firebird-Java] Re: Possible memory leak in EncodingFactory ?

                        Hi There,
                        I have been writing a program to read mails and store the content into
                        a firebird database. It consumes a large amount of heap memory and the
                        tool that i am using shows me memory leak problems with the
                        Encoding_NotOnebyte.encodeTocharset function. I came across this
                        response from you and was wondering whether you have put changes into
                        the CVS regarding this problem.
                        Regds
                        Mathew
                      • Rick Debay
                        Francis, what JVM are you using? ... From: Firebird-Java@yahoogroups.com [mailto:Firebird-Java@yahoogroups.com] On Behalf Of chiss14 Sent: Friday, June 02,
                        Message 11 of 23 , Jun 15, 2006
                        • 0 Attachment
                          Francis, what JVM are you using?

                          -----Original Message-----
                          From: Firebird-Java@yahoogroups.com [mailto:Firebird-Java@yahoogroups.com] On Behalf Of chiss14
                          Sent: Friday, June 02, 2006 3:26 PM
                          To: Firebird-Java@yahoogroups.com
                          Subject: [Firebird-Java] Possible memory leak in EncodingFactory ?


                          Hello everyone,

                          I ran into a problem using JayBird v2.0.1 with the embedded version of FireBird. It could very well be that I'm not using the right technique to query the db, I'm still quite new with this.

                          Anyway, a part of my application was doing a lot of plain SQL select and I was always running out of memory.

                          Using NetBeans's profiler, I found that there was close to 7000 instances of Encoding_cp1252 ... So I went to get the source code, found the beta 2.1.0 and traced my problem to the EncodingFactory class. It seems like its always returning new instances of Encoding_xxxx (which I guess in normal circumstances are getting cleaned up by the GC, but this wasn't happening in mine ;-) ).

                          So to make a long story short, I looked what the Encoding_xxx classes were doing, assumed they only perform "translation" work, i.e. could be reused easily. So I did the following quick hack :

                          private static HashMap encodingCache = new HashMap();

                          public static Encoding getEncoding(String encoding){

                          if (encoding == null) encoding = defaultEncoding;

                          if (!encodingCache.containsKey(encoding))
                          encodingCache.put(encoding,createEncoding(encoding));

                          return (Encoding)encodingCache.get(encoding);

                          //return createEncoding(encoding); }



                          ...and this works like a charm! My (quite limited) tests looked ok, no adverse effects were observed. My memory usage is down (like, waaaaay down), which is all I wanted. Maybe this cache should go in a HashTable instead of an HashMap in a multithreaded context, I just don't know enough about JayBird's code.

                          Any thoughts ?

                          Francis



                          [Non-text portions of this message have been removed]





                          ------------------------ Yahoo! Groups Sponsor --------------------~--> Home is just a click away.  Make Yahoo! your home page now.
                          http://us.click.yahoo.com/DHchtC/3FxNAA/yQLSAA/saFolB/TM
                          --------------------------------------------------------------------~->


                          Yahoo! Groups Links
                        • Roman Rokytskyy
                          ... Sorry for a late reply - was bit busy at work. I have added your code and tested it with AS3AP single user tests. It shows slight degradation of the
                          Message 12 of 23 , Jun 19, 2006
                          • 0 Attachment
                            > I will apply these changes to the CVS and test it with our AS3AP
                            > benchmark.

                            Sorry for a late reply - was bit busy at work.

                            I have added your code and tested it with AS3AP single user tests. It shows
                            slight degradation of the performance (before it took ~34 sec. for the test,
                            now it is ~38 sec).

                            However, since there is possible advantage to the people that prefer smaller
                            memory footprint compared to slightly better performance, I have added a
                            system property "jaybird.encoding.cache" (not connection property, but the
                            one that can be passed using -Djaybird.encoding.cache=true to JVM at
                            startup).

                            This week I will publish Release Candidate version of Jaybird 2.1, so
                            interested parties can check the solution.

                            Roman
                          • Rick Debay
                            Can you send me a diff? I m curious how removing the working buffer slowed it down that much, primitive memory allocation isn t that bad in Java. Also it
                            Message 13 of 23 , Jun 20, 2006
                            • 0 Attachment
                              Can you send me a diff? I'm curious how removing the working buffer
                              slowed it down that much, primitive memory allocation isn't that bad in
                              Java. Also it should have removed the need for one buffer. What JVM
                              were you using?

                              -----Original Message-----
                              From: Firebird-Java@yahoogroups.com
                              [mailto:Firebird-Java@yahoogroups.com] On Behalf Of Roman Rokytskyy
                              Sent: Tuesday, June 20, 2006 2:39 AM
                              To: Firebird-Java@yahoogroups.com
                              Subject: Re: [Firebird-Java] Possible memory leak in EncodingFactory ?

                              > I will apply these changes to the CVS and test it with our AS3AP
                              > benchmark.

                              Sorry for a late reply - was bit busy at work.

                              I have added your code and tested it with AS3AP single user tests. It
                              shows slight degradation of the performance (before it took ~34 sec. for
                              the test, now it is ~38 sec).

                              However, since there is possible advantage to the people that prefer
                              smaller memory footprint compared to slightly better performance, I have
                              added a system property "jaybird.encoding.cache" (not connection
                              property, but the one that can be passed using
                              -Djaybird.encoding.cache=true to JVM at startup).

                              This week I will publish Release Candidate version of Jaybird 2.1, so
                              interested parties can check the solution.

                              Roman



                              ------------------------ Yahoo! Groups Sponsor --------------------~-->
                              Check out the new improvements in Yahoo! Groups email.
                              http://us.click.yahoo.com/6pRQfA/fOaOAA/yQLSAA/saFolB/TM
                              --------------------------------------------------------------------~->


                              Yahoo! Groups Links
                            • Roman Rokytskyy
                              ... That s what I thought some time ago... I have managed to slow down Jaybird twice by doing new byte[16384] in each operation (I wanted to ensure that the
                              Message 14 of 23 , Jun 20, 2006
                              • 0 Attachment
                                > Can you send me a diff? I'm curious how removing the working buffer
                                > slowed it down that much, primitive memory allocation isn't that bad in
                                > Java.

                                That's what I thought some time ago... I have managed to slow down Jaybird
                                twice by doing new byte[16384] in each operation (I wanted to ensure that
                                the communication buffer is not overwritten - I was looking for a bug and
                                that was the easiest solution to that problem). It took me few days in
                                profiler until I found the reason.

                                > Also it should have removed the need for one buffer. What JVM were you
                                > using?

                                Sun 1.4.2 - the one I use to build official releases.

                                The other thing wonders me more - the Encoding classes were not cached in
                                previous case, so I doubt that it reused the buffer a lot. It looks like
                                memory allocation during class instantiation (or with the constant length)
                                is less expensive than dynamic memory allocation...


                                The diff:

                                Index: EncodingFactory.java
                                ===================================================================
                                RCS file:
                                /cvsroot/firebird/client-java/src/main/org/firebirdsql/encodings/EncodingFactory.java,v
                                retrieving revision 1.13
                                retrieving revision 1.14
                                diff -r1.13 -r1.14
                                27a28,29
                                > public static final boolean USE_ENCODING_CACHING =
                                > Boolean.getBoolean("jaybird.encoding.cache");
                                >
                                116a119
                                > private static HashMap mainCache = new HashMap();
                                117a121,132
                                > if (USE_ENCODING_CACHING) {
                                > Encoding result = (Encoding)mainCache.get(encoding);
                                > if (result == null) {
                                > result = createEncodingInternal(encoding);
                                > mainCache.put(encoding, result);
                                > }
                                > return result;
                                > } else
                                > return createEncodingInternal(encoding);
                                > }
                                >
                                > public static Encoding createEncodingInternal(String encoding) {
                                214a230,231
                                > private static HashMap translatorCache = new HashMap();
                                >
                                215a233,245
                                > if (USE_ENCODING_CACHING) {
                                > Encoding result = (Encoding)translatorCache.get(encoding);
                                >
                                > if (result == null) {
                                > result = getEncodingInternal(encoding, charMapping);
                                > translatorCache.put(encoding, charMapping);
                                > }
                                >
                                > return result;
                                > } else
                                > return getEncodingInternal(encoding, charMapping);
                                > }
                                > public static Encoding getEncodingInternal(String encoding, char[]
                                > charMapping){
                                Index: Encoding_OneByte.java
                                ===================================================================
                                RCS file:
                                /cvsroot/firebird/client-java/src/main/org/firebirdsql/encodings/Encoding_OneByte.java,v
                                retrieving revision 1.4
                                retrieving revision 1.5
                                diff -r1.4 -r1.5
                                23a24,26
                                > * Revision 1.5 2006/06/20 06:34:00 rrokytskyy
                                > * added encoding caching that can be enabled via jaybird.encoding.cache
                                > property
                                > *
                                65,66c68,69
                                < byte[] bufferB = new byte[128];
                                < char[] bufferC = new char[128];
                                ---
                                > byte[] sharedBufferB = new byte[128];
                                > char[] sharedBufferC = new char[128];
                                70,75c73,86
                                < if (bufferB.length < str.length())
                                < bufferB = new byte[str.length()];
                                < int length = encodeToCharset(str.toCharArray(), 0, str.length(),
                                bufferB);
                                < byte[] result = new byte[length];
                                < System.arraycopy(bufferB, 0, result, 0, length);
                                < return result;
                                ---
                                > if (EncodingFactory.USE_ENCODING_CACHING) {
                                > byte[] result = new byte[str.length()];
                                > encodeToCharset(str.toCharArray(), 0, str.length(), result);
                                > return result;
                                > } else {
                                > if (sharedBufferB.length < str.length())
                                > sharedBufferB = new byte[str.length()];
                                >
                                > int length = encodeToCharset(str.toCharArray(), 0,
                                > str.length(), sharedBufferB);
                                >
                                > byte[] result = new byte[length];
                                > System.arraycopy(sharedBufferB, 0, result, 0, length);
                                > return result;
                                > }
                                88,91c99,108
                                < if (bufferC.length < in.length)
                                < bufferC = new char[in.length];
                                < int length = decodeFromCharset(in, 0, in.length, bufferC);
                                < return new String(bufferC, 0, length);
                                ---
                                > if (EncodingFactory.USE_ENCODING_CACHING) {
                                > char[] bufferC = new char[in.length];
                                > int length = decodeFromCharset(in, 0, in.length, bufferC);
                                > return new String(bufferC, 0, length);
                                > } else {
                                > if (sharedBufferC.length < in.length)
                                > sharedBufferC = new char[in.length];
                                > int length = decodeFromCharset(in, 0, in.length,
                                > sharedBufferC);
                                > return new String(sharedBufferC, 0, length);
                                > }

                                Roman
                              • Rick Debay
                                What happens when you roll back to the original EncodingFactory.java? The first change to be tested should be in Encoding_OneByte. I doubt pooling using a
                                Message 15 of 23 , Jun 20, 2006
                                • 0 Attachment
                                  What happens when you roll back to the original EncodingFactory.java?
                                  The first change to be tested should be in Encoding_OneByte. I doubt
                                  pooling using a HashMap would help much (so I say after rereading my
                                  original argument).

                                  The new encodeToCharset has less code than the original, and the same
                                  dynamic allocation of byte[]. The new decodeFromCharset has slightly
                                  less code, but one dynamic allocation of char[], which would occur in
                                  the original whenever more than 128 one byte characters were passed.

                                  So I reason the slow-down must be in the pooling. It would get even
                                  slower when changed to a SynchronizedHashMap or ConcurrentHashMap.
                                  Let's test with just the memory reduction for now and see what those
                                  numbers generate.

                                  Once my home computer is working better than it is now, I promise to
                                  contribute patches directly :-)

                                  -----Original Message-----
                                  From: Firebird-Java@yahoogroups.com
                                  [mailto:Firebird-Java@yahoogroups.com] On Behalf Of Roman Rokytskyy
                                  Sent: Tuesday, June 20, 2006 12:16 PM
                                  To: Firebird-Java@yahoogroups.com
                                  Subject: Re: [Firebird-Java] Possible memory leak in EncodingFactory ?

                                  > Can you send me a diff? I'm curious how removing the working buffer
                                  > slowed it down that much, primitive memory allocation isn't that bad
                                  > in Java.

                                  That's what I thought some time ago... I have managed to slow down
                                  Jaybird twice by doing new byte[16384] in each operation (I wanted to
                                  ensure that the communication buffer is not overwritten - I was looking
                                  for a bug and that was the easiest solution to that problem). It took me
                                  few days in profiler until I found the reason.

                                  > Also it should have removed the need for one buffer. What JVM were
                                  > you using?

                                  Sun 1.4.2 - the one I use to build official releases.

                                  The other thing wonders me more - the Encoding classes were not cached
                                  in previous case, so I doubt that it reused the buffer a lot. It looks
                                  like memory allocation during class instantiation (or with the constant
                                  length) is less expensive than dynamic memory allocation...


                                  The diff:

                                  Index: EncodingFactory.java
                                  ===================================================================
                                  RCS file:
                                  /cvsroot/firebird/client-java/src/main/org/firebirdsql/encodings/Encodin
                                  gFactory.java,v
                                  retrieving revision 1.13
                                  retrieving revision 1.14
                                  diff -r1.13 -r1.14
                                  27a28,29
                                  > public static final boolean USE_ENCODING_CACHING =
                                  > Boolean.getBoolean("jaybird.encoding.cache");
                                  >
                                  116a119
                                  > private static HashMap mainCache = new HashMap();
                                  117a121,132
                                  > if (USE_ENCODING_CACHING) {
                                  > Encoding result = (Encoding)mainCache.get(encoding);
                                  > if (result == null) {
                                  > result = createEncodingInternal(encoding);
                                  > mainCache.put(encoding, result);
                                  > }
                                  > return result;
                                  > } else
                                  > return createEncodingInternal(encoding);
                                  > }
                                  >
                                  > public static Encoding createEncodingInternal(String encoding) {
                                  214a230,231
                                  > private static HashMap translatorCache = new HashMap();
                                  >
                                  215a233,245
                                  > if (USE_ENCODING_CACHING) {
                                  > Encoding result = (Encoding)translatorCache.get(encoding);
                                  >
                                  > if (result == null) {
                                  > result = getEncodingInternal(encoding, charMapping);
                                  > translatorCache.put(encoding, charMapping);
                                  > }
                                  >
                                  > return result;
                                  > } else
                                  > return getEncodingInternal(encoding, charMapping);
                                  > }
                                  > public static Encoding getEncodingInternal(String encoding, char[]

                                  > charMapping){
                                  Index: Encoding_OneByte.java
                                  ===================================================================
                                  RCS file:
                                  /cvsroot/firebird/client-java/src/main/org/firebirdsql/encodings/Encodin
                                  g_OneByte.java,v
                                  retrieving revision 1.4
                                  retrieving revision 1.5
                                  diff -r1.4 -r1.5
                                  23a24,26
                                  > * Revision 1.5 2006/06/20 06:34:00 rrokytskyy
                                  > * added encoding caching that can be enabled via
                                  > jaybird.encoding.cache property
                                  > *
                                  65,66c68,69
                                  < byte[] bufferB = new byte[128];
                                  < char[] bufferC = new char[128];
                                  ---
                                  > byte[] sharedBufferB = new byte[128];
                                  > char[] sharedBufferC = new char[128];
                                  70,75c73,86
                                  < if (bufferB.length < str.length())
                                  < bufferB = new byte[str.length()];
                                  < int length = encodeToCharset(str.toCharArray(), 0,
                                  str.length(),
                                  bufferB);
                                  < byte[] result = new byte[length];
                                  < System.arraycopy(bufferB, 0, result, 0, length);
                                  < return result;
                                  ---
                                  > if (EncodingFactory.USE_ENCODING_CACHING) {
                                  > byte[] result = new byte[str.length()];
                                  > encodeToCharset(str.toCharArray(), 0, str.length(),
                                  result);
                                  > return result;
                                  > } else {
                                  > if (sharedBufferB.length < str.length())
                                  > sharedBufferB = new byte[str.length()];
                                  >
                                  > int length = encodeToCharset(str.toCharArray(), 0,
                                  > str.length(), sharedBufferB);
                                  >
                                  > byte[] result = new byte[length];
                                  > System.arraycopy(sharedBufferB, 0, result, 0, length);
                                  > return result;
                                  > }
                                  88,91c99,108
                                  < if (bufferC.length < in.length)
                                  < bufferC = new char[in.length];
                                  < int length = decodeFromCharset(in, 0, in.length, bufferC);
                                  < return new String(bufferC, 0, length);
                                  ---
                                  > if (EncodingFactory.USE_ENCODING_CACHING) {
                                  > char[] bufferC = new char[in.length];
                                  > int length = decodeFromCharset(in, 0, in.length, bufferC);
                                  > return new String(bufferC, 0, length);
                                  > } else {
                                  > if (sharedBufferC.length < in.length)
                                  > sharedBufferC = new char[in.length];
                                  > int length = decodeFromCharset(in, 0, in.length,
                                  > sharedBufferC);
                                  > return new String(sharedBufferC, 0, length);
                                  > }

                                  Roman



                                  ------------------------ Yahoo! Groups Sponsor --------------------~-->
                                  Check out the new improvements in Yahoo! Groups email.
                                  http://us.click.yahoo.com/6pRQfA/fOaOAA/yQLSAA/saFolB/TM
                                  --------------------------------------------------------------------~->


                                  Yahoo! Groups Links
                                • Alexey Panchenko
                                  ... Shouldn t it be translatorCache.put(encoding, result); ? Or it should be cached at all - what is the role of charMapping argument - should it be include
                                  Message 16 of 23 , Jun 20, 2006
                                  • 0 Attachment
                                    Roman Rokytskyy wrote:

                                    >> if (USE_ENCODING_CACHING) {
                                    >> Encoding result = (Encoding)translatorCache.get(encoding);
                                    >>
                                    >> if (result == null) {
                                    >> result = getEncodingInternal(encoding, charMapping);
                                    >> translatorCache.put(encoding, charMapping);
                                    >> }

                                    Shouldn't it be "translatorCache.put(encoding, result);" ?
                                    Or it should be cached at all - what is the role of charMapping
                                    argument - should it be include in cache key ?

                                    >> if (EncodingFactory.USE_ENCODING_CACHING) {
                                    >> char[] bufferC = new char[in.length];
                                    >> int length = decodeFromCharset(in, 0, in.length, bufferC);
                                    >> return new String(bufferC, 0, length);

                                    The reason of slowdown is double memory allocations above:
                                    - first is "new char[]"
                                    - second is during "new String" - the char[] is copied into the String

                                    Without pooling the shared char[] is used and only new String is
                                    allocated.

                                    Probably we can use the String constructor:

                                    public String(byte bytes[], int offset, int length, String charsetName)

                                    Though it can be slower than internal implementation.

                                    --
                                    Best regards,
                                    Alexey mailto:alex+news@...
                                  • Roman Rokytskyy
                                    ... The number are comparable. I guess we stop here. The implementation of Encoding interface will be cached for the lifetime of the XSQLVAR object, won t
                                    Message 17 of 23 , Jun 20, 2006
                                    • 0 Attachment
                                      > So I reason the slow-down must be in the pooling. It would get even
                                      > slower when changed to a SynchronizedHashMap or ConcurrentHashMap.
                                      > Let's test with just the memory reduction for now and see what those
                                      > numbers generate.

                                      The number are comparable. I guess we stop here. The implementation of
                                      Encoding interface will be "cached" for the lifetime of the XSQLVAR object,
                                      won't have allocated memory except the one for the object itself.

                                      Thanks!
                                      Roman
                                    • Alexey Panchenko
                                      ... First of all, there is no need to have sharedBufferB in encodeToCharset() - any way result should be allocated. Index: Encoding_OneByte.java
                                      Message 18 of 23 , Jun 21, 2006
                                      • 0 Attachment
                                        Roman Rokytskyy wrote:

                                        > The number are comparable. I guess we stop here. The implementation of
                                        > Encoding interface will be "cached" for the lifetime of the XSQLVAR object,
                                        > won't have allocated memory except the one for the object itself.

                                        First of all, there is no need to have sharedBufferB in
                                        encodeToCharset() - any way result should be allocated.

                                        Index: Encoding_OneByte.java
                                        ===================================================================
                                        RCS file: /cvsroot/firebird/client-java/src/main/org/firebirdsql/encodings/Encoding_OneByte.java,v
                                        retrieving revision 1.5
                                        diff -u -r1.5 Encoding_OneByte.java
                                        --- Encoding_OneByte.java 20 Jun 2006 06:34:00 -0000 1.5
                                        +++ Encoding_OneByte.java 21 Jun 2006 06:54:26 -0000
                                        @@ -65,25 +65,13 @@
                                        }
                                        }

                                        - byte[] sharedBufferB = new byte[128];
                                        char[] sharedBufferC = new char[128];

                                        // encode
                                        public byte[] encodeToCharset(String str){
                                        - if (EncodingFactory.USE_ENCODING_CACHING) {
                                        byte[] result = new byte[str.length()];
                                        encodeToCharset(str.toCharArray(), 0, str.length(), result);
                                        return result;
                                        - } else {
                                        - if (sharedBufferB.length < str.length())
                                        - sharedBufferB = new byte[str.length()];
                                        -
                                        - int length = encodeToCharset(str.toCharArray(), 0, str.length(), sharedBufferB);
                                        -
                                        - byte[] result = new byte[length];
                                        - System.arraycopy(sharedBufferB, 0, result, 0, length);
                                        - return result;
                                        - }
                                        }

                                        public abstract int encodeToCharset(char[] in, int off, int len, byte[] out);

                                        Roman, where I can download the AS3AP benchmark test you used? I want
                                        to check the speed of encodeToCharset() passing the String instead of
                                        char[] - this way we can avoid new memory allocation in
                                        str.toCharArray().

                                        And lets continue about caching. I want to describe how it can be
                                        implemented and if we agree on the way - I will make the patch.

                                        1) Always use sharedBufferC in Encoding_OneByte

                                        2) Cache Encoding objects in GDSHelper, so all fields of the
                                        connection use the same object.

                                        3) Remove the

                                        protected String iscEncoding = null;
                                        protected String javaEncoding = null;
                                        protected String mappingPath = null;

                                        from each FBField - they are obtained from GDSHelper and are needed
                                        only for encoding/decoding the strings.

                                        4) Move methods encodeString/decodeString from XSQLVAR to FBField

                                        5) Implement in GDSHelper the caching method getEncoding() it would
                                        call non-caching implementaion in EncodingFactory and save obtained
                                        object for reuse.

                                        The problems:

                                        1) In the current implementation every field uses default database
                                        encoding - the field encoding is not used. Is it intentional
                                        optimization ?

                                        2) Is there any chance somebody will be using different objects
                                        belonging to the same connection from different threads?

                                        The code in decodeFromCharset is not thread safe intentionally, so if
                                        it is the common use case to work with connection from different
                                        threads (I personally always use each connection on the single thread)
                                        the Encoding caching should be done per Statement/ResultSet objects.

                                        What do you think ?

                                        --
                                        Best regards,
                                        Alexey mailto:alex+news@...
                                      • Roman Rokytskyy
                                        ... Yes, you re 100% right. The only thing is that I know only one client who is using this feature, so it was definitely not part of the testing. ... No, that
                                        Message 19 of 23 , Jun 21, 2006
                                        • 0 Attachment
                                          > Shouldn't it be "translatorCache.put(encoding, result);" ?
                                          > Or it should be cached at all - what is the role of charMapping
                                          > argument - should it be include in cache key ?

                                          Yes, you're 100% right. The only thing is that I know only one client who is
                                          using this feature, so it was definitely not part of the testing.

                                          > Probably we can use the String constructor:
                                          >
                                          > public String(byte bytes[], int offset, int length, String charsetName)
                                          >
                                          > Though it can be slower than internal implementation.

                                          No, that was the reason of creating the EncodingFactory at all - we tried to
                                          avoid that constructor.

                                          Roman
                                        • Roman Rokytskyy
                                          ... From our CVS - the module name is Benchmarks, there is implementation for C# and Java. ... Ok. ... Ok. ... Probably, but that will break the grouping of
                                          Message 20 of 23 , Jun 21, 2006
                                          • 0 Attachment
                                            > Roman, where I can download the AS3AP benchmark test you used?

                                            From our CVS - the module name is Benchmarks, there is implementation for C#
                                            and Java.

                                            > And lets continue about caching. I want to describe how it can be
                                            > implemented and if we agree on the way - I will make the patch.

                                            > 1) Always use sharedBufferC in Encoding_OneByte

                                            Ok.

                                            > 2) Cache Encoding objects in GDSHelper, so all fields of the
                                            > connection use the same object.
                                            >
                                            > 3) Remove the
                                            >
                                            > protected String iscEncoding = null;
                                            > protected String javaEncoding = null;
                                            > protected String mappingPath = null;
                                            >
                                            > from each FBField - they are obtained from GDSHelper and are needed
                                            > only for encoding/decoding the strings.

                                            Ok.

                                            > 4) Move methods encodeString/decodeString from XSQLVAR to FBField

                                            Probably, but that will break the grouping of the encoding/decoding methods
                                            in XSQLVAR, so unless you present really good reason, I'd prefer not to do
                                            this.

                                            > 5) Implement in GDSHelper the caching method getEncoding() it would
                                            > call non-caching implementaion in EncodingFactory and save obtained
                                            > object for reuse.

                                            See below.

                                            > The problems:

                                            > 1) In the current implementation every field uses default database
                                            > encoding - the field encoding is not used. Is it intentional
                                            > optimization ?

                                            That is not optimization, but the only possible correct implementation. When
                                            we say Firebird our connection encoding (the isc_dpb_lc_ctype property), all
                                            strings should be sent in that encoding. If we start sending each field in
                                            its own encoding, we will corrupt the contents of the strings.

                                            > 2) Is there any chance somebody will be using different objects
                                            > belonging to the same connection from different threads?

                                            Oh yes! That is one of the requirements of the JDBC specification and people
                                            use it.

                                            > The code in decodeFromCharset is not thread safe intentionally, so if
                                            > it is the common use case to work with connection from different
                                            > threads (I personally always use each connection on the single thread)
                                            > the Encoding caching should be done per Statement/ResultSet objects.

                                            But that is what happens right now - each XSQLVAR (the lowest abstraction
                                            for the database field) contains it's personal Encoding instance with its
                                            internal buffers (sharedBufferB and sharedBufferC) allocated to the
                                            appropriate size. When the PreparedStatement or ResultSet accesses the
                                            XSQLVAR object, only the XSQLVAR.sqldata field is updated, the rest fields
                                            remain the same.

                                            This was reported as a problem, since the XSQLVAR objects remain in memory
                                            until Statement or ResultSet objects are garbage collected and each XSQLVAR
                                            contains an allocated buffers. Personally I do not see this as a problem,
                                            since that happens one per field per statement/result set and totally that
                                            should be few kilobytes of memory. But that is the theory, I did not check
                                            that in profiler.

                                            Rick proposed a change where we removed the need in sharedBufferB and
                                            sharedBufferC, the Encoding instances remain cached on XSQLVAR level.

                                            So, considering all said above, do you still think we need another caching?

                                            Roman
                                          • Alexey Panchenko
                                            ... I do not like double memory allocation in decodeFromCharset(). It is used in reading the data from database. Usually the number of reads is greater then
                                            Message 21 of 23 , Jun 21, 2006
                                            • 0 Attachment
                                              Roman Rokytskyy wrote:

                                              > Rick proposed a change where we removed the need in sharedBufferB and
                                              > sharedBufferC, the Encoding instances remain cached on XSQLVAR level.

                                              > So, considering all said above, do you still think we need another caching?

                                              I do not like double memory allocation in decodeFromCharset().

                                              It is used in reading the data from database. Usually the number of
                                              reads is greater then writes, and I do not like the slowdown of major
                                              number database operations.

                                              What about keeping the shared char[] buffer and creating single
                                              Encoding object per Statement/ResultSet.

                                              I do not see a point in simultaneously reading different fields of
                                              ResultSet from different threads :-) So it should be safe enough.

                                              --
                                              Best regards,
                                              Alexey mailto:alex+news@...
                                            • Roman Rokytskyy
                                              ... I fully support you - this part should be optimized as much as possible. ... As far as I understand the specification there is no requirement to provide
                                              Message 22 of 23 , Jun 21, 2006
                                              • 0 Attachment
                                                > I do not like double memory allocation in decodeFromCharset().
                                                >
                                                > It is used in reading the data from database. Usually the number of
                                                > reads is greater then writes, and I do not like the slowdown of major
                                                > number database operations.

                                                I fully support you - this part should be optimized as much as possible.

                                                > What about keeping the shared char[] buffer and creating single
                                                > Encoding object per Statement/ResultSet.
                                                >
                                                > I do not see a point in simultaneously reading different fields of
                                                > ResultSet from different threads :-) So it should be safe enough.

                                                As far as I understand the specification there is no requirement to provide
                                                multithreaded access to the objects other than Connection. But I will check
                                                the specification.

                                                Then I would suggest you to fetch the latest version of the driver from CVS
                                                directly, so you create correct diffs.

                                                Roman
                                              • Rick Debay
                                                If you keep the caching, you must use SynchronizedHashMap or ConcurrentHashMap. The cache as it stands is not thread safe. Multithreaded tests need to be run
                                                Message 23 of 23 , Jun 21, 2006
                                                • 0 Attachment
                                                  If you keep the caching, you must use SynchronizedHashMap or
                                                  ConcurrentHashMap. The cache as it stands is not thread safe.
                                                  Multithreaded tests need to be run to find out if performance degrades
                                                  due to contention.

                                                  -----Original Message-----
                                                  From: Firebird-Java@yahoogroups.com
                                                  [mailto:Firebird-Java@yahoogroups.com] On Behalf Of Roman Rokytskyy
                                                  Sent: Wednesday, June 21, 2006 1:39 AM
                                                  To: Firebird-Java@yahoogroups.com
                                                  Subject: Re: [Firebird-Java] Possible memory leak in EncodingFactory ?

                                                  > So I reason the slow-down must be in the pooling. It would get even
                                                  > slower when changed to a SynchronizedHashMap or ConcurrentHashMap.
                                                  > Let's test with just the memory reduction for now and see what those
                                                  > numbers generate.

                                                  The number are comparable. I guess we stop here. The implementation of
                                                  Encoding interface will be "cached" for the lifetime of the XSQLVAR
                                                  object, won't have allocated memory except the one for the object
                                                  itself.

                                                  Thanks!
                                                  Roman



                                                  ------------------------ Yahoo! Groups Sponsor --------------------~-->
                                                  Yahoo! Groups gets a make over. See the new email design.
                                                  http://us.click.yahoo.com/XISQkA/lOaOAA/yQLSAA/saFolB/TM
                                                  --------------------------------------------------------------------~->


                                                  Yahoo! Groups Links
                                                Your message has been successfully submitted and would be delivered to recipients shortly.