Loading ...
Sorry, an error occurred while loading the content.

duplicate entry error with Mysql

Expand Messages
  • clonefan98
    Hi, I m trying to use the XML-DBMS package to parse several thousand XML files into a database. I d like to be able to run more than one instance of the
    Message 1 of 6 , Nov 17, 2006
    • 0 Attachment
      Hi,

      I'm trying to use the XML-DBMS package to parse several thousand XML
      files into a database. I'd like to be able to run more than one
      instance of the parser at once, but I am running into problems.

      I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
      version 5.0.26.

      What is happening is that when I run more than one instance of the
      parser I eventually run into duplicate key errors. Here is an
      example of one of the errors:
      Duplicate entry '9740' for key 1 at ./XmlRetrieve.pl line 36

      If I run only one parser over the same set of files it works fine, I
      only run into this when I have more than one running at a time. So
      there seems to be some contention between the multiple instances.
      And it doesn't fail in the same place every time. It seems to me
      that it's random chance if one of the instances get its "toes stepped
      on" so to speak and errors out.

      Do you have any ideas? Let me know if there is any other info you
      need.

      Thanks,
      Chris
    • Ronald Bourret
      Hello, First, I didn t write the Perl version of the code and I don t think the author is on this list any more. So I ll try to help, but might not be able to
      Message 2 of 6 , Nov 17, 2006
      • 0 Attachment
        Hello,

        First, I didn't write the Perl version of the code and I don't think the
        author is on this list any more. So I'll try to help, but might not be
        able to answer your question.

        How are you running multiple parsers? Are these separate processes, each
        of which is calling the Perl version of XML-DBMS as a separate instance?
        Or are you trying to use multi-threading to run multiple threads through
        the same instance of XML-DBMS?

        If you are using multi-threading, I suspect you will have problems. The
        Java version of XML-DBMS is not designed for multi-threading and I would
        be very surprised if the Perl version supported multi-threading.

        My other question is whether your map document tells XML-DBMS to
        generate keys. This strikes me as the most likely cause of the problem,
        although I will need to look into the code as to how exactly this might
        be happening.

        -- Ron

        clonefan98 wrote:

        > Hi,
        >
        > I'm trying to use the XML-DBMS package to parse several thousand XML
        > files into a database. I'd like to be able to run more than one
        > instance of the parser at once, but I am running into problems.
        >
        > I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
        > version 5.0.26.
        >
        > What is happening is that when I run more than one instance of the
        > parser I eventually run into duplicate key errors. Here is an
        > example of one of the errors:
        > Duplicate entry '9740' for key 1 at ./XmlRetrieve.pl line 36
        >
        > If I run only one parser over the same set of files it works fine, I
        > only run into this when I have more than one running at a time. So
        > there seems to be some contention between the multiple instances.
        > And it doesn't fail in the same place every time. It seems to me
        > that it's random chance if one of the instances get its "toes stepped
        > on" so to speak and errors out.
        >
        > Do you have any ideas? Let me know if there is any other info you
        > need.
      • Chris Van Oosbree
        Yes, I am running the parsers in parallel by just having multiple processes run in parallel. No multi-threading or anything like that. Also, yes, I am asking
        Message 3 of 6 , Nov 19, 2006
        • 0 Attachment
          Yes, I am running the parsers in parallel by just having multiple processes run in parallel. No multi-threading or anything like that.

          Also, yes, I am asking XML-DBMS to generate the keys for me. I did look at the Perl module and I think I see that the issue is indeed caused by this. I'll elaborate.

          The key generation algorithm looks like it just does a lookup of the max ID currently in a table, and the next available ID would be the key it generates. Here are a few lines of code from the generateKey subroutine:
          my $selectString = "select max($columnName) from $tableName";
          my $maxValue = $self->{DBh}->selectrow_array($selectString);
          my $newValue = $maxValue + 1;

          The problem I'm having occurs when two processes both look at the same table and see the same max ID and come up with the same next key value and try to insert that row at more or less the same time, hence the duplicate key error.

          I have all my tables in MySQL setup to have auto-incrementing keys, so I guess I don't really need the code to generate the keys for me. But when a parent element is inserted I need the code to get the ID for the row that was just inserted so it can put that in as a foreign key in the child elements. Can the 1.x version of XML-DBMS do this? I see a feature for 2.0 called "database-generated keys". Is this what I'm looking for?

          Thanks for your help!
          Chris


          ----- Original Message ----
          From: Ronald Bourret <rpbourret@...>
          To: xml-dbms@yahoogroups.com
          Sent: Friday, November 17, 2006 4:25:23 PM
          Subject: Re: [xml-dbms] duplicate entry error with Mysql

          Hello,

          First, I didn't write the Perl version of the code and I don't think the
          author is on this list any more. So I'll try to help, but might not be
          able to answer your question.

          How are you running multiple parsers? Are these separate processes, each
          of which is calling the Perl version of XML-DBMS as a separate instance?
          Or are you trying to use multi-threading to run multiple threads through
          the same instance of XML-DBMS?

          If you are using multi-threading, I suspect you will have problems. The
          Java version of XML-DBMS is not designed for multi-threading and I would
          be very surprised if the Perl version supported multi-threading.

          My other question is whether your map document tells XML-DBMS to
          generate keys. This strikes me as the most likely cause of the problem,
          although I will need to look into the code as to how exactly this might
          be happening.

          -- Ron

          clonefan98 wrote:

          > Hi,
          >
          > I'm trying to use the XML-DBMS package to parse several thousand XML
          > files into a database. I'd like to be able to run more than one
          > instance of the parser at once, but I am running into problems.
          >
          > I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
          > version 5.0.26.
          >
          > What is happening is that when I run more than one instance of the
          > parser I eventually run into duplicate key errors. Here is an
          > example of one of the errors:
          > Duplicate entry '9740' for key 1 at ./XmlRetrieve. pl line 36
          >
          > If I run only one parser over the same set of files it works fine, I
          > only run into this when I have more than one running at a time. So
          > there seems to be some contention between the multiple instances.
          > And it doesn't fail in the same place every time. It seems to me
          > that it's random chance if one of the instances get its "toes stepped
          > on" so to speak and errors out.
          >
          > Do you have any ideas? Let me know if there is any other info you
          > need.






          ____________________________________________________________________________________
          Sponsored Link

          Mortgage rates near 39yr lows.
          $420k for $1,399/mo. Calculate new payment!
          www.LowerMyBills.com/lre

          [Non-text portions of this message have been removed]
        • Ronald Bourret
          Hello, Sorry it took so long for me to reply. Deadlines, deadlines, deadlines... I also looked at the generateKey subroutine. One thing I don t understand is
          Message 4 of 6 , Nov 29, 2006
          • 0 Attachment
            Hello,

            Sorry it took so long for me to reply. Deadlines, deadlines, deadlines...

            I also looked at the generateKey subroutine. One thing I don't
            understand is where it updates the table so that each subsequent call to
            generateKey gets a different value. Then again, I don't understand PERL,
            so the following line makes no sense to me:

            $newValue = ' 'x(length($maxValue)-length($newValue)) . $newValue;

            In any case, you have several options:

            a) Rewrite the code in generateKey so that both reading the table and
            updating it are enclosed in a single transaction. Even if you do this,
            I'm not entirely convinced that it will work. Depending on how the
            database implements transactions around select statements, it seems
            possible that two calling applications could both read the same result
            -- the first read might not lock the table -- but have their updates
            correctly serialized.

            b) Replace generateKey with code to use a different key generation
            mechanism. For example, you could get the current system time, cast it
            as a string, and generate an integer hash code from the string to get a
            unique key. (This might not work, as two separate processes could
            conceivably retrieve the same time. You could solve this by adding
            something else to the string, such as a process ID, before computing the
            hash code.)

            c) Use the Java XML-DBMS 2.0, which supports keys generated by the
            database, such as MySQL's AUTO_INCREMENT columns.

            d) Modify the PERL version of XML-DBMS 1.x to use MySQL's AUTO_INCREMENT
            columns. Note that at least one DBI driver for PERL does support
            retrieving AUTO_INCREMENT values [1]. I'm not sure how difficult this
            would be, as I haven't looked at the Java XML-DBMS 1.x code in a long
            time and haven't ever looked closely at the PERL code.

            I'm guessing that you would need to modify the insertRow subroutine to
            immediately retrieve the AUTO_INCREMENT value with the mysql_insertid
            method, then place this in the $row object. I'm guessing that the rest
            of the code would then correctly retrieve the value from the $row object
            and use it when linking to other rows. Obviously, you would need to
            check the code to see if this happens.

            Hope this helps,

            -- Ron

            [1] http://mysqld.active-venture.com/Perl_DBI_Class.html

            Chris Van Oosbree wrote:

            > Yes, I am running the parsers in parallel by just having
            > multiple processes run in parallel. No multi-threading
            > or anything like that.
            >
            > Also, yes, I am asking XML-DBMS to generate the keys for me.
            > I did look at the Perl module and I think I see that the issue
            > is indeed caused by this. I'll elaborate.
            >
            > The key generation algorithm looks like it just does a lookup
            > of the max ID currently in a table, and the next available ID
            > would be the key it generates. Here are a few lines of code
            > from the generateKey subroutine:
            >
            > my $selectString = "select max($columnName) from $tableName";
            > my $maxValue = $self->{DBh}->selectrow_array($selectString);
            > my $newValue = $maxValue + 1;
            >
            > The problem I'm having occurs when two processes both look at
            > the same table and see the same max ID and come up with the
            > same next key value and try to insert that row at more or less
            > the same time, hence the duplicate key error.
            >
            > I have all my tables in MySQL setup to have auto-incrementing
            > keys, so I guess I don't really need the code to generate the
            > keys for me. But when a parent element is inserted I need the
            > code to get the ID for the row that was just inserted so it can
            > put that in as a foreign key in the child elements. Can the 1.x
            > version of XML-DBMS do this? I see a feature for 2.0 called
            > "database-generated keys". Is this what I'm looking for?
            >
            > Thanks for your help!
            > Chris
            >
            >
            > ----- Original Message ----
            > From: Ronald Bourret <rpbourret@...>
            > To: xml-dbms@yahoogroups.com
            > Sent: Friday, November 17, 2006 4:25:23 PM
            > Subject: Re: [xml-dbms] duplicate entry error with Mysql
            >
            > Hello,
            >
            > First, I didn't write the Perl version of the code and I don't think the
            > author is on this list any more. So I'll try to help, but might not be
            > able to answer your question.
            >
            > How are you running multiple parsers? Are these separate processes, each
            > of which is calling the Perl version of XML-DBMS as a separate instance?
            > Or are you trying to use multi-threading to run multiple threads through
            > the same instance of XML-DBMS?
            >
            > If you are using multi-threading, I suspect you will have problems. The
            > Java version of XML-DBMS is not designed for multi-threading and I would
            > be very surprised if the Perl version supported multi-threading.
            >
            > My other question is whether your map document tells XML-DBMS to
            > generate keys. This strikes me as the most likely cause of the problem,
            > although I will need to look into the code as to how exactly this might
            > be happening.
            >
            > -- Ron
            >
            > clonefan98 wrote:
            >
            >
            >>Hi,
            >>
            >>I'm trying to use the XML-DBMS package to parse several thousand XML
            >>files into a database. I'd like to be able to run more than one
            >>instance of the parser at once, but I am running into problems.
            >>
            >>I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
            >>version 5.0.26.
            >>
            >>What is happening is that when I run more than one instance of the
            >>parser I eventually run into duplicate key errors. Here is an
            >>example of one of the errors:
            >>Duplicate entry '9740' for key 1 at ./XmlRetrieve. pl line 36
            >>
            >>If I run only one parser over the same set of files it works fine, I
            >>only run into this when I have more than one running at a time. So
            >>there seems to be some contention between the multiple instances.
            >>And it doesn't fail in the same place every time. It seems to me
            >>that it's random chance if one of the instances get its "toes stepped
            >>on" so to speak and errors out.
            >>
            >>Do you have any ideas? Let me know if there is any other info you
            >>need.
            >
            >
            >
            >
            >
            >
            >
            > ____________________________________________________________________________________
            > Sponsored Link
            >
            > Mortgage rates near 39yr lows.
            > $420k for $1,399/mo. Calculate new payment!
            > www.LowerMyBills.com/lre
            >
            > [Non-text portions of this message have been removed]
            >
            >
            >
            > To post a message, send it to: xml-dbms@yahoogroups.com
            > To unsubscribe, send a blank message to: xml-dbms-unsubscribe@yahoogroups.com
            > Yahoo! Groups Links
            >
            >
            >
            >
            >
            >
          • Chris Van Oosbree
            Thanks for the reply. I did try messing around with the Perl code to see if I could change it such that it would retrieve the insertid and use it. I didn t
            Message 5 of 6 , Nov 30, 2006
            • 0 Attachment
              Thanks for the reply. I did try messing around with the Perl code to see if I could change it such that it would retrieve the insertid and use it. I didn't have much luck though. I think my best bet will be to use the newer Java version as you suggested.

              Thanks,
              Chris


              ----- Original Message ----
              From: Ronald Bourret <rpbourret@...>
              To: xml-dbms@yahoogroups.com
              Sent: Wednesday, November 29, 2006 10:54:42 PM
              Subject: Re: [xml-dbms] duplicate entry error with Mysql

              Hello,

              Sorry it took so long for me to reply. Deadlines, deadlines, deadlines...

              I also looked at the generateKey subroutine. One thing I don't
              understand is where it updates the table so that each subsequent call to
              generateKey gets a different value. Then again, I don't understand PERL,
              so the following line makes no sense to me:

              $newValue = ' 'x(length($maxValue )-length( $newValue) ) . $newValue;

              In any case, you have several options:

              a) Rewrite the code in generateKey so that both reading the table and
              updating it are enclosed in a single transaction. Even if you do this,
              I'm not entirely convinced that it will work. Depending on how the
              database implements transactions around select statements, it seems
              possible that two calling applications could both read the same result
              -- the first read might not lock the table -- but have their updates
              correctly serialized.

              b) Replace generateKey with code to use a different key generation
              mechanism. For example, you could get the current system time, cast it
              as a string, and generate an integer hash code from the string to get a
              unique key. (This might not work, as two separate processes could
              conceivably retrieve the same time. You could solve this by adding
              something else to the string, such as a process ID, before computing the
              hash code.)

              c) Use the Java XML-DBMS 2.0, which supports keys generated by the
              database, such as MySQL's AUTO_INCREMENT columns.

              d) Modify the PERL version of XML-DBMS 1.x to use MySQL's AUTO_INCREMENT
              columns. Note that at least one DBI driver for PERL does support
              retrieving AUTO_INCREMENT values [1]. I'm not sure how difficult this
              would be, as I haven't looked at the Java XML-DBMS 1.x code in a long
              time and haven't ever looked closely at the PERL code.

              I'm guessing that you would need to modify the insertRow subroutine to
              immediately retrieve the AUTO_INCREMENT value with the mysql_insertid
              method, then place this in the $row object. I'm guessing that the rest
              of the code would then correctly retrieve the value from the $row object
              and use it when linking to other rows. Obviously, you would need to
              check the code to see if this happens.

              Hope this helps,

              -- Ron

              [1] http://mysqld. active-venture. com/Perl_ DBI_Class. html

              Chris Van Oosbree wrote:

              > Yes, I am running the parsers in parallel by just having
              > multiple processes run in parallel. No multi-threading
              > or anything like that.
              >
              > Also, yes, I am asking XML-DBMS to generate the keys for me.
              > I did look at the Perl module and I think I see that the issue
              > is indeed caused by this. I'll elaborate.
              >
              > The key generation algorithm looks like it just does a lookup
              > of the max ID currently in a table, and the next available ID
              > would be the key it generates. Here are a few lines of code
              > from the generateKey subroutine:
              >
              > my $selectString = "select max($columnName) from $tableName";
              > my $maxValue = $self->{DBh} ->selectrow_ array($selectStr ing);
              > my $newValue = $maxValue + 1;
              >
              > The problem I'm having occurs when two processes both look at
              > the same table and see the same max ID and come up with the
              > same next key value and try to insert that row at more or less
              > the same time, hence the duplicate key error.
              >
              > I have all my tables in MySQL setup to have auto-incrementing
              > keys, so I guess I don't really need the code to generate the
              > keys for me. But when a parent element is inserted I need the
              > code to get the ID for the row that was just inserted so it can
              > put that in as a foreign key in the child elements. Can the 1.x
              > version of XML-DBMS do this? I see a feature for 2.0 called
              > "database-generated keys". Is this what I'm looking for?
              >
              > Thanks for your help!
              > Chris
              >
              >
              > ----- Original Message ----
              > From: Ronald Bourret <rpbourret@rpbourret .com>
              > To: xml-dbms@yahoogroup s.com
              > Sent: Friday, November 17, 2006 4:25:23 PM
              > Subject: Re: [xml-dbms] duplicate entry error with Mysql
              >
              > Hello,
              >
              > First, I didn't write the Perl version of the code and I don't think the
              > author is on this list any more. So I'll try to help, but might not be
              > able to answer your question.
              >
              > How are you running multiple parsers? Are these separate processes, each
              > of which is calling the Perl version of XML-DBMS as a separate instance?
              > Or are you trying to use multi-threading to run multiple threads through
              > the same instance of XML-DBMS?
              >
              > If you are using multi-threading, I suspect you will have problems. The
              > Java version of XML-DBMS is not designed for multi-threading and I would
              > be very surprised if the Perl version supported multi-threading.
              >
              > My other question is whether your map document tells XML-DBMS to
              > generate keys. This strikes me as the most likely cause of the problem,
              > although I will need to look into the code as to how exactly this might
              > be happening.
              >
              > -- Ron
              >
              > clonefan98 wrote:
              >
              >
              >>Hi,
              >>
              >>I'm trying to use the XML-DBMS package to parse several thousand XML
              >>files into a database. I'd like to be able to run more than one
              >>instance of the parser at once, but I am running into problems.
              >>
              >>I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
              >>version 5.0.26.
              >>
              >>What is happening is that when I run more than one instance of the
              >>parser I eventually run into duplicate key errors. Here is an
              >>example of one of the errors:
              >>Duplicate entry '9740' for key 1 at ./XmlRetrieve. pl line 36
              >>
              >>If I run only one parser over the same set of files it works fine, I
              >>only run into this when I have more than one running at a time. So
              >>there seems to be some contention between the multiple instances.
              >>And it doesn't fail in the same place every time. It seems to me
              >>that it's random chance if one of the instances get its "toes stepped
              >>on" so to speak and errors out.
              >>
              >>Do you have any ideas? Let me know if there is any other info you
              >>need.
              >
              >
              >
              >
              >
              >
              >
              > ____________ _________ _________ _________ _________ _________ _
              > Sponsored Link
              >
              > Mortgage rates near 39yr lows.
              > $420k for $1,399/mo. Calculate new payment!
              > www.LowerMyBills. com/lre
              >
              > [Non-text portions of this message have been removed]
              >
              >
              >
              > To post a message, send it to: xml-dbms@yahoogroup s.com
              > To unsubscribe, send a blank message to: xml-dbms-unsubscrib e@yahoogroups. com
              > Yahoo! Groups Links
              >
              >
              >
              >
              >
              >






              ____________________________________________________________________________________
              Do you Yahoo!?
              Everyone is raving about the all-new Yahoo! Mail beta.
              http://new.mail.yahoo.com

              [Non-text portions of this message have been removed]
            • Ronald Bourret
              Sounds good. You get the added advantage of all the 2.0 features as well -- updates, deletes, soft inserts, etc. -- Ron
              Message 6 of 6 , Nov 30, 2006
              • 0 Attachment
                Sounds good. You get the added advantage of all the 2.0 features as well
                -- updates, deletes, soft inserts, etc.

                -- Ron

                Chris Van Oosbree wrote:

                > Thanks for the reply. I did try messing around with the Perl code to see if I could change it such that it would retrieve the insertid and use it. I didn't have much luck though. I think my best bet will be to use the newer Java version as you suggested.
                >
                > Thanks,
                > Chris
              Your message has been successfully submitted and would be delivered to recipients shortly.