Loading ...
Sorry, an error occurred while loading the content.

Re: [xml-dbms] duplicate entry error with Mysql

Expand Messages
  • Ronald Bourret
    Hello, Sorry it took so long for me to reply. Deadlines, deadlines, deadlines... I also looked at the generateKey subroutine. One thing I don t understand is
    Message 1 of 6 , Nov 29, 2006
    • 0 Attachment
      Hello,

      Sorry it took so long for me to reply. Deadlines, deadlines, deadlines...

      I also looked at the generateKey subroutine. One thing I don't
      understand is where it updates the table so that each subsequent call to
      generateKey gets a different value. Then again, I don't understand PERL,
      so the following line makes no sense to me:

      $newValue = ' 'x(length($maxValue)-length($newValue)) . $newValue;

      In any case, you have several options:

      a) Rewrite the code in generateKey so that both reading the table and
      updating it are enclosed in a single transaction. Even if you do this,
      I'm not entirely convinced that it will work. Depending on how the
      database implements transactions around select statements, it seems
      possible that two calling applications could both read the same result
      -- the first read might not lock the table -- but have their updates
      correctly serialized.

      b) Replace generateKey with code to use a different key generation
      mechanism. For example, you could get the current system time, cast it
      as a string, and generate an integer hash code from the string to get a
      unique key. (This might not work, as two separate processes could
      conceivably retrieve the same time. You could solve this by adding
      something else to the string, such as a process ID, before computing the
      hash code.)

      c) Use the Java XML-DBMS 2.0, which supports keys generated by the
      database, such as MySQL's AUTO_INCREMENT columns.

      d) Modify the PERL version of XML-DBMS 1.x to use MySQL's AUTO_INCREMENT
      columns. Note that at least one DBI driver for PERL does support
      retrieving AUTO_INCREMENT values [1]. I'm not sure how difficult this
      would be, as I haven't looked at the Java XML-DBMS 1.x code in a long
      time and haven't ever looked closely at the PERL code.

      I'm guessing that you would need to modify the insertRow subroutine to
      immediately retrieve the AUTO_INCREMENT value with the mysql_insertid
      method, then place this in the $row object. I'm guessing that the rest
      of the code would then correctly retrieve the value from the $row object
      and use it when linking to other rows. Obviously, you would need to
      check the code to see if this happens.

      Hope this helps,

      -- Ron

      [1] http://mysqld.active-venture.com/Perl_DBI_Class.html

      Chris Van Oosbree wrote:

      > Yes, I am running the parsers in parallel by just having
      > multiple processes run in parallel. No multi-threading
      > or anything like that.
      >
      > Also, yes, I am asking XML-DBMS to generate the keys for me.
      > I did look at the Perl module and I think I see that the issue
      > is indeed caused by this. I'll elaborate.
      >
      > The key generation algorithm looks like it just does a lookup
      > of the max ID currently in a table, and the next available ID
      > would be the key it generates. Here are a few lines of code
      > from the generateKey subroutine:
      >
      > my $selectString = "select max($columnName) from $tableName";
      > my $maxValue = $self->{DBh}->selectrow_array($selectString);
      > my $newValue = $maxValue + 1;
      >
      > The problem I'm having occurs when two processes both look at
      > the same table and see the same max ID and come up with the
      > same next key value and try to insert that row at more or less
      > the same time, hence the duplicate key error.
      >
      > I have all my tables in MySQL setup to have auto-incrementing
      > keys, so I guess I don't really need the code to generate the
      > keys for me. But when a parent element is inserted I need the
      > code to get the ID for the row that was just inserted so it can
      > put that in as a foreign key in the child elements. Can the 1.x
      > version of XML-DBMS do this? I see a feature for 2.0 called
      > "database-generated keys". Is this what I'm looking for?
      >
      > Thanks for your help!
      > Chris
      >
      >
      > ----- Original Message ----
      > From: Ronald Bourret <rpbourret@...>
      > To: xml-dbms@yahoogroups.com
      > Sent: Friday, November 17, 2006 4:25:23 PM
      > Subject: Re: [xml-dbms] duplicate entry error with Mysql
      >
      > Hello,
      >
      > First, I didn't write the Perl version of the code and I don't think the
      > author is on this list any more. So I'll try to help, but might not be
      > able to answer your question.
      >
      > How are you running multiple parsers? Are these separate processes, each
      > of which is calling the Perl version of XML-DBMS as a separate instance?
      > Or are you trying to use multi-threading to run multiple threads through
      > the same instance of XML-DBMS?
      >
      > If you are using multi-threading, I suspect you will have problems. The
      > Java version of XML-DBMS is not designed for multi-threading and I would
      > be very surprised if the Perl version supported multi-threading.
      >
      > My other question is whether your map document tells XML-DBMS to
      > generate keys. This strikes me as the most likely cause of the problem,
      > although I will need to look into the code as to how exactly this might
      > be happening.
      >
      > -- Ron
      >
      > clonefan98 wrote:
      >
      >
      >>Hi,
      >>
      >>I'm trying to use the XML-DBMS package to parse several thousand XML
      >>files into a database. I'd like to be able to run more than one
      >>instance of the parser at once, but I am running into problems.
      >>
      >>I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
      >>version 5.0.26.
      >>
      >>What is happening is that when I run more than one instance of the
      >>parser I eventually run into duplicate key errors. Here is an
      >>example of one of the errors:
      >>Duplicate entry '9740' for key 1 at ./XmlRetrieve. pl line 36
      >>
      >>If I run only one parser over the same set of files it works fine, I
      >>only run into this when I have more than one running at a time. So
      >>there seems to be some contention between the multiple instances.
      >>And it doesn't fail in the same place every time. It seems to me
      >>that it's random chance if one of the instances get its "toes stepped
      >>on" so to speak and errors out.
      >>
      >>Do you have any ideas? Let me know if there is any other info you
      >>need.
      >
      >
      >
      >
      >
      >
      >
      > ____________________________________________________________________________________
      > Sponsored Link
      >
      > Mortgage rates near 39yr lows.
      > $420k for $1,399/mo. Calculate new payment!
      > www.LowerMyBills.com/lre
      >
      > [Non-text portions of this message have been removed]
      >
      >
      >
      > To post a message, send it to: xml-dbms@yahoogroups.com
      > To unsubscribe, send a blank message to: xml-dbms-unsubscribe@yahoogroups.com
      > Yahoo! Groups Links
      >
      >
      >
      >
      >
      >
    • Chris Van Oosbree
      Thanks for the reply. I did try messing around with the Perl code to see if I could change it such that it would retrieve the insertid and use it. I didn t
      Message 2 of 6 , Nov 30, 2006
      • 0 Attachment
        Thanks for the reply. I did try messing around with the Perl code to see if I could change it such that it would retrieve the insertid and use it. I didn't have much luck though. I think my best bet will be to use the newer Java version as you suggested.

        Thanks,
        Chris


        ----- Original Message ----
        From: Ronald Bourret <rpbourret@...>
        To: xml-dbms@yahoogroups.com
        Sent: Wednesday, November 29, 2006 10:54:42 PM
        Subject: Re: [xml-dbms] duplicate entry error with Mysql

        Hello,

        Sorry it took so long for me to reply. Deadlines, deadlines, deadlines...

        I also looked at the generateKey subroutine. One thing I don't
        understand is where it updates the table so that each subsequent call to
        generateKey gets a different value. Then again, I don't understand PERL,
        so the following line makes no sense to me:

        $newValue = ' 'x(length($maxValue )-length( $newValue) ) . $newValue;

        In any case, you have several options:

        a) Rewrite the code in generateKey so that both reading the table and
        updating it are enclosed in a single transaction. Even if you do this,
        I'm not entirely convinced that it will work. Depending on how the
        database implements transactions around select statements, it seems
        possible that two calling applications could both read the same result
        -- the first read might not lock the table -- but have their updates
        correctly serialized.

        b) Replace generateKey with code to use a different key generation
        mechanism. For example, you could get the current system time, cast it
        as a string, and generate an integer hash code from the string to get a
        unique key. (This might not work, as two separate processes could
        conceivably retrieve the same time. You could solve this by adding
        something else to the string, such as a process ID, before computing the
        hash code.)

        c) Use the Java XML-DBMS 2.0, which supports keys generated by the
        database, such as MySQL's AUTO_INCREMENT columns.

        d) Modify the PERL version of XML-DBMS 1.x to use MySQL's AUTO_INCREMENT
        columns. Note that at least one DBI driver for PERL does support
        retrieving AUTO_INCREMENT values [1]. I'm not sure how difficult this
        would be, as I haven't looked at the Java XML-DBMS 1.x code in a long
        time and haven't ever looked closely at the PERL code.

        I'm guessing that you would need to modify the insertRow subroutine to
        immediately retrieve the AUTO_INCREMENT value with the mysql_insertid
        method, then place this in the $row object. I'm guessing that the rest
        of the code would then correctly retrieve the value from the $row object
        and use it when linking to other rows. Obviously, you would need to
        check the code to see if this happens.

        Hope this helps,

        -- Ron

        [1] http://mysqld. active-venture. com/Perl_ DBI_Class. html

        Chris Van Oosbree wrote:

        > Yes, I am running the parsers in parallel by just having
        > multiple processes run in parallel. No multi-threading
        > or anything like that.
        >
        > Also, yes, I am asking XML-DBMS to generate the keys for me.
        > I did look at the Perl module and I think I see that the issue
        > is indeed caused by this. I'll elaborate.
        >
        > The key generation algorithm looks like it just does a lookup
        > of the max ID currently in a table, and the next available ID
        > would be the key it generates. Here are a few lines of code
        > from the generateKey subroutine:
        >
        > my $selectString = "select max($columnName) from $tableName";
        > my $maxValue = $self->{DBh} ->selectrow_ array($selectStr ing);
        > my $newValue = $maxValue + 1;
        >
        > The problem I'm having occurs when two processes both look at
        > the same table and see the same max ID and come up with the
        > same next key value and try to insert that row at more or less
        > the same time, hence the duplicate key error.
        >
        > I have all my tables in MySQL setup to have auto-incrementing
        > keys, so I guess I don't really need the code to generate the
        > keys for me. But when a parent element is inserted I need the
        > code to get the ID for the row that was just inserted so it can
        > put that in as a foreign key in the child elements. Can the 1.x
        > version of XML-DBMS do this? I see a feature for 2.0 called
        > "database-generated keys". Is this what I'm looking for?
        >
        > Thanks for your help!
        > Chris
        >
        >
        > ----- Original Message ----
        > From: Ronald Bourret <rpbourret@rpbourret .com>
        > To: xml-dbms@yahoogroup s.com
        > Sent: Friday, November 17, 2006 4:25:23 PM
        > Subject: Re: [xml-dbms] duplicate entry error with Mysql
        >
        > Hello,
        >
        > First, I didn't write the Perl version of the code and I don't think the
        > author is on this list any more. So I'll try to help, but might not be
        > able to answer your question.
        >
        > How are you running multiple parsers? Are these separate processes, each
        > of which is calling the Perl version of XML-DBMS as a separate instance?
        > Or are you trying to use multi-threading to run multiple threads through
        > the same instance of XML-DBMS?
        >
        > If you are using multi-threading, I suspect you will have problems. The
        > Java version of XML-DBMS is not designed for multi-threading and I would
        > be very surprised if the Perl version supported multi-threading.
        >
        > My other question is whether your map document tells XML-DBMS to
        > generate keys. This strikes me as the most likely cause of the problem,
        > although I will need to look into the code as to how exactly this might
        > be happening.
        >
        > -- Ron
        >
        > clonefan98 wrote:
        >
        >
        >>Hi,
        >>
        >>I'm trying to use the XML-DBMS package to parse several thousand XML
        >>files into a database. I'd like to be able to run more than one
        >>instance of the parser at once, but I am running into problems.
        >>
        >>I am using the Perl version of XML-DBMS (I don't know Java) and MySQL
        >>version 5.0.26.
        >>
        >>What is happening is that when I run more than one instance of the
        >>parser I eventually run into duplicate key errors. Here is an
        >>example of one of the errors:
        >>Duplicate entry '9740' for key 1 at ./XmlRetrieve. pl line 36
        >>
        >>If I run only one parser over the same set of files it works fine, I
        >>only run into this when I have more than one running at a time. So
        >>there seems to be some contention between the multiple instances.
        >>And it doesn't fail in the same place every time. It seems to me
        >>that it's random chance if one of the instances get its "toes stepped
        >>on" so to speak and errors out.
        >>
        >>Do you have any ideas? Let me know if there is any other info you
        >>need.
        >
        >
        >
        >
        >
        >
        >
        > ____________ _________ _________ _________ _________ _________ _
        > Sponsored Link
        >
        > Mortgage rates near 39yr lows.
        > $420k for $1,399/mo. Calculate new payment!
        > www.LowerMyBills. com/lre
        >
        > [Non-text portions of this message have been removed]
        >
        >
        >
        > To post a message, send it to: xml-dbms@yahoogroup s.com
        > To unsubscribe, send a blank message to: xml-dbms-unsubscrib e@yahoogroups. com
        > Yahoo! Groups Links
        >
        >
        >
        >
        >
        >






        ____________________________________________________________________________________
        Do you Yahoo!?
        Everyone is raving about the all-new Yahoo! Mail beta.
        http://new.mail.yahoo.com

        [Non-text portions of this message have been removed]
      • Ronald Bourret
        Sounds good. You get the added advantage of all the 2.0 features as well -- updates, deletes, soft inserts, etc. -- Ron
        Message 3 of 6 , Nov 30, 2006
        • 0 Attachment
          Sounds good. You get the added advantage of all the 2.0 features as well
          -- updates, deletes, soft inserts, etc.

          -- Ron

          Chris Van Oosbree wrote:

          > Thanks for the reply. I did try messing around with the Perl code to see if I could change it such that it would retrieve the insertid and use it. I didn't have much luck though. I think my best bet will be to use the newer Java version as you suggested.
          >
          > Thanks,
          > Chris
        Your message has been successfully submitted and would be delivered to recipients shortly.