
Re: Taffy DB max size limitations?

  • linuxdan
    Message 1 of 15, Sep 24, 2011
      I think it's version 2.1. I downloaded it this month, on the 19th of September. The code itself says version 2.1.

      I haven't tested version 2.3.3 yet. I'll download it today.

      Should I see a difference in load time if I am using a JSON string load?

      When I say JSON string, I mean like on your example page:
      // Create a new database using a JSON string
      var db = TAFFY('[{"record":1,"text":"example"}, {"record":2,"text2":"example2"}]');

      Except my array has 5000+ entries.

      I don't use the function you mentioned:
      JSON.parse()

      I had another question: how can I find redundant entries in my DB?
      My "primary key" is a social security number, then I have the first name and last name of the person. How can I write a query that finds all records with the same ssn, firstName, and lastName?

      I know I have redundant ones because myDB().count() gives me one number, and a filter() distinct on ssn, firstName and lastName gives me a smaller one. But I don't know the best way to identify the redundant lines in the DB.

      Is it best to "order" the DB by ssn and then compare each item with the next one? I tried the "brute force" way, but that is way too slow:
      taking each ssn, one by one, and doing a count() on the entire DB. That required 5000+ queries, and Firefox started telling me I have an unresponsive script (I am not surprised). Since the "dumb" way doesn't work so well, I'm curious how you would do it.
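
      Here is roughly what I had in mind for the sort-and-compare idea (untested, and I'm only guessing that chaining order() and get() is the right way to pull out a sorted array):

      // untested sketch: sort by ssn, then compare each record with the next one
      var sorted = myDB().order("ssn").get(); // get() should return a plain array
      var dupIds = [];
      for (var i = 0; i < sorted.length - 1; i++) {
        var a = sorted[i], b = sorted[i + 1];
        if (a.ssn === b.ssn && a.firstName === b.firstName && a.lastName === b.lastName) {
          dupIds.push(b.___id); // keep the first record, flag the repeat
        }
      }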

      Thanks very much for all your help by the way :)

      Thanks
      Dan


      --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
      >
      > Hey Dan,
      >
      > Do you see any improvement by upgrading to the latest release?
      >
      > How many columns are you working with per row? I was testing with 9 with no problem.
      >
      > In terms of where the slowdown might be: if you are passing in a string to TAFFY() and are on Firefox, then you should be using a native JSON parser, which should be really fast. At that point Taffy starts looping over the records, adding ___id values to them, and then dropping them into the internal array. If you have a template, an onInsert event, or are forcing a particular case (via settings), it has to do more work.
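      >
      > Conceptually (simplified; this isn't the actual Taffy source) that step looks something like:
      >
      > // illustration only: stamp each record with an internal id, then store it
      > var internal = [];
      > for (var i = 0; i < records.length; i++) {
      >   records[i].___id = "R" + (i + 1); // the real id format differs
      >   internal.push(records[i]);
      > }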
      >
      > In terms of breaking the DB up, that shouldn't be a major issue. Another option might be to break up the load and make it "lazy". Something like:
      >
      > var data = JSON.parse(yourdata);
      > var db = TAFFY();
      > for (var x = 0; x < 3; x++) { // 3 chunks of 2,500; adjust to cover all your records
      >   (function (x) { // capture x per iteration, or every callback sees its final value
      >     setTimeout(function () {
      >       db.insert(data.slice(x * 2500, (x + 1) * 2500));
      >     }, 0);
      >   }(x));
      > }
      >
      > Ian
      >
      > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
      > >
      > > Hi,
      > >
      > > Thanks for the reply.
      > >
      > > I am loading the data in one shot: building a big JSON-formatted text string, declaring the TAFFY DB object, and passing that string into it. So basically the loading of all the data happens in the "constructor" of the TAFFY object.
      > >
      > > I am going to experiment a little to see exactly where the "breaking point" is. Right now I know it's around 5000 lines, but I don't have the exact number. I will also try to make my initial JSON data string smaller by using shorter property names. For example: {"telephone":"12345678"} would become {"tel":"12345678"}.
      > > Doing this, I will check whether the number of lines / DB objects can get past my present limit. If it can, I will know that the problem is the data size in the DB.
      > >
      > > Another question: is it possible to create 2 TAFFY DB objects and merge them? So instead of making one 10,000-item DB I'd make two 5,000-item TAFFY DB objects.
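      > >
      > > Maybe something like this would work (untested, and I'm only guessing that get() returns a plain array of records):
      > >
      > > var merged = TAFFY(db1().get()); // seed a new DB with the first DB's records
      > > merged.insert(db2().get()); // then bulk-insert the second DB's records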
      > >
      > > Thanks
      > > Dan
      > >
      > > --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
      > > >
      > > > Hey Dan,
      > > >
      > > > There are no max size limitations. I did a little testing and pushed out an update that should speed up inserts considerably. I was testing with 10,000 records in a CSV format with 9 columns.
      > > >
      > > > Upgrade and let us know how it goes:
      > > >
      > > > https://github.com/typicaljoe/taffydb
      > > >
      > > > The only other thing I'd look at right now is how you are putting the records into TaffyDB. Adding the records all at once via TAFFY([data]) or db.insert([data]) will be a lot faster than doing a for loop outside Taffy and calling .insert() on each record.
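      > > >
      > > > For example (untested; "records" standing in for your parsed array), the difference is basically:
      > > >
      > > > // fast: one bulk insert of the whole array
      > > > db.insert(records);
      > > >
      > > > // slow: one insert call (and one internal pass) per record
      > > > for (var i = 0; i < records.length; i++) {
      > > >   db.insert(records[i]);
      > > > }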
      > > >
      > > > Ian
      > > >
      > > > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
      > > > >
      > > > > Hello,
      > > > >
      > > > > I am fairly new to TAFFY. I am trying to use it to process a set of data consisting of about 10,000 lines of CSV, each line containing around 20 fields (names, addresses, phone numbers, etc.).
      > > > >
      > > > > So far, running in Firefox 6, I seem to get stuck around 5000 lines. With that much data my script runs; when I try to feed it more, the browser hangs. I am converting my 10,000 lines of CSV to a JSON string to initialize my TAFFY DB.
      > > > >
      > > > > So I wanted to know if someone can help me figure out why the script is OK with 5000 lines of CSV and KO with more than 5000 lines. Is it a string size limitation in JavaScript? Is it a limit on the number of TAFFY objects?
      > > > >
      > > > > Thanks
      > > > > Dan
      > > > >
      > > >
      > >
      >
    • taffydb-owner@yahoogroups.com
      Message 2 of 15, Sep 24, 2011
        Hey Dan,

        2.3.3 is a huge leap forward in performance over 2.1. You may find it just works without the lazy loading.

        At that size of data you are stretching the limits of how easily you can manipulate the data once loaded. But as for your question, TaffyDB doesn't have a primary key yet (something I'm looking at for the future), so you'd need to take a more manual route.

        I'm just winging it here, but I'd probably build an object and loop over the collection, adding a key for each unique value. If I ever hit a key that has already been added, I add that record to an array to be deleted. This assumes you don't need to merge data or anything like that.

        var db = TAFFY(yourdata);
        var unique = {};
        var removeThese = [];

        db().each(function (r) {
          var k = r.ssn + "_" + r.first + "_" + r.last;
          if (unique[k]) {
            // it is a dup! Kill it like a spider!
            removeThese.push(r.___id);
          } else {
            // it is unique! add it so we can spot dups
            unique[k] = true;
          }
        });

        // kill the dups
        db(removeThese).remove();
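
        That's a single pass over the records plus one remove() call, so it should stay fast even at 5000+ rows; no per-ssn queries needed. If you'd rather keep the last copy of a duplicate instead of the first, store r.___id in unique[k] and push the previously stored id when you hit a repeat.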

        Ian

        --- In taffydb@yahoogroups.com, linuxdan <no_reply@...> wrote:
        > [...]
      • linuxdan
        Message 3 of 15, Sep 24, 2011
          OK, I downloaded version 2.3.3; I have to test it now.

          I think I forgot to answer one of your questions. My CSV file has 25 columns and 10,000 lines, but not all columns always have data; some of them are sometimes empty.
          To give you an idea, the CSV text file is 1.3 MB in size.

          Well, it's cool to test out TAFFY on some larger data sets :) It gives you an idea of how far it can be pushed :)

          But this is obviously not your typical Ajax/Web 2.0 use of the library.
          I could have written my data processing in PHP, Perl, Java... but since I wanted to test out TAFFY, I figured I'd give it a shot in JS.

          Great little piece of software you wrote, by the way, and thanks for making it available to everyone :)

          Dan

          --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
          > [...]
        • linuxdan
          Message 4 of 15, Sep 24, 2011
            Just tested the same data set of 3961 records with TAFFY 2.1 and 2.3.3.

            Unfortunately for me, the exact same JS code and data that works with TAFFY 2.1 fails to finish creating the TAFFY DB object with version 2.3.3 (using the taffy-min.js build of both 2.1 and 2.3.3).

            Same one-shot loading method using a string. I checked the string; it's complete, so the code did not crash while building the JSON string from the CSV. It's the TAFFY object creation that fails.
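
            Next I'll try parsing the string myself first, to see whether it's the parse or the load that dies. Something like this (untested; "jsonString" stands in for my generated string):

            var arr = JSON.parse(jsonString); // if it throws here, the string is the problem
            var db = TAFFY(arr); // if it hangs here, the TAFFY load itself is the problem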

            Any ideas?

            Dan


            --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
            > [...]
          • tacoman_cool
            Message 5 of 15, Sep 24, 2011
              Hmm...not offhand. Any errors or line numbers? I've tested with both the normal and -min versions with good results.

              Here is my quick 10,000-record test:

              if (!window.console) {
                console = {
                  log: function () {
                    $("#console").append("<div>pass</div>");
                  },
                  warn: function () {
                    $("#console").append("<div>fail</div>");
                  }
                };
              }
              var assert = function (thing, shouldbe) {
                if (thing == shouldbe) {
                  console.log(thing + " is " + shouldbe);
                } else {
                  console.warn(thing + " is not " + shouldbe);
                }
              };
              // first row is the column names, then 10,000 data rows
              var a = [["id","type","thing","id2","type2","thing2","id3","type3","thing3"]];
              var setupData = function () {
                for (var x = 0; x < 10000; x++) {
                  a.push(["123"+x,"one","a1","124"+x,"two","a2","125"+x,"three","a3"]);
                }
                a = JSON.stringify(a);
              };
              var load = function () {
                var speed = TAFFY(a);
                assert(speed().count(), 10000);
              };
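
              Something like this drives it and times the load (assuming setupData and load are just called in order, e.g. from the console or a button):

              setupData();
              var start = new Date().getTime();
              load();
              console.log((new Date().getTime() - start) + "ms to load");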

              --- In taffydb@yahoogroups.com, linuxdan <no_reply@...> wrote:
              > [...]
            • linuxdan
              Message 6 of 15, Sep 24, 2011
                With Firefox 3.6.18 I can actually get results up to 5948 lines with version 2.3.3 of TAFFY. But in Firefox 6.0.2, version 2.3.3 fails before 3000 lines.

                Surprising, huh?

                Dan


                --- In taffydb@yahoogroups.com, "tacoman_cool" <ian@...> wrote:
                > [...]
              • linuxdan
                Message 7 of 15, Sep 24, 2011
                  In the Error Console, Firefox just tells me there is an error in JSON.parse.

                  Dan


                  --- In taffydb@yahoogroups.com, "tacoman_cool" <ian@...> wrote:
                  >
                  > Hmm...not off hand. Any errors or line numbers? I've tested with both the normal and -min versions with good results.
                  >
                  > Here is my quick 10,000 records test
                  >
> // Fall back to writing pass/fail into the page when there is no console.
> if (typeof console === "undefined") {
>     window.console = {
>         log: function () {
>             $("#console").append("<div>pass</div>");
>         },
>         warn: function () {
>             $("#console").append("<div>fail</div>");
>         }
>     };
> }
> var assert = function (thing, shouldbe) {
>     if (thing === shouldbe) {
>         console.log(thing + " is " + shouldbe);
>     } else {
>         console.warn(thing + " is not " + shouldbe);
>     }
> };
> // 10,000 array records, each standing in for a 9-column CSV row.
> var a = [];
> var setupData = function () {
>     for (var x = 0; x < 10000; x++) {
>         a.push(["123"+x,"one","a1","124"+x,"two","a2","125"+x,"three","a3"]);
>     }
>     a = JSON.stringify(a);
> };
> var load = function () {
>     var speed = TAFFY(a);
>     assert(speed().count(), 10000);
> };
                  >
                • Josh Powell
Message 8 of 15, Sep 24, 2011
                    Send him the file you are using so you guys are comparing apples to apples.  Dropbox it if it is too big to email.

                    Josh Powell


                  • linuxdan
Message 9 of 15, Sep 25, 2011
                      The 10,000 lines are loading ok !!! :)

There were a couple of " characters in the CSV file that caused some issues. The odd thing is that I looked for them via Notepad and it found none. Then, using a better text editor, I did find them.
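A quick programmatic check would also have caught them, assuming the file should not contain any double quotes at all (a sketch; csvText stands in for the raw file contents):

// Report the position of the first stray double quote, if any.
var pos = csvText.indexOf('"');
alert(pos === -1 ? "no stray quotes" : "first stray quote at index " + pos);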

But that was not the only problem. The 10,000 lines load with version 2.3.3 of TAFFY in Firefox 3.6.18. But with Firefox 6.0.2, it still crashes on the JSON.parse, complaining about illegal control characters, which Firefox 3.6.18 does not.
So I am not sure whether there is a bug in the JSON parsing functions of Firefox 6.0.2. It might be worth reporting to the Mozilla team... I'll look into it.
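If it really is the control characters, a workaround might be to strip them before parsing (a rough sketch, assuming the data itself never needs those characters; jsonText stands in for my JSON string):

// Strip ASCII control characters (U+0000 through U+001F), which strict
// JSON parsers reject inside string values, then load as usual.
var cleaned = jsonText.replace(/[\u0000-\u001f]/g, " ");
var db = TAFFY(cleaned);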

So in FF 3.6.18 with Taffy v2.3.3 it takes about 6 minutes (at 80% CPU) to load the DB from the JSON string, count the total items in the DB, and count the distinct items based on the ssn column.

                      But this is running on a netbook laptop. So the CPU is pretty slow.

                      So now I have to tackle the "find the redundant rows" issue. I'll take a look at the code you suggested.

I'm also gonna put some "time stamps" in the code, to figure out which part of the script takes up most of the 6 minutes: the JSON loading or the "distinct" query. I'm guessing it's the JSON loading, but this way I'll be sure by having a timed version of the script.
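Probably nothing fancier than this (a sketch; the variable names are placeholders):

var t0 = Date.now();
var db = TAFFY(jsonText);                    // build the DB from the JSON string
var t1 = Date.now();
var total = db().count();                    // total rows
var distinct = db().distinct("ssn").length;  // unique ssn values
var t2 = Date.now();
alert("load: " + (t1 - t0) + " ms, queries: " + (t2 - t1) + " ms");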

                      Thanks again for all your help :)

                      Dan


                    • linuxdan
Message 10 of 15, Sep 25, 2011
As I mentioned in my last message, the data import now works for the entire file, but only in FF 3.6.18, not in FF 6.0.2.

As for the redundant rows problem I had to solve: using the Array object's sort() function on an array containing the ssn, first name, and last name, I was able to traverse that array with a for loop and test each entry against the next. That algorithm is fast: the check of the entire 10,000 item array takes a few ms, whereas a filter / distinct query on the Taffy DB object took 95 seconds.
So I kept the first solution, which is much faster (a rough sketch of it follows the timing output below).

                        Ex output :
                        Number of lines : 4593
                        Done splitting on ; Splits took 74ms
                        Done building tempText string, Building tempText string took 287ms
                        Created main Taffy DB : ok, Building main Taffy DB took 921ms
                        Number of items in Taffy DB : 4593
                        Finding redundant individual entries took 4ms
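
The sketch I mentioned, with made-up field names (the real script builds the keys from the CSV columns):

// Build one sortable key per record, sort, then compare neighbours.
var keys = [];
db().each(function (r) {
    keys.push(r.ssn + "_" + r.firstName + "_" + r.lastName);
});
keys.sort(); // duplicates are now adjacent
var dups = [];
for (var i = 1; i < keys.length; i++) {
    if (keys[i] === keys[i - 1]) {
        dups.push(keys[i]);
    }
}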

But now that I have my Taffy DB object loaded, I am going to be able to do the "business" queries that I need to carry out: removing some records based on specific values in certain columns, counting the number of rows that have specific values, updating some rows to remove some of the values, etc...

                        And at the end, generate a new CSV file without the rows I removed, and with the data I updated.
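For example, something along these lines (a sketch with placeholder column names and values, not my actual business rules):

// Remove records where a column has a specific value.
db({status: "obsolete"}).remove();

// Count the rows matching a value.
var n = db({city: "Paris"}).count();

// Blank out a field on matching rows.
db({optOut: "yes"}).update({telephone: ""});

// Rebuild a CSV string from whatever is left.
var csv = "";
db().each(function (r) {
    csv += r.ssn + ";" + r.firstName + ";" + r.lastName + "\n";
});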

                        I think this is where Taffy is going to pay off (deletes, updates). At least I really hope so :)
                        Otherwise, I can write everything in 100% JS, using only Arrays, without Taffy. But that will require more JS code. The idea of using Taffy, for me, is exactly that : not to write all that JS code and just write a few Taffy queries.

                        Dan


                      • linuxdan
Message 11 of 15, Sep 25, 2011
The delete queries are running great: about 200ms on the 10,000 item Taffy DB. :) Awesome!

                          --- In taffydb@yahoogroups.com, linuxdan <no_reply@...> wrote:
                          >
                          > The 10,000 lines are loading ok !!! :)
                          >
                          > There were a couple " characters in the CSV file that caused some issues. The odd thing is that I looked for then via notepad and it found none. Then using a better text editor, I did find them.
                          >
                          > But that was not the only problem. The 10,000 lines load with version 2.3.3 of TAFFY in Firefox 3.6.18, but with Firefox 6.0.2 it still crashes in JSON.parse, complaining about illegal control characters; Firefox 3.6.18 does not.
                          > So I am not sure whether or not there is a bug in Firefox 6.0.2's JSON parsing functions. It might be useful to report it to the Mozilla team... I'll look into it.
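                          One rough way to guard against that (a sketch, assuming the CSV-derived JSON text is in a variable called rawJson, a hypothetical name): strip the ASCII control characters that strict JSON parsers reject before handing the string to Taffy:

                              // Control characters (U+0000..U+001F) are illegal inside JSON strings;
                              // replace them with spaces so JSON.parse (and thus TAFFY) accepts the text.
                              var sanitizeJson = function (raw) {
                                  return raw.replace(/[\u0000-\u001F]/g, " ");
                              };
                              var db = TAFFY(sanitizeJson(rawJson));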
                          >
                          > So in FF 3.6.18 with Taffy v2.3.3, it takes about 6 minutes (at 80% CPU) to load the DB from the JSON string, count the total items in the DB, and count the distinct items based on the ssn column.
                          >
                          > But this is running on a netbook laptop, so the CPU is pretty slow.
                          >
                          > So now I have to tackle the "find the redundant rows" issue. I'll take a look at the code you suggested.
                          >
                          > I'm also gonna put some "time stamps" in the code, to figure out which part of the script takes up most of the 6 minutes: is it the JSON loading or the "distinct" query? I'm guessing it's the JSON loading, but this way I'll be sure, with a timed version of the script.
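                          A minimal sketch of that timing idea (jsonString is an assumed name; distinct() here is TaffyDB's single-column distinct):

                              var t0 = new Date().getTime();
                              var db = TAFFY(jsonString);                 // phase 1: parse + load
                              var t1 = new Date().getTime();
                              var total = db().count();                   // phase 2: count
                              var t2 = new Date().getTime();
                              var uniques = db().distinct("ssn").length;  // phase 3: distinct values
                              var t3 = new Date().getTime();
                              console.log("load: " + (t1 - t0) + " ms, count: " + (t2 - t1) +
                                          " ms, distinct: " + (t3 - t2) + " ms");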
                          >
                          > Thanks again for all your help :)
                          >
                          > Dan
                          >
                          >
                          > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
                          > >
                          > > In the Error Console, Firefox just tells me there is an error on the JSON.parse
                          > >
                          > > Dan
                          > >
                          > >
                          > > --- In taffydb@yahoogroups.com, "tacoman_cool" <ian@> wrote:
                          > > >
                          > > > Hmm...not off hand. Any errors or line numbers? I've tested with both the normal and -min versions with good results.
                          > > >
                          > > > Here is my quick 10,000-record test:
                          > > >
                          > > > // Fall back to writing pass/fail into the page (jQuery assumed)
                          > > > // when there is no console; typeof avoids a ReferenceError.
                          > > > if (typeof console === "undefined") {
                          > > >     console = {
                          > > >         log: function () {
                          > > >             $("#console").append("<div>pass</div>");
                          > > >         },
                          > > >         warn: function () {
                          > > >             $("#console").append("<div>fail</div>");
                          > > >         }
                          > > >     };
                          > > > }
                          > > > var assert = function (thing, shouldbe) {
                          > > >     if (thing == shouldbe) {
                          > > >         console.log(thing + " is " + shouldbe);
                          > > >     } else {
                          > > >         console.warn(thing + " is not " + shouldbe);
                          > > >     }
                          > > > };
                          > > > // First row holds the column names; Taffy treats it as the header row.
                          > > > var a = [["id","type","thing","id2","type2","thing2","id3","type3","thing3"]];
                          > > > var setupData = function () {
                          > > >     for (var x = 0; x < 10000; x++) {
                          > > >         a.push(["123"+x,"one","a1","124"+x,"two","a2","125"+x,"three","a3"]);
                          > > >     }
                          > > >     a = JSON.stringify(a);
                          > > > };
                          > > > var load = function () {
                          > > >     var speed = TAFFY(a);
                          > > >     assert(speed().count(), 10000);
                          > > > };
                          > > > // Run with: setupData(); load();
                          > > >
                          > > > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
                          > > > >
                          > > > > Just tested the same data set of 3,961 records with TAFFY 2.1 and 2.3.3.
                          > > > >
                          > > > > Unfortunately for me, the exact same JS code and data that work with TAFFY 2.1 fail to finish creating the TAFFY DB object with version 2.3.3 (using the taffy-min.js build of both versions).
                          > > > >
                          > > > > Same one-shot loading method using a string. I checked the string: it's complete, so the code did not crash while building the JSON string from the CSV. It's the TAFFY object creation that fails.
                          > > > >
                          > > > > Any ideas?
                          > > > >
                          > > > > Dan
                          > > > >
                          > > > >
                          > > > > --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
                          > > > > >
                          > > > > > Hey Dan,
                          > > > > >
                          > > > > > 2.3.3 is a huge leap forward in performance over 2.1. You may find it just works without the lazy loading.
                          > > > > >
                          > > > > > At that size of data you are stretching the limits of how easily you can manipulate the data once loaded. But as for your question, TaffyDB doesn't have a primary key yet (something I'm looking at for the future), so you'd need to take a more manual route.
                          > > > > >
                          > > > > > I'm just winging it here, but I'd probably build an object and loop over the collection, adding a key for each unique value. If I ever hit a key that has already been added, I add that record to an array to be deleted. This assumes you don't need to merge data or anything like that.
                          > > > > >
                          > > > > > var db = TAFFY(yourdata);
                          > > > > > var unique = {};
                          > > > > > var removeThese = [];
                          > > > > >
                          > > > > > db().each(function (r) {
                          > > > > >     var k = r.ssn + "_" + r.firstName + "_" + r.lastName;
                          > > > > >     if (unique[k]) {
                          > > > > >         // it is a dup! Kill it like a spider!
                          > > > > >         removeThese.push(r.___id);
                          > > > > >     } else {
                          > > > > >         // it is unique! add it so we can spot dups
                          > > > > >         unique[k] = true;
                          > > > > >     }
                          > > > > > });
                          > > > > >
                          > > > > > // kill the dups
                          > > > > > db(removeThese).remove();
                          > > > > >
                          > > > > > Ian
                          > > > > >
                          > > > > > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
                          > > > > > >
                          > > > > > >
                          > > > > > > I think it's version 2.1. I downloaded it this month, on the 19th of September. In the code it says version 2.1.
                          > > > > > >
                          > > > > > > I haven't tested on version 2.3.3 yet. I'll download it today.
                          > > > > > >
                          > > > > > > Should I see a difference on load time if I am using a JSON string load?
                          > > > > > >
                          > > > > > > When I say JSON string, I mean like on your example page:
                          > > > > > > // Create a new database using a JSON string
                          > > > > > > var db = TAFFY('[{"record":1,"text":"example"}, {"record":2,"text2":"example2"}]')
                          > > > > > >
                          > > > > > > Except my array has 5000+ entries.
                          > > > > > >
                          > > > > > > I don't use the function you mentioned:
                          > > > > > > JSON.parse()
                          > > > > > >
                          > > > > > > I had another question: how can I find redundant entries in my DB?
                          > > > > > > My "primary key" is a social security number. Then I have the first name and last name of the person. How can I write a query to get all records that have the same ssn, firstName, and lastName?
                          > > > > > >
                          > > > > > > I know I have redundant ones because when I do myDB().count() I get one number, and when I use the filter() distinct on ssn, firstName, and lastName I get a smaller number. But I don't know the best way to identify the redundant lines in the DB.
                          > > > > > >
                          > > > > > > Is it best to "order" the DB by ssn and then compare each item with the next one? I tried the "brute force" way, but that takes way too long:
                          > > > > > > taking each ssn, one by one, and doing a count() on the entire DB. That required 5,000+ queries, and Firefox starts telling me I have an unresponsive script (I am not surprised). So since I tried the "dumb" way and it does not work so well, I'm curious to know how you would do it?
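                          For what it's worth, a rough sketch of that sort-then-compare-neighbours idea, using TaffyDB's order() and get(); the column names follow the description above, and the snippet is untested against this data set:

                              // Sort on all three key columns so exact duplicates end up adjacent.
                              var rows = myDB().order("ssn,firstName,lastName").get(); // get() returns an array
                              var dupIds = [];
                              for (var i = 1; i < rows.length; i++) {
                                  if (rows[i].ssn === rows[i - 1].ssn &&
                                      rows[i].firstName === rows[i - 1].firstName &&
                                      rows[i].lastName === rows[i - 1].lastName) {
                                      dupIds.push(rows[i].___id); // keep the first copy, flag the rest
                                  }
                              }
                              myDB(dupIds).remove();

                          That is a single pass after one O(n log n) sort, versus the 5,000+ separate count() queries of the brute-force approach.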
                          > > > > > >
                          > > > > > > Thanks very much for all your help by the way :)
                          > > > > > >
                          > > > > > > Thanks
                          > > > > > > Dan
                          > > > > > >
                          > > > > > >
                          > > > > > > --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
                          > > > > > > >
                          > > > > > > > Hey Dan,
                          > > > > > > >
                          > > > > > > > Do you see any improvement by upgrading to the latest release?
                          > > > > > > >
                          > > > > > > > How many columns are you working with per row? I was testing with 9 without a problem.
                          > > > > > > >
                          > > > > > > > In terms of where the slowdown might be: if you are passing a string to TAFFY() and are on Firefox, then you should be using the native JSON parser, which should be really fast. At that point Taffy starts looping over the records, adding ___id values to them, and dropping them into the internal array. If you have a template, an onInsert event, or are forcing a particular case (via settings), it has to do more work.
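                          To make that concrete, a sketch of the kinds of settings that add per-record work during a load; the values are made up for illustration, and forcePropertyCase is a best recollection of the setting's name, so check it against your TaffyDB version:

                              var db = TAFFY();
                              db.settings({
                                  template: { source: "import" },  // merged into every inserted record
                                  onInsert: function () {
                                      // runs once per inserted record
                                  },
                                  forcePropertyCase: "lower"       // rewrites every key on the way in
                              });
                              db.insert(bigArray);                 // each feature above costs time per row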
                          > > > > > > >
                          > > > > > > > In terms of breaking the DB up, that shouldn't be a major issue. Another option might be to break up the load and make it "lazy". Something like:
                          > > > > > > >
                          > > > > > > > var data = JSON.parse(yourdata);
                          > > > > > > > var db = TAFFY();
                          > > > > > > > // 4 chunks of 2,500 covers 10,000 records; the closure
                          > > > > > > > // pins down x so each timeout gets its own chunk.
                          > > > > > > > for (var x = 0; x < 4; x++) {
                          > > > > > > >     (function (i) {
                          > > > > > > >         setTimeout(function () {
                          > > > > > > >             db.insert(data.slice(i * 2500, (i + 1) * 2500));
                          > > > > > > >         }, 0);
                          > > > > > > >     })(x);
                          > > > > > > > }
                          > > > > > > >
                          > > > > > > > Ian
                          > > > > > > >
                          > > > > > > > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
                          > > > > > > > >
                          > > > > > > > > Hi,
                          > > > > > > > >
                          > > > > > > > > Thanks for the reply.
                          > > > > > > > >
                          > > > > > > > > I am loading the data in one shot by building a big JSON formatted text string and declaring the TAFFY DB object, and passing that string into it. So basically the loading of all the data is passed into the "constructor" of the TAFFY object.
                          > > > > > > > >
                          > > > > > > > > I am going to try to experiment a little, to see exactly where the "breaking point" is. Right now I know it's around 5,000 lines, but I don't have the exact number. And I will try to make my initial data JSON string smaller by using shorter property names. For example: {"telephone":"12345678"} would become {"tel":"12345678"}.
                          > > > > > > > > Doing this, I will check whether the number of lines / DB objects can get bigger than my present limit. If so, then I will know that the problem is the data size in the DB.
                          > > > > > > > >
                          > > > > > > > > Another question: is it possible to create two TAFFY DB objects and merge them? So instead of making one 10,000-item DB, I'd make two 5,000-item TAFFY DB objects.
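                          There is no built-in merge that I know of, but here is a rough sketch of doing it by hand (firstHalfJson and secondHalfJson are assumed names): pull the records out of one DB with get() and bulk-insert them into the other:

                              var dbA = TAFFY(firstHalfJson);
                              var dbB = TAFFY(secondHalfJson);

                              // get() returns dbB's records as a plain array, which insert() accepts in bulk.
                              // Note: the copied records still carry dbB's internal ___id values; verify that
                              // your TaffyDB version assigns fresh ids on insert.
                              dbA.insert(dbB().get());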
                          > > > > > > > >
                          > > > > > > > > Thanks
                          > > > > > > > > Dan
                          > > > > > > > >
                          > > > > > > > > --- In taffydb@yahoogroups.com, taffydb-owner@yahoogroups.com wrote:
                          > > > > > > > > >
                          > > > > > > > > > Hey Dan,
                          > > > > > > > > >
                          > > > > > > > > > There are no max size limitations. I did a little testing and pushed out an update that should help speed up inserts considerably. I was testing with 10,000 records in a CSV format with 9 columns.
                          > > > > > > > > >
                          > > > > > > > > > Upgrade and let us know how it goes:
                          > > > > > > > > >
                          > > > > > > > > > https://github.com/typicaljoe/taffydb
                          > > > > > > > > >
                          > > > > > > > > > The only other thing I'd look at right now is how you are putting the records into TaffyDB. Adding the records all at once via TAFFY([data]) or db.insert([data]) will be a lot faster than doing a for loop outside Taffy and calling .insert() on each record.
                          > > > > > > > > >
                          > > > > > > > > > Ian
                          > > > > > > > > >
                          > > > > > > > > > --- In taffydb@yahoogroups.com, linuxdan <no_reply@> wrote:
                          > > > > > > > > > >
                          > > > > > > > > > > Hello,
                          > > > > > > > > > >
                          > > > > > > > > > > I am fairly new to TAFFY. I am trying to use it to process a set of data consisting of about 10,000 lines of CSV, each line containing around 20 fields (names, addresses, phone numbers, etc.).
                          > > > > > > > > > >
                          > > > > > > > > > > So far, running in Firefox 6, I seem to get stuck around 5,000 lines. With this much data my script runs; when I try to feed it more, the browser hangs. I am converting my 10,000 lines of CSV to a JSON string to initialize my TAFFY DB.
                          > > > > > > > > > >
                          > > > > > > > > > > So I wanted to know if someone can help me figure out why the script is OK with 5,000 lines of CSV and hangs with anything over 5,000 lines. Is it a string size limitation in JavaScript? Is it a limit on the number of objects TAFFY can hold?
                          > > > > > > > > > >
                          > > > > > > > > > > Thanks
                          > > > > > > > > > > Dan
                          > > > > > > > > > >
                          > > > > > > > > >
                          > > > > > > > >
                          > > > > > > >
                          > > > > > >
                          > > > > >
                          > > > >
                          > > >
                          > >
                          >