
Processing large datasets

  • jonjoffy
    Message 1 of 2, Jan 24, 2012
      This is really an extension of the post that I created the other week regarding multi-table scenarios. The issue I am facing now is how to download multiple tables to the client and process them into Taffy without locking the browser.

      At the moment, I call a Web Service via AJAX which returns data in JSON format, and I then 'upsert' these records using the AJAX 'success' callback.
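
      To give a picture, the success callback is roughly shaped like this (a minimal sketch assuming jQuery's $.ajax; the '/sync' endpoint and the table and key names are made up, and 'upsert' is the extension I mentioned):

      // Sketch of the current approach: one AJAX call, one array of
      // records per client-side table, upserted in the success callback
      $.ajax({
          url: '/sync',
          dataType: 'json',
          success: function (data) {
              // e.g. data.orders and data.customers are arrays of records
              orders_db().upsert('order_id', data.orders);
              customers_db().upsert('customer_id', data.customers);
          }
      });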

      I have a limit of 1000 rows per table for each AJAX call to limit the amount of data being sent over the wire. If I need to download more than 1000 rows, the web service is invoked multiple times until all data is downloaded. The JSON contains data for all tables that are on the client (there's a different array of records within the JSON for each table that I need to update).
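
      The paging itself is simple enough: keep requesting pages until everything is down (a rough sketch; the 'page' parameter, the full-page test, and processTables are placeholders for how my particular service and upsert code work, not anything general):

      // Request page after page until a short (non-full) page comes back
      function fetchAll(page) {
          $.ajax({
              url: '/sync',
              data: { page: page },
              dataType: 'json',
              success: function (data) {
                  processTables(data); // upsert each table's array
                  if (data.orders.length === 1000) {
                      fetchAll(page + 1); // a full page means more may remain
                  }
              }
          });
      }
      fetchAll(1);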

      The problem is that processing six tables within the success callback takes a lot of time (especially when I'm upserting into a table that already contains records), and I've noticed the browser locking up while the callback does the processing.

      I think an improvement might be to split the upsert processing into smaller chunks (say 50 records per chunk), run each chunk within a setTimeout so that it doesn't lock the browser, and then raise an event once the whole table is finished (the kind of approach described here: http://www.sitepoint.com/javascript-large-data-processing/ ).
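
      In outline the pattern would be something like this (a generic sketch, not Taffy-specific; processChunk and onComplete are placeholders):

      // Process 'items' a chunk at a time, yielding to the browser
      // between chunks, and fire a callback when the whole set is done
      function processInChunks(items, processChunk, onComplete) {
          var chunk_size = 50;
          var index = 0;

          function next() {
              processChunk(items.slice(index, index + chunk_size));
              index += chunk_size;
              if (index < items.length) {
                  setTimeout(next, 25);
              } else if (onComplete) {
                  onComplete(); // the 'table finished' event
              }
          }
          next();
      }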

      Anyway, I'm not sure of the point of this post, just thinking aloud really, but I think this would behave a bit more kindly for larger datasets, and it might be something that benefits Taffy (this will be much less of an issue once there's wider support for Web Workers, as you could do all of this on a separate thread).
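
      (If workers were usable, the shape might be something like the sketch below: keep the whole Taffy database inside the worker and do the heavy upserts there. This assumes Taffy and the upsert extension can be loaded via importScripts, which I haven't tried.)

      // worker.js -- rough sketch: the database lives in the worker, so
      // the upsert work never touches the UI thread
      importScripts('taffy.js'); // plus the upsert extension
      var db = TAFFY();

      self.onmessage = function (e) {
          db().upsert(e.data.pk, e.data.records);
          self.postMessage({ done: true, count: e.data.records.length });
      };

      // main page: new Worker('worker.js'), then e.g.
      //   worker.postMessage({ pk: 'order_id', records: data.orders });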

      Jon
    • jonjoffy
      Message 2 of 2, Jan 27, 2012
        Here's a crude function that I've been playing around with. I've
        basically added a callback overload to the upsert extension which I can
        then call recursively to work my way through large batches of data
        without hanging up the UI (too much!). The optimal batch size depends on
        how much data is already in Taffy, and on the processor speed and memory
        of the device. I'm still playing around trying to find optimal settings.

        Anyway, just thought I'd share, and no doubt it'll improve over the
        course of the project that I'm working on (I've re-factored the code for
        pasting here, so apologies if there are any typos).

        Jon

        /* Split the upsert into small batches so that the UI is more
           responsive during large upserts */
        function lazyupsert(db, pk_name, records_processed, records) {
            var batch_size = 100;
            var delay = 50;

            if (records_processed < records.length) {
                // Define the chunk of data for this upsert to process
                var upsert_batch = records.slice(records_processed,
                    records_processed + batch_size);

                // When the upsert finishes, schedule the next batch on a
                // timeout so the browser can repaint and handle events
                var upsertCallback = function () {
                    setTimeout(function () {
                        lazyupsert(db, pk_name, records_processed, records);
                    }, delay);
                };

                // Build the upsert for this batch
                var processUpsertBatch = function () {
                    db().upsert(pk_name, upsert_batch, upsertCallback);
                };

                // Increment records_processed before the callback fires so
                // the next invocation starts at the following batch
                records_processed += batch_size;

                // Process the batch
                processUpsertBatch();
            }
        }
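
        For what it's worth, kicking it off from the AJAX success callback
        looks something like this (the table handles and key names below are
        just placeholders):

        // Hypothetical usage: start each table at record zero from the
        // AJAX success callback
        function onSyncSuccess(data) {
            lazyupsert(orders_db, 'order_id', 0, data.orders);
            lazyupsert(customers_db, 'customer_id', 0, data.customers);
        }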
