Loading ...
Sorry, an error occurred while loading the content.

Speeding up your PHP Code.

Expand Messages
  • Rip Van Winkle
    Speeding Up PHP Code [Optimization] My job requires that I write Web applications that do some pretty hefty lifting, often over slowish connections. I ve been
    Message 1 of 1 , Nov 3, 2004
    • 0 Attachment
      Untitled Document

      Speeding Up PHP Code [Optimization]

      My job requires that I write Web applications that do some pretty hefty lifting, often over slowish connections. I've been hard at work recently on an application that retrieves data from a MS SQL server and builds complex multi-tabbed dynamic Web forms. Everything from validating my queries against SQL injection attacks to building multi-row in-line editable grids within the forms has forced me to work toward finding efficient ways of programming. And now that Zend 3 has shipped with its Profiler, I'm able to really home in on problem spots and rework code until I get a measurable performance boost. In this article I'll give a few examples of things you can do to make your code more efficient.

      There are several ways to fetch database results, depending on your database server type and your PHP configuration. For example, using the sybase drivers to connect to an MS SQL server, my PHP version will let me use the sybase_fetch_array() function but not the sybase_fetch_assoc() function. Sybase_fetch_array() returns both text and numeric indices for each row returned, however, effectively doubling the data I have to pass around when processing the results. In order to reduce the amount of data I have to work with, I added a function to my database library that lets me choose which type of index I wish to use -- text, numeric, or both. Consider the following code:

       $db=new DB("server","username","password","database","sybase");
        $db->connect();
        $db->usedb("database");
        $db->set_return("both");

       foreach($db->fetch("SELECT TOP 200 * FROM [mainview]") as $row){
            print_r($row);
         print "<br><hr><br>\n";
        }
        $db->disconnect();

      Forget for a moment that I'm using a database library you don't have access to.  It's plain enough that I'm connecting to a database, invoking the "set_return()" function, and then looping through an array of row arrays to print the results. My [mainview] happens to be a view containing some 300 columns. Multiply that times 200 and we're working with a fair amount of data. Now the default behavior of my database class is to return both numeric and text indices, and it consistently takes about 55,000 milliseconds to run this query. Only about 8% of that time is spent actually pulling results from the database; the remaining 92% is devoted to processing the results. But when I invoke 'set_return("text")', the results are somewhat improved.  Total runtime is about 41,000 milliseconds, with 40% of the time spent in retrieving results and 60% processing them; and when I set the return to "numeric," the results are even better: 31,000 milliseconds with a 20/80 split. So what did I do to increase performance?

      I wrote a function within my database library that checks the value of the member variable "return_type," runs through the row results, and returns all numeric or text indices based on the value of that member variable:

       function format_results($rows){
         //$rows is assumed to be a multi-dimensional
         //array composed of arrays of row results.
         if($this->return_type=="both" || !$this->return_type){ return $rows; }
          $newrows=array();
          foreach($rows as $r){
           $vals=array();
           //Get all array keys from the row of row arrays passed to function.
           $keys=array_keys($r);

          //For each key, check return type and set vals[$key] to the appropriate modulo value.
           for($i=0; $i<sizeof($keys); $i++){
            switch($this->return_type){
             case "numeric":
              if($i%2==0){
               $vals[$keys[$i]]=$r[$keys[$i]];
              }
              break;
             case "text":
              if($i%2==1){
               $vals[$keys[$i]]=$r[$keys[$i]];
              }
              break;
              //Default case, just return $rows as it was passed.
             default:
              $i=sizeof($keys) + 5;
              return $rows;
            }
           }
           //Push reformatted single row array onto array of row arrays.
           array_push($newrows,$vals);
          }
         return $newrows;
        }

      The result of adding this function is of course that we spend more time formatting the results before returning them to the calling function, but we spend less time processing the results after the fact. In other words, by weeding out duplicate values once up front, we dodge working with twice the data in any loops in our calling script and we stand to save significant time. If your PHP config is such that you can opt to use functions that will return only numeric or text indices, you may not need to write such a function. The lesson, though, is that you should get only the type of results you'll actually be working with. So if you're using text indices only in your code and you happen to be able to use sybase_fetch_assoc(), be sure you use it rather than using the more top-heavy sybase_fetch_array(). I also benchmarked this on a mysql database with similar results.

      Consider an application that lists some information about a service quote and then displays an editable grid of line items for that quote.  This is a common enough scenario in thick applications, in which users change what info they want and don't have to worry about submitting  after changing each row. I was charged with building such an application for the Web. In order to update the grid data, I have to keep track of the field names for each row by prepending a unique identifier to each field within the row. And when I'm actually processing the results, I have to generate a separate UPDATE statement for each existing row, a DELETE statement for any rows whose delete checkboxes have been checked, and an INSERT statement for a new row if applicable. That's on top of updating the general information about the quote. On large quotes, this can amount to 60 or 80 queries.

      There are two ways of handling these queries. We can execute each one with a separate call to the appropriate _query() function or we can concatenate them into one string and send them all with one call to the _query() function, cutting down some overhead that occurs with each call to the _query() function. The examples below use my database library, but they're sufficiently concise and clear that they should be easy to follow and replicate within your own libraries. The following code block connects, selects a database, and executes 200 queries one at a time with individual calls to the query() function. On my system, this executes in about 225 milliseconds.

       $db=new DB("server","username","password","database","sybase");
        $db->connect();
        $db->usedb("database");

       for($i=0; $i<200; $i++){
         $sql = "UPDATE [Table] SET [Field]='" . $i . "' WHERE [No.]='50000003';\n";
         $db->query($sql);
        }
        $db->disconnect();

      The block below connects, selects a database, and creates a single string composed of 200 queries. It then sends that string to the server in one call to the query() function. It takes about 92 milliseconds to execute.

       $db=new DB("server","username","password","database","sybase");
        $db->connect();
        $db->usedb("database");

       for($i=0; $i<200; $i++){
         $sql .= "UPDATE [Table] SET [Field]='" . $i . "' WHERE [No.]='50000003';\n";
        }

       $db->query($sql);
        $db->disconnect();

      The overhead differential in this example in particular is fairly minimal in terms of actual milliseconds saved, but when you're trimming fat to speed up a resource hog of an application, every little bit helps. It's always nice to make something run more than twice as fast as it did before. One caveat: if you send too much info in the concatenated string, some of your queries might fail. I imagine this happens because the buffer fills up and queries after the "full" point are truncated and not executed. If you know you'll be sending a lot of long queries using this method, you might want to consider pushing them onto an array and joining segments of the array to send smaller batches of queries rather than adding to a string.

      This is a hard one to test accurately because running a single instance of a pattern match takes negligible time that's difficult to separate from network latency. For example, in testing four different pattern matching functions, I found that none consistently took more than about 40 milliseconds to execute a single pattern match, even within a string of 20,000 or 30,000 characters. It's hard to chalk that up to sluggishness of a given function rather than to network or server load issues. The following table shows the functions I tested and the time they took to process:

      <table border="1" cellpadding="3" cellspacing="0">
        <tr>
         <td align="left" valign="top"><b>ereg()</b></td>
         <td align="left" valign="top"><b>preg_match()</b></td>
         <td align="left" valign="top"><b>strstr()</b></td>
         <td align="left" valign="top"><b>strpos()</b></td>
        </tr>
        <tr>
         <td align="left" valign="top">35.1 ms</td>
         <td align="left" valign="top">35.31 ms</td>
         <td align="left" valign="top">35.58 ms</td>
         <td align="left" valign="top">32.79 ms</td>
        </tr>
      </table>

      They're all pretty close. But when I do a lot of pattern matches within a script, rather than just the one instance, the function I use does seem to make a little bit of a difference. Consider the following test code:

       if(!$_GET["type"]){ print "Must specify type ereg, preg_match, or strstr."; exit; }

       //Generate a long string to perform matches against.
        for($j=0; $j<5000; $j++){
         $x .= "test " . $j . " -- ";
        }

       for($i=0; $i<10000; $i++){
         //For 10,000 iterations, run the specified function,
         //searching for the string "st 10".
         switch ($_GET["type"]){
          case "ereg":
           $y=ereg("st 10",$x);
           break;
          case "eregi":
           $y=eregi("st 10",$x);
           break;
          case "preg_match":
           $y=preg_match("/st 10/",$x);
           break;
          case "strstr":
           $y=strstr($x,"st 10");
           break;
          case "strpos":
           $y=strpos($x,"st 10");
           break;
          default:
           print "Must specify type ereg, preg_match, or strstr."; exit;
         }
        }

      The results of this test are as follows:

      <table border="1" cellpadding="3" cellspacing="0">
        <tr>
         <td align="left" valign="top"><b>ereg()</b></td>
         <td align="left" valign="top"><b>preg_match()</b></td>
         <td align="left" valign="top"><b>strstr()</b></td>
         <td align="left" valign="top"><b>strpos()</b></td>
        </tr>
        <tr>
         <td align="left" valign="top">1442.76 ms</td>
         <td align="left" valign="top">252.22 ms</td>
         <td align="left" valign="top">542.96 ms</td>
         <td align="left" valign="top">297.43 ms</td>
        </tr>
      </table>

      This really didn't accord at all with my expectations. I had expected preg_match() to run the slowest because it has to load and run perl's regular expression engine. And I had read that it was slower than eregi, which I expected to be the fastest of the functions. I ran these tests a number of times to verify that the time discrepancy wasn't merely a fluke, and the results were similar each time. So it appears, oddly, that if you're doing a quick pattern match, it may not matter which of these functions you use, but if you're doing many searches, preg_match or testing for a true return from strpos() might be your best bet. This is particularly heartening in that preg_match is more versatile, allowing for much more complicated matches than any of the other functions.

      I also tested ereg_replace() and preg_replace() for speed. Again, contrary to my original expectations, preg_replace() proved faster in both the single-replacement test and in 100 iterations of a block of test code very similar to that given above.

      <table border="1" cellpadding="3" cellspacing="0">
        <tr>
         <td align="left" valign="top"> </td>
         <td align="left" valign="top"><b>ereg_replace()</b></td>
         <td align="left" valign="top"><b>preg_replace()</b></td>
        </tr>
        <tr>
         <td align="left" valign="top"><b>One Run</b></td>
         <td align="left" valign="top">50.16 ms</td>
         <td align="left" valign="top">37.42 ms</td>
        </tr>
        <tr>
         <td align="left" valign="top"><b>100 Iterations</b></td>
         <td align="left" valign="top">1383.75 ms</td>
         <td align="left" valign="top">159.75 ms</td>
        </tr>
      </table>

      I can't help thinking I've overlooked some condition in my testing, as I've always read that the preg functions were slower than the ereg functions. Based on these metrics, however, I've concluded that the preg functions may be the best to use, given their flexibility and apparent superior speed.

      I manipulate information pulled from a database on a daily basis, pushing rows around and testing for various conditions that will determine what I'm to do to my data next. Among my most commonly-used functions are those that allow me to interact with the database and those that I use to test for and modify patterns within strings. Knowing that my applications will function better on the whole if I fine-tune these frequent operations, I sought to measure their performance so that I could strive at all times to use the most efficient code. Findings in this batch of experimentation were that trimming database results prior to actually manipulating the data makes a significant difference; that batching INSERT, UPDATE, and DELETE queries rather than performing them one at a time can more than double the speed with which these queries are executed; and that the preg functions seem to be faster and more versatile for text manipulation.

       

      Daryl Houston

    Your message has been successfully submitted and would be delivered to recipients shortly.