      High Performance Practices
      When Raw Speed is a Requirement
Leendert Brouwer goat@daholygoat.com

Sometimes Web applications need to be extremely fast. At that point, everything matters - every quote and every line of code that could consume a little too much time, and every database query that takes up too much execution time. In this article we're going to focus on practices that will improve the performance of your Web application, and on tools that can help make your Web applications act like rockets - pretty darn fast.


First, a word of warning - not all of the content of this article is suitable for college professors. We're going to do some things that are rather unconventional, and they might be unhealthy for servants of the orthodox software engineering kingdom as we know it. If you're still reading, you're either a curious college professor or not a college professor at all. Either way, fasten your seatbelt - it's going to be a fast ride. First, we'll look at a few tools that use interesting techniques and are already available to you, provided by either the open source world or commercial organizations.

Opcode Caching
      The first technique I'm discussing has nothing to do with the way you're actually coding yet, but cannot be left untouched in an article that is about raw speed. You may or may not have heard about a technique called opcode caching. What is it, and more importantly, why does it make our code faster? I'll explain in a short and simple way.

Each time we run a PHP script, the Zend engine first compiles the script and then executes it. During the compile process, intermediate code is generated - these instructions are referred to as opcodes, and they are what eventually gets executed. Sounds fine, right? After the script has been executed, however, the opcodes for the script are dropped. Uh oh! That doesn't sound so fine. Couldn't we reuse the opcodes if our script hasn't been modified, since the same opcodes would be generated anyway?

Indeed, and that's what an opcode caching mechanism does. It stores the resulting opcodes from your script in the server's memory when a fresh version of the script runs, and executes those same opcodes on subsequent requests, as long as the script hasn't changed since the last time it was executed. If the script has changed, it will not grab the cached opcodes; instead, it compiles the script to opcodes again, caches the new opcodes, and so on.

Running cached opcodes instead of recompiling the actual script can decrease execution time dramatically. It is not uncommon for the speed of your Web application to increase by three or four times when using an opcode caching mechanism. Common opcode caching mechanisms are:
      • APC - free of charge and open source
      • IonCube - free of charge but commercially licensed
      • Zend Accelerator - not free of charge and commercially licensed, but very good (PHP Magazine Reader's Choice in 2003 for best bytecode cache)
Alternatively, if you want to build your own opcode caching mechanism, you can read up on how to do it in Issue #05.2003 of the International PHP Magazine in the article "APC - Compiler Cache Internals", by George Schlossnagle, who is one of the authors of APC.
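To give an idea of what deploying such a cache looks like, here is a minimal php.ini sketch for enabling APC - the values are illustrative, not recommendations:

; Load the APC extension and give it shared memory to cache opcodes in
extension=apc.so
apc.enabled=1
apc.shm_size=32   ; cache size in MB
apc.ttl=7200      ; seconds an unused cache entry may live before eviction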

      Fig. 1: Opcode Caching

      Code Profiling
      Code profiling essentially means measuring how fast a certain piece of code executes. First, we'll see how this can be done in a simple way through your own code, and then we'll move on to a few sophisticated ways to do more advanced profiling.

The simplest way of profiling is to take the time before the block of code you want to measure, take the time after that block of code, and subtract the first from the last - the difference is the execution time. Listing 1 shows how this can be done.

Listing 1

<?php
// Returns the current time as seconds plus the fractional microseconds part.
function get_formatted_microtime() {
    list($usec, $sec) = explode(' ', microtime());
    return $usec + $sec;
}

// Note the time before the block we want to measure...
$start = get_formatted_microtime();

for ($i = 0; $i < 100; $i++) {
    echo 'PHP for life!';
}

// ...and the time after it.
$end = get_formatted_microtime();

$total = $end - $start;

echo '<br><b>Block execution time: '.round($total, 6).' seconds</b>';
?>
In the script, the get_formatted_microtime() function returns the sum of the seconds and microseconds parts of microtime(); we use the microseconds part rather than just the seconds to get a more accurate result. The result of this function is assigned to a variable called 'start' before the code we want to profile runs - in this case, a loop that says "PHP for life!" 100 times. After the for loop is executed, we note the time again and store it in a variable called 'end'. Now, all we have to do to get the time it took to execute the code block is subtract 'start' from 'end'. This is a simple way to measure the execution time of your code.

If you need more options than a simple end/start time subtraction, use the Profiler class in the PEAR Benchmark package (pear.php.net/package/Benchmark).
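As a minimal sketch of what that looks like (assuming the package is installed, and based on its documented enterSection()/leaveSection() API):

<?php
require_once 'Benchmark/Profiler.php';

// Passing true makes the profiler start automatically and
// print its report when the script ends.
$profiler = new Benchmark_Profiler(true);

$profiler->enterSection('loop');
for ($i = 0; $i < 100; $i++) {
    echo 'PHP for life!';
}
$profiler->leaveSection('loop');
?>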

      There are a couple more elegant solutions at our disposal. Most PHP debuggers have profiling functionality, and Xdebug is a good example of such a debugger. Xdebug is a PHP module written by Derick Rethans, and it does much more than simple profiling. Once you've compiled PHP with Xdebug, you'll have functions such as xdebug_start_trace(), xdebug_stop_trace(), and xdebug_dump_function_trace() that'll provide you with useful information about the block of code you're profiling. It is actually hooked into the PHP engine, and it displays memory usage over a timeline, the number of calls to a function, line number for function calls, etc. For more information on profiling with Xdebug refer to www.xdebug.org/docs-profiling.php. Make it part of your toolkit - you won't regret it if you care about the performance of your code.
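For example, tracing the loop from Listing 1 with the functions mentioned above could look like this (a sketch; it assumes PHP was compiled with an Xdebug version that provides these functions):

<?php
// Start collecting a function trace (Xdebug writes it to a file
// based on the name given here).
xdebug_start_trace('/tmp/mytrace');

for ($i = 0; $i < 100; $i++) {
    echo 'PHP for life!';
}

xdebug_stop_trace();

// Dump the collected function trace, including timings,
// memory usage, and line numbers.
xdebug_dump_function_trace();
?>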

Other debuggers you can use are DBG and APD, and Zend Studio includes one as well.

Tweaking Database Interaction
There are a number of things you can do to speed up your database operations. Some of these techniques may seem rather extreme, but they do make sense when extreme is the way to go. I would not recommend applying them if you don't need them, but it's good to add this info to your bag of tricks and use it when the situation demands it.

Reconsider Joins
      The college professors I mentioned earlier should please close their eyes for a bit while we continue - I'm going to suggest something evil now.

Joining database tables is good, and it sometimes saves you from writing a considerable amount of code. Joins are considered common practice when working with databases. However, joins exact a price in memory and speed. Situations in which larger tables are joined are often the source of performance trouble. When joins are carried out, the DBMS temporarily stores the result table in memory, and if these result tables are large, the server's memory fills up quite easily. You might have seen MySQL's 'Table is full' error - this is a possible result of the problem I just described. I once had to tell MySQL explicitly to put the temporary tables on hard disk instead of in RAM because I simply ran out of RAM to store them. Although later versions of MySQL handle such situations better, joins still use up quite a bit of memory. Needless to say, the whole process of creating such temporary tables takes time, and thus slows things down.

When performance really is an issue, it might be better to avoid joins and put the join logic in the PHP code instead of the SQL code. Note that extreme decisions like this should not be made common practice. You won't have too much trouble with a few 5,000-record tables and a Web site that doesn't get many hits. But when you find yourself in a traffic-intensive environment, it might be wise to consider moving this sort of logic to the application level (to the PHP scripts, that is), as the sketch below shows.
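Here's a minimal sketch of moving join logic into PHP - the orders/customers tables and their columns are hypothetical:

<?php
// Instead of: SELECT o.id, c.name FROM orders o
//             JOIN customers c ON o.customer_id = c.id

// First query: build a lookup array of customers, keyed by id.
$result = mysql_query("SELECT id, name FROM customers");
$customers = array();
while ($row = mysql_fetch_assoc($result)) {
    $customers[$row['id']] = $row['name'];
}

// Second query: fetch the orders and do the "join" as an array lookup.
$result = mysql_query("SELECT id, customer_id FROM orders");
while ($row = mysql_fetch_assoc($result)) {
    echo $row['id'].': '.$customers[$row['customer_id']].'<br>';
}
?>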

If you still have to use heavy joins in a high-traffic environment, make sure you benchmark and stress-test your code well, move your database to a dedicated disk or server, or possibly use database replication. It would also be useful to have a look at SQL's EXPLAIN functionality - EXPLAIN returns the route your query will take, so you can see which optimizations will be used and which you might want to add. Figure 2 shows sample output of an EXPLAIN statement.
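You can also run EXPLAIN straight from PHP by prefixing your query with the keyword - a quick sketch against the submitted_answers table used later in Listing 2:

<?php
$result = mysql_query("EXPLAIN SELECT * FROM submitted_answers WHERE sid = 42");
while ($row = mysql_fetch_assoc($result)) {
    // Each row describes one table in the plan: which index is chosen
    // ('key'), how many rows MySQL expects to examine ('rows'), etc.
    print_r($row);
}
?>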

      Figure 2: Sample output of an EXPLAIN statement

      Careful with Indexes
Indexes increase the performance of search operations if they are used properly, and they are important in high-traffic environments. However, programmers and DBAs sometimes forget that indexes only make selections faster - yes, selections, not inserts, deletes, and updates.

When optimizing your database, it is important to think about what kind of application will use it. If you have an application that does inserts 90% of the time and selections only during the other 10%, you probably don't want your fields to be indexed by default. This is because indexes can slow down heavy inserts and updates significantly: during data insertion, the DBMS has to update the index on the table on which the insert is performed. So if you have 10000 inserts, the index will need to be updated 10000 times. Consider a survey application that gets many hits per day. When a user fills out the survey, what kind of database operations do you expect? That's right: inserts. In this case, it doesn't make a lot of sense to have indexes on the tables for the results, since selections on the results are exceptional in this situation, if they happen at all. If indexes were present, they would be updated every time data is inserted, and one can imagine what would happen with a large number of inserts per second.

On a different note, we do need indexes when analyzing the statistics of the survey. Especially with millions of records, an index can improve query execution time by quite a few seconds. Listing 2 shows an example of how this would work.

Listing 2

<?php
// connect to database..

// create the index
mysql_query("CREATE INDEX sid_index ON submitted_answers (sid)");

// go do a bunch of select queries here..

// drop the index
mysql_query("DROP INDEX sid_index ON submitted_answers");
?>
      In this script, an index is created before the select queries are carried out - then the selections are done, and they are done fast. After the records are retrieved, the index is dropped so that insert operations are yet again optimized on the application's front-end.

Another thing to note is that indexes take up hard disk space. That issue is not very important since hard disk space is cheap, but you might want to think about it when you create a bunch of indexes on tables that are going to be large. As with the other practices in this article - you don't want to do this in a Web application for the bakery around the corner (unless it's a very popular bakery with an amazing online presence where you can bake your own bread and have it e-mailed to you, which will generate some traffic!).

Avoiding BLOBs
BLOB stands for Binary Large OBject, and there have been a lot of heated debates about them. In most of these debates, self-proclaimed gurus saying things like "just trust me, don't use them", "BLOBs are the suX0r", and "did you drink too much Dr. Pepper?" eventually obfuscate the pros and cons. In this section, we'll see why using BLOBs is generally a bad idea. But before doing that, let's have a look at why they can be a good idea: you have an excuse to use BLOBs when you want to use database replication, since it's a real pain when all your data is replicated except for the binary data, which is actually part of the data you want to work with. In any other situation, use the server's filesystem.

The first problem with BLOBs is that they can fragment the server's hard disk pretty badly. In a typical scenario - a large BLOB is removed, smaller BLOBs are inserted, yet another one is removed, then another large BLOB is inserted, and so on - your disk will very soon start to look like a battlefield the day after the war. Needless to say, this kind of fragmentation slows down disk performance.

Another reason not to use BLOBs is the type of application that is usually built when using them. One example of such an application is an image gallery - in an image gallery, it is common to load a set of, say, 20 thumbnails per page. When using BLOBs to store such thumbnails, you'll need a script that outputs the correct headers and the binary data that comes from the database, for example <img src="show_image.php?id=222">. The show_image.php file would then load the binary data with ID 222 from the database and spit it out to the browser, together with the HTTP headers indicating that this is an image (a sketch of such a script follows below). This means that for 20 thumbnails per page there will be at least 20 queries for each of those pages. Instead, if we link to the thumbnails on the filesystem, we can get the whole thing done in a single query - now that's a performance gain.
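A sketch of what such a per-image script could look like (the images table and its columns are hypothetical):

<?php
// show_image.php - serves one BLOB per request: one query per thumbnail!
$id = (int)$_GET['id'];

// connect to database..

$result = mysql_query("SELECT data FROM images WHERE id = $id");
$row = mysql_fetch_assoc($result);

// Tell the browser this is an image, then spit out the binary data.
header('Content-Type: image/jpeg');
echo $row['data'];
?>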
The PHP Side of Things
PHP code can be tweaked to increase a script's execution speed. Some of these practices may look like common sense, while others might be revelations - either way, they're fast, and that's what we want.

The Deal with Quotes
Insiders know this, but apparently many outsiders don't: almost every time I read code online, I find it astonishing how quotes are used in PHP. Let us take performance as our primary argument. Firstly, "$variable" is unnecessary - a lone variable does not need quotes at all. Secondly, when you have a string that does not need variable interpolation, use single quotes. And thirdly, when you have a string that does need variable interpolation, use double quotes. Interpolation means putting variables right into the string and having them replaced by their values.

There's more to the quote issue. Many people do not quote their array keys. Consider a reference such as $foo[bar]. Internally, the PHP engine will look for a constant called bar, and raise a notice if there isn't one. After that, due to PHP's forgiving nature, it will treat bar as the string 'bar' - this uses up more time, and it's not good to waste time. A better way is to use quotes right away, avoiding the notice, so that the script executes faster. The snippet below summarizes both points.
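A short sketch pulling these rules together:

<?php
$name = 'PHP';

echo $name;             // a lone variable needs no quotes at all
echo 'for life!';       // single quotes: no interpolation needed, fastest
echo "$name for life!"; // double quotes: only when interpolating

$foo = array('bar' => 'baz');
echo $foo['bar'];  // quoted key: no constant lookup, no notice
echo $foo[bar];    // unquoted key: constant lookup, a notice, then fallback
?>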
Considering the Pressure Areas
Although the subtitle might sound like something your doctor would say while checking out the strain injury caused by your excessive hacking, I found it an appropriate way to describe yet another problem when facing demanding situations in a Web environment. Sometimes we become used to certain practices and repeat them out of habit - this is not always a good thing. For instance, say I have to create an online product catalog for one of my clients and need the catalog data pulled out on the front-end, so that visitors can see it. Typically, I would create a Content Management System (CMS) where my client can maintain the catalog, and have some PHP code pull the catalog items from the database on the front-end. There's nothing wrong with this scenario if it's a lightweight Web application. It might be bloat in heavyweight situations, though.

So how do we avoid such situations? First, we need to determine where the application actually needs to be dynamic. We definitely don't need it on the back-end, where the maintainers of the site update the catalog. Further, contrary to popular belief, we do not always need it on the front-end either: a large percentage of the front-end requests to view the product catalog would just pull out the same data. Besides, the front-end is probably quite crowded too. There are a few routes we can take around this. Firstly, we could generate static HTML pages of the data we need on the front-end. If possible, we could set up a cronjob that executes every night to check if there have been changes to the database since the last time the HTML pages were generated, and generate new HTML pages if that is the case (you can read more about setting up a cronjob with PHP in Issue #05.2003 of the International PHP Magazine, in the column titled "Tips & Tricks"). Another route is simply page caching. There are several page caching mechanisms available online, Smarty being the most famous one - a sketch follows below.
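Here's a minimal sketch of page caching with Smarty - the template name and the load_products_from_db() helper are hypothetical:

<?php
require_once 'Smarty.class.php';

$smarty = new Smarty();
$smarty->caching = 1;
$smarty->cache_lifetime = 3600; // serve cached output for up to an hour

if (!$smarty->is_cached('catalog.tpl')) {
    // Only hit the database when there is no valid cached copy.
    $smarty->assign('products', load_products_from_db());
}
$smarty->display('catalog.tpl');
?>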
Careful with Object Designs
This section is relevant for those who are into Object Oriented Programming (OOP). A while ago, I found myself in a situation where object stacks were causing performance problems. I'll describe the problem in a simple manner: I have a class called School, and when School is instantiated, it loads Student objects for all the students of the school, as they are part of the school. The student data is retrieved from a database. When I designed the application, I thought there would only be 100 students in the school. But the city grew, all the kids wanted to learn, and the number of students rose to 10000. So even if all I needed from the School object was the school's name for a list of schools, all 10000 Student objects would get loaded upon instantiation of the School class.

As mentioned above, we might not need the students present in our School object - not in every situation, anyway. To work around this problem, I created a loader method that explicitly loads the Student objects when they are needed, and stopped loading them from the constructor. That meant a significant decrease in memory load. The example described here is a very real-world example; in reality, such situations are sometimes a bit harder to recognize or predict. Nevertheless, it's not a bad idea to wonder whether your objects might become heavier than initially expected. A good rule of thumb for me is to not load object collections before they are needed - it might save some trouble. A sketch of this approach follows below.
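Here's the idea in PHP 4 style object code - class and method names are illustrative, not the actual application code:

<?php
class School
{
    var $name;
    var $students = null;

    function School($name)
    {
        // The constructor no longer loads any Student objects.
        $this->name = $name;
    }

    function getStudents()
    {
        // Load the collection only on first request, then reuse it.
        if ($this->students === null) {
            $this->students = $this->_loadStudentsFromDb();
        }
        return $this->students;
    }

    function _loadStudentsFromDb()
    {
        // query the database and build the Student objects here..
        return array();
    }
}

// Listing school names stays cheap - no Student objects are created:
$school = new School('Springfield Elementary');
echo $school->name;
?>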
Conclusion
The practices described in this article may seem rather unorthodox, and sometimes even "not done". I wish some of them hadn't been necessary, but I have needed them along the way, and they have served as a cure for the situations I was facing. It might sometimes feel like diving into cold water to save a dog from drowning. But if it saves the dog from drowning, it must be worth it. Happy hacking.
