Testing and tuning TYPO3 performance
December 23, 2005
Authors: Michael Scharkow & Steffen Müller
TYPO3 has a reputation of being a resource hog, and performance on popular websites (such as typo3.org) has often been somewhat flaky in the past. This article offers some recommendations on performance tuning and describes some testing scenarios for TYPO3.
Intended audience: website administrators and people who plan to launch popular, heavily frequented sites with TYPO3.
Some basics about website performance
"Premature optimization is the root of all evil."
Ideally, website performance should not be an issue as long as your application is scalable with more and better hardware. Why bother with expensive performance optimization when you can simply throw more hardware at the problem? This principle is successfully applied in basically all really large dynamic websites, such as Wikipedia, Slashdot, LiveJournal and most prominently Google.
TYPO3 is in our opinion basically suitable for large and popular sites. But most websites are probably not that big, so this paper examines performance in a typical one server setup.
Our central question is: What can you do about TYPO3 performance except buying more and better hardware? Which factors influence your site's speed and server load? Where should you start optimizing, thus avoiding premature optimization? How can you test your website, or even simulate a slashdotting (which is basically a friendly Denial of Service (DoS) attack)?
Performance tests sometimes come with unforeseen and nasty consequences. Before you start, make sure that you don't run into trouble with your ISP or hosting provider. Don't do anything on your production server unless you know what you are doing. You might crash your machine or get locked out of it. In any case, this document comes with absolutely no warranty.
For those who can't wait
For the impatient, here's a quick overview of what we did to gain performance. The recommendations are based on our server setup, using the TYPO3 Quickstart package. For more details, have a look at the following chapters. Note that we do not reveal any secret magic performance tricks, sorry.
Install the latest and greatest 2.6 kernel
Install a libc6 with Native Posix Thread Library support (NPTL).
Add or modify some of the following options in my.cnf:
Disable bin log
Switch on and increase query_cache
query_cache_limit = 2M # default was 1M
query_cache_size = 64M # default was 0
query_cache_type = 1
table_cache = 256 # default was 64
key_buffer_size = 64M # default was 8M
Adapt MaxClients to find a balance between performance and server capacity; a conservative starting value is 32.
Disable excessive logging.
Disable DNS lookups; your log file analyser can do this afterwards.
Install eAccelerator using the following minimal configuration for one(!) average website (in our case, the Quickstart package):
If you are hosting more than one site, use the same source directory for all sites.
Make as many pages cacheable as possible.
Watch out for performance killer extensions from TER by constantly testing their impact on performance.
In order to tune performance you have to benchmark your site first, so that's what we did. Please remember that we're not really interested in absolute numbers but rather in relative performance gains and losses when tweaking different parts of the system. On the other hand, our server setup is certainly quite common, especially in current root server hosting. You might therefore be interested in the actual numbers, so you can get a rough estimate of what your server should be able to deliver. If your performance is far worse than ours, think about tuning it. If it is far better, please drop us a line on how you did it.
We used several servers and clients for these benchmarks, but most of them were run on fairly similar, commonly used hardware and software: an AthlonXP 2200 with 1 GB RAM, basically commodity hardware.
Since we were interested in the effects of various software versions, we used the following platforms for testing:
GNU/Linux: Debian Sarge and Ubuntu Breezy
MySQL 4.0 and 4.1
Apache 1.3 and 2.0, lighttpd 1.4.8
PHP4 and PHP5
TYPO3 3.8, Quickstart package
Note that even if we used Linux to test our installation, most of the recommendations are valid for Windows users, too.
Before we begin, a disclaimer: We are aware that, for one reader or another, we used the wrong methods, got the wrong results, made wrong assumptions and drew wrong conclusions. Our point is not to give you all the low-level tricks on file systems, database tuning and kernel hacking. We were not experts in those fields, and plenty of information on this is available already. We concentrated on two things:
How many requests could our website handle simultaneously?
How much load did the server have when being heavily used or even being slashdotted?
For those purposes, we found various tools on the web. Our favourite candidate was ApacheBench (ab), part of the Apache utilities, to measure the HTTP output of our site. There were other tools such as siege and flood for less synthetic testing, but we did not expect any fundamental differences in the results.
Benchmarking and tuning the parts
In this section we discuss various factors that can influence our website's performance. We introduce some commonly used tuning parameters and see whether they affect the performance. The chapter order goes from low-level to high-level tuning. If you think one factor is not relevant for you (because you can't change it, or you know it's not a bottleneck), simply skip the section and move on.
We started our benchmarking with a worst-case scenario: A freshly installed Debian Linux server, with a default Apache1 and MySQL configuration, mod_php4 and a fresh 3.8 Quickstart package. Because we knew we should be optimizing for less-than-optimal situations, we chose to benchmark with TYPO3 caching turned off, just as a baseline, so we could be sure that real-world performance would probably be better. For all testing stages, we used a second host as a client, which was located on the same switched 100 Mbit network as the server.
In order to have a comparable value, our first test was to request a 24 kByte sized static file (so TYPO3 was not involved at this time) from the server. This gave us an approximation of the maximum speed of our server.
We fired up ApacheBench with 1000 requests and a concurrency of 100. Note that the concurrency was the crucial parameter; the number of requests was simply set high enough to make our averages more robust.
$ ab -n 1000 -c 100 http://yourserver/static_file.txt
The average time per request, measured across all concurrent requests, was about 3 ms. That meant a throughput of more than 300 requests per second (300 Req/s).
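The conversion from a mean time per request to a request rate is simple arithmetic. The following is a minimal sketch, assuming the 3 ms figure corresponds to the value ab reports as "Time per request (mean, across all concurrent requests)"; the function names are ours and not part of ab.

```python
def requests_per_second(mean_ms_across_all: float) -> float:
    """Convert the mean time per request (in ms, across all concurrent requests) to Req/s."""
    return 1000.0 / mean_ms_across_all


def mean_latency_ms(mean_ms_across_all: float, concurrency: int) -> float:
    """Approximate time one client waits for its response at the given concurrency."""
    return mean_ms_across_all * concurrency


rate = requests_per_second(3.0)
print(f"{rate:.0f} Req/s")
print(f"{mean_latency_ms(3.0, 100):.0f} ms per request as seen by a single client")
```

The second function shows why both numbers matter: at a concurrency of 100, a throughput of about 333 Req/s still means that each individual client waits roughly 300 ms.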
The next step was to fetch some TYPO3 pages:
$ ab -n 1000 -c 100 'http://yourserver/index.php?no_cache=1'
Be warned before you try to play with this kind of test: Chances are your webserver will get desperately overloaded after a few seconds, "DOSing" yourself.
The result in our environment was a somewhat disappointing 4 Req/s. Compared to the static test, that was nothing. Time for some tuning.
The choice of operating system makes a difference in performance in many cases. Some popular benchmark comparisons stressed that:
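Since the same measurement has to be repeated after every tuning step, it can help to script it. The following is a minimal sketch, assuming ab is installed and on the PATH; the helper names run_ab and parse_requests_per_second are our own, and the host name is the same placeholder used in the commands above.

```python
import re
import subprocess


def parse_requests_per_second(ab_output: str) -> float:
    """Extract the 'Requests per second' value from ab's text report."""
    match = re.search(r"^Requests per second:\s+([\d.]+)", ab_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


def run_ab(url: str, requests: int = 1000, concurrency: int = 100) -> float:
    """Run ab against url and return the measured Req/s (requires ab on the PATH)."""
    result = subprocess.run(
        ["ab", "-n", str(requests), "-c", str(concurrency), url],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_requests_per_second(result.stdout)


# Example call (placeholder host name, not executed here):
# print(run_ab("http://yourserver/index.php?no_cache=1"))
```

Passing the URL as a list element avoids any shell quoting issues with the question mark in the query string. The warning above still applies: running this against a production server can overload it.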
Linux 2.6 performs better than 2.4, for both Apache and MySQL. See http://www.2cpu.com/articles/98_1.html and http://www-128.ibm.com/developerworks/linux/library/l-web26/?ca=dgr-lnxw07KernelCompare
Linux scales better than *BSD, not to mention Windows or Solaris. See http://bulk.fefe.de/scalability/
Many of these benchmarks were either low-level or made on extreme hardware (heavy SMP machines), so we checked whether simply replacing the stock 2.4.x kernel with 2.6.14.x and activating the Native POSIX Thread Library (NPTL) enhanced performance. It did not, at least not significantly. The overall effect was hard to measure and not even stable, and the result was still 4 Req/s.
Conclusion: Does that mean tweaking the OS is useless? No, but in our case threading and scheduling were simply not the bottleneck. If you find out that spawning threads and processes is actually slow on your system, feel free to tune and change it. Especially if you use Apache2, NPTL is a must-have for using threads instead of processes. Compared to processes, threads significantly reduce the load Apache2 causes on your machine. But this does not mean you would automatically serve your pages faster.
One more hint on kernel stuff: In one of our tests, we somehow managed to synflood the network stack of the 2.6 kernel. The result was that incoming TCP connections were dropped. Since the test results within that synflood were totally unreliable, we advise you to check your kernel logs for TCP drops from time to time.
Database - tuning MySQL
A frequently suspected cause for bad performance is the database. We know TYPO3 is pretty DB-intensive. It does multiple queries on each and every request, like getting data from the cache, updating sessions, etc. So the database is heavily used. MySQL optimization is a science by itself, and there are a lot of myths and useful recipes floating around on how to tune it. Note that there's a detailed chapter in the official MySQL documentation about that topic. We mainly concentrated on tuning the server parameters by altering the configuration and measuring the effect:
We raised max_connections to more than 100. Since we were using persistent connections (the default), this number roughly corresponded to the number of MaxClients in Apache (see below). Raising it did not cost performance and gave us some reserves for additional requests.
A next step was to tweak query_cache_size, query_cache_limit, key_buffer_size and table_cache to higher values, since we had enough spare RAM - this seemed to be quite useful for tuning because TYPO3 does a lot of repetitive queries, so caching queries should work well.
Then we switched off the bin-log, because we did not use replication and our backup strategy was not based on logging every single write query.
Although tweaking I/O for the DB (by using a dedicated RAID, a different file system, or noatime options) was even mentioned in the TYPO3 wiki, we did not spend time on that. The reason was that we had enough RAM to do memory caching, which was generally faster than disk access.
After tuning our database, once again nothing had changed significantly: 4 Req/s. Checking the MySQL stats (using mysqladmin status), we saw that caching worked quite well, so where was the problem?
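One way to judge whether the query cache works is to compute a hit rate from MySQL's status counters. The following is a sketch, assuming the counters Qcache_hits and Com_select as reported by SHOW STATUS or mysqladmin extended-status; the counter values in the example are made up for illustration and are not our measurements.

```python
def query_cache_hit_rate(qcache_hits: int, com_select: int) -> float:
    """Share of SELECT statements answered from the query cache.

    Com_select only counts SELECTs that were actually executed, so cache
    hits and executed SELECTs together make up (nearly) all SELECTs.
    """
    total = qcache_hits + com_select
    return qcache_hits / total if total else 0.0


# Hypothetical counter values, for illustration only.
print(f"{query_cache_hit_rate(qcache_hits=9000, com_select=1000):.0%}")
```

A high hit rate combined with unchanged throughput, as in our case, is a strong hint that the bottleneck lies elsewhere.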
It turned out that the database was simply not the bottleneck in our setup, for several reasons:
In a setup with one website, we did not have that many simultaneous connections, so spawning threads for queries was not a problem. This would probably change once we served several websites.
The Quickstart site served (mostly) static content, which was perfectly MySQL-cacheable. This would probably change once the amount of dynamic content grew.
Conclusion: MySQL tuning was not very useful in our case, but could help in other cases. The impact of the database grows with the number of sites and the amount of dynamic, non-cacheable content you serve.
Webserver - Apache and alternatives
The next of the usual suspects was the webserver. For a number of reasons, Apache is the standard web server for most TYPO3 installations, so we concentrated mainly on tuning and comparing the Apache 1.3 and 2.0 series. Additionally, we had a look at the relatively new lightweight webserver lighttpd, which seemed to be the first of the smaller servers that had all the features needed for TYPO3 hosting.
Tuning Apache 1 for TYPO3 was not really rocket science, because we were mainly fiddling with the parameters related to forking new processes for requests. The main issue was the number of MaxClients, or maximum child processes spawned. Before any tuning, the default number of MaxClients was 150. This did not cause problems when serving only static files. But once TYPO3 was involved, each of those Apache processes included mod_php and code execution, so they consumed quite an amount of CPU and memory (remember that 32MB memory limit you set in php.ini?). Our machine was not powerful enough to handle that number of simultaneous requests, and all server power was sucked up in seconds. The solution was to decrease that value, so that fewer processes were forked and the host was able to breathe again. The drawback was that we could not accommodate all requests at the same time if there were more parallel requests than MaxClients. They had to wait in the queue and, in the worst case, timed out.
So we had a trade-off: either be nice to everybody (and spawn processes like hell) and risk that our server burns out in no time, or limit the number of processes and thereby decrease load and request time for the accepted connections. This required some adjustment and trial and error. However, we found that, at this stage of our tests, we could serve almost as many requests per second with MaxClients 32 as with 64, with considerably lower load (so that our ssh session still reacted without much delay).
Another solution was to use another process model, introduced with Apache2, that did not spawn expensive processes but cheap threads. This "worker" module is reported to be significantly less resource hungry and more scalable. However, it was not marked stable with mod_php at the time of our research and could not even be installed without much hassle on Debian. Therefore, we only tested the Apache2 "mpm-prefork" module (without threading), which unsurprisingly gave approximately the same performance as plain Apache1.
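As a starting point for the trial and error, one can estimate how many Apache children fit into memory in the worst case. This is only a rough sketch and not the method we used: the 1 GB of RAM and the 32 MB PHP memory limit come from our setup, while the amount reserved for MySQL and the operating system is an assumed figure, and the function name is ours.

```python
def max_clients_estimate(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Worst-case number of Apache children that fit into RAM without swapping."""
    usable = total_ram_mb - reserved_mb
    if usable <= 0 or per_process_mb <= 0:
        raise ValueError("need positive usable RAM and per-process size")
    return usable // per_process_mb


# 1 GB RAM and the 32 MB memory_limit are taken from our setup;
# the 256 MB reserved for MySQL and the OS is an assumption.
print(max_clients_estimate(total_ram_mb=1024, reserved_mb=256, per_process_mb=32))
```

Real processes rarely reach the memory limit all at once, so this bound is pessimistic. It only marks the region in which to start testing values such as 32 or 64.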
Another option, similar to the Apache2 worker approach, was to use a lightweight non-forking webserver. We knew that Apache1 was not the fastest option for pure static file throughput. There is a long list of small webservers, both free and commercial, that outperform Apache1 in this discipline. The trouble was we were not serving only static files, and most of those servers had bad support for dynamic webpages. One of the better-looking alternatives is lighttpd, which fully supports the FastCGI interface for long-running PHP processes.
We compared lighttpd+FastCGI+PHP with Apache+mod_php. The results were the following:
Lighttpd did a substantially better job than Apache when serving static files.
Lighttpd did not, however, perform better than Apache when serving TYPO3 sites. In fact, it was a little slower (serving 20 Req/s compared to Apache's 23 Req/s, both with eAccelerator running). We couldn't confirm the 20 percent speed increase reported by others, maybe because TYPO3 was not simply outputting phpinfo().
Lighttpd, in our configuration, caused a lot less load on the server when serving TYPO3. The four (!) FastCGI processes did almost the same work using fewer resources.
At first glance, lighttpd seems to be a good option because of its lean design, neat configuration and scalability (e.g. the distribution of FastCGI processes across different machines and built-in load balancing).
Conclusion: Be careful when configuring Apache if you expect high peak load. Adjust the value of MaxClients for your personal needs by trial-and-error. Have a look at lighttpd if you like smaller tools and are able to live without the modules and support Apache offers.
There were a lot of other parameters that we could have tweaked to enhance Apache's performance, like disabling AllowOverride, HostnameLookups or access logs, or compressing output. But again, since we were mostly dealing with relatively few but expensive dynamic page requests, these issues would affect our performance only marginally. Just like MySQL - tune if you like, it may be worthwhile if you grow really big. For further details on Apache performance, we recommend two articles, one on the official Apache website and another at http://www.xs4all.nl/~thomas/apachecon/PerformanceTuning.html.
The next candidate for optimization was PHP, and a very suitable candidate it was. In fact, PHP optimization was so trivial and effective that we hardly dare to tell you: We installed a PHP accelerator, such as eAccelerator or Zend Optimizer, and got huge speed improvements with TYPO3. The reason for this was that TYPO3 generally does a lot of computation, code lookups and expensive calls, so that caching the bytecode paid off really well.
At the time of our research, there were at least four acceleration tools for PHP. Since our aim is not to compare competing products but to show the impact of PHP acceleration, we concentrated on one product. Our choice was eAccelerator, simply because it was free, GPL licensed and available as a Debian package.
First of all, let's have a look at how eAccelerator works in principle. Installed as a PHP module, it gets activated and configured in php.ini. There are two major components to speed up the execution of PHP scripts: code optimization and caching. eAccelerator first analyses the compiled bytecode and then tries to optimize it for speed. After that, the bytecode gets cached as an object in shared memory and/or on your hard disk. Once it is cached, far fewer operations (and thus less time) are needed to execute the script code. Given the amount of code in TYPO3, it's easy to understand why eAccelerator comes with an enormous performance boost.
Now let's have a look at it in practice. Our first step was to find out which component had more impact on performance. Optimization and caching work independently, and each can be turned on or off in php.ini:
; Turn on caching
We ran two tests, in each case turning on only one of the two. The result was clear: optimization did not significantly speed up page generation, but caching did. Simply activating caching without any further tweaking gave us 21 Req/s for FC Bigfeet, which is about five times the baseline performance. Although we could not measure changes when using only bytecode optimization, combining both seemed to be the best practice. Of course, the improvement might vary with the complexity of your TYPO3 site, but since there are practically no drawbacks, using a PHP accelerator is a must for every TYPO3 site.
Since caching was so useful, we had a closer look at memory usage. We used a simple usage statistics tool (eaccelerator.php), which comes with eAccelerator. Among other things, it provided information about the allocated memory, the number of cached files and the amount of memory each cached file needed. Our question then was how much memory eAccelerator consumed with TYPO3. The goal was to provide a basic rule. We quickly clicked through all the FE pages of FC Bigfeet and got a total of 8 MByte. Rushing through the BE, the amount increased to 26 MByte. Additionally testing some other websites on the server, we found out the following:
The more extensions and websites we had, the more memory we needed to spend.
To find out a good value of how much memory to spend, we had to test our requirements over and over. The more extensions we added, the more memory we needed to share. In most cases, it was possible to measure the total amount of needed memory for a website. As a rule, 32 MByte for one average site was a minimum:
As we had more than one website, we could minimize the memory consumption by using the same source directory for all sites. This is a must for multihosting servers!
Reaching the limit of available memory, we found two solutions:
Remove less frequently used scripts from the cache by short time to live (TTL) values:
Turn off eAccelerator for the BE, by adding the following line in ./typo3/.htaccess:
php_flag eaccelerator.enable 0
Although the eAccelerator website lacked some user docs in its documentation section, the software is shipped with a README file, where we found all necessary information about how to install, configure and use it. On top of it all, eAccelerator came with an API to integrate accelerating functions into one's own scripts.
Conclusion: A PHP accelerator is a must-have. With eAccelerator, the load on our server decreased significantly because less script parsing and compilation was involved. So we could raise Apache's MaxClients back to 64, a nice side effect.
At last we have a closer look at TYPO3 itself, for which there are also a few well-known issues concerning performance tuning. Note that the TYPO3 developers have done a lot to improve the code execution speed over the years, like tuning regular expressions and reducing expensive I/O and database lookups, and continue to do so. But TYPO3 is a complex application and all the features you love cost performance. DBAL, for example, does have quite a negative impact on performance, but at the same time it increases scalability, flexibility and stability for really big web sites (and we know that these sites scale by adding more servers).
What did we do about TYPO3 performance without touching the code?
Using the caching abilities (caching pages) gave us huge performance boosts. Our benchmarking revealed a performance of 55 Req/s, which was around 2.5 times faster than the non-cached pages. We therefore recommend making as many pages cacheable as possible.
We tried to make all used extensions cacheable and removed non-cacheable objects from our templates. A lot of problems related to pages not being updated properly could be solved with proper configuration, like the underrated TCEMAIN.clearCacheCmd.
Finally, we watched out for performance killer extensions from TER. Extensions that are triggered with every request are notorious, like those that transform HTML code shortly before it is output.
After a lot of successful and useless tweaking, we finally managed to improve our server performance from 4 Req/s to 55 Req/s. That is about 14 times faster than when we began to play with the knobs. You can see the changes in the chart below. Tuning parameters were added from left to right, beginning with our baseline (green bar), then tuning MySQL, the Linux kernel and libraries, installing eAccelerator and finally turning on caching in TYPO3 (orange bars).
Notice that low-level tweaking (Linux kernel, MySQL) did not improve the performance of the Quickstart site, but remember that its page contents are basically static. The more you increase the share of dynamic content in your website, the more the database could become a limiting factor. Caching was the most prominent tuning candidate: on the one hand, by using eAccelerator to cache precompiled PHP bytecode, speeding up the execution of PHP scripts; on the other hand, by using the caching ability of TYPO3, storing once-rendered pages as HTML in the database.
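The relative gains can be recomputed from the numbers reported in this article. The following short sketch lists the Req/s after each stage, in the order the tuning steps were added, and prints the speedup against the baseline and against the previous stage; the helper name speedup is ours.

```python
# Req/s after each tuning stage, as reported in this article.
stages = [
    ("baseline (TYPO3 cache off)", 4),
    ("MySQL tuning", 4),
    ("Linux 2.6 kernel + NPTL", 4),
    ("eAccelerator", 21),
    ("TYPO3 page caching", 55),
]


def speedup(rate: float, reference: float) -> float:
    """Factor by which rate exceeds reference."""
    return rate / reference


baseline = stages[0][1]
previous = baseline
for name, rate in stages:
    vs_baseline = speedup(rate, baseline)
    vs_previous = speedup(rate, previous)
    print(f"{name:<28}{rate:>3} Req/s  x{vs_baseline:.2f} total  x{vs_previous:.2f} step")
    previous = rate
```

The two caching steps account for the entire gain: a factor of 5.25 from eAccelerator and a further factor of about 2.6 from TYPO3 page caching, which multiply to 13.75.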
Fortunately, most of the TYPO3 sites we know serve mostly static content, so the case of a completely uncacheable site is quite rare. In fact, caching increases performance so much that, unless you need real-time user-specific content, even extremely short caching intervals pay off immediately.
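Why short caching intervals pay off can be illustrated with a deliberately simplified model: within each caching interval, the first request to a page is rendered uncached and all further requests are served from the cache. The 21 and 55 Req/s figures are our measurements; the request rate of 20 requests per second on a single page and the intervals are assumed values for illustration, and the function name is ours.

```python
def effective_rate(uncached_rps: float, cached_rps: float,
                   request_rate: float, ttl_seconds: float) -> float:
    """Average Req/s the server sustains for one page under the simple model.

    Per caching interval there is one cache miss; the remaining requests
    in that interval are cache hits.
    """
    requests_per_interval = max(1.0, request_rate * ttl_seconds)
    miss_share = 1.0 / requests_per_interval
    avg_seconds = miss_share / uncached_rps + (1.0 - miss_share) / cached_rps
    return 1.0 / avg_seconds


# Assumed load: 20 requests per second hitting one page.
for ttl in (1, 10, 60):
    rate = effective_rate(21, 55, request_rate=20, ttl_seconds=ttl)
    print(f"TTL {ttl:>2} s: {rate:.1f} Req/s")
```

Under these assumptions, even an interval of one second brings the sustainable rate close to the fully cached 55 Req/s, because only one request in twenty has to be rendered from scratch.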
Finally, here come our recommendations:
Most importantly, install a PHP bytecode cache, such as eAccelerator, and make sure you use TYPO3's caching mechanisms, which are introduced in two excellent articles on the typo3.org development section.
Furthermore, enable MySQL caching if your database is the bottleneck and produces system load. To avoid too much load on your machine, try to reduce the number of MaxClients in the Apache configuration file.
If you like low-level tweaking and have control over your system's kernel and libraries, go for a Linux 2.6 kernel with NPTL.
In any case, allocate enough memory to each of the candidates.
We'd like to thank Patrick Gaumond for reviewing a draft of this article.
About the authors
Michael Scharkow and Steffen Müller are both students at the Freie Universität in Berlin, Germany. Since relaunching their institute's website with TYPO3 in 2002, they have been active members of the TYPO3 community.