Using Cache-Control Headers in TYPO3
January 21, 2005
Author: Karsten Dambekalns
Large websites with a lot of visitors often struggle with performance problems. Since buying faster hardware isn't the best solution, TYPO3's page caching is very helpful in those cases. But serving cached pages still consumes server resources, so why not let the browser or a proxy cache the content? This article demonstrates how to configure TYPO3 to enable client-side caching and points out possible problems with this approach.
This article is intended for TYPO3 administrators who are considering client-side caching of content to improve performance.
You should be able to install and configure TYPO3 and know about the basics of TypoScript.
How it started
The Danish Consumer Agency runs the consumers' portal forbrug.dk, which is one of the largest TYPO3 installations in the world, with 130,000 unique visitors generating 13 million hits per month.
This was managed with only one server for the website and database and one server for delivering mail. During the fall of 2004 they experienced spikes in their traffic (usually caused by being mentioned on TV and in online media). These spikes were more than the installation could handle.
Upgrading the CPUs in the server was quickly ruled out as a solution: doubling the speed would have cost at least 15,000 Euro in hardware alone.
So a software solution was needed. They analyzed their documents and found that most of them do not change within 24 hours; these could be cached without any problems. A few, however, must never be cached because their content changes with every request (e.g. the result of a search).
"We tried making a small change where all pages were cached for 24 hours using Apache's proxy module as caching engine. While this gave a remarkable speed-up (on the order of 10 times), it proved fatal to the few pages that must never be cached.
The right thing to do was of course to make TYPO3 compliant with current caching practices (e.g. RFC 2616). After identifying the changes needed in terms of the HTTP headers that should be generated, we turned to Kasper Skårhøj and had him implement the changes. The identification process took on the order of a man-week, and implementing the changes in TYPO3 and testing them also took on the order of a man-week." (Ole Tange, www.forbrug.dk)
The theory of client-side caching
When we say client-side caching, we do not mean only the web browser as client; the term also covers a proxy anywhere in the transmission chain.
Without cache-control headers, a page requested from a site run with TYPO3 is always transmitted in full. TYPO3's own page cache only avoids re-rendering the requested page (saving a lot of database queries); the content read from the cache is still sent to the client on every request.
If we enable caching by sending the needed cache-control headers, any proxy or the browser itself may cache the page locally. So when a page is requested and an existing cached copy is still valid (i.e. not too old), it is not transmitted from the server to the client at all. Obviously this will reduce network usage as well as the load on the server.
To provide control over the caching of web pages, the HTTP/1.1 protocol defines a set of headers that can be sent back and forth between server and client. They allow you to permit caching for a limited time or to prohibit it completely (as is needed for any page that is truly dynamic, such as search results or pages depending on your login).
Since in some situations a client may need to actively circumvent a cache and fetch a fresh copy from the original server, HTTP/1.1 defines ways to do so. If a client sends specific headers (usually after reloading with Shift or Control pressed), any cache involved must not use a cached copy but check for a new one first.
Enabling cache-control headers in TYPO3
There is a new option called sendCacheHeaders in the CONFIG top level object of TypoScript. If it is set, TYPO3 will output cache-control headers to the client. This conditionally allows client browsers and/or reverse proxies to take load off TYPO3 websites.
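As a minimal sketch, the option could be set in the TypoScript setup of a site template as shown here. The cache_period line is an assumption added for illustration: it sets the lifetime of TYPO3's page cache in seconds, and the 24-hour value is only an example, not a recommendation from the original setup.

```typoscript
# Send cache-control headers for pages that qualify for client-side caching
config.sendCacheHeaders = 1

# Assumption for illustration: keep pages in the TYPO3 page cache for 24 hours (value in seconds)
config.cache_period = 86400
```

Whether a given page actually receives headers that allow caching still depends on the conditions described in this section.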
The conditions for allowing client and/or proxy caching are the following:
The page was cached by TYPO3
No *_INT or *_EXT objects were on the page (e.g. USER_INT)
No frontend user is logged in
No backend user is logged in
If these conditions are met, headers are sent that allow caching of the page. If caching is not allowed after evaluating the above conditions, headers are sent to prohibit client caching. Details about which headers are sent can be found in TSref.
To avoid potential problems, TYPO3 of course honors client requests for cache revalidation and regenerates a page internally if asked to do so.
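The second condition can be illustrated with a small sketch. It assumes an existing PAGE object named page and a purely hypothetical plugin class user_example with a method main; the file path and all names are placeholders, not part of any real extension.

```typoscript
# Hypothetical plugin script; path and class name are placeholders
includeLibs.user_example = fileadmin/scripts/class.user_example.php

# USER: rendered once and stored in the page cache, so the page can still qualify
page.10 = USER
page.10.userFunc = user_example->main

# USER_INT: rendered on every request; its presence on the page means
# no headers allowing client-side caching are sent
page.20 = USER_INT
page.20.userFunc = user_example->main
```

With both objects on the page as written, the page would not qualify; removing the USER_INT object (or converting it to USER where the output really is cacheable) would let it qualify, provided the other conditions are met.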
More on caching and FE user logins
As stated above, headers that allow caching are not sent while a user is logged in. This is needed because in TYPO3 the same URL can show different content depending on whether a user is logged in or not. One problem remains, however: if the user visited the same URL before logging in (when caching was allowed), they may still see the page from the cache after logging in, and so think they are not logged in at all!
One way to solve this is to have different URLs when users are logged in (this could be achieved with the realurl extension). Another way is to set the sendCacheHeaders_onlyWhenLoginDeniedInBranch option in the CONFIG top level object of TypoScript. Then cache-control headers that allow caching are only sent when user logins are disabled for a page or a branch of pages (possible in the "Advanced" page type).
Since many sites only need the login in a certain branch of the page tree, disabling it in all other branches makes it much easier to use cache-control headers in combination with logins: cache-control headers will simply be sent when logins are not allowed and never be sent when logins are allowed!
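A minimal sketch of this combination in the TypoScript setup might look as follows. It assumes that logins have already been disabled in the page properties of every branch that should be cacheable by clients.

```typoscript
# Send cache-control headers in general ...
config.sendCacheHeaders = 1

# ... but allow client-side caching only where user logins are disabled for the branch
config.sendCacheHeaders_onlyWhenLoginDeniedInBranch = 1
```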
Note that enabling caching means you have to consider how log files are written: when a page is cached on the client, it will not be requested from the web server, so no logging will happen. Often the client will ask a proxy instead, and using Apache as the proxy makes it easy to get the same logging as before. So there are ways to work around these problems, but they are outside the domain of TYPO3 in any case.
Again Ole Tange: "After implementation we see a speedup factor in the order of 4-5. The reason we do not get more is that extension writers often do not think in terms of caching; so they will default their extension to be not cacheable (e.g. by making it an INT-object).
We have had remarkably few problems with the caching. If editors want a change to be published immediately they simply use the web browser's facility to bypass the cache (forced reload - CTRL-F5 or Shift-Reload)."
RFC 2616, section 14.9 http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
Thanks to Ole Tange for co-authoring this feature as well as providing input for this article, Robert Lemke for preparing the nice template for this document and of course Kasper for all the fish.
About the author
Karsten Dambekalns lives in Braunschweig / Germany with his beloved wife Līga and their (nameless) espresso machine. He actively participates in the TYPO3 community and can confirm this is a great thing to do.