The mysteries of &cHash

Categories: Development

Created by Kasper Skårhøj

The &cHash parameter of frontend plugins might have puzzled quite a few developers. That's why Kasper wrote an interesting article which explains how the &cHash mechanism works and what to avoid in order to make great, reliable plugins with TYPO3. A must read for every extension author!

Intended Audience

Developers of frontend plugins for TYPO3 and people who think hash is something you smoke.

Caching principles

The basic principle of caching is to spend time to generate an answer only once and then deliver that answer to the same question many times.

For websites that means using webserver resources for page generation only once and then delivering the result over and over again when people request the page with the same parameters.

Basic page caching in TYPO3

When a page in TYPO3 is requested, an &id parameter is used to identify the page. TYPO3 will start a process which generates the page content from all the components it consists of. This process takes resources from the webserver. The next time a user asks for the same page, why not just return the result from the last time the page was generated? This is what we do when caching pages in TYPO3, thus saving server resources and speeding up the surfing experience. The generated pages are temporarily stored in a database table, cache_pages.

However, TYPO3 allows the same page ID to produce different output based on more parameters than just the &id. For example the "&type" parameter is directly supported by TYPO3 for such a purpose (usually for generating sites with frames). So the caching mechanism has to store two different pages when "?id=123" and "?id=123&type=1" are called. In other words, a cached page must be identified by a combination of variables.

Snakes in paradise

An easy solution to caching would be to simply identify a page by its URL. So when someone asks for "?id=123" and "?id=123&type=1" we will just md5-hash the URL and use that to look up the page next time.

Well, that would provide the basis for a major DoS attack. What if someone requests 1 million URLs with random parameters like "?id=123&USELESS_PARAMETER=[any random number]" - that would spam our database with gigabytes of cached instances of... the SAME page!

Binding the devils...

The solution to this problem is to cache only when known and valid parameters are sent. So in reality TYPO3 will cache "?id=123&USELESS_PARAMETER=[any random number]" under the same identification as "?id=123" because the USELESS_PARAMETER is unknown to TYPO3 and the page content is the same. The reason why "?id=123" and "?id=123&type=1" will be cached as different pages is because TYPO3 is hardcoded to acknowledge the "&type" parameter as generating different page content. And even then the value of "&type" is checked - setting it to "&type=987654321" will not spam the caching table since no matching "typeNum" value is found in the TypoScript template!
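As a rough illustration of this principle, here is a minimal sketch of a cache identifier that is built only from parameters the core knows about. It is not TYPO3 core code: the function name cacheIdentifier() and the list of valid type numbers are assumptions made up for this example.

```php
<?php
// Hypothetical sketch, not TYPO3 core code.
// Only parameters known to the core take part in the cache identifier.
function cacheIdentifier(array $get, array $validTypeNums): string
{
    $id = (int)($get['id'] ?? 0);
    $type = (int)($get['type'] ?? 0);

    // An unknown type number never reaches the cache (page not found).
    if (!in_array($type, $validTypeNums, true)) {
        throw new RuntimeException('page not found');
    }

    // Everything else in $get (e.g. USELESS_PARAMETER) is simply ignored.
    return md5('id=' . $id . '&type=' . $type);
}

$validTypeNums = [0, 1]; // assumed typeNum values from the template

$a = cacheIdentifier(['id' => '123'], $validTypeNums);
$b = cacheIdentifier(['id' => '123', 'USELESS_PARAMETER' => '42'], $validTypeNums);
$c = cacheIdentifier(['id' => '123', 'type' => '1'], $validTypeNums);

var_dump($a === $b); // bool(true): same cached page
var_dump($a === $c); // bool(false): "&type=1" is a different page
```

No matter how many random parameters are appended, the identifier stays the same, so the cache table cannot be flooded this way.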

... but not yourself

However, developers want to use GET variables in their plugins which are not ignored by TYPO3's caching mechanism. For example a plugin found on page 566 might want to use the parameters "?id=566&tx_myext[uid]=44" to display the full news article #44 while the main page "?id=566" displays the overview. But because the GET variable "tx_myext[uid]" is completely unknown to the TYPO3 core the page will be cached only based on the page id - which is not good enough if you have more than one news article!

There are three solutions to this problem:

  1. &no_cache=1 (slow): Each link involving more parameters than the &id must also contain "&no_cache=1", effectively disabling caching of the whole page. In this way it is safe to create custom content based on GET variables, but the performance is poor since no caching is applied.
  2. USER_INT (medium): Make the plugin a USER_INT type. In this way the page is cached only once, with a placeholder which is substituted by the content of the plugin being run dynamically on each request. This caches the static parts of a page but still involves the overhead of processing the plugin. With this solution "?id=566" will be picked up for caching by the system while "&tx_myext[uid]=44" will be used dynamically in the plugin, thus taking effect on the final output. This solution is the most flexible because you don't have to worry about caching and parameters. Recommended for realtime and user-customized content such as search plugins, booking forms, user setting pages etc.
  3. &cHash (fast): Each link involving more parameters than the &id must also contain "&cHash=[hash string]". The value of "&cHash" is used on the server to verify that the combination of additional parameters was made by the server itself and not forged by an outside spammer.

&cHash revealed

Weeding out the misunderstandings

A hash string is a short string with a fixed length which represents a larger string of arbitrary length, for practical purposes uniquely. In TYPO3 - and PHP - such a string is often generated by md5() which creates a 32 character hex-string. TYPO3 will sometimes shorten such hash strings down to 10 characters which is considered accurate enough for some purposes.

Hash strings are nice because they are practically impossible to reverse engineer back to the original value. Additionally there is only a very small probability that two different source strings will generate the same hash string. This means they are great for comparison and validation of data. This is exactly how we use them in TYPO3!
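A small sketch of these properties, using PHP's md5(). The shortening shown here (taking the first 10 characters with substr()) is an assumption for illustration; the text above only says that TYPO3 shortens the string to 10 characters.

```php
<?php
// The source string can be of any length ...
$long = str_repeat('some long source string ', 1000);

// ... but the md5 hash always has 32 hex characters.
$full = md5($long);

// Assumed way of shortening: keep the first 10 characters.
$short = substr($full, 0, 10);

echo strlen($full), "\n";  // 32
echo strlen($short), "\n"; // 10

// Same input, same hash; different input, different hash.
var_dump(md5('tx_myext[uid]=44') === md5('tx_myext[uid]=44')); // bool(true)
var_dump(md5('tx_myext[uid]=44') === md5('tx_myext[uid]=45')); // bool(false)
```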

The term "hash" apparently comes by way of analogy with its standard meaning in the physical world, to "chop and mix." (according to http://en.wikipedia.org/wiki/Hash_function) - which is of course a very boring explanation, but I wasn't the author of that.

Generating the cHash string

The "&cHash" string is a hash of the - to TYPO3 unknown - additional parameters in a URL in the frontend. By creating and sending the cHash string along with the URL we can verify on the server that the additional parameters received in the URL were indeed created by TYPO3 when generating the page. In that case it should be safe to cache the page based on these extra parameters!

This is the flow:

Page being generated by TYPO3:

  1. A plugin (running as a USER cObject) wants to create the link "?id=566&tx_myext[uid]=44&type=1&tx_myext[details]=yes" and wants it cached. The link is sent to an internal API function, ->typolink().
  2. The ->typolink() function sees that the URL must be cacheable and begins to generate the cHash value of the parameters. In this process the variables "?id=566" and "&type=1" are filtered out because they are already known by the core system. So now "&tx_myext[uid]=44" and "&tx_myext[details]=yes" are left, and after ordering them alphabetically the hash string "13b5d6efa7" is generated (by t3lib_div::makeCacheHash()).
  3. The URL is returned to the plugin as "?id=566&tx_myext[uid]=44&type=1&tx_myext[details]=yes&cHash=13b5d6efa7"
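The generation steps above can be sketched roughly as follows. This is not the code of t3lib_div::makeCacheHash(); the helper names flattenParams() and sketchCHash(), the constant ENCRYPTION_KEY and the exact way the values are serialized are assumptions for illustration, so the sketch will not reproduce the value "13b5d6efa7". A server-side secret is mixed in so that outsiders cannot compute the hash themselves.

```php
<?php
// Hypothetical stand-in for the secret encryption key of the installation.
const ENCRYPTION_KEY = 'not-a-real-key';

// Flatten nested GET arrays into pairs like "tx_myext[uid]" => "44".
function flattenParams(array $params, string $prefix = ''): array
{
    $flat = [];
    foreach ($params as $key => $value) {
        $name = $prefix === '' ? (string)$key : $prefix . '[' . $key . ']';
        if (is_array($value)) {
            $flat += flattenParams($value, $name);
        } else {
            $flat[$name] = (string)$value;
        }
    }
    return $flat;
}

// Sketch of the hash calculation: filter, sort, add secret, hash, shorten.
function sketchCHash(array $get): string
{
    $flat = flattenParams($get);

    // Parameters already known to the core are filtered out.
    unset($flat['id'], $flat['type'], $flat['cHash']);

    // Alphabetical order, so the parameter order in the URL does not matter.
    ksort($flat, SORT_STRING);

    return substr(md5(serialize($flat) . ENCRYPTION_KEY), 0, 10);
}

$get = ['id' => '566', 'tx_myext' => ['uid' => '44', 'details' => 'yes'], 'type' => '1'];
$cHash = sketchCHash($get);

echo '?id=566&tx_myext[uid]=44&type=1&tx_myext[details]=yes&cHash=' . $cHash, "\n";
```

Because &id and &type are removed before hashing and the rest is sorted, the same plugin parameters always give the same cHash regardless of their order in the link.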

Page is requested by the URL "?id=566&tx_myext[uid]=44&type=1&tx_myext[details]=yes&cHash=13b5d6efa7":

  1. The server immediately reacts to the known parameters, which are "?id=566" and "&type=1", by validating the page id and type-number in relation to the Template.
  2. The server discovers that a "cHash" value is set; this means one thing: additional parameters in the URL not known to TYPO3 must match this value when a hash is created from them. After filtering away the &id, &type and &cHash parameters TYPO3 is left with "&tx_myext[uid]=44" and "&tx_myext[details]=yes", which turn out to generate the hash string "13b5d6efa7", which is exactly the same as "&cHash"! This is proof that no parameters have been forged by the evil dark side. The page can safely be cached based on this custom parameter combination!
  3. In case the enemy had changed the parameter to "&tx_myext[uid]=9876554321" in order to spam our cache table, the resulting hash string will in all probability not match "13b5d6efa7": Page caching will be disabled immediately, but the page will still be generated trying to display news item 9876554321 (which probably does not exist, but so what? The cache table is not loaded with that...)

So the cHash is like a signature that assures us the parameters are OK!
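The check on the receiving side can be sketched in the same spirit. Again this is only an illustration under assumptions: expectedCHash(), cHashMatches() and ENCRYPTION_KEY are made-up names, the parameters are passed in already flattened, and the hash recipe is not the one TYPO3 really uses.

```php
<?php
const ENCRYPTION_KEY = 'not-a-real-key'; // hypothetical server-side secret

// Recalculate the hash from the parameters that were actually received.
function expectedCHash(array $flatParams): string
{
    // &id, &type and &cHash itself are filtered away first.
    unset($flatParams['id'], $flatParams['type'], $flatParams['cHash']);
    ksort($flatParams, SORT_STRING);
    return substr(md5(serialize($flatParams) . ENCRYPTION_KEY), 0, 10);
}

// True: the page may be cached under this parameter combination.
// False: the parameters were changed, so caching has to be disabled.
function cHashMatches(array $flatParams): bool
{
    return hash_equals(expectedCHash($flatParams), (string)($flatParams['cHash'] ?? ''));
}

// A link as the server itself would have generated it.
$params = ['id' => '566', 'type' => '1', 'tx_myext[uid]' => '44', 'tx_myext[details]' => 'yes'];
$params['cHash'] = expectedCHash($params);
var_dump(cHashMatches($params)); // bool(true)

// The same link after somebody changed the uid but kept the old cHash.
$forged = $params;
$forged['tx_myext[uid]'] = '9876554321';
var_dump(cHashMatches($forged)); // bool(false)
```

Note that this sketch only covers the case where a cHash value is present in the URL.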

Forging &cHash?

Now, could the enemy calculate that cHash value himself? Well, only if he can guess the value of the $TYPO3_CONF_VARS[SYS][encryptionKey] since that is included both in the generation of the cHash in the URL and during verification. This value is supposed to be secret and since the cHash cannot be reverse engineered the only way to find that value is to hack the server or guess it.

The dangers of &cHash

Empty &cHash

If there is a cHash value in the URL, TYPO3 will take care of all validation, and if parameters are forged caching will be disabled. The plugin using the parameters doesn't have to worry about a thing.

However, if no cHash value is found, TYPO3 has no way of knowing that the additional parameters are used by a plugin which expects its output to be cached based on these values, and thus no evaluation will take place! This problem does open the door to spamming the cache table, but it results in something even worse in the eyes of a website user: unexpected page content!

The result of this error can be that the specific plugin output of "?id=566&tx_myext[uid]=44" showing a news article will be cached using only "?id=566" as identification for the cached page. Thus, when someone requests "?id=566" they will get to see the news article and not the expected article archive list which would normally be on "?id=566"! It is not only confusing but altogether impossible to see the archive list because its place in the cache is taken by the article until someone clears the cache! Further, the occurrence of this error is puzzling and "occasional" to developers because it depends on whether the page "?id=566" or "?id=566&tx_myext[uid]=44" was viewed first after a cache-clearing; the first one viewed will be the one to fill the cache of "?id=566".
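The effect can be reproduced with a tiny simulation. Everything in it is a stand-in invented for this example: the $cache array plays the role of the cache table, renderPlugin() is a USER plugin that does not check for &cHash, and servePage() is a much simplified page request.

```php
<?php
// Hypothetical in-memory stand-in for the cache table.
$cache = [];

// Stand-in for a USER plugin that does NOT check for &cHash.
function renderPlugin(array $get): string
{
    return isset($get['tx_myext']['uid'])
        ? 'article #' . $get['tx_myext']['uid']
        : 'archive list';
}

// Simplified request: without a cHash the unknown parameters
// are not part of the cache key, only the page id is.
function servePage(array $get, array &$cache): string
{
    $key = 'id=' . $get['id'];
    if (!isset($cache[$key])) {
        $cache[$key] = renderPlugin($get);
    }
    return $cache[$key];
}

// First request after clearing the cache: the article, without cHash.
echo servePage(['id' => '566', 'tx_myext' => ['uid' => '44']], $cache), "\n"; // article #44

// Next visitor asks for the plain page and gets the article too.
echo servePage(['id' => '566'], $cache), "\n"; // article #44
```

If the two requests arrive in the opposite order, the archive list fills the cache instead and every article link without cHash shows the list, which is why the error looks so "occasional".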

Solving the empty &cHash problem

The reason for spending 4 hours writing this article is that this problem cannot be solved on system level like the validation of a non-empty &cHash value; it has to be implemented inside the plugins!

Basically a plugin author has to ask himself: "Is my plugin running as a USER cObject?" If so, then: "Where am I creating cached output depending on external variables?" At those locations you should check if the &cHash value is set and, if not, disable caching.

In order to make this easy for plugin developers, TYPO3 version 3.8.0 will feature an API that fixes this for plugins made by the standards. The basic assumption is that a plugin will only display specific information based on parameters in the internal "->piVars" array, which is initialized during instantiation of the plugin class. So, we can simply check if this array contains values (meaning parameters are sent) and if so check whether the cHash value exists; if it does not, caching is disabled. As soon as caching is disabled we are on safe ground, and even with forged or otherwise obsolete input variables we can generate output from the plugin without compromising the integrity of the cache.

So, to make it short:

If your plugin is running as a USER cObject being cached with pages, set the internal variable $pi_checkCHash in the plugin class (example with the "mininews" extension):

class tx_mininews_pi1 extends tslib_pibase {

    // Default plugin variables:
    var $prefixId = 'tx_mininews_pi1';
    var $scriptRelPath = 'pi1/class.tx_mininews_pi1.php';
    var $extKey = 'mininews';
    var $pi_checkCHash = TRUE;

    // TemplaVoila specific:
    var $TA = '';
    var $TMPLobj = '';

    ....

This API is implemented in the base class, tslib_pibase:

function tslib_pibase() {
    ...
    $this->piVars = t3lib_div::GParrayMerged($this->prefixId);
    if ($this->pi_checkCHash && count($this->piVars)) {
        $GLOBALS['TSFE']->reqCHash();
    }
    ...

The function TSFE->reqCHash() will simply disable caching if the &cHash value was not found in the URL:

function reqCHash() {
    if (!$this->cHash) {
        $this->set_no_cache();
    }
}

Pre-3.8.0 solutions

Since this API is not implemented for TYPO3 versions prior to 3.8.0 you can of course create a custom implementation where you check directly on TSFE->cHash and call TSFE->set_no_cache() if needed. However, it should suffice to set the internal variable pi_checkCHash to true since that does not spoil backwards compatibility: the check is active under 3.8.0, while under older versions of TYPO3 the plugin still runs but the problem remains unfixed.

Notice that this internal variable has to be hardcoded in the class definition and not set dynamically during execution; the reason is that it is used during instantiation of the object!
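For the custom implementation mentioned for versions before 3.8.0, a sketch could look like the following. To keep it runnable on its own it uses a small stand-in class instead of the real TSFE object; only the cHash property and the set_no_cache() method are taken from the text above, while FakeTsfe, its cachingDisabled flag and the helper requireCHashOrDisableCache() are hypothetical names.

```php
<?php
// Minimal stand-in for the TSFE object so the sketch runs by itself.
class FakeTsfe
{
    public $cHash = '';
    public $cachingDisabled = false; // hypothetical flag for this sketch

    public function set_no_cache(): void
    {
        $this->cachingDisabled = true;
    }
}

// Hypothetical helper: call it before the plugin creates output that
// depends on its parameters. In a real plugin the object passed in
// would be $GLOBALS['TSFE'] and the array would be $this->piVars.
function requireCHashOrDisableCache(FakeTsfe $tsfe, array $piVars): void
{
    if (count($piVars) > 0 && !$tsfe->cHash) {
        $tsfe->set_no_cache();
    }
}

// Parameters sent, but no cHash: caching gets disabled.
$withoutHash = new FakeTsfe();
requireCHashOrDisableCache($withoutHash, ['uid' => '44']);
var_dump($withoutHash->cachingDisabled); // bool(true)

// Parameters sent together with a cHash: caching stays enabled.
$withHash = new FakeTsfe();
$withHash->cHash = '13b5d6efa7';
requireCHashOrDisableCache($withHash, ['uid' => '44']);
var_dump($withHash->cachingDisabled); // bool(false)
```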

Error-prone setups

One might ask: When will a URL ever contain a blank cHash value since the combination of parameters and cHash is automatically created by TYPO3 anyway?

An obvious answer is when the enemy forges URLs on your website! It will be very easy for the enemy to tease you with this confusing bug. This alone should be motivation enough to fix the problem.

However a much more likely scenario is that it happens when using the "realurl" extension or some other extension manipulating the URL and its parameters:

With realurl the problem simply is that the value of "cHash" is not "speakable" under any circumstances and since the realurl extension is trying to translate parameter strings into something human readable the cHash value of URLs will typically be the only left-over that cannot be a part of the URL. The solution chosen for realurl is to store the cHash value in a table linked to the complete URL it was associated with. This works very well until something changes the values of the URL. In reality what happens is that when translating the URL back, realurl might

  1. either retrieve a wrong cHash value from the database; no problem, caching will be disabled since cHash-strings didn't match.
  2. or retrieve nothing; big problem, because parameters will be set but cHash empty!

This is the reason why "realurl" has sparked some fancy situations where the front page of the documentation library on TYPO3.org has shown a particular page from deep inside the documentation library instead of the real front page - because something has resulted in no returned value from the cHash-table in realurl!

Appendix: Hardcoded Frontend parameters

GET variables involved in creating the caching identification

These GET-variables and other factors are playing the key role when TYPO3 creates the cache identification.

  id: Page id. Valid only if it points to an accessible page in the database. Results in a page-not-found error if not valid.
  type: Page subtype. Typically used to generate the different frames and framesets for sites based on frames. Valid only if a corresponding typeNum configuration is found in the TypoScript template for the page. Results in a page-not-found error if not valid.
  MP: Mount-Point identification. Value is auto-generated by TYPO3 when mount-points are used in a website. Valid only if it is found to connect mount points and mounted pages which is validated when resolved. Results in a page-not-found error if not valid.
  cHash: Represents the unique combination of custom parameters created by a TYPO3 plugin. Valid only if it matches the same calculated value from the actually sent parameters. If found not to match it will disable caching since it indicates a URL which is probably forged from outside or otherwise obsolete.

[Login usergroups]

When a user is logged in it is obviously impossible to cache any personal content he might be presented with. Plugins delivering content customized to the user should run as a non-cached plugin, using the TypoScript cObject USER_INT.

However, content delivered to usergroups might be cached since this targets a larger group. Therefore a cached page is also identified by the list of usergroups for the current user. This is not found in the URL as a GET-variable of course but comes from the login session.

When no user is logged in, the group-list (->gr_list) is "0,-1". When a user is logged in and is a member of e.g. the groups with uid 3 and 6, the group-list used for caching is "0,-2,3,6".

The fact that pages are cached based on usergroup combinations means that the same page id might be cached for as many usergroup combinations as there are in the user table!
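The two group-list values quoted above can be reproduced with a short sketch. The helper name groupList(), the sorting of the uids and the way the list is combined with the page id are assumptions for illustration, not the actual frontend code.

```php
<?php
// Hypothetical helper building the group list described above.
function groupList(array $groupUids, bool $loggedIn): string
{
    if (!$loggedIn) {
        return '0,-1';
    }
    // Assumption: uids are sorted so that the same set of groups
    // always results in the same list.
    sort($groupUids, SORT_NUMERIC);
    return implode(',', array_merge([0, -2], $groupUids));
}

echo groupList([], false), "\n";    // 0,-1
echo groupList([6, 3], true), "\n"; // 0,-2,3,6

// The group list is part of what identifies a cached page, so the
// same page id gets one cache entry per usergroup combination.
$anonymous = md5('id=123|' . groupList([], false));
$members = md5('id=123|' . groupList([3, 6], true));
var_dump($anonymous === $members); // bool(false)
```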

[TypoScript conditions]

Another non-GET variable factor in caching is conditions in the TypoScript template. When you are using conditions in the TypoScript templates the relation between available conditions and matching conditions will also be a part of the caching ID. This means that if you make a condition in TypoScript to output different code for a specific web browser that will create at least two pages in the cache table; one for the specific browser and one for all others.

Sometimes conditions in TypoScript are used to capture values from custom GET-variables and in that case the value of those GET variables will also affect caching. A typical example is this:

...
config.sys_language_uid = 0
config.language = uk

[globalVar = GP:L = 1]
config.sys_language_uid = 1
config.language = fr
[global]

In this case the condition checks whether a GET variable "&L=1" is found and if so sets up the output in French instead of English. In this way "?id=123&L=1" and "?id=123&L=[any other value]" will be cached as two separate pages.
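How a matching condition ends up in the cache identification can be sketched like this. The functions matchedConditions() and cacheVariant() are hypothetical, and the string comparison against "1" is a simplification of how the condition is really evaluated.

```php
<?php
// Hypothetical sketch: the set of matching conditions is part of
// what identifies a cached page.
function matchedConditions(array $get): array
{
    $matched = [];
    // Mirrors the condition [globalVar = GP:L = 1] from the example.
    if ((string)($get['L'] ?? '') === '1') {
        $matched[] = 'GP:L = 1';
    }
    return $matched;
}

function cacheVariant(array $get): string
{
    $id = (int)($get['id'] ?? 0);
    return md5('id=' . $id . '|' . implode(',', matchedConditions($get)));
}

$french = cacheVariant(['id' => '123', 'L' => '1']);
$default = cacheVariant(['id' => '123']);

var_dump($french === $default);                                   // bool(false)
var_dump(cacheVariant(['id' => '123', 'L' => '7']) === $default); // bool(true)
```

Only whether the condition matches makes a difference here, so every value of "&L" other than 1 shares the cache entry of the default language.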

Dynamic GET variables, requiring caching to be disabled

These variables affect the page output from TYPO3, for instance by displaying a search result, but they are combined with disabling of the page cache, typically by also sending "&no_cache=1".

  sword, stype, scols, scount, spointer, sword_list, defVals: All related to forms and searching in TYPO3.

Meta GET variables

These parameters play no role in the generation of page content but are usually used to instruct the frontend to do special processing long before page rendering starts.

  no_cache: If set true (e.g. "1") it will bypass page caching completely.
  jumpurl, juSecure, juHash, mimeType: Used to redirect to URLs or files while either testing access permissions (secure downloads) or logging access in the log (so site statistics can show when a link to external pages was clicked).
  RDCT: Redirection identification used to create short links to long URLs. Used for direct mail systems.
  pid: Page id of where login users are found. Used for login forms.
  FE_SESSION_KEY: Transfer of login sessions from other domains.
  recs: Available for setting session values, e.g. items in a shopping basket.
  ADMCMD_*: Special processing instructions for backend previewing.
  mid, rid (direct mail): Ids related to the direct mail plugin.
  locationData: String identifying the location of email forms, for validation.

Appendix: Cache control headers

When TYPO3 caches a page it is possible to make TYPO3 send cache control headers to the web browser so the page is cached locally - or in a reverse proxy between TYPO3 and the user. This will speed up a website even more than using the internal page cache table. However, such caching can only be allowed when no USER_INT objects are found on the page (risk of caching realtime content) and when no user logins are active (risk of caching personalized content). The setting is found in the CONFIG object of TypoScript, see TSref.

Appendix: Hint on clearing cache

Did you know that from version 3.8.0 of TYPO3 you can force the regeneration (and re-caching) of a page in the frontend by "shift-reloading" it?

About the author

Kasper Skårhøj is the founder of TYPO3 and has been its main developer for more than 6 years. He lives in Copenhagen, Denmark, in a nice suburban house with his wife Rie and his DeWalt cordless drilling machine, which is the second most powerful tool in his possession.