Many options were added to the Extension Manager configuration. These settings improve existing crawler features and enable new ones.
Formerly, configuration was done using Page TSconfig (see below). This is still possible (fully backwards compatible) but not recommended. Instead of writing Page TSconfig, simply create a configuration record (table: tx_crawler_configuration) and put it on the topmost page of the page tree that this configuration should affect.
The fields in these records correspond to the Page TSconfig keys described below. The “name” field corresponds to the “key” in the Page TSconfig setup.
Property: | Data type: | Description: | Default: |
|---|---|---|---|
paramSets.[key] | string | GET parameter configuration. The values of GET variables are written in a special syntax. From the code documentation (class.tx_crawler_lib.php): Examples: &L=[\|1\|2\|3] &L=[0-3] &L=[0-3]&contentId=[_TABLE:tt_content] | |
paramSets.[key].procInstrFilter | string | List of processing instructions, e.g. “tx_indexedsearch_reindex” from indexed_search, to send with the request. Processing instructions are necessary for the request to perform any meaningful action, since they activate third-party activity. | |
paramSets.[key].procInstrParams.[procIn.key].[...] | strings | Options for processing instructions. These are defined by the respective third-party modules. Example: .....procInstrParams.tx_staticpub_publish.includeResources=1 | |
paramSets.[key].pidsOnly | list of integers (pages uid) | List of page IDs to which this configuration is limited. | |
paramSets.[key].userGroups | list of integers (fe_groups uid) | User groups to set for the request. | |
paramSets.[key].cHash | boolean | If set, a cHash value is calculated and added to the URLs. | |
paramSets.[key].baseUrl | string | If not set, t3lib_div::getIndpEnv('TYPO3_SITE_URL') is used to request the page. MUST BE SET if run from CLI (since TYPO3_SITE_URL does not exist in that context!) | |
[Page TSconfig: tx_crawler.crawlerCfg]
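
As a rough illustration, the properties above could be combined in Page TSconfig as sketched below. The key name `language`, the page and group uids, and the base URL are placeholders invented for this example, not values from the extension. The processing instruction names are the ones mentioned in the table.

```typoscript
# Hypothetical parameter set named "language".
# The value is the GET parameter configuration:
# request each page without L and with L=1 and L=2.
tx_crawler.crawlerCfg.paramSets.language = &L=[|1|2]
tx_crawler.crawlerCfg.paramSets.language {
  # Processing instructions to send with each request
  procInstrFilter = tx_indexedsearch_reindex, tx_staticpub_publish

  # Option passed to a third-party processing instruction
  procInstrParams.tx_staticpub_publish.includeResources = 1

  # Limit this configuration to these page uids (placeholder values)
  pidsOnly = 1,5,13

  # fe_groups uids to set for the request (placeholder values)
  userGroups = 1,2

  # Calculate a cHash value and add it to the URLs
  cHash = 1

  # Must be set when the crawler is run from CLI (placeholder URL)
  baseUrl = http://www.example.com/
}
```

When a configuration record is used instead, the same values would go into the corresponding fields of a tx_crawler_configuration record, with `language` entered as its name.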