The idea of the queue is that a large number of tasks can be submitted to it and processed over a longer period of time. This can be useful for several reasons:
To spread server load over time.
To schedule the requests for nightly processing.
And simply to avoid PHP's “max_execution_time” limiting processing to 30 seconds!
A “cron-job” is a script that runs on the server at regular time intervals.
To process the queue over time you should ideally have a cron-job set up. This assumes you are running a Unix-like system of some sort. The crontab is usually edited with “crontab -e”, and you should insert a line like this:
* * * * * [pathToYourTYPO3Installation]/typo3/cli_dispatch.phpsh crawler
This will run the script every minute. You should try to run the script on the command line first to make sure it runs without any errors. If it doesn't output anything it was successful.
You will have to add a backend user called “_cli_crawler”, and PHP must also be installed as a CGI binary in /usr/bin/.
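The manual check described above can be wrapped in a small helper. This is only a sketch: the function name check_silent_run is made up for illustration, and treating a non-zero exit status as a failure is an added assumption on top of the "no output means success" rule.

```shell
#!/bin/sh
# Hypothetical helper, not part of the crawler extension.
# Runs the given command and reports success only if it printed nothing
# and (an added assumption) exited with status 0.
check_silent_run() {
    status=0
    output=$("$@" 2>&1) || status=$?
    if [ "$status" -eq 0 ] && [ -z "$output" ]; then
        echo "OK: no output, safe to add to the crontab"
        return 0
    fi
    echo "FAILED (exit status $status):"
    echo "$output"
    return 1
}

# Usage (replace the placeholder with the path of your own installation):
# check_silent_run [pathToYourTYPO3Installation]/typo3/cli_dispatch.phpsh crawler
```

Running the check as the same system user that owns the crontab is a reasonable precaution, since permissions and the PHP binary on the path may differ between users.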
In the “CLI status” menu of the Site Crawler info module you can see the status:
This is how it looks just after you have run the script. (You can also see the full path to the script at the bottom. This is the path you should use on the command line and in the crontab.)
If the cron-script stalls, there is a default delay of 1 hour before a new process will declare the old one dead and start a new run. If a cron-script takes more than 1 minute and thereby overlaps the next process, the next process will NOT start if it sees that the “lock-file” exists (unless that hour has passed).
It works like this to make sure that overlapping calls to the crawler CLI script do not run as parallel processes: the second call simply exits if it finds in the status file that a process is already running. A crashed script, however, never sets the status to “end”, which is why the stalled situation described above can occur.
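The locking behaviour described above can be sketched roughly as follows. This is an illustration of the idea, not the crawler's actual code: the lock file path, the variable names and storing a start timestamp in the file are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only; names and file location are hypothetical.
LOCK_FILE="${LOCK_FILE:-/tmp/crawler_sketch.lock}"
STALE_AFTER="${STALE_AFTER:-3600}"   # 1 hour, the default delay described above

# Returns 0 if this process may run, 1 if an earlier run still counts as alive.
acquire_lock() {
    now=$(date +%s)
    if [ -f "$LOCK_FILE" ]; then
        started=$(cat "$LOCK_FILE")
        age=$((now - started))
        if [ "$age" -lt "$STALE_AFTER" ]; then
            return 1    # earlier run is within the grace period: just exit
        fi
        # Lock is older than the limit: treat the old process as dead.
    fi
    echo "$now" > "$LOCK_FILE"
}

# Corresponds to setting the status to "end" after a clean run.
release_lock() {
    rm -f "$LOCK_FILE"
}
```

A cron-driven script would call acquire_lock first, exit quietly if it fails, and call release_lock when done. Note that this simple check-then-write is not atomic, which is acceptable for runs spaced a minute apart but would need a stronger mechanism for truly simultaneous starts.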
To process the queue you must either set up a cron-job on your server or use the backend to execute the queue:
You can also (re-)crawl single URLs manually from within the Crawler log view in the info module:
By clicking on the context menu of the configuration record you can add the URLs resulting from this record to the queue:
An alternative mode is to build and execute the queue automatically from the command line in one process. This does not allow scheduling of task processing and consumes as much CPU as it can. On the other hand, the job is done right away.
The script to use is this:
[pathToYourTYPO3Installation]/typo3/cli_dispatch.phpsh crawler_im
If you run it you will see a list of options which explains usage.
Basically you must pass options similar to those you would otherwise select using the Site Crawler when you set up a crawler job (“Start Crawling”). Here is an example:
We want to publish the page with ID=3 (“Contact” page selected) and the pages one level below it (“1 level” selected) to static files (Processing Instruction “Publish static [tx_staticpub_publish]” selected). Four URLs are generated based on the configuration (see right column in table).
To do the same with the CLI script you run this:
[pathToYourTYPO3Installation]/typo3/cli_dispatch.phpsh crawler_im 3 -d 1 -proc tx_staticpub_publish
And this is the output:
[22-03 15:29:00] ?id=3
[22-03 15:29:00] ?id=3&L=1
[22-03 15:29:00] ?id=5
[22-03 15:29:00] ?id=4
At this point you have three options for “action”:
Commit the URLs to the queue and let the cron script take care of them over time. In this case there is an option for setting the number of tasks per minute if you wish to change it from the default of 30. This is useful if you would like to submit a job to the cron-based crawler every day.
Add “-o queue”
List full URLs for use with wget or similar. Corresponds to pressing the “Download URLs” button in the backend module.
Commit and execute the queue right away. This will still put the jobs into the queue but execute them immediately. If server load is not an issue for you and you are in a hurry, this is the way to go! It also feels much more like the “command-line way” of doing things, and the status output is more immediate than with the queue.
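For the first option, a daily submission could be scripted along these lines. This is a sketch under assumptions: TYPO3_PATH and build_queue_command are hypothetical names introduced here, and the command simply combines the example call shown earlier with “-o queue”.

```shell
#!/bin/sh
# Hypothetical wrapper; TYPO3_PATH stands in for [pathToYourTYPO3Installation].
TYPO3_PATH="${TYPO3_PATH:-/path/to/your/typo3site}"

# Prints the command line that commits the URLs of a page tree to the queue.
# $1 = start page id, $2 = depth, $3 = processing instruction key
build_queue_command() {
    printf '%s/typo3/cli_dispatch.phpsh crawler_im %s -d %s -proc %s -o queue\n' \
        "$TYPO3_PATH" "$1" "$2" "$3"
}

# Example: print the command for the "Contact" page example above.
# build_queue_command 3 1 tx_staticpub_publish
```

The printed command can be placed in the crontab with a daily schedule such as “0 2 * * *” (02:00 every night), while the every-minute crawler entry continues to work through the queue.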
The examples above assume that “staticpub” is installed.






