In this update, Marcin Sągol shares the progress and key developments of the Documentation Search Improvements project for TYPO3. This initiative aims to enhance search functionality, making it easier for users to find relevant documentation efficiently and effectively.
The functionality described in this article is now available at docs.typo3.org for all recently rendered documentation repositories.
The Goals
One of the most important aspects for new TYPO3 users - whether developers, integrators, or editors - is getting familiar quickly and easily with its basic concepts and technical details, and understanding how TYPO3 works. A crucial part of this learning process is having well-written, easily searchable documentation. For a long time, the Documentation Team has been improving content quality, and with our Community Budget Idea for Q3, we aimed to enhance the search functionality. Our goal was to streamline the search workflow, making it easier for users to find what they need by providing more relevant search results.
We planned to split this work into three key steps:
Consult and align with the TYPO3 Documentation Team on the design of the search form, specifically determining the functionalities and capabilities it should offer users.
Design the index structure in collaboration with an Elasticsearch expert to efficiently index all data from the documentation files. This will ensure that the search functionality provides relevant suggestions and accurate search results, meeting the requirements set in step 1.
Develop all necessary API endpoints to support the search form, ensuring it functions as intended and delivers the features outlined in step 1.
If you are interested in reading more about our submission for the Community Budget Ideas Q3, you can find additional details on this forum page: Documentation indexing & search improvements.
Initial Brainstorming, Meetings and Mockups
Starting with the first step, we scheduled initial meetings with Marcin Sągol and Lina Wolf, the head of the Documentation Team. During these meetings, Lina shared her vision for improving the search experience from a user perspective, aiming to make it more similar to the search functionality offered by platforms like GitHub or Slack.
To ensure both teams (developers and the Documentation Team) had a shared understanding of the goals, we began working on mockups for the search form, visualizing its workflow and the capabilities it should offer to users. This productive session resulted in multiple mockups created in Figma.
After our initial call, where we created mockups for all the features the new search form should offer, we had a follow-up meeting to review them and confirm that nothing was missed and that both teams fully understood the mockups and their assumptions. Lina Wolf, Marcin Sągol, and Tymoteusz Motylewski participated in this session. Following this meeting, Marcin created a technical document (TYPO3 Documentation Search Improvements Initiative) detailing the current implementation of the search and indexer apps, along with all the planned changes, describing each new mockup and its intended behavior.
To summarize, the new search is designed to offer an advanced suggestion mechanism, including entries that link to documentation pages as well as options to narrow the search scope to a specific manual, vendor, option, or version. The table below explains what we mean by scopes and what role they play in the context of the searched data:
Scope | Description
--- | ---
GLOBAL | Allows searching in all manuals (vendor/package), options, and versions; any restrictions or scopes are ignored.
MANUAL | Allows searching only documentation that belongs to the given manual (vendor/package). Examples: georgringer/news, typo3/cms-core (the scope is the given Composer package).
VENDOR | Allows searching only documentation files that belong to the given vendor. If multiple manuals belong to the same vendor, all of them are searched. Examples: georgringer, typo3.
OPTION | A new feature. Some files will contain documentation for a so-called Option: a special group that documents related configuration, such as TCA. Tags that wrap sections in the documentation files can carry a data-search-facet attribute that assigns the section to an option.
VERSION | Narrows the search scope to the given manual version only.
Possible scopes for performing the search
At this stage, we were ready to complete the first step of our initiative and move forward with planning the structure of the documentation data index.
Optimizing Search Data Index
With a clear understanding of the features the new TYPO3 documentation search should offer, we wanted to ensure it would be both performant and able to return highly relevant results, giving users options to refine their searches as much as possible. Since the indexing and search applications use ElasticSearch, we decided to consult an expert in this software.
We reached out to Rafał Kuć, a well-known ElasticSearch expert, consultant, and author of several books on ElasticSearch. During a call with Rafał, Marcin outlined our goals and sought his advice on designing an optimal index structure. The aim was to organize the indexed data in a way that would allow easy filtering by documentation type, relevant package (extension), version, and more. Speed and performance were critical factors for us. A few days later, we received a list of suggestions from Rafał, and we were ready to proceed to the final step - implementation.
Implementing Changes Into the Index Structure and the API
All the content available on the TYPO3 documentation page comes from manuals provided by TYPO3 core team members or extension authors. These manuals are HTML files generated from reStructuredText (RST) files. For the content to be searchable, it must first be indexed.
A dedicated application, built on the Symfony framework, handles both indexing the documentation files and providing an API for searching the ElasticSearch index. During indexing, it parses the content of each HTML file, processes it, and then inserts or updates the corresponding data in the index. It’s important to note that each entry in the index doesn’t contain the entire content of an HTML file but only a single section from it. This approach enables more accurate search results, linking items on the results page to specific sections within the documentation.
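The real indexer is a Symfony (PHP) application; the following Python sketch only illustrates the one-entry-per-section idea. It assumes that the rendered HTML wraps each section in a `<section id="...">` element whose first heading is the section title. The names `SectionSplitter` and `split_sections` are hypothetical and do not come from the t3docs-search-indexer code.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionSplitter(HTMLParser):
    """Collect one index entry per <section>, giving text to the innermost open section."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open sections, innermost last
        self.sections = []   # finished entries, appended when a section closes
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            section_id = dict(attrs).get("id") or ""
            self.stack.append({"fragment": section_id, "title": "", "content": []})
        elif tag in HEADING_TAGS and self.stack and not self.stack[-1]["title"]:
            # The first heading inside a section is treated as its title
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "section" and self.stack:
            entry = self.stack.pop()
            entry["content"] = " ".join(entry["content"])
            self.sections.append(entry)
        elif tag in HEADING_TAGS:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return  # ignore whitespace and text outside any section
        current = self.stack[-1]
        if self.in_heading:
            current["title"] = (current["title"] + " " + text).strip()
        else:
            current["content"].append(text)


def split_sections(html):
    """Return a list of {fragment, title, content} dicts, one per section."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Because text is attributed to the innermost open section, a parent section's entry does not repeat the content of its subsections, and each entry's `fragment` can be used to link directly to that section of the page.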
In collaboration with Lina, we expanded the search indexing functionality to support additional attributes for sections within the generated documentation (render guides). All the code changes were made by Oskar Dydo, who was the main developer for this initiative. The newly supported section attributes on the indexing side are as follows:
data-search-title - replaces the section title with the specified content, if present
data-search-facet - defines the section category, such as TypoScript, TsConfig, or SiteSettings
data-search-id - allows linking to the section (used in conjunction with data-search-facet)
These attributes are now read during section indexing, and their values are stored in the ElasticSearch index alongside the content of each section. They will be used to enhance the search, for example by rendering relevant suggestions or search results.
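To make the role of the three attributes more concrete, here is a minimal Python sketch of how they could override a section's default index values. It is not the indexer's actual logic: storing the facet in the `option` column and using `data-search-id` as the link target only when a facet is present are assumptions based on the descriptions above, and `apply_search_attributes` is a hypothetical helper.

```python
def apply_search_attributes(attrs, heading_title, section_id):
    """Hypothetical helper: derive index fields for one section from its data-search-* attributes.

    attrs is the attribute dict of the section tag; heading_title and section_id
    are the values the indexer would use if no data-search-* attributes were present.
    """
    fields = {"snippet_title": heading_title, "fragment": section_id, "option": None}

    # data-search-title, if present, replaces the section title
    if attrs.get("data-search-title"):
        fields["snippet_title"] = attrs["data-search-title"]

    facet = attrs.get("data-search-facet")
    if facet:
        # Assumption: the facet (e.g. TypoScript) is stored in the "option" column
        fields["option"] = facet
        # Assumption: data-search-id is only honoured together with a facet
        # and then becomes the link target for the section
        if attrs.get("data-search-id"):
            fields["fragment"] = attrs["data-search-id"]

    return fields
```

A section without any `data-search-*` attributes keeps its heading text and `id`, so existing manuals are indexed exactly as before.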
For readers interested in learning more about the ElasticSearch index, here is a list of all columns populated with data and used to search for documentation content:
Column name | Description
--- | ---
manual_title | Name of the package (supports search)
manual_type | Document type based on its path
manual_version | Stores related versions of the document (if unchanged between versions); version for core entries typically matches TYPO3 versions (e.g., 11.5, 12.4, main); external extensions may differ (e.g., 0.1, 2.4, main)
manual_language | Document language (usually en-us), as used in the path, e.g. /m/typo3/tutorial-editors/12.4/en-us/Pages/Index.html
manual_slug | Relative path to the documentation, e.g. /m/typo3/tutorial-editors/12.4/en-us/Pages/Index.html
manual_keywords | Currently, this is simply the extension name, used to boost searches for specific extension names, e.g., news
fragment | Snippet ID, used for linking to the appropriate document section
relative_url | Path inside a main manual, e.g. /m/typo3/tutorial-editors/12.4/en-us/Pages/Index.html
snippet_title | Title of the document
snippet_content | Section content of the snippet (appears in search results)
content_hash | Document hash
major_versions | Main versions that the document refers to; for core extensions, it could be 11 instead of 11.5, 12 instead of 12.4; external extensions may vary (e.g., 0, 1, 2, main)
manual_vendor | Vendor of the extension (used in the new suggestion system, e.g., typo3 in typo3/cms-form)
manual_package | Full name of the extension (used in the new suggestion system, e.g., typo3/cms-form)
manual_extension | Name of the extension (e.g., cms-form in typo3/cms-form)
option | Snippet type, allows searching for specific documents (e.g., TypoScript allows searching for documents tagged as TypoScript in documentation)
option_keywords | Text corresponding to the option keywords to facilitate searching (not supported yet)
is_core | Field indicating whether the document is official TYPO3 core documentation or not
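Several of the columns in the table above can be derived from the documentation path alone. The Python sketch below assumes the path layout `/<type>/<vendor>/<extension>/<version>/<language>/<page...>`, which fits the /m/typo3/tutorial-editors/12.4/en-us/... example and the typo3/cms-form naming. Deriving `manual_package` from the path and treating the first segment as `manual_type` are assumptions, and both functions are hypothetical.

```python
def major_versions(versions):
    """Reduce versions to their major part: 12.4 -> 12, 0.1 -> 0, main -> main.

    Duplicates are removed while keeping the original order.
    """
    majors = []
    for version in versions:
        major = version.split(".", 1)[0]
        if major not in majors:
            majors.append(major)
    return majors


def fields_from_path(path):
    """Split a path such as /m/typo3/tutorial-editors/12.4/en-us/Pages/Index.html.

    Assumed layout: /<type>/<vendor>/<extension>/<version>/<language>/<page...>
    """
    parts = path.strip("/").split("/")
    if len(parts) < 6:
        raise ValueError(f"unexpected documentation path: {path!r}")
    doc_type, vendor, extension, version, language = parts[:5]
    return {
        "manual_type": doc_type,
        "manual_vendor": vendor,
        "manual_extension": extension,
        "manual_package": f"{vendor}/{extension}",
        "manual_version": version,
        "manual_language": language,
        "major_versions": major_versions([version]),
    }
```

Keeping `major_versions` as a separate list lets a query match "12" without enumerating every minor release, while named branches such as main pass through unchanged.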
With these changes added to the indexing application and the ElasticSearch index structure, we were able to collect all the data needed to provide an advanced search suggestion mechanism and support for managing search scopes.
We have added new API endpoints that feed data into the search suggestion mechanism. The suggestions follow a "narrowing down" logic, refining the search scope as the user enters text. Currently, the suggestions do not support multiple phrases entered by the user. All our changes are confined to the t3docs-search-indexer repository.
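The "narrowing down" idea can be sketched in a few lines of Python. This is not the endpoint's real implementation: the `packages` mapping, the scope keys `vendor`, `manual`, and `version`, and the prefix matching are illustrative assumptions, and the OPTION scope is left out for brevity. Each selected scope removes candidates, so later suggestions only come from what is still inside the current context.

```python
def suggest_scopes(text, packages, active):
    """Hypothetical narrowing-down suggester for a single typed phrase.

    packages: "vendor/extension" -> versions available in the index
    active:   scopes the user already selected, e.g. {"vendor": "typo3"}
    Returns (scope, value) pairs whose value starts with the typed text.
    """
    text = text.strip().lower()

    def in_scope(name):
        # A package stays a candidate only if it matches every selected scope
        vendor = name.split("/", 1)[0]
        return active.get("vendor") in (None, vendor) and active.get("manual") in (None, name)

    candidates = {name: versions for name, versions in packages.items() if in_scope(name)}
    suggestions = []

    # Vendor suggestions only make sense before a vendor or manual is chosen
    if "vendor" not in active and "manual" not in active:
        for vendor in sorted({name.split("/", 1)[0] for name in candidates}):
            if vendor.startswith(text):
                suggestions.append(("vendor", vendor))

    if "manual" not in active:
        # Match either the full package name or just the extension part
        for name in sorted(candidates):
            if name.startswith(text) or name.split("/", 1)[1].startswith(text):
                suggestions.append(("manual", name))
    elif "version" not in active:
        # Once a manual is chosen, narrow further by its versions
        for version in candidates.get(active["manual"], []):
            if version.startswith(text):
                suggestions.append(("version", version))

    return suggestions
```

For example, typing "typo" with no scope selected would suggest the vendor typo3 and its manuals; after choosing a manual, typing "12" would only suggest versions of that manual.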
These are only general details about the code changes, which are currently delivered as an open pull request to the main repository. We have also created a document (Documentation indexing & search improvements) in which Oskar provides an in-depth description of the ElasticSearch index after the updates. This document explains how request parameters map to the relevant columns in the index and details how the updated and extended suggestion endpoints function. We strongly encourage you to read it, as it contains comprehensive information on each endpoint used for providing suggestions, how these endpoints work, the data structure returned by the API, and more.
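As a rough illustration of mapping request parameters to index columns, the Python sketch below builds an Elasticsearch Query DSL body: a full-text `multi_match` over the snippet columns plus one exact `term` filter per selected scope. The parameter names, the column each one maps to, the title boost, and the assumption that the filtered columns are keyword fields are all hypothetical; the authoritative mapping is described in Oskar's document.

```python
# Hypothetical request parameter -> index column, using column names from the table above
SCOPE_FIELDS = {
    "manual": "manual_package",
    "vendor": "manual_vendor",
    "option": "option",
    "version": "manual_version",
}


def build_search_body(query, scopes, size=20):
    """Build an Elasticsearch request body: full-text match plus one exact filter per scope.

    An empty scopes dict behaves like the GLOBAL scope (no filters at all).
    """
    filters = [
        # term filters assume the scope columns are mapped as keyword fields
        {"term": {SCOPE_FIELDS[name]: value}}
        for name, value in sorted(scopes.items())
        if name in SCOPE_FIELDS and value
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": query,
                            # ^2 boosts matches in the title over matches in the content
                            "fields": ["snippet_title^2", "snippet_content", "manual_keywords"],
                        }
                    }
                ],
                "filter": filters,
            }
        },
    }
```

Putting scopes in the `filter` clause rather than `must` keeps them out of relevance scoring, so narrowing the scope restricts the result set without changing how results are ranked against each other.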
Summary
The Documentation Search Improvements initiative focused exclusively on the backend. We committed to extending the ElasticSearch index structure, modifying the data indexing process, and improving existing and adding new suggestion API endpoints. Together, these changes handle all the search scenarios described above, supporting both narrowing the search and suggestions for scopes such as package name, vendor name, package version, or option.
We were also asked to provide a proof of concept for our work - something users can interact with and test. Although this was outside the original scope of our initiative, we decided to add the /search/beta endpoint.
This endpoint was implemented in a basic way: the user types a phrase and selects a scope from the suggestions. Selecting a suggestion clears the entered text, so the user can continue with a new phrase within the current search context, guided step by step toward the most precise search results. This behavior 'simulates' how the functionality would work in real life, although it is not the final version of the search suggestions (e.g. it is not reactive).
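The interaction described above can be modelled as a small piece of state: the current text plus the scopes collected so far. The Python sketch below is a hypothetical model of that flow, not the /search/beta code; the class name and the `q` parameter name are assumptions.

```python
class BetaSearchSession:
    """Hypothetical model of the /search/beta flow: selecting a suggestion adds a scope and clears the text."""

    def __init__(self):
        self.text = ""
        self.scopes = {}

    def enter_text(self, text):
        self.text = text

    def select_suggestion(self, scope, value):
        # The chosen suggestion narrows the context; the input is cleared for the next phrase
        self.scopes[scope] = value
        self.text = ""

    def request_params(self):
        # Hypothetical parameters sent with each search: the phrase plus every selected scope
        params = {"q": self.text}
        params.update(self.scopes)
        return params
```

Because scopes accumulate while the text is reset, each new phrase is searched inside everything the user has selected so far.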
If you want to test it yourself and experiment with the new search functionality, you can clone the t3docs-search-indexer repository, follow the instructions in the README.rst file, and check the documentation for the /search/beta endpoint.
So, one might ask, what’s next? Following our initial plan and the community's suggestions, the next step is to collaborate with the server team to set up a dedicated server where the updated indexing and search application code can be deployed. This server would run alongside the one currently used to search the documentation. Once all tests pass and we confirm that the new search functionality works as expected, the old server could be replaced with the new one, along with the updated search capabilities.
The final step would be to create a new search form based on our initial mockups and replace the current one. We plan to submit this as a separate initiative for the TYPO3 Community Budget Ideas in 2025, focused primarily on the frontend. We will continue consulting with the Documentation Team to ensure the final solution is delivered to users as soon as possible.
Gratitude to Those Who Contributed to the Project
I would like to extend my gratitude to everyone involved in this initiative. First and foremost, a big thank you to Lina Wolf, who knows the technical details of TYPO3 documentation like no one else. She was the first person I had the pleasure of contacting, and together, we brainstormed the required changes. Lina had a clear vision of the direction the changes should take, and we worked to visualize and document these ideas to create a solid plan for code modifications. During our meetings, we also received valuable feedback from Tymoteusz Motylewski.
With our assumptions documented, we were able to consult with Rafał Kuć, known as an ElasticSearch expert, who advised us on structuring the data index for optimal performance during indexing and searching. Thank you, Rafał, for your invaluable support.
Armed with this knowledge, we then began the development and implementation of the changes and features in the documentation indexing and search application. This phase was led by Oskar Dydo, who handled all necessary modifications, consulting closely with Lina Wolf and Rafał Kuć, and brought our initiative to completion. Excellent work, and thank you, Oskar.