Optimizing Website Internal Linking With TYPO3 Extensions

Categories: Budget idea, Development
Created by Wolfangel Cyril

A well-optimized site structure and relevant content are crucial for a good user experience and good SEO. In this article, I will share my journey in developing a suite of TYPO3 extensions designed to improve the internal linking of websites.

My Journey in Semantic Optimization

My adventure with the semantic web began in 2008 with the TYPO3 Solr extension, which integrates the open-source search platform Apache Solr. While initially focusing on improving search capabilities, I quickly understood the crucial importance of internal linking, that is, the way pages are interconnected within a site.

Traditional methods of manual link building are not only time-consuming but also often ineffective in ensuring an optimal structure. This realization led me to explore the use of semantic technologies and artificial intelligence to automate and optimize connections between pages based on content similarity.

After years of research and development, I presented the ATLAS software solution (Automated Topological Link Analysis System) at the 25th edition of the Extraction and Knowledge Management Conference (EGC25) in 2025. 

Proven Results

Case studies have demonstrated the effectiveness of ATLAS:

  • On an unoptimized site of 50 pages:
    • Reduction of orphan pages from 50% to 0%
    • Increase in link density from 0.04% to 0.47%
  • On a site already optimized by an expert (60+ pages):
    • Reproduction of about 50% of the expert linking structure
    • Confirmation of the ability to maintain high-quality structures

“ATLAS is a tool designed to automate the internal linking of websites by leveraging advanced artificial intelligence techniques. This research is part of an effort to optimize natural referencing through the structural improvement of websites.”

The work presented in this article resulted in the development of a suite of complementary TYPO3 extensions, allowing the practical application of the theoretical concepts explored in our research.

This suite includes three main extensions:

  • Page Link Insights (page_link_insights): An interactive visualization module for page relationships, offering a graphical representation of internal linking and advanced metrics (PageRank, centrality, broken link detection).
  • Semantic Suggestion (semantic_suggestion): The core of the ATLAS system, implemented as a TYPO3 extension. It automatically proposes relevant links between pages based on content similarity, measured through vector analysis.
  • NLP Tools (nlp_tools): A foundational extension providing the natural language processing capabilities needed for semantic analysis, including language detection, stop word filtering, and text vectorization.

Together, these extensions form a complete solution for optimizing the internal linking of TYPO3 sites. 

Semantic Suggestion has been in production since early March 2025, and the results in Google Search Console are impressive.

Let's examine each extension in detail.

Page Link Insights: Visualization and Analysis

Page Link Insights is a backend module that offers an interactive visualization of a website's internal link structure.

Main Features

  • D3.js Visualization: Represents relationships between pages with an interactive force-directed graph.
  • Link Detection: Identifies different types of links (HTML, typolink, content elements).
  • Advanced Metrics:
    • PageRank calculation to determine the most influential pages.
    • Centrality scores to identify pivotal pages.
    • Broken link detection.
  • Thematic Analysis:
    • Automatic extraction of significant keywords.
    • Grouping pages by themes.
  • Global Statistics:
    • Network density.
    • Identification of orphan pages.
    • Average links per page.
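
To make the PageRank metric listed above more concrete, here is a minimal sketch of an iterative PageRank over a list of internal links. It is not taken from the extension's source code: the function name pageRank, the damping factor of 0.85, the fixed number of iterations, and the sample page uids are all assumptions chosen for illustration. The sketch also assumes that every link target is part of the analyzed page set and that PHP 7.4 or later is used.

```php
<?php
declare(strict_types=1);

/**
 * Iterative PageRank over a directed link list (illustrative sketch).
 *
 * @param int[]             $pageIds all page uids in the graph
 * @param array<int, int[]> $links   source uid => list of target uids
 * @return array<int, float> page uid => score (scores sum to about 1.0)
 */
function pageRank(array $pageIds, array $links, float $damping = 0.85, int $iterations = 50): array
{
    $n = count($pageIds);
    if ($n === 0) {
        return [];
    }
    $rank = array_fill_keys($pageIds, 1.0 / $n);

    for ($i = 0; $i < $iterations; $i++) {
        // Every page first receives the "random jump" share.
        $next = array_fill_keys($pageIds, (1.0 - $damping) / $n);
        $danglingMass = 0.0;

        foreach ($pageIds as $source) {
            $targets = $links[$source] ?? [];
            if ($targets === []) {
                // Pages without outgoing links spread their rank evenly.
                $danglingMass += $rank[$source];
                continue;
            }
            $share = $damping * $rank[$source] / count($targets);
            foreach ($targets as $target) {
                $next[$target] += $share;
            }
        }

        foreach ($pageIds as $page) {
            $next[$page] += $damping * $danglingMass / $n;
        }
        $rank = $next;
    }

    return $rank;
}

// Hypothetical four-page site: page 1 is linked from every other page.
$ranks = pageRank([1, 2, 3, 4], [1 => [2], 2 => [1], 3 => [1], 4 => [1]]);
arsort($ranks);
print_r($ranks);
```

In this sample graph, page 1 ends up with the highest score because it collects links from all other pages, which is the kind of "influential page" the metric is meant to surface.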

Backend Module Overview

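The backend module offers a clear and intuitive interface to visualize and understand your site's structure. Each detected link between two pages can be represented as a simple array:

<?php
// Example of representing a link between two pages.
// The values below are placeholders for illustration only.
$sourceId = 12; // uid of the page that contains the link
$targetId = 34; // uid of the page the link points to
$element = ['uid' => 56, 'CType' => 'text', 'header' => 'Related topics'];

$link = [
    'sourcePageId' => $sourceId,
    'targetPageId' => $targetId,
    // The content element in which the link was found
    'contentElement' => [
        'uid' => $element['uid'],
        'type' => $element['CType'],
        'header' => $element['header'],
    ],
];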

The extension uses a D3.js force-directed graph to visualize connections between pages, with different colors representing different link types and node size proportional to their importance.
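
The data behind such a graph can be assembled from link records like the one shown earlier. The following sketch builds a node list and a link list in the shape that D3 force layouts commonly consume (nodes with an id, links with source and target), and derives the global statistics mentioned above. The function name buildGraph, the definition of an orphan page as a page without inbound links, and the density formula links / (n × (n − 1)) are assumptions made for this example; the extension may define these metrics differently.

```php
<?php
declare(strict_types=1);

/**
 * Builds a node/link structure from link records and derives basic statistics.
 *
 * @param array<int, string> $pages page uid => title
 * @param array<int, array{sourcePageId:int, targetPageId:int}> $links
 */
function buildGraph(array $pages, array $links): array
{
    $edges = [];
    foreach ($links as $link) {
        $source = $link['sourcePageId'];
        $target = $link['targetPageId'];
        // Skip self-links and links to pages outside the analyzed set.
        if ($source === $target || !isset($pages[$source], $pages[$target])) {
            continue;
        }
        // The composite key removes duplicate links between the same two pages.
        $edges[$source . '>' . $target] = ['source' => $source, 'target' => $target];
    }

    $inbound = array_fill_keys(array_keys($pages), 0);
    foreach ($edges as $edge) {
        $inbound[$edge['target']]++;
    }

    $nodes = [];
    foreach ($pages as $uid => $title) {
        $nodes[] = ['id' => $uid, 'title' => $title, 'inbound' => $inbound[$uid]];
    }

    $n = count($pages);
    $m = count($edges);
    $orphans = array_keys(array_filter($inbound, static fn (int $count): bool => $count === 0));

    return [
        'nodes' => $nodes,
        'links' => array_values($edges),
        'stats' => [
            'orphanPages' => $orphans,
            'density' => $n > 1 ? $m / ($n * ($n - 1)) : 0.0,
            'averageLinksPerPage' => $n > 0 ? $m / $n : 0.0,
        ],
    ];
}

// Hypothetical sample data.
$graph = buildGraph(
    [1 => 'Home', 2 => 'Services', 3 => 'Contact', 4 => 'Archive'],
    [
        ['sourcePageId' => 1, 'targetPageId' => 2],
        ['sourcePageId' => 1, 'targetPageId' => 3],
        ['sourcePageId' => 2, 'targetPageId' => 1],
    ]
);
echo json_encode($graph, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting JSON could be handed to a force-directed layout on the JavaScript side, with the inbound count or a PageRank score driving the node size.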

Semantic Suggestion: Intelligent Link Automation

Semantic Suggestion is the core of the system, automatically proposing relevant links between pages based on content similarity.

How Semantic Suggestion Works

  1. Content Analysis: The extension extracts content from the site's pages.
  2. Vectorization: It creates TF-IDF (Term Frequency-Inverse Document Frequency) vectors to represent the content of each page.
  3. Similarity Calculation: Cosine similarity is calculated between the page vectors.
  4. Database Storage: Results are stored for optimal performance.
  5. Frontend Display: The most relevant suggestions are displayed to users.
  6. Scheduled Tasks: Calculations are performed in the background at regular intervals.
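
Steps 2 and 3 can be sketched in a few lines of plain PHP. This is a simplified illustration, not the extension's actual code: the function names, the smoothed IDF variant, and the sample texts are assumptions, and the real pipeline additionally applies stop word filtering and stemming before vectorization.

```php
<?php
declare(strict_types=1);

/** Lowercases a text and splits it into word tokens (Unicode-aware). */
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * @param array<int, string> $documents page uid => text
 * @return array<int, array<string, float>> page uid => term => TF-IDF weight
 */
function tfIdfVectors(array $documents): array
{
    $termCounts = [];
    $documentFrequency = [];
    foreach ($documents as $uid => $text) {
        $termCounts[$uid] = array_count_values(tokenize($text));
        foreach (array_keys($termCounts[$uid]) as $term) {
            $documentFrequency[$term] = ($documentFrequency[$term] ?? 0) + 1;
        }
    }

    $total = count($documents);
    $vectors = [];
    foreach ($termCounts as $uid => $counts) {
        $vectors[$uid] = [];
        $length = array_sum($counts);
        foreach ($counts as $term => $count) {
            // Smoothed IDF keeps weights positive for terms found in every document.
            $idf = log(($total + 1) / ($documentFrequency[$term] + 1)) + 1.0;
            $vectors[$uid][$term] = ($count / $length) * $idf;
        }
    }
    return $vectors;
}

/** Cosine similarity between two sparse term => weight vectors. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $term => $weight) {
        $dot += $weight * ($b[$term] ?? 0.0);
    }
    $square = static fn (float $w): float => $w * $w;
    $normA = sqrt(array_sum(array_map($square, $a)));
    $normB = sqrt(array_sum(array_map($square, $b)));
    return ($normA > 0.0 && $normB > 0.0) ? $dot / ($normA * $normB) : 0.0;
}

// Hypothetical page texts keyed by page uid.
$vectors = tfIdfVectors([
    10 => 'TYPO3 internal linking with semantic suggestions',
    11 => 'Semantic suggestions improve internal linking in TYPO3',
    12 => 'Opening hours and directions to our office',
]);
printf("10 vs 11: %.3f\n", cosineSimilarity($vectors[10], $vectors[11]));
printf("10 vs 12: %.3f\n", cosineSimilarity($vectors[10], $vectors[12]));
```

Pages 10 and 11 share most of their vocabulary and score high, while pages 10 and 12 share no terms and score zero. Storing such pairwise scores in the database (step 4) avoids recomputing them on every page view.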

Flexible Configuration via TypoScript

plugin.tx_semanticsuggestion {
    settings {
        parentPageId = 1
        proximityThreshold = 0.7
        maxSuggestions = 3
        excerptLength = 150
        recursive = 1
        excludePages = 8,9,3456
        recencyWeight = 0.2

        analyzedFields {
            title = 1.5
            description = 1.0
            keywords = 2.0
            abstract = 1.2
            content = 1.0
        }
    }
}

This configuration allows you to:

  • Define a parent page to limit the analysis.
  • Adjust the similarity threshold (proximityThreshold).
  • Limit the number of suggestions.
  • Weight different content fields (title, keywords, etc.).
  • Integrate a recency factor to favor recent content.
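
The following sketch shows one way these settings could act on the data. It is a hypothetical illustration rather than the extension's implementation: the function names, the linear one-year recency decay, and the choice to apply the threshold before the recency blend are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Merges term counts from several fields, scaling each field by its weight.
 *
 * @param array<string, string> $fields       field name => text
 * @param array<string, float>  $fieldWeights field name => weight (as in analyzedFields)
 * @return array<string, float> term => weighted count
 */
function weightedTermCounts(array $fields, array $fieldWeights): array
{
    $weighted = [];
    foreach ($fields as $field => $text) {
        $weight = $fieldWeights[$field] ?? 0.0;
        if ($weight <= 0.0) {
            continue; // fields without a configured weight are ignored
        }
        $tokens = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_count_values($tokens) as $term => $count) {
            $weighted[$term] = ($weighted[$term] ?? 0.0) + $weight * $count;
        }
    }
    return $weighted;
}

/**
 * @param array<int, float> $similarities candidate page uid => cosine similarity (0..1)
 * @param array<int, int>   $lastModified candidate page uid => Unix timestamp
 * @param array             $settings     values mirroring the TypoScript settings
 * @return array<int, float> candidate page uid => final score, best first
 */
function selectSuggestions(array $similarities, array $lastModified, array $settings, int $now): array
{
    $scores = [];
    foreach ($similarities as $uid => $similarity) {
        if (in_array($uid, $settings['excludePages'], true) || $similarity < $settings['proximityThreshold']) {
            continue;
        }
        // Assumed recency model: 1.0 for content changed today, falling to 0 after one year.
        $ageInDays = max(0, $now - ($lastModified[$uid] ?? 0)) / 86400;
        $recency = max(0.0, 1.0 - $ageInDays / 365);
        $weight = $settings['recencyWeight'];
        $scores[$uid] = (1.0 - $weight) * $similarity + $weight * $recency;
    }
    arsort($scores);
    return array_slice($scores, 0, $settings['maxSuggestions'], true);
}

// Hypothetical sample data.
$counts = weightedTermCounts(
    ['title' => 'Internal linking', 'content' => 'Linking pages by topic'],
    ['title' => 1.5, 'content' => 1.0]
);

$now = 1740787200; // 2025-03-01 00:00:00 UTC
$day = 86400;
$suggestions = selectSuggestions(
    [21 => 0.92, 22 => 0.81, 8 => 0.95, 23 => 0.65, 24 => 0.78, 25 => 0.74],
    [21 => $now - 300 * $day, 22 => $now - 5 * $day, 24 => $now - 30 * $day, 25 => $now - 400 * $day],
    ['proximityThreshold' => 0.7, 'maxSuggestions' => 3, 'excludePages' => [8, 9, 3456], 'recencyWeight' => 0.2],
    $now
);
print_r(array_keys($suggestions));
```

With these sample values, page 8 is dropped because it is excluded, page 23 falls below the threshold, and the recently updated page 22 overtakes the more similar but older page 21.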

NLP Tools: The Fundamental Linguistic Analysis Engine

The NLP Tools extension constitutes the technical foundation for advanced semantic analysis within the suite. Currently in 'beta' and compatible with TYPO3 v12/v13, it provides a set of modular services usable by other extensions via TYPO3's dependency injection.

These services support essential Natural Language Processing (NLP) operations:

  • Language Detection: The LanguageDetectionService automatically identifies the language of a text (among FR, EN, DE, ES) based on an n-gram analysis of stop words. It can also leverage the TYPO3 frontend linguistic context (sys_language_uid) if available.
  • Text Preprocessing: The TextAnalysisService cleans the text (lowercasing, accent removal), segments it into words (tokenization) handling Unicode, and filters stop words specific to the detected language using the StopWordsFactory.
  • Stemming: Rather than lemmatization, the extension performs stemming via the TextAnalysisService, which uses the external library wamania/php-stemmer to reduce words to their root (supports FR, EN, DE, ES).
  • Text Vectorization: The TextVectorizerService transforms preprocessed texts into numerical representations:
    • Calculation of normalized TF-IDF (Term Frequency-Inverse Document Frequency) vectors.
    • Creation of Document-Term Matrices (DTM).
    • Calculation of Cosine Similarity between vectors to evaluate semantic proximity.
  • Clustering: The TextClusteringService allows grouping similar texts using different methods:
    • K-Means (based on TF-IDF vectors).
    • Hierarchical Agglomerative Clustering (based on a distance matrix).
    • Simple clustering based on a similarity threshold.
  • Topic Modeling: The TopicModelingService offers features to extract thematic information:
    • Extraction of the most representative terms for a group of texts (based on TF-IDF).
    • Extraction of themes (simplified approach using K-Means on texts).
    • Extraction of key phrases from a document.
  • Performance: Services are designed to be performant and integrate the possibility of using TYPO3's caching system to speed up repetitive calculations.
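
As an illustration of the simplest of these clustering methods, the sketch below groups documents by a similarity threshold. It is an assumption about how such a method can work, not the TextClusteringService code: each document joins the first cluster whose first member (its "seed") is similar enough, and otherwise starts a new cluster. The function name and the sample matrix are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Greedy threshold clustering over a precomputed similarity matrix.
 *
 * @param array<int, array<int, float>> $similarity symmetric matrix: uid => uid => similarity
 * @return array<int, int[]> list of clusters, each a list of uids
 */
function clusterByThreshold(array $similarity, float $threshold): array
{
    $clusters = [];
    foreach (array_keys($similarity) as $uid) {
        foreach ($clusters as $index => $members) {
            // Compare against the first member of the cluster (its seed).
            if (($similarity[$uid][$members[0]] ?? 0.0) >= $threshold) {
                $clusters[$index][] = $uid;
                continue 2;
            }
        }
        // No existing cluster is close enough: start a new one.
        $clusters[] = [$uid];
    }
    return $clusters;
}

// Hypothetical cosine similarities between four pages.
$similarity = [
    1 => [1 => 1.0, 2 => 0.82, 3 => 0.10, 4 => 0.05],
    2 => [1 => 0.82, 2 => 1.0, 3 => 0.12, 4 => 0.08],
    3 => [1 => 0.10, 2 => 0.12, 3 => 1.0, 4 => 0.75],
    4 => [1 => 0.05, 2 => 0.08, 3 => 0.75, 4 => 1.0],
];
$clusters = clusterByThreshold($similarity, 0.7);
print_r($clusters);
```

This greedy approach needs no predefined number of clusters, unlike K-Means, but its result depends on the order in which documents are visited.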

Modular Architecture and Usage

The extension follows TYPO3 best practices with an architecture based on injectable services, facilitating its integration:

  • Input: Starts with raw text.
  • Language Detection: LanguageDetectionService identifies the language (can use TYPO3 context).
  • Text Analysis Core: TextAnalysisService takes the raw text and detected language to:
    • Clean the text.
    • Tokenize it into words.
    • Remove language-specific stop words (using StopWordsFactory).
    • Stem the words (using wamania/php-stemmer).
  • The output is processed text (usually a list of stemmed tokens).
  • Advanced Processing (using processed text and language):
    • TextVectorizerService: Converts processed text from multiple documents into numerical vectors (TF-IDF or DTM) and calculates cosine similarity between them.
    • TextClusteringService: Uses the vectors (from TextVectorizerService) to group similar documents together using algorithms like K-Means.
    • TopicModelingService: Uses processed text or vectors to extract representative terms, topics (often via clustering), or key phrases.
  • Direct Outputs: Basic results like tokens or n-grams can also be directly obtained from TextAnalysisService.
  • Cross-Cutting Concerns:
    • All services are typically obtained and used via TYPO3's Dependency Injection.
    • Most calculation-intensive services (TextAnalysisService, TextVectorizerService, TextClusteringService, TopicModelingService) can leverage the TYPO3 Caching Framework to store and reuse results, improving performance.
  • Output: The results (similarity scores, clusters, topics, etc.) are then available for use by the calling application or extension (like Semantic Suggestion or Page Link Insights).

<?php
// Example of injection and usage in another TYPO3 service
use Cywolf\NlpTools\Service\LanguageDetectionService;
use Cywolf\NlpTools\Service\TextAnalysisService;

class MyContentProcessor
{
    private TextAnalysisService $textAnalyzer;
    private LanguageDetectionService $languageDetector;

    public function __construct(
        TextAnalysisService $textAnalyzer,
        LanguageDetectionService $languageDetector
    ) {
        $this->textAnalyzer = $textAnalyzer;
        $this->languageDetector = $languageDetector;
    }

    public function analyze(string $rawText): array
    {
        $language = $this->languageDetector->detectLanguage($rawText);

        // Cleans, tokenizes, and removes stop words
        $cleanedText = $this->textAnalyzer->removeStopWords($rawText, $language);

        // Reduces words to their root (stemming)
        $stemmedWords = $this->textAnalyzer->stem($cleanedText, $language);

        // Returns the detected language and the stemmed words joined into one string
        return [
            'language' => $language,
            'processed_text' => implode(' ', $stemmedWords)
            // ... other possible analyses
        ];
    }
}

Thanks to this solid foundation and its extensive features (going beyond simple stop word removal to include vectorization, clustering, and topic modeling), NLP Tools enables other extensions like Semantic Suggestion to perform complex and relevant semantic analyses.

Multilingual Support

A particularly important aspect of NLP Tools is its support for multiple languages. The extension includes stop word dictionaries and specific rules for several European languages:

  • French
  • English
  • German
  • Spanish

Automatic language detection allows for correct processing of multilingual sites without additional configuration.
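
The idea behind stop-word-based language detection can be sketched as follows. This is a deliberately simplified stand-in for the LanguageDetectionService, which relies on an n-gram analysis: here the text is simply matched against tiny, hand-picked stop word lists, and the language with the most hits wins. The lists, function names, and fallback behavior are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

// Tiny illustrative stop word lists; real dictionaries are far larger.
const STOP_WORDS = [
    'fr' => ['le', 'la', 'les', 'des', 'et', 'est', 'une', 'dans', 'pour', 'que'],
    'en' => ['the', 'and', 'is', 'of', 'to', 'in', 'for', 'that', 'with', 'this'],
    'de' => ['der', 'die', 'das', 'und', 'ist', 'nicht', 'mit', 'für', 'ein', 'auf'],
    'es' => ['el', 'los', 'las', 'y', 'es', 'una', 'para', 'con', 'por', 'que'],
];

/** Splits a text into lowercase word tokens (Unicode-aware). */
function words(string $text): array
{
    return preg_split('/[^\p{L}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Returns the language whose stop words occur most often in the text. */
function detectLanguage(string $text, string $fallback = 'en'): string
{
    $tokens = words($text);
    $best = $fallback;
    $bestHits = 0;
    foreach (STOP_WORDS as $language => $stopWords) {
        $hits = count(array_intersect($tokens, $stopWords));
        if ($hits > $bestHits) {
            $best = $language;
            $bestHits = $hits;
        }
    }
    return $best;
}

/** Removes the stop words of the given language from the text's tokens. */
function removeStopWords(string $text, string $language): array
{
    return array_values(array_diff(words($text), STOP_WORDS[$language] ?? []));
}

$samples = [
    'Le maillage interne est essentiel pour le référencement des sites',
    'Die interne Verlinkung ist wichtig für die Sichtbarkeit und die Struktur',
    'El enlazado interno es importante para la estructura de los sitios',
    'Internal linking is important for the structure of this site',
];
foreach ($samples as $sample) {
    $language = detectLanguage($sample);
    echo $language, ': ', implode(' ', removeStopWords($sample, $language)), PHP_EOL;
}
```

Because stop words are both frequent and language-specific, even a short sentence usually contains enough of them to pick the right dictionary for the subsequent filtering step.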

Integration with Solr for Enhanced Search

In addition to optimizing internal linking, the extension suite integrates seamlessly with Apache Solr to improve search results.

Weighting Search Results

Metrics calculated by Page Link Insights (PageRank, centrality) are used to weight search results:

plugin.tx_solr {
   search {
       relevance {
           multiplier {
               pagerank = 2.0
               inboundLinks = 1.5
           }

           formula = sum(
               mul(queryNorm(dismax(v:1)), 1.0),
               mul(fieldValue(pagerank_f), 2.0),
               mul(fieldValue(inbound_links_i), 1.5)
           )
       }
   }
}

This configuration increases the relevance of pages that are important within the site structure during user searches.

Practical Use Cases for the Extensions

For News Websites 

On news websites, the Semantic Suggestion extension can automatically generate "Related Articles" sections by identifying articles sharing similar themes. This helps keep readers engaged longer on the site by offering them additional relevant content.

For E-commerce Sites

In an e-commerce context, the extension suite can improve product recommendations by analyzing descriptions and categories to suggest complementary or alternative products, thereby increasing cross-selling opportunities.

For Institutional Websites

For institutional sites with numerous informational pages, thematic analysis helps create coherent "See also" sections, facilitating user navigation to related information without manual intervention.

Performance and Optimization

One of the major challenges of semantic processing is performance management, especially on large sites. The extension suite uses several strategies to maintain optimal performance:

  • Database Storage: Similarity calculations are performed by a scheduled task and stored in the database.
  • Caching: Intermediate results are cached to avoid unnecessary calculations.
  • Asynchronous Processing: Intensive calculations are performed in the background.
  • Algorithm Optimization: Similarity algorithms are optimized for large data volumes.
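
The caching strategy can be illustrated with a small in-memory example. This sketch does not use the TYPO3 Caching Framework; the class VectorCache and its method remember are hypothetical names, and a real implementation would persist entries in a configured cache backend. The principle is the same: results are keyed by a hash of the content, so unchanged text is never analyzed twice.

```php
<?php
declare(strict_types=1);

final class VectorCache
{
    /** @var array<string, array> cached results keyed by content hash */
    private array $entries = [];

    /** Number of times the expensive computation actually ran. */
    public int $misses = 0;

    /** Returns the cached result for $text, computing it only on first sight. */
    public function remember(string $text, callable $compute): array
    {
        $key = sha1($text);
        if (!isset($this->entries[$key])) {
            $this->misses++;
            $this->entries[$key] = $compute($text);
        }
        return $this->entries[$key];
    }
}

// Stand-in for an expensive analysis step: simple word counts.
$wordCounts = static fn (string $text): array => array_count_values(str_word_count(strtolower($text), 1));

$cache = new VectorCache();
$first = $cache->remember('Internal linking improves internal navigation', $wordCounts);
$second = $cache->remember('Internal linking improves internal navigation', $wordCounts);

printf("Computed %d time(s) for 2 requests\n", $cache->misses);
```

Keying by content hash rather than by page uid means that an edited page automatically produces a new key, so stale results are not served after the content changes.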

Future Development Perspectives

The next development steps include:

  • Integration of external similarity calculations: Adding the possibility to use an external module.
  • More comprehensive support in nlp_tools, so that semantic_suggestion and page_link_insights can be made lighter.
  • New visualizations to represent thematic clusters and optimal user journeys.
  • Fine-tuning up to a stable version.

Innovative Approach to Automated Internal Linking Optimization

The Page Link Insights, Semantic Suggestion, and NLP Tools extension suite for TYPO3 represents an innovative approach to automatically optimizing the internal linking of websites. By combining techniques of linguistic analysis, content vectorization, and interactive visualization, it offers a complete solution for improving site structure, user experience, and natural referencing (SEO).

These extensions transform a traditionally manual and time-consuming task into a semi-automated process guided by semantic content analysis. The result is a more coherent, better-structured website offering an improved browsing experience for users.

In a world where content is constantly growing and becoming more complex, these tools provide TYPO3 site administrators with a significant advantage in maintaining an optimal content architecture, thereby fostering better user engagement and increased visibility in search engines.

Get Involved!

Want to optimize your TYPO3 website's internal linking structure? Try these extensions today:

This project is actively evolving, and I welcome contributions, feedback, and testing from the TYPO3 community. Whether you have ideas for improvement, need help with implementation, or want to contribute code, please reach out!

Join the Conversation

Connect with me on TYPO3 Slack in the #semantic_suggestion channel to discuss these extensions, share your experiences, or ask questions.

Direct Contact

Feel free to contact me directly:

Let's work together to improve website structures through intelligent, automated internal linking!
