5.2. Transformations

Introduction

Transformation of content between the database and an RTE is needed if the format of the content in the database differs from the format understood by the RTE. A simple example: bold tags stored as <b> in the database should be converted to <strong> tags in the RTE, or references to images in <img> tags should be relative in the database but absolute in the RTE. In such cases a transformation is needed to do the conversion both ways: from database (DB) to RTE and from RTE to DB.

Generally transformations are needed for two reasons:

  1. Data formats: the agreed format of the stored content in TYPO3 is different from the HTML format the RTE produces. This could involve issues like XHTML, banning of certain tags or maybe a hybrid format in the database. (See section 3 in the illustration some pages ahead.)

  2. RTE specifics: the RTE has special requirements for the content before it can be edited, and that format is different from what we want to store in the database. For instance an RTE could require a full HTML document with <html>, <head> and <body>. Obviously we don't want that in the database, so we have to wrap the content in such a dummy body before it can be edited. (This is the case with “rteekit”; see section 4 in the illustration some pages ahead.)

Hybrid modes

The traditional challenge of incorporating an RTE in TYPO3 has been that the RTE was available only for a limited set of browsers, typically MSIE on Windows. Therefore, if an RTE was supported, it had to be backwards compatible with situations where content was edited in regular <textarea> fields with no visual formatting.

Among the transformations in TYPO3 there are two modes, “ts_transform” and “css_transform”, which try to maintain a data format that is as human readable as possible while still offering an RTE for editing if applicable.

To know the details of those transformations, please refer to the tables in the next section. More historical background can also be obtained later in this document. But here is a short example of a hybrid mode:

In Database:

This is how the content in the database could look for a hybrid mode (such as “css_transform”). As you can see, the TYPO3-specific tag <link> is used for the link to page 123. This tag is designed to be easy for editors to insert. It is of course converted to a real <a> tag when the page is rendered in the frontend. Line 2 shows bold text. In line 3 the paragraph should be centered, and there seems to be no other way than wrapping the line in a <p> tag with the “align” attribute. Not so human readable, but we can do no better without an RTE. Line 4 is just plain.

Generally this content will of course be processed before output on a page. Typically the rule will be this: “Wrap each line which is not already wrapped in a <p> tag in a <p> tag, and convert all TYPO3-specific <link> tags to real <a> tags.” Thus the final result will be valid HTML.

This is line number 1 with a <link 123>link</link> inside
This is line number 2 with a <b>bold part</b> in the text
<p align="center">This line is centered.</p>
This line is just plain

In RTE:

The content in the database can easily be edited as plain text thanks to the “hybrid mode” used to store the content. But when the content above has to go from the database into the RTE, it will not work unless every line is wrapped in a <p> tag! The same is true for the <link> tag; it has to be converted so the RTE understands it:

<p>This is line number 1 with a <a href="index.php?id=123">link</a> inside</p>
<p>This is line number 2 with a <strong>bold part</strong> in the text</p>
<p align="center">This line is centered.</p>
<p>This line is just plain</p>

This process of conversion from the one format to the other is what transformations do!
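The DB to RTE direction of the example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in “t3lib_parsehtml_proc”: the function name db_to_rte is made up for this sketch, it handles only <link> tags that carry a plain page id, and it assumes lowercase tags exactly as in the example.

```python
import re


def db_to_rte(db_content: str) -> str:
    """Turn hybrid-mode database content into the HTML an RTE expects (sketch)."""
    rte_lines = []
    for line in db_content.split("\n"):
        # The TYPO3-specific <link 123> tag becomes an ordinary <a> tag.
        line = re.sub(r"<link (\d+)>", r'<a href="index.php?id=\1">', line)
        line = line.replace("</link>", "</a>")
        # <b> is mapped to <strong> for the RTE.
        line = re.sub(r"<(/?)b>", r"<\1strong>", line)
        # Only lines that are not already wrapped in a <p> tag get wrapped.
        if not re.match(r"<p[\s>]", line):
            line = f"<p>{line}</p>"
        rte_lines.append(line)
    return "\n".join(rte_lines)


database_content = "\n".join([
    "This is line number 1 with a <link 123>link</link> inside",
    "This is line number 2 with a <b>bold part</b> in the text",
    '<p align="center">This line is centered.</p>',
    "This line is just plain",
])
print(db_to_rte(database_content))
```

Running the sketch on the four database lines prints the four RTE lines shown above.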

Configuration

Transformations are mainly defined in the “Special Configuration” of the $TCA "types" configuration. There is a detailed description of this in the $TCA section of this document.

In addition transformations can be fine-tuned by Page TSconfig which means that RTE behaviour can be determined even on page branch level! Details about this are found later in this chapter about the RTE API.

Where transformations are performed

The transformations you can do with TYPO3 are performed in the class “t3lib_parsehtml_proc”. There is typically one function for each direction: from DB to RTE (suffixed “_rte”) and from RTE to DB (suffixed “_db”).

The transformations are invoked in two cases:

  1. Before content enters the editing form. This is done by the RTE API itself, calling the method t3lib_rteapi::transformContent(). See examples of this in the extensions “rte”, “rtehtmlarea” and “rteekit”. In particular “rteekit” is interesting because it not only calls the system transformations but also does some Ekit-specific processing: the “Ekit” Java RTE needs a whole HTML document, which means that the HTML document body must be wrapped and stripped off as part of the transformation process.

  2. Before content is saved in the database. This is done in the t3lib_tcemain class, and the transformation is triggered by a pseudo-field in the submitted form! This field is added by the RTE API (calling t3lib_rteapi::triggerField()). Let's say the field name is “data[tt_content][456][bodytext]”; then the trigger field is named “data[tt_content][456][_TRANSFORM_bodytext]”. In t3lib_tcemain this pseudo-field is detected and used to trigger the transformation process from RTE to DB. Of course the pseudo-field never goes into the database (since it is not found in $TCA).
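The naming scheme of the trigger field can be illustrated with a small Python sketch. The two helper functions are hypothetical and only mirror the description above; they are not the code of t3lib_rteapi or t3lib_tcemain, and the value "RTE" given to the pseudo-field in the usage lines is just a placeholder.

```python
from __future__ import annotations

import re

TRIGGER_PREFIX = "_TRANSFORM_"


def trigger_field_name(field_name: str) -> str:
    """Build the pseudo-field name for a form field like data[table][uid][field]."""
    match = re.fullmatch(r"(.*)\[([^\[\]]+)\]", field_name)
    if match is None:
        raise ValueError(f"not a bracketed form field name: {field_name!r}")
    prefix, column = match.groups()
    return f"{prefix}[{TRIGGER_PREFIX}{column}]"


def split_record(submitted: dict[str, str]) -> tuple[dict[str, str], set[str]]:
    """Separate real fields from trigger fields for one submitted record.

    Returns the fields that may go to the database and the names of the
    fields whose content should be transformed from RTE to DB first.
    """
    to_transform = {
        key[len(TRIGGER_PREFIX):] for key in submitted if key.startswith(TRIGGER_PREFIX)
    }
    real_fields = {
        key: value for key, value in submitted.items() if not key.startswith(TRIGGER_PREFIX)
    }
    return real_fields, to_transform


print(trigger_field_name("data[tt_content][456][bodytext]"))
fields, pending = split_record(
    {"header": "Title", "bodytext": "<p>Hi</p>", "_TRANSFORM_bodytext": "RTE"}
)
print(fields, pending)
```

The point of the second function is that the pseudo-field is consumed as a signal and dropped, so only real columns remain for storage.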

The concept of transformations is discussed in more detail a few pages ahead ("Historical perspective on RTE transformations").

Process illustration

The following illustration shows the process of transformations graphically.

Part 1: The RTE Applications

These are the various possible RTE applications. They can be based on DHTML, Active-X, Java, Flash or whatever.

Part 2: The RTE Specific Transformation

Some RTEs might need to apply additional transformation of the content in addition to the general transformation. An example is "rteekit" which requires a full HTML document for editing (and which will return a full document). In that case the RTE specific transformation must add/remove this html-document wrapper.

RTE specific transformations are normally programmed directly into the rte-api extension class. In the case of "rteekit" that is "tx_rteekit_base" which extends "t3lib_rteapi".

Part 3: The Main Transformation

This is the main transformation of content between the browser format for RTEs and the database storage format. It is general for all RTEs and normally consists of converting links and image references from absolute to relative and further HTML processing as needed. This is the kind of transformation specifically described on the coming pages!

The main transformations are done with "t3lib_parsehtml_proc".

Part 4: The Database

The database where the content is stored for use in both backend and frontend.

Part 5: Rendering the website

Content from the database is processed for display on the website. Depending on the storage format this might also involve "transformation" of content. For instance the internal "<link>" tag has to be converted into an HTML <a> tag.

The processing normally takes place with TypoScript templates, for example with the "CSS Styled Content" extension (TS object path "lib.parseFunc_RTE").

Part 6: The Website

The website made with TYPO3.

Content Examples

This table gives some examples of how content will look in the RTE, in the database and on the final website.

Notice: These are only examples! It might not happen exactly like that in real life since it depends on which exact transformations you apply. But they illustrate the point that the content needs to be in different states depending on whether it is in the RTE, the database or the website frontend.

RTE (#1) | Database (#4) | Website (#6) | Comment

<p>Hello World</p> | Hello World | <p>Hello World</p> | <p> omitted in DB to make it plain-text editable.

<p align="right">Right aligned text</p> | <p align="right">Right aligned text</p> | <p align="right">Right aligned text</p> | Had to keep <p> tag in DB because align attribute was found.

<table ...>....</table> | [stripped out] | - | Tables were not allowed, so stripped.

<a href="http://localhost/.../index.php?id=123"> | <link 123> | <a href="Contact_us.123.html"> | Links are stored with the <link>-tag and need processing for both frontend and backend.

<img src="http://localhost/fileadmin/image.jpg"> | <img src="fileadmin/image.jpg"> | <img src="fileadmin/image.jpg"> | References to images must usually be absolute paths in RTEs while relative in database.

Transformation overview

The transformation of the content can be configured by listing which transformation filters to pass it through. The order of the list is the order in which the transformations are performed when saved to the database. The order is reversed when the content is loaded into the RTE again.
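The ordering rule can be sketched as follows in Python. This is a minimal illustration of "apply the list in order towards the database, in reverse towards the RTE"; the function apply_transformations and the tracing toy filters are assumptions made for this sketch and do not exist in TYPO3.

```python
from typing import Callable, Dict, List, Tuple

# A filter takes the content and a direction ("db" or "rte") and returns new content.
Filter = Callable[[str, str], str]


def apply_transformations(content: str, filter_list: str, direction: str,
                          filters: Dict[str, Filter]) -> str:
    """Run content through a comma-separated list of filters.

    Towards the database the filters run in list order,
    towards the RTE they run in reversed order.
    """
    if direction not in ("db", "rte"):
        raise ValueError("direction must be 'db' or 'rte'")
    names: List[str] = [name.strip() for name in filter_list.split(",") if name.strip()]
    if direction == "rte":
        names.reverse()
    for name in names:
        content = filters[name](content, direction)
    return content


# Toy filters that only record the order in which they are called.
trace: List[Tuple[str, str]] = []


def make_tracer(name: str) -> Filter:
    def tracer(content: str, direction: str) -> str:
        trace.append((name, direction))
        return content
    return tracer


toy_filters = {name: make_tracer(name) for name in ("css_transform", "ts_images", "ts_links")}

apply_transformations("some content", "css_transform,ts_images,ts_links", "db", toy_filters)
apply_transformations("some content", "css_transform,ts_images,ts_links", "rte", toy_filters)
print(trace)
```

The trace lists css_transform, ts_images, ts_links for the "db" call and the same names in reverse for the "rte" call.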

Transformation filter:

Description:

ts_transform

Transforms the content with regard to most of the issues related to the content element types 'Text' and 'Text w/Image'. The mode is optimized for the content rendering of the static template “content (default)” which uses old <font> tag style rendering. The mode is a “hybrid” mode which tries to save only the necessary HTML in the database so that content might still be easily edited without the RTE. For instance a text paragraph will be encapsulated in <p> tags in the RTE while in the database it will just be a single line ended by a line break character. (Supports the “cms” extension.)

css_transform

Like “ts_transform”, but headers and bulletlists are preserved as <Hx> tags and <OL> / <UL> (TYPOLIST and TYPOHEAD are still converted to Hx and OL/UL, but not reversely...) and tables are preserved (PROC.preserveTables is disabled). The mode is optimized for the content rendering done by “css_styled_content” or similar.

ts_preserve

Converts the list of preserved tags - if any - to <SPAN>-tags with a custom parameter 'specialtag' which holds the value of the original tag. Deprecated.

ts_images

Checks if any images on the page are from external URLs and if so they are fetched and stored in the uploads/ folder. In addition 'magic' images are evaluated to see if their size has changed and if so the image is recalculated on the server. Finally absolute URLs are converted to relative URLs for all local images.

ts_links

Converts the absolute URLs of links to the TypoScript specific <LINK>-tag. This process is designed to make links consistent with the typolink function in the TypoScript frontend.

ts_reglinks

Converts the absolute URLs of links to relative URLs, keeping the <A>-tag.

Meta transformation:

Description:

ts

Meta-mode which is basically a substitute for this list: ts_transform,ts_preserve,ts_images,ts_links.  This is the one used specifically for the two 'Text'-types of the content elements (“cms” extension).

ts_css

Like “ts”, a meta-mode which is a substitute for the list: css_transform,ts_images,ts_links. It is designed to be the new, modern transformation used by most RTE cases, because it converts links between <A> and <LINK> but preserves all other content while still making it as human readable as possible (that means simple <P>-tags are resolved into simple lines.)
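Since the two meta-modes are plain substitutes for filter lists, resolving them amounts to a lookup. A minimal Python sketch, assuming a comma-separated input list and a hypothetical helper named expand_modes:

```python
from typing import Dict, List

# The two meta-modes and the filter lists they stand for, as given in the table above.
META_MODES: Dict[str, List[str]] = {
    "ts": ["ts_transform", "ts_preserve", "ts_images", "ts_links"],
    "ts_css": ["css_transform", "ts_images", "ts_links"],
}


def expand_modes(mode_list: str) -> List[str]:
    """Replace meta-modes in a comma-separated list by the filters they represent."""
    expanded: List[str] = []
    for mode in (part.strip() for part in mode_list.split(",")):
        if not mode:
            continue
        # Ordinary filter names are kept as they are.
        expanded.extend(META_MODES.get(mode, [mode]))
    return expanded


print(expand_modes("ts_css"))
print(expand_modes("ts_transform, ts_reglinks"))
```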

In addition, custom transformations can be created. This allows you to create your own tailor made transformations with a PHP class where you can program how content is processed to and from the database. See section later.

Transformation details

The transformations offered by the TYPO3 core are performed by the class “t3lib_parsehtml_proc”. Here follows a detailed technical description of the transformation filters available. For each filter, the DB -> RTE processing is listed first, followed by the RTE -> DB processing.

ts_transform, css_transform

    DB -> RTE: function t3lib_parseHTML::TS_transform_rte()

    RTE -> DB: function t3lib_parseHTML::TS_transform_db()

    DB -> RTE:

    1. Sections delimited by the tags TABLE,PRE,UL,OL,H1,H2,H3,H4,H5,H6 are not processed and thus just passed on to the RTE.

    2. The content of <BLOCKQUOTE> sections is sent recursively through the ts_transform filter. The tag remains.

    3. <TYPOLIST> sections are converted to <OL> or <UL> sections; <OL> is used if the type parameter is set to 1.

      The conversion of TYPOLIST-tags can be disabled by setting the 'proc.typolist' option. See later.

    4. <TYPOHEAD> sections are converted to <Hx>-tags. The type parameter ranging from 1-5 determines which H-tag will be used. If no type parameter is set, H6 is used.

      The conversion of TYPOHEAD-tags can be disabled by setting the 'proc.typohead' option. See later.

    5. All content outside the tags already mentioned is now processed as follows:

      1. Every line is wrapped in <P>-tags (configurable to DIV). If a line is empty, a &nbsp; is inserted, and if the line happens to be wrapped in DIV/P-tags already, it is not wrapped again (this might be the case if align or class parameters have been set).

      2. Then <B> tags are mapped to <STRONG> tags and <I> tags are mapped to <EM> tags (this is how the RTE prefers it).

      3. All content between the P/DIV tags outside of other allowed HTML-tags is passed through htmlspecialchars(). Thus only allowed HTML code is preserved and other “pseudo tags” are mapped to real text.

    RTE -> DB:

    1. Sections delimited by the tag PRE are not processed and thus just passed on to the DB.

    2. <TABLE>-sections are dissolved so only the text of the table cells remains. Every cell represents a new line. The reason for this is basically that tables are not wanted in the 'Text'-types, and it may also be nice to get rid of them in case you have transferred content from other websites. (This can be disabled.) (Does NOT apply to “css_transform”)

    3. The content of <BLOCKQUOTE> sections is sent recursively through the ts_transform filter. The tag remains.

    4. <OL> and <UL> sections are converted to <TYPOLIST> sections. If the bulletlist is <OL> (ordered list with numbers) the type parameter of the typolist is set to 1. Bulletlists in multiple levels are not supported.

      The conversion of TYPOLIST-tags can be disabled by setting the 'proc.typolist' option. See later.

      (Does NOT apply to “css_transform”)

    5. <Hx> sections are converted to <TYPOHEAD>-tags. The number of the Hx-tag ranging from 1-5 is set as the type-number of the TYPOHEAD tag. <H6> is equal to type=0 (default). The align parameter is preserved, as is the class parameter if set.

      The conversion of TYPOHEAD-tags can be disabled by setting the 'proc.typohead' option. In that case the tag is preserved with the parameters align and class. See later.

      (Does NOT apply to “css_transform”)

    6. All content outside these blocks is now processed as follows:

      1. All <DIV> and <P> sections are dissolved into lines (unless align and/or class parameters are set).

      2. <BR> tags are converted into newlines as well (configurable, since this will resolve “soft linebreaks” into paragraphs!).

      3. Then <STRONG> and <EM> tags are remapped to <B> and <I> tags. (This is more human readable. Configurable.)

      4. The list of allowed tags (configurable) is preserved and all other tags are discarded (thus junk-tags from pasted content will not survive into the database!).

      5. The content outside the allowed tags is decoded again, that is, the htmlspecialchars() conversion is undone, so it is converted back to human-readable text. Furthermore the nesting of tags inside of P/DIV sections is preserved. For instance this: <P>One <U><B>two</B> three</P></U> will be converted to <P>One <B>two</B> three</P>. That is, the U-tags are removed because they were falsely nested with the <P> tags.
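The first three sub-steps of the RTE -> DB line processing (dissolving paragraphs, converting <BR>, remapping STRONG/EM) can be sketched in Python. This is a simplified illustration and not the TS_transform_db() implementation: the function rte_to_db is hypothetical, it assumes the input consists only of non-nested top-level <P>/<DIV> sections, and it leaves out tag filtering and the undoing of htmlspecialchars().

```python
import re

# One top-level <p> or <div> section; nested sections are not supported in this sketch.
PARAGRAPH = re.compile(r"<(p|div)(\s[^>]*)?>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def rte_to_db(rte_content: str) -> str:
    """Dissolve plain <p>/<div> sections into lines for hybrid-mode storage (sketch)."""
    lines = []
    for match in PARAGRAPH.finditer(rte_content):
        tag = match.group(1)
        attributes = match.group(2) or ""
        inner = match.group(3)
        # <br> becomes a newline, which turns soft line breaks into separate lines.
        inner = re.sub(r"<br\s*/?>", "\n", inner, flags=re.IGNORECASE)
        # <strong>/<em> are remapped to the more human-readable <b>/<i>.
        inner = re.sub(r"<(/?)strong>", r"<\1b>", inner, flags=re.IGNORECASE)
        inner = re.sub(r"<(/?)em>", r"<\1i>", inner, flags=re.IGNORECASE)
        if re.search(r"\b(align|class)\s*=", attributes, re.IGNORECASE):
            # The tag must be kept because it carries align and/or class.
            lines.append(f"<{tag}{attributes}>{inner}</{tag}>")
        else:
            lines.append(inner)
    return "\n".join(lines)


rte_html = (
    "<p>Hello <strong>World</strong></p>\n"
    '<p align="right">Right aligned text</p>\n'
    "<p>first<br>second</p>"
)
print(rte_to_db(rte_html))
```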

    ts_preserve (deprecated)

    DB -> RTE: function t3lib_parseHTML::TS_preserve_rte()

    1. If 'proc.preserveTags' is configured, those tags are converted to <SPAN specialtag="...(the preserved tag rawurlencoded)...">-sections. Those are supposed to be left alone by the RTE.

    RTE -> DB: function t3lib_parseHTML::TS_preserve_db()

    1. If 'proc.preserveTags' is configured, <SPAN>-tags with the custom 'specialtag' parameter set are converted back to the tag value contained in the specialtag-parameter.

    ts_images

    DB -> RTE: function t3lib_parseHTML::TS_images_rte()

    1. All <IMG>-tags are processed, and if the value of the src-parameter does not start with 'http' it is expected to be a relative URL. The current site URL is then prefixed so the reference is absolute in the RTE, as the RTE requires.

    RTE -> DB: function t3lib_parseHTML::TS_images_db()

    1. All <IMG>-tags are processed, and if the first part of the src-parameter is not the same as the current site URL, the image must be a reference to an external image. In that case the image is read from that URL and stored as a 'magic' image in the uploads/ folder (can be disabled).

    2. All magic images (that is, images stored in the uploads/ folder (configured by TYPO3_CONF_VARS["BE"]["RTE_imageStorageDir"]), with filenames prefixed with 'RTEmagicC_' (child=actual image) and 'RTEmagicP_' (parent=original image)) are processed to see if the physical dimensions of the image on the server match the dimensions set in the img-tag. If this is not the case, the user must have changed the dimensions and the image must be re-scaled accordingly.

    3. Finally the absolute reference to the image is converted to a proper relative reference if the image URL is local.
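The URL handling of ts_images in both directions can be sketched like this in Python. The sketch covers only the prefixing and stripping of the site URL; fetching external images and re-scaling magic images are left out. The function names and the site URL "http://localhost/" are assumptions made for the example, and src attributes are expected to use double quotes.

```python
import re

# The src attribute of an <img> tag, assuming double-quoted attribute values.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def images_to_rte(html: str, site_url: str) -> str:
    """DB -> RTE: prefix the site URL to every src that does not start with 'http'."""
    def make_absolute(match: "re.Match[str]") -> str:
        src = match.group(2)
        if not src.startswith("http"):
            src = site_url + src
        return match.group(1) + src + match.group(3)
    return IMG_SRC.sub(make_absolute, html)


def images_to_db(html: str, site_url: str) -> str:
    """RTE -> DB: make references to local images relative again.

    External images are left untouched here; the real filter would fetch them.
    """
    def make_relative(match: "re.Match[str]") -> str:
        src = match.group(2)
        if src.startswith(site_url):
            src = src[len(site_url):]
        return match.group(1) + src + match.group(3)
    return IMG_SRC.sub(make_relative, html)


site = "http://localhost/"
in_rte = images_to_rte('<img src="fileadmin/image.jpg">', site)
print(in_rte)
print(images_to_db(in_rte, site))
```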

    ts_links

    DB -> RTE: function t3lib_parseHTML::TS_links_rte()

    1. All <LINK>-tags (TypoScript specific) are converted to proper <A>-tags. The parameters of the <LINK>-tags are separated by spaces. The first parameter is the link reference (see the typolink function in TSref for details on the syntax), the second is the target if given (if '-' the target is not set), the third parameter is the class (if '-' the class is not set) and the fourth parameter is the title.

    RTE -> DB: function t3lib_parseHTML::TS_links_db()

    1. All <A>-tags are converted to <LINK> tags, however only if they do not contain any parameters other than href, target and class. These are the only three parameters which can be represented by the TypoScript specific <LINK>-tag.
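How the space-separated parameters of a <LINK>-tag map to <A> attributes can be sketched in Python. The sketch rests on several assumptions that are not taken from the TYPO3 source: a numeric link reference is rendered as "index.php?id=N" (as in the earlier example), any other reference is used as the href unchanged, and everything after the third parameter is treated as the title. The function names are hypothetical.

```python
import html
import re

LINK_TAG = re.compile(r"<link\s+([^>]+)>", re.IGNORECASE)


def link_to_a(match: "re.Match[str]") -> str:
    """Build an <a> start tag from the parameters of one <link> tag."""
    # At most four parameters: reference, target, class, title.
    parts = match.group(1).split(None, 3)
    reference = parts[0]
    # Assumption: a purely numeric reference is a page id.
    href = f"index.php?id={reference}" if reference.isdigit() else reference
    attributes = [f'href="{html.escape(href)}"']
    for name, value in zip(("target", "class", "title"), parts[1:]):
        # '-' means "not set" for target and class.
        if value == "-" and name in ("target", "class"):
            continue
        attributes.append(f'{name}="{html.escape(value)}"')
    return "<a " + " ".join(attributes) + ">"


def links_to_rte(content: str) -> str:
    """DB -> RTE: convert <link ...>...</link> into <a ...>...</a>."""
    content = LINK_TAG.sub(link_to_a, content)
    return re.sub(r"</link>", "</a>", content, flags=re.IGNORECASE)


print(links_to_rte("<link 123>link</link>"))
print(links_to_rte("<link 123 - myclass>link</link>"))
```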

    ts_reglinks

    DB -> RTE: function t3lib_parseHTML::TS_reglinks()

    1. All A-tags have their URLs converted to absolute URLs if they are relative.

    RTE -> DB: function t3lib_parseHTML::TS_reglinks()

    1. All A-tags have their absolute URLs converted to relative if possible (that is, if the URL is within the current domain).

    Page TSconfig

    The RTEs can be configured by Page TSconfig. There is a top level object name, "RTE", that is used for this. The main object paths look like this:

    Property: default.[...]
              config.[tablename].[field].[...]
              config.[tablename].[field].types.[type].[...]

    Data type: ->RTEconf

    Description: These objects contain the actual configuration of the RTE interface. For the properties available, refer to the table below. This is a description of how you can customize in general and override for specific fields/types.

    'RTE.default' configures the RTE for all tables/fields/types.

    'RTE.config.[tablename].[field]' configures a specific field. The values are inherited from 'RTE.default'; settings made here override the inherited values.

    'RTE.config.[tablename].[field].types.[type]' configures a specific field in case the 'type'-value of the field matches type. Again this overrides the former settings.

    Property: [individual RTE options]

    Data type: -

    Description: There are other options to set for the RTE toplevel object. These depend on the individual RTEs, though, so there can be no further reference to these properties in this table.

    Generally the "rte" (classic MSIE RTE) will set the standard for configuration options, so you can refer to the documentation for that RTE for more details. On the top level of the RTE object you will normally find that general collections of classes, styles etc. are configured.

    [page:RTE]

    Configuration examples

    This configuration in "Page TSconfig" will disable the RTE altogether:

    RTE.default.disabled = 1

    In the case below the RTE is still disabled generally, but this is overridden specifically for the table "tt_content" where the RTE is used in the field "bodytext"; The "disabled" flag is set to false again which means that for Content Elements the RTE will be available.

    RTE.default.disabled = 1
    RTE.config.tt_content.bodytext.disabled = 0

    In this example the RTE is still enabled for content elements in general, but if the Content Element type is set to "Text" (text) then the RTE will be disabled again!

    RTE.default.disabled = 1
    RTE.config.tt_content.bodytext.disabled = 0
    RTE.config.tt_content.bodytext.types.text.disabled = 1
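The override order of the three levels can be sketched in Python. The nested dictionary stands in for the parsed "RTE" object of Page TSconfig and the function resolve_rte_config is hypothetical; it only shows the principle "default, then field, then type", using a shallow merge.

```python
from typing import Any, Dict, Optional


def resolve_rte_config(rte: Dict[str, Any], table: str, field: str,
                       record_type: Optional[str] = None) -> Dict[str, Any]:
    """Merge RTE.default, RTE.config.[table].[field] and the matching types.[type]."""
    resolved = dict(rte.get("default", {}))
    field_config = rte.get("config", {}).get(table, {}).get(field, {})
    # Field level settings override the defaults ("types" is handled separately).
    resolved.update({key: value for key, value in field_config.items() if key != "types"})
    if record_type is not None:
        # Type level settings override both former levels.
        resolved.update(field_config.get("types", {}).get(record_type, {}))
    return resolved


# The last configuration example above, written as a nested dictionary.
rte_object = {
    "default": {"disabled": 1},
    "config": {
        "tt_content": {
            "bodytext": {"disabled": 0, "types": {"text": {"disabled": 1}}},
        },
    },
}

print(resolve_rte_config(rte_object, "tt_content", "bodytext", "text"))
print(resolve_rte_config(rte_object, "tt_content", "bodytext", "textpic"))
print(resolve_rte_config(rte_object, "pages", "abstract"))
```

With the example configuration the RTE ends up disabled for the "text" type, enabled for other types of the "bodytext" field, and disabled everywhere else.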

    The RTE object in Page TSconfig

    The RTE object contains configuration of the RTE application. There are a few properties which are used externally from the RTE. The property "disabled" will simply disable the rendering of the RTE and "proc" is reserved to contain additional configuration of transformations.

    Property: disabled

    Data type: boolean

    Description: If set, the editor is disabled.

    This option is evaluated in t3lib_TCEforms where it determines if the RTE is rendered or not.

    Property: proc

    Data type: ->PROC

    Description: Customization of the server processing of the content - also called 'transformations'. See table below.

    The transformations are only initialized if they are configured (“rte_transform” must be set for the field in the types-definition in TCA).

    The "->PROC" object is processed in "t3lib_parsehtml_proc" and is independent of the particular RTE used (like transformations generally are!).

    Property: [individual RTE options]

    Data type: -

    Description: Each RTE may use additional properties for the RTE. Typically such properties relate to the features of the RTE application. For instance you could configure which tool bar buttons are available etc.

    [page:->RTEconf]

    Configuration examples
       0: RTE.default >
       1: RTE.default {
       2:   mainStyle_font = Arial, sans-serif
       3:   mainStyle_size = 12
       4:   mainStyle_color = black
       5:   classesParagraph = redText
       6:   classesCharacter = redText
       7:   showButtons = cut,copy,fontstyle,fontsize, textcolor,table,bgcolor
       8:   proc.preserveTables = 1
       9:
      10:   proc.entryHTMLparser_db = 1
      11:   proc.entryHTMLparser_db {
      12:     keepNonMatchedTags = 1
      13:     xhtml_cleaning = 1
      14:   }
      15:
      16:   mainStyleOverride_add {
      17:     P =  font-family:Arial, sans-serif; font-size:12;
      18:     H1 =  font-family:Arial, sans-serif; font-size:16;  font-weight:bold; margin-top:0;margin-bottom:10;
      19:     H2 =  font-family:Arial, sans-serif; font-size:12;  font-weight:bold; color:navy; margin-top:0;margin-bottom:10;
      20:     H3 =  font-family:Arial, sans-serif; font-size:18;  font-weight:bold;
      21:     H4 =  font-family:Arial, sans-serif; font-size:24;
      22:     H5 =  font-family:Arial, sans-serif; font-size:20;  color:navy; font-weight:normal;  margin-top:0;margin-bottom:10;
      23:     H6 =  font-family:Arial, sans-serif; font-size:16;  font-weight:bold;
      24:   }
      25:   disablePCexamples = 0
      26: }

    In this example all the configuration except lines 8-14 (the "proc." configuration) defines the RTE application's internal features. These options will vary depending on the RTE used. In this case the configuration is for the classic MSIE Active-X RTE in the extension "rte".

    The ->PROC object

    This object contains configuration of the transformations used. These options are universal for all RTEs and used inside the class "t3lib_parsehtml_proc".

    The main objective of these options is to allow for minor configuration of the transformations. For instance you may disable the mapping between <B>-<STRONG> and <I>-<EM> tags which is done by the 'ts_transform' transformation. Or you could disable the default transfer of images from external URL to the local server. This is all possible through the options.

    Notice how many properties relate to specific transformations only! Also notice that the meta-transformations "ts" and "ts_css" imply other transformations like "ts_transform" and "css_transform", which means that options limited to "ts_transform" will of course also work for "ts".

    Property:

    Data type:

    Description:

    overruleMode

    List of RTE transformations

    This can overrule the RTE transformation set from TCA.

    Notice, this is a comma list of transformation keys. (Not a "dash-list" like in $TCA).

    typolist

    boolean

    (Applies for “ts_transform” only)

    This enables/disables the conversion between <TYPOLIST> and <UL> sections. Default (if unset) is that "typolist" is enabled.

    Example that disables "typolist":

    typolist = 0

    typohead

    boolean

    (Applies for “ts_transform” only)

    This enables/disables the conversion between <TYPOHEAD> and <Hx> sections.

    Example that disables "typohead":

    typohead = 0

    preserveTags

    list of tags

    (DEPRECATED)

    Here you may specify a list of tags - possibly user-defined pseudo tags - which you wish to preserve from being removed by the RTE. See the information about preservation in the description of transformations.

    Example:

    In the default TypoScript configuration of content rendering the typotags <LINK>, <TYPOLIST> and <TYPOHEAD> are the most widely used. However the <TYPOCODE>-tag is also configured to let you define a section being formatted in monospace. Let's also imagine you have defined a custom tag, <MYTAG>. In order to preserve these tags from removal by the RTE, you should configure like this:

    RTE.default.proc {
      preserveTags = TYPOCODE, MYTAG
    }

    Relates to the transformation 'ts_preserve'

    dontConvBRtoParagraph

    boolean

    (Applies for “ts_transform” and "css_transform" only (function divideIntoLines))

    By default <BR> tags in the content are converted to paragraphs. Setting this value will prevent the conversion of <BR>-tags to new-lines (chr(10)).

    internalizeFontTags

    boolean

    (Applies for “ts_transform” and "css_transform" only (function divideIntoLines))

    This splits the content into font-tag chunks.

    If there are any <P>/<DIV> sections inside of them, the font-tag is wrapped AROUND the content INSIDE of the P/DIV sections and the outer font-tag is removed.

    This function seems to be a good choice for pre-processing content if it has been pasted into the RTE from e.g. StarOffice.

    In that case the font-tags are normally on the OUTSIDE of the sections.

    allowTagsOutside

    commalist of strings

    (Applies for “ts_transform” and "css_transform" only (function divideIntoLines))

    Enter tags which are allowed outside of <P> and <DIV> sections when converted back to database.

    Default is “img”

    Example:

    IMG,HR

    allowTagsInTypolists

    commalist of strings

    (Applies for “ts_transform” only)

    Enter tags which are allowed inside of <typolist> tags when content is sent to the database.

    Default is “br,font,b,i,u,a,img,span”

    allowTags

    commalist of strings

    (Applies for “ts_transform” and "css_transform" only (function getKeepTags))

    Tags to allow. Notice, this list is added to the default list, which you see here:

    b,i,u,a,img,br,div,center,pre,font,hr,sub,sup,p,strong,em,li,ul,ol,blockquote,strike,span

    If you wish to deny some tags, see below.

    denyTags

    commalist of strings

    (Applies for “ts_transform” and "css_transform" only (function getKeepTags))

    Tags from above list to disallow.

    HTMLparser_rte

    HTMLparser_db

    ->HTMLparser

    (Applies for “ts_transform” and "css_transform" only (function getKeepTags))

    These are additional options for the HTML-parser calls which strip off tags when the content is prepared for the RTE and DB respectively. You can configure additional rules, like which other tags to preserve, which attributes to preserve, which values are allowed as attributes of a certain tag etc.

    .nestingGlobal for HTMLparser_db is set by default to “b,i,u,a,center,font,sub,sup,strong,em,strike,span” unless another value is set.

    Also B/I tags are mapped to STRONG/EM tags in the RTE direction and vice versa.

    This parsing is done on a per-line basis, so you cannot expect the paragraph tags (P or DIV) to be included.

    Notice that the ->HTMLparser options “keepNonMatchedTags” and “htmlSpecialChars” are NOT observed. They are preset internally.

    dontRemoveUnknownTags_db

    boolean

    (Applies for “ts_transform” and "css_transform" only (function HTMLcleaner_db))

    Direction: To database

    Default is to remove all unknown tags in the content going to the database. (See HTMLparser_db above for default tags). Generally this is a very useful thing, because all kinds of bogus tags from pasted content, like that from Word etc., will be removed to have clean content in the database.

    However, setting this option disables that and allows all tags, including those that are not in the HTMLparser_db-list.

    dontUndoHSC_db

    boolean

    (Applies for “ts_transform” and "css_transform" only (function HTMLcleaner_db))

    Direction: To database

    Default is to re-convert literals to characters (that is &lt; to <) outside of HTML-tags. This is disabled by this boolean. (HSC means HtmlSpecialChars - which is a PHP function)

    dontProtectUnknownTags_rte

    boolean

    (Applies for “ts_transform” and "css_transform" only (function setDivTags))

    Direction: To RTE

    Default is that tags unknown to HTMLparser_rte are “protected” when sent to the RTE. This means they are converted from e.g. <MYTAG> to &lt;MYTAG&gt;. This is normally fine, because the tag can be edited plainly by the editor and when returned to the database it is converted back (by default; this is disabled by .dontUndoHSC_db).

    Setting this option will prevent unknown tags from becoming protected.

    dontHSC_rte

    boolean

    (Applies for “ts_transform” and "css_transform" only (function setDivTags))

    Direction: To RTE

    Default is that all content outside of HTML-tags is passed through htmlspecialchars(). Setting this option disables that conversion. (This is the counterpart of .dontUndoHSC_db.)

    dontConvAmpInNBSP_rte

    boolean

    (Applies for “ts_transform” and "css_transform" only (function setDivTags))

    Direction: To RTE

    By default all &nbsp; codes are NOT converted to &amp;nbsp;, which they naturally would be (unless .dontHSC_rte is set). You can disable that by this flag.

    allowedFontColors

    list of HTMLcolors

    (Applies for “ts_transform” and "css_transform" only (function getKeepTags))

    Direction: To DB

    If set, these are the only colors which will be allowed in font-tags! Case insensitive.

    allowedClasses

    list of strings

    (Applies for “ts_transform” and "css_transform" only (function getKeepTags))

    Direction: To DB

    Allowed general classnames when content is stored in database. Could be a list matching the number of defined classes you have. Case-insensitive.

    This might be a really good idea to do, because when pasting in content from MS word for instance there are a lot of <SPAN> and <P> tags which may have class-names in. So by setting a list of allowed classes, such foreign classnames are removed.

    If a classname is not found in this list, the default is to remove the class-attribute.

    skipAlign

    skipClass

    boolean

    (Applies for “ts_transform” and "css_transform" only (function divideIntoLines))

    If set, the align and class attributes of <P>/<DIV> sections (respectively) will be ignored. Normally <P>/<DIV> tags are preserved if one or both of these attributes are present in the tag. Otherwise the tags are removed.

    keepPDIVattribs

    list of tag attributes  (strings)

    (Applies for “ts_transform” and "css_transform" only (function divideIntoLines))

    “align” and “class” are the only attributes preserved for <P>/<DIV> tags. Here you can specify a list of other attributes to preserve.

    remapParagraphTag

    string / boolean

    (Applies for “ts_transform” and "css_transform" only (function divideIntoLines))

    When <P>/<DIV> sections are converted to be put into the database, the tag - P or DIV - is preserved. However, setting this option to either P or DIV will force the section to be converted to the one or the other.

    If the value is set to true (1), it works as a general disable flag for the whole section conversion and no tags are preserved whatsoever; they are simply removed.

    useDIVasParagraphTagForRTE

    string

    (Applies for “ts_transform” and "css_transform" only (function TS_transform_rte))

    Use <DIV> tags for sections when converting lines from database to RTE. Default is <P>. Applies only to lines which have NO tag wrapped around them already.

    preserveTables

    boolean

    (Applies for “ts_transform”)

    If set, tables are preserved.

    dontFetchExtPictures

    boolean

    (Applies for “ts_images”)

    If set, images from external URLs are not fetched when content is pasted from external sources. Normally this copying is done.

    plainImageMode

    boolean/string

    (Applies for “ts_images”)

    If set, all “plain” local images (those that are not magic images) will be cleaned up in some way.

    If the value is simply set (true), the style attribute will be removed after detecting any special width/height CSS attributes (which is what the RTE sets if you scale the image manually) and the border attribute is set to zero.

    You can also configure with special keywords. Setting “plainImageMode” to any of the values below will perform special processing:

    “lockDimensions” : This will read the real dimensions of the image file and force these values into the <img> tag. Thus this option will prevent any user applied scaling in the image!

    “lockRatio” : This will allow users to scale the image but will automatically correct the height dimension so the aspect ratio from the original image file is preserved.

    “lockRatioWhenSmaller” : Like “lockRatio”, but will not allow any scaling larger than the original size of the image.
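
    The arithmetic behind the three keywords can be sketched as follows. The function applyPlainImageMode() and its parameters are assumptions made for illustration and do not reflect the actual code in “ts_images”; the sketch only shows how width and height relate under each keyword.

```php
<?php
// Hypothetical helper (not part of TYPO3).
// $origW/$origH: real dimensions of the image file.
// $reqW/$reqH:   dimensions the user set by scaling the image in the RTE.
// Returns array(width, height) to write into the <img> tag.
function applyPlainImageMode(string $mode, int $origW, int $origH, int $reqW, int $reqH): array
{
    if ($origW <= 0) {
        return [$reqW, $reqH]; // no usable file dimensions: leave as is
    }

    switch ($mode) {
        case 'lockDimensions':
            // User scaling is discarded completely.
            return [$origW, $origH];
        case 'lockRatioWhenSmaller':
            // Never wider than the original, then correct the height.
            $reqW = min($reqW, $origW);
            // fall through to the ratio correction
        case 'lockRatio':
            return [$reqW, (int) round($reqW * $origH / $origW)];
        default:
            return [$reqW, $reqH];
    }
}

// A 400x300 file scaled to width 200 with a distorted height:
$locked = applyPlainImageMode('lockRatio', 400, 300, 200, 999);
```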

    exitHTMLparser_rte

    exitHTMLparser_db

    entryHTMLparser_rte

    entryHTMLparser_db

    boolean/->HTMLparser

    (Applies for all kinds of processing)

    Allows you to enable/disable the HTMLparser for the content before (entry) and after (exit) the content is processed with the predefined processors (e.g. ts_images or ts_transform).

    There are no default values set.

    disableUnifyLineBreaks

    boolean

    (Applies for all kinds of processing)

    When entering the processor all \r\n linebreaks are converted to \n (13-10 to 10). When leaving the processor all \n are reconverted to \r\n (10 to 13-10).

    This option disables that processing.
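
    In plain PHP the unification amounts to two string replacements. This is a minimal sketch with made-up variable names, not the processor's own code:

```php
<?php
$input = "line 1\r\nline 2\r\n";

// Entering the processor: 13-10 becomes 10.
$internal = str_replace("\r\n", "\n", $input);

// Leaving the processor: 10 becomes 13-10 again.
$output = str_replace("\n", "\r\n", $internal);
```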

    usertrans.[user-defined transformation key]

    -

    Custom option-space for user-defined transformations.

    See example from section about custom transformations.

    [page:->PROC]

    Custom transformations API

    Instead of using the built-in transformations of TYPO3 you can program your own. This is done by creating a PHP class with two methods for transformation. Additionally you have to define a key (like "css_transform") for your transformation so you can refer to it in the configuration of Rich Text Editors.

    Custom transformation key

    You should pick a custom transformation key which is prefixed with either "tx_" or "user_". Use "tx_[extension key]_[suffix]" if you deliver your transformation inside an extension.

    Notice: If you pick one of the default transformation keys (except the meta-transformations) you will simply override it and your transformation will be called instead!

    Registering the transformation key in the system

    In "ext_localconf.php" you simply set a $TYPO3_CONF_VARS variable to point to the class which contains the transformation methods:

    $TYPO3_CONF_VARS['SC_OPTIONS']['t3lib/class.t3lib_parsehtml_proc.php']['transformation']['tx_myext']
        = 'EXT:myext/custom_transformation.php:user_transformation';

    Here the transformation key is defined to be "tx_myext" (assuming the extension has the extension key "myext") and the value points to a file inside the extension which contains the class "user_transformation" (instantiated by t3lib_div::getUserObj()).

    This class must contain two methods, "transform_db" and "transform_rte", one for each transformation direction.

    Code listing of "user_transformation"

    This code listing shows a simple transformation. When content is delivered to the RTE it will add a <hr/> tag to the end of the content. When the content is stored in the database any <hr/> tag at the end of the content, together with trailing whitespace, will be removed. This is of course totally useless but nevertheless shows the concept of transformations between RTE and DB.

       0: /**
       1:  * Custom RTE transformation
       2:  */
       3: class user_transformation {
       4:
       5:         // object; Reference to the parent object, t3lib_parsehtml_proc
       6:     var $pObj;
       7:
       8:         // Transformation key of self.
       9:     var $transformationKey = 'tx_myext';
      10:
      11:         // Will contain transformation configuration if found:
      12:     var $conf;
      13:
      14:
      15:     /**
      16:      * Setting specific configuration for this transformation
      17:      *
      18:      * @return    void
      19:      */
      20:     function initConfig()    {
      21:         $this->conf = $this->pObj->procOptions['usertrans.'][$this->transformationKey.'.'];
      22:     }
      23:
      24:     /**
      25:      * Reserved method name, called when content is transformed for DB storage
      26:      * If "proc.usertrans.tx_myext.addHrulerInRTE = 1" then a horizontal ruler at the
      27:      * end of the content will be removed (if found)
      28:      *
      29:      * @param    string        RTE HTML to clean for database storage
      30:      * @return    string        Processed input string.
      31:      */
      32:     function transform_db($value)    {
      33:         $this->initConfig();
      34:
      35:         if ($this->conf['addHrulerInRTE'])    {
      36:             $value = preg_replace('/<hr\s*\/>\s*$/i', '', $value);
      37:         }
      38:
      39:         return $value;
      40:     }
      41:
      42:     /**
      43:      * Reserved method name, called when content is transformed for RTE display
      44:      * If "proc.usertrans.tx_myext.addHrulerInRTE = 1" then a horizontal ruler
      45:      * will be added at the end of the content.
      46:      *
      47:      * @param    string        Database content to transform to RTE ready HTML
      48:      * @return    string        Processed input string.
      49:      */
      50:     function transform_rte($value)    {
      51:         $this->initConfig();
      52:
      53:         if ($this->conf['addHrulerInRTE'])    {
      54:             $value.='<hr/>';
      55:         }
      56:
      57:         return $value;
      58:     }
      59: }

    Comments to code listing

    1. The transformation methods "transform_rte" and "transform_db" each take a single argument, which is the value to transform. They have to return that value again.

    2. The internal variable $pObj is set to be a reference to the parent object which is an instance of "t3lib_parsehtml_proc". Inside of this object you can access the default transformation functions if you need to and in particular you can read out configuration settings.

    3. The internal variable $transformationKey is automatically set to the transformation key that is active.

    4. Notice that both transformation functions call initConfig() (lines 33 and 51), which reads the custom configuration.

    Using the transformation

    In order to use the transformation you simply use it in the list of transformations in Special Configuration. Here is an example that works:

    'TEST01' => Array (
        'label' => 'TEST01: Text field',
        'config' => Array (
            'type' => 'text',
        ),
        'defaultExtras' => 'richtext[*]:rte_transform[mode=tx_myext-css_transform]'
    ),

    The order is important. The order in this list is the order of calling when the direction is "db". If the order is reversed the <hr/> tag will come out as regular text in the RTE because "css_transform" protects all non-allowed tags with htmlspecialchars().
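
    The calling order can be pictured with a few lines of PHP. This is a sketch only: the function callOrder() is hypothetical, and the reversal for the direction "rte" is inferred from the paragraph above rather than quoted from the TYPO3 source.

```php
<?php
// Hypothetical helper (not part of TYPO3): returns the transformation keys
// in the order they would be called for the given direction.
function callOrder(string $modeList, string $direction): array
{
    $modes = explode('-', $modeList);

    // Assumption: "db" uses the listed order, "rte" the reverse order.
    return $direction === 'rte' ? array_reverse($modes) : $modes;
}

$dbOrder  = callOrder('tx_myext-css_transform', 'db');  // tx_myext first
$rteOrder = callOrder('tx_myext-css_transform', 'rte'); // css_transform first
```

    Under that assumption "tx_myext" runs last in the direction "rte", so the <hr/> tag it appends is never passed through the htmlspecialchars() protection of "css_transform".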

    Now the transformations should be called correctly. Before the <hr/> is added/removed we also have to set an option in Page TSconfig (because we programmed our transformation to look for this configuration option):

    RTE.default.proc.usertrans.tx_myext.addHrulerInRTE = 1

    That's all!


