Text_Highlighter

Highlighter XML source

Basics

Creating a new syntax highlighter begins with describing the highlighting rules. There are two basic elements: blocks and regions. A block is a portion of text that matches a regular expression and is highlighted with a single color; a keyword is an example of a block. A region is defined by two regular expressions: one for the start of the region and another for its end. The main difference from a block is that a region can contain blocks and other regions (including regions of the same name). An example of a region is a group of statements enclosed in curly brackets, as used in many languages such as PHP and C. In addition, the characters matching the start and end of a region may be highlighted with one color and the region contents with another.
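
As a minimal sketch of the two element kinds, the fragment below declares one block and one region. The language name DEMO, the element names number and braces, and the CSS class names are illustrative assumptions, not part of any shipped highlighter.

```xml
<highlight lang="DEMO">
  <!-- Block: a single regular expression, highlighted with a single CSS class -->
  <block name="number" match="\d+" innerClass="number"/>

  <!-- Region: separate start and end expressions; the delimiters and the
       contents each get their own CSS class -->
  <region name="braces" delimClass="brackets" innerClass="code"
          start="\{" end="\}">
    <!-- Allows any block or region inside, including nested braces regions -->
    <contains all="yes"/>
  </region>
</highlight>
```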

Blocks and regions may be declared as contained. Contained blocks and regions can appear only inside regions. A block or region declared as not-contained can appear only at the top level. If a block or region is declared neither contained nor not-contained, it can appear both at the top level and inside regions.
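
The three placement cases can be sketched as follows. All element names, patterns and CSS class names here are hypothetical and chosen only for illustration.

```xml
<highlight lang="DEMO">
  <!-- never-contained: may appear only at the top level -->
  <region name="script" delimClass="inlinetags" innerClass="code"
          start="\{\{" end="\}\}" never-contained="yes">
    <contains all="yes"/>
  </region>

  <!-- contained: may appear only inside a region -->
  <block name="number" match="\d+" innerClass="number" contained="yes"/>

  <!-- neither attribute: may appear at the top level and inside regions -->
  <block name="comment" match="#.*" innerClass="comment"/>
</highlight>
```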

For any region, a list of blocks and regions that can appear inside this region can be specified.
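
A sketch of such a list is shown below. It assumes that each allowed name gets its own <contains> element, which is one reading of the element description; the element names and CSS classes are illustrative.

```xml
<highlight lang="DEMO">
  <block name="var" match="\$[a-z_]\w*" innerClass="var" contained="yes"/>
  <block name="number" match="\d+" innerClass="number" contained="yes"/>

  <!-- Only the var block is listed, so number is not allowed here -->
  <region name="strDouble" delimClass="quotes" innerClass="string"
          start="&quot;" end="&quot;" contained="yes">
    <contains block="var"/>
  </region>

  <!-- Several allowed names: one contains element per block or region (assumed) -->
  <region name="parens" delimClass="brackets" innerClass="code"
          start="\(" end="\)">
    <contains block="var"/>
    <contains block="number"/>
    <contains region="strDouble"/>
  </region>
</highlight>
```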


Elements

The top-level element is <highlight>. The lang attribute is required and gives the name of the language. Its value is used as part of the generated class name and must contain only letters, digits and underscores. The optional case attribute, when set to yes, makes the language case sensitive (the default is case insensitive). Allowed subelements are:

  • <authors>: Information about the authors of the file.
    • <author>: Information about a single author of the file. (May be used multiple times, one per author.)
      • name="...": Author's name. Required.
      • email="...": Author's email address. Optional.
  • <default>: Default CSS class.
    • innerClass="...": CSS class name. Required.
  • <region>: Region definition
    • name="...": Region name. Required.
    • innerClass="...": Default CSS class of region contents. Required.
    • delimClass="...": CSS class of start and end of region. Optional, defaults to value of innerClass attribute.
    • start="...", end="...": Regular expressions matching the start and the end of the region. Required. Regular expression delimiters are optional; if you need to specify a delimiter, use /. Delimiters are needed only when you specify regular expression modifiers, such as m or U. Examples: \/\* or /$/m.
    • contained="yes": Marks region as contained.
    • never-contained="yes": Marks region as not-contained.
    • <contains>: Elements allowed inside this region.
      • all="yes" Region can contain any other region or block (except not-contained). May be used multiple times.
        • <but> Do not allow certain regions or blocks.
          • region="..." Name of region not allowed within current region.
          • block="..." Name of block not allowed within current region.
      • region="..." Name of region allowed within current region.
      • block="..." Name of block allowed within current region.
    • <onlyin> Only allow this region within certain regions. May be used multiple times.
      • region="..." Name of parent region
  • <block>: Block definition
    • name="...": Block name. Required.
    • innerClass="...": CSS class of block contents. Optional. If not specified, CSS class of parent region or default CSS class will be used. One would only want to omit this attribute if there are keyword groups (see below) inherited from this block, and no special highlighting should apply when the block does not match the keyword.
    • match="...": Regular expression matching the block. Required. Regular expression delimiters are optional; if you need to specify a delimiter, use /. Delimiters are needed only when you specify regular expression modifiers, such as m or U. Examples: #|\/\/ or /$/m.
    • contained="yes": Marks block as contained.
    • never-contained="yes": Marks block as not-contained.
    • <onlyin> Only allow this block within certain regions. May be used multiple times.
      • region="..." Name of parent region
    • multiline="yes": Marks block as multi-line. By default, a whole block is assumed to reside on a single line, which makes matching faster. If you need to declare a multi-line block, use this attribute.
    • <partClass>: Assigns another CSS class to a part of the block that matched a subpattern.
      • index="n": Subpattern index. Required.
      • innerClass="...": CSS class name. Required.
      This is an example from the CSS highlighter: the measure is matched as a whole, but the measurement units are highlighted with a different color.
        <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
               innerClass="number" contained="yes">
          <onlyin region="property"/>
          <partClass index="1" innerClass="string"/>
        </block>
        
  • <keywords>: Keyword group definition. Keyword groups are useful when you want to highlight some of the words matched by a block with a different color. Keywords are defined by literal match, not by regular expressions. For example, suppose you have a block named identifier that matches a general identifier, and you want to highlight reserved words (which match this block as well) with a different color. To do so, you define a keyword group named reserved that inherits from the identifier block.
    • name="...": Keyword group name. Required.
    • inherits="...": Inherited block name. Required.
    • innerClass="...": CSS class of keyword group. Required.
    • case="yes|no": Overrides case-sensitivity of the language. Optional, defaults to global value.
    • <keyword>: Single keyword definition.
      • match="..." The keyword. Note: this is not a regular expression, but literal match (possibly case insensitive).
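
The fragment below is a sketch that puts together several of the elements above that the tutorial in the next section does not use: <default>, <but> nested inside <contains all="yes"> (following the nesting of the list above), a delimited regular expression with a modifier, and a keyword group that overrides the language's case sensitivity. All names, patterns and CSS classes are illustrative assumptions.

```xml
<highlight lang="DEMO" case="no">
  <!-- CSS class for text not matched by any block or region -->
  <default innerClass="code"/>

  <!-- Delimiters are used here only because of the m modifier -->
  <region name="linecomment" innerClass="comment" start="#" end="/$/m"/>

  <!-- Any block or region is allowed inside, except linecomment -->
  <region name="braces" delimClass="brackets" innerClass="code"
          start="\{" end="\}">
    <contains all="yes">
      <but region="linecomment"/>
    </contains>
  </region>

  <block name="identifier" match="[a-z_]\w*" innerClass="identifier"/>

  <!-- This group is matched case sensitively although the language is not -->
  <keywords name="types" inherits="identifier" innerClass="types" case="yes">
    <keyword match="int"/>
    <keyword match="char"/>
  </keywords>
</highlight>
```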


Creating a simple highlighter

Let's try to create a very basic highlighter for the PHP language. It will recognize PHP tags (<?php and ?>), square, round and curly brackets, strings, variables, reserved words and comments.

For our purposes we can treat PHP as case insensitive, so at the beginning the XML file looks like this:
<highlight lang="PHP" case="no">
</highlight>

First we declare the PHP tags. This region is always top-level, so we declare it with the never-contained attribute and allow it to contain any block or region:
  <region name="phpTags" delimClass="inlinetags" innerClass="code"
          start="\&lt;\?(php|=)?" end="\?\>" never-contained="yes">
    <contains all="yes"/>
  </region>

Variables. This is a very simple block:
  <block name="var" match="\$[a-z_]\w*" innerClass="var" contained="yes"/>

Next, strings. These are fairly simple, but there is a pitfall: strings containing escaped quotes. To handle them, we need to define a block that matches the escaped quotes. We also want to highlight variables inside double-quoted strings.
  <block name="escaped" match="\\\\|\\&quot;|\\'|\\\$" 
         innerClass="string" contained="yes">
    <onlyin region="strSingle"/>
    <onlyin region="strDouble"/>
  </block>

  <region name="strSingle" delimClass="quotes" innerClass="string" 
          start="'" end="'" contained="yes"/>

  <region name="strDouble" delimClass="quotes" innerClass="string" 
          start="&quot;" end="&quot;" contained="yes">
    <contains block="var"/>
  </region>

Brackets of different kinds:
  <region name="block" delimClass="brackets" innerClass="code" 
          start="\{" end="\}" contained="yes">
    <contains all="yes"/>
  </region>

  <region name="brackets" delimClass="brackets" innerClass="code" 
          start="\(" end="\)" contained="yes" >
    <contains all="yes"/>
  </region>

  <region name="sqbrackets" delimClass="brackets" innerClass="code" 
          start="\[" end="\]" contained="yes">
    <contains all="yes"/>
  </region>

Identifiers:
  <block name="identifier" match="[a-z_]\w*" innerClass="identifier" 
         contained="yes"/>

And let's highlight some reserved words with a different color:
  <keywords name="reserved" inherits="identifier" innerClass="reserved">
    <keyword match="echo"/>
    <keyword match="foreach"/>
    <keyword match="else"/>
    <keyword match="if"/>
    <keyword match="elseif"/>
    <keyword match="for"/>
    <keyword match="as"/>
    <keyword match="while"/>
    <keyword match="break"/>
    <keyword match="continue"/>
    <keyword match="class"/>
    <keyword match="switch"/>
    <keyword match="case"/>
    <keyword match="array"/>
    <keyword match="default"/>
    <keyword match="do"/>
    <keyword match="exit"/>
    <keyword match="function"/>
    <keyword match="global"/>
    <keyword match="include"/>
    <keyword match="include_once"/>
    <keyword match="require"/>
    <keyword match="require_once"/>
    <keyword match="isset"/>
    <keyword match="empty"/>
    <keyword match="list"/>
    <keyword match="new"/>
    <keyword match="static"/>
    <keyword match="var"/>
    <keyword match="return"/>
    <keyword match="NULL"/>
    <keyword match="true"/>
    <keyword match="false"/>
  </keywords> 

And finally, comments:
  <block name="comment" match="(#|\/\/).+" innerClass="comment" 
         contained="yes"/>

  <region name="mlcomment" innerClass="comment" 
          start="\/\*" end="\*\/" contained="yes"/>

Well, that seems to be all, but this highlighter would not recognize escaping from PHP code to HTML and back. Let's correct this:
  <region name="codeEscape" delimClass="inlinetags" innerClass="default" 
          start="\?\>" end="\&lt;\?(php|=)?" contained="yes">
    <onlyin region="block"/>
  </region>

Finally, I add myself as the author and save the file as php.xml:
<highlight lang="PHP" case="no">

  <authors>
    <author name="Andrey Demenev" email="demenev@on-line.jar.ru"/>
  </authors>

  <region name="phpTags" delimClass="inlinetags" innerClass="code"
          start="\&lt;\?(php|=)?" end="\?\>" never-contained="yes">
    <contains all="yes"/>
  </region>

  <block name="var" match="\$[a-z_]\w*" innerClass="var" contained="yes"/>

  <block name="escaped" match="\\\\|\\&quot;|\\'|\\\$" 
         innerClass="string" contained="yes">
    <onlyin region="strSingle"/>
    <onlyin region="strDouble"/>
  </block>

  <region name="strSingle" delimClass="quotes" innerClass="string" 
          start="'" end="'" contained="yes"/>

  <region name="strDouble" delimClass="quotes" innerClass="string" 
          start="&quot;" end="&quot;" contained="yes">
    <contains block="var"/>
  </region>

  <region name="block" delimClass="brackets" innerClass="code" 
          start="\{" end="\}" contained="yes">
    <contains all="yes"/>
  </region>

  <region name="brackets" delimClass="brackets" innerClass="code" 
          start="\(" end="\)" contained="yes" >
    <contains all="yes"/>
  </region>

  <region name="sqbrackets" delimClass="brackets" innerClass="code" 
          start="\[" end="\]" contained="yes">
    <contains all="yes"/>
  </region>

  <block name="identifier" match="[a-z_]\w*" innerClass="identifier" 
         contained="yes"/>

  <keywords name="reserved" inherits="identifier" innerClass="reserved">
    <keyword match="echo"/>
    <keyword match="foreach"/>
    <keyword match="else"/>
    <keyword match="if"/>
    <keyword match="elseif"/>
    <keyword match="for"/>
    <keyword match="as"/>
    <keyword match="while"/>
    <keyword match="break"/>
    <keyword match="continue"/>
    <keyword match="class"/>
    <keyword match="switch"/>
    <keyword match="case"/>
    <keyword match="array"/>
    <keyword match="default"/>
    <keyword match="do"/>
    <keyword match="exit"/>
    <keyword match="function"/>
    <keyword match="global"/>
    <keyword match="include"/>
    <keyword match="include_once"/>
    <keyword match="require"/>
    <keyword match="require_once"/>
    <keyword match="isset"/>
    <keyword match="empty"/>
    <keyword match="list"/>
    <keyword match="new"/>
    <keyword match="static"/>
    <keyword match="var"/>
    <keyword match="return"/>
    <keyword match="NULL"/>
    <keyword match="true"/>
    <keyword match="false"/>
  </keywords> 

  <block name="comment" match="(#|\/\/).+" innerClass="comment" 
         contained="yes"/>

  <region name="mlcomment" innerClass="comment" 
          start="\/\*" end="\*\/" contained="yes"/>

  <region name="codeEscape" delimClass="inlinetags" innerClass="default" 
          start="\?\>" end="\&lt;\?(php|=)?" contained="yes">
    <onlyin region="block"/>
  </region>

</highlight>



Documentation generated on Mon, 11 Mar 2019 13:51:46 -0400 by phpDocumentor 1.4.4. Copyright © PHP Group 2004.