Highlighter XML source
Basics
Creating a new syntax highlighter begins with describing the
highlighting rules. There are two basic elements:
block and region. A block is
just a portion of text matching a regular expression and highlighted
with a single color. Keyword is an example of a block. A region is
defined by two regular expressions: one for start of region, and
another for the end. The main difference from a block is that a region
can contain blocks and regions (including same-named regions). An
example of a region is a group of statements enclosed in curly brackets
(this is used in many languages, for example PHP and C). Also,
characters matching start and end of a region may be highlighted with
their own color, and region contents with another.
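As a minimal sketch of the two element kinds, a block and a region might be declared as follows. The names number and braces and the CSS class names are illustrative assumptions, not taken from any shipped highlighter; the attributes themselves are described in the Elements section.

```xml
<!-- A block: one regular expression, highlighted with one CSS class. -->
<block name="number" match="\d+" innerClass="number"/>

<!-- A region: separate start and end expressions. The delimiters are
     highlighted with delimClass, the contents with innerClass. -->
<region name="braces" start="\{" end="\}"
        delimClass="brackets" innerClass="code"/>
```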
Blocks and regions may be declared as contained.
Contained blocks and regions can only appear inside regions. If a
region or a block is not declared as contained, it can appear both at
top level and inside regions. A block or region declared as
not-contained can only appear at top level.
For any region, a list of blocks and regions that can appear inside
this region can be specified.
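The containment rules above can be sketched as the following fragment, meant to sit inside the top-level element. The element and attribute names come from the Elements section; the names script, word and quoted and the patterns are hypothetical.

```xml
<!-- Not-contained: appears at top level only, never inside a region. -->
<region name="script" start="\&lt;%" end="%\&gt;"
        innerClass="code" never-contained="yes">
    <!-- Explicit list of what may appear inside this region. -->
    <contains block="word"/>
    <contains region="quoted"/>
</region>

<!-- Contained: these may appear only inside regions. -->
<block name="word" match="[a-z_]\w*" innerClass="identifier"
       contained="yes"/>
<region name="quoted" start="'" end="'" innerClass="string"
        contained="yes"/>
```

Omitting both contained and never-contained would let an element appear at top level as well as inside regions.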
Elements
The top-level element is <highlight>. The attribute lang is required
and denotes the name of the language. Its value is used as a part of
the generated class name, and must only contain letters, digits and
underscores. The optional attribute case, when given the value yes,
makes the language case sensitive (the default is case insensitive).
Allowed subelements are:
<authors>
: Information about the authors of
the file.
<author>
: Information about a single author of
the file. (May be used multiple times, one per author.)
name="..."
: Author's name. Required.
email="..."
: Author's email address. Optional.
<default>
: Default CSS class.
innerClass="..."
: CSS class name. Required.
<region>
: Region definition
name="..."
: Region name. Required.
innerClass="..."
: Default CSS class of region
contents. Required.
delimClass="..."
: CSS class of start and end
of region. Optional, defaults to value of
innerClass
attribute.
start="...", end="..."
: Regular expressions matching the start and end of the region. Required.
Regular expression delimiters are optional, but if you need to
specify a delimiter, use /. Delimiters are only needed when
specifying regular expression modifiers, such as m or U.
Examples: \/\* or /$/m.
contained="yes"
: Marks region as contained.
never-contained="yes"
: Marks region as
not-contained.
<contains>
: Elements allowed inside this
region.
all="yes"
: Region can contain any other
region or block (except not-contained). May be used multiple
times.
<but>
: Do not allow certain regions
or blocks.
region="..."
: Name of region not allowed
within current region.
block="..."
: Name of block not allowed
within current region.
region="..."
: Name of region allowed within
current region.
block="..."
: Name of block allowed within
current region.
<onlyin>
: Only allow this region within
certain regions. May be used multiple times.
region="..."
: Name of parent region.
<block>
: Block definition
name="..."
: Block name. Required.
innerClass="..."
: CSS class of block
contents. Optional. If not specified, CSS class of parent region
or default CSS class will be used. One would only want to omit
this attribute if there are keyword groups (see below) inherited
from this block, and no special highlighting should apply when the
block does not match the keyword.
match="..."
: Regular expression matching the
block. Required. Regular expression delimiters are optional, but if
you need to specify a delimiter, use /. Delimiters are only needed
when specifying regular expression modifiers, such as m or U.
Examples: #|\/\/ or /$/m.
contained="yes"
: Marks block as contained.
never-contained="yes"
: Marks block as
not-contained.
<onlyin>
: Only allow this block within
certain regions. May be used multiple times.
region="..."
: Name of parent region.
multiline="yes"
: Marks block as
multi-line. By default, whole blocks are assumed to reside on a
single line, which makes matching faster. If you need to declare a
multi-line block, use this attribute.
<partClass>
: Assigns another CSS class to
a part of the block that matched a subpattern.
index="n"
: Subpattern index. Required.
innerClass="..."
: CSS class name. Required.
This is an example from the CSS highlighter: the measure is matched
as a whole, but the measurement units are highlighted with a
different color.
<block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
innerClass="number" contained="yes">
<onlyin region="property"/>
<partClass index="1" innerClass="string" />
</block>
<keywords>
: Keyword group definition.
Keyword groups are useful when you want to highlight some words that
match a block's pattern with a different color. Keywords are
defined by literal match, not by regular expressions. For example,
suppose you have a block named identifier matching a general
identifier, and want to highlight reserved words (which match this
block as well) with a different color. To do so, you define a keyword
group reserved that inherits from the identifier block.
name="..."
: Keyword group name. Required.
inherits="..."
: Inherited block name. Required.
innerClass="..."
: CSS class of keyword group.
Required.
case="yes|no"
: Overrides case-sensitivity of
the language. Optional, defaults to global value.
<keyword>
: Single keyword definition.
match="..."
: The keyword. Note: this is not a
regular expression, but a literal match (possibly case
insensitive).
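To tie a few of the elements above together, here is a hedged sketch of a fragment that uses a keyword group with its own case setting and a region that allows everything except one block. It assumes that <but> nests inside <contains all="yes">, as the element list suggests; the names constants, parens and comment and the class names are hypothetical.

```xml
<block name="identifier" match="[a-z_]\w*" innerClass="identifier"
       contained="yes"/>

<!-- Identifiers that literally equal one of these words get the
     "reserved" class. case="yes" makes this group case sensitive
     even when the language as a whole is not. -->
<keywords name="constants" inherits="identifier" innerClass="reserved"
          case="yes">
    <keyword match="TRUE"/>
    <keyword match="FALSE"/>
</keywords>

<!-- Allow any block or region inside, except the "comment" block.
     The nesting of <but> inside <contains> is an assumption. -->
<region name="parens" start="\(" end="\)" innerClass="code"
        contained="yes">
    <contains all="yes">
        <but block="comment"/>
    </contains>
</region>
```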
Creating a simple highlighter
Let's create a very basic highlighter for the PHP language. It will
recognize PHP tags (<?php and ?>), square, round and curly brackets,
strings, variables, reserved words and comments.
For our purposes we can consider PHP case insensitive. So, to begin
with, the XML file looks like this:
<highlight lang="PHP" case="no">
</highlight>
First we declare the PHP tags. This region is always
top-level, so we declare it with the never-contained
attribute and allow it to contain any block or region:
<region name="phpTags" delimClass="inlinetags" innerClass="code"
start="\&lt;\?(php|=)?" end="\?\&gt;" never-contained="yes">
<contains all="yes"/>
</region>
Variables. This is a very simple block.
<block name="var" match="\$[a-z_]\w*" innerClass="var" contained="yes"/>
Next: strings. This is pretty simple, but there is a pitfall with
strings containing escaped quotes. To handle it, we need to define a
block that matches the escaped quotes. We also want to highlight
variables inside double-quoted strings.
<block name="escaped" match="\\\\|\\&quot;|\\'|\\\$"
innerClass="string" contained="yes">
<onlyin region="strSingle"/>
<onlyin region="strDouble"/>
</block>
<region name="strSingle" delimClass="quotes" innerClass="string"
start="'" end="'" contained="yes"/>
<region name="strDouble" delimClass="quotes" innerClass="string"
start="&quot;" end="&quot;" contained="yes">
<contains block="var"/>
</region>
Brackets of different kinds:
<region name="block" delimClass="brackets" innerClass="code"
start="\{" end="\}" contained="yes">
<contains all="yes"/>
</region>
<region name="brackets" delimClass="brackets" innerClass="code"
start="\(" end="\)" contained="yes" >
<contains all="yes"/>
</region>
<region name="sqbrackets" delimClass="brackets" innerClass="code"
start="\[" end="\]" contained="yes">
<contains all="yes"/>
</region>
Identifiers:
<block name="identifier" match="[a-z_]\w*" innerClass="identifier"
contained="yes"/>
And let's highlight some reserved words with a different color:
<keywords name="reserved" inherits="identifier" innerClass="reserved">
<keyword match="echo"/>
<keyword match="foreach"/>
<keyword match="else"/>
<keyword match="if"/>
<keyword match="elseif"/>
<keyword match="for"/>
<keyword match="as"/>
<keyword match="while"/>
<keyword match="break"/>
<keyword match="continue"/>
<keyword match="class"/>
<keyword match="switch"/>
<keyword match="case"/>
<keyword match="array"/>
<keyword match="default"/>
<keyword match="do"/>
<keyword match="exit"/>
<keyword match="function"/>
<keyword match="global"/>
<keyword match="include"/>
<keyword match="include_once"/>
<keyword match="require"/>
<keyword match="require_once"/>
<keyword match="isset"/>
<keyword match="empty"/>
<keyword match="list"/>
<keyword match="new"/>
<keyword match="static"/>
<keyword match="var"/>
<keyword match="return"/>
<keyword match="NULL"/>
<keyword match="true"/>
<keyword match="false"/>
</keywords>
And finally, comments:
<block name="comment" match="(#|\/\/).+" innerClass="comment"
contained="yes"/>
<region name="mlcomment" innerClass="comment"
start="\/\*" end="\*\/" contained="yes"/>
That seems to be all, but this highlighter would not recognize
switching from PHP code to HTML and back. Let's correct this:
<region name="codeEscape" delimClass="inlinetags" innerClass="default"
start="\?\&gt;" end="\&lt;\?(php|=)?" contained="yes">
<onlyin region="block"/>
</region>
Finally, I add myself as the author and save the file as
php.xml:
<highlight lang="PHP" case="no">
<authors>
<author name="Andrey Demenev" email="demenev@on-line.jar.ru"/>
</authors>
<region name="phpTags" delimClass="inlinetags" innerClass="code"
start="\&lt;\?(php|=)?" end="\?\&gt;" never-contained="yes">
<contains all="yes"/>
</region>
<block name="var" match="\$[a-z_]\w*" innerClass="var" contained="yes"/>
<block name="escaped" match="\\\\|\\&quot;|\\'|\\\$"
innerClass="string" contained="yes">
<onlyin region="strSingle"/>
<onlyin region="strDouble"/>
</block>
<region name="strSingle" delimClass="quotes" innerClass="string"
start="'" end="'" contained="yes"/>
<region name="strDouble" delimClass="quotes" innerClass="string"
start="&quot;" end="&quot;" contained="yes">
<contains block="var"/>
</region>
<region name="block" delimClass="brackets" innerClass="code"
start="\{" end="\}" contained="yes">
<contains all="yes"/>
</region>
<region name="brackets" delimClass="brackets" innerClass="code"
start="\(" end="\)" contained="yes" >
<contains all="yes"/>
</region>
<region name="sqbrackets" delimClass="brackets" innerClass="code"
start="\[" end="\]" contained="yes">
<contains all="yes"/>
</region>
<block name="identifier" match="[a-z_]\w*" innerClass="identifier"
contained="yes"/>
<keywords name="reserved" inherits="identifier" innerClass="reserved">
<keyword match="echo"/>
<keyword match="foreach"/>
<keyword match="else"/>
<keyword match="if"/>
<keyword match="elseif"/>
<keyword match="for"/>
<keyword match="as"/>
<keyword match="while"/>
<keyword match="break"/>
<keyword match="continue"/>
<keyword match="class"/>
<keyword match="switch"/>
<keyword match="case"/>
<keyword match="array"/>
<keyword match="default"/>
<keyword match="do"/>
<keyword match="exit"/>
<keyword match="function"/>
<keyword match="global"/>
<keyword match="include"/>
<keyword match="include_once"/>
<keyword match="require"/>
<keyword match="require_once"/>
<keyword match="isset"/>
<keyword match="empty"/>
<keyword match="list"/>
<keyword match="new"/>
<keyword match="static"/>
<keyword match="var"/>
<keyword match="return"/>
<keyword match="NULL"/>
<keyword match="true"/>
<keyword match="false"/>
</keywords>
<block name="comment" match="(#|\/\/).+" innerClass="comment"
contained="yes"/>
<region name="mlcomment" innerClass="comment"
start="\/\*" end="\*\/" contained="yes"/>
<region name="codeEscape" delimClass="inlinetags" innerClass="default"
start="\?\&gt;" end="\&lt;\?(php|=)?" contained="yes">
<onlyin region="block"/>
</region>
</highlight>