PHP_LexerGenerator

Class: PHP_LexerGenerator

Source Location: /PHP_LexerGenerator-0.4.0/PHP/LexerGenerator.php

Class Overview


The basic home class for the lexer generator. A lexer scans text and organizes it into tokens for usage by a parser.


Author(s):

  • Gregory Beaver <cellog@php.net>

Version:

  • @package_version@

Copyright:

  • 2006 Gregory Beaver

Class Details

[line 264]
The basic home class for the lexer generator. A lexer scans text and organizes it into tokens for usage by a parser.

Sample Usage:

    require_once 'PHP/LexerGenerator.php';
    $lex = new PHP_LexerGenerator('/path/to/lexerfile.plex');

A file named "/path/to/lexerfile.php" will be created.

File format consists of a PHP file containing specially formatted comments like so:

    /*!lex2php
    */

All lexer definition files must contain at least two lex2php comment blocks:

  • 1 regex declaration block
  • 1 or more rule declaration blocks

The first lex2php comment is the regex declaration block. It must contain several processor instructions and must define a name for every regular expression. Processor instructions start with a "%" symbol and must be:

  • %counter
  • %input
  • %token
  • %value
  • %line

input and counter should define the class variables that hold the lexer input and the index into that input. token and value should define the class variables that store the token number and its textual value. Finally, line should define the class variable that holds the current line number of the scan.

For example:

    /*!lex2php
    %counter {$this->N}
    %input {$this->data}
    %token {$this->token}
    %value {$this->value}
    %line {$this->linenumber}
    */
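
The variables named in the processor instructions have to exist on the class that hosts the lexer. The following is a minimal sketch of such a class, not code produced by the generator: the class name ExampleLexer, the constructor, and the initial values (0 for the counter, 1 for the line number) are assumptions made for illustration. Only the property names come from the example above.

```php
<?php
// Hypothetical host class; only the property names are taken from the
// processor instructions in the example above.
class ExampleLexer
{
    public $N = 0;          // %counter: index into the input (start value assumed)
    public $data = '';      // %input: the text being scanned
    public $token;          // %token: number of the most recent token
    public $value;          // %value: text of the most recent token
    public $linenumber = 1; // %line: current line number (start value assumed)

    public function __construct($data)
    {
        $this->data = $data;
    }
}

$lexer = new ExampleLexer("some input\n");
```

The lex2php comment blocks would be placed inside a class like this one in the plex file.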

Patterns consist of an identifier made up of letters and underscores, followed by a descriptive match pattern.

Descriptive match patterns may either be regular expressions (regexes) or quoted literal strings. Here are some examples:

 pattern = "quoted literal"
 ANOTHER = /[a-zA-Z_]+/
 COMPLEX = @<([a-zA-Z_]+)( +(([a-zA-Z_]+)=((["\'])([^\6]*)\6))+){0,1}>[^<]*@

Quoted strings must escape the \ and " characters as \\ and \".
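
When literals are produced programmatically, the escaping rule can be applied with PHP's addcslashes(). This is a small sketch; plexQuote() is a hypothetical helper name and is not part of PHP_LexerGenerator.

```php
<?php
// Hypothetical helper: wraps a literal in double quotes and escapes
// backslashes and double quotes, as the rule above requires.
function plexQuote($literal)
{
    return '"' . addcslashes($literal, '"\\') . '"';
}

// The two-character literal $" becomes "$\"" when quoted.
$quoted = plexQuote('$"');
```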

Regex patterns must be in Perl-compatible regular expression (preg) format. Special characters (like \t, \n or \x3H) can only be used in regexes; in literal strings, every \ is escaped.

Sub-patterns may be defined and back-references (like \1) may be used. Any sub-patterns detected will be passed to the token handler in the variable $yysubmatches.

In addition, lookahead expressions and once-only expressions are allowed. Lookbehind expressions are impossible (scanning always proceeds forward from the current position), and recursion (?R) cannot work and is not allowed.

    /*!lex2php
    %counter {$this->N}
    %input {$this->data}
    %token {$this->token}
    %value {$this->value}
    %line {$this->linenumber}
    alpha = /[a-zA-Z]/
    alphaplus = /[a-zA-Z]+/
    number = /[0-9]/
    numerals = /[0-9]+/
    whitespace = /[ \t\n]+/
    blah = "$\""
    blahblah = /a\$/
    GAMEEND = @(?:1\-0|0\-1|1/2\-1/2)@
    PAWNMOVE = /P?[a-h]([2-7]|[18]\=(Q|R|B|N))|P?[a-h]x[a-h]([2-7]|[18]\=(Q|R|B|N))/
    */

All regexes must be delimited. Any legal preg delimiter can be used (such as @ or / in the example above).
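
Because the patterns are ordinary preg expressions, they can be tried out with preg_match() before being placed in a plex file. The sketch below uses two patterns copied from the example above. firstMatch() is a hypothetical helper, and the way the generated lexer anchors and combines patterns internally is not shown here.

```php
<?php
// Patterns copied verbatim from the declaration block above,
// including their delimiters (@ and /).
$patterns = array(
    'GAMEEND'  => '@(?:1\-0|0\-1|1/2\-1/2)@',
    'PAWNMOVE' => '/P?[a-h]([2-7]|[18]\=(Q|R|B|N))|P?[a-h]x[a-h]([2-7]|[18]\=(Q|R|B|N))/',
);

// Hypothetical helper: returns the preg match array, or null on no match.
function firstMatch($regex, $subject)
{
    return preg_match($regex, $subject, $matches) === 1 ? $matches : null;
}

$draw      = firstMatch($patterns['GAMEEND'], '1/2-1/2');
$promotion = firstMatch($patterns['PAWNMOVE'], 'e8=Q');

// $promotion[0] is the whole match; $promotion[1] and $promotion[2] are the
// sub-patterns, the kind of data a token handler would see in $yysubmatches.
```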

Each rule lex2php block defines a lexer state. You can optionally name the state with the %statename processor instruction. State names can be used to transfer to a new lexer state with the yybegin() method:

    /*!lex2php
    %statename INITIAL
    blah {
        $this->yybegin(self::INBLAH);
        // note - $this->yybegin(2) would also work
    }
    */
    /*!lex2php
    %statename INBLAH
    ANYTHING {
        $this->yybegin(self::INITIAL);
        // note - $this->yybegin(1) would also work
    }
    */

You can maintain a stack of lexer states simply by using yypushstate() and yypopstate() instead of yybegin():

    /*!lex2php
    %statename INITIAL
    blah {
        $this->yypushstate(self::INBLAH);
    }
    */
    /*!lex2php
    %statename INBLAH
    ANYTHING {
        $this->yypopstate();
        // now INBLAH doesn't care where it was called from
    }
    */
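
The difference between yybegin() and the push/pop pair can be pictured with a small stand-alone model. This is only an illustration of the behaviour described above, under the assumption that states are numbered from 1 in declaration order (as the yybegin(1) and yybegin(2) notes suggest). StateStackModel is a hypothetical class and is not the code the generator emits.

```php
<?php
// Illustrative model only; not generated lexer code.
class StateStackModel
{
    const INITIAL = 1;
    const INBLAH  = 2;

    public $state = self::INITIAL;
    private $stack = array();

    // Switch to a new state and forget the previous one.
    public function yybegin($state)
    {
        $this->state = $state;
    }

    // Remember the current state, then switch to the new one.
    public function yypushstate($state)
    {
        array_push($this->stack, $this->state);
        $this->state = $state;
    }

    // Return to whichever state was active before the last push.
    public function yypopstate()
    {
        $this->state = array_pop($this->stack);
    }
}

$model = new StateStackModel();
$model->yypushstate(StateStackModel::INBLAH);
$stateInside = $model->state;   // INBLAH
$model->yypopstate();
$stateAfter = $model->state;    // back to INITIAL
```

With yybegin(), the INBLAH rules would need to know which state to return to; with the stack, they do not.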

Code blocks can choose to skip the current token and cycle to the next token by returning false:

    /*!lex2php
    WHITESPACE {
        return false;
    }
    */

If you wish to re-process the current token in a new state, simply return true. If you forget to change the lexer state, this will cause an infinite loop, so be careful!

    /*!lex2php
    "(" {
        $this->yypushstate(self::INPARAMS);
        return true;
    }
    */

Lastly, if you wish to cycle to the next matching rule, return any value other than true, false or null:

    /*!lex2php
    "{@" ALPHA {
        if ($this->value == '{@internal') {
            return 'more';
        }
        ...
    }
    "{@internal" {
        ...
    }
    */

Note that this procedure is exceptionally inefficient, and it would be far better to take advantage of PHP_LexerGenerator's top-down precedence and instead code:

    /*!lex2php
    "{@internal" {
        ...
    }
    "{@" ALPHA {
        ...
    }
    */
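
The return-value conventions described above can be summarised in one place. The function below is a hypothetical summary, not part of the generated lexer. It assumes strict comparisons and assumes that a handler returning nothing (null) means the token is accepted as-is, which the text implies but does not state.

```php
<?php
// Hypothetical summary of how a rule's return value is interpreted.
function interpretHandlerResult($result)
{
    if ($result === false) {
        return 'skip this token and scan the next one';
    }
    if ($result === true) {
        return 're-process this token in the new state';
    }
    if ($result === null) {
        return 'accept this token'; // assumed default when nothing is returned
    }
    return 'try the next matching rule';
}

$action = interpretHandlerResult('more');
```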

  • Author: Gregory Beaver <cellog@php.net>
  • Version: @package_version@
  • Copyright: 2006 Gregory Beaver
  • Since: Class available since Release 0.1.0
  • License: PHP License 3.01




Class Variables

$debug =  false

[line 288]

Debug flag. When set, parser trace information is generated.
  • Access: public

Type:   boolean





Method Detail

__construct (Constructor)   [line 296]

PHP_LexerGenerator __construct( [string $lexerfile = ''], [string $outfile = ''])

Create a lexer generator and optionally generate a lexer file.

Parameters:

string   $lexerfile   —  Optional plex file (see PHP_LexerGenerator::create).
string   $outfile   —  Optional output file (see PHP_LexerGenerator::create).


create   [line 310]

void create( string $lexerfile, [string $outfile = ''])

Create a lexer file from its skeleton plex file.

Parameters:

string   $lexerfile   —  Path to the plex file.
string   $outfile   —  Optional path to output file. Default is lexerfile with extension of ".php".
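
As a usage sketch, the generator can be constructed without arguments and create() called later with an explicit output path. defaultOutfile() is a hypothetical helper that mirrors the documented default (the plex path with its extension replaced by ".php"); it is an assumption about the resulting name, not the library's own code. The class_exists() guard only keeps the snippet harmless when the PEAR package is not installed.

```php
<?php
// require_once 'PHP/LexerGenerator.php'; // enable when the package is installed

// Hypothetical helper mirroring the documented default output name.
function defaultOutfile($lexerfile)
{
    $info = pathinfo($lexerfile);
    return $info['dirname'] . '/' . $info['filename'] . '.php';
}

$plex = '/path/to/lexerfile.plex';
$out  = defaultOutfile($plex);

if (class_exists('PHP_LexerGenerator')) {
    $generator = new PHP_LexerGenerator(); // no file given, nothing generated yet
    $generator->debug = true;              // request trace output
    $generator->create($plex, $out);
}
```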



Documentation generated on Mon, 11 Mar 2019 15:40:40 -0400 by phpDocumentor 1.4.4.