Module parser

Parser generator.

A parser is created by

p = Parser {grammar}

and called with

result = p:parse (start_token, token_list[, from])

where start_token is the non-terminal at which to start parsing in the grammar, token_list is a list of tokens of the form

{ty = "token_type", tok = "token_text"}

and from is the index of the token in the list at which to start (the default value is 1).
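
As an illustration of the token form, here is a minimal sketch of a tokenizer that builds such a list from a string. The function tokenize is hypothetical and not part of this module. The type given to operator tokens (the character itself) is an assumption, since the text does not say how literal tokens should be typed.

```lua
-- Hypothetical helper (not part of this module): split a simple
-- arithmetic string into tokens of the form {ty = ..., tok = ...}.
local function tokenize (s)
  local tokens = {}
  local pos = 1
  while pos <= #s do
    local ws = s:match ("^%s+", pos)
    if ws then
      -- Skip whitespace.
      pos = pos + #ws
    else
      local num = s:match ("^%d+", pos)
      if num then
        tokens[#tokens + 1] = {ty = "number", tok = num}
        pos = pos + #num
      else
        -- Assumption: any other character is a one-character token
        -- whose type is the character itself.
        local ch = s:sub (pos, pos)
        tokens[#tokens + 1] = {ty = ch, tok = ch}
        pos = pos + 1
      end
    end
  end
  return tokens
end

local tokens = tokenize ("1 + 23")

-- Following the call form given above, such a list would then be
-- passed to a parser, for example:
--   local result = p:parse ("expr", tokens)
```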

The output of the parser is a tree, each of whose nodes is of the form:

{ty = symbol, node1 = tree1, node2 = tree2, ... [, list]}

where each nodei is a symbolic name, and list is the list of trees returned if the corresponding token was a list token.

A grammar is a table of rules of the form

non-terminal = {production1, production2, ...}

plus a special item

lexemes = Set {"class1", "class2", ...}

Each production gives a form that a non-terminal may take. A production has the form

production = {"token1", "token2", ..., [action][,abstract]}

A production

  • must not start with the non-terminal being defined (it must not be left-recursive)
  • must not be a prefix of a later production in the same non-terminal
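
The two constraints above can be checked mechanically. The following is a minimal sketch, assuming productions are plain lists of token strings; check_rule and is_prefix are hypothetical helpers, not functions of this module, and the check ignores the _opt and _list suffixes.

```lua
-- Hypothetical helper: true if list a is a prefix of list b.
local function is_prefix (a, b)
  if #a > #b then return false end
  for i = 1, #a do
    if a[i] ~= b[i] then return false end
  end
  return true
end

-- Hypothetical helper: return a list of problems found in one rule.
local function check_rule (name, productions)
  local problems = {}
  for i, prod in ipairs (productions) do
    -- A production must not start with the non-terminal being defined.
    if prod[1] == name then
      problems[#problems + 1] = "production " .. i .. " is left-recursive"
    end
    -- A production must not be a prefix of a later production.
    for j = i + 1, #productions do
      if is_prefix (prod, productions[j]) then
        problems[#problems + 1] =
          "production " .. i .. " is a prefix of production " .. j
      end
    end
  end
  return problems
end

-- The longer production comes first, so neither constraint is violated.
local good = {{"term", "+", "expr"}, {"term"}}
-- Production 1 is left-recursive; production 2 is a prefix of production 3.
local bad = {{"expr", "+", "term"}, {"term"}, {"term", "*", "term"}}

local good_problems = check_rule ("expr", good)
local bad_problems = check_rule ("expr", bad)
```

Putting the longer alternative first, as in good, is the usual way to satisfy the prefix constraint.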

Each token may be

  • a non-terminal, i.e. a token defined by the grammar
    • an optional symbol is indicated by the suffix _opt
    • a list is indicated by the suffix _list, and may be followed by _<separator-symbol> (default is no separator)
  • a lexeme class
  • a string to match literally
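
A small grammar using these token kinds might look like the sketch below. The grammar itself is invented for illustration. The local Set function is only a stand-in so that the example runs on its own, and the spelling arg_list_, is one reading of the separator suffix described above, not a confirmed form.

```lua
-- Stand-in for the Set constructor mentioned in the text, so that
-- this sketch is self-contained.
local function Set (list)
  local s = {}
  for _, v in ipairs (list) do s[v] = true end
  return s
end

-- Hypothetical grammar for calls such as: f (1, x, 2)
local grammar = {
  -- Lexeme classes.
  lexemes = Set {"number", "name"},
  -- A name, a literal "(", optional arguments, and a literal ")".
  call = {{"name", "(", "args_opt", ")"}},
  -- A list of arg separated by "," (assumed separator spelling).
  args = {{"arg_list_,"}},
  -- Two alternatives, each a single lexeme class.
  arg = {{"number"}, {"name"}},
}
```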

The parse tree for a literal string or lexeme class is the string that was matched. The parse tree for a non-terminal is a table of the form

{ty = "non_terminal_name", tree1, tree2, ...}

where the treei are the parse trees for the corresponding terminals and non-terminals.
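
To make the shape concrete, the sketch below writes one such tree by hand and walks it. The tree and the leaves helper are illustrative only; they are not produced by or part of this module.

```lua
-- A hand-written tree of the described shape: non-terminals are tables
-- with a ty field, terminals are the matched strings.
local tree = {ty = "sum", {ty = "term", "1"}, "+", {ty = "term", "2"}}

-- Hypothetical helper: collect the matched strings in left-to-right order.
local function leaves (node, out)
  out = out or {}
  if type (node) == "string" then
    out[#out + 1] = node
  else
    for _, child in ipairs (node) do
      leaves (child, out)
    end
  end
  return out
end

local text = table.concat (leaves (tree), " ")
```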

An action is of the form

action = function (tree, token, pos) ... return tree_ end

It is passed the parse tree for the current node, the token list, and the current position in the token list, and returns a new parse tree.
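
For example, an action might turn a matched number string into a Lua number. This is a sketch only: the value field, the shape of the tree passed in, and the position value are assumptions made so that the function can be called by hand here.

```lua
-- An action with the signature described above. It ignores the token
-- list and position and returns a new tree (the value field is hypothetical).
local function number_action (tree, token, pos)
  return {ty = "number", value = tonumber (tree[1])}
end

-- Attached to a production under the key action, following the form above.
local production = {"number", action = number_action}

-- Calling the action by hand with illustrative arguments.
local tokens = {{ty = "number", tok = "42"}}
local result = production.action ({ty = "literal", "42"}, tokens, 1)
```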

An abstract syntax rule is of the form

name = {i1, i2, ...}

where i1, i2, ... are numbers. This results in a parse tree of the form

{ty = "name"; treei1, treei2, ...}

If a production has no abstract syntax rule, the result is the parse node for the current node.
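
The effect of an abstract syntax rule can be sketched as follows. The function apply_abstract is a hypothetical illustration of the described result, not this module's own code.

```lua
-- Hypothetical helper: build {ty = name; tree[i1], tree[i2], ...}
-- from a parse tree and a list of child indices.
local function apply_abstract (name, indices, tree)
  local out = {ty = name}
  for _, i in ipairs (indices) do
    out[#out + 1] = tree[i]
  end
  return out
end

-- A parenthesised expression: keep only the second child and drop
-- the literal parentheses.
local tree = {ty = "paren", "(", {ty = "expr", "1"}, ")"}
local abstract = apply_abstract ("group", {2}, tree)
```

With the rule group = {2}, the concrete tree with three children becomes a tree of type "group" with a single child.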

FIXME: Give lexemes as an extra argument to Parser?
FIXME: Rename second argument to parse method to "tokens"?
FIXME: Make start_token an optional argument to parse? (swap with token list) and have it default to the first non-terminal?

Functions

Parser:_clone (grammar) Parser constructor
Parser:parse (start, token, from) Parse a token list.


Functions

Parser:_clone (grammar)
Parser constructor

Parameters

  • grammar: parser grammar

Return value:

parser

Parser:parse (start, token, from)
Parse a token list.

Parameters

  • start: the non-terminal at which to start parsing
  • token: the list of tokens
  • from: the index of the token to start from (default: 1)

Return value:

parse tree
