Module parser
Parser generator.
A parser is created by
p = Parser {grammar}
and called with
result = p:parse (start_token, token_list[, from])
where start_token is the non-terminal at which to start parsing in the grammar, token_list is a list of tokens of the form
{ty = "token_type", tok = "token_text"}
and from is the token in the list from which to start (the default value is 1).
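As a rough illustration of the token list shape described above, the sketch below builds one from a string. The `tokenize` helper and the class names `"number"` and `"operator"` are assumptions made for this example, not part of the module; any lexer that produces tables with `ty` and `tok` fields should serve equally well.

```lua
-- Hypothetical helper (not part of the parser module): split a string into
-- tokens of the documented shape {ty = "token_type", tok = "token_text"}.
local function tokenize (s)
  local tokens = {}
  local pos = 1
  while pos <= #s do
    local ws = s:match ("^%s+", pos)   -- run of whitespace at pos, if any
    local num = s:match ("^%d+", pos)  -- run of digits at pos, if any
    if ws then
      pos = pos + #ws                  -- skip whitespace
    elseif num then
      tokens[#tokens + 1] = {ty = "number", tok = num}
      pos = pos + #num
    else
      -- Anything else becomes a one-character token (assumed class name).
      tokens[#tokens + 1] = {ty = "operator", tok = s:sub (pos, pos)}
      pos = pos + 1
    end
  end
  return tokens
end

local token_list = tokenize ("1 + 23")
-- token_list could then be passed as the second argument to p:parse.
```

Starting the parse part-way through the list would then be a matter of passing the optional from argument.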
The output of the parser is a tree, each of whose nodes is of the form:
{ty = symbol, node1 = tree1, node2 = tree2, ... [, list]}
where each nodei is a symbolic name, and list is the list of trees returned if the corresponding token was a list token.
A grammar is a table of rules of the form
non-terminal = {production1, production2, ...}
plus a special item
lexemes = Set {"class1", "class2", ...}
Each production gives a form that a non-terminal may take. A production has the form
production = {"token1", "token2", ..., [action][,abstract]}
A production
- must not start with the non-terminal being defined (it must not be left-recursive)
- must not be a prefix of a later production in the same non-terminal
Each token may be
- a non-terminal, i.e. a token defined by the grammar
  - an optional symbol is indicated by the suffix _opt
  - a list is indicated by the suffix _list, and may be followed by _<separator-symbol> (default is no separator)
- a lexeme class
- a string to match literally
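The grammar format described so far might look like the following sketch. The non-terminal names, the `"number"` lexeme class, and the local `Set` stand-in are assumptions for illustration only; the stand-in exists purely so that the table can be built without the library, and the real constructor may behave differently.

```lua
-- Stand-in for the library's Set constructor (an assumption, so that this
-- sketch runs on its own): maps each listed class name to true.
local function Set (names)
  local s = {}
  for _, name in ipairs (names) do s[name] = true end
  return s
end

local grammar = {
  -- The longer production comes first: {"term"} is a prefix of
  -- {"term", "+", "expr"}, so it must not precede it.
  expr = {
    {"term", "+", "expr"},
    {"term"},
  },
  -- No production of term starts with term itself (no left recursion).
  term = {
    {"sign_opt", "number"},      -- optional non-terminal, then a lexeme class
    {"(", "expr", ")"},          -- literal strings around a non-terminal
    {"[", "expr_list_,", "]"},   -- list of expr separated by ","
  },
  sign = {
    {"-"},
  },
  lexemes = Set {"number"},
}
```

A table like this would then be handed to the constructor as Parser (grammar).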
The parse tree for a literal string or lexeme class is the string that was matched. The parse tree for a non-terminal is a table of the form
{ty = "non_terminal_name", tree1, tree2, ...}
where the treei are the parse trees for the corresponding terminals and non-terminals.
An action is of the form
action = function (tree, token, pos) ... return tree_ end
It is passed the parse tree for the current node, the token list, and the current position in the token list, and returns a new parse tree.
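An action with this signature might be sketched as follows. The function name, the shape of the returned node, and the sample arguments are all assumptions for illustration; the action is called by hand here rather than by the parser.

```lua
-- Illustrative action: replace the matched number text with a Lua number.
-- tree is the parse tree for the current node, token is the token list and
-- pos is the current position in it (the last two are unused here).
local function number_action (tree, token, pos)
  return {ty = "number", value = tonumber (tree[1])}
end

-- Hand-built inputs, assuming a production {"number"} matched the text "42".
local tree = {ty = "term", "42"}
local token = {{ty = "number", tok = "42"}}
local new_tree = number_action (tree, token, 1)
```

Because the return value replaces the node's parse tree, an action is a convenient place for small conversions like this one.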
An abstract syntax rule is of the form
name = {i1, i2, ...}
where i1, i2, ... are numbers. This results in a parse tree of the form
{ty = "name"; treei1, treei2, ...}
If a production has no abstract syntax rule, the result is the parse node for the current node.
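The effect of an abstract syntax rule can be sketched in plain Lua. The `apply_abstract` helper below is hypothetical and only mimics the documented result; it is not the module's own code, and the rule name `sum` is made up for the example.

```lua
-- Hypothetical helper: apply a rule name = {i1, i2, ...} to the subtrees
-- of a production, keeping only the subtrees at the listed positions.
local function apply_abstract (name, indices, subtrees)
  local node = {ty = name}
  for _, i in ipairs (indices) do
    node[#node + 1] = subtrees[i]
  end
  return node
end

-- Subtrees as they might look for {"term", "+", "expr"} matched on "1 + 2".
local subtrees = {"1", "+", "2"}

-- The rule sum = {1, 3} keeps both operands and drops the "+" literal.
local node = apply_abstract ("sum", {1, 3}, subtrees)
```

Here node has ty = "sum" and two children, "1" and "2".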
FIXME: Give lexemes as an extra argument to Parser?
FIXME: Rename second argument to parse method to "tokens"?
FIXME: Make start_token an optional argument to parse? (swap with token list) and have it default to the first non-terminal?
Functions
- Parser:_clone (grammar): Parser constructor
- Parser:parse (start, token, from): Parse a token list.
Functions
- Parser:_clone (grammar)
  Parser constructor
  Parameters
  - grammar: parser grammar
  Return value: parser
- Parser:parse (start, token, from)
  Parse a token list.
  Parameters
  - start: the non-terminal at which to start parsing
  - token: the list of tokens
  - from: the index of the token to start from (default: 1)
  Return value: parse tree