Teuchos - Trilinos Tools Package
Version of the Day
The main class for users to define a language using TeuchosParser.
#include <Teuchos_Language.hpp>
Public Attributes
Tokens tokens
    vector of tokens
Productions productions
    vector of productions
This is the second most important user-facing class in TeuchosParser, the first being Teuchos::Reader. With Language, users can define their own language at runtime, which can then be converted into ReaderTables by make_reader_tables and used by Reader to read text streams.
TeuchosParser is heavily inspired by the lex and yacc UNIX programs, whose modern implementations are Flex and Bison. These are programs which read a custom file format and output C or C++ source files. At the time of this writing, the C++ support (particularly in Flex) leaves something to be desired, which is part of the reason for creating TeuchosParser. TeuchosParser does a subset of what Flex and Bison can do, but it accepts a Language object as input and outputs ReaderTables, which can later be used by a Reader. All of these are in-memory, pure C++ constructs. TeuchosParser supports LALR(1) grammars for the parser and basic regular expressions in the lexer.
A Language has two portions: the productions and the tokens.
The Language::productions vector encodes a Context Free Grammar as a set of Teuchos::Language::Production objects. A Context Free Grammar consists of a set of terminal symbols (referred to here as tokens), a set of nonterminal symbols, and a set of productions. The productions consist of a single symbol on the left hand side, and a string of symbols on the right hand side.
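A production means that the symbol on the left hand side may be substituted for the string on the right hand side, or vice versa. The grammar also has a root nonterminal symbol. Every string acceptable by the grammar can be formed by:
1. Starting with a string containing only the root nonterminal.
2. Choosing a nonterminal in the string.
3. Choosing a production whose left hand side is that nonterminal.
4. Substituting the nonterminal with the right hand side of that production.
5. Repeating steps 2 through 4 until the string contains only terminal symbols.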
This is the top-down perspective of forming a string of terminals based on the choices of substitutions made. Parsing is the bottom-up equivalent: taking a string of terminals and deducing which substitutions were made to form it. One can define an Abstract Syntax Tree (AST) as the tree defined by the substitutions made. Each tree node contains a symbol, the root node contains the root nonterminal, and each non-leaf tree node must contain the LHS of a production while its children must contain the symbols in the RHS of that production. The leaf nodes of the AST must contain terminal symbols, and reading the leaf nodes of the tree from left to right spells out the string that is accepted.
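The top-down substitution process described above can be sketched in a few lines of C++. This is a minimal illustration only: the `Production` struct, the `substitute`, `derive`, and `toy_grammar` functions, and the toy grammar itself are hypothetical stand-ins and are not the Teuchos declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in type for illustration only; this is NOT Teuchos::Language::Production.
struct Production {
  std::string lhs;               // single nonterminal on the left hand side
  std::vector<std::string> rhs;  // string of symbols on the right hand side
};

using Symbols = std::vector<std::string>;

// Replace the leftmost occurrence of p.lhs in `str` with the symbols of p.rhs.
Symbols substitute(Symbols const& str, Production const& p) {
  Symbols out;
  bool done = false;
  for (std::string const& sym : str) {
    if (!done && sym == p.lhs) {
      out.insert(out.end(), p.rhs.begin(), p.rhs.end());
      done = true;
    } else {
      out.push_back(sym);
    }
  }
  assert(done && "the production's LHS must appear in the string");
  return out;
}

// Start from the root nonterminal and apply the chosen productions in order.
Symbols derive(std::vector<Production> const& grammar,
               std::string const& root,
               std::vector<std::size_t> const& choices) {
  Symbols str{root};
  for (std::size_t i : choices) str = substitute(str, grammar.at(i));
  return str;
}

// Hypothetical toy grammar: sum -> sum PLUS NUM | NUM
std::vector<Production> toy_grammar() {
  return {
    {"sum", {"sum", "PLUS", "NUM"}},  // production 0
    {"sum", {"NUM"}},                 // production 1
  };
}
```

Calling `derive(toy_grammar(), "sum", {0, 0, 1})` walks through `sum`, then `sum PLUS NUM`, then `sum PLUS NUM PLUS NUM`, and ends at the all-terminal string `NUM PLUS NUM PLUS NUM`. A parser performs the reverse task: given that final string, it recovers the choices 0, 0, 1.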
Please see the Teuchos::make_lalr1_parser documentation for information on the constraints required of a grammar.
The Language::tokens vector defines each token as a regular expression. The tokens and their regular expressions together form a lexer, which can be a useful pre-processing step to turn raw text into high-level tokens for consumption by the parser. Please see the Teuchos::make_lexer, Teuchos::make_dfa, and Teuchos::Reader documentation for information about requirements and the definition of the lexer.
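To make the idea of "tokens plus regular expressions form a lexer" concrete, here is a rough sketch built on `std::regex` rather than on TeuchosParser. Several things here are assumptions made for illustration: the `TokenDef` struct and `lex` function are hypothetical, the longest-match rule with ties going to the earlier token is a common lexer convention rather than a statement about Teuchos::Reader, and the ECMAScript regex dialect of `std::regex` is not necessarily the dialect Teuchos accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Stand-in for a token definition: a symbol name plus a regular expression.
struct TokenDef {
  std::string name;
  std::string regex;
};

// Turn raw text into a sequence of token names.
// At each position the longest match wins; ties go to the token defined first.
// Spaces are skipped here purely to keep the sketch short.
std::vector<std::string> lex(std::vector<TokenDef> const& defs,
                             std::string const& text) {
  std::vector<std::regex> compiled;
  for (TokenDef const& d : defs) compiled.emplace_back(d.regex);
  std::vector<std::string> out;
  auto pos = text.cbegin();
  while (pos != text.cend()) {
    if (*pos == ' ') { ++pos; continue; }
    std::size_t best_len = 0;
    std::size_t best_tok = 0;
    for (std::size_t i = 0; i < compiled.size(); ++i) {
      std::smatch m;
      // match_continuous anchors the match at the current position
      if (std::regex_search(pos, text.cend(), m, compiled[i],
                            std::regex_constants::match_continuous)) {
        std::size_t len = static_cast<std::size_t>(m.length(0));
        if (len > best_len) { best_len = len; best_tok = i; }
      }
    }
    assert(best_len > 0 && "no token matches the input at this position");
    out.push_back(defs[best_tok].name);
    pos += static_cast<std::ptrdiff_t>(best_len);
  }
  return out;
}
```

With the hypothetical definitions `NUM` = `[0-9]+` and `PLUS` = `\+`, the text `12 + 3` becomes the token sequence `NUM`, `PLUS`, `NUM`, which is the kind of input the parser then consumes.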
All symbols in Teuchos::Language are denoted by strings. There are no restrictions on what strings can be used, and their contents have no special interpretation. Thus it is wise to choose symbol names which are as convenient to read as possible.
The idiomatic way to construct a Language object is as follows:
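The sketch below shows one plausible shape for such a construction, using stand-in types so that it compiles without Trilinos. Everything in the `sketch` namespace is an assumption made for illustration: the struct layouts, the enum-based indexing, and the `make_language` name are hypothetical, and the real Teuchos::Language may provide its own, more compact syntax for filling in productions and tokens. Consult Teuchos_Language.hpp for the actual interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {  // stand-in types, NOT the Teuchos declarations

struct Token {
  std::string name;   // symbol name used in productions
  std::string regex;  // regular expression that recognizes the token
};

struct Production {
  std::string lhs;               // one nonterminal
  std::vector<std::string> rhs;  // terminals and nonterminals
};

struct Language {
  std::vector<Token> tokens;
  std::vector<Production> productions;
};

// Hypothetical index enums: one entry per production and per token.
// The last enumerator doubles as the count.
enum { PROD_SUM_ADD, PROD_SUM_NUM, NPRODS };
enum { TOK_NUM, TOK_PLUS, NTOKS };

Language make_language() {
  Language out;
  out.productions.resize(NPRODS);
  out.productions[PROD_SUM_ADD] = Production{"sum", {"sum", "PLUS", "NUM"}};
  out.productions[PROD_SUM_NUM] = Production{"sum", {"NUM"}};
  out.tokens.resize(NTOKS);
  out.tokens[TOK_NUM] = Token{"NUM", "[0-9]+"};
  out.tokens[TOK_PLUS] = Token{"PLUS", "\\+"};
  // Sanity check: every slot named by an enumerator was filled in.
  for (Production const& p : out.productions) assert(!p.lhs.empty());
  for (Token const& t : out.tokens) assert(!t.name.empty());
  return out;
}

}  // namespace sketch
```

Indexing the vectors by named enumerators keeps each production and token addressable by a stable integer, which is convenient when later code needs to dispatch on which production was matched.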
If your language doesn't change, it can be good design to store it as a singleton LanguagePtr object.
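One way to realize the singleton suggestion is a function-local static, which C++11 and later initialize exactly once even under concurrent first calls. The sketch below is an assumption-laden illustration: `Language` is a stand-in struct, `LanguagePtr` is modelled as `std::shared_ptr<const Language>` (the actual Teuchos typedef may differ), and `build_language` and `ask_language` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

namespace singleton_sketch {  // stand-in types, NOT the Teuchos declarations

struct Language {
  std::vector<std::string> token_names;  // simplified placeholder content
};

// Assumption: modelled as a shared pointer to an immutable Language.
using LanguagePtr = std::shared_ptr<const Language>;

// Hypothetical builder; a real one would fill in tokens and productions.
Language build_language() {
  Language out;
  out.token_names = {"NUM", "PLUS"};
  return out;
}

// Builds the language on first use and hands out the same object afterwards.
LanguagePtr ask_language() {
  static LanguagePtr const ptr =
      std::make_shared<const Language>(build_language());
  assert(ptr && "the language must have been constructed");
  return ptr;
}

}  // namespace singleton_sketch
```

Every call to `ask_language()` returns a pointer to the same immutable object, so the cost of building the language is paid once no matter how many readers are created from it.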
Please see Teuchos_XML.cpp, Teuchos_YAML.cpp, and Calc.cpp for examples of
Definition at line 120 of file Teuchos_Language.hpp.
Tokens Teuchos::Language::tokens
vector of tokens
Definition at line 128 of file Teuchos_Language.hpp.
Productions Teuchos::Language::productions
vector of productions
Definition at line 144 of file Teuchos_Language.hpp.