Teuchos - Trilinos Tools Package  Version of the Day
Teuchos::Language Struct Reference

The main class for users to define a language using TeuchosParser.

#include <Teuchos_Language.hpp>

Public Attributes

Tokens tokens
 vector of tokens
 
Productions productions
 vector of productions
 

Detailed Description

The main class for users to define a language using TeuchosParser.

This is the second most important user-facing class in TeuchosParser, the first being Teuchos::Reader. With Language, users can define their own language at runtime, which can then be converted into ReaderTables by make_reader_tables and used by Reader to read text streams.

TeuchosParser is heavily inspired by the lex and yacc UNIX programs, whose modern implementations are Flex and Bison. These are programs which read a custom file format and output C or C++ source files. At the time of this writing, the C++ support (particularly in Flex) leaves something to be desired, which is part of the reason for creating TeuchosParser. TeuchosParser does a subset of what Flex and Bison can do, but it accepts a Language object as input and outputs ReaderTables, which can later be used by a Reader. All of these are in-memory, pure C++ constructs. TeuchosParser supports LALR(1) grammars for the parser and basic regular expressions in the lexer.

A Language has two portions: the productions and the tokens.

The Language::productions vector encodes a Context-Free Grammar as a set of Teuchos::Language::Production objects. A Context-Free Grammar consists of a set of terminal symbols (referred to here as tokens), a set of nonterminal symbols, and a set of productions. Each production consists of a single symbol on the left-hand side (LHS) and a string of symbols on the right-hand side (RHS).
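To make this structure concrete, here is a minimal C++ sketch of a grammar held as plain data. The types SimpleProduction and SimpleGrammar and both helper functions are hypothetical stand-ins, not the real Teuchos::Language::Production API. Following the text, terminals are the token names; treating every symbol that appears on some left-hand side as a nonterminal is an assumption of this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for Teuchos::Language::Production (not the real type).
struct SimpleProduction {
  std::string lhs;               // exactly one symbol on the left-hand side
  std::vector<std::string> rhs;  // a string of symbols on the right-hand side
};

// Hypothetical stand-in for a whole grammar: terminals plus productions.
struct SimpleGrammar {
  std::vector<std::string> tokens;  // terminal symbols (token names)
  std::vector<SimpleProduction> productions;
};

// Assumption of this sketch: the nonterminals are exactly the LHS symbols.
std::set<std::string> nonterminals(const SimpleGrammar& g) {
  std::set<std::string> out;
  for (const auto& p : g.productions) {
    assert(!p.lhs.empty());  // every production needs a left-hand side
    out.insert(p.lhs);
  }
  return out;
}

// True if every RHS symbol is either a terminal or a nonterminal.
bool symbols_are_defined(const SimpleGrammar& g) {
  const std::set<std::string> terminals(g.tokens.begin(), g.tokens.end());
  const std::set<std::string> nts = nonterminals(g);
  for (const auto& p : g.productions)
    for (const auto& s : p.rhs)
      if (terminals.count(s) == 0 && nts.count(s) == 0) return false;
  return true;
}
```

For a grammar with tokens "(", ")" and "x" and the productions expr -> ( expr ) and expr -> x, the only nonterminal is "expr", and every right-hand-side symbol is defined.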

A production means that the symbol on the left-hand side may be replaced by the string on the right-hand side, or vice versa. The grammar also has a root nonterminal symbol. Every string accepted by the grammar can be formed as follows:

  1. Start with a string containing only the root nonterminal
  2. Choose a nonterminal in the string
  3. Choose a production whose LHS is the chosen nonterminal
  4. Substitute the nonterminal with the RHS of the production
  5. Repeat steps 2-4 until there are only terminal symbols in the string
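The five steps above can be sketched as a small C++ function. This is an illustration under stated assumptions, not part of TeuchosParser: the Prod type and the derive function are hypothetical, step 2 always picks the leftmost nonterminal, and the caller supplies the production chosen at each step 3 as an index into the production list.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical production: one LHS symbol and a string of RHS symbols.
struct Prod {
  std::string lhs;
  std::vector<std::string> rhs;
};

// Applies one substitution per entry of `choices` (indices into `prods`).
std::vector<std::string> derive(const std::vector<Prod>& prods,
                                const std::string& root,
                                const std::vector<std::size_t>& choices) {
  std::set<std::string> nonterminals;  // assumed: the symbols on some LHS
  for (const auto& p : prods) nonterminals.insert(p.lhs);
  std::vector<std::string> str{root};  // step 1: just the root nonterminal
  for (std::size_t choice : choices) {
    // step 2: pick the leftmost nonterminal in the current string
    auto it = str.begin();
    while (it != str.end() && nonterminals.count(*it) == 0) ++it;
    assert(it != str.end() && "no nonterminal left to substitute");
    // step 3: the chosen production must have that symbol as its LHS
    const Prod& p = prods.at(choice);
    assert(p.lhs == *it);
    // step 4: replace the nonterminal with the production's RHS
    it = str.erase(it);
    str.insert(it, p.rhs.begin(), p.rhs.end());
  }
  // step 5 is left to the caller: pass enough choices to reach only terminals
  return str;
}
```

With the productions expr -> ( expr ) (index 0) and expr -> x (index 1), the choices 0, 0, 1 derive the terminal string ( ( x ) ).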

This substitution procedure is the top-down perspective: a string of terminals is formed by the choices of substitutions made. Parsing is the bottom-up equivalent: taking a string of terminals and trying to deduce which substitutions were made to form it. One can define an Abstract Syntax Tree (AST) as the tree defined by the substitutions made. Each tree node contains a symbol: the root node contains the root nonterminal, and each non-leaf node must contain the LHS of a production while its children must contain the symbols in the RHS of that production, in order. The leaf nodes of the AST must contain terminal symbols, and reading the leaf nodes from left to right spells out the string that is accepted.
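A hypothetical C++ node type for such a tree, together with a function that reads the leaves from left to right, might look like the following sketch. Node, make_node and collect_leaves are illustrative names, not TeuchosParser types. Each node records whether its symbol is a terminal, so that a nonterminal expanded by a production with an empty right-hand side contributes nothing to the recovered string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: an interior node holds the LHS of a production and
// its children hold that production's RHS symbols, in order.
struct Node {
  std::string symbol;
  bool is_terminal = false;
  std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> make_node(const std::string& symbol, bool is_terminal) {
  auto n = std::make_unique<Node>();
  n->symbol = symbol;
  n->is_terminal = is_terminal;
  return n;
}

// Appends the terminal leaves, read from left to right, to `out`.
void collect_leaves(const Node& n, std::vector<std::string>& out) {
  if (n.is_terminal) {
    assert(n.children.empty());  // terminals are always leaves
    out.push_back(n.symbol);
    return;
  }
  for (const auto& child : n.children) collect_leaves(*child, out);
}
```

Building the tree for expr -> ( expr ) with the inner expr -> x and collecting its leaves yields ( x ), the string the tree accepts.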

Please see the Teuchos::make_lalr1_parser documentation for information on the constraints required of a grammar.

The Language::tokens vector defines each token as a regular expression. The tokens and their regular expressions together form a lexer, which can be a useful pre-processing step to turn raw text into high-level tokens for consumption by the parser. Please see the Teuchos::make_lexer, Teuchos::make_dfa, and Teuchos::Reader documentation for information about requirements and the definition of the lexer.
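As a rough illustration of what a lexer does with such a token list, the sketch below splits text into token names by trying every token's regular expression at the current position and keeping the longest match. It deliberately swaps in std::regex (ECMAScript syntax) for the DFA-based machinery suggested by Teuchos::make_dfa, and both the longest-match rule and the tie-breaking rule (the token listed first wins) are assumptions here; the Teuchos::make_lexer documentation defines the real behavior. TokenDef and lex are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical token definition: a name plus a std::regex pattern.
struct TokenDef {
  std::string name;
  std::string regex;
};

// Longest match across tokens; on a tie the earlier token wins.
// Returns false if some position matches no token (or only an empty match).
bool lex(const std::vector<TokenDef>& defs, const std::string& text,
         std::vector<std::string>& out) {
  std::vector<std::regex> compiled;
  for (const auto& d : defs) compiled.emplace_back(d.regex);
  std::size_t pos = 0;
  while (pos < text.size()) {
    std::size_t best_len = 0, best = 0;
    for (std::size_t i = 0; i < compiled.size(); ++i) {
      std::smatch m;
      // match_continuous forces the match to start exactly at `pos`
      if (std::regex_search(text.begin() + pos, text.end(), m, compiled[i],
                            std::regex_constants::match_continuous)) {
        const auto len = static_cast<std::size_t>(m.length(0));
        if (len > best_len) {  // strictly longer, so earlier tokens win ties
          best_len = len;
          best = i;
        }
      }
    }
    if (best_len == 0) return false;
    out.push_back(defs[best].name);
    pos += best_len;
  }
  return true;
}
```

For example, with tokens "(" (\(), ")" (\)), num ([0-9]+) and space ( +), the text "(12 3)" lexes to ( num space num ).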

All symbols in Teuchos::Language are denoted by strings. There are no restrictions on which strings can be used, and their contents have no special interpretation, so it is wise to choose symbol names that are as easy to read as possible.

The idiomatic way to construct a Language object is as follows:

  1. Create an enumeration for your productions
  2. Create an enumeration for your tokens
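  3. Resize lang.productions to the number of productions, then define each production using the overloaded operators provided:
    lang.productions[PROD_PARENS]("expr1") >> "(", "expr2", ")";
  4. Resize lang.tokens to the number of tokens, then define each token using the overloaded operator(), which takes the token name followed by its regular expression:
    lang.tokens[TOK_LPAREN]("(", "\\(");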
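The builder syntax in steps 3 and 4 relies on ordinary C++ operator overloading: operator() records the left-hand side and returns a helper object, >> appends the first right-hand-side symbol, and because >> binds more tightly than the comma operator, each following , "symbol" calls an overloaded operator, on that helper. The self-contained sketch below re-creates the idiom with hypothetical types (MiniLanguage and its nested structs); it is not the Teuchos source. Asserting that each production slot is defined only once is a choice of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical re-creation of the builder idiom; NOT the Teuchos source.
struct MiniLanguage {
  struct Token {
    std::string name;
    std::string regex;
    // lang.tokens[i](name, regex)
    void operator()(const std::string& name_in, const std::string& regex_in) {
      name = name_in;
      regex = regex_in;
    }
  };
  struct Production {
    std::string lhs;
    std::vector<std::string> rhs;
    // Helper returned by operator(); appends RHS symbols to its production.
    struct RHSBuilder {
      Production& prod;
      RHSBuilder& operator>>(const std::string& symbol) {  // first RHS symbol
        prod.rhs.push_back(symbol);
        return *this;
      }
      RHSBuilder& operator,(const std::string& symbol) {  // each later symbol
        prod.rhs.push_back(symbol);
        return *this;
      }
    };
    // lang.productions[i](lhs) records the LHS and starts the RHS builder.
    RHSBuilder operator()(const std::string& lhs_in) {
      assert(lhs.empty() && "each production slot is defined once");
      lhs = lhs_in;
      return RHSBuilder{*this};
    }
  };
  std::vector<Token> tokens;
  std::vector<Production> productions;
};
```

Usage mirrors steps 1-4: with enum { PROD_PARENS, NPRODS }, resizing productions to NPRODS and then writing lang.productions[PROD_PARENS]("expr1") >> "(", "expr2", ")"; leaves lhs equal to "expr1" and rhs equal to ( expr2 ). The helper holds a reference to a temporary, which is safe because the temporary lives until the end of the full expression.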

If your language doesn't change, it can be good design to store it as a singleton LanguagePtr object.
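One way to realize such a singleton in standard C++ is a function-local static, whose initialization C++11 guarantees to run exactly once, even with concurrent callers. Teuchos's LanguagePtr is a reference-counted pointer type; this hypothetical sketch substitutes std::shared_ptr and a stand-in DemoLanguage struct so that it compiles on its own.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for Teuchos::Language, so the sketch compiles alone.
struct DemoLanguage {
  std::vector<std::string> token_names;
};

// Stand-in for LanguagePtr: shared ownership of an immutable language.
using DemoLanguagePtr = std::shared_ptr<const DemoLanguage>;

int demo_build_count = 0;  // counts builder runs, for illustration only

DemoLanguage build_demo_language() {
  ++demo_build_count;
  DemoLanguage lang;
  lang.token_names = {"(", ")"};
  return lang;
}

// The function-local static is initialized exactly once (thread-safe since
// C++11); every later call returns the same shared pointer.
DemoLanguagePtr ask_demo_language() {
  static const DemoLanguagePtr ptr =
      std::make_shared<DemoLanguage>(build_demo_language());
  return ptr;
}
```

Calling ask_demo_language twice returns the same pointer, and the builder runs only once.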

Please see Teuchos_XML.cpp, Teuchos_YAML.cpp, and Calc.cpp for examples of constructing a Language.

Definition at line 120 of file Teuchos_Language.hpp.

Member Data Documentation

Tokens Teuchos::Language::tokens

vector of tokens

Definition at line 128 of file Teuchos_Language.hpp.

Productions Teuchos::Language::productions

vector of productions

Definition at line 144 of file Teuchos_Language.hpp.

