5 Notational Conventions
5.1 Syntactic and Lexical Grammars
5.1.1 Context-Free Grammars
A context-free grammar consists of a number of productions. Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of zero or more nonterminal and terminal symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified alphabet.
A chain production is a production that has exactly one nonterminal symbol on its right-hand side along with zero or more terminal symbols.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.
5.1.2 The Lexical and RegExp Grammars
A lexical grammar for ECMAScript is given in clause
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the stream of input elements and guide the process of automatic semicolon insertion (/*…*/ regardless of whether it spans more than one line) is likewise simply discarded if it contains no line terminator; but if a
A RegExp grammar for ECMAScript is given in
Productions of the lexical and RegExp grammars are distinguished by having two colons “::” as separating punctuation. The lexical and RegExp grammars share some productions.
5.1.3 The Numeric String Grammar
Another grammar is used for translating Strings into numeric values. This grammar is similar to the part of the lexical grammar having to do with numeric literals and has as its terminal symbols
Productions of the numeric string grammar are distinguished by having three colons “:::” as punctuation.
5.1.4 The Syntactic Grammar
The syntactic grammar for ECMAScript is given in clauses 11, 12, 13, 14, and 15. This grammar has ECMAScript tokens defined by the lexical grammar as its terminal symbols (
When a stream of code points is to be parsed as an ECMAScript
Productions of the syntactic grammar are distinguished by having just one colon “:” as punctuation.
The syntactic grammar as presented in clauses 12, 13, 14 and 15 is not a complete account of which token sequences are accepted as a correct ECMAScript
In certain cases in order to avoid ambiguities the syntactic grammar uses generalized productions that permit token sequences that do not form a valid ECMAScript
5.1.5 Grammar Notation
Terminal symbols of the lexical, RegExp, and numeric string grammars are shown in fixed width font, both in the productions of the grammars and throughout this specification whenever the text directly refers to such a terminal symbol. These are to appear in a script exactly as written. All terminal symbol code points specified in this way are to be understood as the appropriate Unicode code points from the Basic Latin range, as opposed to any similar-looking code points from other Unicode ranges.
Nonterminal symbols are shown in italic
states that the nonterminal while, followed by a left parenthesis token, followed by an
states that an
The subscripted suffix “opt”, which may appear after a terminal or nonterminal, indicates an optional symbol. The alternative containing the optional symbol actually specifies two right-hand sides, one that omits the optional element and one that includes it. This means that:
is a convenient abbreviation for:
and that:
is a convenient abbreviation for:
which in turn is an abbreviation for:
so, in this example, the nonterminal
A production may be parameterized by a subscripted annotation of the form “[parameters]”, which may appear as a suffix to the nonterminal symbol defined by the production. “parameters” may be either a single name or a comma separated list of names. A parameterized production is shorthand for a set of productions defining all combinations of the parameter names, preceded by an underscore, appended to the parameterized nonterminal symbol. This means that:
is a convenient abbreviation for:
and that:
is an abbreviation for:
Multiple parameters produce a combinatory number of productions, not all of which are necessarily referenced in a complete grammar.
References to nonterminals on the right-hand side of a production can also be parameterized. For example:
is equivalent to saying:
A nonterminal reference may have both a parameter list and an “opt” suffix. For example:
is an abbreviation for:
Prefixing a parameter name with “?” on a right-hand side nonterminal reference makes that parameter value dependent upon the occurrence of the parameter name on the reference to the current production's left-hand side symbol. For example:
is an abbreviation for:
If a right-hand side alternative is prefixed with “[+parameter]” that alternative is only available if the named parameter was used in referencing the production's nonterminal symbol. If a right-hand side alternative is prefixed with “[~parameter]” that alternative is only available if the named parameter was not used in referencing the production's nonterminal symbol. This means that:
is an abbreviation for:
and that
is an abbreviation for:
When the words “one of” follow the colon(s) in a grammar definition, they signify that each of the terminal symbols on the following line or lines is an alternative definition. For example, the lexical grammar for ECMAScript contains the production:
which is merely a convenient abbreviation for:
If the phrase “[empty]” appears as the right-hand side of a production, it indicates that the production's right-hand side contains no terminals or nonterminals.
If the phrase “[lookahead ∉ set]” appears in the right-hand side of a production, it indicates that the production may not be used if the immediately following input token sequence is a member of the given set. The set can be written as a comma separated list of one or two element terminal sequences enclosed in curly brackets. For convenience, the set can also be written as a nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand. If the set consists of a single terminal the phrase “[lookahead ≠ terminal]” may be used.
For example, given the definitions
the definition
matches either the letter n followed by one or more decimal digits the first of which is even, or a decimal digit not followed by another decimal digit.
If the phrase “[no
indicates that the production may not be used if a throw token and the
Unless the presence of a
When an alternative in a production of the lexical grammar or the numeric string grammar appears to be a multi-code point token, it represents the sequence of code points that would make up such a token.
The right-hand side of a production may specify that certain expansions are not permitted by using the phrase “but not” and then indicating the expansions to be excluded. For example, the production:
means that the nonterminal
Finally, a few nonterminal symbols are described by a descriptive phrase in sans-serif
5.2 Algorithm Conventions
The specification often uses a numbered list to specify steps in an algorithm. These algorithms are used to precisely specify the required semantics of ECMAScript language constructs. The algorithms are not intended to imply the use of any specific implementation technique. In practice, there may be more efficient algorithms available to implement a given feature.
Algorithms may be explicitly parameterized, in which case the names and usage of the parameters must be provided as part of the algorithm's definition. In order to facilitate their use in multiple parts of this specification, some algorithms, called abstract operations, are named and written in parameterized functional form so that they may be referenced by name from within other algorithms. Abstract operations are typically referenced using a functional application style such as operationName(arg1, arg2). Some abstract operations are treated as polymorphically dispatched methods of class-like specification abstractions. Such method-like abstract operations are typically referenced using a method application style such as someValue.operationName(arg1, arg2).
Calls to abstract operations return ? indicate that
The prefix ! is used to indicate that an abstract operation will never return an
- Let val be operationName().
- Assert: val is never an
abrupt completion . - If val is a
Completion Record , let val be val.[[Value]].
Algorithms may be associated with productions of one of the ECMAScript grammars. A production that has multiple alternative definitions will typically have a distinct algorithm for each alternative. When an algorithm is associated with a grammar production, it may reference the terminal and nonterminal symbols of the production alternative as if they were parameters of the algorithm. When used in this manner, nonterminal symbols refer to the actual alternative definition that is matched when parsing the source text.
When an algorithm is associated with a production alternative, the alternative is typically shown without any “[ ]” grammar annotations. Such annotations should only affect the syntactic recognition of the alternative and have no effect on the associated semantics for the alternative.
Unless explicitly specified otherwise, all chain productions have an implicit definition for every algorithm that might be applied to that production's left-hand side nonterminal. The implicit definition simply reapplies the same algorithm name with the same parameters, if any, to the
but there is no corresponding Evaluation algorithm that is explicitly specified for that production. If in some algorithm there is a statement of the form: “Return the result of evaluating
Runtime Semantics: Evaluation
- Return the result of evaluating
StatementList .
For clarity of expression, algorithm steps may be subdivided into sequential substeps. Substeps are indented and may themselves be further divided into indented substeps. Outline numbering conventions are used to identify substeps with the first level of substeps labelled with lower case alphabetic characters and the second level of substeps labelled with lower case roman numerals. If more than three levels are required these rules repeat with the fourth level using numeric labels. For example:
- Top-level step
- Substep.
- Substep.
- Subsubstep.
- Subsubsubstep
- Subsubsubsubstep
- Subsubsubsubsubstep
- Subsubsubsubstep
- Subsubsubstep
- Subsubstep.
A step or substep may be written as an “if” predicate that conditions its substeps. In this case, the substeps are only applied if the predicate is true. If a step or substep begins with the word “else”, it is a predicate that is the negation of the preceding “if” predicate step at the same level.
A step may specify the iterative application of its substeps.
A step that begins with “Assert:” asserts an invariant condition of its algorithm. Such assertions are used to make explicit algorithmic invariants that would otherwise be implicit. Such assertions add no additional semantic requirements and hence need not be checked by an implementation. They are used simply to clarify algorithms.
Mathematical operations such as addition, subtraction, negation, multiplication, division, and the mathematical functions defined later in this clause should always be understood as computing exact mathematical results on mathematical real numbers, which unless otherwise noted do not include infinities and do not include a negative zero that is distinguished from positive zero. Algorithms in this standard that model floating-point arithmetic include explicit steps, where necessary, to handle infinities and signed zero and to perform rounding. If a mathematical operation or function is applied to a floating-point number, it should be understood as being applied to the exact mathematical value represented by that floating-point number; such a floating-point number must be finite, and if it is
The mathematical function
The mathematical function
The notation “
The mathematical function
5.3 Static Semantic Rules
Context-free grammars are not sufficiently powerful to express all the rules that define whether a stream of input elements form a valid ECMAScript
Static Semantic Rules have names and typically are defined using an algorithm. Named Static Semantic Rules are associated with grammar productions and a production that has multiple alternative definitions will typically have for each alternative a distinct algorithm for each applicable named static semantic rule.
Unless otherwise specified every grammar production alternative in this specification implicitly has a definition for a static semantic rule named Contains which takes an argument named symbol whose value is a terminal or nonterminal of the grammar that includes the associated production. The default definition of Contains is:
- For each terminal and nonterminal grammar symbol, sym, in the definition of this production do
- If sym is the same grammar symbol as symbol, return
true . - If sym is a nonterminal, then
- Let contained be the result of sym Contains symbol.
- If contained is
true , returntrue .
- If sym is the same grammar symbol as symbol, return
- Return
false .
The above definition is explicitly over-ridden for specific productions.
A special kind of static semantic rule is an Early Error Rule.