11 ECMAScript Language: Lexical Grammar
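The source text of an ECMAScript Script or Module is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.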
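There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar. The InputElementRegExpOrTemplateTail goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted. The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle nor a TemplateTail is permitted. The InputElementTemplateTail goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted. In all other contexts, InputElementDiv is used as the lexical goal symbol.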
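The use of multiple lexical goals ensures that there are no lexical ambiguities that would affect automatic semicolon insertion. For example, there are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (see 11.9); in examples such as the following: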
a = b
/hi/g.exec(c).map(d);
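where the first non-whitespace, non-comment code point after a LineTerminator is U+002F (SOLIDUS) and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator. That is, the above example is interpreted in the same way as: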
a = b / hi / g.exec(c).map(d);
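The interpretation above can be observed directly. The following is a minimal sketch, not normative text; the bindings `b`, `hi`, `c`, `d`, and `g` are hypothetical stand-ins chosen only so that the two-line form from the example can run.

```javascript
// Hypothetical stand-ins so the example from the text can be executed.
const b = 12;
const hi = 3;
const c = "ignored";
const d = (x) => x;
const g = { exec: () => ({ map: () => 2 }) };

let a;
// The line break before "/" does not cause a semicolon to be inserted,
// because division is permitted after `b` in this syntactic context.
a = b
/hi/g.exec(c).map(d);

console.log(a); // (12 / 3) / 2
```

If a semicolon had been inserted after `b`, `a` would hold 12 and the second line would be a regular expression literal.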
Syntax
11.1 Unicode Format-Control Characters
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.
U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In ECMAScript source text these code points may also be used in an IdentifierName after the first character.
U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see 11.2).
The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in Table 31.
| Code Point | Name | Abbreviation | Usage |
|---|---|---|---|
| U+200C | ZERO WIDTH NON-JOINER | <ZWNJ> | IdentifierPart |
| U+200D | ZERO WIDTH JOINER | <ZWJ> | IdentifierPart |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> | WhiteSpace |
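The treatment of these three code points can be checked with a short, non-normative sketch. It assumes a host that supports `new Function` for parsing source text at run time; the variable names are illustrative only. The string escapes place the actual format-control code points into the evaluated source.

```javascript
// <ZWJ> (U+200D) is accepted after the first code point of an identifier.
const withJoiner = new Function("var a\u200Db = 5; return a\u200Db;")();

// <ZWNBSP> (U+FEFF) between tokens is treated as white space.
const withZwnbsp = new Function("return 1\uFEFF+\uFEFF2;")();

// <ZWJ> cannot be the first code point of an identifier.
let leadingJoinerRejected = false;
try {
  new Function("var \u200Dab = 1;");
} catch (err) {
  leadingJoinerRejected = err instanceof SyntaxError;
}

console.log(withJoiner, withZwnbsp, leadingJoinerRejected);
```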
11.2 White Space
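White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a StringLiteral, a RegularExpressionLiteral, a Template, or a TemplateSubstitutionTail where they are considered significant code points forming part of a literal value. They may also occur within a Comment, but cannot appear within any other kind of token.
The ECMAScript white space code points are listed in Table 32.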
| Code Point | Name | Abbreviation |
|---|---|---|
| U+0009 | CHARACTER TABULATION | <TAB> |
| U+000B | LINE TABULATION | <VT> |
| U+000C | FORM FEED (FF) | <FF> |
| U+0020 | SPACE | <SP> |
| U+00A0 | NO-BREAK SPACE | <NBSP> |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> |
| Other category “Zs” | Any other Unicode “Space_Separator” code point | <USP> |
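ECMAScript implementations must recognize as WhiteSpace code points listed in the “Space_Separator” (“Zs”) category.
Other than for the code points listed in Table 32, ECMAScript WhiteSpace intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in category “Space_Separator” (“Zs”).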
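As an informal illustration of the white space rules above, the sketch below checks that each listed code point separates two tokens, and that a code point outside category “Zs” does not. It assumes `new Function` is available; U+3000 (IDEOGRAPHIC SPACE) is used as a sample <USP> and U+200B (ZERO WIDTH SPACE, category “Cf”) as a sample non-white-space code point. Both choices are this example's, not the specification's.

```javascript
// Code points from the white space table, plus U+3000 as a sample <USP>.
const whiteSpace = ["\u0009", "\u000B", "\u000C", "\u0020", "\u00A0", "\uFEFF", "\u3000"];

// Each one separates the tokens `return` and `1`.
const separated = whiteSpace.map((ws) => new Function("return" + ws + "1;")());

// U+200B is in category "Cf", not "Zs", so it is not white space.
let zeroWidthSpaceRejected = false;
try {
  new Function("return\u200B1;");
} catch (err) {
  zeroWidthSpaceRejected = err instanceof SyntaxError;
}

console.log(separated.every((value) => value === 1), zeroWidthSpaceRejected);
```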
Syntax
11.3 Line Terminators
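Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (11.9). A line terminator cannot occur within any token except a StringLiteral, Template, or TemplateSubstitutionTail. Line terminators may only occur within a StringLiteral token as part of a LineContinuation.
A line terminator can occur within a MultiLineComment (11.4) but cannot occur within a SingleLineComment.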
Line terminators are included in the set of white space code points that are matched by the \s class in regular expressions.
The ECMAScript line terminator code points are listed in Table 33.
| Code Point | Unicode Name | Abbreviation |
|---|---|---|
| U+000A | LINE FEED (LF) | <LF> |
| U+000D | CARRIAGE RETURN (CR) | <CR> |
| U+2028 | LINE SEPARATOR | <LS> |
| U+2029 | PARAGRAPH SEPARATOR | <PS> |
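Only the Unicode code points in Table 33 are treated as line terminators. Other new line or line breaking Unicode code points are not treated as line terminators but are treated as white space if they meet the requirements listed in Table 32. The sequence <CR><LF> is commonly used as a line terminator. It should be considered a single SourceCharacter for the purpose of reporting line numbers.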
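The difference between line terminators and white space can be made visible through automatic semicolon insertion (11.9). The following non-normative sketch assumes `new Function` is available and uses a `return` statement only as a convenient probe.

```javascript
// The four line terminator code points.
const lineTerminators = ["\u000A", "\u000D", "\u2028", "\u2029"];

// A line terminator after `return` causes a semicolon to be inserted,
// so the function returns undefined rather than 1.
const afterTerminator = lineTerminators.map((lt) => new Function("return" + lt + "1;")());

// <VT> is white space, not a line terminator, so 1 is returned.
const afterVerticalTab = new Function("return\u000B1;")();

// Line terminators are also matched by \s in regular expressions.
const matchedByWhitespaceClass = lineTerminators.every((lt) => /\s/.test(lt));

console.log(afterTerminator, afterVerticalTab, matchedByWhitespaceClass);
```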
Syntax
11.4 Comments
Comments can be either single or multi-line. Multi-line comments cannot nest.
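Because a single-line comment can contain any Unicode code point except a LineTerminator code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the // marker to the end of the line. However, the LineTerminator at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see 11.9).
Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator code point, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.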
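The effect of a line terminator inside a comment can be sketched as follows. This is an informal illustration that assumes `new Function` is available; the variable names are this example's own.

```javascript
// A multi-line comment without a line terminator behaves like white space.
const sameLine = new Function("return /* no line break */ 1;")();

// A multi-line comment containing a line terminator counts as a LineTerminator,
// so a semicolon is inserted after `return`.
const spansLines = new Function("return /* line\nbreak */ 1;")();

// The LineTerminator that ends a single-line comment is not part of the comment.
const afterSingleLine = new Function("return // comment\n1;")();

console.log(sameLine, spansLines, afterSingleLine);
```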
Syntax
11.5 Tokens
Syntax
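The DivPunctuator, RegularExpressionLiteral, RightBracePunctuator, and TemplateSubstitutionTail productions derive additional tokens that are not included in the CommonToken production.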
11.6 Names and Keywords
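This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an IdentifierName, and the code points U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are permitted anywhere after the first code point of an IdentifierName.
Unicode escape sequences are permitted in an IdentifierName, where they contribute a single Unicode code point to the IdentifierName. The code point is expressed by the HexDigits of the UnicodeEscapeSequence (see 11.8.4). The \ preceding the UnicodeEscapeSequence and the u and { } code units, if they appear, do not contribute code points to the IdentifierName. A UnicodeEscapeSequence cannot be used to put a code point into an IdentifierName that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by the SourceCharacter it contributes, the result must still be a valid IdentifierName that has the exact same sequence of SourceCharacter elements as the original IdentifierName.
Two IdentifierNames that are canonically equivalent according to the Unicode standard are not equal unless, after replacement of each UnicodeEscapeSequence, they are represented by the exact same sequence of code points.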
Syntax
The definition of the nonterminal UnicodeEscapeSequence is given in 11.8.4.
The sets of code points with Unicode properties “ID_Start” and “ID_Continue” include, respectively, the code points with Unicode properties “Other_ID_Start” and “Other_ID_Continue”.
11.6.1 Identifier Names
11.6.1.1 Static Semantics: Early Errors
- It is a Syntax Error if SV(UnicodeEscapeSequence) is none of "$", or "_", or the UTF16Encoding of a code point matched by the UnicodeIDStart lexical grammar production.
- It is a Syntax Error if SV(UnicodeEscapeSequence) is none of "$", or "_", or the UTF16Encoding of either <ZWNJ> or <ZWJ>, or the UTF16Encoding of a Unicode code point that would be matched by the UnicodeIDContinue lexical grammar production.
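The identifier rules of 11.6 and the early errors above can be illustrated with a short sketch. It is not normative and assumes `new Function` is available; `precomposed` and `decomposed` are names introduced here for two canonically equivalent spellings of the same letter.

```javascript
// \u0061 and \u{61} each contribute the code point for "a",
// so all three spellings below name the same binding.
const escapedSum = new Function("var \\u0061 = 7; return a + \\u{61};")();

// Canonically equivalent identifiers with different code point sequences
// are different identifiers.
const precomposed = "\u00E9";  // LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301";  // "e" followed by COMBINING ACUTE ACCENT
const distinct = new Function(
  "var " + precomposed + " = 1; var " + decomposed + " = 2; " +
  "return [" + precomposed + ", " + decomposed + "];"
)();

// An escape cannot introduce a code point that would otherwise be illegal:
// \u0031 is the digit "1", which cannot start an identifier.
let digitStartRejected = false;
try {
  new Function("var \\u0031x = 1;");
} catch (err) {
  digitStartRejected = err instanceof SyntaxError;
}

console.log(escapedSum, distinct, digitStartRejected);
```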
11.6.1.2 Static Semantics: StringValue
- Return the String value consisting of the sequence of code units corresponding to IdentifierName. In determining the sequence any occurrences of \ UnicodeEscapeSequence are first replaced with the code point represented by the UnicodeEscapeSequence and then the code points of the entire IdentifierName are converted to code units by UTF16Encoding each code point.
11.6.2 Reserved Words
A reserved word is an IdentifierName that cannot be used as an Identifier.
Syntax
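The ReservedWord definitions are specified as literal sequences of specific SourceCharacter elements. A code point in a ReservedWord cannot be expressed by a \ UnicodeEscapeSequence.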
11.6.2.1 Keywords
The following tokens are ECMAScript keywords and may not be used as Identifiers in ECMAScript programs.
Syntax
11.6.2.2 Future Reserved Words
The following tokens are reserved for use as keywords in future language extensions.
Syntax
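Use of the following tokens within strict mode code is also reserved. That usage is restricted using static semantic restrictions instead of the lexical grammar: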
| implements | package | protected |
| interface | private | public |
11.7 Punctuators
Syntax
11.8 Literals
11.8.1 Null Literals
Syntax
11.8.2 Boolean Literals
Syntax
11.8.3 Numeric Literals
Syntax
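The SourceCharacter immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.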
For example: 3in is an error and not the two input elements 3 and in.
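The restriction can be observed with a minimal, non-normative sketch. It assumes `new Function` is available; `parses` is a hypothetical helper defined only for this example.

```javascript
// Hypothetical helper: reports whether a function body is syntactically valid.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// An IdentifierStart directly after the numeric literal is an error.
const adjacent = parses("return 3in [];");

// With white space the input is the two input elements 3 and in.
const separatedTokens = parses("return 3 in [];");

console.log(adjacent, separatedTokens);
```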
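A conforming implementation, when processing strict mode code, must not extend the syntax of NumericLiteral to include LegacyOctalIntegerLiteral.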
11.8.3.1 Static Semantics: MV
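A numeric literal stands for a value of the Number type. This value is determined in two steps: first, a mathematical value (MV) is derived from the literal; second, this mathematical value is rounded as described below.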
- The MV of NumericLiteral :: DecimalLiteral is the MV of DecimalLiteral.
- The MV of NumericLiteral :: BinaryIntegerLiteral is the MV of BinaryIntegerLiteral.
- The MV of NumericLiteral :: OctalIntegerLiteral is the MV of OctalIntegerLiteral.
- The MV of NumericLiteral :: HexIntegerLiteral is the MV of HexIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . is the MV of DecimalIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10^(-n)), where n is the number of code points in DecimalDigits.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral × 10^e, where e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10^(-n))) × 10^e, where n is the number of code points in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits × 10^(-n), where n is the number of code points in DecimalDigits.
- The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits × 10^(e-n), where n is the number of code points in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral is the MV of DecimalIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral × 10^e, where e is the MV of ExponentPart.
- The MV of DecimalIntegerLiteral :: 0 is 0.
- The MV of DecimalIntegerLiteral :: NonZeroDigit is the MV of NonZeroDigit.
- The MV of DecimalIntegerLiteral :: NonZeroDigit DecimalDigits is (the MV of NonZeroDigit × 10^n) plus the MV of DecimalDigits, where n is the number of code points in DecimalDigits.
- The MV of DecimalDigits :: DecimalDigit is the MV of DecimalDigit.
- The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits × 10) plus the MV of DecimalDigit.
- The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.
- The MV of SignedInteger :: DecimalDigits is the MV of DecimalDigits.
- The MV of SignedInteger :: + DecimalDigits is the MV of DecimalDigits.
- The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits.
- The MV of DecimalDigit :: 0 or of HexDigit :: 0 or of OctalDigit :: 0 or of BinaryDigit :: 0 is 0.
- The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 or of OctalDigit :: 1 or of BinaryDigit :: 1 is 1.
- The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 or of OctalDigit :: 2 is 2.
- The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 or of OctalDigit :: 3 is 3.
- The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 or of OctalDigit :: 4 is 4.
- The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 or of OctalDigit :: 5 is 5.
- The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 or of OctalDigit :: 6 is 6.
- The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 or of OctalDigit :: 7 is 7.
- The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of HexDigit :: 8 is 8.
- The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of HexDigit :: 9 is 9.
- The MV of HexDigit :: a or of HexDigit :: A is 10.
- The MV of HexDigit :: b or of HexDigit :: B is 11.
- The MV of HexDigit :: c or of HexDigit :: C is 12.
- The MV of HexDigit :: d or of HexDigit :: D is 13.
- The MV of HexDigit :: e or of HexDigit :: E is 14.
- The MV of HexDigit :: f or of HexDigit :: F is 15.
- The MV of BinaryIntegerLiteral :: 0b BinaryDigits is the MV of BinaryDigits.
- The MV of BinaryIntegerLiteral :: 0B BinaryDigits is the MV of BinaryDigits.
- The MV of BinaryDigits :: BinaryDigit is the MV of BinaryDigit.
- The MV of BinaryDigits :: BinaryDigits BinaryDigit is (the MV of BinaryDigits × 2) plus the MV of BinaryDigit.
- The MV of OctalIntegerLiteral :: 0o OctalDigits is the MV of OctalDigits.
- The MV of OctalIntegerLiteral :: 0O OctalDigits is the MV of OctalDigits.
- The MV of OctalDigits :: OctalDigit is the MV of OctalDigit.
- The MV of OctalDigits :: OctalDigits OctalDigit is (the MV of OctalDigits × 8) plus the MV of OctalDigit.
- The MV of HexIntegerLiteral :: 0x HexDigits is the MV of HexDigits.
- The MV of HexIntegerLiteral :: 0X HexDigits is the MV of HexDigits.
- The MV of HexDigits :: HexDigit is the MV of HexDigit.
- The MV of HexDigits :: HexDigits HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.
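Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type. If the MV is 0, then the rounded value is +0; otherwise, the rounded value must be the Number value for the MV, unless the literal is a DecimalLiteral and the literal has more than 20 significant digits, in which case the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th significant digit position. A digit is significant if it is not part of an ExponentPart and
- it is not 0; or
- there is a nonzero digit to its left and there is a nonzero digit, not in the ExponentPart, to its right.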
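The left-recursive MV rules for BinaryDigits, OctalDigits, and HexDigits amount to a left-to-right accumulation. The sketch below is one possible reading of those rules, not a normative algorithm; `mvOfDigits` and `mvOfPrefixedLiteral` are hypothetical names, and only prefixed integer literals small enough to be represented exactly are considered.

```javascript
// MV of a digit sequence: (MV of Digits × radix) plus the MV of the next digit.
function mvOfDigits(digits, radix) {
  if (digits.length === 0) throw new SyntaxError("at least one digit is required");
  let mv = 0;
  for (const ch of digits) {
    // The index in this string is the MV of the digit (a/A is 10 ... f/F is 15).
    const digit = "0123456789abcdef".indexOf(ch.toLowerCase());
    if (digit < 0 || digit >= radix) throw new SyntaxError("invalid digit: " + ch);
    mv = mv * radix + digit;
  }
  return mv;
}

// MV of a BinaryIntegerLiteral, OctalIntegerLiteral, or HexIntegerLiteral.
function mvOfPrefixedLiteral(literal) {
  const radix = { b: 2, o: 8, x: 16 }[literal.charAt(1).toLowerCase()];
  if (literal.charAt(0) !== "0" || radix === undefined) {
    throw new SyntaxError("expected a 0b, 0o, or 0x prefix");
  }
  return mvOfDigits(literal.slice(2), radix);
}

console.log(mvOfPrefixedLiteral("0b1011") === 0b1011);
console.log(mvOfPrefixedLiteral("0o17") === 0o17);
console.log(mvOfPrefixedLiteral("0XfF") === 0xff);
```

For these small integers the MV is exactly representable, so no rounding step is involved.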
11.8.4 String Literals
A string literal is zero or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), U+2028 (LINE SEPARATOR), U+2029 (PARAGRAPH SEPARATOR), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded by the UTF16Encoding operation. Code points belonging to the Basic Multilingual Plane are encoded as a single code unit element of the string. All other code points are encoded as two code unit elements of the string.
Syntax
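A conforming implementation, when processing strict mode code, must not extend the syntax of EscapeSequence to include LegacyOctalEscapeSequence.
The definition of the nonterminal HexDigit is given in 11.8.3.
A line terminator code point cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence. The proper way to cause a line terminator code point to be part of the String value of a string literal is to use an escape sequence such as \n or \u000A.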
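The difference between a LineContinuation and an escape sequence can be seen in a short sketch. This is an informal illustration; the check for an unescaped line break assumes `new Function` is available.

```javascript
// A LineContinuation (backslash followed by a line terminator)
// contributes nothing to the String value.
const continued = "abc\
def";

// An escape sequence is the way to place a line feed in the String value.
const escaped = "abc\ndef";

// An unescaped <LF> inside a string literal is a SyntaxError.
let rawBreakRejected = false;
try {
  new Function('return "abc\ndef";');
} catch (err) {
  rawBreakRejected = err instanceof SyntaxError;
}

console.log(continued === "abcdef", escaped.length, rawBreakRejected);
```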
11.8.4.1 Static Semantics: Early Errors
- It is a Syntax Error if the MV of HexDigits > 0x10FFFF.
11.8.4.2 Static Semantics: StringValue
- Return the String value whose elements are the SV of this StringLiteral.
11.8.4.3 Static Semantics: SV
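A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of code unit values contributed by the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a mathematical value (MV), as described below or in 11.8.3.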
- The SV of StringLiteral :: " " is the empty code unit sequence.
- The SV of StringLiteral :: ' ' is the empty code unit sequence.
- The SV of StringLiteral :: " DoubleStringCharacters " is the SV of DoubleStringCharacters.
- The SV of StringLiteral :: ' SingleStringCharacters ' is the SV of SingleStringCharacters.
- The SV of DoubleStringCharacters :: DoubleStringCharacter is a sequence of one or two code units that is the SV of DoubleStringCharacter.
- The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is a sequence of one or two code units that is the SV of DoubleStringCharacter followed by all the code units in the SV of DoubleStringCharacters in order.
- The SV of SingleStringCharacters :: SingleStringCharacter is a sequence of one or two code units that is the SV of SingleStringCharacter.
- The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a sequence of one or two code units that is the SV of SingleStringCharacter followed by all the code units in the SV of SingleStringCharacters in order.
- The SV of DoubleStringCharacter :: SourceCharacter but not one of " or \ or LineTerminator is the UTF16Encoding of the code point value of SourceCharacter.
- The SV of DoubleStringCharacter :: \ EscapeSequence is the SV of the EscapeSequence.
- The SV of DoubleStringCharacter :: LineContinuation is the empty code unit sequence.
- The SV of SingleStringCharacter :: SourceCharacter but not one of ' or \ or LineTerminator is the UTF16Encoding of the code point value of SourceCharacter.
- The SV of SingleStringCharacter :: \ EscapeSequence is the SV of the EscapeSequence.
- The SV of SingleStringCharacter :: LineContinuation is the empty code unit sequence.
- The SV of EscapeSequence :: CharacterEscapeSequence is the SV of the CharacterEscapeSequence.
- The SV of EscapeSequence :: 0 is the code unit value 0.
- The SV of EscapeSequence :: HexEscapeSequence is the SV of the HexEscapeSequence.
- The SV of EscapeSequence :: UnicodeEscapeSequence is the SV of the UnicodeEscapeSequence.
- The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the code unit whose value is determined by the SingleEscapeCharacter according to Table 34.
| Escape Sequence | Code Unit Value | Unicode Character Name | Symbol |
|---|---|---|---|
| \b | 0x0008 | BACKSPACE | <BS> |
| \t | 0x0009 | CHARACTER TABULATION | <HT> |
| \n | 0x000A | LINE FEED (LF) | <LF> |
| \v | 0x000B | LINE TABULATION | <VT> |
| \f | 0x000C | FORM FEED (FF) | <FF> |
| \r | 0x000D | CARRIAGE RETURN (CR) | <CR> |
| \" | 0x0022 | QUOTATION MARK | " |
| \' | 0x0027 | APOSTROPHE | ' |
| \\ | 0x005C | REVERSE SOLIDUS | \ |
- The SV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of the NonEscapeCharacter.
- The SV of NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator is the UTF16Encoding of the code point value of SourceCharacter.
- The SV of HexEscapeSequence :: x HexDigit HexDigit is the code unit value that is (16 times the MV of the first HexDigit) plus the MV of the second HexDigit.
- The SV of UnicodeEscapeSequence :: u Hex4Digits is the SV of Hex4Digits.
- The SV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the code unit value that is (0x1000 times the MV of the first HexDigit) plus (0x100 times the MV of the second HexDigit) plus (0x10 times the MV of the third HexDigit) plus the MV of the fourth HexDigit.
- The SV of UnicodeEscapeSequence :: u{ HexDigits } is the UTF16Encoding of the MV of HexDigits.
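The SV rules for escape sequences can be cross-checked against a small sketch of UTF-16 encoding. The function `utf16Encoding` below is a hypothetical helper written for this example; it follows the usual surrogate-pair arithmetic and is not the specification's own definition of UTF16Encoding. The early-error check assumes `new Function` is available.

```javascript
// Sketch: encode one code point as one or two UTF-16 code units.
function utf16Encoding(codePoint) {
  if (codePoint <= 0xFFFF) return [codePoint];
  const offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

// \u{1F600} lies outside the Basic Multilingual Plane, so its SV has two code units.
const units = utf16Encoding(0x1F600);
const literal = "\u{1F600}";
const sameUnits =
  literal.length === 2 &&
  literal.charCodeAt(0) === units[0] &&
  literal.charCodeAt(1) === units[1];

// \x41 is (16 × 4) + 1, and \u0041 is the same code unit.
const hexAndUnicodeAgree = "\x41" === "\u0041" && "\u0041" === "A";

// An MV of HexDigits greater than 0x10FFFF is an early error.
let outOfRangeRejected = false;
try {
  new Function('return "\\u{110000}";');
} catch (err) {
  outOfRangeRejected = err instanceof SyntaxError;
}

console.log(sameUnits, hexAndUnicodeAgree, outOfRangeRejected);
```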
11.8.5 Regular Expression Literals
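A regular expression literal is an input element that is converted to a RegExp object each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp or calling the RegExp constructor as a function.
The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the RegularExpressionBody and the RegularExpressionFlags are subsequently parsed again using the more stringent ECMAScript Regular Expression grammar.
An implementation may extend the ECMAScript Regular Expression grammar, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions.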
Syntax
Regular expression literals may not be empty; instead of representing an empty regular expression literal, the code unit sequence // starts a single-line comment. To specify an empty regular expression, use: /(?:)/.
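The evaluation behaviour of regular expression literals can be sketched as follows. This is an informal illustration only; `makePattern` is a hypothetical function introduced for the example.

```javascript
// Two literals with identical contents evaluate to distinct objects.
const first = /ab+c/g;
const second = /ab+c/g;
const literalsDiffer = first !== second;

// The same literal evaluated twice also yields two distinct objects.
function makePattern() {
  return /x/;
}
const evaluationsDiffer = makePattern() !== makePattern();

// An empty regular expression is written /(?:)/, since // starts a comment.
const empty = /(?:)/;
const emptyMatches = empty.test("") && empty.test("anything");

console.log(literalsDiffer, evaluationsDiffer, emptyMatches);
```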
11.8.5.1 Static Semantics: Early Errors
- It is a Syntax Error if IdentifierPart contains a Unicode escape sequence.
11.8.5.2 Static Semantics: BodyText
- Return the source text that was recognized as RegularExpressionBody.
11.8.5.3 Static Semantics: FlagText
- Return the source text that was recognized as RegularExpressionFlags.
11.8.6 Template Literal Lexical Components
Syntax
A conforming implementation must not use the extended definition of EscapeSequence when parsing a TemplateCharacter.
11.8.6.1 Static Semantics: TV and TRV
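A template literal component is interpreted as a sequence of Unicode code points. The Template Value (TV) of a literal component is described in terms of code unit values (SV, 11.8.4) contributed by the various parts of the template literal component. As part of this process, some Unicode code points within the template component are interpreted as having a mathematical value (MV, 11.8.3). In determining a TV, escape sequences are replaced by the code unit(s) of the Unicode code point represented by the escape sequence. The Template Raw Value (TRV) is similar to a Template Value with the difference that in TRVs escape sequences are interpreted literally.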
- The TV and TRV of NoSubstitutionTemplate :: ` ` is the empty code unit sequence.
- The TV and TRV of TemplateHead :: ` ${ is the empty code unit sequence.
- The TV and TRV of TemplateMiddle :: } ${ is the empty code unit sequence.
- The TV and TRV of TemplateTail :: } ` is the empty code unit sequence.
- The TV of NoSubstitutionTemplate :: ` TemplateCharacters ` is the TV of TemplateCharacters.
- The TV of TemplateHead :: ` TemplateCharacters ${ is the TV of TemplateCharacters.
- The TV of TemplateMiddle :: } TemplateCharacters ${ is the TV of TemplateCharacters.
- The TV of TemplateTail :: } TemplateCharacters ` is the TV of TemplateCharacters.
- The TV of TemplateCharacters :: TemplateCharacter is the TV of TemplateCharacter.
- The TV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units in the TV of TemplateCharacter followed by all the code units in the TV of TemplateCharacters in order.
- The TV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the UTF16Encoding of the code point value of SourceCharacter.
- The TV of TemplateCharacter :: $ is the code unit value 0x0024.
- The TV of TemplateCharacter :: \ EscapeSequence is the SV of EscapeSequence.
- The TV of TemplateCharacter :: LineContinuation is the TV of LineContinuation.
- The TV of TemplateCharacter :: LineTerminatorSequence is the TRV of LineTerminatorSequence.
- The TV of LineContinuation :: \ LineTerminatorSequence is the empty code unit sequence.
- The TRV of NoSubstitutionTemplate :: ` TemplateCharacters ` is the TRV of TemplateCharacters.
- The TRV of TemplateHead :: ` TemplateCharacters ${ is the TRV of TemplateCharacters.
- The TRV of TemplateMiddle :: } TemplateCharacters ${ is the TRV of TemplateCharacters.
- The TRV of TemplateTail :: } TemplateCharacters ` is the TRV of TemplateCharacters.
- The TRV of TemplateCharacters :: TemplateCharacter is the TRV of TemplateCharacter.
- The TRV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units in the TRV of TemplateCharacter followed by all the code units in the TRV of TemplateCharacters, in order.
- The TRV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the UTF16Encoding of the code point value of SourceCharacter.
- The TRV of TemplateCharacter :: $ is the code unit value 0x0024.
- The TRV of TemplateCharacter :: \ EscapeSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of EscapeSequence.
- The TRV of TemplateCharacter :: LineContinuation is the TRV of LineContinuation.
- The TRV of TemplateCharacter :: LineTerminatorSequence is the TRV of LineTerminatorSequence.
- The TRV of EscapeSequence :: CharacterEscapeSequence is the TRV of the CharacterEscapeSequence.
- The TRV of EscapeSequence :: 0 is the code unit value 0x0030 (DIGIT ZERO).
- The TRV of EscapeSequence :: HexEscapeSequence is the TRV of the HexEscapeSequence.
- The TRV of EscapeSequence :: UnicodeEscapeSequence is the TRV of the UnicodeEscapeSequence.
- The TRV of CharacterEscapeSequence :: SingleEscapeCharacter is the TRV of the SingleEscapeCharacter.
- The TRV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of the NonEscapeCharacter.
- The TRV of SingleEscapeCharacter :: one of ' " \ b f n r t v is the SV of the SourceCharacter that is that single code point.
- The TRV of HexEscapeSequence :: x HexDigit HexDigit is the sequence consisting of code unit value 0x0078 followed by TRV of the first HexDigit followed by the TRV of the second HexDigit.
- The TRV of UnicodeEscapeSequence :: u Hex4Digits is the sequence consisting of code unit value 0x0075 followed by TRV of Hex4Digits.
- The TRV of UnicodeEscapeSequence :: u{ HexDigits } is the sequence consisting of code unit value 0x0075 followed by code unit value 0x007B followed by TRV of HexDigits followed by code unit value 0x007D.
- The TRV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the sequence consisting of the TRV of the first HexDigit followed by the TRV of the second HexDigit followed by the TRV of the third HexDigit followed by the TRV of the fourth HexDigit.
- The TRV of HexDigits :: HexDigit is the TRV of HexDigit.
- The TRV of HexDigits :: HexDigits HexDigit is the sequence consisting of TRV of HexDigits followed by TRV of HexDigit.
- The TRV of a HexDigit is the SV of the SourceCharacter that is that HexDigit.
- The TRV of LineContinuation :: \ LineTerminatorSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of LineTerminatorSequence.
- The TRV of LineTerminatorSequence :: <LF> is the code unit value 0x000A.
- The TRV of LineTerminatorSequence :: <CR> is the code unit value 0x000A.
- The TRV of LineTerminatorSequence :: <LS> is the code unit value 0x2028.
- The TRV of LineTerminatorSequence :: <PS> is the code unit value 0x2029.
- The TRV of LineTerminatorSequence :: <CR><LF> is the sequence consisting of the code unit value 0x000A.
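TV excludes the code units of LineContinuation while TRV includes them. <CR><LF> and <CR> LineTerminatorSequences are normalized to <LF> for both TV and TRV. An explicit EscapeSequence is needed to include a <CR> or <CR><LF> sequence.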
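TV and TRV are observable through a tag function, whose first argument carries the cooked strings and, in its `raw` property, the raw strings. The following sketch is illustrative only; `cookedAndRaw` is a hypothetical tag function, and the <CR><LF> case assumes `new Function` is available so that the exact code points in the template are under the example's control.

```javascript
// Hypothetical tag function: returns the TV (cooked) and TRV (raw) of the first segment.
function cookedAndRaw(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// Escape sequence: the TV holds <LF>, the TRV holds the backslash and the letter n.
const escapeSequence = cookedAndRaw`a\nb`;

// LineContinuation: excluded from the TV, kept (backslash plus <LF>) in the TRV.
const continuation = cookedAndRaw`a\
b`;

// <CR><LF> inside the template is normalized to <LF> in both values.
const crlf = new Function("tag", "return tag`a\r\nb`;")(cookedAndRaw);

console.log(escapeSequence.cooked.length, escapeSequence.raw.length);
console.log(continuation.cooked === "ab", continuation.raw === "a\\\nb");
console.log(crlf.cooked === "a\nb", crlf.raw === "a\nb");
```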
11.9 Automatic Semicolon Insertion
Most ECMAScript statements and declarations must be terminated with a semicolon. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.
11.9.1 Rules of Automatic Semicolon Insertion
In the following rules, “token” means the actual recognized lexical token determined using the current lexical goal symbol as described in clause 11.
There are three basic rules of semicolon insertion:
- When, as the source text is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
  - The offending token is separated from the previous token by at least one LineTerminator.
  - The offending token is }.
  - The previous token is ) and the inserted semicolon would then be parsed as the terminating semicolon of a do-while statement (13.7.2).
- When, as the source text is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single instance of the goal nonterminal, then a semicolon is automatically inserted at the end of the input stream.
- When, as the source text is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement.
The following are the only restricted productions in the grammar:
The practical effect of these restricted productions is as follows:
- When a ++ or -- token is encountered where the parser would treat it as a postfix operator, and at least one LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically inserted before the ++ or -- token.
- When a continue, break, return, throw, or yield token is encountered and a LineTerminator is encountered before the next token, a semicolon is automatically inserted after the continue, break, return, throw, or yield token.
The resulting practical advice to ECMAScript programmers is:
- A postfix ++ or -- operator should appear on the same line as its operand.
- An Expression in a return or throw statement or an AssignmentExpression in a yield expression should start on the same line as the return, throw, or yield token.
- A LabelIdentifier in a break or continue statement should be on the same line as the break or continue token.
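The practical effect of the restricted productions can be observed with a short, non-normative sketch. It assumes `new Function` is available; the variable names are introduced only for this example.

```javascript
// `a <LF> ++ <LF> b` is parsed as `a; ++b;`, so only b is incremented.
const postfixSplit = new Function("var a = 1, b = 1;\na\n++\nb;\nreturn [a, b];")();

// A LineTerminator after `return` ends the statement; the expression is not returned.
const returnedValue = new Function("return\n1 + 2;")();

// A semicolon inserted after `throw` leaves no expression to throw, which is a SyntaxError.
let throwSplitRejected = false;
try {
  new Function("throw\nnew Error('unreachable');");
} catch (err) {
  throwSplitRejected = err instanceof SyntaxError;
}

console.log(postfixSplit, returnedValue, throwSplitRejected);
```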
11.9.2 Examples of Automatic Semicolon Insertion
The source
{ 1 2 } 3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast, the source
{ 1
2 } 3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{ 1
;2 ;} 3;
which is a valid ECMAScript sentence.
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the semicolon is needed for the header of a for statement. Automatic semicolon insertion never inserts one of the two semicolons in the header of a for statement.
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
The expression a + b is not treated as a value to be returned by the return statement, because a LineTerminator separates it from the token return.
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
The token ++ is not treated as a postfix operator applying to the variable b, because a LineTerminator occurs between b and ++.
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would then be parsed as an empty statement.
The source
a = b + c
(d + e).print()
is not transformed by automatic semicolon insertion, because the parenthesized expression that begins the second line can be interpreted as an argument list for a function call:
a = b + c(d + e).print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.
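The examples in this subclause can be checked mechanically. The following is a minimal sketch under the assumption that `new Function` is available; `parses` is a hypothetical helper, and the bindings `b`, `c`, `d`, and `e` are stand-ins chosen so that the last example can run.

```javascript
// Hypothetical helper: reports whether a function body is syntactically valid.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const outcomes = {
  sameLineBlock: parses("{ 1 2 } 3"),            // not valid, even with insertion
  splitBlock: parses("{ 1\n2 } 3"),              // valid after insertion
  forHeader: parses("for (a; b\n)"),             // header semicolons are never inserted
  emptyIfBody: parses("if (a > b)\nelse c = d"), // would create an empty statement
  prefixIncrement: parses("a = b\n++c"),         // parsed as `a = b; ++c;`
};

// The last example: the second line is taken as an argument list for `c`.
const b = 1;
const d = 1;
const e = 2;
const c = (n) => ({ print: () => n * 10 });
let a;
a = b + c
(d + e).print()

console.log(outcomes, a);
```

Here `a` ends up as `b + c(d + e).print()` rather than `b + c`, which is why an explicit semicolon is advisable before a line that begins with a left parenthesis.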