12 ECMAScript Language: Lexical Grammar
The source text of an ECMAScript
There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple
The use of multiple lexical goals ensures that there are no lexical ambiguities that would affect automatic semicolon insertion. For example, there are no syntactic grammar contexts where both a leading division or division-assignment, and a leading
a = b
/hi/g.exec(c).map(d);
where the first non-whitespace, non-comment code point after a
a = b / hi / g.exec(c).map(d);
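As a small illustration of the division reading, the sketch below evaluates the two-line source with hypothetical bindings for b, hi, c, d, and g (none of these values come from this specification). Because no semicolon is inserted after b, the slashes act as division operators instead of delimiting a regular expression literal.

```javascript
// Hypothetical bindings, chosen only so the expression can be evaluated.
const b = 12;
const hi = 2;
const c = "ignored";
const d = (x) => x;
const g = { exec: () => ({ map: () => 3 }) };

let a;
// No semicolon is inserted after `b`, so these two lines are parsed as
// `a = b / hi / g.exec(c).map(d);`
a = b
/hi/g.exec(c).map(d);

console.log(a); // the value of 12 / 2 / 3
```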
Syntax
12.1 Unicode Format-Control Characters
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals, template literals, and regular expression literals.
U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER) are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In ECMAScript source text these code points may also be used in an
U+FEFF (ZERO WIDTH NO-BREAK SPACE) is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <ZWNBSP> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. In ECMAScript source text <ZWNBSP> code points are treated as white space characters (see
The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in
| Code Point | Name | Abbreviation | Usage |
|---|---|---|---|
| U+200C | ZERO WIDTH NON-JOINER | <ZWNJ> | IdentifierPart |
| U+200D | ZERO WIDTH JOINER | <ZWJ> | IdentifierPart |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> | WhiteSpace |
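The following sketch probes how an engine treats these three code points outside of comments and literals. The helper run is hypothetical and uses the Function constructor only as a convenient way to parse a source string; the results assume an implementation that conforms to this clause.

```javascript
// Parse and run a source string as a function body.
// Returns the result, or the error that was thrown.
function run(src) {
  try {
    return new Function(src)();
  } catch (e) {
    return e;
  }
}

// <ZWNJ> (U+200C) is accepted after the first character of an identifier.
const joined = run("var a\u200Cb = 1; return a\u200Cb;");

// <ZWNJ> as the first character of an identifier is rejected.
const leading = run("var \u200Cb = 1;");

// <ZWNBSP> (U+FEFF) between tokens is treated as white space.
const spaced = run("return\uFEFF41 +\uFEFF1;");
```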
12.2 White Space
White space code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space code points may occur between any two tokens and at the start or end of input. White space code points may occur within a
The ECMAScript white space code points are listed in
| Code Point | Name | Abbreviation |
|---|---|---|
| U+0009 | CHARACTER TABULATION | <TAB> |
| U+000B | LINE TABULATION | <VT> |
| U+000C | FORM FEED (FF) | <FF> |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE | <ZWNBSP> |
| Category “Zs” | Any Unicode “Space_Separator” code point | <USP> |
U+0020 (SPACE) and U+00A0 (NO-BREAK SPACE) code points are part of <USP>.
Other than for the code points listed in
Syntax
12.3 Line Terminators
Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (
A line terminator can occur within a
Line terminators are included in the set of white space code points that are matched by the \s class in regular expressions.
The ECMAScript line terminator code points are listed in
| Code Point | Unicode Name | Abbreviation |
|---|---|---|
| U+000A | LINE FEED (LF) | <LF> |
| U+000D | CARRIAGE RETURN (CR) | <CR> |
| U+2028 | LINE SEPARATOR | <LS> |
| U+2029 | PARAGRAPH SEPARATOR | <PS> |
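The statement that \s matches both white space and line terminators can be checked with a short sketch. The helper name isRegExpSpace is an assumption made for this example; U+200B is included as a contrasting code point because it belongs to category “Cf”, not “Zs”.

```javascript
// True when the string is exactly one code point matched by the \s class.
const isRegExpSpace = (ch) => /^\s$/u.test(ch);

const lineTerminators = ["\u000A", "\u000D", "\u2028", "\u2029"];
const whiteSpace = ["\u0009", "\u000B", "\u000C", "\u0020", "\u00A0", "\uFEFF"];

// Every listed line terminator and white space code point matches \s.
const allMatch = [...lineTerminators, ...whiteSpace].every(isRegExpSpace);

// U+200B ZERO WIDTH SPACE is a format-control character, not white space.
const zwspMatches = isRegExpSpace("\u200B");
```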
Only the Unicode code points in
Syntax
12.4 Comments
Comments can be either single or multi-line. Multi-line comments cannot nest.
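Because a single-line comment can contain any Unicode code point except a LineTerminator code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the // marker to the end of the line. However, the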
Comments behave like white space and are discarded except that, if a
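A brief sketch of how comments interact with parsing. It assumes the ECMAScript rule that a multi-line comment containing a line terminator is treated as a LineTerminator by the syntactic grammar, and it uses a hypothetical helper run built on the Function constructor.

```javascript
// Parse and run a source string as a function body.
// Returns the result, or the error that was thrown.
function run(src) {
  try {
    return new Function(src)();
  } catch (e) {
    return e;
  }
}

// A multi-line comment that stays on one line behaves like white space.
const sameLine = run("return /* note */ 1;");

// A multi-line comment that spans a line break counts as a line terminator,
// so a semicolon is inserted after `return`.
const acrossLines = run("return /* note\n */ 1;");

// Comments do not nest: the first `*/` ends the comment and the
// remaining text is parsed as code.
const nested = run("/* outer /* inner */ still outer? */");
```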
Syntax
A number of productions in this section are given alternative definitions in section
12.5 Tokens
Syntax
The
12.6 Names and Keywords
This standard specifies specific code point additions: U+0024 (DOLLAR SIGN) and U+005F (LOW LINE) are permitted anywhere in an
Syntax
The definitions of the nonterminal
The nonterminal _ via
The sets of code points with Unicode properties “ID_Start” and “ID_Continue” include, respectively, the code points with Unicode properties “Other_ID_Start” and “Other_ID_Continue”.
12.6.1 Identifier Names
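Unicode escape sequences are permitted in an IdentifierName, where they contribute a single Unicode code point equal to the IdentifierCodePoint of the UnicodeEscapeSequence. The \ preceding the UnicodeEscapeSequence and the u and { } code units, if they appear, do not contribute code points to the IdentifierName.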
Two
12.6.1.1 Static Semantics: Early Errors
- It is a Syntax Error if IdentifierCodePoint of UnicodeEscapeSequence is not some Unicode code point matched by the IdentifierStartChar lexical grammar production.
- It is a Syntax Error if IdentifierCodePoint of UnicodeEscapeSequence is not some Unicode code point matched by the IdentifierPartChar lexical grammar production.
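The effect of these early errors, and of the $ and _ additions, can be observed with a small sketch. The helper run is hypothetical; the escaped sources are passed as strings so that the escape sequences reach the parser unchanged.

```javascript
// Parse and run a source string as a function body.
// Returns the result, or the error that was thrown.
function run(src) {
  try {
    return new Function(src)();
  } catch (e) {
    return e;
  }
}

// \u0062 is "b", so `a\u0062` and `ab` name the same binding.
const fourDigit = run("var a\\u0062 = 5; return ab;");

// The \u{...} form may also start an identifier: \u{61} is "a".
const braced = run("var \\u{61}b = 7; return ab;");

// \u0031 is "1", which IdentifierStartChar does not match.
const badStart = run("var \\u0031x = 1;");

// U+0024 and U+005F are permitted anywhere in an identifier.
const dollarUnderscore = run("var $_$ = 3; return $_$;");
```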
12.6.1.2 Static Semantics: IdentifierCodePoints
The
- Let cp be IdentifierCodePoint of IdentifierStart.
- Return « cp ».
- Let cps be IdentifierCodePoints of the derived IdentifierName.
- Let cp be IdentifierCodePoint of IdentifierPart.
- Return the list-concatenation of cps and « cp ».
12.6.1.3 Static Semantics: IdentifierCodePoint
The
- Return the code point matched by IdentifierStartChar.
- Return the code point matched by IdentifierPartChar.
- Return the code point whose numeric value is the MV of Hex4Digits.
- Return the code point whose numeric value is the MV of CodePoint.
12.6.2 Keywords and Reserved Words
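A keyword is a token that matches IdentifierName, but also has a syntactic use; that is, it appears literally, in a fixed width font, in some syntactic production. The keywords of ECMAScript include if, while, async, await, and many others.
A reserved word is an IdentifierName that cannot be used as an identifier. Many keywords are reserved words, but some are not, and some are reserved only in certain contexts. if and while are reserved words. await is reserved only inside async functions and modules. async is not reserved; it can be used as a variable name or statement label without restriction.
This specification uses a combination of grammatical productions and early error rules to specify which names are valid identifiers and which are reserved words. All tokens in the ReservedWord list below, except for await and yield, are unconditionally reserved. Exceptions for await and yield are specified in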
- Those that are always allowed as identifiers, and are not keywords, such as Math, window, toString, and _;
- Those that are never allowed as identifiers, namely the ReservedWords listed below except await and yield;
- Those that are contextually allowed as identifiers, namely await and yield;
- Those that are contextually disallowed as identifiers, in strict mode code: let, static, implements, interface, package, private, protected, and public;
- Those that are always allowed as identifiers, but also appear as keywords within certain syntactic productions, at places where Identifier is not allowed: as, async, from, get, meta, of, set, and target.
The term conditional keyword, or contextual keyword, is sometimes used to refer to the keywords that fall in the last three categories, and thus can be used as identifiers in some contexts and as keywords in others.
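The categories above can be probed with a short sketch. The helper parses is hypothetical; it relies on the Function constructor, which parses its argument as non-strict function body code unless the body begins with a "use strict" directive.

```javascript
// True when the source parses as a function body, false on a SyntaxError.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const results = {
  // Always allowed as identifiers, although keywords in some productions.
  contextualKeywords: parses("var async = 1, of = 2, get = 3, target = 4;"),
  // Never allowed as an identifier.
  reservedWord: parses("var if = 1;"),
  // `await` is an ordinary identifier outside async functions and modules.
  awaitInPlainFunction: parses("var await = 1;"),
  awaitInAsyncFunction: parses("async function f() { var await = 1; }"),
  // `let` is disallowed as an identifier only in strict mode code.
  letInNonStrictCode: parses("var let = 1;"),
  letInStrictCode: parses("'use strict'; var let = 1;"),
  // A reserved word cannot be smuggled in through an escape sequence.
  escapedElse: parses("var els\\u{65} = 1;"),
};
```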
Syntax
Per \
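An IdentifierName can contain \ UnicodeEscapeSequences, but it is not possible to declare a variable named "else" by spelling it els\u{65}. The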
enum is not currently used as a keyword in this specification. It is a future reserved word, set aside for use as a keyword in future language extensions.
Similarly, implements, interface, package, private, protected, and public are future reserved words in
12.7 Punctuators
Syntax
12.8 Literals
12.8.1 Null Literals
Syntax
12.8.2 Boolean Literals
Syntax
12.8.3 Numeric Literals
Syntax
The
For example: 3in is an error and not the two input elements 3 and in.
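A minimal sketch of the 3in example. The helper run is hypothetical and simply reports either the value produced or the error thrown while parsing.

```javascript
// Parse and run a source string as a function body.
// Returns the result, or the error that was thrown.
function run(src) {
  try {
    return new Function(src)();
  } catch (e) {
    return e;
  }
}

// `3in` is not split into the two input elements `3` and `in`.
const fused = run("return 3in [1, 2, 3, 4];");

// With white space the numeric literal and the `in` keyword are separate tokens.
const separated = run("return 3 in [1, 2, 3, 4];");
```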
12.8.3.1 Static Semantics: Early Errors
- It is a Syntax Error if the source text matched by this production is strict mode code.
12.8.3.2 Static Semantics: MV
A numeric literal stands for a value of the Number type or the BigInt type.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10^(-n)), where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral × 10^e, where e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits × 10^(-n))) × 10^e, where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator, and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits × 10^(-n), where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits × 10^(e - n), where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator, and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral × 10^e, where e is the MV of ExponentPart.
- The MV of DecimalIntegerLiteral :: 0 is 0.
- The MV of DecimalIntegerLiteral :: NonZeroDigit NumericLiteralSeparator opt DecimalDigits is (the MV of NonZeroDigit × 10^n) plus the MV of DecimalDigits, where n is the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits × 10) plus the MV of DecimalDigit.
- The MV of DecimalDigits :: DecimalDigits NumericLiteralSeparator DecimalDigit is (the MV of DecimalDigits × 10) plus the MV of DecimalDigit.
- The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.
- The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits.
- The MV of DecimalDigit :: 0 or of HexDigit :: 0 or of OctalDigit :: 0 or of LegacyOctalEscapeSequence :: 0 or of BinaryDigit :: 0 is 0.
- The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 or of OctalDigit :: 1 or of BinaryDigit :: 1 is 1.
- The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 or of OctalDigit :: 2 is 2.
- The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 or of OctalDigit :: 3 is 3.
- The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 or of OctalDigit :: 4 is 4.
- The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 or of OctalDigit :: 5 is 5.
- The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 or of OctalDigit :: 6 is 6.
- The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 or of OctalDigit :: 7 is 7.
- The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of NonOctalDigit :: 8 or of HexDigit :: 8 is 8.
- The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of NonOctalDigit :: 9 or of HexDigit :: 9 is 9.
- The MV of HexDigit :: a or of HexDigit :: A is 10.
- The MV of HexDigit :: b or of HexDigit :: B is 11.
- The MV of HexDigit :: c or of HexDigit :: C is 12.
- The MV of HexDigit :: d or of HexDigit :: D is 13.
- The MV of HexDigit :: e or of HexDigit :: E is 14.
- The MV of HexDigit :: f or of HexDigit :: F is 15.
- The MV of BinaryDigits :: BinaryDigits BinaryDigit is (the MV of BinaryDigits × 2) plus the MV of BinaryDigit.
- The MV of BinaryDigits :: BinaryDigits NumericLiteralSeparator BinaryDigit is (the MV of BinaryDigits × 2) plus the MV of BinaryDigit.
- The MV of OctalDigits :: OctalDigits OctalDigit is (the MV of OctalDigits × 8) plus the MV of OctalDigit.
- The MV of OctalDigits :: OctalDigits NumericLiteralSeparator OctalDigit is (the MV of OctalDigits × 8) plus the MV of OctalDigit.
- The MV of LegacyOctalIntegerLiteral :: LegacyOctalIntegerLiteral OctalDigit is (the MV of LegacyOctalIntegerLiteral times 8) plus the MV of OctalDigit.
- The MV of NonOctalDecimalIntegerLiteral :: LegacyOctalLikeDecimalIntegerLiteral NonOctalDigit is (the MV of LegacyOctalLikeDecimalIntegerLiteral times 10) plus the MV of NonOctalDigit.
- The MV of NonOctalDecimalIntegerLiteral :: NonOctalDecimalIntegerLiteral DecimalDigit is (the MV of NonOctalDecimalIntegerLiteral times 10) plus the MV of DecimalDigit.
- The MV of LegacyOctalLikeDecimalIntegerLiteral :: LegacyOctalLikeDecimalIntegerLiteral OctalDigit is (the MV of LegacyOctalLikeDecimalIntegerLiteral times 10) plus the MV of OctalDigit.
- The MV of HexDigits :: HexDigits HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.
- The MV of HexDigits :: HexDigits NumericLiteralSeparator HexDigit is (the MV of HexDigits × 16) plus the MV of HexDigit.
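As a worked check of a few of these rules, the sketch below compares literals with the arithmetic the rules describe. The table name examples and its layout are assumptions made for illustration; the values were chosen so that every intermediate result is exactly representable as a Number.

```javascript
// Each entry: [source spelling, literal value, value computed by hand from the MV rules].
const examples = [
  // DecimalIntegerLiteral . DecimalDigits: 1 plus (25 × 10^(-2))
  ["1.25", 1.25, 1 + 25 / 10 ** 2],
  // DecimalIntegerLiteral ExponentPart: 2 × 10^3
  ["2e3", 2e3, 2 * 10 ** 3],
  // . DecimalDigits ExponentPart: 5 × 10^(1 - 1)
  [".5e1", .5e1, 5 * 10 ** (1 - 1)],
  // NonZeroDigit NumericLiteralSeparator DecimalDigits: (1 × 10^3) plus 0
  ["1_000", 1_000, 1 * 10 ** 3 + 0],
  // BinaryDigits: repeatedly (MV × 2) plus the next digit
  ["0b1010", 0b1010, ((1 * 2 + 0) * 2 + 1) * 2 + 0],
  // OctalDigits: (1 × 8) plus 7
  ["0o17", 0o17, 1 * 8 + 7],
  // HexDigits: (15 × 16) plus 15
  ["0xFF", 0xFF, 15 * 16 + 15],
];

const allAgree = examples.every(([, literal, byHand]) => literal === byHand);
```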
12.8.3.3 Static Semantics: NumericValue
The
- Return RoundMVResult(MV of DecimalLiteral).
- Return 𝔽(MV of NonDecimalIntegerLiteral).
- Return 𝔽(MV of LegacyOctalIntegerLiteral).
- Return the BigInt value that represents the MV of NonDecimalIntegerLiteral.
- Return 0ℤ.
- Return the BigInt value that represents the MV of NonZeroDigit.
- Let n be the number of code points in DecimalDigits, excluding all occurrences of NumericLiteralSeparator.
- Let mv be (the MV of NonZeroDigit × 10^n) plus the MV of DecimalDigits.
- Return ℤ(mv).
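The difference between the Number and BigInt results can be seen in a short sketch. The variable names are assumptions for this example, and the legacy octal forms are evaluated through the Function constructor so that they are parsed as non-strict code.

```javascript
// The same digits produce a Number or a BigInt depending on the n suffix.
const asNumber = 0x10;
const asBigInt = 0x10n;

// 2^53 + 1 cannot be represented as a Number, so the MV is rounded.
const rounded = 9007199254740993;
// The BigInt value represents the MV exactly.
const exact = 9_007_199_254_740_993n;

// Legacy forms, available only in non-strict code:
// 017 is a LegacyOctalIntegerLiteral, 089 a NonOctalDecimalIntegerLiteral.
const legacy = new Function("return [017, 089];")();
```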
12.8.4 String Literals
A string literal is 0 or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded as defined in
Syntax
The definition of the nonterminal
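<LF> and <CR> cannot appear in a string literal, except as part of a LineContinuation to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as \n or \u000A.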
12.8.4.1 Static Semantics: Early Errors
- It is a Syntax Error if the source text matched by this production is strict mode code.
It is possible for string literals to precede a
function invalid() { "\7"; "use strict"; }
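The example above can be checked mechanically. In this sketch the helper parses is hypothetical, and the function named valid is an added contrast case that is not part of the original example.

```javascript
// True when the source parses as a function body, false on a SyntaxError.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// The "use strict" directive follows the octal escape, yet it still places
// the whole function body, including the earlier literal, in strict mode code.
const withDirective = parses('function invalid() { "\\7"; "use strict"; }');

// Without the directive the same escape is accepted in non-strict code.
const withoutDirective = parses('function valid() { "\\7"; }');
```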
12.8.4.2 Static Semantics: SV
The
A string literal stands for a value of the String type. SV produces String values for string literals through recursive application on the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a
- The SV of StringLiteral :: " " is the empty String.
- The SV of StringLiteral :: ' ' is the empty String.
- The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is the string-concatenation of the SV of DoubleStringCharacter and the SV of DoubleStringCharacters.
- The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is the string-concatenation of the SV of SingleStringCharacter and the SV of SingleStringCharacters.
- The SV of DoubleStringCharacter :: SourceCharacter but not one of " or \ or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by SourceCharacter.
- The SV of DoubleStringCharacter :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
- The SV of DoubleStringCharacter :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
- The SV of DoubleStringCharacter :: LineContinuation is the empty String.
- The SV of SingleStringCharacter :: SourceCharacter but not one of ' or \ or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by SourceCharacter.
- The SV of SingleStringCharacter :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
- The SV of SingleStringCharacter :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
- The SV of SingleStringCharacter :: LineContinuation is the empty String.
- The SV of EscapeSequence :: 0 is the String value consisting of the code unit 0x0000 (NULL).
- The SV of CharacterEscapeSequence :: SingleEscapeCharacter is the String value consisting of the code unit whose value is determined by the SingleEscapeCharacter according to Table 40.
| Escape Sequence | Code Unit Value | Unicode Character Name | Symbol |
|---|---|---|---|
| \b | 0x0008 | BACKSPACE | <BS> |
| \t | 0x0009 | CHARACTER TABULATION | <HT> |
| \n | 0x000A | LINE FEED (LF) | <LF> |
| \v | 0x000B | LINE TABULATION | <VT> |
| \f | 0x000C | FORM FEED (FF) | <FF> |
| \r | 0x000D | CARRIAGE RETURN (CR) | <CR> |
| \" | 0x0022 | QUOTATION MARK | " |
| \' | 0x0027 | APOSTROPHE | ' |
| \\ | 0x005C | REVERSE SOLIDUS | \ |
- The SV of NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by SourceCharacter.
- The SV of EscapeSequence :: LegacyOctalEscapeSequence is the String value consisting of the code unit whose value is the MV of LegacyOctalEscapeSequence.
- The SV of NonOctalDecimalEscapeSequence :: 8 is the String value consisting of the code unit 0x0038 (DIGIT EIGHT).
- The SV of NonOctalDecimalEscapeSequence :: 9 is the String value consisting of the code unit 0x0039 (DIGIT NINE).
- The SV of HexEscapeSequence :: x HexDigit HexDigit is the String value consisting of the code unit whose value is the MV of HexEscapeSequence.
- The SV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the String value consisting of the code unit whose value is the MV of Hex4Digits.
- The SV of UnicodeEscapeSequence :: u{ CodePoint } is the result of performing UTF16EncodeCodePoint on the MV of CodePoint.
- The SV of TemplateEscapeSequence :: 0 is the String value consisting of the code unit 0x0000 (NULL).
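A few of the SV rules can be observed directly on string literals. The object name sv and its property names are assumptions made for this sketch; the continuation entry deliberately breaks the literal across two source lines.

```javascript
const sv = {
  hex: "\x41",              // HexEscapeSequence
  unicode4: "\u0041",       // u Hex4Digits
  codePoint: "\u{1F600}",   // u{ CodePoint }, encoded as a surrogate pair
  single: "\n",             // SingleEscapeCharacter, see the escape table
  nonEscape: "\q",          // NonEscapeCharacter contributes itself
  nul: "\0",                // EscapeSequence :: 0
  continuation: "a\
b",                         // LineContinuation contributes the empty String
};
```

Reading the code units back (for example with charCodeAt) shows that the escapes produce the code units described by the rules, and that a code point above U+FFFF yields two code units.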
12.8.4.3 Static Semantics: MV
- The MV of LegacyOctalEscapeSequence :: ZeroToThree OctalDigit is (8 times the MV of ZeroToThree) plus the MV of OctalDigit.
- The MV of LegacyOctalEscapeSequence :: FourToSeven OctalDigit is (8 times the MV of FourToSeven) plus the MV of OctalDigit.
- The MV of LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit is (64 (that is, 8^2) times the MV of ZeroToThree) plus (8 times the MV of the first OctalDigit) plus the MV of the second OctalDigit.
- The MV of ZeroToThree :: 0 is 0.
- The MV of ZeroToThree :: 1 is 1.
- The MV of ZeroToThree :: 2 is 2.
- The MV of ZeroToThree :: 3 is 3.
- The MV of FourToSeven :: 4 is 4.
- The MV of FourToSeven :: 5 is 5.
- The MV of FourToSeven :: 6 is 6.
- The MV of FourToSeven :: 7 is 7.
- The MV of HexEscapeSequence :: x HexDigit HexDigit is (16 times the MV of the first HexDigit) plus the MV of the second HexDigit.
- The MV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is (0x1000 × the MV of the first HexDigit) plus (0x100 × the MV of the second HexDigit) plus (0x10 × the MV of the third HexDigit) plus the MV of the fourth HexDigit.
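The arithmetic in these rules can be replayed on concrete escapes. The variable names below are assumptions for this sketch, and the legacy octal escapes are evaluated through the Function constructor because they are permitted only in non-strict code.

```javascript
// ZeroToThree OctalDigit OctalDigit: "\101"
const threeDigitOctal = new Function('return "\\101";')();
const threeDigitByHand = 64 * 1 + 8 * 0 + 1;

// FourToSeven OctalDigit: "\47"
const twoDigitOctal = new Function('return "\\47";')();
const twoDigitByHand = 8 * 4 + 7;

// x HexDigit HexDigit: "\x4A"
const hexEscape = "\x4A";
const hexByHand = 16 * 4 + 10;

// Hex4Digits: "\u004A"
const hex4 = "\u004A";
const hex4ByHand = 0x1000 * 0 + 0x100 * 0 + 0x10 * 4 + 10;
```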
12.8.5 Regular Expression Literals
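A regular expression literal is an input element that is converted to a RegExp object each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp or calling the RegExp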
The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The source text comprising the
An implementation may extend the ECMAScript Regular Expression grammar defined in
Syntax
Regular expression literals may not be empty; instead of representing an empty regular expression literal, the code unit sequence // starts a single-line comment. To specify an empty regular expression, use: /(?:)/.
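A short sketch of the two points above: identity of objects produced by regular expression literals, and the spelling of an empty regular expression. The names make, first, second, and empty are assumptions made for illustration.

```javascript
// Each evaluation of the literal produces a new RegExp object.
const make = () => /ab+c/g;
const first = make();
const second = make();

const distinct = first !== second;
const sameContents = first.source === second.source && first.flags === second.flags;

// `//` would start a comment, so an empty pattern is written /(?:)/.
const empty = /(?:)/;
const matchesEmptyString = empty.test("");
// A RegExp created at runtime from an empty pattern reports the same source.
const runtimeSource = new RegExp("").source;
```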
12.8.5.1 Static Semantics: BodyText
The
- Return the source text that was recognized as RegularExpressionBody.
12.8.5.2 Static Semantics: FlagText
The
- Return the source text that was recognized as RegularExpressionFlags.
12.8.6 Template Literal Lexical Components
Syntax
12.8.6.1 Static Semantics: TV
The
- The TV of NoSubstitutionTemplate :: ` ` is the empty String.
- The TV of TemplateHead :: ` ${ is the empty String.
- The TV of TemplateMiddle :: } ${ is the empty String.
- The TV of TemplateTail :: } ` is the empty String.
- The TV of TemplateCharacters :: TemplateCharacter TemplateCharacters is undefined if either the TV of TemplateCharacter is undefined or the TV of TemplateCharacters is undefined. Otherwise, it is the string-concatenation of the TV of TemplateCharacter and the TV of TemplateCharacters.
- The TV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by SourceCharacter.
- The TV of TemplateCharacter :: $ is the String value consisting of the code unit 0x0024 (DOLLAR SIGN).
- The TV of TemplateCharacter :: \ TemplateEscapeSequence is the SV of TemplateEscapeSequence.
- The TV of TemplateCharacter :: \ NotEscapeSequence is undefined.
- The TV of TemplateCharacter :: LineTerminatorSequence is the TRV of LineTerminatorSequence.
- The TV of LineContinuation :: \ LineTerminatorSequence is the empty String.
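In a tagged template the TV is what a tag function receives as the cooked string, so the undefined case can be observed directly. The tag function cookedAndRaw is a hypothetical helper written for this sketch.

```javascript
// A tag function that reports the cooked (TV) and raw (TRV) values
// of the first template segment.
const cookedAndRaw = (strings) => ({ cooked: strings[0], raw: strings.raw[0] });

// Valid escapes: the TV is the SV of each escape sequence.
const valid = cookedAndRaw`\x41\n$`;

// `\xZZ` matches NotEscapeSequence, so the TV of the whole segment is undefined.
const invalid = cookedAndRaw`ok \xZZ`;
```

The raw value remains available in both cases, which is what allows a tag function to give its own meaning to sequences such as \xZZ.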
12.8.6.2 Static Semantics: TRV
The
- The TRV of NoSubstitutionTemplate :: ` ` is the empty String.
- The TRV of TemplateHead :: ` ${ is the empty String.
- The TRV of TemplateMiddle :: } ${ is the empty String.
- The TRV of TemplateTail :: } ` is the empty String.
- The TRV of TemplateCharacters :: TemplateCharacter TemplateCharacters is the string-concatenation of the TRV of TemplateCharacter and the TRV of TemplateCharacters.
- The TRV of TemplateCharacter :: SourceCharacter but not one of ` or \ or $ or LineTerminator is the result of performing UTF16EncodeCodePoint on the code point matched by SourceCharacter.
- The TRV of TemplateCharacter :: $ is the String value consisting of the code unit 0x0024 (DOLLAR SIGN).
- The TRV of TemplateCharacter :: \ TemplateEscapeSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of TemplateEscapeSequence.
- The TRV of TemplateCharacter :: \ NotEscapeSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of NotEscapeSequence.
- The TRV of TemplateEscapeSequence :: 0 is the String value consisting of the code unit 0x0030 (DIGIT ZERO).
- The TRV of NotEscapeSequence :: 0 DecimalDigit is the string-concatenation of the code unit 0x0030 (DIGIT ZERO) and the TRV of DecimalDigit.
- The TRV of NotEscapeSequence :: x [lookahead ∉ HexDigit] is the String value consisting of the code unit 0x0078 (LATIN SMALL LETTER X).
- The TRV of NotEscapeSequence :: x HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0078 (LATIN SMALL LETTER X) and the TRV of HexDigit.
- The TRV of NotEscapeSequence :: u [lookahead ∉ HexDigit][lookahead ≠ {] is the String value consisting of the code unit 0x0075 (LATIN SMALL LETTER U).
- The TRV of NotEscapeSequence :: u HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U) and the TRV of HexDigit.
- The TRV of NotEscapeSequence :: u HexDigit HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the TRV of the first HexDigit, and the TRV of the second HexDigit.
- The TRV of NotEscapeSequence :: u HexDigit HexDigit HexDigit [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the TRV of the first HexDigit, the TRV of the second HexDigit, and the TRV of the third HexDigit.
- The TRV of NotEscapeSequence :: u { [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U) and the code unit 0x007B (LEFT CURLY BRACKET).
- The TRV of NotEscapeSequence :: u { NotCodePoint [lookahead ∉ HexDigit] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the code unit 0x007B (LEFT CURLY BRACKET), and the TRV of NotCodePoint.
- The TRV of NotEscapeSequence :: u { CodePoint [lookahead ∉ HexDigit][lookahead ≠ }] is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the code unit 0x007B (LEFT CURLY BRACKET), and the TRV of CodePoint.
- The TRV of DecimalDigit :: one of 0 1 2 3 4 5 6 7 8 9 is the result of performing UTF16EncodeCodePoint on the single code point matched by this production.
- The TRV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of NonEscapeCharacter.
- The TRV of SingleEscapeCharacter :: one of ' " \ b f n r t v is the result of performing UTF16EncodeCodePoint on the single code point matched by this production.
- The TRV of HexEscapeSequence :: x HexDigit HexDigit is the string-concatenation of the code unit 0x0078 (LATIN SMALL LETTER X), the TRV of the first HexDigit, and the TRV of the second HexDigit.
- The TRV of UnicodeEscapeSequence :: u Hex4Digits is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U) and the TRV of Hex4Digits.
- The TRV of UnicodeEscapeSequence :: u{ CodePoint } is the string-concatenation of the code unit 0x0075 (LATIN SMALL LETTER U), the code unit 0x007B (LEFT CURLY BRACKET), the TRV of CodePoint, and the code unit 0x007D (RIGHT CURLY BRACKET).
- The TRV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the string-concatenation of the TRV of the first HexDigit, the TRV of the second HexDigit, the TRV of the third HexDigit, and the TRV of the fourth HexDigit.
- The TRV of HexDigits :: HexDigits HexDigit is the string-concatenation of the TRV of HexDigits and the TRV of HexDigit.
- The TRV of HexDigit :: one of 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F is the result of performing UTF16EncodeCodePoint on the single code point matched by this production.
- The TRV of LineContinuation :: \ LineTerminatorSequence is the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS) and the TRV of LineTerminatorSequence.
- The TRV of LineTerminatorSequence :: <LF> is the String value consisting of the code unit 0x000A (LINE FEED).
- The TRV of LineTerminatorSequence :: <CR> is the String value consisting of the code unit 0x000A (LINE FEED).
- The TRV of LineTerminatorSequence :: <LS> is the String value consisting of the code unit 0x2028 (LINE SEPARATOR).
- The TRV of LineTerminatorSequence :: <PS> is the String value consisting of the code unit 0x2029 (PARAGRAPH SEPARATOR).
- The TRV of LineTerminatorSequence :: <CR> <LF> is the String value consisting of the code unit 0x000A (LINE FEED).
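The raw strings passed to a tag function correspond to the TRV, so the normalization of <CR> and <CR><LF> to <LF> can be checked with a sketch. The helpers rawOf and evalWith are hypothetical; evalWith exists only so that template source containing specific line terminators can be built from a string.

```javascript
// Tag function returning the raw (TRV) value of the first template segment.
const rawOf = (strings) => strings.raw[0];

// Evaluate a function body that has access to rawOf.
const evalWith = (body) => new Function("rawOf", body)(rawOf);

// <LF>, <CR>, and <CR><LF> inside a template all yield the code unit 0x000A.
const lf = evalWith("return rawOf`a\nb`;");
const cr = evalWith("return rawOf`a\rb`;");
const crlf = evalWith("return rawOf`a\r\nb`;");

// Escape sequences are preserved verbatim, including the REVERSE SOLIDUS.
const escapes = rawOf`\n\u{41}\x41`;

// A line continuation keeps its backslash followed by a normalized <LF>.
const continuation = evalWith("return rawOf`a\\\r\nb`;");
```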
12.9 Automatic Semicolon Insertion
Most ECMAScript statements and declarations must be terminated with a semicolon. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.
12.9.1 Rules of Automatic Semicolon Insertion
In the following rules, “token” means the actual recognized lexical token determined using the current lexical
There are three basic rules of semicolon insertion:
- When, as the source text is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
  - The offending token is separated from the previous token by at least one LineTerminator.
  - The offending token is }.
  - The previous token is ) and the inserted semicolon would then be parsed as the terminating semicolon of a do-while statement (14.7.2).
- When, as the source text is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single instance of the goal nonterminal, then a semicolon is automatically inserted at the end of the input stream.
- When, as the source text is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see
The following are the only restricted productions in the grammar:
The practical effect of these restricted productions is as follows:
- When a ++ or -- token is encountered where the parser would treat it as a postfix operator, and at least one LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically inserted before the ++ or -- token.
- When a continue, break, return, throw, or yield token is encountered and a LineTerminator is encountered before the next token, a semicolon is automatically inserted after the continue, break, return, throw, or yield token.
- When arrow function parameter(s) are followed by a LineTerminator before a => token, a semicolon is automatically inserted and the punctuator causes a syntax error.
- When an async token is followed by a LineTerminator before a function or IdentifierName or ( token, a semicolon is automatically inserted and the async token is not treated as part of the same expression or class element as the following tokens.
- When an async token is followed by a LineTerminator before a * token, a semicolon is automatically inserted and the punctuator causes a syntax error.
The resulting practical advice to ECMAScript programmers is:
- A postfix ++ or -- operator should be on the same line as its operand.
- An Expression in a return or throw statement or an AssignmentExpression in a yield expression should start on the same line as the return, throw, or yield token.
- A LabelIdentifier in a break or continue statement should be on the same line as the break or continue token.
- The end of an arrow function's parameter(s) and its => should be on the same line.
- The async token preceding an asynchronous function or method should be on the same line as the immediately following token.
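The restricted productions can be exercised with a short sketch. The helper run is hypothetical and evaluates each source string as a non-strict function body; the sources themselves are illustrative and not taken from this specification.

```javascript
// Parse and run a source string as a function body.
// Returns the result, or the error that was thrown.
function run(src) {
  try {
    return new Function(src)();
  } catch (e) {
    return e;
  }
}

// A line break after `return` ends the statement before its operand.
const returned = run("return\n42;");

// A line break before `++` makes it a prefix operator on the next operand.
const counters = run("var a = 1, b = 2;\na\n++\nb;\nreturn [a, b];");

// A line break between arrow parameters and `=>` is a syntax error.
const arrow = run("var f = (x)\n=> x;");

// `throw` followed by a line break leaves the statement without an operand.
const thrown = run("throw\nnew Error('unreachable');");
```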
12.9.2 Examples of Automatic Semicolon Insertion
The source
{ 1 2 } 3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast, the source
{ 1
2 } 3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{ 1
;2 ;} 3;
which is a valid ECMAScript sentence.
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the semicolon is needed for the header of a for statement. Automatic semicolon insertion never inserts one of the two semicolons in the header of a for statement.
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
The expression a + b is not treated as a value to be returned by the return statement, because a LineTerminator separates it from the token return.
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
The token ++ is not treated as a postfix operator applying to the variable b, because a LineTerminator occurs between b and ++.
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would then be parsed as an empty statement.
The source
a = b + c
(d + e).print()
is not transformed by automatic semicolon insertion, because the parenthesized expression that begins the second line can be interpreted as an argument list for a function call:
a = b + c(d + e).print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.
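The examples in this section can be replayed in a sketch. The helper parses and the bindings b, c, d, e, and log are assumptions introduced so that the final example can actually run.

```javascript
// True when the source parses as a function body, false on a SyntaxError.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const sameLineBlock = parses("{ 1 2 } 3");             // no rule applies
const splitBlock = parses("{ 1\n2 } 3");               // repaired by insertion
const forHeader = parses("for (a; b\n)");              // never repaired
const emptyBeforeElse = parses("if (a > b)\nelse c = d"); // never repaired

// Hypothetical bindings for the final example.
const log = [];
const b = 1, d = 2, e = 3;
const c = (x) => ({ print: () => { log.push(x); return 10; } });

let a;
// No semicolon is inserted: this is `a = b + c(d + e).print()`.
a = b + c
(d + e).print()
```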
12.9.3 Interesting Cases of Automatic Semicolon Insertion
ECMAScript programs can be written in a style with very few semicolons by relying on automatic semicolon insertion. As described above, semicolons are not inserted at every newline, and automatic semicolon insertion can depend on multiple tokens across line terminators.
As new syntactic features are added to ECMAScript, additional grammar productions could be added that cause lines relying on automatic semicolon insertion preceding them to change grammar productions when parsed.
For the purposes of this section, a case of automatic semicolon insertion is considered interesting if it is a place where a semicolon may or may not be inserted, depending on the source text which precedes it. The rest of this section describes a number of interesting cases of automatic semicolon insertion in this version of ECMAScript.
12.9.3.1 Interesting Cases of Automatic Semicolon Insertion in Statement Lists
In a
- An opening parenthesis ((). Without a semicolon, the two lines together are treated as a CallExpression.
- An opening square bracket ([). Without a semicolon, the two lines together are treated as property access, rather than an ArrayLiteral or ArrayAssignmentPattern.
- A template literal (`). Without a semicolon, the two lines together are interpreted as a tagged Template (13.3.11), with the previous expression as the MemberExpression.
- Unary + or -. Without a semicolon, the two lines together are interpreted as a usage of the corresponding binary operator.
- A RegExp literal. Without a semicolon, the two lines together may be parsed instead as the / MultiplicativeOperator, for example if the RegExp has flags.
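Each case in the list above can be reproduced with a two-line source in which the first line lacks a semicolon. The helper run and all of the sample sources are assumptions made for this sketch.

```javascript
// Parse and run a source string as a function body.
// Returns the result, or the error that was thrown.
function run(src) {
  try {
    return new Function(src)();
  } catch (e) {
    return e;
  }
}

// Opening parenthesis: the function expression is called with 2.
const paren = run("var f = function () { return 1; }\n(2); return f;");

// Opening square bracket: property access on the preceding array.
const bracket = run("var a = [10, 20]\n[1]; return a;");

// Template literal: the preceding function becomes the tag.
const template = run(
  "var tag = function (s) { return 'tagged:' + s[0]; }\n`hi`; return tag;"
);

// Unary minus: parsed as binary subtraction.
const minus = run("var a = 1\n-1; return a;");

// RegExp literal with a flag: parsed as two divisions.
const regexp = run("var g = 2, x = 8\n/2/g; return x;");
```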
12.9.3.2 Cases of Automatic Semicolon Insertion and “[no LineTerminator here]”
ECMAScript contains grammar productions which include “[no
The rest of this section describes a number of productions using “[no
12.9.3.2.1 List of Grammar Productions with Optional Operands and “[no LineTerminator here]”
- UpdateExpression.
- ContinueStatement.
- BreakStatement.
- ReturnStatement.
- YieldExpression.
- Async Function Definitions (15.8) with relation to Function Definitions (15.2)