7 Lexical Conventions
The source text of an ECMAScript program is first converted into a sequence of tokens and white space. A token is a sequence of characters that comprises a lexical unit. The source text is scanned from left to right, repeatedly taking the longest possible sequence of characters as the next token.
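The longest-match rule can be illustrated with a minimal scanner sketch. The names `PUNCTUATORS` and `tokenize` are hypothetical, the punctuator list is an assumed reading of section 7.6, and comments and string literals are deliberately ignored, so this is not a complete lexer.

```javascript
// Assumed punctuator set (section 7.6), sorted so longer candidates are tried first.
const PUNCTUATORS = [
  ">>>=", "<<=", ">>=", ">>>", "==", "<=", ">=", "!=", "&&", "||", "++", "--",
  "<<", ">>", "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=",
  "=", ">", "<", ",", "!", "~", "?", ":", ".", "+", "-", "*", "/", "&", "|",
  "^", "%", "(", ")", "{", "}", "[", "]", ";",
].sort((x, y) => y.length - x.length);

// Hypothetical helper: splits source text into tokens, always taking the
// longest possible sequence of characters as the next token.
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const rest = source.slice(i);
    // Skip white space and line terminators between tokens.
    const gap = /^[\t\v\f \n\r]+/.exec(rest);
    if (gap) {
      i += gap[0].length;
      continue;
    }
    // Identifier-like words and runs of decimal digits.
    const word = /^[A-Za-z_$][A-Za-z0-9_$]*/.exec(rest) || /^[0-9]+/.exec(rest);
    if (word) {
      tokens.push(word[0]);
      i += word[0].length;
      continue;
    }
    // First hit in a longest-first list is the longest matching punctuator.
    const punct = PUNCTUATORS.find((p) => rest.startsWith(p));
    if (!punct) throw new SyntaxError(`Unexpected character at index ${i}`);
    tokens.push(punct);
    i += punct.length;
  }
  return tokens;
}
```

Under these assumptions, `tokenize("a+++b")` yields `a`, `++`, `+`, `b` rather than `a`, `+`, `++`, `b`, because `++` is the longest token available at the first `+`.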
7.1 White Space
White space characters are used to improve source text readability and to separate tokens (indivisible lexical units) from each other but are otherwise insignificant. White space may occur between any two tokens, and may occur within strings (where they are considered significant characters forming part of the literal string value), but cannot appear within any other kind of token.
The following characters are considered to be white space:
| Unicode Value | Name | Formal Name |
|---|---|---|
| \u0009 | Tab | <TAB> |
| \u000B | Vertical Tab | <VT> |
| \u000C | Form Feed | <FF> |
| \u0020 | Space | <SP> |
Syntax
- WhiteSpace ::
- <TAB>
<VT>
<FF>
<SP>
<NBSP>
<USP>
7.2 Line Terminators
Line terminator characters, like white space characters, are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. Unlike white space characters, line terminators have some influence over the behavior of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. A line terminator cannot occur within any token, not even a string. Line terminators also affect the process of automatic semicolon insertion (see section 7.8.2).
The following characters are considered to be line terminators:
| Unicode Value | Name | Formal Name |
|---|---|---|
| \u000A | Line Feed | <LF> |
| \u000D | Carriage Return | <CR> |
Syntax
- LineTerminator ::
- <LF>
<CR>
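As a rough illustration, the two character tables in sections 7.1 and 7.2 can be turned into a small classifier. The names `WHITE_SPACE`, `LINE_TERMINATORS` and `classify` are hypothetical, and only the characters that appear in the tables are covered.

```javascript
// Characters tabulated in section 7.1.
const WHITE_SPACE = new Set(["\u0009", "\u000B", "\u000C", "\u0020"]);
// Characters tabulated in section 7.2.
const LINE_TERMINATORS = new Set(["\u000A", "\u000D"]);

// Hypothetical helper: names the lexical class of a single character.
function classify(ch) {
  if (WHITE_SPACE.has(ch)) return "WhiteSpace";
  if (LINE_TERMINATORS.has(ch)) return "LineTerminator";
  return "Other";
}
```

Keeping the two sets separate mirrors the text: both classes separate tokens, but only line terminators are passed on to the syntactic grammar.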
7.3 Comments
Description
Comments can be either single or multi-line. Multi-line comments cannot nest.
Because a single-line comment can contain any character except a LineTerminator character, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all characters from the // marker to the end of the line. However, the LineTerminator at the end of the line is not considered to be part of the single-line comment; it is recognized separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see section 7.8.2).
Syntax
- Comment ::
- MultiLineComment
SingleLineComment
- MultiLineComment ::
- /* MultiLineCommentCharsopt */
- MultiLineCommentChars ::
- MultiLineNotAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
- PostAsteriskCommentChars ::
- MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
- MultiLineNotAsteriskChar ::
- SourceCharacter but not asterisk *
- MultiLineNotForwardSlashOrAsteriskChar ::
- SourceCharacter but not forward-slash / or asterisk *
- SingleLineComment ::
- // SingleLineCommentCharsopt
- SingleLineCommentChars ::
- SingleLineCommentChar SingleLineCommentCharsopt
- SingleLineCommentChar ::
- SourceCharacter but not LineTerminator
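The two comment forms can be sketched as a small scanner. The function `scanComment` is a hypothetical helper, not part of the specification; it only shows that a multi-line comment ends at the first `*/` (so comments do not nest) and that a single-line comment stops before the LineTerminator.

```javascript
// Hypothetical helper: if a comment starts at index `start`, returns its kind,
// its text, and the index just past it; otherwise returns null.
function scanComment(source, start) {
  if (source.startsWith("//", start)) {
    let end = start + 2;
    // Consume everything up to, but not including, the LineTerminator.
    while (end < source.length && source[end] !== "\n" && source[end] !== "\r") {
      end += 1;
    }
    return { kind: "SingleLineComment", text: source.slice(start, end), end };
  }
  if (source.startsWith("/*", start)) {
    // The first "*/" closes the comment, which is why nesting is impossible.
    const close = source.indexOf("*/", start + 2);
    if (close === -1) throw new SyntaxError("Unterminated multi-line comment");
    return { kind: "MultiLineComment", text: source.slice(start, close + 2), end: close + 2 };
  }
  return null;
}
```

For the input `/* a /* b */ c */` the sketch reports the comment `/* a /* b */`, leaving ` c */` to be scanned as ordinary source text.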
7.4 Tokens
Syntax
- Token ::
- ReservedWord
Identifier
Punctuator
Literal
7.4.1 Reserved Words
Description
Reserved words cannot be used as identifiers.
- ReservedWord ::
- Keyword
FutureReservedWord
NullLiteral
BooleanLiteral
7.4.2 Keywords
The following tokens are ECMAScript keywords and may not be used as identifiers in ECMAScript programs.
Syntax
Keyword :: one of
| break | for | new | var |
| continue | function | return | void |
| delete | if | this | while |
| else | in | typeof | with |
7.4.3 Future Reserved Words
The following words are used as keywords in proposed extensions and are therefore reserved to allow for the possibility of future adoption of those extensions.
Syntax
FutureReservedWord :: one of
| case | debugger | export | super |
| catch | default | extends | switch |
| class | do | finally | throw |
| const | enum | import | try |
7.5 Identifiers
Description
An identifier is a character sequence of unlimited length, where each character in the sequence must be a letter, a decimal digit, an underscore ( _ ) character, or a dollar sign ( $ ) character, and the first character may not be a decimal digit. ECMAScript identifiers are case sensitive: identifiers whose characters differ in any way, even if only in case, are considered to be distinct.
Syntax
- Identifier ::
- IdentifierName but not ReservedWord
- IdentifierName ::
- IdentifierLetter
IdentifierName IdentifierLetter
IdentifierName DecimalDigit
- IdentifierLetter :: one of
- a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ $
- DecimalDigit :: one of
- 0 1 2 3 4 5 6 7 8 9
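The identifier grammar, together with the reserved words of section 7.4, can be approximated with a regular expression and a lookup set. The names `RESERVED_WORDS`, `IDENTIFIER_NAME` and `isIdentifier` are hypothetical and exist only for this sketch.

```javascript
// Keywords, future reserved words, and the null and Boolean literals (section 7.4).
const RESERVED_WORDS = new Set([
  "break", "for", "new", "var", "continue", "function", "return", "void",
  "delete", "if", "this", "while", "else", "in", "typeof", "with",
  "case", "debugger", "export", "super", "catch", "default", "extends", "switch",
  "class", "do", "finally", "throw", "const", "enum", "import", "try",
  "null", "true", "false",
]);

// IdentifierName: a letter, underscore or dollar sign, then letters, digits, _ or $.
const IDENTIFIER_NAME = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Identifier :: IdentifierName but not ReservedWord
function isIdentifier(text) {
  return IDENTIFIER_NAME.test(text) && !RESERVED_WORDS.has(text);
}
```

Because the comparison against the reserved words is case sensitive, `While` is accepted as an identifier while `while` is rejected.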
7.6 Punctuators
Syntax
Punctuator :: one of
- = > < == <= >=
!= , ! ~ ? :
. && || ++ --
+ - * / & | ^ % << >> >>>
+= -= *= /= &= |= ^= %= <<= >>= >>>=
( ) { } [ ] ;
7.7 Literals
Syntax
- Literal ::
- NullLiteral
BooleanLiteral
NumericLiteral
StringLiteral
7.7.1 Null Literals
Syntax
- NullLiteral ::
- null
Semantics
The value of the null literal null is the sole value of the Null type, namely null.
7.7.2 Boolean Literals
Syntax
- BooleanLiteral ::
- true
false
Semantics
The value of the Boolean literal true is a value of the Boolean type, namely true.
The value of the Boolean literal false is a value of the Boolean type, namely false.
7.7.3 Numeric Literals
Syntax
- NumericLiteral ::
- DecimalLiteral
HexIntegerLiteral
OctalIntegerLiteral
- DecimalLiteral ::
- DecimalIntegerLiteral . DecimalDigitsopt ExponentPartopt
. DecimalDigits ExponentPartopt
DecimalIntegerLiteral ExponentPartopt
- DecimalIntegerLiteral ::
- 0
NonZeroDigit DecimalDigitsopt
- DecimalDigits ::
- DecimalDigit
DecimalDigits DecimalDigit
- NonZeroDigit :: one of
- 1 2 3 4 5 6 7 8 9
- ExponentPart ::
- ExponentIndicator SignedInteger
- ExponentIndicator :: one of
- e E
- SignedInteger ::
- DecimalDigits
+ DecimalDigits
- DecimalDigits
- HexIntegerLiteral ::
- 0x HexDigit
0X HexDigit
HexIntegerLiteral HexDigit
- HexDigit :: one of
- 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
- OctalIntegerLiteral ::
- 0 OctalDigit
OctalIntegerLiteral OctalDigit
- OctalDigit :: one of
- 0 1 2 3 4 5 6 7
Semantics
A numeric literal stands for a value of the Number type. This value is determined in two steps: first, a mathematical value (MV) is derived from the literal; second, this mathematical value is rounded, ideally using IEEE 754 round-to-nearest mode, to a representable value of the number type.
- The MV of NumericLiteral :: DecimalLiteral is the MV of DecimalLiteral .
- The MV of NumericLiteral :: HexIntegerLiteral is the MV of HexIntegerLiteral .
- The MV of NumericLiteral :: OctalIntegerLiteral is the MV of OctalIntegerLiteral .
- The MV of DecimalLiteral :: DecimalIntegerLiteral . is the MV of DecimalIntegerLiteral .
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits times $10^{-n}$), where n is the number of characters in DecimalDigits.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral times $10^{e}$, where e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits times $10^{-n}$)) times $10^{e}$, where n is the number of characters in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits times $10^{-n}$, where n is the number of characters in DecimalDigits.
- The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits times $10^{e-n}$, where n is the number of characters in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral is the MV of DecimalIntegerLiteral .
- The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral times $10^{e}$, where e is the MV of ExponentPart.
- The MV of DecimalIntegerLiteral :: 0 is 0.
- The MV of DecimalIntegerLiteral :: NonZeroDigit DecimalDigits is (the MV of NonZeroDigit times $10^{n}$) plus the MV of DecimalDigits, where n is the number of characters in DecimalDigits.
- The MV of DecimalDigits :: DecimalDigit is the MV of DecimalDigit .
- The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits times 10) plus the MV of DecimalDigit .
- The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger .
- The MV of SignedInteger :: DecimalDigits is the MV of DecimalDigits .
- The MV of SignedInteger :: + DecimalDigits is the MV of DecimalDigits .
- The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits .
- The MV of DecimalDigit :: 0 or of HexDigit :: 0 or of OctalDigit :: 0 is 0.
- The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 or of OctalDigit :: 1 is 1.
- The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 or of OctalDigit :: 2 is 2.
- The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 or of OctalDigit :: 3 is 3.
- The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 or of OctalDigit :: 4 is 4.
- The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 or of OctalDigit :: 5 is 5.
- The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 or of OctalDigit :: 6 is 6.
- The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 or of OctalDigit :: 7 is 7.
- The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of HexDigit :: 8 is 8.
- The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of HexDigit :: 9 is 9.
- The MV of HexDigit :: a or of HexDigit :: A is 10.
- The MV of HexDigit :: b or of HexDigit :: B is 11.
- The MV of HexDigit :: c or of HexDigit :: C is 12.
- The MV of HexDigit :: d or of HexDigit :: D is 13.
- The MV of HexDigit :: e or of HexDigit :: E is 14.
- The MV of HexDigit :: f or of HexDigit :: F is 15.
- The MV of HexIntegerLiteral :: 0x HexDigit is the MV of HexDigit .
- The MV of HexIntegerLiteral :: 0X HexDigit is the MV of HexDigit .
- The MV of HexIntegerLiteral :: HexIntegerLiteral HexDigit is (the MV of HexIntegerLiteral times 16) plus the MV of HexDigit .
- The MV of OctalIntegerLiteral :: 0 OctalDigit is the MV of OctalDigit .
- The MV of OctalIntegerLiteral :: OctalIntegerLiteral OctalDigit is (the MV of OctalIntegerLiteral times 8) plus the MV of OctalDigit .
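As a worked illustration of the rules above, the following applies them to three literals chosen only for this example: `12.5e2` (here n = 1 and e = 2), `.25e1` (here n = 2 and e = 1), and `0x1F`.

```latex
\begin{align*}
\mathrm{MV}(\texttt{12.5e2}) &= \bigl(\mathrm{MV}(\texttt{12}) + \mathrm{MV}(\texttt{5}) \times 10^{-1}\bigr) \times 10^{2}
  = (12 + 0.5) \times 100 = 1250 \\
\mathrm{MV}(\texttt{.25e1}) &= \mathrm{MV}(\texttt{25}) \times 10^{1-2}
  = 25 \times 10^{-1} = 2.5 \\
\mathrm{MV}(\texttt{0x1F}) &= \mathrm{MV}(\texttt{1}) \times 16 + \mathrm{MV}(\texttt{F})
  = 16 + 15 = 31
\end{align*}
```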
Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type. If the MV is 0, then the rounded value is +0 ; otherwise, the rounded value must be the number value for the MV (in the sense defined in section 8.4), unless the literal is a DecimalLiteral and the literal has more than 20 significant digits, in which case the number value may be either the number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th digit position. A digit is significant if it is not part of an ExponentPart and (either it is not 0 or (there is a nonzero digit to its left and there is a nonzero digit, not in the ExponentPart , to its right)).
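The recursive MV rules for integer literals amount to a left-to-right accumulation, which the following sketch makes explicit. The names `integerMV` and `numericLiteralMV` are hypothetical. The sketch handles only integer literals (decimal, hexadecimal and octal) and uses ordinary JavaScript numbers, so it is exact only for values below 2^53; decimal fractions, exponents and the 20-significant-digit rounding rule are not modelled.

```javascript
const DIGITS = "0123456789abcdef";

// Applies "MV = (MV of the prefix times radix) plus MV of the last digit"
// from left to right over a run of digits.
function integerMV(digits, radix) {
  let mv = 0;
  for (const ch of digits) {
    const value = DIGITS.indexOf(ch.toLowerCase());
    if (value < 0 || value >= radix) {
      throw new SyntaxError(`Invalid digit "${ch}" for radix ${radix}`);
    }
    mv = mv * radix + value;
  }
  return mv;
}

// Hypothetical dispatcher over the three integer literal forms of section 7.7.3.
function numericLiteralMV(text) {
  if (/^0[xX][0-9a-fA-F]+$/.test(text)) return integerMV(text.slice(2), 16); // HexIntegerLiteral
  if (/^0[0-7]+$/.test(text)) return integerMV(text.slice(1), 8); // OctalIntegerLiteral
  if (/^(0|[1-9][0-9]*)$/.test(text)) return integerMV(text, 10); // DecimalIntegerLiteral
  throw new SyntaxError(`Not an integer literal: ${text}`);
}
```

For example, `numericLiteralMV("0x1F")` evaluates to 31 and `numericLiteralMV("017")` to 15. The literal text is passed as a string because current engines reject a literal such as `017` in strict mode code.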
7.7.4 String Literals
A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by an escape sequence.
Syntax
- StringLiteral ::
- " DoubleStringCharactersopt "
' SingleStringCharactersopt '
- DoubleStringCharacters ::
- DoubleStringCharacter DoubleStringCharactersopt
- SingleStringCharacters ::
- SingleStringCharacter SingleStringCharactersopt
- DoubleStringCharacter ::
- SourceCharacter but not double-quote " or backslash \ or LineTerminator
EscapeSequence
- SingleStringCharacter ::
- SourceCharacter but not single-quote ' or backslash \ or LineTerminator
EscapeSequence
- EscapeSequence ::
- CharacterEscapeSequence
OctalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
- CharacterEscapeSequence ::
- \ SingleEscapeCharacter
\ NonEscapeCharacter
- SingleEscapeCharacter :: one of
- ' " \ b f n r t
- NonEscapeCharacter ::
- SourceCharacter but not EscapeCharacter or LineTerminator
- EscapeCharacter ::
- SingleEscapeCharacter
OctalDigit
x
u
- HexEscapeSequence ::
- \x HexDigit HexDigit
- OctalEscapeSequence ::
- \ OctalDigit
\ OctalDigit OctalDigit
\ ZeroToThree OctalDigit OctalDigit
- ZeroToThree :: one of
- 0 1 2 3
- UnicodeEscapeSequence ::
- \u HexDigit HexDigit HexDigit HexDigit
The definitions of the nonterminals HexDigit and OctalDigit are given in section 7.7.3.
A string literal stands for a value of the String type. The string value (SV) of the literal is described in terms of character values (CV) contributed by the various parts of the string literal. As part of this process, some characters within the string literal are interpreted as having a mathematical value (MV), as described below or in section 7.7.3.
- The SV of StringLiteral :: "" is the empty character sequence .
- The SV of StringLiteral :: '' is the empty character sequence.
- The SV of StringLiteral :: " DoubleStringCharacters " is the SV of DoubleStringCharacters .
- The SV of StringLiteral :: ' SingleStringCharacters ' is the SV of SingleStringCharacters .
- The SV of DoubleStringCharacters :: DoubleStringCharacter is a sequence of one character, the CV of DoubleStringCharacter .
- The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is a sequence of the CV of DoubleStringCharacter followed by all the characters in the SV of DoubleStringCharacters in order.
- The SV of SingleStringCharacters :: SingleStringCharacter is a sequence of one character, the CV of SingleStringCharacter .
- The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a sequence of the CV of SingleStringCharacter followed by all the characters in the SV of SingleStringCharacters in order.
- The CV of DoubleStringCharacter :: SourceCharacter but not double-quote " or backslash \ or LineTerminator is the SourceCharacter character itself.
- The CV of DoubleStringCharacter :: EscapeSequence is the CV of the EscapeSequence .
- The CV of SingleStringCharacter :: SourceCharacter but not single-quote ' or backslash \ or LineTerminator is the SourceCharacter character itself.
- The CV of SingleStringCharacter :: EscapeSequence is the CV of the EscapeSequence .
- The CV of EscapeSequence :: CharacterEscapeSequence is the CV of the CharacterEscapeSequence .
- The CV of EscapeSequence :: OctalEscapeSequence is the CV of the OctalEscapeSequence .
- The CV of EscapeSequence :: HexEscapeSequence is the CV of the HexEscapeSequence .
- The CV of EscapeSequence :: UnicodeEscapeSequence is the CV of the UnicodeEscapeSequence .
- The CV of CharacterEscapeSequence :: \ SingleEscapeCharacter is the Unicode character whose Unicode value is determined by the SingleEscapeCharacter according to the following table:
| Escape Sequence | Unicode Value | Name | Symbol |
|---|---|---|---|
| \b | \u0008 | backspace | |
| \t | \u0009 | horizontal tab | |
| \n | \u000A | line feed (new line) | |
| \f | \u000C | form feed | |
| \r | \u000D | carriage return | |
| \" | \u0022 | double quote | " |
| \' | \u0027 | single quote | ' |
| \\ | \u005C | backslash | \ |
- The CV of CharacterEscapeSequence :: \ NonEscapeCharacter is the CV of the NonEscapeCharacter .
- The CV of NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or LineTerminator is the SourceCharacter character itself.
- The CV of HexEscapeSequence :: \x HexDigit HexDigit is the Unicode character whose code is (16 times the MV of the first HexDigit ) plus the MV of the second HexDigit .
- The CV of OctalEscapeSequence :: \ OctalDigit is the Unicode character whose code is the MV of the OctalDigit .
- The CV of OctalEscapeSequence :: \ OctalDigit OctalDigit is the Unicode character whose code is (8 times the MV of the first OctalDigit ) plus the MV of the second OctalDigit .
- The CV of OctalEscapeSequence :: \ ZeroToThree OctalDigit OctalDigit is the Unicode character whose code is (64 (that is, $8^{2}$) times the MV of the ZeroToThree) plus (8 times the MV of the first OctalDigit) plus the MV of the second OctalDigit.
- The MV of ZeroToThree :: 0 is 0.
- The MV of ZeroToThree :: 1 is 1.
- The MV of ZeroToThree :: 2 is 2.
- The MV of ZeroToThree :: 3 is 3.
- The CV of UnicodeEscapeSequence :: \u HexDigit HexDigit HexDigit HexDigit is the Unicode character whose code is (4096 (that is, $16^{3}$) times the MV of the first HexDigit) plus (256 (that is, $16^{2}$) times the MV of the second HexDigit) plus (16 times the MV of the third HexDigit) plus the MV of the fourth HexDigit.
Note that a LineTerminator character cannot appear in a string literal, even if preceded by a backslash \ . The correct way to cause a line terminator character to be part of the string value of a string literal is to use an escape sequence such as \n or \u000A .
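The SV and CV rules can be sketched as a function that walks the characters between the quotes and replaces each escape sequence by its character value. The names `SINGLE_ESCAPES` and `stringValue` are hypothetical, the input is assumed to be the raw text between the quotes, and unescaped quote characters are not checked.

```javascript
// CV of each SingleEscapeCharacter, following the table above.
const SINGLE_ESCAPES = new Map([
  ["b", "\u0008"], ["t", "\u0009"], ["n", "\u000A"], ["f", "\u000C"],
  ["r", "\u000D"], ['"', "\u0022"], ["'", "\u0027"], ["\\", "\u005C"],
]);

// Hypothetical helper: computes the SV of the raw text between the quotes.
function stringValue(body) {
  let sv = "";
  let i = 0;
  while (i < body.length) {
    const ch = body[i];
    if (ch === "\n" || ch === "\r") {
      throw new SyntaxError("LineTerminator inside string literal");
    }
    if (ch !== "\\") {
      sv += ch; // an ordinary SourceCharacter contributes itself
      i += 1;
      continue;
    }
    const rest = body.slice(i + 1);
    // Hex, Unicode and octal escapes; the octal pattern prefers the longest form.
    const m =
      /^x[0-9a-fA-F]{2}/.exec(rest) ||
      /^u[0-9a-fA-F]{4}/.exec(rest) ||
      /^(?:[0-3][0-7]{2}|[0-7]{1,2})/.exec(rest);
    if (m) {
      const isHex = /^[xu]/.test(m[0]);
      const digits = isHex ? m[0].slice(1) : m[0];
      sv += String.fromCharCode(parseInt(digits, isHex ? 16 : 8));
      i += 1 + m[0].length;
    } else if (SINGLE_ESCAPES.has(rest[0])) {
      sv += SINGLE_ESCAPES.get(rest[0]);
      i += 2;
    } else if (rest === "" || /^[xu\n\r]/.test(rest)) {
      // x and u are EscapeCharacters, so they cannot act as NonEscapeCharacter.
      throw new SyntaxError("Malformed escape sequence");
    } else {
      sv += rest[0]; // NonEscapeCharacter: the character itself
      i += 2;
    }
  }
  return sv;
}
```

With this sketch the three spellings `\x41`, `\101` and `\u0041` all contribute the character `A`, while `\q` contributes `q`.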
7.8 Automatic semicolon insertion
Certain ECMAScript statements (empty statement, variable statement, expression statement, continue statement, break statement, and return statement) must each be terminated with a semicolon. Such a semicolon may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. We describe such situations by saying that semicolons are automatically inserted into the source code token stream in those situations.
7.8.1 Rules of automatic semicolon insertion
- When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar and the parser is not currently parsing the header of a for statement, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
- The offending token is separated from the previous token by at least one LineTerminator .
- The offending token is } .
- When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement.
- When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation "[no LineTerminator here]" within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then there are two cases:
- If the parser is not currently parsing the header of a for statement, a semicolon is automatically inserted before the restricted token.
- If the parser is currently parsing the header of a for statement, it is a syntax error.
These are all the restricted productions in the grammar:
- PostfixExpression :
- LeftHandSideExpression [no LineTerminator here] ++
LeftHandSideExpression [no LineTerminator here] --
- ReturnStatement :
- return [no LineTerminator here] Expressionopt ;
The practical effect of these restricted productions is as follows:
- When the token ++ or -- is encountered where the parser would treat it as a postfix operator, and at least one LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically inserted before the ++ or -- token.
- When the token return is encountered and a LineTerminator is encountered before the next token is encountered, a semicolon is automatically inserted after the token return .
The resulting practical advice to ECMAScript programmers is:
- A postfix ++ or -- operator should appear on the same line as its operand.
- An Expression in a return statement should start on the same line as the return token.
7.8.2 Examples of Automatic Semicolon Insertion
The source
{ 1 2 } 3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast, the source
{ 1
2 } 3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{ 1
;2 ;} 3;
which is a valid ECMAScript sentence.
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the place where a semicolon is needed is within the header of a for statement. Automatic semicolon insertion never occurs within the header of a for statement.
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
Note that the expression a + b is not treated as a value to be returned by the return statement, because a LineTerminator separates it from the token return .
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
Note that the token ++ is not treated as a postfix operator applying to the variable b , because a LineTerminator occurs between b and ++ .
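Both transformations can be observed directly in a current JavaScript engine, which is assumed here to apply the same two rules. The function `sum` and the variable `result` are hypothetical names used only for this sketch.

```javascript
// A LineTerminator after `return` ends the statement, so `a + b` is never returned.
function sum(a, b) {
  return
  a + b
}

// The `++` on the second line is not a postfix operator on `b`;
// a semicolon is inserted before it and it becomes a prefix operator on `c`.
let b = 1;
let c = 5;
let a = b
++c

const result = { returned: sum(1, 2), a, b, c };
```

Here `result.returned` is `undefined`, `a` and `b` are both 1, and `c` is 6.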
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would then be parsed as an empty statement.
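Whether a source text is accepted after automatic semicolon insertion can be probed with the `Function` constructor, which parses its argument as a function body without running it. This is a sketch under the assumption that a current engine follows the rules described here; `parses` and `checks` are hypothetical names, and the trailing `;` in the `for` example is an added empty body so that only the header is at fault.

```javascript
// Hypothetical helper: true if the text parses as a function body.
function parses(source) {
  try {
    new Function(source); // parse only; the body is never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const checks = {
  blockOneLine: parses("{ 1 2 } 3"), // no LineTerminator before 2, so nothing is inserted
  blockTwoLines: parses("{ 1\n2 } 3"), // repaired by inserted semicolons
  forHeader: parses("for (a; b\n) ;"), // never repaired inside a for header
  ifElse: parses("if (a > b)\nelse c = d"), // a semicolon here would be an empty statement
};
```

Only `blockTwoLines` is expected to be `true`; the other three sources remain syntax errors.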
The source
a = b + c
(d + e).print()
is not transformed by automatic semicolon insertion, because the parenthesized expression that begins the second line can be interpreted as an argument list for a function call:
a = b + c(d + e).print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.
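The hazard can be reproduced in a current engine with a small sketch. The function `demo` is hypothetical, and `toString` stands in for the `print` method of the example, which is not defined here.

```javascript
function demo() {
  const b = 1;
  const c = 2;
  const d = 3;
  const e = 4;
  try {
    // No semicolon is inserted after `c`: the next line continues the
    // expression, so this is parsed as b + c(d + e).toString().
    const a = b + c
    (d + e).toString()
    return `no error, a = ${a}`;
  } catch (err) {
    return err instanceof TypeError
      ? "TypeError: c was called as a function"
      : "other error";
  }
}
```

Calling `demo()` returns the TypeError message, because the number `c` is invoked as a function. Writing `const a = b + c;` with an explicit semicolon avoids the problem.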