11 ECMAScript Language: Source Text
11.1 Source Text
Syntax
ECMAScript source text is a sequence of Unicode code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in ECMAScript source text where permitted by the ECMAScript grammars. The actual encodings used to store and interchange ECMAScript source text are not relevant to this specification. Regardless of the external source text encoding, a conforming ECMAScript implementation processes the source text as if it were an equivalent sequence of
The components of a combining character sequence are treated as individual Unicode code points even though a user might think of the whole sequence as a single character.
In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value. Within a comment, such an escape sequence is effectively ignored as part of the comment.
ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence \u000A, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence \u000A occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write \n instead of \u000A to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.
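The behaviour described above can be observed directly. The following is a minimal sketch; the names text and abc are chosen only for illustration.

```javascript
// Inside a comment, the escape \u000A is plain comment text: it does not end this line.
const text = "a\u000Ab"; // in a string literal, the escape contributes a LINE FEED to the value

// In an identifier, an escape denotes the code point itself, so \u0061bc names abc.
const \u0061bc = 42;

console.log(text.length);        // 3
console.log(text.charCodeAt(1)); // 10
console.log(abc);                // 42
```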
11.1.1 Static Semantics: UTF16EncodeCodePoint ( cp )
The abstract operation UTF16EncodeCodePoint takes argument cp (a Unicode code point) and returns a String. It performs the following steps when called:
- Assert: 0 ≤ cp ≤ 0x10FFFF.
- If cp ≤ 0xFFFF, return the String value consisting of the code unit whose numeric value is cp.
- Let cu1 be the code unit whose numeric value is floor((cp - 0x10000) / 0x400) + 0xD800.
- Let cu2 be the code unit whose numeric value is ((cp - 0x10000) modulo 0x400) + 0xDC00.
- Return the string-concatenation of cu1 and cu2.
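As an illustration only, the steps of UTF16EncodeCodePoint can be sketched in ECMAScript itself. The function name utf16EncodeCodePoint is hypothetical, and the thrown RangeError stands in for the assertion, which has no observable counterpart in the specification.

```javascript
// Hypothetical helper mirroring the steps of UTF16EncodeCodePoint.
function utf16EncodeCodePoint(cp) {
  // Stand-in for the assertion 0 <= cp <= 0x10FFFF.
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("not a Unicode code point: " + cp);
  }
  // Code points up to 0xFFFF are encoded as a single code unit.
  if (cp <= 0xFFFF) {
    return String.fromCharCode(cp);
  }
  // Supplementary code points are split into a surrogate pair.
  const cu1 = Math.floor((cp - 0x10000) / 0x400) + 0xD800;
  const cu2 = ((cp - 0x10000) % 0x400) + 0xDC00;
  return String.fromCharCode(cu1, cu2);
}

console.log(utf16EncodeCodePoint(0x41) === "A");                 // true
console.log(utf16EncodeCodePoint(0x1F600) === "\uD83D\uDE00");   // true
```

For every valid input the result should agree with the built-in String.fromCodePoint.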
11.1.2 Static Semantics: CodePointsToString ( text )
The abstract operation CodePointsToString takes argument text (a sequence of Unicode code points) and returns a String. It converts text into a String value, as described in
- Let result be the empty String.
- For each code point cp of text, do
  - Set result to the string-concatenation of result and UTF16EncodeCodePoint(cp).
- Return result.
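A minimal sketch of the same loop in ECMAScript follows. The name codePointsToString is hypothetical, the input is assumed to be an array of integers in the range 0 to 0x10FFFF, and the built-in String.fromCodePoint is used in place of the per-code-point encoding step.

```javascript
// Hypothetical helper mirroring the steps of CodePointsToString.
function codePointsToString(text) {
  let result = "";
  for (const cp of text) {
    // String.fromCodePoint produces one or two UTF-16 code units per code point.
    result += String.fromCodePoint(cp);
  }
  return result;
}

const s = codePointsToString([0x48, 0x69, 0x1F600]);
console.log(s.length); // 4 (two single code units plus one surrogate pair)
```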
11.1.3 Static Semantics: UTF16SurrogatePairToCodePoint ( lead, trail )
The abstract operation UTF16SurrogatePairToCodePoint takes arguments lead (a code unit) and trail (a code unit) and returns a code point. Two code units that form a UTF-16
- Assert: lead is a leading surrogate and trail is a trailing surrogate.
- Let cp be (lead - 0xD800) × 0x400 + (trail - 0xDC00) + 0x10000.
- Return the code point cp.
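The arithmetic can be checked with a short ECMAScript sketch. The function name utf16SurrogatePairToCodePoint is hypothetical, code units are assumed to be passed as numbers, and the RangeError stands in for the assertion.

```javascript
// Hypothetical helper mirroring the steps of UTF16SurrogatePairToCodePoint.
function utf16SurrogatePairToCodePoint(lead, trail) {
  const isLeading = lead >= 0xD800 && lead <= 0xDBFF;
  const isTrailing = trail >= 0xDC00 && trail <= 0xDFFF;
  // Stand-in for the assertion on the two arguments.
  if (!isLeading || !isTrailing) {
    throw new RangeError("arguments do not form a surrogate pair");
  }
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

console.log(utf16SurrogatePairToCodePoint(0xD83D, 0xDE00).toString(16)); // "1f600"
```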
11.1.4 Static Semantics: CodePointAt ( string, position )
The abstract operation CodePointAt takes arguments string (a String) and position (a non-negative
- Let size be the length of string.
- Assert: position ≥ 0 and position < size.
- Let first be the code unit at index position within string.
- Let cp be the code point whose numeric value is the numeric value of first.
- If first is neither a leading surrogate nor a trailing surrogate, then
  - Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: false }.
- If first is a trailing surrogate or position + 1 = size, then
  - Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.
- Let second be the code unit at index position + 1 within string.
- If second is not a trailing surrogate, then
  - Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.
- Set cp to UTF16SurrogatePairToCodePoint(first, second).
- Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 2, [[IsUnpairedSurrogate]]: false }.
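The following sketch walks through the same steps in ECMAScript. The function name codePointAt and the property names codePoint, codeUnitCount, and isUnpairedSurrogate are hypothetical stand-ins for the Record fields; a plain object is used in place of the Record.

```javascript
// Hypothetical helper mirroring the steps of CodePointAt.
function codePointAt(string, position) {
  const size = string.length;
  // Stand-in for the assertion on position.
  if (!(position >= 0 && position < size)) {
    throw new RangeError("position out of range");
  }
  const isLeading = (cu) => cu >= 0xD800 && cu <= 0xDBFF;
  const isTrailing = (cu) => cu >= 0xDC00 && cu <= 0xDFFF;

  const first = string.charCodeAt(position);
  let cp = first;
  if (!isLeading(first) && !isTrailing(first)) {
    return { codePoint: cp, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  // A trailing surrogate on its own, or a leading surrogate at the end, is unpaired.
  if (isTrailing(first) || position + 1 === size) {
    return { codePoint: cp, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  const second = string.charCodeAt(position + 1);
  if (!isTrailing(second)) {
    return { codePoint: cp, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  // Combine the surrogate pair into one supplementary code point.
  cp = (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
  return { codePoint: cp, codeUnitCount: 2, isUnpairedSurrogate: false };
}

console.log(codePointAt("a\u{1F600}", 1).codeUnitCount);   // 2
console.log(codePointAt("\uD83D", 0).isUnpairedSurrogate); // true
```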
11.1.5 Static Semantics: StringToCodePoints ( string )
The abstract operation StringToCodePoints takes argument string (a String) and returns a
- Let codePoints be a new empty List.
- Let size be the length of string.
- Let position be 0.
- Repeat, while position < size,
  - Let cp be CodePointAt(string, position).
  - Append cp.[[CodePoint]] to codePoints.
  - Set position to position + cp.[[CodeUnitCount]].
- Return codePoints.
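A minimal sketch of this loop in ECMAScript is shown below. The name stringToCodePoints is hypothetical. It relies on the built-in String.prototype.codePointAt, which returns the value of an unpaired surrogate unchanged, and it assumes that a result above 0xFFFF always corresponds to two code units.

```javascript
// Hypothetical helper mirroring the steps of StringToCodePoints.
function stringToCodePoints(string) {
  const codePoints = [];
  const size = string.length;
  let position = 0;
  while (position < size) {
    const cp = string.codePointAt(position);
    codePoints.push(cp);
    // A supplementary code point occupies two code units, anything else one.
    position += cp > 0xFFFF ? 2 : 1;
  }
  return codePoints;
}

console.log(stringToCodePoints("a\u{1F600}\uD800")); // [ 97, 128512, 55296 ]
```

The unpaired surrogate at the end of the input is preserved as its own code point rather than being replaced or dropped.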
11.1.6 Static Semantics: ParseText ( sourceText, goalSymbol )
The abstract operation ParseText takes arguments sourceText (a sequence of Unicode code points) and goalSymbol (a nonterminal in one of the ECMAScript grammars) and returns a
- Attempt to parse sourceText using goalSymbol as the goal symbol, and analyse the parse result for any early error conditions. Parsing and early error detection may be interleaved in an implementation-defined manner.
- If the parse succeeded and no early errors were found, return the Parse Node (an instance of goalSymbol) at the root of the parse tree resulting from the parse.
- Otherwise, return a List of one or more SyntaxError objects representing the parsing errors and/or early errors. If more than one parsing error or early error is present, the number and ordering of error objects in the list is implementation-defined, but at least one must be present.
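ParseText itself is not callable from ECMAScript code, but its two outcomes can be approximated. The sketch below assumes an environment in which the Function constructor is available and not restricted by host policy; the helper name tryParseFunctionBody is hypothetical. The constructor parses its body argument without running it and throws a single SyntaxError for either a parsing error or an early error.

```javascript
// Hypothetical helper: reports whether `body` parses as a function body.
function tryParseFunctionBody(body) {
  try {
    new Function(body); // the text is parsed here but not executed
    return { ok: true, error: null };
  } catch (error) {
    if (error instanceof SyntaxError) {
      return { ok: false, error };
    }
    throw error; // anything else is unrelated to parsing
  }
}

console.log(tryParseFunctionBody("return 1 + 1;").ok); // true
console.log(tryParseFunctionBody("return 1 +").ok);    // false (parsing error)
console.log(tryParseFunctionBody("let a; let a;").ok); // false (early error)
```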
Consider a text that has an
See also clause
11.2 Types of Source Code
There are four types of ECMAScript code:
- Global code is source text that is treated as an ECMAScript Script. The global code of a particular Script does not include any source text that is parsed as part of a FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, AsyncArrowFunction, ClassDeclaration, or ClassExpression.
- Eval code is the source text supplied to the built-in eval function. More precisely, if the parameter to the built-in eval function is a String, it is treated as an ECMAScript Script. The eval code for a particular invocation of eval is the global code portion of that Script.
- Function code is source text that is parsed to supply the value of the [[ECMAScriptCode]] and [[FormalParameters]] internal slots (see 10.2) of an ECMAScript function object. The function code of a particular ECMAScript function does not include any source text that is parsed as the function code of a nested FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, AsyncArrowFunction, ClassDeclaration, or ClassExpression.
  In addition, if the source text referred to above is parsed as:
  - the FormalParameters and FunctionBody of a FunctionDeclaration or FunctionExpression,
  - the FormalParameters and GeneratorBody of a GeneratorDeclaration or GeneratorExpression,
  - the FormalParameters and AsyncFunctionBody of an AsyncFunctionDeclaration or AsyncFunctionExpression, or
  - the FormalParameters and AsyncGeneratorBody of an AsyncGeneratorDeclaration or AsyncGeneratorExpression,
  then the source text matched by the BindingIdentifier (if any) of that declaration or expression is also included in the function code of the corresponding function.
- Module code is source text that is code that is provided as a ModuleBody. It is the code that is directly evaluated when a module is initialized. The module code of a particular module does not include any source text that is parsed as part of a nested FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, AsyncArrowFunction, ClassDeclaration, or ClassExpression.
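The annotated sketch below shows where the boundaries between these kinds of code fall in a small program. It assumes the text is run as a Script (if it were loaded as a module, the top-level statements would be module code instead); all names are chosen only for illustration.

```javascript
// Top-level statements: global code when this text is run as a Script.
var counter = 0;

function outer(step) {
  // The parameters and body of `outer` are function code of `outer`,
  // together with the BindingIdentifier `outer` itself.
  const inner = () => step * 2; // the arrow's body is function code of the arrow, not of `outer`
  return inner();
}

// The String argument is parsed as a Script; its global code portion is the eval code.
const evalResult = eval("counter + 1");

console.log(outer(3));   // 6
console.log(evalResult); // 1
```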
Function code is generally provided as the bodies of Function Definitions (
The practical effect of including the
11.2.1 Directive Prologues and the Use Strict Directive
A Directive Prologue is the longest sequence of
A Use Strict Directive is an "use strict" or 'use strict'. A
A
The
11.2.2 Strict Mode Code
An ECMAScript syntactic unit may be processed using either unrestricted or strict mode syntax and semantics (
- Global code is strict mode code if it begins with a Directive Prologue that contains a Use Strict Directive.
- Module code is always strict mode code.
- All parts of a ClassDeclaration or a ClassExpression are strict mode code.
- Eval code is strict mode code if it begins with a Directive Prologue that contains a Use Strict Directive or if the call to eval is a direct eval that is contained in strict mode code.
- Function code is strict mode code if the associated FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, or AsyncArrowFunction is contained in strict mode code or if the code that produces the value of the function's [[ECMAScriptCode]] internal slot begins with a Directive Prologue that contains a Use Strict Directive.
- Function code that is supplied as the arguments to the built-in Function, Generator, AsyncFunction, and AsyncGenerator constructors is strict mode code if the last argument is a String that when processed is a FunctionBody that begins with a Directive Prologue that contains a Use Strict Directive.
ECMAScript code that is not strict mode code is called non-strict code.
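Several of these rules can be observed from running code. The sketch below is illustrative only and all names in it are hypothetical. It uses the fact that a plain call to a strict function sees undefined as its this value, and it builds the functions with the Function constructor so that their strictness depends only on their own body text.

```javascript
// The body begins with a Directive Prologue containing a Use Strict Directive: strict.
const strictFn = new Function('"use strict"; return this === undefined;');

// No directive: non-strict code, so a plain call sees the global object as `this`.
const sloppyFn = new Function("return this === undefined;");

// The string follows another statement, so it is not part of the
// Directive Prologue and has no effect.
const lateFn = new Function('var x = 1; "use strict"; return this === undefined;');

// All parts of a class are strict mode code, with no directive needed.
class Probe {
  static isStrict() {
    return (function () { return this === undefined; })();
  }
}

console.log(strictFn());       // true
console.log(sloppyFn());       // false
console.log(lateFn());         // false
console.log(Probe.isStrict()); // true
```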
11.2.3 Non-ECMAScript Functions
An ECMAScript implementation may support the evaluation of function