10 ECMAScript Language: Source Code

10.1 Source Text

Syntax

SourceCharacter :: any Unicode code point

ECMAScript code is expressed using Unicode. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars. The actual encodings used to store and interchange ECMAScript source text are not relevant to this specification. Regardless of the external source text encoding, a conforming ECMAScript implementation processes the source text as if it were an equivalent sequence of SourceCharacter values, each SourceCharacter being a Unicode code point. Conforming ECMAScript implementations are not required to perform any normalization of source text, or to behave as though they were performing normalization of source text.

The components of a combining character sequence are treated as individual Unicode code points even though a user might think of the whole sequence as a single character.
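As an informal illustration of the two statements above, the following sketch compares a precomposed character with its combining-sequence equivalent. The variable names are illustrative only, and escape sequences are used so that the example is not altered by an editor that normalizes text. String.prototype.normalize is called explicitly because the language itself applies no normalization.

```javascript
// "é" written two ways: as one precomposed code point, and as "e" followed
// by U+0301 COMBINING ACUTE ACCENT. All names here are illustrative.
const precomposed = "\u00E9";
const combining = "e\u0301";

// Iterating a String yields code points, so each component of the
// combining sequence is counted individually.
const precomposedCount = [...precomposed].length; // 1
const combiningCount = [...combining].length;     // 2

// No normalization is applied implicitly, so the two strings differ.
const equalAsWritten = precomposed === combining; // false

// The strings compare equal only after explicit normalization.
const equalAfterNFC = precomposed === combining.normalize("NFC"); // true
```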

Note

In string literals, regular expression literals, template literals and identifiers, any Unicode code point may also be expressed using Unicode escape sequences that explicitly express a code point's numeric value. Within a comment, such an escape sequence is effectively ignored as part of the comment.

ECMAScript differs from the Java programming language in the behaviour of Unicode escape sequences. In a Java program, if the Unicode escape sequence \u000A, for example, occurs within a single-line comment, it is interpreted as a line terminator (Unicode code point U+000A is LINE FEED (LF)) and therefore the next code point is not part of the comment. Similarly, if the Unicode escape sequence \u000A occurs within a string literal in a Java program, it is likewise interpreted as a line terminator, which is not allowed within a string literal—one must write \n instead of \u000A to cause a LINE FEED (LF) to be part of the String value of a string literal. In an ECMAScript program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in an ECMAScript program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.
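The difference described in this Note can be observed with a short sketch. The variable names are illustrative. The commented-out assignment would be executed under the Java interpretation of the escape, but under the ECMAScript rules it stays inside the comment.

```javascript
let marker = "unchanged";
// The escape on the next line is not interpreted, so the whole line is a comment.
// \u000A marker = "changed";

// Within a string literal the same escape contributes U+000A (LINE FEED) to
// the String value and does not terminate the literal.
const withLineFeed = "a\u000Ab";
const sameAsNewlineEscape = withLineFeed === "a\nb"; // true
const lengthInCodeUnits = withLineFeed.length;       // 3
```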

10.1.1 Static Semantics: UTF16Encoding ( `cp` )

The UTF16Encoding of a numeric code point value, cp, is determined as follows:

Assert: 0 ≤ cp ≤ 0x10FFFF.
If cp ≤ 0xFFFF, return cp.
Let cu1 be floor((cp - 0x10000) / 0x400) + 0xD800.
Let cu2 be ((cp - 0x10000) modulo 0x400) + 0xDC00.
Return the code unit sequence consisting of cu1 followed by cu2.
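The steps above can be sketched informally in ECMAScript itself. The function name utf16Encoding and the choice to return an Array of numeric code unit values are assumptions made for illustration; the specification only defines the abstract operation. The Assert is modelled as a thrown RangeError.

```javascript
// Informal sketch of UTF16Encoding: maps a numeric code point value to an
// array holding one or two UTF-16 code unit values.
function utf16Encoding(cp) {
  // Models "Assert: 0 ≤ cp ≤ 0x10FFFF".
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("code point out of range");
  }
  // Code points in the Basic Multilingual Plane are a single code unit.
  if (cp <= 0xFFFF) return [cp];
  // Otherwise split the 20-bit offset into a leading and a trailing surrogate.
  const cu1 = Math.floor((cp - 0x10000) / 0x400) + 0xD800;
  const cu2 = ((cp - 0x10000) % 0x400) + 0xDC00;
  return [cu1, cu2];
}

const bmp = utf16Encoding(0x0041);    // [0x0041]
const astral = utf16Encoding(0x1F600); // [0xD83D, 0xDE00]
```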

10.1.2 Static Semantics: UTF16Encode ( `text` )

This abstract operation converts text, a sequence of Unicode code points, into a String value, as described in 6.1.4.

Return the string-concatenation of the code units that are the UTF16Encoding of each code point in text, in order.
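A minimal sketch of this operation follows, assuming that text is represented as an Array of numeric code point values, each in the range 0 to 0x10FFFF. The name utf16Encode is illustrative. The per-code-point encoding is repeated inline so that the example is self-contained.

```javascript
// Informal sketch of UTF16Encode: concatenates the UTF-16 code units of each
// code point, in order, into a String value.
function utf16Encode(codePoints) {
  let result = "";
  for (const cp of codePoints) {
    if (cp <= 0xFFFF) {
      // A single code unit.
      result += String.fromCharCode(cp);
    } else {
      // A surrogate pair, computed as in UTF16Encoding.
      const offset = cp - 0x10000;
      result += String.fromCharCode(
        Math.floor(offset / 0x400) + 0xD800,
        (offset % 0x400) + 0xDC00
      );
    }
  }
  return result;
}

const encoded = utf16Encode([0x48, 0x69, 0x1F600]);
// encoded is "Hi" followed by the surrogate pair D83D DE00
```

For valid code points, the result should agree with the built-in String.fromCodePoint applied to the same values.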

10.1.3 Static Semantics: UTF16DecodeSurrogatePair ( `lead`, `trail` )

Two code units, lead and trail, that form a UTF-16 surrogate pair are converted to a code point by performing the following steps:

Assert: lead is a leading surrogate and trail is a trailing surrogate.
Let cp be (lead - 0xD800) × 0x400 + (trail - 0xDC00) + 0x10000.
Return the code point cp.
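The arithmetic above can be sketched as follows. The function name and the use of plain Numbers for code units are assumptions for illustration, and the Assert is modelled as a thrown RangeError.

```javascript
// Informal sketch of UTF16DecodeSurrogatePair: combines a leading and a
// trailing surrogate code unit into the code point they encode.
function utf16DecodeSurrogatePair(lead, trail) {
  const isLeading = lead >= 0xD800 && lead <= 0xDBFF;
  const isTrailing = trail >= 0xDC00 && trail <= 0xDFFF;
  // Models "Assert: lead is a leading surrogate and trail is a trailing surrogate".
  if (!isLeading || !isTrailing) {
    throw new RangeError("arguments do not form a surrogate pair");
  }
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

const decoded = utf16DecodeSurrogatePair(0xD83D, 0xDE00); // 0x1F600
```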

10.1.4 Static Semantics: CodePointAt ( `string`, `position` )

The abstract operation CodePointAt interprets a String string as a sequence of UTF-16 encoded code points, as described in 6.1.4, and reads from it a single code point starting with the code unit at index position. When called, the following steps are performed:

Let size be the length of string.
Assert: position ≥ 0 and position < size.
Let first be the code unit at index position within string.
Let cp be the code point whose numeric value is that of first.
If first is not a leading surrogate or trailing surrogate, then
1. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: false }.
If first is a trailing surrogate or position + 1 = size, then
1. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.
Let second be the code unit at index position + 1 within string.
If second is not a trailing surrogate, then
1. Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 1, [[IsUnpairedSurrogate]]: true }.
Set cp to ! UTF16DecodeSurrogatePair(first, second).
Return the Record { [[CodePoint]]: cp, [[CodeUnitCount]]: 2, [[IsUnpairedSurrogate]]: false }.
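The following sketch mirrors these steps in ECMAScript. The function name codePointAt, the helper predicates, and the use of an ordinary object with the properties codePoint, codeUnitCount and isUnpairedSurrogate in place of the Record are all assumptions for illustration. The surrogate-pair arithmetic is repeated inline so that the example is self-contained.

```javascript
// Illustrative predicates for the two surrogate ranges.
const isLeadingSurrogate = (cu) => cu >= 0xD800 && cu <= 0xDBFF;
const isTrailingSurrogate = (cu) => cu >= 0xDC00 && cu <= 0xDFFF;

// Informal sketch of CodePointAt: reads one code point from `string`
// starting at the code unit index `position`.
function codePointAt(string, position) {
  const size = string.length;
  // Models "Assert: position ≥ 0 and position < size".
  if (!(position >= 0 && position < size)) {
    throw new RangeError("position out of bounds");
  }
  const first = string.charCodeAt(position);
  let cp = first;
  if (!isLeadingSurrogate(first) && !isTrailingSurrogate(first)) {
    return { codePoint: cp, codeUnitCount: 1, isUnpairedSurrogate: false };
  }
  if (isTrailingSurrogate(first) || position + 1 === size) {
    return { codePoint: cp, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  const second = string.charCodeAt(position + 1);
  if (!isTrailingSurrogate(second)) {
    return { codePoint: cp, codeUnitCount: 1, isUnpairedSurrogate: true };
  }
  cp = (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
  return { codePoint: cp, codeUnitCount: 2, isUnpairedSurrogate: false };
}

const sample = "a\uD83D\uDE00";
const paired = codePointAt(sample, 1);  // code point 0x1F600, two code units
const midPair = codePointAt(sample, 2); // lone trailing surrogate 0xDE00
```

Note that starting in the middle of a surrogate pair, as in the last call, reports the trailing surrogate as unpaired, because the operation never looks backwards.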

10.1.5 Static Semantics: UTF16DecodeString ( `string` )

This abstract operation accepts a String value string and returns the sequence of Unicode code points that results from interpreting it as UTF-16 encoded Unicode text as described in 6.1.4.

Let codePoints be a new empty List.
Let size be the length of string.
Let position be 0.
Repeat, while position < size,
1. Let cp be ! CodePointAt(string, position).
2. Append cp.[[CodePoint]] to codePoints.
3. Set position to position + cp.[[CodeUnitCount]].
Return codePoints.
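The loop above can be sketched as follows. The name utf16DecodeString is illustrative, and the built-in String.prototype.codePointAt is used as a stand-in for the abstract operation CodePointAt. This substitution is an assumption of the sketch: the built-in returns only the numeric code point, so the code unit count is inferred from whether the value lies above 0xFFFF.

```javascript
// Informal sketch of UTF16DecodeString: interprets a String as UTF-16 and
// returns an array of numeric code point values.
function utf16DecodeString(string) {
  const codePoints = [];
  const size = string.length;
  let position = 0;
  while (position < size) {
    // Unpaired surrogates are returned as-is by the built-in.
    const cp = string.codePointAt(position);
    codePoints.push(cp);
    // A supplementary code point occupies two code units, anything else one.
    position += cp > 0xFFFF ? 2 : 1;
  }
  return codePoints;
}

const decodedList = utf16DecodeString("a\uD83D\uDE00\uD83D");
// decodedList holds 0x61, 0x1F600, and the unpaired surrogate 0xD83D
```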

10.2 Types of Source Code

There are four types of ECMAScript code:

Global code is source text that is treated as an ECMAScript Script. The global code of a particular Script does not include any source text that is parsed as part of a FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, AsyncArrowFunction, ClassDeclaration, or ClassExpression.
Eval code is the source text supplied to the built-in eval function. More precisely, if the parameter to the built-in eval function is a String, it is treated as an ECMAScript Script. The eval code for a particular invocation of eval is the global code portion of that Script.
Function code is source text that is parsed to supply the value of the [[ECMAScriptCode]] and [[FormalParameters]] internal slots (see 9.2) of an ECMAScript function object. The function code of a particular ECMAScript function does not include any source text that is parsed as the function code of a nested FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, AsyncArrowFunction, ClassDeclaration, or ClassExpression.

In addition, if the source text referred to above is parsed as:
- the FormalParameters and FunctionBody of a FunctionDeclaration or FunctionExpression,
- the FormalParameters and GeneratorBody of a GeneratorDeclaration or GeneratorExpression,
- the FormalParameters and AsyncFunctionBody of an AsyncFunctionDeclaration or AsyncFunctionExpression, or
- the FormalParameters and AsyncGeneratorBody of an AsyncGeneratorDeclaration or AsyncGeneratorExpression,
then the source text matching the BindingIdentifier (if any) of that declaration or expression is also included in the function code of the corresponding function.
Module code is source text that is provided as a ModuleBody. It is the code that is directly evaluated when a module is initialized. The module code of a particular module does not include any source text that is parsed as part of a nested FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, AsyncArrowFunction, ClassDeclaration, or ClassExpression.

Note 1

Function code is generally provided as the bodies of Function Definitions (14.1), Arrow Function Definitions (14.2), Method Definitions (14.3), Generator Function Definitions (14.4), Async Function Definitions (14.7), Async Generator Function Definitions (14.5), and Async Arrow Functions (14.8). Function code is also derived from the arguments to the Function constructor (19.2.1.1), the GeneratorFunction constructor (25.2.1.1), and the AsyncFunction constructor (25.7.1.1).

Note 2

The practical effect of including the BindingIdentifier in function code is that the Early Errors for strict mode code are applied to a BindingIdentifier that is the name of a function whose body contains a "use strict" directive, even if the surrounding code is not strict mode code.
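The effect described in Note 2 can be observed with the following sketch. The helper name parsesAsFunctionBody is hypothetical. It relies on the Function constructor reporting early errors as a SyntaxError when its body argument is parsed, and on that body being non-strict code unless it contains its own Use Strict Directive.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body
// supplied to the Function constructor.
function parsesAsFunctionBody(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// In non-strict code, "eval" is permitted as a function name.
const nonStrictName = parsesAsFunctionBody('function eval() {}'); // true

// The directive in the body makes the function code strict, and the
// BindingIdentifier belongs to that function code, so the name is rejected.
const strictName = parsesAsFunctionBody('function eval() { "use strict"; }'); // false
```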

10.2.1 Strict Mode Code

An ECMAScript Script syntactic unit may be processed using either unrestricted or strict mode syntax and semantics. Code is interpreted as strict mode code in the following situations:

Global code is strict mode code if it begins with a Directive Prologue that contains a Use Strict Directive.
Module code is always strict mode code.
All parts of a ClassDeclaration or a ClassExpression are strict mode code.
Eval code is strict mode code if it begins with a Directive Prologue that contains a Use Strict Directive or if the call to eval is a direct eval that is contained in strict mode code.
Function code is strict mode code if the associated FunctionDeclaration, FunctionExpression, GeneratorDeclaration, GeneratorExpression, AsyncFunctionDeclaration, AsyncFunctionExpression, AsyncGeneratorDeclaration, AsyncGeneratorExpression, MethodDefinition, ArrowFunction, or AsyncArrowFunction is contained in strict mode code or if the code that produces the value of the function's [[ECMAScriptCode]] internal slot begins with a Directive Prologue that contains a Use Strict Directive.
Function code that is supplied as the arguments to the built-in Function, GeneratorFunction, AsyncFunction, and AsyncGeneratorFunction constructors is strict mode code if the last argument is a String that, when processed, is a FunctionBody that begins with a Directive Prologue that contains a Use Strict Directive.

ECMAScript code that is not strict mode code is called non-strict code.
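Several of these situations can be probed with a short sketch. It relies on a behaviour defined elsewhere in the specification: a plain call to a strict mode function leaves this as undefined, whereas a non-strict function receives the global object. All names are illustrative.

```javascript
// Function constructor: strictness depends only on a directive in the body.
const strictBody = new Function('"use strict"; return this === undefined;');
const nonStrictBody = new Function("return this === undefined;");
const strictBodyResult = strictBody();       // true
const nonStrictBodyResult = nonStrictBody(); // false

// All parts of a class are strict mode code, including nested functions.
class Probe {
  static run() {
    return (function () { return this === undefined; })();
  }
}
const classResult = Probe.run(); // true

// Eval code that begins with a Use Strict Directive is strict mode code.
// An indirect call is used so the result does not depend on the caller.
const indirectEval = eval;
const evalResult = indirectEval(
  '"use strict"; (function () { return this === undefined; })()'
); // true
```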

10.2.2 Non-ECMAScript Functions

An ECMAScript implementation may support the evaluation of function exotic objects whose evaluative behaviour is expressed in some implementation-defined form of executable code other than via ECMAScript code. Whether a function object is an ECMAScript code function or a non-ECMAScript function is not semantically observable from the perspective of an ECMAScript code function that calls or is called by such a non-ECMAScript function.
