21 Text Processing
21.1 String Objects
21.1.1 The String Constructor
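The String constructor is the %String% intrinsic object and the initial value of the String property of the global object. When called as a constructor it creates and initializes a new String object. When String is called as a function rather than as a constructor, it performs a type conversion.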
The String constructor is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.
21.1.1.1 String ( value )
When String is called with argument value, the following steps are taken:
- If no arguments were passed to this function invocation, let s be "".
- Else,
  - If NewTarget is undefined and Type(value) is Symbol, return SymbolDescriptiveString(value).
  - Let s be ? ToString(value).
- If NewTarget is undefined, return s.
- Return ? StringCreate(s, ? GetPrototypeFromConstructor(NewTarget, "%StringPrototype%")).
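The difference between the function-call path and the constructor path in the steps above can be observed directly. The following is a minimal sketch; the variable names are illustrative only.

```javascript
// Called as a function (NewTarget is undefined): returns a primitive String.
const asFunction = String(42);
// Called as a constructor: returns a String object wrapping the primitive.
const asObject = new String(42);

// "No arguments" and "an explicit undefined argument" take different steps.
const noArgument = String();
const explicitUndefined = String(undefined);

// A Symbol is only accepted on the function-call path.
const described = String(Symbol("tag"));

let constructorThrew = false;
try {
  new String(Symbol("tag")); // reaches ToString(value), which throws for a Symbol
} catch (err) {
  constructorThrew = err instanceof TypeError;
}

console.log(typeof asFunction, typeof asObject, described, constructorThrew);
```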
21.1.2 Properties of the String Constructor
The value of the [[Prototype]] internal slot of the String constructor is the intrinsic object %FunctionPrototype%.
The String constructor has the following properties:
21.1.2.1 String.fromCharCode ( ...codeUnits )
The String.fromCharCode function may be called with any number of arguments which form the rest parameter codeUnits. The following steps are taken:
- Let codeUnits be a List containing the arguments passed to this function.
- Let length be the number of elements in codeUnits.
- Let elements be a new empty List.
- Let nextIndex be 0.
- Repeat, while nextIndex < length
  - Let next be codeUnits[nextIndex].
  - Let nextCU be ? ToUint16(next).
  - Append nextCU to the end of elements.
  - Let nextIndex be nextIndex + 1.
- Return the String value whose elements are, in order, the elements in the List elements. If length is 0, the empty string is returned.
The length property of the fromCharCode function is 1.
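Because every argument passes through ToUint16, values outside the 16-bit range wrap rather than throw. A short illustration, with variable names chosen only for this example:

```javascript
// Ordinary code units.
const plain = String.fromCharCode(0x48, 0x69);

// 0x10041 modulo 0x10000 is 0x41, so the result is the single code unit "A".
const wrapped = String.fromCharCode(0x10041);

// ToUint16(NaN) is 0, so the result is the code unit 0x0000.
const fromNaN = String.fromCharCode(NaN);

// No arguments: the empty string.
const empty = String.fromCharCode();

console.log(plain, wrapped, fromNaN.charCodeAt(0), empty.length);
```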
21.1.2.2 String.fromCodePoint ( ...codePoints )
The String.fromCodePoint function may be called with any number of arguments which form the rest parameter codePoints. The following steps are taken:
- Let codePoints be a List containing the arguments passed to this function.
- Let length be the number of elements in codePoints.
- Let elements be a new empty List.
- Let nextIndex be 0.
- Repeat, while nextIndex < length
  - Let next be codePoints[nextIndex].
  - Let nextCP be ? ToNumber(next).
  - If SameValue(nextCP, ToInteger(nextCP)) is false, throw a RangeError exception.
  - If nextCP < 0 or nextCP > 0x10FFFF, throw a RangeError exception.
  - Append the elements of the UTF16Encoding of nextCP to the end of elements.
  - Let nextIndex be nextIndex + 1.
- Return the String value whose elements are, in order, the elements in the List elements. If length is 0, the empty string is returned.
The length property of the fromCodePoint function is 1.
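Unlike fromCharCode, this function validates its arguments and emits surrogate pairs for code points above 0xFFFF. The sketch below illustrates both behaviours; throwsRangeError is a helper introduced only for this example.

```javascript
// U+1F600 is encoded as the surrogate pair 0xD83D, 0xDE00.
const face = String.fromCodePoint(0x1F600);
const samePair = String.fromCharCode(0xD83D, 0xDE00);

// Arguments are converted with ToNumber first, so numeric strings work.
const fromString = String.fromCodePoint("65");

// Hypothetical helper: reports whether calling fn throws a RangeError.
function throwsRangeError(fn) {
  try {
    fn();
  } catch (err) {
    return err instanceof RangeError;
  }
  return false;
}

// Non-integers, negative values, values above 0x10FFFF and NaN are rejected.
const rejected = [1.5, -1, 0x110000, NaN].map(
  (value) => throwsRangeError(() => String.fromCodePoint(value))
);

console.log(face.length, face === samePair, fromString, rejected);
```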
21.1.2.3 String.prototype
The initial value of String.prototype is the intrinsic object %StringPrototype%.
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
21.1.2.4 String.raw ( template, ...substitutions )
The String.raw function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions. The following steps are taken:
- Let substitutions be a List consisting of all of the arguments passed to this function, starting with the second argument. If fewer than two arguments were passed, the List is empty.
- Let numberOfSubstitutions be the number of elements in substitutions.
- Let cooked be ? ToObject(template).
- Let raw be ? ToObject(? Get(cooked, "raw")).
- Let literalSegments be ? ToLength(? Get(raw, "length")).
- If literalSegments ≤ 0, return the empty string.
- Let stringElements be a new empty List.
- Let nextIndex be 0.
- Repeat,
  - Let nextKey be ! ToString(nextIndex).
  - Let nextSeg be ? ToString(? Get(raw, nextKey)).
  - Append in order the code unit elements of nextSeg to the end of stringElements.
  - If nextIndex + 1 = literalSegments, then
    - Return the String value whose code units are, in order, the elements in the List stringElements. If stringElements has no elements, the empty string is returned.
  - If nextIndex < numberOfSubstitutions, let next be substitutions[nextIndex].
  - Else, let next be the empty String.
  - Let nextSub be ? ToString(next).
  - Append in order the code unit elements of nextSub to the end of stringElements.
  - Let nextIndex be nextIndex + 1.
String.raw is intended for use as a tag function of a Tagged Template. When called as such, the first argument will be a well formed template object and the rest parameter will contain the substitution values.
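The interleaving of raw segments and substitutions can be seen both through the tagged-template form and by calling the function directly with any object that has a "raw" property. A small sketch, with illustrative variable names:

```javascript
// As a tag function: the backslash sequence in the literal is not interpreted.
const tagged = String.raw`a\n${1 + 1}b`;

// Called directly: segments and substitutions are interleaved in order.
// The third substitution is never reached, because the loop returns after
// the last literal segment.
const manual = String.raw({ raw: ["x", "y", "z"] }, 1, 2, 3);

// Missing substitutions are treated as the empty String.
const sparse = String.raw({ raw: ["x", "y", "z"] }, 1);

// literalSegments <= 0 yields the empty String.
const none = String.raw({ raw: [] }, "ignored");

console.log(tagged, manual, sparse, none.length);
```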
21.1.3 Properties of the String Prototype Object
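The String prototype object is the intrinsic object %StringPrototype%. It has a length property whose initial value is 0 and whose attributes are { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
The value of the [[Prototype]] internal slot of the String prototype object is the intrinsic object %ObjectPrototype%.
Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.
The abstract operation thisStringValue(value) performs the following steps:
- If Type(value) is String, return value.
- If Type(value) is Object and value has a [[StringData]] internal slot, then
  - Assert: value.[[StringData]] is a String value.
  - Return value.[[StringData]].
- Throw a TypeError exception.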
21.1.3.1 String.prototype.charAt ( pos )
Returns a single element String containing the code unit at index pos in the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.
If pos is a value of Number type that is an integer, then the result of x.charAt(pos) is equal to the result of x.substring(pos, pos+1).
When the charAt method is called with one argument pos, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let position be ? ToInteger(pos).
- Let size be the number of elements in S.
- If position < 0 or position ≥ size, return the empty String.
- Return a String of length 1, containing one code unit from S, namely the code unit at index position.
The charAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.2 String.prototype.charCodeAt ( pos )
Returns a Number (a nonnegative integer less than 2^16) that is the code unit value of the string element at index pos in the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.
When the charCodeAt method is called with one argument pos, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let position be ? ToInteger(pos).
- Let size be the number of elements in S.
- If position < 0 or position ≥ size, return NaN.
- Return a value of Number type, whose value is the code unit value of the element at index position in the String S.
The charCodeAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.3 String.prototype.codePointAt ( pos )
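Returns a nonnegative integer Number less than 0x110000 that is the code point value of the UTF-16 encoded code point (6.1.4) starting at the string element at index pos in the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.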
When the codePointAt method is called with one argument pos, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let position be ? ToInteger(pos).
- Let size be the number of elements in S.
- If position < 0 or position ≥ size, return undefined.
- Let first be the code unit value of the element at index position in the String S.
- If first < 0xD800 or first > 0xDBFF or position+1 = size, return first.
- Let second be the code unit value of the element at index position+1 in the String S.
- If second < 0xDC00 or second > 0xDFFF, return first.
- Return UTF16Decode(first, second).
The codePointAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
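The steps above translate almost line for line into code. The following is a sketch only: codePointAtSketch is a hypothetical name, the RequireObjectCoercible check is omitted, and ToInteger is approximated with Math.trunc.

```javascript
// Hypothetical re-implementation of the codePointAt steps for illustration.
function codePointAtSketch(value, pos) {
  const S = String(value);
  // Approximates ToInteger: NaN (including undefined) becomes 0.
  const position = Math.trunc(Number(pos)) || 0;
  const size = S.length;
  if (position < 0 || position >= size) return undefined;

  const first = S.charCodeAt(position);
  // Not a leading surrogate, or nothing follows it: return the code unit.
  if (first < 0xD800 || first > 0xDBFF || position + 1 === size) return first;

  const second = S.charCodeAt(position + 1);
  // Leading surrogate without a trailing surrogate: return the code unit.
  if (second < 0xDC00 || second > 0xDFFF) return first;

  // UTF16Decode(first, second)
  return (first - 0xD800) * 0x400 + (second - 0xDC00) + 0x10000;
}

console.log(codePointAtSketch("a\u{1F600}", 1).toString(16));
```

Note that index 2 of "a\u{1F600}" addresses the trailing surrogate on its own, so both the sketch and the built-in return that lone code unit rather than undefined.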
21.1.3.4 String.prototype.concat ( ...args )
When the concat method is called it returns a String consisting of the code units of the this object (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.
When the concat method is called with zero or more arguments, the following steps are taken:
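- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let args be a List whose elements are the arguments passed to this function.
- Let R be S.
- Repeat, while args is not empty
  - Remove the first element from args and let next be the value of that element.
  - Let nextString be ? ToString(next).
  - Let R be the String value consisting of the code units of the previous value of R followed by the code units of nextString.
- Return R.
The length property of the concat method is 1.
The concat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.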
21.1.3.5 String.prototype.constructor
The initial value of String.prototype.constructor is the intrinsic object %String%.
21.1.3.6 String.prototype.endsWith ( searchString [ , endPosition ] )
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let isRegExp be ? IsRegExp(searchString).
- If isRegExp is true, throw a TypeError exception.
- Let searchStr be ? ToString(searchString).
- Let len be the number of elements in S.
- If endPosition is undefined, let pos be len, else let pos be ? ToInteger(endPosition).
- Let end be min(max(pos, 0), len).
- Let searchLength be the number of elements in searchStr.
- Let start be end - searchLength.
- If start is less than 0, return false.
- If the sequence of elements of S starting at start of length searchLength is the same as the full element sequence of searchStr, return true.
- Otherwise, return false.
Returns true if the sequence of elements of searchString converted to a String is the same as the corresponding elements of this object (converted to a String) starting at endPosition - length(this). Otherwise returns false.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
The endsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
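A few calls illustrate how endPosition is clamped and how a RegExp argument is rejected. The variable names are illustrative only.

```javascript
const line = "To be, or not to be";

// endPosition defaults to the length of the String.
const plainEnd = line.endsWith("to be");

// end = 5, searchLength = 5, start = 0: compares against "To be".
const bounded = line.endsWith("To be", 5);

// start = end - searchLength is negative, so the result is false.
const tooLong = "be".endsWith("to be");

// The empty String matches at any clamped end position.
const emptySearch = line.endsWith("", 3);

let regexpRejected = false;
try {
  line.endsWith(/be/);
} catch (err) {
  regexpRejected = err instanceof TypeError;
}

console.log(plainEnd, bounded, tooLong, emptySearch, regexpRejected);
```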
21.1.3.7 String.prototype.includes ( searchString [ , position ] )
The includes method takes two arguments, searchString and position, and performs the following steps:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let isRegExp be ? IsRegExp(searchString).
- If isRegExp is true, throw a TypeError exception.
- Let searchStr be ? ToString(searchString).
- Let pos be ? ToInteger(position). (If position is undefined, this step produces the value 0.)
- Let len be the number of elements in S.
- Let start be min(max(pos, 0), len).
- Let searchLen be the number of elements in searchStr.
- If there exists any integer k not smaller than start such that k + searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k+j of S is the same as the code unit at index j of searchStr, return true; but if there is no such integer k, return false.
If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, return true; otherwise, return false. If position is undefined, 0 is assumed, so as to search all of the String.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
The includes function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.8 String.prototype.indexOf ( searchString [ , position ] )
If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, then the smallest such index is returned; otherwise, -1 is returned. If position is undefined, 0 is assumed, so as to search all of the String.
The indexOf method takes two arguments, searchString and position, and performs the following steps:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let searchStr be ? ToString(searchString).
- Let pos be ? ToInteger(position). (If position is undefined, this step produces the value 0.)
- Let len be the number of elements in S.
- Let start be min(max(pos, 0), len).
- Let searchLen be the number of elements in searchStr.
- Return the smallest possible integer k not smaller than start such that k+searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k+j of S is the same as the code unit at index j of searchStr; but if there is no such integer k, return the value -1.
The indexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.9 String.prototype.lastIndexOf ( searchString [ , position ] )
If searchString appears as a substring of the result of converting this object to a String at one or more indices that are smaller than or equal to position, then the greatest such index is returned; otherwise, -1 is returned. If position is undefined, the length of the String value is assumed, so as to search all of the String.
The lastIndexOf method takes two arguments, searchString and position, and performs the following steps:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let searchStr be ? ToString(searchString).
- Let numPos be ? ToNumber(position). (If position is undefined, this step produces the value NaN.)
- If numPos is NaN, let pos be +∞; otherwise, let pos be ToInteger(numPos).
- Let len be the number of elements in S.
- Let start be min(max(pos, 0), len).
- Let searchLen be the number of elements in searchStr.
- Return the largest possible nonnegative integer k not larger than start such that k+searchLen is not greater than len, and for all nonnegative integers j less than searchLen, the code unit at index k+j of S is the same as the code unit at index j of searchStr; but if there is no such integer k, return the value -1.
The lastIndexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
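The clamping of position differs in effect between indexOf and lastIndexOf, most visibly for negative, oversized, and NaN positions. A short sketch using the word "canal" (code units c, a, n, a, l at indices 0 to 4); the names are illustrative only:

```javascript
const word = "canal";

const forward = [
  word.indexOf("a"),      // search starts at 0
  word.indexOf("a", 2),   // search starts at 2
  word.indexOf("a", -5),  // negative position is clamped to 0
  word.indexOf("", 10),   // position clamped to len; empty match found there
];

const backward = [
  word.lastIndexOf("a"),      // undefined -> NaN -> +Infinity, clamped to len
  word.lastIndexOf("a", 2),   // only indices 0..2 are candidates
  word.lastIndexOf("a", NaN), // NaN is treated as +Infinity
  word.lastIndexOf("a", -5),  // clamped to 0; index 0 holds "c"
  word.lastIndexOf("c", -5),  // clamped to 0; index 0 matches
];

console.log(forward, backward);
```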
21.1.3.10 String.prototype.localeCompare ( that [ , reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the localeCompare method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the localeCompare method is used.
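When the localeCompare method is called with argument that, it returns a Number other than NaN that represents the result of a locale-sensitive String comparison of the this value (converted to a String) with that (converted to a String). The two Strings are S and That. The two Strings are compared in an implementation-defined fashion. The result is intended to order String values in the sort order specified by a host default locale, and will be negative, zero, or positive, depending on whether S comes before That in the sort order, the Strings are equal, or S comes after That in the sort order, respectively.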
Before performing the comparisons, the following steps are performed to prepare the Strings:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let That be ? ToString(that).
The meaning of the optional second and third parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.
The localeCompare method, if considered as a function of two arguments this and that, is a consistent comparison function on the set of all Strings.
The actual return values are implementation-defined to permit implementers to encode additional information in the value, but the function is required to define a total ordering on all Strings. This function must treat Strings that are canonically equivalent according to the Unicode standard as identical and must return 0 when comparing Strings that are considered canonically equivalent.
The localeCompare method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.
This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and to compare according to the rules of the host environment's current locale. However, regardless of the host provided comparison capabilities, this function must treat Strings that are canonically equivalent according to the Unicode standard as identical. It is recommended that this function should not honour Unicode compatibility equivalences or decompositions. For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms (http://www.unicode.org/reports/tr15/) and Unicode Technical Note #5, Canonical Equivalence in Applications (http://www.unicode.org/notes/tn5/). Also see Unicode Technical Standard #10, Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/).
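The localeCompare function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.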
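Since localeCompare takes its first operand from the this value, it needs a two-argument wrapper before it can be handed to Array.prototype.sort. A minimal sketch, assuming a host locale in which lowercase ASCII words sort alphabetically; byLocale is a name introduced here for illustration:

```javascript
// Hypothetical wrapper turning the method into a two-argument comparator.
const byLocale = (a, b) => a.localeCompare(b);

const words = ["pear", "apple", "fig"];
const sorted = [...words].sort(byLocale);

// Only the sign of the result is portable; the magnitude is implementation-defined.
const sign = Math.sign("apple".localeCompare("pear"));

// Canonically equivalent Strings are required to compare as equal.
const canonical = "\u00F1".localeCompare("n\u0303");

console.log(sorted, sign, canonical);
```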
21.1.3.11 String.prototype.match ( regexp )
When the match method is called with argument regexp, the following steps are taken:
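- Let O be ? RequireObjectCoercible(this value).
- If regexp is neither undefined nor null, then
  - Let matcher be ? GetMethod(regexp, @@match).
  - If matcher is not undefined, then
    - Return ? Call(matcher, regexp, « O »).
- Let S be ? ToString(O).
- Let rx be ? RegExpCreate(regexp, undefined).
- Return ? Invoke(rx, @@match, « S »).
The match function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.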
21.1.3.12 String.prototype.normalize ( [ form ] )
When the normalize method is called with one argument form, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- If form is not provided or form is undefined, let form be "NFC".
- Let f be ? ToString(form).
- If f is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
- Let ns be the String value that is the result of normalizing S into the normalization form named by f as specified in http://www.unicode.org/reports/tr15/.
- Return ns.
The normalize function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
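The effect of the four forms is easiest to see on a character that has both a precomposed and a decomposed spelling. The sketch below assumes a host with Unicode normalization data available, which is the case for common engines; variable names are illustrative.

```javascript
const composed = "\u00F1";    // LATIN SMALL LETTER N WITH TILDE
const decomposed = "n\u0303"; // "n" followed by COMBINING TILDE

// The default form is "NFC".
const toNFC = decomposed.normalize();
const toNFD = composed.normalize("NFD");

// Compatibility forms also fold compatibility characters such as U+FB01.
const ligature = "\uFB01".normalize("NFKC");

// The form name is compared exactly, so lowercase "nfc" is rejected.
let lowercaseRejected = false;
try {
  composed.normalize("nfc");
} catch (err) {
  lowercaseRejected = err instanceof RangeError;
}

console.log(toNFC === composed, toNFD === decomposed, ligature, lowercaseRejected);
```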
21.1.3.13 String.prototype.padEnd( maxLength [ , fillString ] )
When the padEnd method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let intMaxLength be ? ToLength(maxLength).
- Let stringLength be the number of elements in S.
- If intMaxLength is not greater than stringLength, return S.
- If fillString is undefined, let filler be a String consisting solely of the code unit 0x0020 (SPACE).
- Else, let filler be ? ToString(fillString).
- If filler is the empty String, return S.
- Let fillLen be intMaxLength - stringLength.
- Let truncatedStringFiller be a new String value consisting of repeated concatenations of filler truncated to length fillLen.
- Return a new String value computed by the concatenation of S and truncatedStringFiller.
The first argument maxLength will be clamped such that it can be no smaller than the length of the this value.
The optional second argument fillString defaults to " " (a String consisting of 0x0020 SPACE).
21.1.3.14 String.prototype.padStart( maxLength [ , fillString ] )
When the padStart method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let intMaxLength be ? ToLength(maxLength).
- Let stringLength be the number of elements in S.
- If intMaxLength is not greater than stringLength, return S.
- If fillString is undefined, let filler be a String consisting solely of the code unit 0x0020 (SPACE).
- Else, let filler be ? ToString(fillString).
- If filler is the empty String, return S.
- Let fillLen be intMaxLength - stringLength.
- Let truncatedStringFiller be a new String value consisting of repeated concatenations of filler truncated to length fillLen.
- Return a new String value computed by the concatenation of truncatedStringFiller and S.
The first argument maxLength will be clamped such that it can be no smaller than the length of the this value.
The optional second argument fillString defaults to " " (a String consisting of 0x0020 SPACE).
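The padStart and padEnd steps differ only in the order of the final concatenation. The following calls exercise the early returns and the truncation of the filler; the object and its property names exist only for this example.

```javascript
const padded = {
  zeroFill: "5".padStart(3, "0"),            // filler repeated to length 2
  cycled: "abc".padEnd(6, "12"),             // "121212" truncated to 3 code units
  truncated: "abc".padStart(8, "12345678"),  // filler truncated to 5 code units
  defaultFill: "abc".padEnd(5),              // default filler is U+0020 SPACE
  tooShort: "abc".padStart(2, "0"),          // maxLength not greater than length
  emptyFiller: "abc".padEnd(6, ""),          // empty filler returns S unchanged
};

console.log(padded);
```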
21.1.3.15 String.prototype.repeat ( count )
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let n be ? ToInteger(count).
- If n < 0, throw a RangeError exception.
- If n is +∞, throw a RangeError exception.
- Let T be a String value that is made from n copies of S appended together. If n is 0, T is the empty String.
- Return T.
This method creates a String consisting of the code units of the this object (converted to String) repeated count times.
The repeat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
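The two RangeError checks and the ToInteger truncation can be demonstrated briefly. repeatThrows is a helper introduced only for this sketch.

```javascript
const three = "ab".repeat(3);
const zero = "ab".repeat(0);
// ToInteger truncates toward zero, so 2.9 behaves like 2.
const fractional = "ab".repeat(2.9);

// Hypothetical helper: reports whether repeat(count) throws a RangeError.
function repeatThrows(count) {
  try {
    "ab".repeat(count);
  } catch (err) {
    return err instanceof RangeError;
  }
  return false;
}

const negativeRejected = repeatThrows(-1);
const infiniteRejected = repeatThrows(Infinity);

console.log(three, zero.length, fractional, negativeRejected, infiniteRejected);
```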
21.1.3.16 String.prototype.replace ( searchValue, replaceValue )
When the replace method is called with arguments searchValue and replaceValue, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- If searchValue is neither undefined nor null, then
  - Let replacer be ? GetMethod(searchValue, @@replace).
  - If replacer is not undefined, then
    - Return ? Call(replacer, searchValue, « O, replaceValue »).
- Let string be ? ToString(O).
- Let searchString be ? ToString(searchValue).
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Let replaceValue be ? ToString(replaceValue).
- Search string for the first occurrence of searchString and let pos be the index within string of the first code unit of the matched substring and let matched be searchString. If no occurrences of searchString were found, return string.
- If functionalReplace is true, then
  - Let replValue be ? Call(replaceValue, undefined, « matched, pos, string »).
  - Let replStr be ? ToString(replValue).
- Else,
  - Let captures be a new empty List.
  - Let replStr be GetSubstitution(matched, string, pos, captures, replaceValue).
- Let tailPos be pos + the number of code units in matched.
- Let newString be the String formed by concatenating the first pos code units of string, replStr, and the trailing substring of string starting at index tailPos. If pos is 0, the first element of the concatenation will be the empty String.
- Return newString.
The replace function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.16.1 Runtime Semantics: GetSubstitution( matched, str, position, captures, replacement )
The abstract operation GetSubstitution performs the following steps:
- Assert: Type(matched) is String.
- Let matchLength be the number of code units in matched.
- Assert: Type(str) is String.
- Let stringLength be the number of code units in str.
- Assert: position is a nonnegative integer.
- Assert: position ≤ stringLength.
- Assert: captures is a possibly empty List of Strings.
- Assert: Type(replacement) is String.
- Let tailPos be position + matchLength.
- Let m be the number of elements in captures.
- Let result be a String value derived from replacement by copying code unit elements from replacement to result while performing replacements as specified in Table 46. These $ replacements are done left-to-right, and, once such a replacement is performed, the new replacement text is not subject to further replacements.
- Return result.
| Code units | Unicode Characters | Replacement text |
|---|---|---|
| 0x0024, 0x0024 | $$ | $ |
| 0x0024, 0x0026 | $& | matched |
| 0x0024, 0x0060 | $` | If position is 0, the replacement is the empty String. Otherwise the replacement is the substring of str that starts at index 0 and whose last code unit is at index position - 1. |
| 0x0024, 0x0027 | $' | If tailPos ≥ stringLength, the replacement is the empty String. Otherwise the replacement is the substring of str that starts at index tailPos and continues to the end of str. |
| 0x0024, N where 0x0031 ≤ N ≤ 0x0039 | $n where n is one of 1 2 3 4 5 6 7 8 9 and $n is not followed by a decimal digit | The nth element of captures, where n is a single digit in the range 1 to 9. If n≤m and the nth element of captures is undefined, use the empty String instead. If n>m, the result is implementation-defined. |
| 0x0024, N, N where 0x0030 ≤ N ≤ 0x0039 | $nn where n is one of 0 1 2 3 4 5 6 7 8 9 | The nnth element of captures, where nn is a two-digit decimal number in the range 01 to 99. If nn≤m and the nnth element of captures is undefined, use the empty String instead. If nn is 00 or nn>m, the result is implementation-defined. |
| 0x0024 | $ in any context that does not match any of the above. | $ |
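The substitution patterns in the table can be exercised with a String search value, in which case captures is empty and only the $$, $&, $` and $' forms have portable results. A short sketch with illustrative names:

```javascript
const source = "abc";

const results = {
  dollar: source.replace("b", "$$"),        // literal "$"
  matched: source.replace("b", "[$&]"),     // the matched substring
  before: source.replace("b", "$`"),        // text preceding the match
  after: source.replace("b", "$'"),         // text following the match
  // Replacements are not rescanned: "$$" yields "$", then "&" is copied as is.
  notRescanned: source.replace("b", "$$&"),
  // Only the first occurrence is replaced for a String search value.
  firstOnly: "aaa".replace("a", "b"),
  // A callable replaceValue receives (matched, pos, string).
  functional: source.replace("b", (matched, pos, string) => `<${matched}@${pos} in ${string}>`),
};

console.log(results);
```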
21.1.3.17 String.prototype.search ( regexp )
When the search method is called with argument regexp, the following steps are taken:
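- Let O be ? RequireObjectCoercible(this value).
- If regexp is neither undefined nor null, then
  - Let searcher be ? GetMethod(regexp, @@search).
  - If searcher is not undefined, then
    - Return ? Call(searcher, regexp, « O »).
- Let string be ? ToString(O).
- Let rx be ? RegExpCreate(regexp, undefined).
- Return ? Invoke(rx, @@search, « string »).
The search function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.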
21.1.3.18 String.prototype.slice ( start, end )
The slice method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength+start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength+end where sourceLength is the length of the String. The result is a String value, not a String object. The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let len be the number of elements in S.
- Let intStart be ? ToInteger(start).
- If end is undefined, let intEnd be len; else let intEnd be ? ToInteger(end).
- If intStart < 0, let from be max(len + intStart, 0); otherwise let from be min(intStart, len).
- If intEnd < 0, let to be max(len + intEnd, 0); otherwise let to be min(intEnd, len).
- Let span be max(to - from, 0).
- Return a String value containing span consecutive elements from S beginning with the element at index from.
The slice function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.19 String.prototype.split ( separator, limit )
Returns an Array object into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any substring in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.
When the split method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- If separator is neither undefined nor null, then
  - Let splitter be ? GetMethod(separator, @@split).
  - If splitter is not undefined, then
    - Return ? Call(splitter, separator, « O, limit »).
- Let S be ? ToString(O).
- Let A be ! ArrayCreate(0).
- Let lengthA be 0.
- If limit is undefined, let lim be 2^32-1; else let lim be ? ToUint32(limit).
- Let s be the number of elements in S.
- Let p be 0.
- Let R be ? ToString(separator).
- If lim = 0, return A.
- If separator is undefined, then
  - Perform ! CreateDataProperty(A, "0", S).
  - Return A.
- If s = 0, then
  - Let z be SplitMatch(S, 0, R).
  - If z is not false, return A.
  - Perform ! CreateDataProperty(A, "0", S).
  - Return A.
- Let q be p.
- Repeat, while q ≠ s
  - Let e be SplitMatch(S, q, R).
  - If e is false, let q be q+1.
  - Else e is an integer index ≤ s,
    - If e = p, let q be q+1.
    - Else e ≠ p,
      - Let T be a String value equal to the substring of S consisting of the code units at indices p (inclusive) through q (exclusive).
      - Perform ! CreateDataProperty(A, ! ToString(lengthA), T).
      - Increment lengthA by 1.
      - If lengthA = lim, return A.
      - Let p be e.
      - Let q be p.
- Let T be a String value equal to the substring of S consisting of the code units at indices p (inclusive) through s (exclusive).
- Perform ! CreateDataProperty(A, ! ToString(lengthA), T).
- Return A.
The value of separator may be an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. If separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.
If the this object is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.
The split function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.19.1 Runtime Semantics: SplitMatch ( S, q, R )
The abstract operation SplitMatch takes three parameters, a String S, an integer q, and a String R, and performs the following steps in order to return either false or the end index of a match:
- Assert: Type(R) is String.
- Let r be the number of code units in R.
- Let s be the number of code units in S.
- If q+r > s, return false.
- If there exists an integer i between 0 (inclusive) and r (exclusive) such that the code unit at index q+i of S is different from the code unit at index i of R, return false.
- Return q+r.
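The interaction between split and SplitMatch is easier to follow in executable form. The following sketch covers String separators only: it omits the RequireObjectCoercible check and the @@split dispatch, approximates ToUint32 with the unsigned shift operator, and uses the hypothetical names splitMatch and splitSketch.

```javascript
// SplitMatch(S, q, R): returns false, or the end index of a match of R at q.
function splitMatch(S, q, R) {
  const r = R.length;
  const s = S.length;
  if (q + r > s) return false;
  for (let i = 0; i < r; i++) {
    if (S.charCodeAt(q + i) !== R.charCodeAt(i)) return false;
  }
  return q + r;
}

// Sketch of the split steps for a String (or undefined) separator.
function splitSketch(S, separator, limit) {
  const A = [];
  const lim = limit === undefined ? 2 ** 32 - 1 : limit >>> 0;
  const s = S.length;
  let p = 0;
  const R = String(separator);
  if (lim === 0) return A;
  if (separator === undefined) return [S];
  if (s === 0) {
    // An empty input yields [] only if the separator matches the empty String.
    return splitMatch(S, 0, R) !== false ? A : [S];
  }
  let q = p;
  while (q !== s) {
    const e = splitMatch(S, q, R);
    if (e === false || e === p) {
      q = q + 1; // no match here, or an empty match at the previous split point
    } else {
      A.push(S.substring(p, q));
      if (A.length === lim) return A;
      p = e;
      q = p;
    }
  }
  A.push(S.substring(p, s));
  return A;
}

console.log(splitSketch("a,b,,c", ","), splitSketch("abc", ""), splitSketch("a,b,c", ",", 2));
```

The e === p branch is what prevents an empty separator from matching at the start of the input or immediately after the previous match.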
21.1.3.20 String.prototype.startsWith ( searchString [ , position ] )
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let isRegExp be ? IsRegExp(searchString).
- If isRegExp is true, throw a TypeError exception.
- Let searchStr be ? ToString(searchString).
- Let pos be ? ToInteger(position). (If position is undefined, this step produces the value 0.)
- Let len be the number of elements in S.
- Let start be min(max(pos, 0), len).
- Let searchLength be the number of elements in searchStr.
- If searchLength+start is greater than len, return false.
- If the sequence of elements of S starting at start of length searchLength is the same as the full element sequence of searchStr, return true.
- Otherwise, return false.
This method returns true if the sequence of elements of searchString converted to a String is the same as the corresponding elements of this object (converted to a String) starting at index position. Otherwise returns false.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
The startsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.21 String.prototype.substring ( start, end )
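The substring method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.
If either argument is NaN or negative, it is replaced with zero; if either argument is larger than the length of the String, it is replaced with the length of the String.
If start is larger than end, they are swapped.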
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let len be the number of elements in S.
- Let intStart be ? ToInteger(start).
- If end is undefined, let intEnd be len; else let intEnd be ? ToInteger(end).
- Let finalStart be min(max(intStart, 0), len).
- Let finalEnd be min(max(intEnd, 0), len).
- Let from be min(finalStart, finalEnd).
- Let to be max(finalStart, finalEnd).
- Return a String whose length is to - from, containing code units from S, namely the code units with indices from through to - 1, in ascending order.
The substring function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
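The clamping and swapping performed by substring contrast with the negative-index handling of slice (21.1.3.18). The comparison below is a short illustration; the object and its property names are introduced only for this example.

```javascript
const text = "abcdef";

const comparison = {
  // start > end: substring swaps the arguments, slice yields an empty span.
  substringSwapped: text.substring(4, 1),
  sliceNotSwapped: text.slice(4, 1),
  // Negative start: substring clamps to 0, slice counts from the end.
  substringNegative: text.substring(-2),
  sliceNegative: text.slice(-2),
  // NaN is treated as 0 by both, via ToInteger.
  substringNaN: text.substring(NaN, 2),
  // Values past the end are clamped to the length.
  substringPastEnd: text.substring(3, 100),
};

console.log(comparison);
```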
21.1.3.22 String.prototype.toLocaleLowerCase ( [ reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleLowerCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleLowerCase method is used.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
The meaning of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
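The toLocaleLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.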
21.1.3.23 String.prototype.toLocaleUpperCase ( [ reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleUpperCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleUpperCase method is used.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
This function works exactly the same as toUpperCase except that its result is intended to yield the correct result for the host environment's current locale, rather than a locale-independent result. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
The meaning of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
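The toLocaleUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.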
21.1.3.24 String.prototype.toLowerCase ( )
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let cpList be a List containing in order the code points as defined in 6.1.4 of S, starting at the first element of S.
- For each code point c in cpList, if the Unicode Character Database provides a language insensitive lower case equivalent of c, then replace c in cpList with that equivalent code point(s).
- Let cuList be a new empty List.
- For each code point c in cpList, in order, append to cuList the elements of the UTF16Encoding of c.
- Let L be a String whose elements are, in order, the elements of cuList.
- Return L.
The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasings.txt file that accompanies it).
The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().
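The length change and the asymmetry described above can be observed directly. The following is a small illustrative sketch rather than part of the algorithm; the sample strings are chosen for illustration, and the results assume an implementation that applies the locale-insensitive mappings (including SpecialCasing.txt) as required above:

```javascript
// "ß" (U+00DF) has no single-code-point uppercase form, so the result grows.
const s = "\u00DF";
const upper = s.toUpperCase();              // "SS"
const roundTrip = upper.toLowerCase();      // "ss"

console.log(upper.length);                  // 2
console.log(roundTrip === s.toLowerCase()); // false: "ss" versus "ß"

// Lowercasing can also change the length: U+0130 maps to "i" + U+0307.
console.log("\u0130".toLowerCase().length); // 2
```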
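The toLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.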
21.1.3.25 String.prototype.toString ( )
When the toString method is called, the following steps are taken:
- Return ? thisStringValue(this value).
For a String object, the toString method happens to return the same thing as the valueOf method.
21.1.3.26 String.prototype.toUpperCase ( )
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
This function behaves in exactly the same way as String.prototype.toLowerCase, except that code points are mapped to their uppercase equivalents as specified in the Unicode Character Database.
The toUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
21.1.3.27 String.prototype.trim ( )
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let T be a String value that is a copy of S with both leading and trailing white space removed. The definition of white space is the union of WhiteSpace and LineTerminator. When determining whether a Unicode code point is in Unicode general category “Space_Separator” (“Zs”), code unit sequences are interpreted as UTF-16 encoded code point sequences as specified in 6.1.4.
- Return T.
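The trim function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.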
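As an informal illustration of the white space definition used by trim, the sketch below mixes WhiteSpace and LineTerminator code points at both ends of a sample string. The sample values are assumptions chosen for illustration only:

```javascript
// U+FEFF and U+00A0 are WhiteSpace; U+2028, "\r" and "\n" are LineTerminators.
const padded = "\uFEFF\u00A0 \t\n value with  inner spaces \u2028\r\n";
const trimmed = padded.trim();
console.log(JSON.stringify(trimmed)); // "value with  inner spaces"

// U+200B (ZERO WIDTH SPACE) is in neither set, so it is not removed.
console.log("\u200Bx\u200B".trim().length); // 3
```

Only leading and trailing characters are affected; the doubled space inside the string is preserved.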
21.1.3.28 String.prototype.valueOf ( )
When the valueOf method is called, the following steps are taken:
- Return ? thisStringValue(this value).
21.1.3.29 String.prototype [ @@iterator ] ( )
When the @@iterator method is called it returns an Iterator object (
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Return CreateStringIterator(S).
The value of the name property of this function is "[Symbol.iterator]".
21.1.4 Properties of String Instances
String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the String prototype object. String instances also have a [[StringData]] internal slot.
String instances have a length property, and a set of enumerable properties with integer indexed names.
21.1.4.1 length
The number of elements in the String value represented by this String object.
Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
21.1.5 String Iterator Objects
A String Iterator is an object that represents a specific iteration over some specific String instance object. There is no named constructor for String Iterator objects. Instead, String Iterator objects are created by calling certain methods of String instance objects.
21.1.5.1 CreateStringIterator ( string )
Several methods of String objects return Iterator objects. The abstract operation CreateStringIterator with argument string is used to create such iterator objects. It performs the following steps:
- Assert: Type(string) is String.
- Let iterator be ObjectCreate(%StringIteratorPrototype%, « [[IteratedString]], [[StringIteratorNextIndex]] »).
- Set iterator.[[IteratedString]] to string.
- Set iterator.[[StringIteratorNextIndex]] to 0.
- Return iterator.
21.1.5.2 The %StringIteratorPrototype% Object
All String Iterator Objects inherit properties from the %StringIteratorPrototype% intrinsic object.
21.1.5.2.1 %StringIteratorPrototype% .next ( )
- Let O be the this value.
- If Type(O) is not Object, throw a TypeError exception.
- If O does not have all of the internal slots of a String Iterator Instance (21.1.5.3), throw a TypeError exception.
- Let s be O.[[IteratedString]].
- If s is undefined, return CreateIterResultObject(undefined, true).
- Let position be O.[[StringIteratorNextIndex]].
- Let len be the number of elements in s.
- If position ≥ len, then
  - Set O.[[IteratedString]] to undefined.
  - Return CreateIterResultObject(undefined, true).
- Let first be the code unit value at index position in s.
- If first < 0xD800 or first > 0xDBFF or position+1 = len, let resultString be the String consisting of the single code unit first.
- Else,
  - Let second be the code unit value at index position+1 in the String s.
  - If second < 0xDC00 or second > 0xDFFF, let resultString be the String consisting of the single code unit first.
  - Else, let resultString be the String consisting of the code unit first followed by the code unit second.
- Let resultSize be the number of code units in resultString.
- Set O.[[StringIteratorNextIndex]] to position + resultSize.
- Return CreateIterResultObject(resultString, false).
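The surrogate handling in the steps above can be sketched in ECMAScript itself. The generator below is a hypothetical helper (the name codePointStrings is not part of this specification) that omits the internal-slot checks and only mirrors how resultString is chosen:

```javascript
// Hypothetical helper mirroring how %StringIteratorPrototype%.next picks
// resultString; internal slots and type checks are deliberately omitted.
function* codePointStrings(s) {
  let position = 0;
  const len = s.length;
  while (position < len) {
    const first = s.charCodeAt(position);
    let resultString;
    if (first < 0xD800 || first > 0xDBFF || position + 1 === len) {
      // Not a lead surrogate, or nothing follows it: emit one code unit.
      resultString = s[position];
    } else {
      const second = s.charCodeAt(position + 1);
      resultString = (second < 0xDC00 || second > 0xDFFF)
        ? s[position]                      // unpaired lead surrogate
        : s[position] + s[position + 1];   // well-formed surrogate pair
    }
    position += resultString.length;
    yield resultString;
  }
}

const sample = "a\uD834\uDD1Eb\uD834";
console.log([...codePointStrings(sample)].length); // 4
```

For any string, the sequence produced this way should agree with spreading the string through its built-in @@iterator method.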
21.1.5.2.2 %StringIteratorPrototype% [ @@toStringTag ]
The initial value of the @@toStringTag property is the String value "String Iterator".
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
21.1.5.3 Properties of String Iterator Instances
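String Iterator instances are ordinary objects that inherit properties from the %StringIteratorPrototype% intrinsic object. String Iterator instances are initially created with the following internal slots: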
| Internal Slot | Description |
|---|---|
| [[IteratedString]] | The String value whose elements are being iterated. |
| [[StringIteratorNextIndex]] | The integer index of the next string index to be examined by this iteration. |
21.2 RegExp (Regular Expression) Objects
A RegExp object contains a regular expression and the associated flags.
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
21.2.1 Patterns
The RegExp constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of Pattern.
Syntax
Each \u u u \u
21.2.1.1 Static Semantics: Early Errors
- It is a Syntax Error if the MV of HexDigits > 0x10FFFF.
21.2.2 Pattern Semantics
A regular expression pattern is converted into an internal procedure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The internal procedure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.
A Pattern is either a BMP pattern or a Unicode pattern depending upon whether or not its flags contain "u". A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (6.1.4).
The syntax and semantics of
For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character)
Patterns are passed to the RegExp constructor as ECMAScript String values in which non-BMP characters are UTF-16 encoded. For example, the single character MUSICAL SYMBOL G CLEF pattern, expressed as a String value, is a String of length 2 whose elements were the code units 0xD834 and 0xDD1E. So no further translation of the string would be necessary to process it as a BMP pattern consisting of two pattern characters. However, to process it as a Unicode pattern
An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
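The difference between the two kinds of pattern is observable from ECMAScript code. The following sketch uses the MUSICAL SYMBOL G CLEF example; it is illustrative only and does not describe how an implementation performs the matching:

```javascript
const clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF, length 2

// BMP pattern: "." consumes one code unit, so two are needed.
console.log(/^.$/.test(clef));   // false
console.log(/^..$/.test(clef));  // true

// Unicode pattern: "." consumes the whole code point.
console.log(/^.$/u.test(clef));  // true

// A lone lead surrogate matches inside the pair only as a BMP pattern.
console.log(/\uD834/.test(clef));  // true
console.log(/\uD834/u.test(clef)); // false
```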
21.2.2.1 Notation
The descriptions below use the following variables:
- Input is a List consisting of all of the characters, in order, of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation Input[n] means the nth character of Input, where n can range between 0 (inclusive) and InputLength (exclusive).
- InputLength is the number of characters in Input.
- NcapturingParens is the total number of left-capturing parentheses (i.e. the total number of Atom :: ( Disjunction ) Parse Nodes) in the pattern. A left-capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( Disjunction ) production.
- IgnoreCase is true if the RegExp object's [[OriginalFlags]] internal slot contains "i" and otherwise is false.
- Multiline is true if the RegExp object's [[OriginalFlags]] internal slot contains "m" and otherwise is false.
- Unicode is true if the RegExp object's [[OriginalFlags]] internal slot contains "u" and otherwise is false.
Furthermore, the descriptions below use the following internal data structures:
- A CharSet is a mathematical set of characters, either code units or code points depending upon the state of the Unicode flag. “All characters” means either all code unit values or all code point values, also depending upon the state of Unicode.
- A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is a List of NcapturingParens values. States are used to represent partial match states in the regular expression matching algorithms. The endIndex is one plus the index of the last input character matched so far by the pattern, while captures holds the results of capturing parentheses. The nth element of captures is either a List that represents the value obtained by the nth set of capturing parentheses or undefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
- A MatchResult is either a State or the special token failure that indicates that the match failed.
- A Continuation procedure is an internal closure (i.e. an internal procedure with some arguments already bound to values) that takes one State argument and returns a MatchResult result. If an internal closure references variables which are bound in the function that creates the closure, the closure uses the values that these variables had at the time the closure was created. The Continuation attempts to match the remaining portion (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns failure.
- A Matcher procedure is an internal closure that takes two arguments, a State and a Continuation, and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's already-bound arguments) of the pattern against Input, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.
- An AssertionTester procedure is an internal closure that takes a State argument and returns a Boolean result. The assertion tester tests a specific condition (specified by the closure's already-bound arguments) against the current place in Input and returns true if the condition matched or false if not.
21.2.2.2 Pattern
The production Pattern :: Disjunction evaluates as follows:
- Evaluate Disjunction to obtain a Matcher m.
- Return an internal closure that takes two arguments, a String str and an integer index, and performs the following steps:
  - Assert: index ≤ the number of elements in str.
  - If Unicode is true, let Input be a List consisting of the sequence of code points of str interpreted as a UTF-16 encoded (6.1.4) Unicode string. Otherwise, let Input be a List consisting of the sequence of code units that are the elements of str. Input will be used throughout the algorithms in 21.2.2. Each element of Input is considered to be a character.
  - Let InputLength be the number of characters contained in Input. This variable will be used throughout the algorithms in 21.2.2.
  - Let listIndex be the index into Input of the character that was obtained from element index of str.
  - Let c be a Continuation that always returns its State argument as a successful MatchResult.
  - Let cap be a List of NcapturingParens undefined values, indexed 1 through NcapturingParens.
  - Let x be the State (listIndex, cap).
  - Call m(x, c) and return its result.
A Pattern evaluates (“compiles”) to an internal procedure value.
21.2.2.3 Disjunction
The production Disjunction :: Alternative evaluates as follows:
- Evaluate Alternative to obtain a Matcher m.
- Return m.
The production Disjunction :: Alternative | Disjunction evaluates as follows:
- Evaluate Alternative to obtain a Matcher m1.
- Evaluate Disjunction to obtain a Matcher m2.
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
  - Call m1(x, c) and let r be its result.
  - If r is not failure, return r.
  - Call m2(x, c) and return its result.
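The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,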
/a|ab/.exec("abc")
returns the result "a" and not "ab". Moreover,
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
21.2.2.4 Alternative
The production Alternative :: [empty] evaluates as follows:
- Return a Matcher that takes two arguments, a State x and a Continuation c, and returns the result of calling c(x).
The production Alternative :: Alternative Term evaluates as follows:
- Evaluate Alternative to obtain a Matcher m1.
- Evaluate Term to obtain a Matcher m2.
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
  - Let d be a Continuation that takes a State argument y and returns the result of calling m2(y, c).
  - Call m1(x, d) and return its result.
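The Matcher and Continuation closures for Disjunction and Alternative can be modelled informally in ECMAScript. The sketch below is a simplification under stated assumptions: the names FAILURE, makeMatchers, char, empty, seq, alt and run are hypothetical, captures are carried but never filled, matching is attempted only at index 0, and the single-character Matcher ignores Canonicalize and CharSets:

```javascript
const FAILURE = Symbol("failure"); // stands in for the failure token

function makeMatchers(input) {
  // Simplified single-character Matcher (no Canonicalize, no CharSet).
  const char = (ch) => (x, c) =>
    x.endIndex < input.length && input[x.endIndex] === ch
      ? c({ endIndex: x.endIndex + 1, captures: x.captures })
      : FAILURE;

  // Alternative :: [empty]: call the continuation on the unchanged State.
  const empty = (x, c) => c(x);

  // Alternative :: Alternative Term: m2 becomes part of m1's continuation.
  const seq = (m1, m2) => (x, c) => m1(x, (y) => m2(y, c));

  // Disjunction :: Alternative | Disjunction: try left, then right.
  const alt = (m1, m2) => (x, c) => {
    const r = m1(x, c);
    return r !== FAILURE ? r : m2(x, c);
  };
  return { char, empty, seq, alt };
}

// Runs a Matcher from index 0 with a continuation that returns its State.
function run(input, build) {
  const m = build(makeMatchers(input));
  return m({ endIndex: 0, captures: [] }, (y) => y);
}

// Models /a|ab/ on "abc": the left alternative wins, so endIndex is 1.
const first = run("abc", ({ char, seq, alt }) =>
  alt(char("a"), seq(char("a"), char("b"))));
console.log(first.endIndex); // 1

// Models /(?:a|ab)(?:c|bc)/ on "abc": "a" then "bc", so endIndex is 3.
const second = run("abc", ({ char, seq, alt }) =>
  seq(alt(char("a"), seq(char("a"), char("b"))),
      alt(char("c"), seq(char("b"), char("c")))));
console.log(second.endIndex); // 3
```

Because each Matcher receives the rest of the pattern as its continuation, backtracking falls out of ordinary function return: a failure reported by the continuation simply lets the enclosing alt try its next choice.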
Consecutive Terms try to simultaneously match consecutive portions of Input.
21.2.2.5 Term
The production Term :: Assertion evaluates as follows:
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
  - Evaluate Assertion to obtain an AssertionTester t.
  - Call t(x) and let r be the resulting Boolean value.
  - If r is false, return failure.
  - Call c(x) and return its result.
The production Term :: Atom evaluates as follows:
- Return the Matcher that is the result of evaluating Atom.
The production Term :: Atom Quantifier evaluates as follows:
- Evaluate Atom to obtain a Matcher m.
- Evaluate Quantifier to obtain the three results: an integer min, an integer (or ∞) max, and Boolean greedy.
- If max is finite and less than min, throw a SyntaxError exception.
- Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of this Term. This is the total number of Atom :: ( Disjunction ) Parse Nodes prior to or enclosing this Term.
- Let parenCount be the number of left-capturing parentheses in Atom. This is the total number of Atom :: ( Disjunction ) Parse Nodes enclosed by Atom.
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
  - Call RepeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) and return its result.
21.2.2.5.1 Runtime Semantics: RepeatMatcher ( m, min, max, greedy, x, c, parenIndex, parenCount )
The abstract operation RepeatMatcher takes eight parameters, a Matcher m, an integer min, an integer (or ∞) max, a Boolean greedy, a State x, a Continuation c, an integer parenIndex, and an integer parenCount, and performs the following steps:
- If max is zero, return c(x).
- Let d be an internal Continuation closure that takes one State argument y and performs the following steps when evaluated:
  - If min is zero and y's endIndex is equal to x's endIndex, return failure.
  - If min is zero, let min2 be zero; otherwise let min2 be min-1.
  - If max is ∞, let max2 be ∞; otherwise let max2 be max-1.
  - Call RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount) and return its result.
- Let cap be a fresh copy of x's captures List.
- For each integer k that satisfies parenIndex < k and k ≤ parenIndex+parenCount, set cap[k] to undefined.
- Let e be x's endIndex.
- Let xr be the State (e, cap).
- If min is not zero, return m(xr, d).
- If greedy is false, then
  - Call c(x) and let z be its result.
  - If z is not failure, return z.
  - Call m(xr, d) and return its result.
- Call m(xr, d) and let z be its result.
- If z is not failure, return z.
- Call c(x) and return its result.
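The RepeatMatcher steps translate almost line for line into ECMAScript. The following is a sketch under stated assumptions, not an implementation of a RegExp engine: FAILURE, charMatcher and matchAThenLetters are hypothetical names, States are modelled as plain objects, and ∞ is represented by Infinity:

```javascript
const FAILURE = Symbol("failure"); // stands in for the failure token

// Simplified Matcher for one character accepted by a predicate.
const charMatcher = (input, accepts) => (x, c) =>
  x.endIndex < input.length && accepts(input[x.endIndex])
    ? c({ endIndex: x.endIndex + 1, captures: x.captures })
    : FAILURE;

function repeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) {
  if (max === 0) return c(x);
  const d = (y) => {
    // An iteration that matched the empty sequence is not expanded further.
    if (min === 0 && y.endIndex === x.endIndex) return FAILURE;
    const min2 = min === 0 ? 0 : min - 1;
    const max2 = max === Infinity ? Infinity : max - 1;
    return repeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount);
  };
  // Clear the captures that belong to the quantified Atom.
  const cap = x.captures.slice();
  for (let k = parenIndex + 1; k <= parenIndex + parenCount; k++) {
    cap[k] = undefined;
  }
  const xr = { endIndex: x.endIndex, captures: cap };
  if (min !== 0) return m(xr, d);
  if (!greedy) {
    const z = c(x);                       // prefer stopping here
    return z !== FAILURE ? z : m(xr, d);
  }
  const z = m(xr, d);                     // prefer one more repetition
  return z !== FAILURE ? z : c(x);
}

// Models a[a-z]{2,4} (greedy) or a[a-z]{2,4}? (non-greedy) at index 0.
function matchAThenLetters(input, greedy) {
  const a = charMatcher(input, (ch) => ch === "a");
  const letter = charMatcher(input, (ch) => ch >= "a" && ch <= "z");
  const done = (y) => y;
  const r = a({ endIndex: 0, captures: [] },
    (y) => repeatMatcher(letter, 2, 4, greedy, y, done, 0, 0));
  return r === FAILURE ? null : input.slice(0, r.endIndex);
}

console.log(matchAThenLetters("abcdefghi", true));  // "abcde"
console.log(matchAThenLetters("abcdefghi", false)); // "abc"
```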
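An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel.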
If the
Compare
/a[a-z]{2,4}/.exec("abcdefghi")
which returns "abcde" with
/a[a-z]{2,4}?/.exec("abcdefghi")
which returns "abc".
Consider also
/(aa|aabaac|ba|b|c)*/.exec("aabaac")
which, by the choice point ordering above, returns the array
["aaba", "ba"]
and not any of:
["aabaac", "aabaac"]
["aabaac", "c"]
The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:
"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/,"$1")
which returns the gcd in unary notation "aaaaa".
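The same expression can be wrapped in a small function to try other inputs. This is only an illustrative sketch: unaryGcd is a hypothetical helper name, and its arguments are assumed to be positive integers small enough for the unary strings to be practical:

```javascript
// Hypothetical helper; a and b are assumed to be positive integers.
function unaryGcd(a, b) {
  const unary = "a".repeat(a) + "," + "a".repeat(b);
  // (a+) greedily tries the longest run first; \1* and \1+ force that run
  // to divide both operands exactly, so the first success is the gcd.
  return unary.replace(/^(a+)\1*,\1+$/, "$1").length;
}

console.log(unaryGcd(10, 15)); // 5
console.log(unaryGcd(12, 18)); // 6
console.log(unaryGcd(7, 3));   // 1
```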
Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression
/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")
which returns the array
["zaacbbbcac", "z", "ac", "a", undefined, "c"]
and not
["zaacbbbcac", "z", "ac", "a", "bbb", "c"]
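because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.
Step 1 of the RepeatMatcher's d closure states that, once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as: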
/(a*)*/.exec("b")
or the slightly more complicated:
/(a*)b\1+/.exec("baaaac")
which returns the array
["b", ""]
21.2.2.6 Assertion
The production Assertion :: ^ evaluates as follows:
- Return an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:
  - Let e be x's endIndex.
  - If e is zero, return true.
  - If Multiline is false, return false.
  - If the character Input[e-1] is one of LineTerminator, return true.
  - Return false.
Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.
The production Assertion :: $ evaluates as follows:
- Return an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:
  - Let e be x's endIndex.
  - If e is equal to InputLength, return true.
  - If Multiline is false, return false.
  - If the character Input[e] is one of LineTerminator, return true.
  - Return false.
The production Assertion :: \ b evaluates as follows:
- Return an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:
  - Let e be x's endIndex.
  - Call IsWordChar(e-1) and let a be the Boolean result.
  - Call IsWordChar(e) and let b be the Boolean result.
  - If a is true and b is false, return true.
  - If a is false and b is true, return true.
  - Return false.
The production Assertion :: \ B evaluates as follows:
- Return an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:
  - Let e be x's endIndex.
  - Call IsWordChar(e-1) and let a be the Boolean result.
  - Call IsWordChar(e) and let b be the Boolean result.
  - If a is true and b is false, return false.
  - If a is false and b is true, return false.
  - Return true.
The production Assertion :: ( ? = Disjunction ) evaluates as follows:
- Evaluate Disjunction to obtain a Matcher m.
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
  - Let d be a Continuation that always returns its State argument as a successful MatchResult.
  - Call m(x, d) and let r be its result.
  - If r is failure, return failure.
  - Let y be r's State.
  - Let cap be y's captures List.
  - Let xe be x's endIndex.
  - Let z be the State (xe, cap).
  - Call c(z) and return its result.
The production Assertion :: ( ? ! Disjunction ) evaluates as follows:
- Evaluate Disjunction to obtain a Matcher m.
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
  - Let d be a Continuation that always returns its State argument as a successful MatchResult.
  - Call m(x, d) and let r be its result.
  - If r is not failure, return failure.
  - Call c(x) and return its result.
21.2.2.6.1 Runtime Semantics: WordCharacters ( )
The abstract operation WordCharacters performs the following steps:
- Let A be a set of characters containing the sixty-three characters a through z, A through Z, 0 through 9, and _.
- Let U be an empty set.
- For each character c not in set A where Canonicalize(c) is in A, add c to U.
- Assert: Unless Unicode and IgnoreCase are both true, U is empty.
- Add the characters in set U to set A.
- Return A.
21.2.2.6.2 Runtime Semantics: IsWordChar ( e )
The abstract operation IsWordChar takes an integer parameter e and performs the following steps:
- If e is -1 or e is InputLength, return false.
- Let c be the character Input[e].
- Let wordChars be the result of ! WordCharacters().
- If c is in wordChars, return true.
- Return false.
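IsWordChar and the word-boundary assertion built on it can be sketched as follows. This is a simplified model under stated assumptions: WORD_CHARS, isWordChar and atWordBoundary are hypothetical names, Input is taken to be a String of code units, and the case where Unicode and IgnoreCase are both true (which may enlarge the set) is not modelled:

```javascript
// The sixty-three basic word characters: a-z, A-Z, 0-9 and "_".
const WORD_CHARS = new Set(
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_");

function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return WORD_CHARS.has(input[e]);
}

// True exactly when one side of position e is a word character.
function atWordBoundary(input, e) {
  return isWordChar(input, e - 1) !== isWordChar(input, e);
}

const text = "hi, you";
const boundaries = [];
for (let e = 0; e <= text.length; e++) {
  if (atWordBoundary(text, e)) boundaries.push(e);
}
console.log(boundaries); // [0, 2, 4, 7]
```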
21.2.2.7 Quantifier
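The production Quantifier :: QuantifierPrefix evaluates as follows:
- Evaluate QuantifierPrefix to obtain the two results: an integer min and an integer (or ∞) max.
- Return the three results min, max, and true.
The production Quantifier :: QuantifierPrefix ? evaluates as follows:
- Evaluate QuantifierPrefix to obtain the two results: an integer min and an integer (or ∞) max.
- Return the three results min, max, and false.
The production QuantifierPrefix :: * evaluates as follows:
- Return the two results 0 and ∞.
The production QuantifierPrefix :: + evaluates as follows:
- Return the two results 1 and ∞.
The production QuantifierPrefix :: ? evaluates as follows:
- Return the two results 0 and 1.
The production QuantifierPrefix :: { DecimalDigits } evaluates as follows:
- Let i be the MV of DecimalDigits (see 11.8.3).
- Return the two results i and i.
The production QuantifierPrefix :: { DecimalDigits , } evaluates as follows:
- Let i be the MV of DecimalDigits.
- Return the two results i and ∞.
The production QuantifierPrefix :: { DecimalDigits , DecimalDigits } evaluates as follows:
- Let i be the MV of the first DecimalDigits.
- Let j be the MV of the second DecimalDigits.
- Return the two results i and j.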
21.2.2.8 Atom
The production
- Let ch be the character matched by
PatternCharacter . - Let A be a one-element CharSet containing the character ch.
- Call
CharacterSetMatcher (A,false ) and return its Matcher result.
The production
- Let A be the set of all characters except
LineTerminator . - Call
CharacterSetMatcher (A,false ) and return its Matcher result.
The production
- Return the Matcher that is the result of evaluating
AtomEscape .
The production
- Evaluate
CharacterClass to obtain a CharSet A and a Boolean invert. - Call
CharacterSetMatcher (A, invert) and return its Matcher result.
The production
- Evaluate
Disjunction to obtain a Matcher m. - Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of this
Atom . This is the total number of Parse Nodes prior to or enclosing thisAtom :: ( Disjunction ) Atom . - Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
- Let d be an internal Continuation closure that takes one State argument y and performs the following steps:
- Call m(x, d) and return its result.
The production
- Return the Matcher that is the result of evaluating
Disjunction .
21.2.2.8.1 Runtime Semantics: CharacterSetMatcher ( A, invert )
The abstract operation CharacterSetMatcher takes two arguments, a CharSet A and a Boolean flag invert, and performs the following steps:
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
  - Let e be x's endIndex.
  - If e is InputLength, return failure.
  - Let ch be the character Input[e].
  - Let cc be Canonicalize(ch).
  - If invert is false, then
    - If there does not exist a member a of set A such that Canonicalize(a) is cc, return failure.
  - Else invert is true,
    - If there exists a member a of set A such that Canonicalize(a) is cc, return failure.
  - Let cap be x's captures List.
  - Let y be the State (e+1, cap).
  - Call c(y) and return its result.
21.2.2.8.2 Runtime Semantics: Canonicalize ( ch )
The abstract operation Canonicalize takes a character parameter ch and performs the following steps:
- If IgnoreCase is false, return ch.
- If Unicode is true, then
  - If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
  - Return ch.
- Else,
  - Assert: ch is a UTF-16 code unit.
  - Let s be the ECMAScript String value consisting of the single code unit ch.
  - Let u be the same result produced as if by performing the algorithm for String.prototype.toUpperCase using s as the this value.
  - Assert: u is a String value.
  - If u does not consist of a single code unit, return ch.
  - Let cu be u's single code unit element.
  - If ch's code unit value ≥ 128 and cu's code unit value < 128, return ch.
  - Return cu.
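The branch of Canonicalize taken when Unicode is false can be sketched directly with String.prototype.toUpperCase. The helper name canonicalizeNonUnicode is hypothetical, and ch is assumed to be a String holding exactly one code unit; the Unicode branch, which needs CaseFolding.txt data, is not modelled:

```javascript
// Hypothetical helper for the Unicode = false branch of Canonicalize.
// ch is assumed to be a String consisting of a single code unit.
function canonicalizeNonUnicode(ch, ignoreCase) {
  if (!ignoreCase) return ch;
  const u = ch.toUpperCase();
  if (u.length !== 1) return ch;            // e.g. "ß" uppercases to "SS"
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) {
    return ch;                              // never fold non-ASCII into ASCII
  }
  return u;
}

console.log(canonicalizeNonUnicode("a", true));                  // "A"
console.log(canonicalizeNonUnicode("\u00DF", true) === "\u00DF"); // true
console.log(canonicalizeNonUnicode("\u017F", true) === "\u017F"); // true

// The observable effect on matching:
console.log(/[a-z]/i.test("\u017F"));  // false
console.log(/[a-z]/ui.test("\u017F")); // true
```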
Parentheses of the form ( ) serve both to group the components of the \ followed by a nonzero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching internal procedure. To inhibit the capturing behaviour of parentheses, use the form (?: ) instead.
The form (?= ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside (?= form (this unusual behaviour is inherited from Perl). This only matters when the
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
The form (?! ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:
["baaabaac", "ba", undefined, "abaac"]
In case-insignificant matches when Unicode is "ß" (U+00DF) to "SS". It may however map a code point outside the Basic Latin range to a character within, for example, "ſ" (U+017F) to "s". Such characters are not mapped if Unicode is /[a-z]/i, but they will match /[a-z]/ui.
21.2.2.9 AtomEscape
The production AtomEscape :: DecimalEscape evaluates as follows:
- Evaluate DecimalEscape to obtain an integer n.
- If n>NcapturingParens, throw a SyntaxError exception.
- Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
  - Let cap be x's captures List.
  - Let s be cap[n].
  - If s is undefined, return c(x).
  - Let e be x's endIndex.
  - Let len be s's length.
  - Let f be e+len.
  - If f>InputLength, return failure.
  - If there exists an integer i between 0 (inclusive) and len (exclusive) such that Canonicalize(s[i]) is not the same character value as Canonicalize(Input[e+i]), return failure.
  - Let y be the State (f, cap).
  - Call c(y) and return its result.
The production AtomEscape :: CharacterEscape evaluates as follows:
- Evaluate CharacterEscape to obtain a character ch.
- Let A be a one-element CharSet containing the character ch.
- Call CharacterSetMatcher(A, false) and return its Matcher result.
The production AtomEscape :: CharacterClassEscape evaluates as follows:
- Evaluate CharacterClassEscape to obtain a CharSet A.
- Call CharacterSetMatcher(A, false) and return its Matcher result.
An escape sequence of the form \ followed by a nonzero decimal number n matches the result of the nth set of capturing parentheses.
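The backreference Matcher's treatment of an undefined capture, and its use of Canonicalize when comparing characters, can be observed with a few small patterns. The patterns below are illustrative examples chosen for this sketch:

```javascript
// Group 1 never participates when the "b" branch is taken, so \1 matches
// the empty sequence instead of failing.
const m = /(?:(a)|b)\1/.exec("b");
console.log(m[0]); // "b"
console.log(m[1]); // undefined

// When the group did capture, the same characters must follow.
console.log(/(a)\1/.test("aa")); // true
console.log(/(a)\1/.test("ab")); // false

// With the i flag, the comparison goes through Canonicalize.
console.log(/(a)\1/i.test("aA")); // true
```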
21.2.2.10 CharacterEscape
The production
- Return the character U+0000 (NULL).
\0 represents the <NUL> character and cannot be followed by a decimal digit.
The production
- Return the character according to
Table 48 .
| ControlEscape | Character Value | Code Point | Unicode Name | Symbol |
|---|---|---|---|---|
| t | 9 | U+0009 | CHARACTER TABULATION | <HT> |
| n | 10 | U+000A | LINE FEED (LF) | <LF> |
| v | 11 | U+000B | LINE TABULATION | <VT> |
| f | 12 | U+000C | FORM FEED (FF) | <FF> |
| r | 13 | U+000D | CARRIAGE RETURN (CR) | <CR> |
The production CharacterEscape :: c ControlLetter evaluates as follows:
- Let ch be the character matched by ControlLetter.
- Let i be ch's character value.
- Let j be the remainder of dividing i by 32.
- Return the character whose character value is j.
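The remainder computation for control letters is small enough to check by hand. In the sketch below, controlEscapeChar is a hypothetical helper name and its argument is assumed to be a single ASCII letter:

```javascript
// Hypothetical helper: maps a ControlLetter to the character that
// \c followed by that letter denotes.
function controlEscapeChar(letter) {
  const i = letter.charCodeAt(0); // character value of the letter
  const j = i % 32;               // remainder of dividing i by 32
  return String.fromCharCode(j);
}

console.log(controlEscapeChar("J") === "\n"); // true: 74 % 32 = 10
console.log(controlEscapeChar("i") === "\t"); // true: 105 % 32 = 9
console.log(/\cJ/.test("\n"));                // true
```

Upper-case and lower-case forms of a letter differ by 32, so both map to the same control character.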
The production
- Return the character whose code is the SV of HexEscapeSequence.
The production
- Return the result of evaluating RegExpUnicodeEscapeSequence.
The production
- Return the character matched by IdentityEscape.
The production
- Let lead be the result of evaluating LeadSurrogate.
- Let trail be the result of evaluating TrailSurrogate.
- Let cp be UTF16Decode(lead, trail).
- Return the character whose character value is cp.
The production
- Return the character whose code is the result of evaluating LeadSurrogate.
The production
- Return the character whose code is the result of evaluating TrailSurrogate.
The production
- Return the character whose code is the result of evaluating NonSurrogate.
The production
- Return the character whose code is the SV of Hex4Digits.
The production
- Return the character whose code is the MV of HexDigits.
The production
- Return the character whose code is the SV of Hex4Digits.
The production
- Return the character whose code is the SV of Hex4Digits.
The production
- Return the character whose code is the SV of Hex4Digits.
21.2.2.11 DecimalEscape
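The production DecimalEscape :: NonZeroDigit evaluates as follows:
- Return the MV of NonZeroDigit.
The production DecimalEscape :: NonZeroDigit DecimalDigits evaluates as follows:
- Let n be the number of code points in DecimalDigits.
- Return (the MV of NonZeroDigit × 10^n) plus the MV of DecimalDigits.
The definitions of “the MV of NonZeroDigit” and “the MV of DecimalDigits” are in 11.8.3.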
If \ is followed by a decimal number n whose first digit is not 0, then the escape sequence is considered to be a backreference. It is an error if n is greater than the total number of left-capturing parentheses in the entire regular expression.
21.2.2.12 CharacterClassEscape
The production CharacterClassEscape :: d evaluates as follows:
- Return the ten-element set of characters containing the characters 0 through 9 inclusive.
The production CharacterClassEscape :: D evaluates as follows:
- Return the set of all characters not included in the set returned by CharacterClassEscape :: d.
The production CharacterClassEscape :: s evaluates as follows:
- Return the set of characters containing the characters that are on the right-hand side of the WhiteSpace or LineTerminator productions.
The production CharacterClassEscape :: S evaluates as follows:
- Return the set of all characters not included in the set returned by CharacterClassEscape :: s.
The production CharacterClassEscape :: w evaluates as follows:
- Return the set of all characters returned by WordCharacters().
The production CharacterClassEscape :: W evaluates as follows:
- Return the set of all characters not included in the set returned by CharacterClassEscape :: w.
21.2.2.13 CharacterClass
The production
- Evaluate ClassRanges to obtain a CharSet A.
- Return the two results A and false.
The production
- Evaluate ClassRanges to obtain a CharSet A.
- Return the two results A and true.
21.2.2.14 ClassRanges
The production
- Return the empty CharSet.
The production
- Evaluate NonemptyClassRanges to obtain a CharSet A.
- Return A.
21.2.2.15 NonemptyClassRanges
The production
- Return the CharSet that is the result of evaluating ClassAtom.
The production
- Evaluate ClassAtom to obtain a CharSet A.
- Evaluate NonemptyClassRangesNoDash to obtain a CharSet B.
- Return the union of CharSets A and B.
The production
- Evaluate the first ClassAtom to obtain a CharSet A.
- Evaluate the second ClassAtom to obtain a CharSet B.
- Evaluate ClassRanges to obtain a CharSet C.
- Call CharacterRange(A, B) and let D be the resulting CharSet.
- Return the union of CharSets D and C.
21.2.2.15.1 Runtime Semantics: CharacterRange ( A, B )
The abstract operation CharacterRange takes two CharSet parameters A and B and performs the following steps:
- If A does not contain exactly one character or B does not contain exactly one character, throw a SyntaxError exception.
- Let a be the one character in CharSet A.
- Let b be the one character in CharSet B.
- Let i be the character value of character a.
- Let j be the character value of character b.
- If i > j, throw a SyntaxError exception.
- Return the set containing all characters numbered i through j, inclusive.
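The CharacterRange steps can be sketched in JavaScript as follows. This is only an illustration: it assumes a CharSet is modelled as a `Set` of numeric character values, and the function name `characterRange` is made up for the example.

```javascript
// Sketch of CharacterRange over CharSets modelled as Sets of numeric character values.
function characterRange(A, B) {
  if (A.size !== 1 || B.size !== 1) {
    throw new SyntaxError("range endpoints must each be a single character");
  }
  const [i] = A; // character value of the one character in A
  const [j] = B; // character value of the one character in B
  if (i > j) {
    throw new SyntaxError("range out of order");
  }
  const result = new Set();
  for (let value = i; value <= j; value++) {
    result.add(value);
  }
  return result;
}

// The range 0-9: character values 0x30 through 0x39 inclusive.
const digitRange = characterRange(new Set([0x30]), new Set([0x39]));
```

The same two error conditions are observable in a real engine: for example, `new RegExp("[z-a]")` throws a SyntaxError because the first endpoint is numbered higher than the second.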
21.2.2.16 NonemptyClassRangesNoDash
The production
- Return the CharSet that is the result of evaluating ClassAtom.
The production
- Evaluate ClassAtomNoDash to obtain a CharSet A.
- Evaluate NonemptyClassRangesNoDash to obtain a CharSet B.
- Return the union of CharSets A and B.
The production
- Evaluate ClassAtomNoDash to obtain a CharSet A.
- Evaluate ClassAtom to obtain a CharSet B.
- Evaluate ClassRanges to obtain a CharSet C.
- Call CharacterRange(A, B) and let D be the resulting CharSet.
- Return the union of CharSets D and C.
Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.
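The two patterns from the note above can be checked directly. The variable names are illustrative only.

```javascript
const narrow = /[E-F]/i;
const wide = /[E-f]/i;

// Only E, F, e and f match the narrow range.
const narrowHits = ["E", "f", "g", "_"].map((ch) => narrow.test(ch));

// The wide range spans E through f, which covers the symbols between Z and a,
// and case folding brings in the remaining Basic Latin letters.
const wideHits = ["A", "z", "_", "`", "1"].map((ch) => wide.test(ch));
```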
A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of
21.2.2.17 ClassAtom
The production
- Return the CharSet containing the one character -.
The production
- Evaluate ClassAtomNoDash to obtain a CharSet A.
- Return A.
21.2.2.18 ClassAtomNoDash
The production
- Return the CharSet containing the character matched by SourceCharacter.
The production
- Return the CharSet that is the result of evaluating ClassEscape.
21.2.2.19 ClassEscape
The production
- Return the CharSet containing the single character <BS> U+0008 (BACKSPACE).
The production
- Return the CharSet containing the single character - U+002D (HYPHEN-MINUS).
The production
- Return the CharSet containing the single character that is the result of evaluating CharacterEscape.
The production
- Return the CharSet that is the result of evaluating CharacterClassEscape.
A \b, \B, and backreferences. Inside a \b means the backspace character, while \B and backreferences raise errors. Using a backreference inside a
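A short check of how \b is read inside and outside a class. The \B case is tried with the u flag, on the assumption that Unicode mode applies the error strictly. Without that flag some implementations accept \B inside a class for web compatibility.

```javascript
const backspace = "\u0008";

const inClass = /[\b]/.test(backspace);      // inside a class, \b is BACKSPACE
const outsideClass = /\b/.test(backspace);   // outside a class, \b asserts a word boundary

// \B has no meaning inside a class, so a Unicode-mode pattern using it is rejected.
let classBoundaryError = null;
try {
  new RegExp("[\\B]", "u");
} catch (err) {
  classBoundaryError = err;
}
```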
21.2.3 The RegExp Constructor
The RegExp constructor is the %RegExp% intrinsic object and the initial value of the RegExp property of the global object. When RegExp is called as a function rather than as a constructor, it creates and initializes a new RegExp object. Thus the function call RegExp(…) is equivalent to the object creation expression new RegExp(…) with the same arguments.
The RegExp constructor is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.
21.2.3.1 RegExp ( pattern, flags )
The following steps are taken:
- Let patternIsRegExp be ? IsRegExp(pattern).
- If NewTarget is not undefined, let newTarget be NewTarget.
- Else,
  - Let newTarget be the active function object.
  - If patternIsRegExp is true and flags is undefined, then
- If Type(pattern) is Object and pattern has a [[RegExpMatcher]] internal slot, then
  - Let P be pattern.[[OriginalSource]].
  - If flags is undefined, let F be pattern.[[OriginalFlags]].
  - Else, let F be flags.
- Else if patternIsRegExp is true, then
- Else,
  - Let P be pattern.
  - Let F be flags.
- Let O be ? RegExpAlloc(newTarget).
- Return ? RegExpInitialize(O, P, F).
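The observable effect of these steps when pattern is itself a RegExp object can be seen in the sketch below. The variable names are illustrative, and the results noted in the comments are what current engines produce.

```javascript
const original = /ab+c/gi;

// Called as a function with no flags: an existing RegExp is handed back unchanged.
const viaCall = RegExp(original);

// Called as a constructor: a new object with the same source and flags.
const viaNew = new RegExp(original);

// Explicit flags replace the flags of the pattern object.
const reflagged = new RegExp(original, "m");

const summary = {
  callReturnsSameObject: viaCall === original,
  newReturnsCopy: viaNew !== original && viaNew.source === original.source,
  copiedFlags: viaNew.flags,
  replacedFlags: reflagged.flags,
};
```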
If pattern is supplied using a
21.2.3.2 Abstract Operations for the RegExp Constructor
21.2.3.2.1 Runtime Semantics: RegExpAlloc ( newTarget )
When the abstract operation RegExpAlloc with argument newTarget is called, the following steps are taken:
- Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExpPrototype%", « [[RegExpMatcher]], [[OriginalSource]], [[OriginalFlags]] »).
- Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor {[[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false}).
- Return obj.
21.2.3.2.2 Runtime Semantics: RegExpInitialize ( obj, pattern, flags )
When the abstract operation RegExpInitialize with arguments obj, pattern, and flags is called, the following steps are taken:
- If pattern is undefined, let P be the empty String.
- Else, let P be ? ToString(pattern).
- If flags is undefined, let F be the empty String.
- Else, let F be ? ToString(flags).
- If F contains any code unit other than "g", "i", "m", "u", or "y" or if it contains the same code unit more than once, throw a SyntaxError exception.
- If F contains "u", let BMP be false; else let BMP be true.
- If BMP is true, then
  - Parse P using the grammars in 21.2.1 and interpreting each of its 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements. The goal symbol for the parse is Pattern[~U]. Throw a SyntaxError exception if P did not conform to the grammar, if any elements of P were not matched by the parse, or if any Early Error conditions exist.
  - Let patternCharacters be a List whose elements are the code unit elements of P.
- Else,
  - Parse P using the grammars in 21.2.1 and interpreting P as UTF-16 encoded Unicode code points (6.1.4). The goal symbol for the parse is Pattern[+U]. Throw a SyntaxError exception if P did not conform to the grammar, if any elements of P were not matched by the parse, or if any Early Error conditions exist.
  - Let patternCharacters be a List whose elements are the code points resulting from applying UTF-16 decoding to P's sequence of elements.
- Set obj.[[OriginalSource]] to P.
- Set obj.[[OriginalFlags]] to F.
- Set obj.[[RegExpMatcher]] to the internal procedure that evaluates the above parse of P by applying the semantics provided in 21.2.2 using patternCharacters as the pattern's List of SourceCharacter values and F as the flag parameters.
- Perform ? Set(obj, "lastIndex", 0, true).
- Return obj.
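The flags check in these steps can be sketched as a small function. The name `checkFlags` is hypothetical, and the allowed set is the one listed in this edition. Later editions and current engines accept additional flags, so the sketch is deliberately stricter than a modern `RegExp` constructor.

```javascript
// Sketch of the flags validation step, using the flag set "g", "i", "m", "u", "y".
function checkFlags(F) {
  const allowed = "gimuy";
  const seen = new Set();
  for (let k = 0; k < F.length; k++) {
    const unit = F[k]; // one code unit of F
    if (!allowed.includes(unit) || seen.has(unit)) {
      throw new SyntaxError("Invalid flags: " + F);
    }
    seen.add(unit);
  }
  return F;
}

const accepted = checkFlags("gi");

// A freshly initialized RegExp always starts with lastIndex set to 0.
const initialLastIndex = new RegExp("a", "g").lastIndex;
```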
21.2.3.2.3 Runtime Semantics: RegExpCreate ( P, F )
When the abstract operation RegExpCreate with arguments P and F is called, the following steps are taken:
- Let obj be ? RegExpAlloc(%RegExp%).
- Return ? RegExpInitialize(obj, P, F).
21.2.3.2.4 Runtime Semantics: EscapeRegExpPattern ( P, F )
When the abstract operation EscapeRegExpPattern with arguments P and F is called, the following occurs:
- Let S be a String in the form of a Pattern[~U] (Pattern[+U] if F contains "u") equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not be identical to P; however, the internal procedure that would result from evaluating S as a Pattern[~U] (Pattern[+U] if F contains "u") must behave identically to the internal procedure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
- The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the String value formed by concatenating the Strings "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
- Return S.
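The round-trip requirement can be checked without assuming which escape an engine chooses. The sketch below rebuilds a literal from `source` and `flags` and evaluates it as program text. The use of the `Function` constructor is only a convenience for this demonstration.

```javascript
const slashPattern = new RegExp("/");
const emptyPattern = new RegExp("");

// Concatenate "/", S, "/" and F, then parse the result as a regular expression literal.
const literalText = "/" + slashPattern.source + "/" + slashPattern.flags;
const reparsed = new Function("return " + literalText + ";")();

const roundTrip = {
  sourceIsEscaped: slashPattern.source !== "/",
  reparsedIsRegExp: reparsed instanceof RegExp,
  reparsedMatchesSlash: reparsed instanceof RegExp && reparsed.test("a/b"),
  emptySource: emptyPattern.source,
};
```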
21.2.4 Properties of the RegExp Constructor
The value of the [[Prototype]] internal slot of the RegExp constructor is the intrinsic object
The RegExp constructor has the following properties:
21.2.4.1 RegExp.prototype
The initial value of RegExp.prototype is the intrinsic object
This property has the attributes { [[Writable]]:
21.2.4.2 get RegExp [ @@species ]
RegExp[@@species] is an accessor property whose set accessor function is
- Return the this value.
The value of the name property of this function is "get [Symbol.species]".
RegExp prototype methods normally use their this object's constructor to create a derived object. However, a subclass constructor may over-ride that default behaviour by redefining its @@species property.
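A sketch of how redefining @@species changes which constructor a prototype method uses for a derived object. The class names and the counter are hypothetical. The example relies on @@split, which constructs its internal splitter through the species constructor.

```javascript
let derivedConstructions = 0;

// Counts every construction that goes through the subclass constructor.
class CountingRegExp extends RegExp {
  constructor(pattern, flags) {
    super(pattern, flags);
    derivedConstructions += 1;
  }
}

// Overrides @@species so that derived objects are plain RegExp objects.
class PlainSpeciesRegExp extends CountingRegExp {
  static get [Symbol.species]() {
    return RegExp;
  }
}

// The inherited getter returns the this value, here the subclass itself.
const inheritedSpecies = CountingRegExp[Symbol.species] === CountingRegExp;

const counting = new CountingRegExp(",");
const beforeCounting = derivedConstructions;
const parts = "a,b".split(counting);
const builtWithSubclass = derivedConstructions - beforeCounting;

const plain = new PlainSpeciesRegExp(",");
const beforePlain = derivedConstructions;
"a,b".split(plain);
const builtWithPlain = derivedConstructions - beforePlain;
```

With the default species, splitting constructs one additional `CountingRegExp`. With the override, the splitter is a plain RegExp and the counter does not move.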
21.2.5 Properties of the RegExp Prototype Object
The RegExp prototype object is the intrinsic object
The value of the [[Prototype]] internal slot of the RegExp prototype object is the intrinsic object
The RegExp prototype object does not have a valueOf property of its own; however, it inherits the valueOf property from the Object prototype object.
21.2.5.1 RegExp.prototype.constructor
The initial value of RegExp.prototype.constructor is the intrinsic object
21.2.5.2 RegExp.prototype.exec ( string )
Performs a regular expression match of string against the regular expression and returns an Array object containing the results of the match, or
The String
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have a [[RegExpMatcher]] internal slot, throw a TypeError exception.
- Let S be ? ToString(string).
- Return ? RegExpBuiltinExec(R, S).
21.2.5.2.1 Runtime Semantics: RegExpExec ( R, S )
The abstract operation RegExpExec with arguments R and S performs the following steps:
- Assert: Type(R) is Object.
- Assert: Type(S) is String.
- Let exec be ? Get(R, "exec").
- If IsCallable(exec) is true, then
- If R does not have a [[RegExpMatcher]] internal slot, throw a TypeError exception.
- Return ? RegExpBuiltinExec(R, S).
If a callable exec property is not found this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of exec.
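The dynamic lookup of exec is observable through any method that goes through RegExpExec, such as test. In this sketch the variable names are illustrative.

```javascript
// A callable exec property on the object is used instead of the built-in matcher.
const instrumented = /a/;
const seenInputs = [];
instrumented.exec = function (input) {
  seenInputs.push(input);
  return null; // report "no match" regardless of the input
};
const overriddenResult = instrumented.test("aaa");

// A non-callable exec property is ignored and the built-in algorithm is used.
const fallback = /a/;
fallback.exec = 42;
const fallbackResult = fallback.test("aaa");
```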
21.2.5.2.2 Runtime Semantics: RegExpBuiltinExec ( R, S )
The abstract operation RegExpBuiltinExec with arguments R and S performs the following steps:
- Assert: R is an initialized RegExp instance.
- Assert: Type(S) is String.
- Let length be the number of code units in S.
- Let lastIndex be ? ToLength(? Get(R, "lastIndex")).
- Let flags be R.[[OriginalFlags]].
- If flags contains "g", let global be true, else let global be false.
- If flags contains "y", let sticky be true, else let sticky be false.
- If global is false and sticky is false, set lastIndex to 0.
- Let matcher be R.[[RegExpMatcher]].
- If flags contains "u", let fullUnicode be true, else let fullUnicode be false.
- Let matchSucceeded be false.
- Repeat, while matchSucceeded is false
  - If lastIndex > length, then
    - If global is true or sticky is true, then
      - Perform ? Set(R, "lastIndex", 0, true).
    - Return null.
  - Let r be matcher(S, lastIndex).
  - If r is failure, then
    - If sticky is true, then
      - Perform ? Set(R, "lastIndex", 0, true).
      - Return null.
    - Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
  - Else,
- Let e be r's endIndex value.
- If fullUnicode is true, then
  - e is an index into the Input character list, derived from S, matched by matcher. Let eUTF be the smallest index into S that corresponds to the character at element e of Input. If e is greater than or equal to the length of Input, then eUTF is the number of code units in S.
  - Set e to eUTF.
- If global is true or sticky is true, then
  - Perform ? Set(R, "lastIndex", e, true).
- Let n be the length of r's captures List. (This is the same value as 21.2.2.1's NcapturingParens.)
- Let A be ! ArrayCreate(n + 1).
- Assert: The value of A's "length" property is n + 1.
- Let matchIndex be lastIndex.
- Perform ! CreateDataProperty(A, "index", matchIndex).
- Perform ! CreateDataProperty(A, "input", S).
- Let matchedSubstr be the matched substring (i.e. the portion of S between offset lastIndex inclusive and offset e exclusive).
- Perform ! CreateDataProperty(A, "0", matchedSubstr).
- For each integer i such that i > 0 and i ≤ n, do
  - Let captureI be ith element of r's captures List.
  - If captureI is undefined, let capturedValue be undefined.
  - Else if fullUnicode is true, then
    - Assert: captureI is a List of code points.
    - Let capturedValue be a String value whose code units are the UTF16Encoding of the code points of captureI.
  - Else fullUnicode is false,
  - Perform ! CreateDataProperty(A, ! ToString(i), capturedValue).
- Return A.
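The way these steps read and write lastIndex can be seen with a global and a sticky expression. The names below are illustrative.

```javascript
const text = "a-a";

// Global: each call resumes at lastIndex and scans forward.
const globalRe = /a/g;
const firstIndex = globalRe.exec(text).index;
const afterFirst = globalRe.lastIndex;
const secondIndex = globalRe.exec(text).index;
const third = globalRe.exec(text);        // no further match
const afterFailure = globalRe.lastIndex;  // reset by the failed call

// Sticky: the match must begin exactly at lastIndex, with no scanning forward.
const stickyRe = /a/y;
stickyRe.lastIndex = 1;
const stickyResult = stickyRe.exec(text);
const stickyAfter = stickyRe.lastIndex;
```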
21.2.5.2.3 AdvanceStringIndex ( S, index, unicode )
The abstract operation AdvanceStringIndex with arguments S, index, and unicode performs the following steps:
- Assert: Type(S) is String.
- Assert: index is an integer such that 0 ≤ index ≤ 2^53-1.
- Assert: Type(unicode) is Boolean.
- If unicode is false, return index+1.
- Let length be the number of code units in S.
- If index+1 ≥ length, return index+1.
- Let first be the code unit value at index index in S.
- If first < 0xD800 or first > 0xDBFF, return index+1.
- Let second be the code unit value at index index+1 in S.
- If second < 0xDC00 or second > 0xDFFF, return index+1.
- Return index+2.
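AdvanceStringIndex translates almost line for line into JavaScript. The following is a sketch under the assumption that the arguments already satisfy the assertions above. The function name is illustrative and is not exposed by the language.

```javascript
// Sketch of AdvanceStringIndex: step over one code unit, or over a whole
// surrogate pair when unicode is true and a valid pair starts at index.
function advanceStringIndex(S, index, unicode) {
  if (!unicode) return index + 1;
  const length = S.length;
  if (index + 1 >= length) return index + 1;
  const first = S.charCodeAt(index);
  if (first < 0xd800 || first > 0xdbff) return index + 1;
  const second = S.charCodeAt(index + 1);
  if (second < 0xdc00 || second > 0xdfff) return index + 1;
  return index + 2;
}

const sample = "\u{1F600}x"; // one surrogate pair followed by "x"
const steps = [
  advanceStringIndex(sample, 0, true),   // skips the pair
  advanceStringIndex(sample, 0, false),  // lands inside the pair
  advanceStringIndex(sample, 2, true),   // ordinary code unit
];
```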
21.2.5.3 get RegExp.prototype.flags
RegExp.prototype.flags is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- Let result be the empty String.
- Let global be ToBoolean(? Get(R, "global")).
- If global is true, append "g" as the last code unit of result.
- Let ignoreCase be ToBoolean(? Get(R, "ignoreCase")).
- If ignoreCase is true, append "i" as the last code unit of result.
- Let multiline be ToBoolean(? Get(R, "multiline")).
- If multiline is true, append "m" as the last code unit of result.
- Let unicode be ToBoolean(? Get(R, "unicode")).
- If unicode is true, append "u" as the last code unit of result.
- Let sticky be ToBoolean(? Get(R, "sticky")).
- If sticky is true, append "y" as the last code unit of result.
- Return result.
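Because the getter only reads properties from its this value, the order of the result is fixed regardless of how the flags were written, and the getter also works on ordinary objects. A brief illustration with made-up variable names:

```javascript
// Flags written as "ymg" are reported in the fixed order used by the getter.
const literalFlags = /a/ymg.flags;

// The getter is generic: any object with truthy flag properties will do.
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;
const genericFlags = flagsGetter.call({ sticky: 1, global: "yes", unicode: 0 });
```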
21.2.5.4 get RegExp.prototype.global
RegExp.prototype.global is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExpPrototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit "g", return true.
- Return false.
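The three outcomes of this getter (a Boolean for RegExp instances, undefined for the prototype object itself, and a TypeError for any other receiver) can be observed as follows. The variable names are illustrative, and the prototype case assumes an engine that implements the SameValue step shown above.

```javascript
const globalGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "global").get;

const onGlobalInstance = globalGetter.call(/a/g);
const onPlainInstance = globalGetter.call(/a/);
const onPrototype = globalGetter.call(RegExp.prototype);

// Any other object lacks the [[OriginalFlags]] internal slot.
let onOrdinaryObject = null;
try {
  globalGetter.call({});
} catch (err) {
  onOrdinaryObject = err;
}
```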
21.2.5.5 get RegExp.prototype.ignoreCase
RegExp.prototype.ignoreCase is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExpPrototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit "i", return true.
- Return false.
21.2.5.6 RegExp.prototype [ @@match ] ( string )
When the @@match method is called with argument string, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let global be ToBoolean(? Get(rx, "global")).
- If global is false, then
  - Return ? RegExpExec(rx, S).
- Else global is true,
  - Let fullUnicode be ToBoolean(? Get(rx, "unicode")).
  - Perform ? Set(rx, "lastIndex", 0, true).
  - Let A be ! ArrayCreate(0).
  - Let n be 0.
  - Repeat,
    - Let result be ? RegExpExec(rx, S).
    - If result is null, then
      - If n=0, return null.
      - Return A.
    - Else result is not null,
      - Let matchStr be ? ToString(? Get(result, "0")).
      - Let status be CreateDataProperty(A, ! ToString(n), matchStr).
      - Assert: status is true.
      - If matchStr is the empty String, then
        - Let thisIndex be ? ToLength(? Get(rx, "lastIndex")).
        - Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
        - Perform ? Set(rx, "lastIndex", nextIndex, true).
      - Increment n.
The value of the name property of this function is "[Symbol.match]".
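The two branches of @@match, and the empty-match handling that keeps the global loop from stalling, can be seen through String.prototype.match. The variable names are illustrative.

```javascript
// Non-global: the result is whatever RegExpExec returns (an exec-style array).
const single = "abc".match(/b/);

// Global: an array of matched strings, or null when nothing matched.
const words = "one two  three".match(/\w+/g);
const none = "abc".match(/\d/g);

// Empty matches advance lastIndex by one position each time, so the loop
// visits indices 0, 1, 2 and 3 of "abc" and then stops.
const empties = "abc".match(/x*/g);
```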
The @@match property is used by the
21.2.5.7 get RegExp.prototype.multiline
RegExp.prototype.multiline is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExpPrototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit "m", return true.
- Return false.
21.2.5.8 RegExp.prototype [ @@replace ] ( string, replaceValue )
When the @@replace method is called with arguments string and replaceValue, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let lengthS be the number of code unit elements in S.
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Let replaceValue be ? ToString(replaceValue).
- Let global be ToBoolean(? Get(rx, "global")).
- If global is true, then
- Let results be a new empty List.
- Let done be false.
- Repeat, while done is false
  - Let result be ? RegExpExec(rx, S).
  - If result is null, set done to true.
  - Else result is not null,
    - Append result to the end of results.
    - If global is false, set done to true.
    - Else,
- Let accumulatedResult be the empty String value.
- Let nextSourcePosition be 0.
- For each result in results, do
  - Let nCaptures be ? ToLength(? Get(result, "length")).
  - Let nCaptures be max(nCaptures - 1, 0).
  - Let matched be ? ToString(? Get(result, "0")).
  - Let matchLength be the number of code units in matched.
  - Let position be ? ToInteger(? Get(result, "index")).
  - Let position be max(min(position, lengthS), 0).
  - Let n be 1.
  - Let captures be a new empty List.
  - Repeat, while n ≤ nCaptures
  - If functionalReplace is true, then
  - Else,
    - Let replacement be GetSubstitution(matched, S, position, captures, replaceValue).
  - If position ≥ nextSourcePosition, then
    - NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
    - Let accumulatedResult be the String formed by concatenating the code units of the current value of accumulatedResult with the substring of S consisting of the code units from nextSourcePosition (inclusive) up to position (exclusive) and with the code units of replacement.
    - Let nextSourcePosition be position + matchLength.
- If nextSourcePosition ≥ lengthS, return accumulatedResult.
- Return the String formed by concatenating the code units of accumulatedResult with the substring of S consisting of the code units from nextSourcePosition (inclusive) up through the final code unit of S (inclusive).
The value of the name property of this function is "[Symbol.replace]".
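A short illustration of the two kinds of replaceValue, using String.prototype.replace to reach @@replace. The variable names are illustrative.

```javascript
// String replaceValue: $1 and $2 are expanded from the captures.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");

// Function replaceValue: called once per match with the matched text,
// any captures, the match position and the whole string.
const calls = [];
const bracketed = "a1b22".replace(/\d+/g, (matched, position) => {
  calls.push([matched, position]);
  return "<" + matched.length + ">";
});
```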
21.2.5.9 RegExp.prototype [ @@search ] ( string )
When the @@search method is called with argument string, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let previousLastIndex be ? Get(rx, "lastIndex").
- If SameValue(previousLastIndex, 0) is false, then
  - Perform ? Set(rx, "lastIndex", 0, true).
- Let result be ? RegExpExec(rx, S).
- Let currentLastIndex be ? Get(rx, "lastIndex").
- If SameValue(currentLastIndex, previousLastIndex) is false, then
  - Perform ? Set(rx, "lastIndex", previousLastIndex, true).
- If result is null, return -1.
- Return ? Get(result, "index").
The value of the name property of this function is "[Symbol.search]".
The lastIndex and global properties of this RegExp object are ignored when performing the search. The lastIndex property is left unchanged.
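A quick check that a search starts from the beginning of the string and restores lastIndex afterwards. The variable names are illustrative.

```javascript
const searcher = /b/g;
searcher.lastIndex = 5; // would skip the first "b" if it were honoured

const found = "abcabc".search(searcher);
const preserved = searcher.lastIndex;
const missing = "abc".search(/z/);
```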
21.2.5.10 get RegExp.prototype.source
RegExp.prototype.source is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalSource]] internal slot, then
  - If SameValue(R, %RegExpPrototype%) is true, return "(?:)".
  - Otherwise, throw a TypeError exception.
- Assert: R has an [[OriginalFlags]] internal slot.
- Let src be R.[[OriginalSource]].
- Let flags be R.[[OriginalFlags]].
- Return EscapeRegExpPattern(src, flags).
21.2.5.11 RegExp.prototype [ @@split ] ( string, limit )
Returns an Array object into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the
The /a*?/[Symbol.split]("ab") evaluates to the array ["a","b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].)
If the string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If the regular expression contains capturing parentheses, then each time separator is matched the results (including any
/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")
evaluates to the array
["A",undefined,"B","bold","/","B","and",undefined,"CODE","coded","/","CODE",""]
If limit is not
When the @@split method is called, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let C be ? SpeciesConstructor(rx, %RegExp%).
- Let flags be ? ToString(? Get(rx, "flags")).
- If flags contains "u", let unicodeMatching be true.
- Else, let unicodeMatching be false.
- If flags contains "y", let newFlags be flags.
- Else, let newFlags be the String that is the concatenation of flags and "y".
- Let splitter be ? Construct(C, « rx, newFlags »).
- Let A be ! ArrayCreate(0).
- Let lengthA be 0.
- If limit is undefined, let lim be 2^32-1; else let lim be ? ToUint32(limit).
- Let size be the number of elements in S.
- Let p be 0.
- If lim = 0, return A.
- If size = 0, then
  - Let z be ? RegExpExec(splitter, S).
  - If z is not null, return A.
  - Perform ! CreateDataProperty(A, "0", S).
  - Return A.
- Let q be p.
- Repeat, while q < size
  - Perform ? Set(splitter, "lastIndex", q, true).
  - Let z be ? RegExpExec(splitter, S).
  - If z is null, let q be AdvanceStringIndex(S, q, unicodeMatching).
  - Else z is not null,
    - Let e be ? ToLength(? Get(splitter, "lastIndex")).
    - Let e be min(e, size).
    - If e = p, let q be AdvanceStringIndex(S, q, unicodeMatching).
    - Else e ≠ p,
      - Let T be a String value equal to the substring of S consisting of the elements at indices p (inclusive) through q (exclusive).
      - Perform ! CreateDataProperty(A, ! ToString(lengthA), T).
      - Let lengthA be lengthA + 1.
      - If lengthA = lim, return A.
      - Let p be e.
      - Let numberOfCaptures be ? ToLength(? Get(z, "length")).
      - Let numberOfCaptures be max(numberOfCaptures-1, 0).
      - Let i be 1.
      - Repeat, while i ≤ numberOfCaptures,
        - Let nextCapture be ? Get(z, ! ToString(i)).
        - Perform ! CreateDataProperty(A, ! ToString(lengthA), nextCapture).
        - Let i be i + 1.
        - Let lengthA be lengthA + 1.
        - If lengthA = lim, return A.
      - Let q be p.
- Let T be a String value equal to the substring of S consisting of the elements at indices p (inclusive) through size (exclusive).
- Perform ! CreateDataProperty(A, ! ToString(lengthA), T).
- Return A.
The value of the name property of this function is "[Symbol.split]".
The @@split method ignores the value of the global and sticky properties of this RegExp object.
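The cases described for @@split (empty matches, an empty input string, captures spliced into the output, and a limit) can be exercised as follows. The variable names are illustrative.

```javascript
// Empty matches at a position never produce an empty substring there.
const lazy = /a*?/[Symbol.split]("ab");
const greedy = /a*/[Symbol.split]("ab");

// Empty input: the result depends on whether the expression can match the empty String.
const emptyMatchable = /x*/[Symbol.split]("");
const emptyUnmatchable = /x/[Symbol.split]("");

// Captures are spliced into the output after each separator match.
const withCaptures = "2024-01-15".split(/(-)/);

// The limit truncates the output array.
const limited = "a,b,c,d".split(/,/, 2);
```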
21.2.5.12 get RegExp.prototype.sticky
RegExp.prototype.sticky is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExpPrototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit "y", return true.
- Return false.
21.2.5.13 RegExp.prototype.test ( S )
The following steps are taken:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- Let string be ? ToString(S).
- Let match be ? RegExpExec(R, string).
- If match is not null, return true; else return false.
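Because test goes through RegExpExec, a global or sticky expression carries lastIndex from one call to the next. A small illustration with made-up variable names:

```javascript
const probe = /o/g;
const outcomes = [];
const positions = [];
for (let call = 0; call < 3; call++) {
  outcomes.push(probe.test("foo"));
  positions.push(probe.lastIndex);
}
// The third call starts past the last "o", fails, and resets lastIndex to 0.
```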
21.2.5.14 RegExp.prototype.toString ( )
The returned String has the form of a
21.2.5.15 get RegExp.prototype.unicode
RegExp.prototype.unicode is an accessor property whose set accessor function is
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExpPrototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit "u", return true.
- Return false.
21.2.6 Properties of RegExp Instances
RegExp instances are ordinary objects that inherit properties from the RegExp prototype object. RegExp instances have internal slots [[RegExpMatcher]], [[OriginalSource]], and [[OriginalFlags]]. The value of the [[RegExpMatcher]] internal slot is an implementation-dependent representation of the
Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties source, global, ignoreCase, and multiline. Those properties are now specified as accessor properties of RegExp.prototype.
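The split between own and inherited properties can be inspected directly. In this sketch the variable names are illustrative.

```javascript
const instance = /x/gi;

// The only own property of a RegExp instance is lastIndex.
const ownNames = Object.getOwnPropertyNames(instance);

// source, global and the other flag properties live on RegExp.prototype as accessors.
const sourceDescriptor = Object.getOwnPropertyDescriptor(RegExp.prototype, "source");
const sourceIsAccessor =
  typeof sourceDescriptor.get === "function" && sourceDescriptor.set === undefined;
```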
RegExp instances also have the following property:
21.2.6.1 lastIndex
The value of the lastIndex property specifies the String index at which to start the next match. It is coerced to an integer when used (see