22 Text Processing
22.1 String Objects
22.1.1 The String Constructor
The String constructor:
- is %String%.
- is the initial value of the "String" property of the global object.
- creates and initializes a new String object when called as a constructor.
- performs a type conversion when called as a function rather than as a constructor.
- is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.
22.1.1.1 String ( value )
When String is called with argument value, the following steps are taken:
- If value is not present, let s be the empty String.
- Else,
  - If NewTarget is undefined and Type(value) is Symbol, return SymbolDescriptiveString(value).
  - Let s be ? ToString(value).
- If NewTarget is undefined, return s.
- Return ! StringCreate(s, ? GetPrototypeFromConstructor(NewTarget, "%String.prototype%")).
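The difference between calling String as a function and as a constructor can be observed directly. The following is a small illustrative sketch; the variable names are chosen for this example only.

```javascript
// Called as a function: type conversion, result is a primitive String value.
const primitive = String(42);

// Called as a constructor: result is a String object wrapping the value.
const wrapper = new String(42);

// No argument present: s is the empty String.
const empty = String();

// NewTarget is undefined and the value is a Symbol: SymbolDescriptiveString is used.
const described = String(Symbol("tag"));

// As a constructor the Symbol reaches ToString, which throws a TypeError.
let constructorThrew = false;
try {
  new String(Symbol("tag"));
} catch (err) {
  constructorThrew = err instanceof TypeError;
}

console.log(typeof primitive, typeof wrapper, JSON.stringify(empty), described, constructorThrew);
```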
22.1.2 Properties of the String Constructor
The String constructor:
- has a [[Prototype]] internal slot whose value is %Function.prototype%.
- has the following properties:
22.1.2.1 String.fromCharCode ( ...codeUnits )
The String.fromCharCode function may be called with any number of arguments which form the rest parameter codeUnits. The following steps are taken:
- Let length be the number of elements in codeUnits.
- Let elements be a new empty List.
- For each element next of codeUnits, do
  - Let nextCU be ℝ(? ToUint16(next)).
  - Append nextCU to the end of elements.
- Return the String value whose code units are the elements in the List elements. If codeUnits is empty, the empty String is returned.
The "length" property of the fromCharCode function is 1𝔽.
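Because each argument passes through ToUint16, values outside the 16-bit range wrap modulo 2^16 rather than causing an error. A brief illustrative sketch (variable names are for this example only):

```javascript
// Three code units form the String "abc".
const abc = String.fromCharCode(0x61, 0x62, 0x63);

// 0x10041 modulo 0x10000 is 0x0041, the code unit for "A".
const wrapped = String.fromCharCode(0x10041);

// With no arguments the empty String is returned.
const none = String.fromCharCode();

console.log(abc, wrapped, JSON.stringify(none));
```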
22.1.2.2 String.fromCodePoint ( ...codePoints )
The String.fromCodePoint function may be called with any number of arguments which form the rest parameter codePoints. The following steps are taken:
- Let result be the empty String.
- For each element next of codePoints, do
  - Let nextCP be ? ToNumber(next).
  - If ! IsIntegralNumber(nextCP) is false, throw a RangeError exception.
  - If ℝ(nextCP) < 0 or ℝ(nextCP) > 0x10FFFF, throw a RangeError exception.
  - Set result to the string-concatenation of result and ! UTF16EncodeCodePoint(ℝ(nextCP)).
- Assert: If codePoints is empty, then result is the empty String.
- Return result.
The "length" property of the fromCodePoint function is 1𝔽.
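Unlike fromCharCode, fromCodePoint validates its arguments and encodes supplementary code points as surrogate pairs. The sketch below is illustrative only; the sample inputs are arbitrary.

```javascript
// A supplementary code point is encoded as two UTF-16 code units.
const grin = String.fromCodePoint(0x1F600);
const units = [grin.charCodeAt(0), grin.charCodeAt(1)];

// Non-integral, negative, too-large, and NaN-producing inputs all throw RangeError.
let rangeErrors = 0;
for (const bad of [-1, 1.5, 0x110000, "x"]) {
  try {
    String.fromCodePoint(bad);
  } catch (err) {
    if (err instanceof RangeError) rangeErrors += 1;
  }
}

console.log(grin.length, units.map((u) => u.toString(16)), rangeErrors);
```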
22.1.2.3 String.prototype
The initial value of String.prototype is the String prototype object.
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
22.1.2.4 String.raw ( template, ...substitutions )
The String.raw function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions. The following steps are taken:
- Let numberOfSubstitutions be the number of elements in substitutions.
- Let cooked be ? ToObject(template).
- Let raw be ? ToObject(? Get(cooked, "raw")).
- Let literalSegments be ? LengthOfArrayLike(raw).
- If literalSegments ≤ 0, return the empty String.
- Let stringElements be a new empty List.
- Let nextIndex be 0.
- Repeat,
  - Let nextKey be ! ToString(𝔽(nextIndex)).
  - Let nextSeg be ? ToString(? Get(raw, nextKey)).
  - Append the code unit elements of nextSeg to the end of stringElements.
  - If nextIndex + 1 = literalSegments, then
    - Return the String value whose code units are the elements in the List stringElements. If stringElements has no elements, the empty String is returned.
  - If nextIndex < numberOfSubstitutions, let next be substitutions[nextIndex].
  - Else, let next be the empty String.
  - Let nextSub be ? ToString(next).
  - Append the code unit elements of nextSub to the end of stringElements.
  - Set nextIndex to nextIndex + 1.
The raw function is intended for use as a tag function of a Tagged Template.
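The interleaving of raw segments and substitutions can be seen both when String.raw is used as a tag and when it is called directly with a template-like object. This is an illustrative sketch; the objects passed in the direct calls are made up for the example.

```javascript
// As a tag function: escape sequences in the template are left uninterpreted.
const path = String.raw`C:\new\table${1 + 1}`;

// Called directly: raw segments and substitutions alternate.
// The surplus substitution (3) is never reached because the loop
// returns after the last literal segment.
const interleaved = String.raw({ raw: ["a", "b", "c"] }, 1, 2, 3);

// Missing substitutions are treated as the empty String.
const short = String.raw({ raw: ["x", "y", "z"] }, "-");

// A raw array-like of length 0 yields the empty String.
const nothing = String.raw({ raw: [] }, "ignored");

console.log(path, interleaved, short, JSON.stringify(nothing));
```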
22.1.3 Properties of the String Prototype Object
The String prototype object:
- is %String.prototype%.
- is a String exotic object and has the internal methods specified for such objects.
- has a [[StringData]] internal slot whose value is the empty String.
- has a "length" property whose initial value is +0𝔽 and whose attributes are { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
- has a [[Prototype]] internal slot whose value is %Object.prototype%.
Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the
The abstract operation thisStringValue takes argument value. It performs the following steps when called:
22.1.3.1 String.prototype.charAt ( pos )
Returns a single element String containing the code unit at index pos within the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.
If pos is a value of Number type that is an integer, then the result of x.charAt(pos) is equivalent to the result of x.substring(pos, pos + 1).
When the charAt method is called with one argument pos, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let position be ? ToIntegerOrInfinity(pos).
- Let size be the length of S.
- If position < 0 or position ≥ size, return the empty String.
- Return the String value of length 1, containing one code unit from S, namely the code unit at index position.
The charAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.2 String.prototype.charCodeAt ( pos )
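Returns a Number (a non-negative integral Number less than 2^16) that is the numeric value of the code unit at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.
When the charCodeAt method is called with one argument pos, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let position be ? ToIntegerOrInfinity(pos).
- Let size be the length of S.
- If position < 0 or position ≥ size, return NaN.
- Return the Number value for the numeric value of the code unit at index position within the String S.
The charCodeAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.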
22.1.3.3 String.prototype.codePointAt ( pos )
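Returns a non-negative integral Number less than or equal to 0x10FFFF𝔽 that is the code point value of the UTF-16 encoded code point starting at the string element at index pos within the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at pos, the result is the code unit at pos.
When the codePointAt method is called with one argument pos, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let position be ? ToIntegerOrInfinity(pos).
- Let size be the length of S.
- If position < 0 or position ≥ size, return undefined.
- Let cp be ! CodePointAt(S, position).
- Return 𝔽(cp.[[CodePoint]]).
The codePointAt function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.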
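The three accessors charAt, charCodeAt, and codePointAt differ in what they return for surrogate pairs and for out-of-range indices. A short illustrative sketch, using an arbitrary sample string:

```javascript
// "a", one supplementary code point (two code units), then "b": four code units.
const text = "a\u{1F600}b";

const unitAt1 = text.charCodeAt(1);   // the high surrogate alone
const pointAt1 = text.codePointAt(1); // the whole code point, since a valid pair begins here
const pointAt2 = text.codePointAt(2); // no pair begins here, so the code unit itself

// Out-of-range indices give "", NaN, and undefined respectively.
const outOfRange = [text.charAt(10), text.charCodeAt(10), text.codePointAt(10)];

console.log(unitAt1.toString(16), pointAt1.toString(16), pointAt2.toString(16), outOfRange);
```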
22.1.3.4 String.prototype.concat ( ...args )
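When the concat method is called it returns the String value consisting of the code units of the this value (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.
When the concat method is called with zero or more arguments, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let R be S.
- For each element next of args, do
  - Let nextString be ? ToString(next).
  - Set R to the string-concatenation of R and nextString.
- Return R.
The "length" property of the concat method is 1𝔽.
The concat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.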
22.1.3.5 String.prototype.constructor
The initial value of String.prototype.constructor is %String%.
22.1.3.6 String.prototype.endsWith ( searchString [ , endPosition ] )
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let isRegExp be ? IsRegExp(searchString).
- If isRegExp is true, throw a TypeError exception.
- Let searchStr be ? ToString(searchString).
- Let len be the length of S.
- If endPosition is undefined, let pos be len; else let pos be ? ToIntegerOrInfinity(endPosition).
- Let end be the result of clamping pos between 0 and len.
- Let searchLength be the length of searchStr.
- If searchLength = 0, return true.
- Let start be end - searchLength.
- If start < 0, return false.
- Let substring be the substring of S from start to end.
- Return ! SameValueNonNumeric(substring, searchStr).
Returns true if the sequence of code units of searchString converted to a String is the same as the corresponding code units of this object (converted to a String) ending at endPosition. Otherwise returns false.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
The endsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.7 String.prototype.includes ( searchString [ , position ] )
The includes method takes two arguments, searchString and position, and performs the following steps:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let isRegExp be ? IsRegExp(searchString).
- If isRegExp is true, throw a TypeError exception.
- Let searchStr be ? ToString(searchString).
- Let pos be ? ToIntegerOrInfinity(position).
- Assert: If position is undefined, then pos is 0.
- Let len be the length of S.
- Let start be the result of clamping pos between 0 and len.
- Let index be ! StringIndexOf(S, searchStr, start).
- If index is not -1, return true.
- Return false.
If searchString appears as a
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
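The includes function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.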
22.1.3.8 String.prototype.indexOf ( searchString [ , position ] )
If searchString appears as a
The indexOf method takes two arguments, searchString and position, and performs the following steps:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let searchStr be ? ToString(searchString).
- Let pos be ? ToIntegerOrInfinity(position).
- Assert: If position is undefined, then pos is 0.
- Let len be the length of S.
- Let start be the result of clamping pos between 0 and len.
- Return 𝔽(! StringIndexOf(S, searchStr, start)).
The indexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.9 String.prototype.lastIndexOf ( searchString [ , position ] )
If searchString appears as a
The lastIndexOf method takes two arguments, searchString and position, and performs the following steps:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let searchStr be ? ToString(searchString).
- Let numPos be ? ToNumber(position).
- Assert: If position is undefined, then numPos is NaN.
- If numPos is NaN, let pos be +∞; otherwise, let pos be ! ToIntegerOrInfinity(numPos).
- Let len be the length of S.
- Let start be the result of clamping pos between 0 and len.
- Let searchLen be the length of searchStr.
- Let k be the largest possible non-negative integer not larger than start such that k + searchLen ≤ len, and for all non-negative integers j such that j < searchLen, the code unit at index k + j within S is the same as the code unit at index j within searchStr; but if there is no such integer, let k be -1.
- Return 𝔽(k).
The lastIndexOf function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
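The position handling of indexOf and lastIndexOf (clamping, and NaN being treated as +∞ by lastIndexOf) can be illustrated with a small sketch. The sample string and variable names are arbitrary.

```javascript
const s = "canal"; // indices: c0 a1 n2 a3 l4

const first = s.indexOf("a");            // search starts at 0
const fromTwo = s.indexOf("a", 2);       // search starts at 2
const last = s.lastIndexOf("a");         // position undefined, so pos is +∞
const lastUpTo2 = s.lastIndexOf("a", 2); // largest match index not larger than 2
const lastNaN = s.lastIndexOf("a", NaN); // NaN is also treated as +∞
const emptyAtEnd = s.indexOf("", 99);    // position is clamped to the length
const missing = s.indexOf("z");

console.log(first, fromTwo, last, lastUpTo2, lastNaN, emptyAtEnd, missing);
```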
22.1.3.10 String.prototype.localeCompare ( that [ , reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the localeCompare method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the localeCompare method is used.
When the localeCompare method is called with argument that, it returns a Number other than
Before performing the comparisons, the following steps are performed to prepare the Strings:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let That be ? ToString(that).
The meaning of the optional second and third parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.
The localeCompare method, if considered as a function of two arguments
The actual return values are 0 when comparing Strings that are considered canonically equivalent.
The localeCompare method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.
This function is intended to rely on whatever language-sensitive comparison functionality is available to the ECMAScript environment from the
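The localeCompare function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.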
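Since localeCompare is a method of one argument, using it with Array.prototype.sort requires a two-argument wrapper. The sketch below is illustrative; the exact ordering is locale- and implementation-dependent in general, and plain lowercase ASCII words are used here on the assumption that they sort alphabetically in the host's locale.

```javascript
const words = ["pear", "apple", "fig"];

// Wrap localeCompare in a function of two arguments for use as a comparator.
const sorted = [...words].sort((a, b) => a.localeCompare(b));

// Identical Strings compare as 0; only the sign of other results is meaningful.
const sameString = "fig".localeCompare("fig");
const sign = Math.sign("apple".localeCompare("pear"));

console.log(sorted, sameString, sign);
```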
22.1.3.11 String.prototype.match ( regexp )
When the match method is called with argument regexp, the following steps are taken:
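- Let O be ? RequireObjectCoercible(this value).
- If regexp is neither undefined nor null, then
  - Let matcher be ? GetMethod(regexp, @@match).
  - If matcher is not undefined, then
    - Return ? Call(matcher, regexp, « O »).
- Let S be ? ToString(O).
- Let rx be ? RegExpCreate(regexp, undefined).
- Return ? Invoke(rx, @@match, « S »).
The match function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.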
22.1.3.12 String.prototype.matchAll ( regexp )
Performs a regular expression match of the String representing the
When the matchAll method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- If regexp is neither undefined nor null, then
  - Let isRegExp be ? IsRegExp(regexp).
  - If isRegExp is true, then
    - Let flags be ? Get(regexp, "flags").
    - Perform ? RequireObjectCoercible(flags).
    - If ? ToString(flags) does not contain "g", throw a TypeError exception.
  - Let matcher be ? GetMethod(regexp, @@matchAll).
  - If matcher is not undefined, then
    - Return ? Call(matcher, regexp, « O »).
- Let S be ? ToString(O).
- Let rx be ? RegExpCreate(regexp, "g").
- Return ? Invoke(rx, @@matchAll, « S »).
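The flag check and the implicit "g" flag for non-RegExp arguments in the steps above can be observed as follows. This is an illustrative sketch with an arbitrary input string.

```javascript
const input = "a1b22c333";

// A global RegExp: every match is produced by the returned iterator.
const digits = Array.from(input.matchAll(/\d+/g), (m) => m[0]);
const indexes = Array.from(input.matchAll(/\d+/g), (m) => m.index);

// A RegExp without the "g" flag is rejected with a TypeError.
let nonGlobalThrew = false;
try {
  input.matchAll(/\d+/);
} catch (err) {
  nonGlobalThrew = err instanceof TypeError;
}

// A String pattern is compiled via RegExpCreate with the "g" flag.
const fromString = Array.from(input.matchAll("\\d"), (m) => m[0]);

console.log(digits, indexes, nonGlobalThrew, fromString);
```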
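The matchAll function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
Similarly to String.prototype.split, String.prototype.matchAll is designed to typically act without mutating its inputs.
22.1.3.13 String.prototype.normalize ( [ form ] )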
When the normalize method is called with one argument form, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- If form is undefined, let f be "NFC".
- Else, let f be ? ToString(form).
- If f is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
- Let ns be the String value that is the result of normalizing S into the normalization form named by f as specified in https://unicode.org/reports/tr15/.
- Return ns.
The normalize function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
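The effect of the four normalization forms, and the RangeError for any other form name, can be illustrated with a composed and a decomposed spelling of the same character. The sample strings are chosen for this sketch only.

```javascript
const composed = "\u00E9";    // "é" as a single code point
const decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT

// The two spellings differ as code unit sequences...
const equalRaw = composed === decomposed;

// ...but agree after normalization. NFC is the default form.
const equalNFC = decomposed.normalize() === composed;
const equalNFD = composed.normalize("NFD") === decomposed;

// Compatibility forms also fold compatibility characters such as the "fi" ligature.
const ligature = "\uFB01".normalize("NFKC");

// Form names are matched exactly; "nfc" is not accepted.
let badFormThrew = false;
try {
  composed.normalize("nfc");
} catch (err) {
  badFormThrew = err instanceof RangeError;
}

console.log(equalRaw, equalNFC, equalNFD, ligature, badFormThrew);
```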
22.1.3.14 String.prototype.padEnd ( maxLength [ , fillString ] )
When the padEnd method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Return ? StringPad(O, maxLength, fillString, end).
22.1.3.15 String.prototype.padStart ( maxLength [ , fillString ] )
When the padStart method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Return ? StringPad(O, maxLength, fillString, start).
22.1.3.15.1 StringPad ( O, maxLength, fillString, placement )
The abstract operation StringPad takes arguments O, maxLength, fillString, and placement. It performs the following steps when called:
- Assert: placement is start or end.
- Let S be ? ToString(O).
- Let intMaxLength be ℝ(? ToLength(maxLength)).
- Let stringLength be the length of S.
- If intMaxLength ≤ stringLength, return S.
- If fillString is undefined, let filler be the String value consisting solely of the code unit 0x0020 (SPACE).
- Else, let filler be ? ToString(fillString).
- If filler is the empty String, return S.
- Let fillLen be intMaxLength - stringLength.
- Let truncatedStringFiller be the String value consisting of repeated concatenations of filler truncated to length fillLen.
- If placement is start, return the string-concatenation of truncatedStringFiller and S.
- Else, return the string-concatenation of S and truncatedStringFiller.
The argument maxLength will be clamped such that it can be no smaller than the length of S.
The argument fillString defaults to " " (the String value consisting of the code unit 0x0020 SPACE).
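The StringPad behaviour described above (truncated repetition of the filler, and the early returns) can be checked with a few calls. An illustrative sketch with arbitrary inputs:

```javascript
const zeroPadded = "7".padStart(3, "0");     // filler repeated to length 2
const truncated = "abc".padEnd(8, "12");     // "12" repeated and cut to length 5
const defaultFill = "x".padStart(3);         // fillString undefined, so SPACE is used
const alreadyLong = "abc".padStart(2, "*");  // maxLength not larger than the length
const emptyFiller = "abc".padEnd(6, "");     // empty filler returns S unchanged

console.log(zeroPadded, truncated, JSON.stringify(defaultFill), alreadyLong, emptyFiller);
```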
22.1.3.16 String.prototype.repeat ( count )
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let n be ? ToIntegerOrInfinity(count).
- If n < 0 or n is +∞, throw a RangeError exception.
- If n is 0, return the empty String.
- Return the String value that is made from n copies of S appended together.
This method creates the String value consisting of the code units of the this value (converted to String) repeated count times.
The repeat function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
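A short illustrative sketch of the count handling in repeat, including truncation by ToIntegerOrInfinity and the two RangeError cases:

```javascript
const thrice = "ab".repeat(3);
const none = "ab".repeat(0);
const truncatedCount = "ab".repeat(2.9); // ToIntegerOrInfinity(2.9) is 2

// Negative and infinite counts are rejected.
let rejected = 0;
for (const bad of [-1, Infinity]) {
  try {
    "ab".repeat(bad);
  } catch (err) {
    if (err instanceof RangeError) rejected += 1;
  }
}

console.log(thrice, JSON.stringify(none), truncatedCount, rejected);
```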
22.1.3.17 String.prototype.replace ( searchValue, replaceValue )
When the replace method is called with arguments searchValue and replaceValue, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- If searchValue is neither undefined nor null, then
  - Let replacer be ? GetMethod(searchValue, @@replace).
  - If replacer is not undefined, then
    - Return ? Call(replacer, searchValue, « O, replaceValue »).
- Let string be ? ToString(O).
- Let searchString be ? ToString(searchValue).
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Set replaceValue to ? ToString(replaceValue).
- Let searchLength be the length of searchString.
- Let position be ! StringIndexOf(string, searchString, 0).
- If position is -1, return string.
- Let preserved be the substring of string from 0 to position.
- If functionalReplace is true, then
  - Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(position), string »)).
- Else,
  - Assert: Type(replaceValue) is String.
  - Let captures be a new empty List.
  - Let replacement be ! GetSubstitution(searchString, string, position, captures, undefined, replaceValue).
- Return the string-concatenation of preserved, replacement, and the substring of string from position + searchLength.
The replace function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.17.1 GetSubstitution ( matched, str, position, captures, namedCaptures, replacement )
The abstract operation GetSubstitution takes arguments matched, str, position (a non-negative integer), captures, namedCaptures, and replacement. It performs the following steps when called:
- Assert: Type(matched) is String.
- Let matchLength be the number of code units in matched.
- Assert: Type(str) is String.
- Let stringLength be the number of code units in str.
- Assert: position ≤ stringLength.
- Assert: captures is a possibly empty List of Strings.
- Assert: Type(replacement) is String.
- Let tailPos be position + matchLength.
- Let m be the number of elements in captures.
- Let result be the String value derived from replacement by copying code unit elements from replacement to result while performing replacements as specified in Table 54. These $ replacements are done left-to-right, and, once such a replacement is performed, the new replacement text is not subject to further replacements.
- Return result.
| Code units | Unicode Characters | Replacement text |
|---|---|---|
| 0x0024, 0x0024 | $$ | $ |
| 0x0024, 0x0026 | $& | matched |
| 0x0024, 0x0060 | $` | The replacement is the substring of str from 0 to position. |
| 0x0024, 0x0027 | $' | If tailPos ≥ stringLength, the replacement is the empty String. Otherwise the replacement is the substring of str from tailPos. |
| 0x0024, N where 0x0031 ≤ N ≤ 0x0039 | $n where n is one of 1 2 3 4 5 6 7 8 9 and $n is not followed by a decimal digit | The nth element of captures, where n is a single digit in the range 1 to 9. If n ≤ m and the nth element of captures is undefined, use the empty String instead. If n > m, no replacement is done. |
| 0x0024, N, N where 0x0030 ≤ N ≤ 0x0039 | $nn where n is one of 0 1 2 3 4 5 6 7 8 9 | The nnth element of captures, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth element of captures is undefined, use the empty String instead. If nn is 00 or nn > m, no replacement is done. |
| 0x0024, 0x003C | $< | |
| 0x0024 | $ in any context that does not match any of the above. | $ |
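The substitution patterns in the table can be exercised through String.prototype.replace. The sketch below is illustrative; the sample strings are arbitrary. With a String search value the captures List is empty, so $1 is left as-is, whereas a RegExp search value with groups supplies captures.

```javascript
const s = "left-mid-right"; // "mid" is found at position 5

const whole = s.replace("mid", "[$&]");    // $& inserts the matched text
const before = s.replace("mid", "$`");     // $` inserts str from 0 to position
const after = s.replace("mid", "$'");      // $' inserts str from tailPos
const dollar = s.replace("mid", "$$");     // $$ inserts a single "$"
const noCapture = s.replace("mid", "$1");  // no captures (m = 0): no replacement is done

// A callable replaceValue receives the match, its position, and the whole string.
const viaFunction = s.replace("mid", (match, offset) => `${match.toUpperCase()}@${offset}`);

// With a RegExp search value, $1 and $2 refer to the capture groups.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");

console.log(whole, before, after, dollar, noCapture, viaFunction, swapped);
```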
22.1.3.18 String.prototype.replaceAll ( searchValue, replaceValue )
When the replaceAll method is called with arguments searchValue and replaceValue, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- If searchValue is neither undefined nor null, then
  - Let isRegExp be ? IsRegExp(searchValue).
  - If isRegExp is true, then
    - Let flags be ? Get(searchValue, "flags").
    - Perform ? RequireObjectCoercible(flags).
    - If ? ToString(flags) does not contain "g", throw a TypeError exception.
  - Let replacer be ? GetMethod(searchValue, @@replace).
  - If replacer is not undefined, then
    - Return ? Call(replacer, searchValue, « O, replaceValue »).
- Let string be ? ToString(O).
- Let searchString be ? ToString(searchValue).
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Set replaceValue to ? ToString(replaceValue).
- Let searchLength be the length of searchString.
- Let advanceBy be max(1, searchLength).
- Let matchPositions be a new empty List.
- Let position be ! StringIndexOf(string, searchString, 0).
- Repeat, while position is not -1,
  - Append position to the end of matchPositions.
  - Set position to ! StringIndexOf(string, searchString, position + advanceBy).
- Let endOfLastMatch be 0.
- Let result be the empty String.
- For each element p of matchPositions, do
  - Let preserved be the substring of string from endOfLastMatch to p.
  - If functionalReplace is true, then
    - Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(p), string »)).
  - Else,
    - Assert: Type(replaceValue) is String.
    - Let captures be a new empty List.
    - Let replacement be ! GetSubstitution(searchString, string, p, captures, undefined, replaceValue).
  - Set result to the string-concatenation of result, preserved, and replacement.
  - Set endOfLastMatch to p + searchLength.
- If endOfLastMatch < the length of string, then
  - Set result to the string-concatenation of result and the substring of string from endOfLastMatch.
- Return result.
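The match-position loop above advances by max(1, searchLength), which makes matches non-overlapping and lets the empty search String match between every pair of code units. An illustrative sketch with arbitrary inputs:

```javascript
const slashes = "a.b.c".replaceAll(".", "/");

// Matches do not overlap: after the match at 0, the search resumes at 2.
const nonOverlapping = "aaa".replaceAll("aa", "b");

// The empty String matches at positions 0, 1, 2, and 3.
const betweenUnits = "abc".replaceAll("", "-");

// A RegExp search value must carry the "g" flag.
let nonGlobalThrew = false;
try {
  "a.b".replaceAll(/\./, "/");
} catch (err) {
  nonGlobalThrew = err instanceof TypeError;
}

console.log(slashes, nonOverlapping, betweenUnits, nonGlobalThrew);
```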
22.1.3.19 String.prototype.search ( regexp )
When the search method is called with argument regexp, the following steps are taken:
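- Let O be ? RequireObjectCoercible(this value).
- If regexp is neither undefined nor null, then
  - Let searcher be ? GetMethod(regexp, @@search).
  - If searcher is not undefined, then
    - Return ? Call(searcher, regexp, « O »).
- Let string be ? ToString(O).
- Let rx be ? RegExpCreate(regexp, undefined).
- Return ? Invoke(rx, @@search, « string »).
The search function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.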
22.1.3.20 String.prototype.slice ( start, end )
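The slice method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength + start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength + end where sourceLength is the length of the String. The result is a String value, not a String object. The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let len be the length of S.
- Let intStart be ? ToIntegerOrInfinity(start).
- If intStart is -∞, let from be 0.
- Else if intStart < 0, let from be max(len + intStart, 0).
- Else, let from be min(intStart, len).
- If end is undefined, let intEnd be len; else let intEnd be ? ToIntegerOrInfinity(end).
- If intEnd is -∞, let to be 0.
- Else if intEnd < 0, let to be max(len + intEnd, 0).
- Else, let to be min(intEnd, len).
- If from ≥ to, return the empty String.
- Return the substring of S from from to to.
The slice function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.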
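The handling of negative and out-of-range indices in slice can be illustrated as follows. The sample word is arbitrary.

```javascript
const word = "ECMAScript"; // length 10

const tail = word.slice(4);            // end undefined: runs to the end
const fromEnd = word.slice(-6);        // -6 is treated as 10 + (-6) = 4
const head = word.slice(0, -6);        // end -6 is treated as 4
const reversed = word.slice(4, 2);     // from ≥ to: the empty String, no swapping
const clampedStart = word.slice(-100, 4); // max(10 - 100, 0) = 0

console.log(tail, fromEnd, head, JSON.stringify(reversed), clampedStart);
```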
22.1.3.21 String.prototype.split ( separator, limit )
Returns an Array object into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any String in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.
When the split method is called, the following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- If separator is neither undefined nor null, then
  - Let splitter be ? GetMethod(separator, @@split).
  - If splitter is not undefined, then
    - Return ? Call(splitter, separator, « O, limit »).
- Let S be ? ToString(O).
- Let A be ! ArrayCreate(0).
- Let lengthA be 0.
- If limit is undefined, let lim be 2^32 - 1; else let lim be ℝ(? ToUint32(limit)).
- Let R be ? ToString(separator).
- If lim = 0, return A.
- If separator is undefined, then
  - Perform ! CreateDataPropertyOrThrow(A, "0", S).
  - Return A.
- Let s be the length of S.
- If s = 0, then
  - If R is not the empty String, then
    - Perform ! CreateDataPropertyOrThrow(A, "0", S).
  - Return A.
- Let p be 0.
- Let q be p.
- Repeat, while q ≠ s,
  - Let e be SplitMatch(S, q, R).
  - If e is not-matched, set q to q + 1.
  - Else,
    - Assert: e is a non-negative integer ≤ s.
    - If e = p, set q to q + 1.
    - Else,
      - Let T be the substring of S from p to q.
      - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
      - Set lengthA to lengthA + 1.
      - If lengthA = lim, return A.
      - Set p to e.
      - Set q to p.
- Let T be the substring of S from p to s.
- Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
- Return A.
The value of separator may be an empty String. In this case, separator does not match the empty
If the
If separator is
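The split function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.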
22.1.3.21.1 SplitMatch ( S, q, R )
The abstract operation SplitMatch takes arguments S (a String), q (a non-negative integer), and R (a String). It returns either not-matched or the end index of a match. It performs the following steps when called:
- Let r be the number of code units in R.
- Let s be the number of code units in S.
- If q + r > s, return not-matched.
- If there exists an integer i between 0 (inclusive) and r (exclusive) such that the code unit at index q + i within S is different from the code unit at index i within R, return not-matched.
- Return q + r.
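The split steps, together with SplitMatch, lead to a handful of edge cases for String separators that are easy to check. The sketch below is illustrative; a small helper (hypothetical, for this example only) serializes the arrays for display.

```javascript
// Hypothetical helper for this example: render an array as JSON text.
const show = (parts) => JSON.stringify(parts);

const basic = "a,b,,c".split(",");         // adjacent separators yield an empty String
const limited = "a,b,,c".split(",", 2);    // stops once lengthA reaches lim
const perUnit = "abc".split("");           // empty separator splits between code units
const noSeparator = "abc".split();         // separator undefined: the whole String
const emptyInput = "".split(",");          // s = 0 and R is not empty: [""]
const emptyBoth = "".split("");            // s = 0 and R is empty: []
const zeroLimit = "abc".split(",", 0);     // lim = 0: []

console.log(show(basic), show(limited), show(perUnit), show(noSeparator),
  show(emptyInput), show(emptyBoth), show(zeroLimit));
```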
22.1.3.22 String.prototype.startsWith ( searchString [ , position ] )
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let isRegExp be ? IsRegExp(searchString).
- If isRegExp is true, throw a TypeError exception.
- Let searchStr be ? ToString(searchString).
- Let len be the length of S.
- If position is undefined, let pos be 0; else let pos be ? ToIntegerOrInfinity(position).
- Let start be the result of clamping pos between 0 and len.
- Let searchLength be the length of searchStr.
- If searchLength = 0, return true.
- Let end be start + searchLength.
- If end > len, return false.
- Let substring be the substring of S from start to end.
- Return ! SameValueNonNumeric(substring, searchStr).
This method returns true if the sequence of code units of searchString converted to a String is the same as the corresponding code units of this object (converted to a String) starting at index position. Otherwise returns false.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
The startsWith function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.23 String.prototype.substring ( start, end )
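The substring method takes two arguments, start and end, and returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.
If either argument is NaN or negative, it is replaced with zero; if either argument is larger than the length of the String, it is replaced with the length of the String.
If start is larger than end, they are swapped.
The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let len be the length of S.
- Let intStart be ? ToIntegerOrInfinity(start).
- If end is undefined, let intEnd be len; else let intEnd be ? ToIntegerOrInfinity(end).
- Let finalStart be the result of clamping intStart between 0 and len.
- Let finalEnd be the result of clamping intEnd between 0 and len.
- Let from be min(finalStart, finalEnd).
- Let to be max(finalStart, finalEnd).
- Return the substring of S from from to to.
The substring function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.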
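The clamping and swapping in substring contrast with slice, which treats negative indices as offsets from the end and never swaps. An illustrative comparison with an arbitrary sample word:

```javascript
const word = "ECMAScript"; // length 10

const swappedArgs = word.substring(4, 0);   // start > end: the arguments are swapped
const negative = word.substring(-3, 4);     // -3 is clamped to 0
const notANumber = word.substring(NaN, 4);  // NaN converts to 0
const tooLarge = word.substring(4, 100);    // 100 is clamped to the length

// slice with the same arguments as the first call does not swap.
const sliceNoSwap = word.slice(4, 0);

console.log(swappedArgs, negative, notANumber, tooLarge, JSON.stringify(sliceNoSwap));
```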
22.1.3.24 String.prototype.toLocaleLowerCase ( [ reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleLowerCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleLowerCase method is used.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in
This function works exactly the same as toLowerCase except that its result is intended to yield the correct result for the
The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
The toLocaleLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.25 String.prototype.toLocaleUpperCase ( [ reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement the toLocaleUpperCase method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of the toLocaleUpperCase method is used.
This function interprets a String value as a sequence of UTF-16 encoded code points, as described in
This function works exactly the same as toUpperCase except that its result is intended to yield the correct result for the
The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
The toLocaleUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.26 String.prototype.toLowerCase ( )
This function interprets a String value as a sequence of UTF-16 encoded code points. The following steps are taken:
- Let O be ? RequireObjectCoercible(this value).
- Let S be ? ToString(O).
- Let sText be ! StringToCodePoints(S).
- Let lowerText be the result of toLowercase(sText), according to the Unicode Default Case Conversion algorithm.
- Let L be ! CodePointsToString(lowerText).
- Return L.
The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the UnicodeData.txt file, but also all locale-insensitive mappings in the SpecialCasing.txt file that accompanies it).
The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the functions are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().
The toLowerCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
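The length change and the asymmetry noted above can be seen with LATIN SMALL LETTER SHARP S, and the context sensitivity with GREEK CAPITAL LETTER SIGMA. This is an illustrative sketch; it assumes a host that implements the Unicode special casing rules, as the text above requires.

```javascript
const sharpS = "\u00DF"; // "ß"

// One code point maps to two: the result is longer than the source.
const upper = sharpS.toUpperCase();

// The round trip through upper case does not come back to the original.
const roundTrip = sharpS.toUpperCase().toLowerCase();
const direct = sharpS.toLowerCase();

// Capital sigma lowercases to final sigma at the end of a word, and to
// ordinary small sigma when it stands alone.
const finalSigma = "\u0391\u03A3".toLowerCase();
const loneSigma = "\u03A3".toLowerCase();

console.log(upper, roundTrip, direct, finalSigma, loneSigma);
```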
22.1.3.27 String.prototype.toString ( )
When the toString method is called, the following steps are taken:
- Return ? thisStringValue(this value).
For a String object, the toString method happens to return the same thing as the valueOf method.
22.1.3.28 String.prototype.toUpperCase ( )
This function interprets a String value as a sequence of UTF-16 encoded code points.
This function behaves in exactly the same way as String.prototype.toLowerCase, except that the String is mapped using the toUppercase algorithm of the Unicode Default Case Conversion.
The toUpperCase function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.29 String.prototype.trim ( )
This function interprets a String value as a sequence of UTF-16 encoded code points.
The following steps are taken:
- Let S be the this value.
- Return ? TrimString(S, start+end).
The trim function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.29.1 TrimString ( string, where )
The abstract operation TrimString takes arguments string and where. It interprets string as a sequence of UTF-16 encoded code points and performs the following steps when called:
- Let str be ? RequireObjectCoercible(string).
- Let S be ? ToString(str).
- If where is start, let T be the String value that is a copy of S with leading white space removed.
- Else if where is end, let T be the String value that is a copy of S with trailing white space removed.
- Else,
  - Assert: where is start+end.
  - Let T be the String value that is a copy of S with both leading and trailing white space removed.
- Return T.
The definition of white space is the union of WhiteSpace and LineTerminator.
22.1.3.30 String.prototype.trimEnd ( )
This function interprets a String value as a sequence of UTF-16 encoded code points.
The following steps are taken:
- Let S be the this value.
- Return ? TrimString(S, end).
The trimEnd function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.31 String.prototype.trimStart ( )
This function interprets a String value as a sequence of UTF-16 encoded code points.
The following steps are taken:
- Let S be the this value.
- Return ? TrimString(S, start).
The trimStart function is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
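The three trimming methods differ only in the where argument they pass to TrimString. A brief illustrative sketch; the sample string mixes a tab, a no-break space (U+00A0), ordinary spaces, and a line feed.

```javascript
const padded = "\t\u00A0 text \n";

const both = padded.trim();          // where is start+end
const leading = padded.trimStart();  // where is start
const trailing = padded.trimEnd();   // where is end

console.log(JSON.stringify(both), JSON.stringify(leading), JSON.stringify(trailing));
```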
22.1.3.32 String.prototype.valueOf ( )
When the valueOf method is called, the following steps are taken:
- Return ? thisStringValue(this value).
22.1.3.33 String.prototype [ @@iterator ] ( )
When the @@iterator method is called it returns an Iterator object (
- Let O be ? RequireObjectCoercible(this value).
- Let s be ? ToString(O).
- Let closure be a new Abstract Closure with no parameters that captures s and performs the following steps when called:
  - Let position be 0.
  - Let len be the length of s.
  - Repeat, while position < len,
    - Let cp be ! CodePointAt(s, position).
    - Let nextIndex be position + cp.[[CodeUnitCount]].
    - Let resultString be the substring of s from position to nextIndex.
    - Set position to nextIndex.
    - Perform ? Yield(resultString).
  - Return undefined.
- Return ! CreateIteratorFromClosure(closure, "%StringIteratorPrototype%", %StringIteratorPrototype%).
The value of the
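The iteration steps above can be approximated in ordinary code. The sketch below is not the specified algorithm itself: `codePointStrings` is a hypothetical generator written for this example, and it derives the code unit count from the code point value instead of calling the CodePointAt abstract operation. It is compared against the built-in iterator.

```javascript
// U+1D11E occupies two UTF-16 code units, so this String has length 4.
const text = "a\u{1D11E}b";

// Hypothetical re-implementation of the closure's loop (illustration only).
function* codePointStrings(s) {
  let position = 0;
  while (position < s.length) {
    const cp = s.codePointAt(position);
    // Code points above U+FFFF are encoded as a surrogate pair (2 code units).
    const codeUnitCount = cp > 0xffff ? 2 : 1;
    const nextIndex = position + codeUnitCount;
    yield s.substring(position, nextIndex);
    position = nextIndex;
  }
}

const viaBuiltin = [...text];
const viaSketch = [...codePointStrings(text)];

console.log(text.length);       // 4
console.log(viaBuiltin.length); // 3

// The iterator can also be driven manually through next().
const iterator = "hi"[Symbol.iterator]();
const first = iterator.next();
console.log(first.value, first.done); // h false
```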
22.1.4 Properties of String Instances
String instances are String exotic objects and have the internal methods specified for such objects. String instances inherit properties from the
String instances have a
22.1.4.1 length
The number of elements in the String value represented by this String object.
Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]:
22.1.5 String Iterator Objects
A String Iterator is an object that represents a specific iteration over some specific String instance object. There is not a named
22.1.5.1 The %StringIteratorPrototype% Object
The
- has properties that are inherited by all String Iterator Objects.
- is an ordinary object.
- has a [[Prototype]] internal slot whose value is %IteratorPrototype%.
- has the following properties:
22.1.5.1.1 %StringIteratorPrototype% .next ( )
- Return ? GeneratorResume(this value, empty, "%StringIteratorPrototype%").
22.1.5.1.2 %StringIteratorPrototype% [ @@toStringTag ]
The initial value of the @@toStringTag property is the String value
This property has the attributes { [[Writable]]:
22.2 RegExp (Regular Expression) Objects
A RegExp object contains a regular expression and the associated flags.
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
22.2.1 Patterns
The RegExp
Syntax
Each \u u u \u
A number of productions in this section are given alternative definitions in section
22.2.1.1 Static Semantics: Early Errors
This section is amended in
- It is a Syntax Error if NcapturingParens ≥ 2^32 - 1.
- It is a Syntax Error if Pattern contains multiple GroupSpecifiers whose enclosed RegExpIdentifierNames have the same CapturingGroupName.
- It is a Syntax Error if the MV of the first DecimalDigits is larger than the MV of the second DecimalDigits.
- It is a Syntax Error if the enclosing Pattern does not contain a GroupSpecifier with an enclosed RegExpIdentifierName whose CapturingGroupName equals the CapturingGroupName of the RegExpIdentifierName of this production's GroupName.
- It is a Syntax Error if the CapturingGroupNumber of DecimalEscape is larger than NcapturingParens (22.2.2.1).
- It is a Syntax Error if IsCharacterClass of the first ClassAtom is true or IsCharacterClass of the second ClassAtom is true.
- It is a Syntax Error if IsCharacterClass of the first ClassAtom is false and IsCharacterClass of the second ClassAtom is false and the CharacterValue of the first ClassAtom is larger than the CharacterValue of the second ClassAtom.
- It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is true or IsCharacterClass of ClassAtom is true.
- It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is false and IsCharacterClass of ClassAtom is false and the CharacterValue of ClassAtomNoDash is larger than the CharacterValue of ClassAtom.
- It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the code point value of "$", "_", or some code point matched by the UnicodeIDStart lexical grammar production.
- It is a Syntax Error if the result of performing UTF16SurrogatePairToCodePoint on the two code points matched by UnicodeLeadSurrogate and UnicodeTrailSurrogate respectively is not matched by the UnicodeIDStart lexical grammar production.
- It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the code point value of "$", "_", <ZWNJ>, <ZWJ>, or some code point matched by the UnicodeIDContinue lexical grammar production.
- It is a Syntax Error if the result of performing UTF16SurrogatePairToCodePoint on the two code points matched by UnicodeLeadSurrogate and UnicodeTrailSurrogate respectively is not matched by the UnicodeIDContinue lexical grammar production.
- It is a Syntax Error if the List of Unicode code points that is SourceText of UnicodePropertyName is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 56.
- It is a Syntax Error if the List of Unicode code points that is SourceText of UnicodePropertyValue is not identical to a List of Unicode code points that is a value or value alias for the Unicode property or property alias given by SourceText of UnicodePropertyName listed in the “Property value and aliases” column of the corresponding tables Table 58 or Table 59.
- It is a Syntax Error if the List of Unicode code points that is SourceText of LoneUnicodePropertyNameOrValue is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 58, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 57.
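Several of these early errors can be observed by constructing patterns at run time, where they surface as SyntaxError exceptions. In the sketch below, `isEarlyError` is a hypothetical helper introduced for this example; the rule that a class range may not use a character class escape as an endpoint is checked with the u flag, on the assumption that the host may relax it for patterns without that flag.

```javascript
// Hypothetical helper: true when the pattern is rejected with a SyntaxError.
function isEarlyError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

const earlyErrorChecks = {
  // Two GroupSpecifiers with the same CapturingGroupName.
  duplicateGroupName: isEarlyError("(?<year>\\d+)-(?<year>\\d+)"),
  // First DecimalDigits larger than the second in {min,max}.
  quantifierOutOfOrder: isEarlyError("a{3,1}"),
  // CharacterValue of the first ClassAtom larger than that of the second.
  classRangeOutOfOrder: isEarlyError("[z-a]"),
  // \k<name> refers to a group name the pattern does not contain.
  unknownGroupReference: isEarlyError("(?<year>\\d+)\\k<month>"),
  // A range endpoint that is itself a character class.
  rangeWithClassEscape: isEarlyError("[\\d-z]", "u"),
  // A property name that is not listed in the tables.
  unknownPropertyName: isEarlyError("\\p{NotAProperty}", "u"),
  // A well-formed pattern for comparison.
  validPattern: isEarlyError("(?<year>\\d+)-\\k<year>"),
};

console.log(earlyErrorChecks);
```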
22.2.1.2 Static Semantics: CapturingGroupNumber
This section is amended in
- Return the MV of NonZeroDigit.

- Let n be the number of code points in DecimalDigits.
- Return (the MV of NonZeroDigit × 10^n plus the MV of DecimalDigits).
The definitions of “the MV of
22.2.1.3 Static Semantics: IsCharacterClass
This section is amended in
- Return false.

- Return true.
22.2.1.4 Static Semantics: CharacterValue
This section is amended in
- Return the code point value of U+002D (HYPHEN-MINUS).

- Let ch be the code point matched by SourceCharacter.
- Return the code point value of ch.

- Return the code point value of U+0008 (BACKSPACE).

- Return the code point value of U+002D (HYPHEN-MINUS).

- Return the code point value according to Table 55.
| ControlEscape | Code Point Value | Code Point | Unicode Name | Symbol |
|---|---|---|---|---|
| t | 9 | U+0009 | CHARACTER TABULATION | <HT> |
| n | 10 | U+000A | LINE FEED (LF) | <LF> |
| v | 11 | U+000B | LINE TABULATION | <VT> |
| f | 12 | U+000C | FORM FEED (FF) | <FF> |
| r | 13 | U+000D | CARRIAGE RETURN (CR) | <CR> |
- Let ch be the code point matched by ControlLetter.
- Let i be ch's code point value.
- Return the remainder of dividing i by 32.

- Return the code point value of U+0000 (NULL).
\0 represents the <NUL> character and cannot be followed by a decimal digit.
- Return the MV of HexEscapeSequence.

- Let lead be the CharacterValue of HexLeadSurrogate.
- Let trail be the CharacterValue of HexTrailSurrogate.
- Let cp be UTF16SurrogatePairToCodePoint(lead, trail).
- Return the code point value of cp.

- Return the MV of Hex4Digits.

- Return the MV of CodePoint.

- Return the MV of HexDigits.

- Let ch be the code point matched by IdentityEscape.
- Return the code point value of ch.
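A few of the CharacterValue rules above can be checked directly against patterns. The sketch below uses only literal patterns; the array name is an assumption for this example, and each entry is expected to be true.

```javascript
const characterValueChecks = [
  // Inside a class, \b has the value U+0008 (BACKSPACE).
  /[\b]/.test("\u0008"),
  // ControlEscape values from Table 55.
  /\t/.test("\u0009"),
  /\v/.test("\u000B"),
  // ControlLetter: "J" has code point value 74, and 74 modulo 32 is 10 (LINE FEED).
  /\cJ/.test("\u000A"),
  // \0 has the value U+0000 (NULL).
  /\0/.test("\u0000"),
  // Hexadecimal and four-digit Unicode escapes.
  /\x41/.test("A"),
  /\u0041/.test("A"),
  // With the u flag, a code point escape and a surrogate pair escape
  // both denote the single code point U+1D11E.
  /^\u{1D11E}$/u.test("\u{1D11E}"),
  /^\uD834\uDD1E$/u.test("\u{1D11E}"),
];

console.log(characterValueChecks.every(Boolean)); // true
```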
22.2.1.5 Static Semantics: SourceText
- Return the List, in source text order, of Unicode code points in the source text matched by this production.
22.2.1.6 Static Semantics: CapturingGroupName
- Let idText be the source text matched by RegExpIdentifierName.
- Let idTextUnescaped be the result of replacing any occurrences of \ RegExpUnicodeEscapeSequence in idText with the code point represented by the RegExpUnicodeEscapeSequence.
- Return ! CodePointsToString(idTextUnescaped).
22.2.2 Pattern Semantics
This section is amended in
A regular expression pattern is converted into an
A Pattern is either a BMP pattern or a Unicode pattern, depending on whether its flags contain u. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (
The syntax and semantics of
For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character)
Patterns are passed to the RegExp
An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
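The difference between the two kinds of pattern is observable with the non-BMP character U+1D11E from the example above. The following sketch (variable names are assumptions for this example) matches the same String with and without the u flag.

```javascript
const gClef = "\u{1D11E}"; // MUSICAL SYMBOL G CLEF, two UTF-16 code units

// BMP pattern: each code unit is a character, so "." matches half of the pair.
const bmpSingle = /^.$/.test(gClef);  // false
const bmpDouble = /^..$/.test(gClef); // true

// Unicode pattern: the surrogate pair is one character.
const unicodeSingle = /^.$/u.test(gClef); // true

console.log(gClef.length);              // 2
console.log(gClef.match(/./g).length);  // 2
console.log(gClef.match(/./gu).length); // 1
```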
22.2.2.1 Notation
The descriptions below use the following aliases:
- Input is a List whose elements are the characters of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved. The notation Input[n] means the nth character of Input, where n can range between 0 (inclusive) and InputLength (exclusive).
- InputLength is the number of characters in Input.
- NcapturingParens is the total number of left-capturing parentheses (i.e. the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes) in the pattern. A left-capturing parenthesis is any ( pattern character that is matched by the ( terminal of the Atom :: ( GroupSpecifier Disjunction ) production.
- DotAll is true if the RegExp object's [[OriginalFlags]] internal slot contains "s" and otherwise is false.
- IgnoreCase is true if the RegExp object's [[OriginalFlags]] internal slot contains "i" and otherwise is false.
- Multiline is true if the RegExp object's [[OriginalFlags]] internal slot contains "m" and otherwise is false.
- Unicode is true if the RegExp object's [[OriginalFlags]] internal slot contains "u" and otherwise is false.
- WordCharacters is the mathematical set that is the union of all sixty-three characters in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_" (letters, numbers, and U+005F (LOW LINE) in the Unicode Basic Latin block) and all characters c for which c is not in that set but Canonicalize(c) is. WordCharacters cannot contain more than sixty-three characters unless Unicode and IgnoreCase are both true.
Furthermore, the descriptions below use the following internal data structures:
- A CharSet is a mathematical set of characters. When the Unicode flag is true, “all characters” means the CharSet containing all code point values; otherwise “all characters” means the CharSet containing all code unit values.
- A State is an ordered pair (endIndex, captures) where endIndex is an integer and captures is a List of NcapturingParens values. States are used to represent partial match states in the regular expression matching algorithms. The endIndex is one plus the index of the last input character matched so far by the pattern, while captures holds the results of capturing parentheses. The nth element of captures is either a List of characters that represents the value obtained by the nth set of capturing parentheses or undefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, many States may be in use at any time during the matching process.
- A MatchResult is either a State or the special token failure that indicates that the match failed.
- A Continuation is an Abstract Closure that takes one State argument and returns a MatchResult result. The Continuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its State argument. If the match succeeds, the Continuation returns the final State that it reached; if the match fails, the Continuation returns failure.
- A Matcher is an Abstract Closure that takes two arguments (a State and a Continuation) and returns a MatchResult result. A Matcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against Input, starting at the intermediate state given by its State argument. The Continuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a new State, the Matcher then calls Continuation on that new State to test if the rest of the pattern can match as well. If it can, the Matcher returns the State returned by Continuation; if not, the Matcher may try different choices at its choice points, repeatedly calling Continuation until it either succeeds or all possibilities have been exhausted.
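To make the State, Continuation, and Matcher vocabulary concrete, here is a deliberately simplified sketch in continuation-passing style. It is not the specified algorithm: the State carries only an endIndex (no captures), there is no direction parameter, and `matchersFor`, `char`, `seq`, and `alt` are hypothetical names chosen for this example.

```javascript
const failure = Symbol("failure");

// Builds Matchers over one Input. A Matcher is (State, Continuation) => MatchResult.
function matchersFor(input) {
  // Matches one literal character, then hands the new State to the Continuation.
  const char = (ch) => (x, c) =>
    x.endIndex < input.length && input[x.endIndex] === ch
      ? c({ endIndex: x.endIndex + 1 })
      : failure;

  // m1 followed by m2: m2 becomes part of the Continuation given to m1.
  const seq = (m1, m2) => (x, c) => m1(x, (y) => m2(y, c));

  // m1 | m2: try the left alternative with the full Continuation first,
  // and fall back to the right alternative only if that fails.
  const alt = (m1, m2) => (x, c) => {
    const r = m1(x, c);
    return r !== failure ? r : m2(x, c);
  };

  return { char, seq, alt };
}

const input = [..."abc"];
const { char, seq, alt } = matchersFor(input);
const accept = (y) => y; // final Continuation: return the State that was reached

// (a|ab)c: "a" matches first, but "c" then fails at "b",
// so the Matcher backtracks into the alternation and tries "ab".
const grouped = seq(alt(char("a"), seq(char("a"), char("b"))), char("c"));
const groupedResult = grouped({ endIndex: 0 }, accept);

// a|ab: the left alternative succeeds, so "ab" is never tried.
const plain = alt(char("a"), seq(char("a"), char("b")));
const plainResult = plain({ endIndex: 0 }, accept);

console.log(groupedResult.endIndex); // 3
console.log(plainResult.endIndex);   // 1
```

Passing the rest of the pattern as a Continuation is what lets a Matcher retry a different choice when the remainder fails, without any explicit backtracking stack.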
22.2.2.2 Pattern
The production
- Evaluate Disjunction with 1 as its direction argument to obtain a Matcher m.
- Return a new Abstract Closure with parameters (str, index) that captures m and performs the following steps when called:
  - Assert: Type(str) is String.
  - Assert: index is a non-negative integer which is ≤ the length of str.
  - If Unicode is true, let Input be ! StringToCodePoints(str). Otherwise, let Input be a List whose elements are the code units that are the elements of str. Input will be used throughout the algorithms in 22.2.2. Each element of Input is considered to be a character.
  - Let InputLength be the number of characters contained in Input. This alias will be used throughout the algorithms in 22.2.2.
  - Let listIndex be the index into Input of the character that was obtained from element index of str.
  - Let c be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a State.
    - Return y.
  - Let cap be a List of NcapturingParens undefined values, indexed 1 through NcapturingParens.
  - Let x be the State (listIndex, cap).
  - Return m(x, c).
A Pattern evaluates (“compiles”) to an
22.2.2.3 Disjunction
With parameter direction.
The production
- Evaluate Alternative with argument direction to obtain a Matcher m.
- Return m.
The production
- Evaluate Alternative with argument direction to obtain a Matcher m1.
- Evaluate Disjunction with argument direction to obtain a Matcher m2.
- Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
The | regular expression operator separates two alternatives. The pattern first tries to match the left | produce
/a|ab/.exec("abc")
returns the result
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
The order in which the two alternatives are tried is independent of the value of direction.
22.2.2.4 Alternative
With parameter direction.
The production
The production
- Evaluate Alternative with argument direction to obtain a Matcher m1.
- Evaluate Term with argument direction to obtain a Matcher m2.
- If direction = 1, then
  - Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
- Else,
  - Assert: direction is -1.
  - Return a new Matcher with parameters (x, c) that captures m1 and m2 and performs the following steps when called:
Consecutive
22.2.2.5 Term
With parameter direction.
The production
- Return the Matcher that is the result of evaluating Assertion.
The resulting Matcher is independent of direction.
The production
- Return the Matcher that is the result of evaluating Atom with argument direction.
The production
- Evaluate Atom with argument direction to obtain a Matcher m.
- Evaluate Quantifier to obtain the three results: a non-negative integer min, a non-negative integer (or +∞) max, and Boolean greedy.
- Assert: min ≤ max.
- Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of this Term. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes prior to or enclosing this Term.
- Let parenCount be the number of left-capturing parentheses in Atom. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes enclosed by Atom.
- Return a new Matcher with parameters (x, c) that captures m, min, max, greedy, parenIndex, and parenCount and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Return ! RepeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount).
22.2.2.5.1 RepeatMatcher ( m, min, max, greedy, x, c, parenIndex, parenCount )
The abstract operation RepeatMatcher takes arguments m (a Matcher), min (a non-negative
- If max = 0, return c(x).
- Let d be a new Continuation with parameters (y) that captures m, min, max, greedy, x, c, parenIndex, and parenCount and performs the following steps when called:
  - Assert: y is a State.
  - If min = 0 and y's endIndex = x's endIndex, return failure.
  - If min = 0, let min2 be 0; otherwise let min2 be min - 1.
  - If max is +∞, let max2 be +∞; otherwise let max2 be max - 1.
  - Return ! RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount).
- Let cap be a copy of x's captures List.
- For each integer k such that parenIndex < k and k ≤ parenIndex + parenCount, set cap[k] to undefined.
- Let e be x's endIndex.
- Let xr be the State (e, cap).
- If min ≠ 0, return m(xr, d).
- If greedy is false, then
  - Let z be c(x).
  - If z is not failure, return z.
  - Return m(xr, d).
- Let z be m(xr, d).
- If z is not failure, return z.
- Return c(x).
An
If the
Compare
/a[a-z]{2,4}/.exec("abcdefghi")
which returns
/a[a-z]{2,4}?/.exec("abcdefghi")
which returns
Consider also
/(aa|aabaac|ba|b|c)*/.exec("aabaac")
which, by the choice point ordering above, returns the array
["aaba", "ba"]
and not any of:
["aabaac", "aabaac"]
["aabaac", "c"]
The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:
"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")
which returns the gcd in unary notation
Step
/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")
which returns the array
["zaacbbbcac", "z", "ac", "a", undefined, "c"]
and not
["zaacbbbcac", "z", "ac", "a", "bbb", "c"]
because each iteration of the outermost * clears all captured Strings contained in the quantified
Step
/(a*)*/.exec("b")
or the slightly more complicated:
/(a*)b\1+/.exec("baaaac")
which returns the array
["b", ""]
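The examples in this section can be run as written. The sketch below collects them and records what each evaluates to; the variable names are assumptions made for this example, and the patterns and inputs are the ones quoted above.

```javascript
// Greedy versus non-greedy expansion of {2,4}.
const greedyMatch = /a[a-z]{2,4}/.exec("abcdefghi")[0]; // "abcde"
const lazyMatch = /a[a-z]{2,4}?/.exec("abcdefghi")[0];  // "abc"

// Choice point ordering inside a repeated group.
const choicePoints = /(aa|aabaac|ba|b|c)*/.exec("aabaac");

// Greatest common divisor of 10 and 15 in unary notation.
const gcdUnary = "aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1");

// Captures inside the quantified group are cleared on each iteration.
const clearedCaptures = /(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac");

// Empty iterations are rejected once the minimum count has been reached.
const emptyIteration = /(a*)b\1+/.exec("baaaac");

console.log(greedyMatch, lazyMatch);  // abcde abc
console.log(gcdUnary.length);         // 5
console.log(Array.from(choicePoints)); // [ 'aaba', 'ba' ]
```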
22.2.2.6 Assertion
The production
- Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let e be x's endIndex.
  - If e = 0, or if Multiline is true and the character Input[e - 1] is one of LineTerminator, then
    - Return c(x).
  - Return failure.
Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is
The production
- Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let e be x's endIndex.
  - If e = InputLength, or if Multiline is true and the character Input[e] is one of LineTerminator, then
    - Return c(x).
  - Return failure.
The production
- Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let e be x's endIndex.
  - Let a be ! IsWordChar(e - 1).
  - Let b be ! IsWordChar(e).
  - If a is true and b is false, or if a is false and b is true, return c(x).
  - Return failure.
The production
- Return a new Matcher with parameters (x, c) that captures nothing and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let e be x's endIndex.
  - Let a be ! IsWordChar(e - 1).
  - Let b be ! IsWordChar(e).
  - If a is true and b is true, or if a is false and b is false, return c(x).
  - Return failure.
The production
- Evaluate Disjunction with 1 as its direction argument to obtain a Matcher m.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let d be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a State.
    - Return y.
  - Let r be m(x, d).
  - If r is failure, return failure.
  - Let y be r's State.
  - Let cap be y's captures List.
  - Let xe be x's endIndex.
  - Let z be the State (xe, cap).
  - Return c(z).
The production
- Evaluate Disjunction with 1 as its direction argument to obtain a Matcher m.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
The production
- Evaluate Disjunction with -1 as its direction argument to obtain a Matcher m.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let d be a new Continuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a State.
    - Return y.
  - Let r be m(x, d).
  - If r is failure, return failure.
  - Let y be r's State.
  - Let cap be y's captures List.
  - Let xe be x's endIndex.
  - Let z be the State (xe, cap).
  - Return c(z).
The production
- Evaluate Disjunction with -1 as its direction argument to obtain a Matcher m.
- Return a new Matcher with parameters (x, c) that captures m and performs the following steps when called:
22.2.2.6.1 IsWordChar ( e )
The abstract operation IsWordChar takes argument e (an
- If e = -1 or e is InputLength, return false.
- Let c be the character Input[e].
- If c is in WordCharacters, return true.
- Return false.
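The assertions in this section can be exercised with short patterns. The sketch below is an illustration under the assumption that the host supports lookbehind assertions; the sample Strings and the `assertionChecks` name are invented for this example, and each entry is expected to be true.

```javascript
const lines = "first\nsecond";

const assertionChecks = {
  // ^ and $ match next to a line terminator only when Multiline is true.
  caretNeedsMultiline: !/^second/.test(lines) && /^second/m.test(lines),
  dollarNeedsMultiline: !/first$/.test(lines) && /first$/m.test(lines),
  // \b succeeds where exactly one neighbouring position holds a word character.
  wordBoundary: /\bcat\b/.test("a cat.") && !/\bcat\b/.test("concatenate"),
  // \B succeeds where both or neither neighbour is a word character.
  nonBoundary: /\Bcat\B/.test("concatenate"),
  // Lookahead: the inner pattern is evaluated with direction 1 and consumes nothing.
  lookahead: /\d+(?=%)/.exec("50% of 80")[0] === "50",
  negativeLookahead: /\d+(?!%|\d)/.exec("50% of 80")[0] === "80",
  // Lookbehind: the inner pattern is evaluated with direction -1.
  lookbehind: /(?<=\$)\d+/.exec("cost: $42")[0] === "42",
  negativeLookbehind: /(?<!\$)\b\d+/.exec("$42 7")[0] === "7",
};

console.log(Object.values(assertionChecks).every(Boolean)); // true
```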
22.2.2.7 Quantifier
The production
- Evaluate QuantifierPrefix to obtain the two results: an integer min and an integer (or +∞) max.
- Return the three results min, max, and true.
The production
- Evaluate QuantifierPrefix to obtain the two results: an integer min and an integer (or +∞) max.
- Return the three results min, max, and false.
The production
- Return the two results 0 and +∞.
The production
- Return the two results 1 and +∞.
The production
- Return the two results 0 and 1.
The production
- Let i be the MV of DecimalDigits (see 12.8.3).
- Return the two results i and i.
The production
- Let i be the MV of DecimalDigits.
- Return the two results i and +∞.
The production
- Let i be the MV of the first DecimalDigits.
- Let j be the MV of the second DecimalDigits.
- Return the two results i and j.
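The results described in this section can be summarised in a small function. `describeQuantifier` below is a hypothetical helper, not part of the language: it assumes the usual quantifier spellings (`*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`, each optionally followed by `?` for the non-greedy form) and returns the min, max, and greedy values, with Infinity standing in for +∞.

```javascript
// Hypothetical helper mapping quantifier source text to (min, max, greedy).
function describeQuantifier(text) {
  // A trailing "?" after a quantifier prefix selects the non-greedy form.
  const lazy = text.length > 1 && text.endsWith("?");
  const prefix = lazy ? text.slice(0, -1) : text;

  let min;
  let max;
  if (prefix === "*") {
    min = 0;
    max = Infinity;
  } else if (prefix === "+") {
    min = 1;
    max = Infinity;
  } else if (prefix === "?") {
    min = 0;
    max = 1;
  } else {
    const m = /^\{(\d+)(,(\d*))?\}$/.exec(prefix);
    if (m === null) {
      throw new SyntaxError(`not a quantifier: ${text}`);
    }
    min = Number(m[1]);
    if (m[2] === undefined) {
      max = min; // {n}
    } else if (m[3] === "") {
      max = Infinity; // {n,}
    } else {
      max = Number(m[3]); // {n,m}
    }
  }
  return { min, max, greedy: !lazy };
}

console.log(describeQuantifier("{2,4}?")); // { min: 2, max: 4, greedy: false }
```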
22.2.2.8 Atom
With parameter direction.
The production
- Let ch be the character matched by PatternCharacter.
- Let A be a one-element CharSet containing the character ch.
- Return ! CharacterSetMatcher(A, false, direction).
The production
- Let A be the CharSet of all characters.
- If DotAll is not true, then
  - Remove from A all characters corresponding to a code point on the right-hand side of the LineTerminator production.
- Return ! CharacterSetMatcher(A, false, direction).
The production
- Return the Matcher that is the result of evaluating AtomEscape with argument direction.
The production
- Evaluate CharacterClass to obtain a CharSet A and a Boolean invert.
- Return ! CharacterSetMatcher(A, invert, direction).
The production
- Evaluate Disjunction with argument direction to obtain a Matcher m.
- Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of this Atom. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes prior to or enclosing this Atom.
- Return a new Matcher with parameters (x, c) that captures direction, m, and parenIndex and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let d be a new Continuation with parameters (y) that captures x, c, direction, and parenIndex and performs the following steps when called:
  - Return m(x, d).
The production
- Return the Matcher that is the result of evaluating Disjunction with argument direction.
22.2.2.8.1 CharacterSetMatcher ( A, invert, direction )
The abstract operation CharacterSetMatcher takes arguments A (a CharSet), invert (a Boolean), and direction (1 or -1). It performs the following steps when called:
- Return a new Matcher with parameters (x, c) that captures A, invert, and direction and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let e be x's endIndex.
  - Let f be e + direction.
  - If f < 0 or f > InputLength, return failure.
  - Let index be min(e, f).
  - Let ch be the character Input[index].
  - Let cc be Canonicalize(ch).
  - If there exists a member a of A such that Canonicalize(a) is cc, let found be true. Otherwise, let found be false.
  - If invert is false and found is false, return failure.
  - If invert is true and found is true, return failure.
  - Let cap be x's captures List.
  - Let y be the State (f, cap).
  - Return c(y).
22.2.2.8.2 Canonicalize ( ch )
The abstract operation Canonicalize takes argument ch (a character). It performs the following steps when called:
- If Unicode is true and IgnoreCase is true, then
  - If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
  - Return ch.
- If IgnoreCase is false, return ch.
- Assert: ch is a UTF-16 code unit.
- Let cp be the code point whose numeric value is that of ch.
- Let u be the result of toUppercase(« cp »), according to the Unicode Default Case Conversion algorithm.
- Let uStr be ! CodePointsToString(u).
- If uStr does not consist of a single code unit, return ch.
- Let cu be uStr's single code unit element.
- If the numeric value of ch ≥ 128 and the numeric value of cu < 128, return ch.
- Return cu.
Parentheses of the form ( ) serve both to group the components of the \ followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching (?: ) instead.
The form (?= ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside (?= form (this unusual behaviour is inherited from Perl). This only matters when the
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
The form (?! ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against
["baaabaac", "ba", undefined, "abaac"]
In case-insignificant matches when Unicode is ß (U+00DF) to SS. It may however map a code point outside the Basic Latin range to a character within, for example, ſ (U+017F) to s. Such characters are not mapped if Unicode is /[a-z]/i, but they will match /[a-z]/ui.
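The non-Unicode branch of Canonicalize can be sketched with String.prototype.toUpperCase, on the assumption that it applies the same default uppercase conversion. `canonicalizeCodeUnit` is a hypothetical helper for this example and covers only the case where Unicode is false; the case-folding branch used when Unicode and IgnoreCase are both true is not modelled.

```javascript
// Hypothetical sketch of Canonicalize for a single code unit when Unicode is false.
function canonicalizeCodeUnit(ch, ignoreCase) {
  if (!ignoreCase) return ch;
  const upper = ch.toUpperCase();
  // Mappings that expand to more than one code unit are ignored.
  if (upper.length !== 1) return ch;
  // A character outside Basic Latin is never mapped into Basic Latin.
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

console.log(canonicalizeCodeUnit("a", true)); // A
console.log(canonicalizeCodeUnit("\u00DF", true) === "\u00DF"); // true: ß uppercases to two code units
console.log(canonicalizeCodeUnit("\u017F", true) === "\u017F"); // true: ſ would map into Basic Latin

// The observable consequence for U+017F (LATIN SMALL LETTER LONG S):
const withoutUnicodeFlag = /[a-z]/i.test("\u017F"); // false
const withUnicodeFlag = /[a-z]/iu.test("\u017F");   // true
```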
22.2.2.8.3 UnicodeMatchProperty ( p )
The abstract operation UnicodeMatchProperty takes argument p (a
- Assert: p is a List of Unicode code points that is identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 56 or Table 57.
- Let c be the canonical property name of p as given in the “Canonical property name” column of the corresponding row.
- Return the List of Unicode code points of c.
Implementations must support the Unicode property names and aliases listed in
For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.
The listed properties form a superset of what UTS18 RL1.2 requires.
| Property name and aliases | Canonical property name |
|---|---|
| General_Category, gc | General_Category |
| Script, sc | Script |
| Script_Extensions, scx | Script_Extensions |
| Property name and aliases | Canonical property name |
|---|---|
| ASCII | ASCII |
| ASCII_Hex_Digit, AHex | ASCII_Hex_Digit |
| Alphabetic, Alpha | Alphabetic |
| Any | Any |
| Assigned | Assigned |
| Bidi_Control, Bidi_C | Bidi_Control |
| Bidi_Mirrored, Bidi_M | Bidi_Mirrored |
| Case_Ignorable, CI | Case_Ignorable |
| Cased | Cased |
| Changes_When_Casefolded, CWCF | Changes_When_Casefolded |
| Changes_When_Casemapped, CWCM | Changes_When_Casemapped |
| Changes_When_Lowercased, CWL | Changes_When_Lowercased |
| Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded |
| Changes_When_Titlecased, CWT | Changes_When_Titlecased |
| Changes_When_Uppercased, CWU | Changes_When_Uppercased |
| Dash | Dash |
| Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point |
| Deprecated, Dep | Deprecated |
| Diacritic, Dia | Diacritic |
| Emoji | Emoji |
| Emoji_Component, EComp | Emoji_Component |
| Emoji_Modifier, EMod | Emoji_Modifier |
| Emoji_Modifier_Base, EBase | Emoji_Modifier_Base |
| Emoji_Presentation, EPres | Emoji_Presentation |
| Extended_Pictographic, ExtPict | Extended_Pictographic |
| Extender, Ext | Extender |
| Grapheme_Base, Gr_Base | Grapheme_Base |
| Grapheme_Extend, Gr_Ext | Grapheme_Extend |
| Hex_Digit, Hex | Hex_Digit |
| IDS_Binary_Operator, IDSB | IDS_Binary_Operator |
| IDS_Trinary_Operator, IDST | IDS_Trinary_Operator |
| ID_Continue, IDC | ID_Continue |
| ID_Start, IDS | ID_Start |
| Ideographic, Ideo | Ideographic |
| Join_Control, Join_C | Join_Control |
| Logical_Order_Exception, LOE | Logical_Order_Exception |
| Lowercase, Lower | Lowercase |
| Math | Math |
| Noncharacter_Code_Point, NChar | Noncharacter_Code_Point |
| Pattern_Syntax, Pat_Syn | Pattern_Syntax |
| Pattern_White_Space, Pat_WS | Pattern_White_Space |
| Quotation_Mark, QMark | Quotation_Mark |
| Radical | Radical |
| Regional_Indicator, RI | Regional_Indicator |
| Sentence_Terminal, STerm | Sentence_Terminal |
| Soft_Dotted, SD | Soft_Dotted |
| Terminal_Punctuation, Term | Terminal_Punctuation |
| Unified_Ideograph, UIdeo | Unified_Ideograph |
| Uppercase, Upper | Uppercase |
| Variation_Selector, VS | Variation_Selector |
| White_Space, space | White_Space |
| XID_Continue, XIDC | XID_Continue |
| XID_Start, XIDS | XID_Start |
22.2.2.8.4 UnicodeMatchPropertyValue ( p, v )
The abstract operation UnicodeMatchPropertyValue takes arguments p (a
- Assert: p is a List of Unicode code points that is identical to a List of Unicode code points that is a canonical, unaliased Unicode property name listed in the “Canonical property name” column of Table 56.
- Assert: v is a List of Unicode code points that is identical to a List of Unicode code points that is a property value or property value alias for Unicode property p listed in the “Property value and aliases” column of Table 58 or Table 59.
- Let value be the canonical property value of v as given in the “Canonical property value” column of the corresponding row.
- Return the List of Unicode code points of value.
Implementations must support the Unicode property value names and aliases listed in
For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.
This algorithm differs from the matching rules for symbolic values listed in UAX44: case, Is prefix is not supported.
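The strictness of property name and value matching can be observed with Unicode property escapes. In the sketch below, `compiles` is a hypothetical helper for this example; the Greek script value and its Grek alias are assumed to be among the listed Script values.

```javascript
// Hypothetical helper: true when the pattern is accepted with the u flag.
function compiles(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const propertyChecks = {
  // Canonical names and their aliases are interchangeable.
  canonicalName: /\p{Script_Extensions=Greek}/u.test("\u03B1"),
  aliasName: /\p{scx=Grek}/u.test("\u03B1"),
  // A lone General_Category value or binary property name is also accepted.
  loneGeneralCategory: /\p{Lu}/u.test("A") && /\p{Uppercase_Letter}/u.test("A"),
  binaryProperty: /\p{ASCII_Hex_Digit}/u.test("f") && /\p{AHex}/u.test("f"),
  // Matching is exact: case and spacing are significant, and "Is" prefixes are rejected.
  exactValues: compiles("\\p{Script_Extensions=Xpeo}") && compiles("\\p{Script_Extensions=Old_Persian}"),
  wrongCaseName: compiles("\\p{script_extensions=Greek}"),
  wrongCaseValue: compiles("\\p{Script_Extensions=xpeo}"),
  spaceInValue: compiles("\\p{Script_Extensions=Old Persian}"),
  isPrefix: compiles("\\p{IsLu}"),
};

console.log(propertyChecks);
```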
| Property value and aliases | Canonical property value |
|---|---|
| Cased_Letter, LC | Cased_Letter |
| Close_Punctuation, Pe | Close_Punctuation |
| Connector_Punctuation, Pc | Connector_Punctuation |
| Control, Cc, cntrl | Control |
| Currency_Symbol, Sc | Currency_Symbol |
| Dash_Punctuation, Pd | Dash_Punctuation |
| Decimal_Number, Nd, digit | Decimal_Number |
| Enclosing_Mark, Me | Enclosing_Mark |
| Final_Punctuation, Pf | Final_Punctuation |
| Format, Cf | Format |
| Initial_Punctuation, Pi | Initial_Punctuation |
| Letter, L | Letter |
| Letter_Number, Nl | Letter_Number |
| Line_Separator, Zl | Line_Separator |
| Lowercase_Letter, Ll | Lowercase_Letter |
| Mark, M, Combining_Mark | Mark |
| Math_Symbol, Sm | Math_Symbol |
| Modifier_Letter, Lm | Modifier_Letter |
| Modifier_Symbol, Sk | Modifier_Symbol |
| Nonspacing_Mark, Mn | Nonspacing_Mark |
| Number, N | Number |
| Open_Punctuation, Ps | Open_Punctuation |
| Other, C | Other |
| Other_Letter, Lo | Other_Letter |
| Other_Number, No | Other_Number |
| Other_Punctuation, Po | Other_Punctuation |
| Other_Symbol, So | Other_Symbol |
| Paragraph_Separator, Zp | Paragraph_Separator |
| Private_Use, Co | Private_Use |
| Punctuation, P, punct | Punctuation |
| Separator, Z | Separator |
| Space_Separator, Zs | Space_Separator |
| Spacing_Mark, Mc | Spacing_Mark |
| Surrogate, Cs | Surrogate |
| Symbol, S | Symbol |
| Titlecase_Letter, Lt | Titlecase_Letter |
| Unassigned, Cn | Unassigned |
| Uppercase_Letter, Lu | Uppercase_Letter |
Table 59: Value aliases and canonical values for the Unicode properties Script and Script_Extensions
| Property value and aliases | Canonical property value |
|---|---|
| Adlam, Adlm | Adlam |
| Ahom | Ahom |
| Anatolian_Hieroglyphs, Hluw | Anatolian_Hieroglyphs |
| Arabic, Arab | Arabic |
| Armenian, Armn | Armenian |
| Avestan, Avst | Avestan |
| Balinese, Bali | Balinese |
| Bamum, Bamu | Bamum |
| Bassa_Vah, Bass | Bassa_Vah |
| Batak, Batk | Batak |
| Bengali, Beng | Bengali |
| Bhaiksuki, Bhks | Bhaiksuki |
| Bopomofo, Bopo | Bopomofo |
| Brahmi, Brah | Brahmi |
| Braille, Brai | Braille |
| Buginese, Bugi | Buginese |
| Buhid, Buhd | Buhid |
| Canadian_Aboriginal, Cans | Canadian_Aboriginal |
| Carian, Cari | Carian |
| Caucasian_Albanian, Aghb | Caucasian_Albanian |
| Chakma, Cakm | Chakma |
| Cham | Cham |
| Chorasmian, Chrs | Chorasmian |
| Cherokee, Cher | Cherokee |
| Common, Zyyy | Common |
| Coptic, Copt, Qaac | Coptic |
| Cuneiform, Xsux | Cuneiform |
| Cypriot, Cprt | Cypriot |
| Cyrillic, Cyrl | Cyrillic |
| Deseret, Dsrt | Deseret |
| Devanagari, Deva | Devanagari |
| Dives_Akuru, Diak | Dives_Akuru |
| Dogra, Dogr | Dogra |
| Duployan, Dupl | Duployan |
| Egyptian_Hieroglyphs, Egyp | Egyptian_Hieroglyphs |
| Elbasan, Elba | Elbasan |
| Elymaic, Elym | Elymaic |
| Ethiopic, Ethi | Ethiopic |
| Georgian, Geor | Georgian |
| Glagolitic, Glag | Glagolitic |
| Gothic, Goth | Gothic |
| Grantha, Gran | Grantha |
| Greek, Grek | Greek |
| Gujarati, Gujr | Gujarati |
| Gunjala_Gondi, Gong | Gunjala_Gondi |
| Gurmukhi, Guru | Gurmukhi |
| Han, Hani | Han |
| Hangul, Hang | Hangul |
| Hanifi_Rohingya, Rohg | Hanifi_Rohingya |
| Hanunoo, Hano | Hanunoo |
| Hatran, Hatr | Hatran |
| Hebrew, Hebr | Hebrew |
| Hiragana, Hira | Hiragana |
| Imperial_Aramaic, Armi | Imperial_Aramaic |
| Inherited, Zinh, Qaai | Inherited |
| Inscriptional_Pahlavi, Phli | Inscriptional_Pahlavi |
| Inscriptional_Parthian, Prti | Inscriptional_Parthian |
| Javanese, Java | Javanese |
| Kaithi, Kthi | Kaithi |
| Kannada, Knda | Kannada |
| Katakana, Kana | Katakana |
| Kayah_Li, Kali | Kayah_Li |
| Kharoshthi, Khar | Kharoshthi |
| Khitan_Small_Script, Kits | Khitan_Small_Script |
| Khmer, Khmr | Khmer |
| Khojki, Khoj | Khojki |
| Khudawadi, Sind | Khudawadi |
| Lao, Laoo | Lao |
| Latin, Latn | Latin |
| Lepcha, Lepc | Lepcha |
| Limbu, Limb | Limbu |
| Linear_A, Lina | Linear_A |
| Linear_B, Linb | Linear_B |
| Lisu | Lisu |
| Lycian, Lyci | Lycian |
| Lydian, Lydi | Lydian |
| Mahajani, Mahj | Mahajani |
| Makasar, Maka | Makasar |
| Malayalam, Mlym | Malayalam |
| Mandaic, Mand | Mandaic |
| Manichaean, Mani | Manichaean |
| Marchen, Marc | Marchen |
| Medefaidrin, Medf | Medefaidrin |
| Masaram_Gondi, Gonm | Masaram_Gondi |
| Meetei_Mayek, Mtei | Meetei_Mayek |
| Mende_Kikakui, Mend | Mende_Kikakui |
| Meroitic_Cursive, Merc | Meroitic_Cursive |
| Meroitic_Hieroglyphs, Mero | Meroitic_Hieroglyphs |
| Miao, Plrd | Miao |
| Modi | Modi |
| Mongolian, Mong | Mongolian |
| Mro, Mroo | Mro |
| Multani, Mult | Multani |
| Myanmar, Mymr | Myanmar |
| Nabataean, Nbat | Nabataean |
| Nandinagari, Nand | Nandinagari |
| New_Tai_Lue, Talu | New_Tai_Lue |
| Newa | Newa |
| Nko, Nkoo | Nko |
| Nushu, Nshu | Nushu |
| Nyiakeng_Puachue_Hmong, Hmnp | Nyiakeng_Puachue_Hmong |
| Ogham, Ogam | Ogham |
| Ol_Chiki, Olck | Ol_Chiki |
| Old_Hungarian, Hung | Old_Hungarian |
| Old_Italic, Ital | Old_Italic |
| Old_North_Arabian, Narb | Old_North_Arabian |
| Old_Permic, Perm | Old_Permic |
| Old_Persian, Xpeo | Old_Persian |
| Old_Sogdian, Sogo | Old_Sogdian |
| Old_South_Arabian, Sarb | Old_South_Arabian |
| Old_Turkic, Orkh | Old_Turkic |
| Oriya, Orya | Oriya |
| Osage, Osge | Osage |
| Osmanya, Osma | Osmanya |
| Pahawh_Hmong, Hmng | Pahawh_Hmong |
| Palmyrene, Palm | Palmyrene |
| Pau_Cin_Hau, Pauc | Pau_Cin_Hau |
| Phags_Pa, Phag | Phags_Pa |
| Phoenician, Phnx | Phoenician |
| Psalter_Pahlavi, Phlp | Psalter_Pahlavi |
| Rejang, Rjng | Rejang |
| Runic, Runr | Runic |
| Samaritan, Samr | Samaritan |
| Saurashtra, Saur | Saurashtra |
| Sharada, Shrd | Sharada |
| Shavian, Shaw | Shavian |
| Siddham, Sidd | Siddham |
| SignWriting, Sgnw | SignWriting |
| Sinhala, Sinh | Sinhala |
| Sogdian, Sogd | Sogdian |
| Sora_Sompeng, Sora | Sora_Sompeng |
| Soyombo, Soyo | Soyombo |
| Sundanese, Sund | Sundanese |
| Syloti_Nagri, Sylo | Syloti_Nagri |
| Syriac, Syrc | Syriac |
| Tagalog, Tglg | Tagalog |
| Tagbanwa, Tagb | Tagbanwa |
| Tai_Le, Tale | Tai_Le |
| Tai_Tham, Lana | Tai_Tham |
| Tai_Viet, Tavt | Tai_Viet |
| Takri, Takr | Takri |
| Tamil, Taml | Tamil |
| Tangut, Tang | Tangut |
| Telugu, Telu | Telugu |
| Thaana, Thaa | Thaana |
| Thai | Thai |
| Tibetan, Tibt | Tibetan |
| Tifinagh, Tfng | Tifinagh |
| Tirhuta, Tirh | Tirhuta |
| Ugaritic, Ugar | Ugaritic |
| Vai, Vaii | Vai |
| Wancho, Wcho | Wancho |
| Warang_Citi, Wara | Warang_Citi |
| Yezidi, Yezi | Yezidi |
| Yi, Yiii | Yi |
| Zanabazar_Square, Zanb | Zanabazar_Square |
22.2.2.9 AtomEscape
With parameter direction.
The production AtomEscape :: DecimalEscape evaluates as follows:
- Evaluate DecimalEscape to obtain an integer n.
- Assert: n ≤ NcapturingParens.
- Return ! BackreferenceMatcher(n, direction).
The production AtomEscape :: CharacterEscape evaluates as follows:
- Evaluate CharacterEscape to obtain a character ch.
- Let A be a one-element CharSet containing the character ch.
- Return ! CharacterSetMatcher(A, false, direction).
The production AtomEscape :: CharacterClassEscape evaluates as follows:
- Evaluate CharacterClassEscape to obtain a CharSet A.
- Return ! CharacterSetMatcher(A, false, direction).
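An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses (22.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.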
The production AtomEscape :: k GroupName evaluates as follows:
- Search the enclosing Pattern for an instance of a GroupSpecifier containing a RegExpIdentifierName which has a CapturingGroupName equal to the CapturingGroupName of the RegExpIdentifierName contained in GroupName.
- Assert: A unique such GroupSpecifier is found.
- Let parenIndex be the number of left-capturing parentheses in the entire regular expression that occur to the left of the located GroupSpecifier. This is the total number of Atom :: ( GroupSpecifier Disjunction ) Parse Nodes prior to or enclosing the located GroupSpecifier, including its immediately enclosing Atom.
- Return ! BackreferenceMatcher(parenIndex, direction).
22.2.2.9.1 BackreferenceMatcher ( n, direction )
The abstract operation BackreferenceMatcher takes arguments n (a positive integer) and direction (1 or -1). It performs the following steps when called:
- Assert: n ≥ 1.
- Return a new Matcher with parameters (x, c) that captures n and direction and performs the following steps when called:
  - Assert: x is a State.
  - Assert: c is a Continuation.
  - Let cap be x's captures List.
  - Let s be cap[n].
  - If s is undefined, return c(x).
  - Let e be x's endIndex.
  - Let len be the number of elements in s.
  - Let f be e + direction × len.
  - If f < 0 or f > InputLength, return failure.
  - Let g be min(e, f).
  - If there exists an integer i between 0 (inclusive) and len (exclusive) such that Canonicalize(s[i]) is not the same character value as Canonicalize(Input[g + i]), return failure.
  - Let y be the State (f, cap).
  - Return c(y).
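The observable effect of these steps can be sketched from script. The example below is illustrative only; the variable names are chosen for this sketch, and it assumes an engine with named capture group support.

```javascript
// A numbered backreference must repeat the text captured by group 1.
const doubled = /(\w+) \1/.exec("hello hello world");

// Group 1 does not participate when the "b" alternative matches, so its
// capture is undefined and \1 succeeds without consuming any input.
const unsetGroup = /(?:(a)|b)\1/.exec("b");

// With the "i" flag, captured and input characters are compared after
// canonicalization, so "A" satisfies a backreference to "a".
const caseInsensitive = /(a)\1/i.test("aA");

// A named backreference resolves to the same kind of matcher.
const quoted = /(?<q>['"])(.*?)\k<q>/.exec("say 'hi' now");

console.log(doubled[0], unsetGroup[0], caseInsensitive, quoted[2]);
```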
22.2.2.10 CharacterEscape
The CharacterEscape productions evaluate as follows:
- Let cv be the CharacterValue of this CharacterEscape.
- Return the character whose character value is cv.
22.2.2.11 DecimalEscape
The DecimalEscape productions evaluate as follows:
- Return the CapturingGroupNumber of this DecimalEscape.
If \ is followed by a decimal number n whose first digit is not 0, then the escape sequence is considered to be a backreference. It is an error if n is greater than the total number of left-capturing parentheses in the entire regular expression.
22.2.2.12 CharacterClassEscape
The production CharacterClassEscape :: d evaluates as follows:
- Return the ten-element CharSet containing the characters 0 through 9 inclusive.
The production CharacterClassEscape :: D evaluates as follows:
- Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: d.
The production CharacterClassEscape :: s evaluates as follows:
- Return the CharSet containing all characters corresponding to a code point on the right-hand side of the WhiteSpace or LineTerminator productions.
The production CharacterClassEscape :: S evaluates as follows:
- Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: s.
The production CharacterClassEscape :: w evaluates as follows:
- Return WordCharacters.
The production CharacterClassEscape :: W evaluates as follows:
- Return the CharSet containing all characters not in the CharSet returned by CharacterClassEscape :: w.
The production CharacterClassEscape :: p{ UnicodePropertyValueExpression } evaluates as follows:
- Return the CharSet containing all Unicode code points included in the CharSet returned by UnicodePropertyValueExpression.
The production CharacterClassEscape :: P{ UnicodePropertyValueExpression } evaluates as follows:
- Return the CharSet containing all Unicode code points not included in the CharSet returned by UnicodePropertyValueExpression.
The production UnicodePropertyValueExpression :: UnicodePropertyName = UnicodePropertyValue evaluates as follows:
- Let ps be SourceText of UnicodePropertyName.
- Let p be ! UnicodeMatchProperty(ps).
- Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 56.
- Let vs be SourceText of UnicodePropertyValue.
- Let v be ! UnicodeMatchPropertyValue(p, vs).
- Return the CharSet containing all Unicode code points whose character database definition includes the property p with value v.
The production UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue evaluates as follows:
- Let s be SourceText of LoneUnicodePropertyNameOrValue.
- If ! UnicodeMatchPropertyValue(General_Category, s) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 58, then
  - Return the CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value s.
- Let p be ! UnicodeMatchProperty(s).
- Assert: p is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of Table 57.
- Return the CharSet containing all Unicode code points whose character database definition includes the property p with value “True”.
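As an illustration of the CharSets described above, the following sketch exercises each escape from script. It assumes an engine that supports Unicode property escapes; the variable names are chosen for this example only.

```javascript
// \d is exactly the ten characters 0 through 9, so other decimal digits do not match.
const asciiDigits = "a1b22".match(/\d/g);
const arabicIndicDigit = /\d/.test("\u0663"); // ARABIC-INDIC DIGIT THREE

// \s covers both WhiteSpace and LineTerminator code points.
const noBreakSpace = /\s/.test("\u00A0");
const lineSeparator = /\s/.test("\u2028");

// \W is the complement of \w; "_" is a word character.
const underscoreIsNonWord = /\W/.test("_");

// A lone name is first tried as a General_Category value, then as a binary property.
const upperCaseLetters = "aB\u00C9c".match(/\p{Lu}/gu);
const binaryProperty = /\p{White_Space}/u.test(" ");

// The name=value form, and its complement \P{...}.
const greek = /\p{Script=Greek}/u.test("\u03B1");
const notGreek = /\P{Script=Greek}/u.test("\u03B1");

console.log(asciiDigits, upperCaseLetters, greek, notGreek);
```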
22.2.2.13 CharacterClass
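The production CharacterClass :: [ ClassRanges ] evaluates as follows:
- Evaluate ClassRanges to obtain a CharSet A.
- Return the two results A and false.
The production CharacterClass :: [^ ClassRanges ] evaluates as follows:
- Evaluate ClassRanges to obtain a CharSet A.
- Return the two results A and true.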
22.2.2.14 ClassRanges
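The production ClassRanges :: [empty] evaluates as follows:
- Return the empty CharSet.
The production ClassRanges :: NonemptyClassRanges evaluates as follows:
- Return the CharSet that is the result of evaluating NonemptyClassRanges.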
22.2.2.15 NonemptyClassRanges
The production NonemptyClassRanges :: ClassAtom evaluates as follows:
- Return the CharSet that is the result of evaluating ClassAtom.
The production NonemptyClassRanges :: ClassAtom NonemptyClassRangesNoDash evaluates as follows:
- Evaluate ClassAtom to obtain a CharSet A.
- Evaluate NonemptyClassRangesNoDash to obtain a CharSet B.
- Return the union of CharSets A and B.
The production NonemptyClassRanges :: ClassAtom - ClassAtom ClassRanges evaluates as follows:
- Evaluate the first ClassAtom to obtain a CharSet A.
- Evaluate the second ClassAtom to obtain a CharSet B.
- Evaluate ClassRanges to obtain a CharSet C.
- Let D be ! CharacterRange(A, B).
- Return the union of D and C.
22.2.2.15.1 CharacterRange ( A, B )
The abstract operation CharacterRange takes arguments A (a CharSet) and B (a CharSet). It performs the following steps when called:
- Assert: A and B each contain exactly one character.
- Let a be the one character in CharSet A.
- Let b be the one character in CharSet B.
- Let i be the character value of character a.
- Let j be the character value of character b.
- Assert: i ≤ j.
- Return the CharSet containing all characters with a character value greater than or equal to i and less than or equal to j.
22.2.2.16 NonemptyClassRangesNoDash
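The production NonemptyClassRangesNoDash :: ClassAtom evaluates as follows:
- Return the CharSet that is the result of evaluating ClassAtom.
The production NonemptyClassRangesNoDash :: ClassAtomNoDash NonemptyClassRangesNoDash evaluates as follows:
- Evaluate ClassAtomNoDash to obtain a CharSet A.
- Evaluate NonemptyClassRangesNoDash to obtain a CharSet B.
- Return the union of CharSets A and B.
The production NonemptyClassRangesNoDash :: ClassAtomNoDash - ClassAtom ClassRanges evaluates as follows:
- Evaluate ClassAtomNoDash to obtain a CharSet A.
- Evaluate ClassAtom to obtain a CharSet B.
- Evaluate ClassRanges to obtain a CharSet C.
- Let D be ! CharacterRange(A, B).
- Return the union of D and C.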
Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all upper and lower-case letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.
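A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassRanges, the beginning or end limit of a range specification, or immediately follows a range specification.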
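The range behaviour described in these notes can be checked directly. The following is a small sketch with illustrative variable names; `reversedRangeError` simply records whatever error the engine throws for an out-of-order range.

```javascript
// The ends of a range are taken as written, even under the "i" flag.
const narrow = /[E-F]/i;
const wide = /[E-f]/i;

const narrowResults = ["e", "F", "g", "^"].map((ch) => narrow.test(ch));
const wideResults = ["z", "A", "^", "`", "5"].map((ch) => wide.test(ch));

// A dash is literal when first, last, or immediately after a range.
const dashFirst = /[-a]/.test("-");
const dashLast = /[a-]/.test("-");
const dashAfterRange = /[a-c-e]/;
const afterRangeResults = ["-", "e", "d"].map((ch) => dashAfterRange.test(ch));

// A range whose first character value exceeds its second is rejected when the pattern is parsed.
let reversedRangeError = null;
try {
  new RegExp("[f-E]");
} catch (err) {
  reversedRangeError = err;
}

console.log(narrowResults, wideResults, afterRangeResults);
```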
22.2.2.17 ClassAtom
The production ClassAtom :: - evaluates as follows:
- Return the CharSet containing the single character - U+002D (HYPHEN-MINUS).
The production ClassAtom :: ClassAtomNoDash evaluates as follows:
- Return the CharSet that is the result of evaluating ClassAtomNoDash.
22.2.2.18 ClassAtomNoDash
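The production ClassAtomNoDash :: SourceCharacter but not one of \ or ] or - evaluates as follows:
- Return the CharSet containing the character matched by SourceCharacter.
The production ClassAtomNoDash :: \ ClassEscape evaluates as follows:
- Return the CharSet that is the result of evaluating ClassEscape.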
22.2.2.19 ClassEscape
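The ClassEscape productions ClassEscape :: b, ClassEscape :: -, and ClassEscape :: CharacterEscape evaluate as follows:
- Let cv be the CharacterValue of this ClassEscape.
- Let c be the character whose character value is cv.
- Return the CharSet containing the single character c.
The production ClassEscape :: CharacterClassEscape evaluates as follows:
- Return the CharSet that is the result of evaluating CharacterClassEscape.
A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors. Using a backreference inside a ClassAtom causes an error.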
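A short sketch of these rules follows. The `u` flag is used for the error cases because engines that also implement the web-compatibility grammar may accept some of these forms without it; `throwsSyntaxError` is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: true when constructing the pattern throws a SyntaxError.
function throwsSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

// Inside a class, \b is the backspace character U+0008.
const backspaceInClass = /[\b]/.test("\u0008");
// Outside a class, \b is a word-boundary assertion.
const boundaryOutsideClass = /\bcat\b/.test("a cat sat");

// \B and backreferences are not valid class escapes.
const bigBInClass = throwsSyntaxError("[\\B]", "u");
const backrefInClass = throwsSyntaxError("(a)[\\1]", "u");

console.log(backspaceInClass, boundaryOutsideClass, bigBInClass, backrefInClass);
```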
22.2.3 The RegExp Constructor
The RegExp constructor:
- is %RegExp%.
- is the initial value of the "RegExp" property of the global object.
- creates and initializes a new RegExp object when called as a function rather than as a constructor. Thus the function call RegExp(…) is equivalent to the object creation expression new RegExp(…) with the same arguments.
- is designed to be subclassable. It may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.
22.2.3.1 RegExp ( pattern, flags )
The following steps are taken:
- Let patternIsRegExp be ? IsRegExp(pattern).
- If NewTarget is undefined, then
  - Let newTarget be the active function object.
  - If patternIsRegExp is true and flags is undefined, then
    - Let patternConstructor be ? Get(pattern, "constructor").
    - If SameValue(newTarget, patternConstructor) is true, return pattern.
- Else, let newTarget be NewTarget.
- If Type(pattern) is Object and pattern has a [[RegExpMatcher]] internal slot, then
  - Let P be pattern.[[OriginalSource]].
  - If flags is undefined, let F be pattern.[[OriginalFlags]].
  - Else, let F be flags.
- Else if patternIsRegExp is true, then
  - Let P be ? Get(pattern, "source").
  - If flags is undefined, then
    - Let F be ? Get(pattern, "flags").
  - Else, let F be flags.
- Else,
  - Let P be pattern.
  - Let F be flags.
- Let O be ? RegExpAlloc(newTarget).
- Return ? RegExpInitialize(O, P, F).
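If pattern is supplied using a StringLiteral, the usual escape sequence substitutions are performed before the String is processed by RegExp. If pattern must contain an escape sequence to be recognized by RegExp, any U+005C (REVERSE SOLIDUS) code points must be escaped within the StringLiteral to prevent them being removed when the contents of the StringLiteral are formed.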
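The branches of the constructor algorithm can be observed as follows. This is an illustrative sketch; the variable names are not taken from the specification.

```javascript
const original = /a+b/gi;

// Called as a function with a RegExp and no flags: the same object comes back.
const calledAsFunction = RegExp(original);

// Called as a constructor: a new object with the same source and flags.
const constructed = new RegExp(original);

// Explicit flags replace the original flags rather than adding to them.
const reflagged = new RegExp(original, "m");

// In a string literal each backslash meant for the pattern must itself be escaped.
const fromString = new RegExp("\\d+\\.\\d+");

console.log(calledAsFunction === original, constructed.flags, reflagged.flags, fromString.source);
```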
22.2.3.2 Abstract Operations for the RegExp Constructor
22.2.3.2.1 RegExpAlloc ( newTarget )
The abstract operation RegExpAlloc takes argument newTarget. It performs the following steps when called:
- Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExp.prototype%", « [[RegExpMatcher]], [[OriginalSource]], [[OriginalFlags]] »).
- Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }).
- Return obj.
22.2.3.2.2 RegExpInitialize ( obj, pattern, flags )
The abstract operation RegExpInitialize takes arguments obj, pattern, and flags. It performs the following steps when called:
- If pattern is undefined, let P be the empty String.
- Else, let P be ? ToString(pattern).
- If flags is undefined, let F be the empty String.
- Else, let F be ? ToString(flags).
- If F contains any code unit other than "g", "i", "m", "s", "u", or "y" or if it contains the same code unit more than once, throw a SyntaxError exception.
- If F contains "u", let u be true; else let u be false.
- If u is true, then
  - Let patternText be ! StringToCodePoints(P).
  - Let patternCharacters be a List whose elements are the code points of patternText.
- Else,
  - Let patternText be the result of interpreting each of P's 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
  - Let patternCharacters be a List whose elements are the code unit elements of P.
- Let parseResult be ParsePattern(patternText, u).
- If parseResult is a non-empty List of SyntaxError objects, throw a SyntaxError exception.
- Assert: parseResult is a Parse Node for Pattern.
- Set obj.[[OriginalSource]] to P.
- Set obj.[[OriginalFlags]] to F.
- Set obj.[[RegExpMatcher]] to the Abstract Closure that evaluates parseResult by applying the semantics provided in 22.2.2 using patternCharacters as the pattern's List of SourceCharacter values and F as the flag parameters.
- Perform ? Set(obj, "lastIndex", +0𝔽, true).
- Return obj.
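A rough sketch of how these steps surface in script is shown below. `describeConstruction` is a hypothetical helper for this example, and the flag "q" is assumed to be a flag the engine does not recognize.

```javascript
// Hypothetical helper: constructs a RegExp and summarizes it, or returns the error name.
function describeConstruction(pattern, flags) {
  try {
    const re = new RegExp(pattern, flags);
    return { source: re.source, flags: re.flags, lastIndex: re.lastIndex };
  } catch (err) {
    return err.name;
  }
}

// undefined pattern and flags become empty Strings; lastIndex starts at 0.
const defaults = describeConstruction(undefined, undefined);
// A repeated flag, an unrecognized flag, and an unparsable pattern are all rejected.
const duplicateFlag = describeConstruction("a", "gg");
const unknownFlag = describeConstruction("a", "q");
const badPattern = describeConstruction("(", "");

// With "u" the pattern text is a sequence of code points, so "+" applies to the
// whole astral character; without it, "+" applies only to the trailing surrogate.
const astral = "\uD83D\uDE00";
const withUnicodeFlag = new RegExp("^" + astral + "+$", "u").test(astral + astral);
const withoutUnicodeFlag = new RegExp("^" + astral + "+$").test(astral + astral);

console.log(defaults, duplicateFlag, unknownFlag, badPattern, withUnicodeFlag, withoutUnicodeFlag);
```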
22.2.3.2.3 Static Semantics: ParsePattern ( patternText, u )
The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points) and u (a Boolean). It performs the following steps when called:
22.2.3.2.4 RegExpCreate ( P, F )
The abstract operation RegExpCreate takes arguments P and F. It performs the following steps when called:
- Let obj be ? RegExpAlloc(%RegExp%).
- Return ? RegExpInitialize(obj, P, F).
22.2.3.2.5 EscapeRegExpPattern ( P, F )
The abstract operation EscapeRegExpPattern takes arguments P and F. It performs the following steps when called:
- Let S be a String in the form of a Pattern[~U] (Pattern[+U] if F contains "u") equivalent to P interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. S may or may not be identical to P; however, the Abstract Closure that would result from evaluating S as a Pattern[~U] (Pattern[+U] if F contains "u") must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for P and F must produce identical results.
- The code points / or any LineTerminator occurring in the pattern shall be escaped in S as necessary to ensure that the string-concatenation of "/", S, "/", and F can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if P is "/", then S could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by F would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If P is the empty String, this specification can be met by letting S be "(?:)".
- Return S.
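Because the exact escaping is left to the implementation, the sketch below checks only the properties these steps guarantee rather than a specific spelling of S. The variable names are illustrative.

```javascript
const slash = new RegExp("/");
const lineFeed = new RegExp("\n");
const empty = new RegExp("");

// The reported source is never a bare "/" and contains no raw line terminator.
const slashIsEscaped = slash.source !== "/";
const lineFeedIsEscaped = !lineFeed.source.includes("\n");

// Re-parsing the reported source yields a pattern with the same behaviour.
const slashRoundTrips = new RegExp(slash.source).test("/");
const lineFeedRoundTrips = new RegExp(lineFeed.source).test("\n");

// For the empty pattern, "(?:)" is the form suggested above.
const emptySource = empty.source;

console.log(slash.source, JSON.stringify(lineFeed.source), emptySource);
```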
22.2.4 Properties of the RegExp Constructor
The RegExp constructor:
- has a [[Prototype]] internal slot whose value is %Function.prototype%.
- has the following properties:
22.2.4.1 RegExp.prototype
The initial value of RegExp.prototype is the RegExp prototype object.
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
22.2.4.2 get RegExp [ @@species ]
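RegExp[@@species] is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
- Return the this value.
The value of the "name" property of this function is "get [Symbol.species]".
RegExp prototype methods normally use their this value's constructor to create a derived object. However, a subclass constructor may over-ride that default behaviour by redefining its @@species property.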
22.2.5 Properties of the RegExp Prototype Object
The RegExp prototype object:
- is %RegExp.prototype%.
- is an ordinary object.
- is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.
- has a [[Prototype]] internal slot whose value is %Object.prototype%.
The RegExp prototype object does not have a "valueOf" property of its own; however, it inherits the "valueOf" property from the Object prototype object.
22.2.5.1 RegExp.prototype.constructor
The initial value of RegExp.prototype.constructor is %RegExp%.
22.2.5.2 RegExp.prototype.exec ( string )
Performs a regular expression match of string against the regular expression and returns an Array object containing the results of the match, or null if string did not match.
The String ToString(string) is searched for an occurrence of the regular expression pattern as follows:
- Let R be the this value.
- Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
- Let S be ? ToString(string).
- Return ? RegExpBuiltinExec(R, S).
22.2.5.2.1 RegExpExec ( R, S )
The abstract operation RegExpExec takes arguments R and S. It performs the following steps when called:
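- Assert: Type(R) is Object.
- Assert: Type(S) is String.
- Let exec be ? Get(R, "exec").
- If IsCallable(exec) is true, then
  - Let result be ? Call(exec, R, « S »).
  - If Type(result) is neither Object nor Null, throw a TypeError exception.
  - Return result.
- Perform ? RequireInternalSlot(R, [[RegExpMatcher]]).
- Return ? RegExpBuiltinExec(R, S).
If a callable "exec" property is not found this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of "exec".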
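The dynamic lookup of "exec" can be observed through any built-in that goes through RegExpExec, such as RegExp.prototype.test. The following sketch uses illustrative names only.

```javascript
const re = /a/;

// No own "exec": the inherited built-in is found and used.
const builtinResult = re.test("a");

// An own callable "exec" is found by the property lookup and takes precedence.
re.exec = () => null;
const overriddenResult = re.test("a");

// A result that is neither an Object nor null is rejected with a TypeError.
re.exec = () => 42;
let invalidResultError = null;
try {
  re.test("a");
} catch (err) {
  invalidResultError = err;
}

console.log(builtinResult, overriddenResult, invalidResultError instanceof TypeError);
```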
22.2.5.2.2 RegExpBuiltinExec ( R, S )
The abstract operation RegExpBuiltinExec takes arguments R and S. It performs the following steps when called:
- Assert: R is an initialized RegExp instance.
- Assert: Type(S) is String.
- Let length be the number of code units in S.
- Let lastIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
- Let flags be R.[[OriginalFlags]].
- If flags contains "g", let global be true; else let global be false.
- If flags contains "y", let sticky be true; else let sticky be false.
- If global is false and sticky is false, set lastIndex to 0.
- Let matcher be R.[[RegExpMatcher]].
- If flags contains "u", let fullUnicode be true; else let fullUnicode be false.
- Let matchSucceeded be false.
- Repeat, while matchSucceeded is false,
  - If lastIndex > length, then
    - If global is true or sticky is true, then
      - Perform ? Set(R, "lastIndex", +0𝔽, true).
    - Return null.
  - Let r be matcher(S, lastIndex).
  - If r is failure, then
    - If sticky is true, then
      - Perform ? Set(R, "lastIndex", +0𝔽, true).
      - Return null.
    - Set lastIndex to AdvanceStringIndex(S, lastIndex, fullUnicode).
  - Else,
    - Assert: r is a State.
    - Set matchSucceeded to true.
- Let e be r's endIndex value.
- If fullUnicode is true, then
  - e is an index into the Input character list, derived from S, matched by matcher. Let eUTF be the smallest index into S that corresponds to the character at element e of Input. If e is greater than or equal to the number of elements in Input, then eUTF is the number of code units in S.
  - Set e to eUTF.
- If global is true or sticky is true, then
  - Perform ? Set(R, "lastIndex", 𝔽(e), true).
- Let n be the number of elements in r's captures List. (This is the same value as 22.2.2.1's NcapturingParens.)
- Assert: n < 2^32 - 1.
- Let A be ! ArrayCreate(n + 1).
- Assert: The mathematical value of A's "length" property is n + 1.
- Perform ! CreateDataPropertyOrThrow(A, "index", 𝔽(lastIndex)).
- Perform ! CreateDataPropertyOrThrow(A, "input", S).
- Let matchedSubstr be the substring of S from lastIndex to e.
- Perform ! CreateDataPropertyOrThrow(A, "0", matchedSubstr).
- If R contains any GroupName, then
  - Let groups be ! OrdinaryObjectCreate(null).
- Else,
  - Let groups be undefined.
- Perform ! CreateDataPropertyOrThrow(A, "groups", groups).
- For each integer i such that i ≥ 1 and i ≤ n, do
  - Let captureI be ith element of r's captures List.
  - If captureI is undefined, let capturedValue be undefined.
  - Else if fullUnicode is true, then
    - Assert: captureI is a List of code points.
    - Let capturedValue be ! CodePointsToString(captureI).
  - Else,
    - Assert: fullUnicode is false.
    - Assert: captureI is a List of code units.
    - Let capturedValue be the String value consisting of the code units of captureI.
  - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(i)), capturedValue).
  - If the ith capture of R was defined with a GroupName, then
    - Let s be the CapturingGroupName of the corresponding RegExpIdentifierName.
    - Perform ! CreateDataPropertyOrThrow(groups, s, capturedValue).
- Return A.
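The handling of "lastIndex" under the different flags, and the shape of the result array, can be sketched as follows. Variable names are chosen for this example; named capture group support is assumed.

```javascript
const text = "ab cd";

// Global: each call resumes at lastIndex and stores the end of the match back.
const globalRe = /(?<word>\w+)/g;
const first = globalRe.exec(text);
const afterFirst = globalRe.lastIndex;
const second = globalRe.exec(text);
const third = globalRe.exec(text); // no further match
const afterThird = globalRe.lastIndex; // reset to 0 on failure

// Sticky: the match must begin exactly at lastIndex; no advancing is attempted.
const stickyRe = /cd/y;
const stickyMiss = stickyRe.exec(text);
stickyRe.lastIndex = 3;
const stickyHit = stickyRe.exec(text);

// Neither flag: lastIndex is ignored for the search and is not written.
const plainRe = /b/;
plainRe.lastIndex = 5;
const plainHit = plainRe.exec(text);

console.log(first.index, afterFirst, second.index, third, afterThird, stickyMiss, stickyHit.index, plainHit.index);
```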
22.2.5.2.3 AdvanceStringIndex ( S, index, unicode )
The abstract operation AdvanceStringIndex takes arguments S (a String), index (a non-negative integer), and unicode (a Boolean). It performs the following steps when called:
- Assert: index ≤ 2^53 - 1.
- If unicode is false, return index + 1.
- Let length be the number of code units in S.
- If index + 1 ≥ length, return index + 1.
- Let cp be ! CodePointAt(S, index).
- Return index + cp.[[CodeUnitCount]].
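The steps above can be approximated in script. This is a minimal sketch, not the normative operation: `advanceStringIndex` is a hypothetical function, and it uses String.prototype.codePointAt in place of the CodePointAt abstract operation, treating any result above 0xFFFF as occupying two code units.

```javascript
// Hypothetical script-level counterpart of AdvanceStringIndex.
function advanceStringIndex(s, index, unicode) {
  if (!unicode) return index + 1;
  if (index + 1 >= s.length) return index + 1;
  // A value above 0xFFFF means a surrogate pair starts at `index`.
  const codePoint = s.codePointAt(index);
  return index + (codePoint > 0xffff ? 2 : 1);
}

const sample = "a\uD83D\uDE00b"; // "a", one astral character, "b"

const byCodeUnit = advanceStringIndex(sample, 1, false);
const byCodePoint = advanceStringIndex(sample, 1, true);
const loneSurrogate = advanceStringIndex("\uD83Dx", 0, true);
const atEnd = advanceStringIndex(sample, 3, true);

console.log(byCodeUnit, byCodePoint, loneSurrogate, atEnd);
```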
22.2.5.3 get RegExp.prototype.dotAll
RegExp.prototype.dotAll is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit 0x0073 (LATIN SMALL LETTER S), return true.
- Return false.
22.2.5.4 get RegExp.prototype.flags
RegExp.prototype.flags is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- Let result be the empty String.
- Let global be ! ToBoolean(? Get(R, "global")).
- If global is true, append the code unit 0x0067 (LATIN SMALL LETTER G) as the last code unit of result.
- Let ignoreCase be ! ToBoolean(? Get(R, "ignoreCase")).
- If ignoreCase is true, append the code unit 0x0069 (LATIN SMALL LETTER I) as the last code unit of result.
- Let multiline be ! ToBoolean(? Get(R, "multiline")).
- If multiline is true, append the code unit 0x006D (LATIN SMALL LETTER M) as the last code unit of result.
- Let dotAll be ! ToBoolean(? Get(R, "dotAll")).
- If dotAll is true, append the code unit 0x0073 (LATIN SMALL LETTER S) as the last code unit of result.
- Let unicode be ! ToBoolean(? Get(R, "unicode")).
- If unicode is true, append the code unit 0x0075 (LATIN SMALL LETTER U) as the last code unit of result.
- Let sticky be ! ToBoolean(? Get(R, "sticky")).
- If sticky is true, append the code unit 0x0079 (LATIN SMALL LETTER Y) as the last code unit of result.
- Return result.
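Two consequences of these steps are that the result has a fixed order regardless of how the flags were written, and that the getter works on any object because it only reads properties. The sketch below uses illustrative names; engines implementing later editions may consult additional flag properties, which are absent from the plain object here and so do not affect its result.

```javascript
// Flags written in an arbitrary order are reported in the order the steps append them.
const literalFlags = /a/yusmig.flags;

// The getter is generic: it coerces each property with ToBoolean.
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;
const fromPlainObject = flagsGetter.call({ global: true, sticky: 1, unicode: "" });

// A non-Object this value is rejected.
let nonObjectError = null;
try {
  flagsGetter.call("gi");
} catch (err) {
  nonObjectError = err;
}

// The individual flag accessors return undefined on the prototype object itself.
const prototypeGlobal = RegExp.prototype.global;

console.log(literalFlags, fromPlainObject, prototypeGlobal);
```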
22.2.5.5 get RegExp.prototype.global
RegExp.prototype.global is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit 0x0067 (LATIN SMALL LETTER G), return true.
- Return false.
22.2.5.6 get RegExp.prototype.ignoreCase
RegExp.prototype.ignoreCase is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit 0x0069 (LATIN SMALL LETTER I), return true.
- Return false.
22.2.5.7 RegExp.prototype [ @@match ] ( string )
When the @@match method is called with argument string, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let global be ! ToBoolean(? Get(rx, "global")).
- If global is false, then
  - Return ? RegExpExec(rx, S).
- Else,
  - Assert: global is true.
  - Let fullUnicode be ! ToBoolean(? Get(rx, "unicode")).
  - Perform ? Set(rx, "lastIndex", +0𝔽, true).
  - Let A be ! ArrayCreate(0).
  - Let n be 0.
  - Repeat,
    - Let result be ? RegExpExec(rx, S).
    - If result is null, then
      - If n = 0, return null.
      - Return A.
    - Else,
      - Let matchStr be ? ToString(? Get(result, "0")).
      - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(n)), matchStr).
      - If matchStr is the empty String, then
        - Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
        - Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
        - Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
      - Set n to n + 1.
The value of the "name" property of this function is "[Symbol.match]".
The @@match property is used by the IsRegExp abstract operation to identify objects that have the basic behaviour of regular expressions. The absence of a @@match property or the existence of such a property whose value does not Boolean coerce to true indicates that the object is not intended to be used as a regular expression object.
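The two branches, and the advance past empty matches, can be seen in a short sketch. The variable names are illustrative.

```javascript
// Non-global: the result is whatever RegExpExec returns, including captures and index.
const nonGlobal = /a(b)?/[Symbol.match]("xab");

// Global: only the matched substrings are collected. After each empty match
// lastIndex is advanced so that the loop makes progress.
const withEmptyMatches = /a*/g[Symbol.match]("baac");

// Global with no match at all returns null rather than an empty array.
const noMatches = /z/g[Symbol.match]("baac");

// A global match always starts from index 0, whatever lastIndex held before.
const reused = /a/g;
reused.lastIndex = 3;
const reusedMatches = reused[Symbol.match]("aaaa");

console.log(nonGlobal.index, withEmptyMatches, noMatches, reusedMatches.length);
```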
22.2.5.8 RegExp.prototype [ @@matchAll ] ( string )
When the @@matchAll method is called with argument string, the following steps are taken:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let C be ? SpeciesConstructor(R, %RegExp%).
- Let flags be ? ToString(? Get(R, "flags")).
- Let matcher be ? Construct(C, « R, flags »).
- Let lastIndex be ? ToLength(? Get(R, "lastIndex")).
- Perform ? Set(matcher, "lastIndex", lastIndex, true).
- If flags contains "g", let global be true.
- Else, let global be false.
- If flags contains "u", let fullUnicode be true.
- Else, let fullUnicode be false.
- Return ! CreateRegExpStringIterator(matcher, S, global, fullUnicode).
The value of the "name" property of this function is "[Symbol.matchAll]".
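Because the iteration runs on a freshly constructed matcher, the original object is only read, never advanced. The sketch below illustrates this under the assumption that the method is invoked directly on the regular expression; variable names are chosen for the example.

```javascript
const source = /(\d)(\w)/g;
source.lastIndex = 2; // copied onto the matcher, so iteration starts here

const found = [...source[Symbol.matchAll]("1a2b3c")].map((m) => m[0]);
const sourceLastIndexAfter = source.lastIndex; // the original is left untouched

// Without the "g" flag the iterator produces at most one match.
const nonGlobalSource = /\d/;
const single = [...nonGlobalSource[Symbol.matchAll]("1a2b")].map((m) => m[0]);

console.log(found, sourceLastIndexAfter, single);
```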
22.2.5.9 get RegExp.prototype.multiline
RegExp.prototype.multiline is an accessor property whose set accessor function is undefined. Its get accessor function performs the following steps:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit 0x006D (LATIN SMALL LETTER M), return true.
- Return false.
22.2.5.10 RegExp.prototype [ @@replace ] ( string, replaceValue )
When the @@replace method is called with arguments string and replaceValue, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let lengthS be the number of code unit elements in S.
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Set replaceValue to ? ToString(replaceValue).
- Let global be ! ToBoolean(? Get(rx, "global")).
- If global is true, then
  - Let fullUnicode be ! ToBoolean(? Get(rx, "unicode")).
  - Perform ? Set(rx, "lastIndex", +0𝔽, true).
- Let results be a new empty List.
- Let done be false.
- Repeat, while done is false,
  - Let result be ? RegExpExec(rx, S).
  - If result is null, set done to true.
  - Else,
    - Append result to the end of results.
    - If global is false, set done to true.
    - Else,
      - Let matchStr be ? ToString(? Get(result, "0")).
      - If matchStr is the empty String, then
        - Let thisIndex be ℝ(? ToLength(? Get(rx, "lastIndex"))).
        - Let nextIndex be AdvanceStringIndex(S, thisIndex, fullUnicode).
        - Perform ? Set(rx, "lastIndex", 𝔽(nextIndex), true).
- Let accumulatedResult be the empty String.
- Let nextSourcePosition be 0.
- For each element result of results, do
  - Let resultLength be ? LengthOfArrayLike(result).
  - Let nCaptures be max(resultLength - 1, 0).
  - Let matched be ? ToString(? Get(result, "0")).
  - Let matchLength be the number of code units in matched.
  - Let position be ? ToIntegerOrInfinity(? Get(result, "index")).
  - Set position to the result of clamping position between 0 and lengthS.
  - Let n be 1.
  - Let captures be a new empty List.
  - Repeat, while n ≤ nCaptures,
    - Let capN be ? Get(result, ! ToString(𝔽(n))).
    - If capN is not undefined, then
      - Set capN to ? ToString(capN).
    - Append capN as the last element of captures.
    - Set n to n + 1.
  - Let namedCaptures be ? Get(result, "groups").
  - If functionalReplace is true, then
    - Let replacerArgs be « matched ».
    - Append in List order the elements of captures to the end of the List replacerArgs.
    - Append 𝔽(position) and S to replacerArgs.
    - If namedCaptures is not undefined, then
      - Append namedCaptures as the last element of replacerArgs.
    - Let replValue be ? Call(replaceValue, undefined, replacerArgs).
    - Let replacement be ? ToString(replValue).
  - Else,
    - If namedCaptures is not undefined, then
      - Set namedCaptures to ? ToObject(namedCaptures).
    - Let replacement be ? GetSubstitution(matched, S, position, captures, namedCaptures, replaceValue).
  - If position ≥ nextSourcePosition, then
    - NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of rx. In such cases, the corresponding substitution is ignored.
    - Set accumulatedResult to the string-concatenation of accumulatedResult, the substring of S from nextSourcePosition to position, and replacement.
    - Set nextSourcePosition to position + matchLength.
- If nextSourcePosition ≥ lengthS, return accumulatedResult.
- Return the string-concatenation of accumulatedResult and the substring of S from nextSourcePosition.
The value of the "name" property of this function is "[Symbol.replace]".
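Both replacement paths can be illustrated briefly. The sketch below assumes named capture group support; the variable names and the recorded `replacerCalls` array exist only for this example.

```javascript
// String replacement: named captures are available through $<name>.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/g;
const swapped = dateRe[Symbol.replace]("2021-06 and 2020-12", "$<month>/$<year>");

// Functional replacement: the callback receives the match, each capture
// (undefined for groups that did not participate), the position, and the input.
// A trailing groups object is appended only when the pattern has named groups.
const replacerCalls = [];
const hashed = /(\d)(x)?/g[Symbol.replace]("1x2", (...args) => {
  replacerCalls.push(args);
  return "#";
});

console.log(swapped, hashed, replacerCalls);
```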
22.2.5.11 RegExp.prototype [ @@search ] ( string )
When the @@search method is called with argument string, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let previousLastIndex be ? Get(rx, "lastIndex").
- If SameValue(previousLastIndex, +0𝔽) is false, then
  - Perform ? Set(rx, "lastIndex", +0𝔽, true).
- Let result be ? RegExpExec(rx, S).
- Let currentLastIndex be ? Get(rx, "lastIndex").
- If SameValue(currentLastIndex, previousLastIndex) is false, then
  - Perform ? Set(rx, "lastIndex", previousLastIndex, true).
- If result is null, return -1𝔽.
- Return ? Get(result, "index").
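The save-and-restore handling of "lastIndex" in these steps can be checked directly. This is a small sketch under the assumption of a conforming engine; the variable names are illustrative.

```javascript
// A global regular expression whose lastIndex has been moved forward.
const rx = /b/g;
rx.lastIndex = 5;

// @@search resets lastIndex to 0 for the search, then restores the old value.
const found = rx[Symbol.search]("abcabc");
const restoredLastIndex = rx.lastIndex;

// No match yields -1.
const missing = /z/[Symbol.search]("abc");

console.log(found, restoredLastIndex, missing);
```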
The value of the
The
22.2.5.12 get RegExp.prototype.source
RegExp.prototype.source is an
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalSource]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return "(?:)".
  - Otherwise, throw a TypeError exception.
- Assert: R has an [[OriginalFlags]] internal slot.
- Let src be R.[[OriginalSource]].
- Let flags be R.[[OriginalFlags]].
- Return EscapeRegExpPattern(src, flags).
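The three outcomes of the getter (the prototype special case, the TypeError for other objects, and the escaped source for real RegExp instances) can be illustrated as follows. This is a sketch only; the exact escaping produced by EscapeRegExpPattern is partly up to the implementation, and the `\/` result shown is what common engines are assumed to return.

```javascript
// Pull the accessor's get function off the prototype so it can be
// applied to arbitrary this values.
const sourceGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "source").get;

// %RegExp.prototype% has no [[OriginalSource]] slot but is special-cased.
const onPrototype = sourceGetter.call(RegExp.prototype);

// Any other object without the slot causes a TypeError.
let threwTypeError = false;
try {
  sourceGetter.call({});
} catch (error) {
  threwTypeError = error instanceof TypeError;
}

// A "/" in the pattern is escaped so the result can be placed between slashes.
const escaped = new RegExp("/").source;

console.log(onPrototype, threwTypeError, escaped);
```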
22.2.5.13 RegExp.prototype [ @@split ] ( string, limit )
Returns an Array object into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the
For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a", "b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].
If string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
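The empty-match and empty-string cases described above can be reproduced with a few calls. This is a brief sketch, assuming a conforming engine; the variable names are illustrative.

```javascript
// Only the first match at a given index is considered.
const lazy = /a*?/[Symbol.split]("ab");
const greedy = /a*/[Symbol.split]("ab");

// Empty input: the result depends on whether the pattern can match "".
const emptyMatchable = /x*/[Symbol.split]("");
const emptyUnmatchable = /x/[Symbol.split]("");

console.log(lazy, greedy, emptyMatchable, emptyUnmatchable);
```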
If the regular expression contains capturing parentheses, then each time separator is matched the results (including any
/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")
evaluates to the array
["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]
If limit is not
When the @@split method is called, the following steps are taken:
- Let rx be the this value.
- If Type(rx) is not Object, throw a TypeError exception.
- Let S be ? ToString(string).
- Let C be ? SpeciesConstructor(rx, %RegExp%).
- Let flags be ? ToString(? Get(rx, "flags")).
- If flags contains "u", let unicodeMatching be true.
- Else, let unicodeMatching be false.
- If flags contains "y", let newFlags be flags.
- Else, let newFlags be the string-concatenation of flags and "y".
- Let splitter be ? Construct(C, « rx, newFlags »).
- Let A be ! ArrayCreate(0).
- Let lengthA be 0.
- If limit is undefined, let lim be 2³² - 1; else let lim be ℝ(? ToUint32(limit)).
- If lim is 0, return A.
- Let size be the length of S.
- If size is 0, then
  - Let z be ? RegExpExec(splitter, S).
  - If z is not null, return A.
  - Perform ! CreateDataPropertyOrThrow(A, "0", S).
  - Return A.
- Let p be 0.
- Let q be p.
- Repeat, while q < size,
  - Perform ? Set(splitter, "lastIndex", 𝔽(q), true).
  - Let z be ? RegExpExec(splitter, S).
  - If z is null, set q to AdvanceStringIndex(S, q, unicodeMatching).
  - Else,
    - Let e be ℝ(? ToLength(? Get(splitter, "lastIndex"))).
    - Set e to min(e, size).
    - If e = p, set q to AdvanceStringIndex(S, q, unicodeMatching).
    - Else,
      - Let T be the substring of S from p to q.
      - Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
      - Set lengthA to lengthA + 1.
      - If lengthA = lim, return A.
      - Set p to e.
      - Let numberOfCaptures be ? LengthOfArrayLike(z).
      - Set numberOfCaptures to max(numberOfCaptures - 1, 0).
      - Let i be 1.
      - Repeat, while i ≤ numberOfCaptures,
      - Set q to p.
- Let T be the substring of S from p to size.
- Perform ! CreateDataPropertyOrThrow(A, ! ToString(𝔽(lengthA)), T).
- Return A.
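A few observable consequences of these steps are sketched below: the limit cuts the result short, captured substrings count toward the limit, and the original object's "lastIndex" is untouched because matching is done on the separately constructed splitter. The inputs and variable names are illustrative, and a conforming engine is assumed.

```javascript
// The receiver's lastIndex is not used or modified; a new splitter does the work.
const rx = /,/g;
rx.lastIndex = 3;
const limited = rx[Symbol.split]("a,b,c,d", 2);
const lastIndexAfter = rx.lastIndex;

// Captures are spliced into the output and count toward the limit.
const withCaptures = /(-)/[Symbol.split]("a-b-c", 3);

// A pattern that matches the empty String splits into individual code units.
const perCodeUnit = /(?:)/[Symbol.split]("abc");

console.log(limited, lastIndexAfter, withCaptures, perCodeUnit);
```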
The value of the
The @@split method ignores the value of the
22.2.5.14 get RegExp.prototype.sticky
RegExp.prototype.sticky is an
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit 0x0079 (LATIN SMALL LETTER Y), return true.
- Return false.
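The four possible outcomes of this getter can be exercised by applying its get function to different this values. This is a minimal sketch with illustrative names, assuming a conforming engine.

```javascript
// Extract the get function so it can be called on arbitrary this values.
const stickyGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "sticky").get;

const withFlag = stickyGetter.call(/a/y);              // flags contain "y"
const withoutFlag = stickyGetter.call(/a/g);           // flags do not contain "y"
const onProto = stickyGetter.call(RegExp.prototype);   // special-cased prototype

// Any other object lacking [[OriginalFlags]] is rejected.
let stickyThrew = false;
try {
  stickyGetter.call({});
} catch (error) {
  stickyThrew = error instanceof TypeError;
}

console.log(withFlag, withoutFlag, onProto, stickyThrew);
```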
22.2.5.15 RegExp.prototype.test ( S )
The following steps are taken:
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- Let string be ? ToString(S).
- Let match be ? RegExpExec(R, string).
- If match is not null, return true; else return false.
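Because test goes through RegExpExec, it shares the "lastIndex" behaviour of exec for global patterns, and its argument is converted with ToString first. The sketch below illustrates both points; the names are illustrative.

```javascript
// With the global flag, a successful test advances lastIndex,
// so a second call on the same one-character string fails and resets it.
const globalRx = /a/g;
const firstCall = globalRx.test("a");
const secondCall = globalRx.test("a");
const lastIndexAfterFailure = globalRx.lastIndex;

// Non-string arguments are converted with ToString before matching.
const coerced = /^12/.test(123);

console.log(firstCall, secondCall, lastIndexAfterFailure, coerced);
```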
22.2.5.16 RegExp.prototype.toString ( )
The returned String has the form of a
22.2.5.17 get RegExp.prototype.unicode
RegExp.prototype.unicode is an
- Let R be the this value.
- If Type(R) is not Object, throw a TypeError exception.
- If R does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(R, %RegExp.prototype%) is true, return undefined.
  - Otherwise, throw a TypeError exception.
- Let flags be R.[[OriginalFlags]].
- If flags contains the code unit 0x0075 (LATIN SMALL LETTER U), return true.
- Return false.
22.2.6 Properties of RegExp Instances
RegExp instances are ordinary objects that inherit properties from the
Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties RegExp.prototype.
RegExp instances also have the following property:
22.2.6.1 lastIndex
The value of the
22.2.7 RegExp String Iterator Objects
A RegExp String Iterator is an object, that represents a specific iteration over some specific String instance object, matching against some specific RegExp instance object. There is not a named
22.2.7.1 CreateRegExpStringIterator ( R, S, global, fullUnicode )
The abstract operation CreateRegExpStringIterator takes arguments R, S, global, and fullUnicode. It performs the following steps when called:
- Assert: Type(S) is String.
- Assert: Type(global) is Boolean.
- Assert: Type(fullUnicode) is Boolean.
- Let closure be a new Abstract Closure with no parameters that captures R, S, global, and fullUnicode and performs the following steps when called:
  - Repeat,
    - Let match be ? RegExpExec(R, S).
    - If match is null, return undefined.
    - If global is false, then
      - Perform ? Yield(match).
      - Return undefined.
    - Let matchStr be ? ToString(? Get(match, "0")).
    - If matchStr is the empty String, then
      - Let thisIndex be ℝ(? ToLength(? Get(R, "lastIndex"))).
      - Let nextIndex be ! AdvanceStringIndex(S, thisIndex, fullUnicode).
      - Perform ? Set(R, "lastIndex", 𝔽(nextIndex), true).
    - Perform ? Yield(match).
- Return ! CreateIteratorFromClosure(closure, "%RegExpStringIteratorPrototype%", %RegExpStringIteratorPrototype%).
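The closure above maps naturally onto a generator function. The following is an approximation in script, not the specified mechanism: `regExpStringIterator` and `advanceStringIndex` are hypothetical helper names, `R.exec` stands in for RegExpExec, and the ToLength conversion of "lastIndex" is omitted on the assumption that R is an ordinary RegExp.

```javascript
// Approximation of AdvanceStringIndex: step over a whole code point
// when matching in Unicode mode, otherwise over one code unit.
function advanceStringIndex(S, index, unicode) {
  if (!unicode || index + 1 >= S.length) return index + 1;
  const codePoint = S.codePointAt(index);
  return index + (codePoint > 0xffff ? 2 : 1);
}

// Generator mirroring the Abstract Closure's Repeat loop.
function* regExpStringIterator(R, S, global, fullUnicode) {
  while (true) {
    const match = R.exec(S);
    if (match === null) return;
    if (!global) {
      yield match;
      return;
    }
    const matchStr = String(match[0]);
    if (matchStr === "") {
      // An empty match would otherwise be found again at the same index.
      R.lastIndex = advanceStringIndex(S, R.lastIndex, fullUnicode);
    }
    yield match;
  }
}

// Compare against the built-in String.prototype.matchAll.
const summarize = (matches) => [...matches].map((m) => [m[0], m.index]);
const fromSketch = summarize(regExpStringIterator(/a*/g, "baa", true, false));
const fromBuiltin = summarize("baa".matchAll(/a*/g));
console.log(fromSketch, fromBuiltin);
```

The empty-match branch is what keeps the loop from yielding the same zero-length match forever; without it, "lastIndex" would never move past that position.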
22.2.7.2 The %RegExpStringIteratorPrototype% Object
The %RegExpStringIteratorPrototype% object:
- has properties that are inherited by all RegExp String Iterator Objects.
- is an ordinary object.
- has a [[Prototype]] internal slot whose value is %IteratorPrototype%.
- has the following properties:
22.2.7.2.1 %RegExpStringIteratorPrototype% .next ( )
- Return ? GeneratorResume(this value, empty, "%RegExpStringIteratorPrototype%").
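From script, the prototype object and its next method are reachable through an iterator returned by String.prototype.matchAll. The sketch below assumes a conforming engine; reaching %IteratorPrototype% via an array iterator is one common way to obtain it, and the variable names are illustrative.

```javascript
// matchAll returns a RegExp String Iterator.
const iterator = "a1b2".matchAll(/\d/g);
const regExpStringIteratorPrototype = Object.getPrototypeOf(iterator);

// %IteratorPrototype% is the prototype of every built-in iterator prototype.
const arrayIteratorPrototype = Object.getPrototypeOf([][Symbol.iterator]());
const iteratorPrototype = Object.getPrototypeOf(arrayIteratorPrototype);

// Each call to next resumes the underlying closure until it returns.
const step1 = iterator.next();
const step2 = iterator.next();
const step3 = iterator.next();

console.log(step1.value[0], step2.value[0], step3.done);
```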
22.2.7.2.2 %RegExpStringIteratorPrototype% [ @@toStringTag ]
The initial value of the @@toStringTag property is the String value
This property has the attributes { [[Writable]]: