22 Text Processing
22.1 String Objects
22.1.1 The String Constructor
The String constructor:
- is %String%.
- is the initial value of the "String" property of the global object.
- creates and initializes a new String object when called as a constructor.
- performs a type conversion when called as a function rather than as a constructor.
- may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified String behaviour must include a super call to the String constructor to create and initialize the subclass instance with a [[StringData]] internal slot.
22.1.1.1 String ( value )
This function performs the following steps when called:
- If value is not present, then
  - Let string be the empty String.
- Else,
  - If NewTarget is undefined and value is a Symbol, return SymbolDescriptiveString(value).
  - Let string be ? ToString(value).
- If NewTarget is undefined, return string.
- Return StringCreate(string, ? GetPrototypeFromConstructor(NewTarget, "%String.prototype%")).
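The difference between calling String as a function and as a constructor can be sketched as follows. This is an illustrative example only; the variable names and the Label subclass are made up for the sketch and are not part of the specification.

```javascript
// Called as a function: performs a type conversion and returns a primitive String.
const primitive = String(42);
const empty = String(); // value is not present, so the result is the empty String

// Called as a function with a Symbol: NewTarget is undefined, so the
// descriptive string is returned instead of throwing.
const described = String(Symbol("id"));

// Called as a constructor: returns a String object wrapping the converted value.
const wrapped = new String(42);

// Called as a constructor with a Symbol: ToString(value) throws a TypeError.
let constructorThrew = false;
try {
  new String(Symbol("id"));
} catch (error) {
  constructorThrew = error instanceof TypeError;
}

// A subclass inherits String behaviour through the super call.
class Label extends String {
  constructor(text) {
    super(text); // creates the instance with a [[StringData]] internal slot
  }
}
const label = new Label("hello");
```

Here primitive is a String value while wrapped is an object, which is why typeof reports "string" for the first and "object" for the second.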
22.1.2 Properties of the String Constructor
The String constructor:
- has a [[Prototype]] internal slot whose value is %Function.prototype%.
- has the following properties:
22.1.2.1 String.fromCharCode ( ...codeUnits )
This function may be called with any number of arguments which form the rest parameter codeUnits.
It performs the following steps when called:
- Let result be the empty String.
- For each element next of codeUnits, do
  - Let nextCU be the code unit whose numeric value is ℝ(? ToUint16(next)).
  - Set result to the string-concatenation of result and nextCU.
- Return result.
The "length" property of this function is 1𝔽.
22.1.2.2 String.fromCodePoint ( ...codePoints )
This function may be called with any number of arguments which form the rest parameter codePoints.
It performs the following steps when called:
- Let result be the empty String.
- For each element next of codePoints, do
  - Let nextCP be ? ToNumber(next).
  - If nextCP is not an integral Number, throw a RangeError exception.
  - If ℝ(nextCP) < 0 or ℝ(nextCP) > 0x10FFFF, throw a RangeError exception.
  - Set result to the string-concatenation of result and UTF16EncodeCodePoint(ℝ(nextCP)).
- Assert: If codePoints is empty, then result is the empty String.
- Return result.
The "length" property of this function is 1𝔽.
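The two factory functions differ in how they treat their arguments: String.fromCharCode wraps each value with ToUint16, while String.fromCodePoint validates it and encodes it as UTF-16. A minimal sketch, with a hypothetical helper throwsRangeError introduced only for this example:

```javascript
// fromCharCode applies ToUint16 to each argument, so values wrap modulo 2^16.
const wrappedUnit = String.fromCharCode(0x10041); // same code unit as 0x0041

// fromCodePoint encodes a supplementary code point as a surrogate pair.
const grinning = String.fromCodePoint(0x1F600);
const samePair = String.fromCharCode(0xD83D, 0xDE00);

// Hypothetical helper: reports whether calling fn throws a RangeError.
function throwsRangeError(fn) {
  try {
    fn();
    return false;
  } catch (error) {
    return error instanceof RangeError;
  }
}

// fromCodePoint rejects values that are not integral or are out of range.
const rejectsFraction = throwsRangeError(() => String.fromCodePoint(1.5));
const rejectsNegative = throwsRangeError(() => String.fromCodePoint(-1));
const rejectsTooLarge = throwsRangeError(() => String.fromCodePoint(0x110000));
```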
22.1.2.3 String.prototype
The initial value of String.prototype is the String prototype object.
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
22.1.2.4 String.raw ( template, ...substitutions )
This function may be called with a variable number of arguments. The first argument is template and the remainder of the arguments form the List substitutions.
It performs the following steps when called:
- Let substitutionCount be the number of elements in substitutions.
- Let cooked be ? ToObject(template).
- Let literals be ? ToObject(? Get(cooked, "raw")).
- Let literalCount be ? LengthOfArrayLike(literals).
- If literalCount ≤ 0, return the empty String.
- Let result be the empty String.
- Let nextIndex be 0.
- Repeat,
  - Let nextLiteralValue be ? Get(literals, ! ToString(𝔽(nextIndex))).
  - Let nextLiteral be ? ToString(nextLiteralValue).
  - Set result to the string-concatenation of result and nextLiteral.
  - If nextIndex + 1 = literalCount, return result.
  - If nextIndex < substitutionCount, then
    - Let nextSubValue be substitutions[nextIndex].
    - Let nextSub be ? ToString(nextSubValue).
    - Set result to the string-concatenation of result and nextSub.
  - Set nextIndex to nextIndex + 1.
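This function is intended for use as a tag function of a Tagged Template (13.3.11). When called as such, the first argument will be a well formed template object and the rest parameter will contain the substitution values.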
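The interleaving of literals and substitutions performed by String.raw can be illustrated with a few calls. This is a sketch; the objects with a "raw" property are hand-built stand-ins for a template object.

```javascript
// As a tag function: escape sequences in the template are left uninterpreted.
const tagged = String.raw`a\n${1 + 1}b`;

// Called directly with an object that has a "raw" property.
const interleaved = String.raw({ raw: ["x", "y", "z"] }, 0, 1);

// Fewer substitutions than gaps: the remaining literals are simply concatenated.
const short = String.raw({ raw: ["x", "y", "z"] }, 0);

// Extra substitutions are ignored once the last literal has been appended.
const extra = String.raw({ raw: ["x", "y"] }, 0, 1, 2);

// No literals at all: the result is the empty String.
const none = String.raw({ raw: [] }, 0, 1);
```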
22.1.3 Properties of the String Prototype Object
The String prototype object:
- is %String.prototype%.
- is a String exotic object and has the internal methods specified for such objects.
- has a [[StringData]] internal slot whose value is the empty String.
- has a "length" property whose initial value is +0𝔽 and whose attributes are { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.
- has a [[Prototype]] internal slot whose value is %Object.prototype%.
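Unless explicitly stated otherwise, the methods of the String prototype object defined below are not generic and the this value passed to them must be either a String value or an object that has a [[StringData]] internal slot that has been initialized to a String value.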
22.1.3.1 String.prototype.at ( index )
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let length be the length of string.
- Let relativeIndex be ? ToIntegerOrInfinity(index).
- If relativeIndex ≥ 0, then
  - Let k be relativeIndex.
- Else,
  - Let k be length + relativeIndex.
- If k < 0 or k ≥ length, return undefined.
- Return the substring of string from k to k + 1.
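The relative indexing performed by these steps can be seen in a short sketch; the variable names are illustrative only.

```javascript
const text = "abc";

const last = text.at(-1);      // a negative index counts back from the end
const first = text.at(0);
const outOfRange = text.at(3); // undefined, not the empty String
const truncated = text.at(1.9); // ToIntegerOrInfinity truncates 1.9 to 1
const fromNaN = text.at(NaN);   // NaN converts to 0

// charAt, by contrast, returns the empty String for an out-of-range index.
const charAtOutOfRange = text.charAt(3);
```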
22.1.3.2 String.prototype.charAt ( position )
This method returns a single element String containing the code unit at index position within the String value resulting from converting this object to a String. If there is no element at that index, the result is the empty String. The result is a String value, not a String object.
If pos is an integral Number, then the result of x.charAt(pos) is equivalent to the result of x.substring(pos, pos + 1).
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Set position to ? ToIntegerOrInfinity(position).
- Let size be the length of string.
- If position < 0 or position ≥ size, return the empty String.
- Return the substring of string from position to position + 1.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.3 String.prototype.charCodeAt ( position )
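This method returns a Number (a non-negative integral Number less than 2^16) that is the numeric value of the code unit at index position within the String resulting from converting this object to a String. If there is no element at that index, the result is NaN.
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Set position to ? ToIntegerOrInfinity(position).
- Let size be the length of string.
- If position < 0 or position ≥ size, return NaN.
- Return the Number value for the numeric value of the code unit at index position within the String string.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.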
22.1.3.4 String.prototype.codePointAt ( position )
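This method returns a non-negative integral Number less than or equal to 0x10FFFF𝔽 that is the numeric value of the UTF-16 encoded code point (6.1.4) starting at the string element at index position within the String resulting from converting this object to a String. If there is no element at that index, the result is undefined. If a valid UTF-16 surrogate pair does not begin at position, the result is the code unit at position.
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Set position to ? ToIntegerOrInfinity(position).
- Let size be the length of string.
- If position < 0 or position ≥ size, return undefined.
- Let codePoint be CodePointAt(string, position).
- Return 𝔽(codePoint.[[CodePoint]]).
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.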
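The difference between charCodeAt and codePointAt shows up only around surrogate pairs. A small sketch, using a sample string chosen for this example:

```javascript
// "a", then U+1F600 (encoded as a surrogate pair), then "b": four code units.
const sample = "a\u{1F600}b";

const unitAtPair = sample.charCodeAt(1);    // the leading surrogate only
const pointAtPair = sample.codePointAt(1);  // the whole code point
const pointAtTrail = sample.codePointAt(2); // no pair begins here, so the code unit
const unitPastEnd = sample.charCodeAt(4);   // NaN
const pointPastEnd = sample.codePointAt(4); // undefined
```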
22.1.3.5 String.prototype.concat ( ...args )
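When this method is called it returns the String value consisting of the code units of the this value (converted to a String) followed by the code units of each of the arguments converted to a String. The result is a String value, not a String object.
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let result be string.
- For each element next of args, do
  - Let nextString be ? ToString(next).
  - Set result to the string-concatenation of result and nextString.
- Return result.
The "length" property of this method is 1𝔽.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.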
22.1.3.6 String.prototype.constructor
The initial value of String.prototype.constructor is %String%.
22.1.3.7 String.prototype.endsWith ( searchString [ , endPosition ] )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let isRegexp be ? IsRegExp(searchString).
- If isRegexp is true, throw a TypeError exception.
- Set searchString to ? ToString(searchString).
- Let length be the length of string.
- If endPosition is undefined, let position be length; else let position be ? ToIntegerOrInfinity(endPosition).
- Let end be the result of clamping position between 0 and length.
- Let searchLength be the length of searchString.
- If searchLength = 0, return true.
- Let start be end - searchLength.
- If start < 0, return false.
- Let substring be the substring of string from start to end.
- If substring is searchString, return true.
- Return false.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.8 String.prototype.includes ( searchString [ , position ] )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let isRegexp be ? IsRegExp(searchString).
- If isRegexp is true, throw a TypeError exception.
- Set searchString to ? ToString(searchString).
- Let positionInt be ? ToIntegerOrInfinity(position).
- Assert: If position is undefined, then positionInt is 0.
- Let length be the length of string.
- Let start be the result of clamping positionInt between 0 and length.
- Let index be StringIndexOf(string, searchString, start).
- If index is not-found, return false.
- Return true.
If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, this method returns true; otherwise, it returns false. If position is undefined, 0 is assumed, so as to search all of the String.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.9 String.prototype.indexOf ( searchString [ , position ] )
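If searchString appears as a substring of the result of converting this object to a String, at one or more indices that are greater than or equal to position, then the smallest such index is returned; otherwise, -1𝔽 is returned. If position is undefined, +0𝔽 is assumed, so as to search all of the String.
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Set searchString to ? ToString(searchString).
- Let positionInt be ? ToIntegerOrInfinity(position).
- Assert: If position is undefined, then positionInt is 0.
- Let length be the length of string.
- Let start be the result of clamping positionInt between 0 and length.
- Let result be StringIndexOf(string, searchString, start).
- If result is not-found, return -1𝔽.
- Return 𝔽(result).
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.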
22.1.3.10 String.prototype.isWellFormed ( )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Return IsStringWellFormedUnicode(string).
22.1.3.11 String.prototype.lastIndexOf ( searchString [ , position ] )
If searchString appears as a substring of the result of converting this object to a String at one or more indices that are smaller than or equal to position, then the greatest such index is returned; otherwise, -1𝔽 is returned. If position is undefined, the length of the String value is assumed, so as to search all of the String.
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Set searchString to ? ToString(searchString).
- Let numberPosition be ? ToNumber(position).
- Assert: If position is undefined, then numberPosition is NaN.
- If numberPosition is NaN, set position to +∞; else set position to ! ToIntegerOrInfinity(numberPosition).
- Let length be the length of string.
- Let searchLength be the length of searchString.
- If length < searchLength, return -1𝔽.
- Let start be the result of clamping position between 0 and length - searchLength.
- Let result be StringLastIndexOf(string, searchString, start).
- If result is not-found, return -1𝔽.
- Return 𝔽(result).
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
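The different treatment of a missing position in indexOf (converted to 0) and lastIndexOf (NaN treated as +∞) can be illustrated with a short sketch; the sample word is arbitrary.

```javascript
const word = "canal"; // "a" occurs at indices 1 and 3

const lastDefault = word.lastIndexOf("a");            // search from the end
const lastBounded = word.lastIndexOf("a", 2);         // greatest index ≤ 2
const lastUndefined = word.lastIndexOf("a", undefined); // NaN is treated as +∞
const firstUndefined = word.indexOf("a", undefined);  // undefined converts to 0

// Positions are clamped rather than rejected.
const lastNegative = word.lastIndexOf("c", -5); // clamped to 0
const firstTooLarge = word.indexOf("", 99);     // clamped to the length
const lastEmpty = word.lastIndexOf("");         // the length of the String
```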
22.1.3.12 String.prototype.localeCompare ( that [ , reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:
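This method returns a Number other than NaN representing the result of an implementation-defined locale-sensitive String comparison of the this value (converted to a String string) with that (converted to a String thatValue). The result is intended to correspond with a sort order of String values according to conventions of the host environment's current locale, and will be negative when string is ordered before thatValue, positive when string is ordered after thatValue, and zero in all other cases (representing no relative ordering between string and thatValue).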
Before performing the comparisons, this method performs the following steps to prepare the Strings:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let thatValue be ? ToString(that).
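The meaning of the optional second and third parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not assign any other interpretation to those parameter positions.
The actual return values are implementation-defined to permit encoding additional information in them, but this method, when considered as a method of two arguments, is required to be a consistent comparator defining a total ordering on the set of all Strings. This method is also required to recognize and honour canonical equivalence according to the Unicode Standard, including returning +0𝔽 when comparing distinguishable Strings that are considered canonically equivalent.
This method itself is not directly suitable as an argument to Array.prototype.sort because the latter requires a function of two arguments.
This method may rely on whatever language- and/or locale-sensitive comparison functionality is available to the ECMAScript environment from the host environment, and is intended to compare according to the conventions of the host environment's current locale. However, regardless of comparison capabilities, this method must recognize and honour canonical equivalence according to the Unicode Standard. For example, the following comparisons must all return +0𝔽: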
// Å ANGSTROM SIGN vs.
// Å LATIN CAPITAL LETTER A + COMBINING RING ABOVE
"\u212B".localeCompare("A\u030A")
// Ω OHM SIGN vs.
// Ω GREEK CAPITAL LETTER OMEGA
"\u2126".localeCompare("\u03A9")
// ṩ LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE vs.
// ṩ LATIN SMALL LETTER S + COMBINING DOT ABOVE + COMBINING DOT BELOW
"\u1E69".localeCompare("s\u0307\u0323")
// ḍ̇ LATIN SMALL LETTER D WITH DOT ABOVE + COMBINING DOT BELOW vs.
// ḍ̇ LATIN SMALL LETTER D WITH DOT BELOW + COMBINING DOT ABOVE
"\u1E0B\u0323".localeCompare("\u1E0D\u0307")
// 가 HANGUL CHOSEONG KIYEOK + HANGUL JUNGSEONG A vs.
// 가 HANGUL SYLLABLE GA
"\u1100\u1161".localeCompare("\uAC00")
For a definition and discussion of canonical equivalence see the Unicode Standard, chapters 2 and 3, as well as Unicode Standard Annex #15, Unicode Normalization Forms and Unicode Technical Note #5, Canonical Equivalence in Applications. Also see Unicode Technical Standard #10, Unicode Collation Algorithm.
It is recommended that this method should not honour Unicode compatibility equivalents or compatibility decompositions as defined in the Unicode Standard, chapter 3, section 3.7.
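This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.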
22.1.3.13 String.prototype.match ( regexpOrPattern )
This method performs the following steps when called:
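- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- If regexpOrPattern is an Object, then
  - Let matcher be ? GetMethod(regexpOrPattern, %Symbol.match%).
  - If matcher is not undefined, then
    - Return ? Call(matcher, regexpOrPattern, « thisValue »).
- Let string be ? ToString(thisValue).
- Let regexp be ? RegExpCreate(regexpOrPattern, undefined).
- Return ? Invoke(regexp, %Symbol.match%, « string »).
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.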
22.1.3.14 String.prototype.matchAll ( regexpOrPattern )
This method performs a regular expression match of the String representing the this value against regexpOrPattern and returns an iterator that yields match results. Each match result is an Array containing the matched portion of the String as the first element, followed by the portions matched by any capturing groups. If the regular expression never matches, the returned iterator does not yield any match results.
It performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- If regexpOrPattern is an Object, then
  - Let isRegexp be ? IsRegExp(regexpOrPattern).
  - If isRegexp is true, then
    - Let flags be ? Get(regexpOrPattern, "flags").
    - Perform ? RequireObjectCoercible(flags).
    - If ? ToString(flags) does not contain "g", throw a TypeError exception.
  - Let matcher be ? GetMethod(regexpOrPattern, %Symbol.matchAll%).
  - If matcher is not undefined, then
    - Return ? Call(matcher, regexpOrPattern, « thisValue »).
- Let string be ? ToString(thisValue).
- Let regexp be ? RegExpCreate(regexpOrPattern, "g").
- Return ? Invoke(regexp, %Symbol.matchAll%, « string »).
Similarly to String.prototype.split, String.prototype.matchAll is designed to typically act without mutating its inputs.
22.1.3.15 String.prototype.normalize ( [ form ] )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- If form is undefined, set form to "NFC".
- Else, set form to ? ToString(form).
- If form is not one of "NFC", "NFD", "NFKC", or "NFKD", throw a RangeError exception.
- Let normal be the String value that is the result of normalizing string into the normalization form named by form as specified in the latest Unicode Standard, Normalization Forms.
- Return normal.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
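The effect of the form argument can be sketched with two well-known characters. The example assumes an implementation with complete Unicode normalization data; the variable names are illustrative.

```javascript
const angstromSign = "\u212B"; // ANGSTROM SIGN
const composed = "\u00C5";     // LATIN CAPITAL LETTER A WITH RING ABOVE
const decomposed = "A\u030A";  // LATIN CAPITAL LETTER A + COMBINING RING ABOVE

const nfc = angstromSign.normalize();      // form defaults to "NFC"
const nfd = angstromSign.normalize("NFD");

// Compatibility forms additionally replace compatibility characters.
const ligature = "\uFB01";                     // LATIN SMALL LIGATURE FI
const nfkc = ligature.normalize("NFKC");
const nfcLigature = ligature.normalize("NFC"); // left unchanged by NFC

// Form names are matched exactly, so a lowercase name is rejected.
let rejectedForm = false;
try {
  "abc".normalize("nfc");
} catch (error) {
  rejectedForm = error instanceof RangeError;
}
```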
22.1.3.16 String.prototype.padEnd ( maxLength [ , fillString ] )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Return ? StringPaddingBuiltinsImpl(thisValue, maxLength, fillString, end).
22.1.3.17 String.prototype.padStart ( maxLength [ , fillString ] )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Return ? StringPaddingBuiltinsImpl(thisValue, maxLength, fillString, start).
22.1.3.17.1 StringPaddingBuiltinsImpl ( thisValue, maxLength, fillString, placement )
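The abstract operation StringPaddingBuiltinsImpl takes arguments thisValue (an ECMAScript language value), maxLength (an ECMAScript language value), fillString (an ECMAScript language value), and placement (start or end) and returns either a normal completion containing a String or a throw completion. It performs the following steps when called:
- Let string be ? ToString(thisValue).
- Let intMaxLength be ℝ(? ToLength(maxLength)).
- Let stringLength be the length of string.
- If intMaxLength ≤ stringLength, return string.
- If fillString is undefined, set fillString to the String value consisting solely of the code unit 0x0020 (SPACE).
- Else, set fillString to ? ToString(fillString).
- Return StringPad(string, intMaxLength, fillString, placement).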
22.1.3.17.2 StringPad ( string, maxLength, fillString, placement )
The abstract operation StringPad takes arguments string (a String), maxLength (a non-negative integer), fillString (a String), and placement (start or end) and returns a String. It performs the following steps when called:
- Let stringLength be the length of string.
- If maxLength ≤ stringLength, return string.
- If fillString is the empty String, return string.
- Let fillLength be maxLength - stringLength.
- Let truncatedStringFiller be the String value consisting of repeated concatenations of fillString truncated to length fillLength.
- If placement is start, return the string-concatenation of truncatedStringFiller and string.
- Return the string-concatenation of string and truncatedStringFiller.
The argument maxLength will be clamped such that it can be no smaller than the length of string.
The argument fillString defaults to " " (the String value consisting of the code unit 0x0020 SPACE).
22.1.3.17.3 ToZeroPaddedDecimalString ( n, minLength )
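The abstract operation ToZeroPaddedDecimalString takes arguments n (a non-negative integer) and minLength (a non-negative integer) and returns a String. It performs the following steps when called:
- Let string be the String representation of n, formatted as a decimal number.
- Return StringPad(string, minLength, "0", start).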
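The padding operations above can be sketched in ECMAScript itself. The functions stringPad and toZeroPaddedDecimalString below are hypothetical helpers written for this illustration, not the specification's abstract operations; placement is modelled as the strings "start" and "end", and n is assumed to be a safe non-negative integer.

```javascript
// Hypothetical helper mirroring the StringPad steps.
function stringPad(string, maxLength, fillString, placement) {
  const stringLength = string.length;
  if (maxLength <= stringLength) return string;
  if (fillString === "") return string;
  const fillLength = maxLength - stringLength;
  // Repeat fillString often enough to cover fillLength, then truncate.
  const copies = Math.ceil(fillLength / fillString.length);
  const truncatedStringFiller = fillString.repeat(copies).slice(0, fillLength);
  return placement === "start"
    ? truncatedStringFiller + string
    : string + truncatedStringFiller;
}

// Hypothetical helper mirroring ToZeroPaddedDecimalString.
function toZeroPaddedDecimalString(n, minLength) {
  return stringPad(String(n), minLength, "0", "start");
}

const padded = stringPad("abc", 8, "12", "start");
const month = toZeroPaddedDecimalString(7, 2);
```

For String inputs the helper is expected to agree with the built-in padStart and padEnd methods, for instance on "abc".padStart(8, "12").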
22.1.3.18 String.prototype.repeat ( count )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let n be ? ToIntegerOrInfinity(count).
- If n < 0 or n = +∞, throw a RangeError exception.
- If n = 0, return the empty String.
- Return the String value that is made from n copies of string appended together.
This method creates the String value consisting of the code units of the this value (converted to String) repeated count times.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.19 String.prototype.replace ( searchValue, replaceValue )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- If searchValue is an Object, then
  - Let replacer be ? GetMethod(searchValue, %Symbol.replace%).
  - If replacer is not undefined, then
    - Return ? Call(replacer, searchValue, « thisValue, replaceValue »).
- Let string be ? ToString(thisValue).
- Let searchString be ? ToString(searchValue).
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Set replaceValue to ? ToString(replaceValue).
- Let searchLength be the length of searchString.
- Let position be StringIndexOf(string, searchString, 0).
- If position is not-found, return string.
- Let preceding be the substring of string from 0 to position.
- Let following be the substring of string from position + searchLength.
- If functionalReplace is true, then
  - Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(position), string »)).
- Else,
  - Assert: replaceValue is a String.
  - Let captures be a new empty List.
  - Let replacement be ! GetSubstitution(searchString, string, position, captures, undefined, replaceValue).
- Return the string-concatenation of preceding, replacement, and following.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
22.1.3.19.1 GetSubstitution ( matched, string, position, captures, namedCaptures, replacementTemplate )
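The abstract operation GetSubstitution takes arguments matched (a String), string (a String), position (a non-negative integer), captures (a List of either Strings or undefined), namedCaptures (an Object or undefined), and replacementTemplate (a String) and returns either a normal completion containing a String or a throw completion. For the purposes of this abstract operation, a decimal digit is a code unit in the inclusive interval from 0x0030 (DIGIT ZERO) to 0x0039 (DIGIT NINE). It performs the following steps when called: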
- Let stringLength be the length of string.
- Assert: position ≤ stringLength.
- Let result be the empty String.
- Let templateRemainder be replacementTemplate.
- Repeat, while templateRemainder is not the empty String,
  - NOTE: The following steps isolate ref (a prefix of templateRemainder), determine refReplacement (its replacement), and then append that replacement to result.
  - If templateRemainder starts with "$$", then
    - Let ref be "$$".
    - Let refReplacement be "$".
  - Else if templateRemainder starts with "$`", then
    - Let ref be "$`".
    - Let refReplacement be the substring of string from 0 to position.
  - Else if templateRemainder starts with "$&", then
    - Let ref be "$&".
    - Let refReplacement be matched.
  - Else if templateRemainder starts with "$'" (0x0024 (DOLLAR SIGN) followed by 0x0027 (APOSTROPHE)), then
    - Let ref be "$'".
    - Let matchLength be the length of matched.
    - Let tailPosition be position + matchLength.
    - Let refReplacement be the substring of string from min(tailPosition, stringLength).
    - NOTE: tailPosition can exceed stringLength only if this abstract operation was invoked by a call to the intrinsic %Symbol.replace% method of %RegExp.prototype% on an object whose "exec" property is not the intrinsic %RegExp.prototype.exec%.
  - Else if templateRemainder starts with "$" followed by 1 or more decimal digits, then
    - If templateRemainder starts with "$" followed by 2 or more decimal digits, let digitCount be 2; else let digitCount be 1.
    - Let digits be the substring of templateRemainder from 1 to 1 + digitCount.
    - Let index be ℝ(StringToNumber(digits)).
    - Assert: 0 ≤ index ≤ 99.
    - Let captureLength be the number of elements in captures.
    - If index > captureLength and digitCount = 2, then
      - NOTE: When a two-digit replacement pattern specifies an index exceeding the count of capturing groups, it is treated as a one-digit replacement pattern followed by a literal digit.
      - Set digitCount to 1.
      - Set digits to the substring of digits from 0 to 1.
      - Set index to ℝ(StringToNumber(digits)).
    - Let ref be the substring of templateRemainder from 0 to 1 + digitCount.
    - If 1 ≤ index ≤ captureLength, then
      - Let capture be captures[index - 1].
      - If capture is undefined, then
        - Let refReplacement be the empty String.
      - Else,
        - Let refReplacement be capture.
    - Else,
      - Let refReplacement be ref.
  - Else if templateRemainder starts with "$<", then
    - Let gtPosition be StringIndexOf(templateRemainder, ">", 0).
    - If gtPosition is not-found or namedCaptures is undefined, then
      - Let ref be "$<".
      - Let refReplacement be ref.
    - Else,
      - Let ref be the substring of templateRemainder from 0 to gtPosition + 1.
      - Let groupName be the substring of templateRemainder from 2 to gtPosition.
      - Assert: namedCaptures is an Object.
      - Let capture be ? Get(namedCaptures, groupName).
      - If capture is undefined, then
        - Let refReplacement be the empty String.
      - Else,
        - Let refReplacement be ? ToString(capture).
  - Else,
    - Let ref be the substring of templateRemainder from 0 to 1.
    - Let refReplacement be ref.
  - Let refLength be the length of ref.
  - Set templateRemainder to the substring of templateRemainder from refLength.
  - Set result to the string-concatenation of result and refReplacement.
- Return result.
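The replacement patterns handled by GetSubstitution can be observed through String.prototype.replace. The sketch below is illustrative; the last two calls use a RegExp search value, which reaches GetSubstitution with a non-empty captures list.

```javascript
// With a String search value there are no captures and no named captures.
const matchedText = "abc".replace("b", "[$&]");  // $& is the matched substring
const beforeMatch = "abc".replace("b", "$`");    // text preceding the match
const afterMatch = "abc".replace("b", "$'");     // text following the match
const literalDollar = "abc".replace("b", "$$");  // a single "$"
const noSuchGroup = "abc".replace("b", "$1");    // no capture 1: kept literally
const noNamedGroups = "abc".replace("b", "$<x>"); // namedCaptures is undefined

// With one capturing group, "$10" is read as "$1" followed by a literal "0".
const twoDigitFallback = "abc".replace(/(b)/, "$10");

// A named group is looked up by name on the groups object.
const namedGroup = "abc".replace(/(?<mid>b)/, "<$<mid>>");
```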
22.1.3.20 String.prototype.replaceAll ( searchValue, replaceValue )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- If searchValue is an Object, then
  - Let isRegexp be ? IsRegExp(searchValue).
  - If isRegexp is true, then
    - Let flags be ? Get(searchValue, "flags").
    - Perform ? RequireObjectCoercible(flags).
    - If ? ToString(flags) does not contain "g", throw a TypeError exception.
  - Let replacer be ? GetMethod(searchValue, %Symbol.replace%).
  - If replacer is not undefined, then
    - Return ? Call(replacer, searchValue, « thisValue, replaceValue »).
- Let string be ? ToString(thisValue).
- Let searchString be ? ToString(searchValue).
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Set replaceValue to ? ToString(replaceValue).
- Let searchLength be the length of searchString.
- Let advanceBy be max(1, searchLength).
- Let matchPositions be a new empty List.
- Let position be StringIndexOf(string, searchString, 0).
- Repeat, while position is not not-found,
  - Append position to matchPositions.
  - Set position to StringIndexOf(string, searchString, position + advanceBy).
- Let endOfLastMatch be 0.
- Let result be the empty String.
- For each element matchPosition of matchPositions, do
  - Let preserved be the substring of string from endOfLastMatch to matchPosition.
  - If functionalReplace is true, then
    - Let replacement be ? ToString(? Call(replaceValue, undefined, « searchString, 𝔽(matchPosition), string »)).
  - Else,
    - Assert: replaceValue is a String.
    - Let captures be a new empty List.
    - Let replacement be ! GetSubstitution(searchString, string, matchPosition, captures, undefined, replaceValue).
  - Set result to the string-concatenation of result, preserved, and replacement.
  - Set endOfLastMatch to matchPosition + searchLength.
- If endOfLastMatch < the length of string, then
  - Set result to the string-concatenation of result and the substring of string from endOfLastMatch.
- Return result.
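A few calls illustrate how the match positions are collected, including the advanceBy rule for an empty search string. The example is a sketch with arbitrary sample inputs.

```javascript
const everyMatch = "a-b-c".replaceAll("-", "+");
const firstOnly = "a-b-c".replace("-", "+"); // replace stops after one match

// An empty search string matches at every position, advancing by 1 each time.
const emptySearch = "abc".replaceAll("", "-");

// Matches do not overlap: the search resumes after the previous match.
const nonOverlapping = "aaaa".replaceAll("aa", "b");

// A function receives the search string, the match position, and the whole string.
const withFunction = "a-b-c".replaceAll("-", (match, position) => `[${position}]`);

// A RegExp search value without the "g" flag is rejected.
let nonGlobalRejected = false;
try {
  "a-b-c".replaceAll(/-/, "+");
} catch (error) {
  nonGlobalRejected = error instanceof TypeError;
}
```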
22.1.3.21 String.prototype.search ( regexpOrPattern )
This method performs the following steps when called:
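- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- If regexpOrPattern is an Object, then
  - Let searcher be ? GetMethod(regexpOrPattern, %Symbol.search%).
  - If searcher is not undefined, then
    - Return ? Call(searcher, regexpOrPattern, « thisValue »).
- Let string be ? ToString(thisValue).
- Let regexp be ? RegExpCreate(regexpOrPattern, undefined).
- Return ? Invoke(regexp, %Symbol.search%, « string »).
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.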
22.1.3.22 String.prototype.slice ( start, end )
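This method returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end (or through the end of the String if end is undefined). If start is negative, it is treated as sourceLength + start where sourceLength is the length of the String. If end is negative, it is treated as sourceLength + end where sourceLength is the length of the String. The result is a String value, not a String object.
It performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let length be the length of string.
- Let intStart be ? ToIntegerOrInfinity(start).
- If intStart = -∞, let from be 0.
- Else if intStart < 0, let from be max(length + intStart, 0).
- Else, let from be min(intStart, length).
- If end is undefined, let intEnd be length; else let intEnd be ? ToIntegerOrInfinity(end).
- If intEnd = -∞, let to be 0.
- Else if intEnd < 0, let to be max(length + intEnd, 0).
- Else, let to be min(intEnd, length).
- If from ≥ to, return the empty String.
- Return the substring of string from from to to.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.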
22.1.3.23 String.prototype.split ( separator, limit )
This method returns an Array into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any String in the returned array, but serve to divide up the String value. The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a %Symbol.split% method.
It performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- If separator is an Object, then
  - Let splitter be ? GetMethod(separator, %Symbol.split%).
  - If splitter is not undefined, then
    - Return ? Call(splitter, separator, « thisValue, limit »).
- Let string be ? ToString(thisValue).
- If limit is undefined, let lim be 2^32 - 1; else let lim be ℝ(? ToUint32(limit)).
- Let separatorString be ? ToString(separator).
- If lim = 0, then
  - Return CreateArrayFromList(« »).
- If separator is undefined, then
  - Return CreateArrayFromList(« string »).
- Let separatorLength be the length of separatorString.
- If separatorLength = 0, then
  - Let stringLength be the length of string.
  - Let outLength be the result of clamping lim between 0 and stringLength.
  - Let head be the substring of string from 0 to outLength.
  - Let codeUnits be a List consisting of the sequence of code units that are the elements of head.
  - Return CreateArrayFromList(codeUnits).
- If string is the empty String, return CreateArrayFromList(« string »).
- Let substrings be a new empty List.
- Let searchStart be 0.
- Let matchIndex be StringIndexOf(string, separatorString, 0).
- Repeat, while matchIndex is not not-found,
  - Let substring be the substring of string from searchStart to matchIndex.
  - Append substring to substrings.
  - If the number of elements in substrings is lim, return CreateArrayFromList(substrings).
  - Set searchStart to matchIndex + separatorLength.
  - Set matchIndex to StringIndexOf(string, separatorString, searchStart).
- Let substring be the substring of string from searchStart.
- Append substring to substrings.
- Return CreateArrayFromList(substrings).
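The value of separator may be an empty String. In this case, separator does not match the empty substring at the beginning or end of the input String, nor does it match the empty substring at the end of the previous separator match. If separator is the empty String, the String is split up into individual code unit elements; the length of the result array equals the length of the String, and each substring contains one code unit.
If the this value is (or converts to) the empty String, the result depends on whether separator can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If separator is undefined, then the result array contains just one String, which is the this value (converted to a String). If limit is not undefined, then the output array is truncated so that it contains no more than limit elements.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.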
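The edge cases described in the preceding notes can be checked with String separators. This is a sketch; the inputs are arbitrary.

```javascript
const limited = "a,b,c".split(",", 2);   // stops once lim elements are collected
const trailing = "a,b,".split(",");      // a trailing separator yields a final ""
const zeroLimit = "a,b".split(",", 0);   // lim = 0 gives an empty array
const noSeparator = "a,b".split(undefined); // the whole String as one element

// Empty String input: the result depends on whether the separator can match it.
const emptyInput = "".split(",");
const emptyBoth = "".split("");

// Empty separator: one element per code unit, so surrogate pairs are divided.
const codeUnits = "abc".split("");
const surrogateHalves = "\u{1F600}".split("");
```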
22.1.3.24 String.prototype.startsWith ( searchString [ , position ] )
This method performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let isRegexp be ? IsRegExp(searchString).
- If isRegexp is true, throw a TypeError exception.
- Set searchString to ? ToString(searchString).
- Let length be the length of string.
- If position is undefined, set position to 0; else set position to ? ToIntegerOrInfinity(position).
- Let start be the result of clamping position between 0 and length.
- Let searchLength be the length of searchString.
- If searchLength = 0, return true.
- Let end be start + searchLength.
- If end > length, return false.
- Let substring be the substring of string from start to end.
- If substring is searchString, return true.
- Return false.
Throwing an exception if the first argument is a RegExp is specified in order to allow future editions to define extensions that allow such argument values.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
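The position handling of startsWith, together with the mirrored endPosition handling of endsWith, can be sketched as follows. The file name is an arbitrary sample value.

```javascript
const file = "report.final.pdf"; // 16 code units; "final" occupies indices 7 to 11

const startsAtZero = file.startsWith("report");
const startsAtOffset = file.startsWith("final", 7);   // compares from index 7
const startsClamped = file.startsWith("report", -5);  // negative position clamps to 0
const emptyAlwaysMatches = file.startsWith("", 99);   // empty search string is always found

const endsAtLength = file.endsWith(".pdf");
const endsBeforeSuffix = file.endsWith("final", 12);  // treats index 12 as the end

// A RegExp argument is rejected rather than converted to a String.
let regexpRejected = false;
try {
  file.startsWith(/report/);
} catch (error) {
  regexpRejected = error instanceof TypeError;
}
```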
22.1.3.25 String.prototype.substring ( start, end )
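This method returns a substring of the result of converting this object to a String, starting from index start and running to, but not including, index end of the String (or through the end of the String if end is undefined). The result is a String value, not a String object.
If either argument is NaN or negative, it is replaced with zero; if either argument is strictly greater than the length of the String, it is replaced with the length of the String.
If start is strictly greater than end, they are swapped.
It performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let length be the length of string.
- Let intStart be ? ToIntegerOrInfinity(start).
- If end is undefined, let intEnd be length; else let intEnd be ? ToIntegerOrInfinity(end).
- Let finalStart be the result of clamping intStart between 0 and length.
- Let finalEnd be the result of clamping intEnd between 0 and length.
- Let from be min(finalStart, finalEnd).
- Let to be max(finalStart, finalEnd).
- Return the substring of string from from to to.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.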
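The clamping and swapping performed by substring contrast with the relative indexing of slice. A short sketch with an arbitrary sample string:

```javascript
const letters = "abcdef";

// Swapped bounds: substring reorders them, slice returns the empty String.
const substringSwapped = letters.substring(4, 1);
const sliceSwapped = letters.slice(4, 1);

// Negative arguments: substring clamps to 0, slice counts from the end.
const substringNegative = letters.substring(-2);
const sliceNegative = letters.slice(-2);
const substringNegativeEnd = letters.substring(2, -1); // becomes substring(0, 2)
const sliceNegativeEnd = letters.slice(2, -1);

// NaN converts to 0 in both methods.
const substringNaN = letters.substring(NaN, 2);
```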
22.1.3.26 String.prototype.toLocaleLowerCase ( [ reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:
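This method interprets a String value as a sequence of UTF-16 encoded code points, as described in 6.1.4.
It works exactly the same as toLowerCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the host environment's current locale. There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.
The meaning of the optional parameters to this method is defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
This method is intentionally generic; it does not require that its this value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.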
22.1.3.27 String.prototype.toLocaleUpperCase ( [ reserved1 [ , reserved2 ] ] )
An ECMAScript implementation that includes the ECMA-402 Internationalization API must implement this method as specified in the ECMA-402 specification. If an ECMAScript implementation does not include the ECMA-402 API the following specification of this method is used:
This method interprets a String value as a sequence of UTF-16 encoded code points, as described in
It works exactly the same as toUpperCase except that it is intended to yield a locale-sensitive result corresponding with conventions of the
The meaning of the optional parameters to this method are defined in the ECMA-402 specification; implementations that do not include ECMA-402 support must not use those parameter positions for anything else.
This method is intentionally generic; it does not require that its
22.1.3.28 String.prototype.toLowerCase ( )
This method interprets a String value as a sequence of UTF-16 encoded code points, as described in
It performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let sText be StringToCodePoints(string).
- Let lowerText be toLowercase(sText), according to the Unicode Default Case Conversion algorithm.
- Let lowercaseString be CodePointsToString(lowerText).
- Return lowercaseString.
The result must be derived according to the locale-insensitive case mappings in the Unicode Character Database (this explicitly includes not only the file UnicodeData.txt, but also all locale-insensitive mappings in the file SpecialCasing.txt that accompanies it).
The case mapping of some code points may produce multiple code points. In this case the result String may not be the same length as the source String. Because both toUpperCase and toLowerCase have context-sensitive behaviour, the methods are not symmetrical. In other words, s.toUpperCase().toLowerCase() is not necessarily equal to s.toLowerCase().
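As an informal illustration of the length change and the asymmetry described above, the following sketch uses U+00DF (LATIN SMALL LETTER SHARP S), whose uppercase mapping consists of two code points. The variable names are chosen for this example only.

```javascript
const sharpS = "\u00DF";                  // LATIN SMALL LETTER SHARP S
const upper = sharpS.toUpperCase();       // uppercase mapping yields two code points
const roundTrip = upper.toLowerCase();    // lowercasing does not restore the original
const lengths = [sharpS.length, upper.length];
const symmetric = roundTrip === sharpS.toLowerCase();
```

Here lengths differs between source and result, and symmetric is false, so s.toUpperCase().toLowerCase() is not equal to s.toLowerCase() for this input.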
This method is intentionally generic; it does not require that its
22.1.3.29 String.prototype.toString ( )
This method performs the following steps when called:
- Return ? ThisStringValue(this value).
For a String object, this method happens to return the same thing as the valueOf method.
22.1.3.30 String.prototype.toUpperCase ( )
This method interprets a String value as a sequence of UTF-16 encoded code points, as described in
It behaves in exactly the same way as String.prototype.toLowerCase, except that the String is mapped using the toUppercase algorithm of the Unicode Default Case Conversion.
This method is intentionally generic; it does not require that its
22.1.3.31 String.prototype.toWellFormed ( )
This method returns a String representation of this object with all
It performs the following steps when called:
- Let thisValue be the this value.
- Perform ? RequireObjectCoercible(thisValue).
- Let string be ? ToString(thisValue).
- Let stringLength be the length of string.
- Let k be 0.
- Let result be the empty String.
- Repeat, while k < stringLength,
  - Let codePoint be CodePointAt(string, k).
  - If codePoint.[[IsUnpairedSurrogate]] is true, then
    - Set result to the string-concatenation of result and 0xFFFD (REPLACEMENT CHARACTER).
  - Else,
    - Set result to the string-concatenation of result and UTF16EncodeCodePoint(codePoint.[[CodePoint]]).
  - Set k to k + codePoint.[[CodeUnitCount]].
- Return result.
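The loop above can be approximated with existing String methods. This is a minimal sketch under the assumption that codePointAt returns the surrogate's own value when the code unit at the index is unpaired; toWellFormedSketch is a hypothetical name, not a specification operation.

```javascript
// Hypothetical sketch of the algorithm; it does not rely on the built-in toWellFormed.
function toWellFormedSketch(value) {
  if (value === undefined || value === null) {
    throw new TypeError("this value is not coercible to an object");
  }
  const string = String(value);
  let result = "";
  let k = 0;
  while (k < string.length) {
    const cp = string.codePointAt(k);
    // A value in the surrogate range means the code unit at k is unpaired.
    if (cp >= 0xd800 && cp <= 0xdfff) {
      result += "\uFFFD"; // REPLACEMENT CHARACTER
      k += 1;
    } else {
      result += String.fromCodePoint(cp);
      k += cp > 0xffff ? 2 : 1; // supplementary code points occupy two code units
    }
  }
  return result;
}
```

Well-formed surrogate pairs pass through unchanged; only unpaired surrogates are replaced.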
22.1.3.32 String.prototype.trim ( )
This method interprets a String value as a sequence of UTF-16 encoded code points, as described in
It performs the following steps when called:
- Let thisValue be the this value.
- Return ? TrimString(thisValue, start+end).
This method is intentionally generic; it does not require that its
22.1.3.32.1 TrimString ( arg, where )
The abstract operation TrimString takes arguments arg (an
- Perform ? RequireObjectCoercible(arg).
- Let string be ? ToString(arg).
- If where is start, then
  - Let trimmedString be the String value that is a copy of string with leading white space removed.
- Else if where is end, then
  - Let trimmedString be the String value that is a copy of string with trailing white space removed.
- Else,
  - Assert: where is start+end.
  - Let trimmedString be the String value that is a copy of string with both leading and trailing white space removed.
- Return trimmedString.
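The dispatch on where can be sketched as follows. The function name trimStringSketch and the use of the strings "start", "end", and "start+end" to stand in for the specification's enumeration values are assumptions made for this example.

```javascript
// Hypothetical sketch: delegates the actual white space removal to the built-in methods.
function trimStringSketch(arg, where) {
  if (arg === undefined || arg === null) {
    throw new TypeError("argument is not coercible to an object");
  }
  const string = String(arg);
  if (where === "start") return string.trimStart();
  if (where === "end") return string.trimEnd();
  // Any other value is treated as start+end in this sketch.
  return string.trim();
}

const padded = "\uFEFF\u00A0 x \n";
const both = trimStringSketch(padded, "start+end");
const leadingOnly = trimStringSketch(padded, "start");
const trailingOnly = trimStringSketch(padded, "end");
```

The sample input mixes U+FEFF, U+00A0, a space, and a line feed, all of which are removed as white space at the trimmed end or ends.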
The definition of white space is the union of
22.1.3.33 String.prototype.trimEnd ( )
This method interprets a String value as a sequence of UTF-16 encoded code points, as described in
It performs the following steps when called:
- Let string be the this value.
- Return ? TrimString(string, end).
This method is intentionally generic; it does not require that its
22.1.3.34 String.prototype.trimStart ( )
This method interprets a String value as a sequence of UTF-16 encoded code points, as described in
It performs the following steps when called:
- Let string be the this value.
- Return ? TrimString(string, start).
This method is intentionally generic; it does not require that its
22.1.3.35 String.prototype.valueOf ( )
This method performs the following steps when called:
- Return ? ThisStringValue(this value).
22.1.3.35.1 ThisStringValue ( arg )
The abstract operation ThisStringValue takes argument arg (an
- If arg is a String, return arg.
- If arg is an Object and arg has a [[StringData]] internal slot, then
  - Let string be arg.[[StringData]].
  - Assert: string is a String.
  - Return string.
- Throw a TypeError exception.
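The three outcomes of ThisStringValue can be observed through String.prototype.valueOf. The following informal example uses variable names chosen only for illustration.

```javascript
// A String value is returned as is.
const primitive = String.prototype.valueOf.call("abc");
// A String object yields the value held in its [[StringData]] internal slot.
const unwrapped = String.prototype.valueOf.call(new String("abc"));
// An ordinary object has no [[StringData]] slot, even if it defines toString.
let threwTypeError = false;
try {
  String.prototype.valueOf.call({ toString() { return "abc"; } });
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
```

Unlike the generic methods above, valueOf and toString do not apply ToString to the this value, which is why the last call throws.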
22.1.3.36 String.prototype [ %Symbol.iterator% ] ( )
This method returns an
It performs the following steps when called:
- Let string be the this value.
- Perform ? RequireObjectCoercible(string).
- Set string to ? ToString(string).
- Let closure be a new Abstract Closure with no parameters that captures string and performs the following steps when called:
  - Let length be the length of string.
  - Let position be 0.
  - Repeat, while position < length,
    - Let codePoint be CodePointAt(string, position).
    - Let nextIndex be position + codePoint.[[CodeUnitCount]].
    - Let resultString be the substring of string from position to nextIndex.
    - Set position to nextIndex.
    - Perform ? GeneratorYield(CreateIteratorResultObject(resultString, false)).
  - Return NormalCompletion(unused).
- Return CreateIteratorFromClosure(closure, "%StringIteratorPrototype%", %StringIteratorPrototype%).
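The closure above behaves much like a generator function. The following sketch, with the hypothetical name codePointStrings, yields one String per code point and is compared against the built-in iterator.

```javascript
// Hypothetical generator mirroring the closure; each yielded String holds one code point.
function* codePointStrings(value) {
  const string = String(value);
  let position = 0;
  while (position < string.length) {
    const cp = string.codePointAt(position);
    // Supplementary code points occupy two code units; everything else, including
    // unpaired surrogates, occupies one.
    const nextIndex = position + (cp > 0xffff ? 2 : 1);
    yield string.slice(position, nextIndex);
    position = nextIndex;
  }
}

const sample = "a\u{1D11E}b\uD800";
const viaSketch = [...codePointStrings(sample)];
const viaBuiltin = [...sample];
```

The sample has five code units but is iterated as four Strings, because the surrogate pair is yielded as a single element while the trailing unpaired surrogate is yielded on its own.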
The value of the
22.1.4 Properties of String Instances
String instances are
String instances have a
22.1.4.1 length
The number of elements in the String value represented by this String object.
Once a String object is initialized, this property is unchanging. It has the attributes { [[Writable]]:
22.1.5 String Iterator Objects
A String Iterator is an object that represents a specific iteration over some specific String instance object. There is not a named
22.1.5.1 The %StringIteratorPrototype% Object
The
- has properties that are inherited by all String Iterator objects.
- is an ordinary object.
- has a [[Prototype]] internal slot whose value is %Iterator.prototype%.
- has the following properties:
22.1.5.1.1 %StringIteratorPrototype% .next ( )
- Return ? GeneratorResume(this value, empty, "%StringIteratorPrototype%").
22.1.5.1.2 %StringIteratorPrototype% [ %Symbol.toStringTag% ]
The initial value of the
This property has the attributes { [[Writable]]:
22.2 RegExp (Regular Expression) Objects
A RegExp object contains a regular expression and the associated flags.
The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language.
22.2.1 Patterns
The RegExp
Syntax
Each \u u u \u
The first two lines here are equivalent to CharacterClass.
A number of productions in this section are given alternative definitions in section
22.2.1.1 Static Semantics: Early Errors
This section is amended in
- It is a Syntax Error if CountLeftCapturingParensWithin(Pattern) ≥ 2^32 - 1.
- It is a Syntax Error if Pattern contains two distinct GroupSpecifiers x and y such that the CapturingGroupName of x is the CapturingGroupName of y and such that MightBothParticipate(x, y) is true.

- It is a Syntax Error if the MV of the first DecimalDigits is strictly greater than the MV of the second DecimalDigits.

- It is a Syntax Error if the source text matched by RegularExpressionModifiers contains the same code point more than once.

- It is a Syntax Error if the source text matched by the first RegularExpressionModifiers and the source text matched by the second RegularExpressionModifiers are both empty.
- It is a Syntax Error if the source text matched by the first RegularExpressionModifiers contains the same code point more than once.
- It is a Syntax Error if the source text matched by the second RegularExpressionModifiers contains the same code point more than once.
- It is a Syntax Error if any code point in the source text matched by the first RegularExpressionModifiers is also contained in the source text matched by the second RegularExpressionModifiers.

- It is a Syntax Error if GroupSpecifiersThatMatch(GroupName) is empty.

- It is a Syntax Error if the CapturingGroupNumber of DecimalEscape is strictly greater than CountLeftCapturingParensWithin(the Pattern containing AtomEscape).

- It is a Syntax Error if IsCharacterClass of the first ClassAtom is true or IsCharacterClass of the second ClassAtom is true.
- It is a Syntax Error if IsCharacterClass of the first ClassAtom is false, IsCharacterClass of the second ClassAtom is false, and the CharacterValue of the first ClassAtom is strictly greater than the CharacterValue of the second ClassAtom.

- It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is true or IsCharacterClass of ClassAtom is true.
- It is a Syntax Error if IsCharacterClass of ClassAtomNoDash is false, IsCharacterClass of ClassAtom is false, and the CharacterValue of ClassAtomNoDash is strictly greater than the CharacterValue of ClassAtom.

- It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the numeric value of some code point matched by the IdentifierStartChar lexical grammar production.

- It is a Syntax Error if the RegExpIdentifierCodePoint of RegExpIdentifierStart is not matched by the UnicodeIDStart lexical grammar production.

- It is a Syntax Error if the CharacterValue of RegExpUnicodeEscapeSequence is not the numeric value of some code point matched by the IdentifierPartChar lexical grammar production.

- It is a Syntax Error if the RegExpIdentifierCodePoint of RegExpIdentifierPart is not matched by the UnicodeIDContinue lexical grammar production.

- It is a Syntax Error if the source text matched by UnicodePropertyName is not a Unicode property name or property alias listed in the “Property name and aliases” column of Table 64.
- It is a Syntax Error if the source text matched by UnicodePropertyValue is not a property value or property value alias for the Unicode property or property alias given by the source text matched by UnicodePropertyName listed in PropertyValueAliases.txt.

- It is a Syntax Error if the source text matched by LoneUnicodePropertyNameOrValue is not a Unicode property value or property value alias for the General_Category (gc) property listed in PropertyValueAliases.txt, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 65, nor a binary property of strings listed in the “Property name” column of Table 66.
- It is a Syntax Error if the enclosing Pattern does not have a [UnicodeSetsMode] parameter and the source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name” column of Table 66.

- It is a Syntax Error if MayContainStrings of the UnicodePropertyValueExpression is true.

- It is a Syntax Error if MayContainStrings of the ClassContents is true.

- It is a Syntax Error if MayContainStrings of the ClassContents is true.

- It is a Syntax Error if the CharacterValue of the first ClassSetCharacter is strictly greater than the CharacterValue of the second ClassSetCharacter.
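Several of these early errors can be observed through the RegExp constructor, which throws a SyntaxError for an invalid pattern. The following informal sketch uses a hypothetical helper, throwsSyntaxError, and checks a few representative rules only.

```javascript
// Hypothetical helper: reports whether constructing the pattern throws a SyntaxError.
function throwsSyntaxError(source, flags = "") {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

const checks = {
  // The MV of the first DecimalDigits exceeds the MV of the second.
  quantifierOutOfOrder: throwsSyntaxError("a{2,1}"),
  // The CharacterValue of the first ClassAtom exceeds that of the second.
  classRangeOutOfOrder: throwsSyntaxError("[z-a]"),
  // Two GroupSpecifiers with the same name that might both participate.
  duplicateGroupName: throwsSyntaxError("(?<n>a)(?<n>b)"),
  // GroupSpecifiersThatMatch(GroupName) is empty.
  unknownGroupName: throwsSyntaxError("(?<n>a)\\k<m>"),
  // The CapturingGroupNumber exceeds the number of capturing groups (Unicode pattern).
  backreferenceTooLarge: throwsSyntaxError("(a)\\2", "u"),
  // A pattern that violates none of the rules above.
  wellFormed: throwsSyntaxError("a{1,2}[a-z](?<n>a)\\k<n>"),
};
```

The backreference check uses the u flag because legacy patterns are subject to the amended grammar, which treats such escapes differently.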
22.2.1.2 Static Semantics: CountLeftCapturingParensWithin ( parseNode )
The abstract operation CountLeftCapturingParensWithin takes argument parseNode (a ( pattern character that is matched by the ( terminal of the
This section is amended in
It performs the following steps when called:
- Assert: parseNode is an instance of a production in the RegExp Pattern grammar.
- Return the number of Atom :: ( GroupSpecifier opt Disjunction ) Parse Nodes contained within parseNode.
22.2.1.3 Static Semantics: CountLeftCapturingParensBefore ( parseNode )
The abstract operation CountLeftCapturingParensBefore takes argument parseNode (a
This section is amended in
It performs the following steps when called:
- Assert: parseNode is an instance of a production in the RegExp Pattern grammar.
- Let pattern be the Pattern containing parseNode.
- Return the number of Atom :: ( GroupSpecifier opt Disjunction ) Parse Nodes contained within pattern that either occur before parseNode or contain parseNode.
22.2.1.4 Static Semantics: MightBothParticipate ( x, y )
The abstract operation MightBothParticipate takes arguments x (a
- Assert: x and y have the same enclosing Pattern.
- If the enclosing Pattern contains a Disjunction :: Alternative | Disjunction Parse Node such that either x is contained within the Alternative and y is contained within the derived Disjunction, or x is contained within the derived Disjunction and y is contained within the Alternative, return false.
- Return true.
22.2.1.5 Static Semantics: CapturingGroupNumber
The
This section is amended in
It is defined piecewise over the following productions:
- Return the MV of NonZeroDigit.

- Let n be the number of code points in DecimalDigits.
- Return (the MV of NonZeroDigit × 10^n plus the MV of DecimalDigits).
The definitions of “the MV of
22.2.1.6 Static Semantics: IsCharacterClass
The
This section is amended in
It is defined piecewise over the following productions:
- Return
false .
- Return
true .
22.2.1.7 Static Semantics: CharacterValue
The
This section is amended in
It is defined piecewise over the following productions:
- Return the numeric value of U+002D (HYPHEN-MINUS).
- Let codePoint be the code point matched by
SourceCharacter . - Return the numeric value of codePoint.
- Return the numeric value of U+0008 (BACKSPACE).
- Return the numeric value of U+002D (HYPHEN-MINUS).
- Return the numeric value according to
Table 62 .
| ControlEscape | Numeric Value | Code Point | Unicode Name | Symbol |
|---|---|---|---|---|
| t | 9 | U+0009 | CHARACTER TABULATION | <HT> |
| n | 10 | U+000A | LINE FEED (LF) | <LF> |
| v | 11 | U+000B | LINE TABULATION | <VT> |
| f | 12 | U+000C | FORM FEED (FF) | <FF> |
| r | 13 | U+000D | CARRIAGE RETURN (CR) | <CR> |
- Let codePoint be the code point matched by AsciiLetter.
- Let i be the numeric value of codePoint.
- Return the remainder of dividing i by 32.
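The remainder rule for control letters can be checked directly. In this informal example, controlLetterValue is a hypothetical helper that applies the steps above to an ASCII letter.

```javascript
// Hypothetical helper: numeric value of the letter, reduced modulo 32.
function controlLetterValue(letter) {
  return letter.codePointAt(0) % 32;
}

const valueJ = controlLetterValue("J");                  // 74 % 32
const lowerSame = controlLetterValue("j") === valueJ;    // 106 % 32 gives the same value
const matchesLineFeed = /\cJ/.test("\n");                // U+000A has numeric value 10
```

Upper-case and lower-case letters map to the same control character because their numeric values differ by exactly 32.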
- Return the numeric value of U+0000 (NULL).
\0 represents the <NUL> character and cannot be followed by a decimal digit.
- Return the MV of
HexEscapeSequence .
- Let lead be the CharacterValue of HexLeadSurrogate.
- Let trail be the CharacterValue of HexTrailSurrogate.
- Let codePoint be UTF16SurrogatePairToCodePoint(lead, trail).
- Return the numeric value of codePoint.
- Return the MV of
Hex4Digits .
- Return the MV of
CodePoint .
- Return the MV of
Hex4Digits .
- Let codePoint be the code point matched by
IdentityEscape . - Return the numeric value of codePoint.
- Let codePoint be the code point matched by
SourceCharacter . - Return the numeric value of codePoint.
- Let codePoint be the code point matched by
ClassSetReservedPunctuator . - Return the numeric value of codePoint.
- Return the numeric value of U+0008 (BACKSPACE).
22.2.1.8 Static Semantics: MayContainStrings
The
- Return
false .
- If the
source text matched by LoneUnicodePropertyNameOrValue is a binary property of strings listed in the “Property name ” column ofTable 66 , returntrue . - Return
false .
- If the
ClassUnion is present, return MayContainStrings of theClassUnion . - Return
false .
- If MayContainStrings of the
ClassSetOperand istrue , returntrue . - If
ClassUnion is present, return MayContainStrings of theClassUnion . - Return
false .
- If MayContainStrings of the first
ClassSetOperand isfalse , returnfalse . - If MayContainStrings of the second
ClassSetOperand isfalse , returnfalse . - Return
true .
- If MayContainStrings of the
ClassIntersection isfalse , returnfalse . - If MayContainStrings of the
ClassSetOperand isfalse , returnfalse . - Return
true .
- Return MayContainStrings of the first
ClassSetOperand .
- Return MayContainStrings of the
ClassSubtraction .
- If MayContainStrings of the
ClassString istrue , returntrue . - Return MayContainStrings of the
ClassStringDisjunctionContents .
- Return
true .
- Return MayContainStrings of the
NonEmptyClassString .
- If
NonEmptyClassString is present, returntrue . - Return
false .
22.2.1.9 Static Semantics: GroupSpecifiersThatMatch ( thisGroupName )
The abstract operation GroupSpecifiersThatMatch takes argument thisGroupName (a
- Let name be the CapturingGroupName of thisGroupName.
- Let pattern be the Pattern containing thisGroupName.
- Let result be a new empty List.
- For each GroupSpecifier groupSpecifier that pattern contains, do
  - If the CapturingGroupName of groupSpecifier is name, then
    - Append groupSpecifier to result.
- Return result.
22.2.1.10 Static Semantics: CapturingGroupName
The
- Let idTextUnescaped be the
RegExpIdentifierCodePoints ofRegExpIdentifierName . - Return
CodePointsToString (idTextUnescaped).
22.2.1.11 Static Semantics: RegExpIdentifierCodePoints
The
- Let codePoint be the
RegExpIdentifierCodePoint ofRegExpIdentifierStart . - Return « codePoint ».
- Let cps be the RegExpIdentifierCodePoints of the derived
RegExpIdentifierName . - Let codePoint be the
RegExpIdentifierCodePoint ofRegExpIdentifierPart . - Return the
list-concatenation of cps and « codePoint ».
22.2.1.12 Static Semantics: RegExpIdentifierCodePoint
The
- Return the code point matched by
IdentifierStartChar .
- Return the code point matched by
IdentifierPartChar .
- Return the code point whose numeric value is the
CharacterValue ofRegExpUnicodeEscapeSequence .
- Let lead be the code unit whose numeric value is the numeric value of the code point matched by
UnicodeLeadSurrogate . - Let trail be the code unit whose numeric value is the numeric value of the code point matched by
UnicodeTrailSurrogate . - Return
UTF16SurrogatePairToCodePoint (lead, trail).
22.2.2 Pattern Semantics
A regular expression pattern is converted into an
A u nor a v. Otherwise, it is a Unicode pattern. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (
The syntax and semantics of
For example, consider a pattern expressed in source text as the single non-BMP character U+1D11E (MUSICAL SYMBOL G CLEF). Interpreted as a Unicode pattern, it would be a single element (character)
Patterns are passed to the RegExp
An implementation may not actually perform such translations to or from UTF-16, but the semantics of this specification requires that the result of pattern matching be as if such translations were performed.
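The difference between a BMP pattern and a Unicode pattern can be observed with U+1D11E. This is an informal illustration; the variable names are chosen for this example only.

```javascript
const gClef = "\u{1D11E}"; // MUSICAL SYMBOL G CLEF: one code point, two UTF-16 code units

// BMP pattern: each "character" is a single 16-bit code unit.
const bmpSingle = /^.$/.test(gClef);
const bmpPair = /^..$/.test(gClef);

// Unicode pattern: each "character" is a UTF-16 encoded code point.
const unicodeSingle = /^.$/u.test(gClef);
```

Without the u flag the input is seen as two characters, so only the two-element pattern matches; with the u flag the input is a single character.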
22.2.2.1 Notation
The descriptions below use the following internal data structures:
-
A CharSetElement is one of the two following entities:
-
If regexpRecord.[[UnicodeSets]] is
false , then a CharSetElement is a character in the sense of the Pattern Semantics above. -
If regexpRecord.[[UnicodeSets]] is
true , then a CharSetElement is a sequence whose elements are characters in the sense of the Pattern Semantics above. This includes the empty sequence, sequences of one character, and sequences of more than one character. For convenience, when working with CharSetElements of this kind, an individual character is treated interchangeably with a sequence of one character.
-
If regexpRecord.[[UnicodeSets]] is
- A CharSet is a mathematical set of CharSetElements.
-
A CaptureRange is a
Record { [[StartIndex]], [[EndIndex]] } that represents the range of characters included in a capture, where [[StartIndex]] is aninteger representing the start index (inclusive) of the range within input, and [[EndIndex]] is aninteger representing the end index (exclusive) of the range within input. For anyCaptureRange , these indices must satisfy the invariant that [[StartIndex]] ≤ [[EndIndex]]. -
A MatchState is a
Record { [[Input]], [[EndIndex]], [[Captures]] } where [[Input]] is aList of characters representing the String being matched, [[EndIndex]] is aninteger , and [[Captures]] is aList of values, one for eachleft-capturing parenthesis in the pattern.MatchStates are used to represent partial match states in the regular expression matching algorithms. The [[EndIndex]] is one plus the index of the last input character matched so far by the pattern, while [[Captures]] holds the results of capturing parentheses. The nth element of [[Captures]] is either aCaptureRange representing the range of characters captured by the nth set of capturing parentheses, orundefined if the nth set of capturing parentheses hasn't been reached yet. Due to backtracking, manyMatchStates may be in use at any time during the matching process. -
A MatcherContinuation is an
Abstract Closure that takes oneMatchState argument and returns either aMatchState orfailure . TheMatcherContinuation attempts to match the remaining portion (specified by the closure's captured values) of the pattern against input, starting at the intermediate state given by itsMatchState argument. If the match succeeds, theMatcherContinuation returns the finalMatchState that it reached; if the match fails, theMatcherContinuation returnsfailure . -
A Matcher is an
Abstract Closure that takes two arguments—aMatchState and aMatcherContinuation —and returns either aMatchState orfailure . AMatcher attempts to match a middle subpattern (specified by the closure's captured values) of the pattern against theMatchState 's [[Input]], starting at the intermediate state given by itsMatchState argument. TheMatcherContinuation argument should be a closure that matches the rest of the pattern. After matching the subpattern of a pattern to obtain a newMatchState , theMatcher then callsMatcherContinuation on that newMatchState to test if the rest of the pattern can match as well. If it can, theMatcher returns theMatchState returned byMatcherContinuation ; if not, theMatcher may try different choices at its choice points, repeatedly callingMatcherContinuation until it either succeeds or all possibilities have been exhausted.
22.2.2.1.1 RegExp Records
A RegExp Record is a
It has the following fields:
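| Field Name | Value | Meaning |
|---|---|---|
| [[IgnoreCase]] | a Boolean | indicates whether "i" appears in the RegExp's flags |
| [[Multiline]] | a Boolean | indicates whether "m" appears in the RegExp's flags |
| [[DotAll]] | a Boolean | indicates whether "s" appears in the RegExp's flags |
| [[Unicode]] | a Boolean | indicates whether "u" appears in the RegExp's flags |
| [[UnicodeSets]] | a Boolean | indicates whether "v" appears in the RegExp's flags |
| [[CapturingGroupsCount]] | a non-negative integer | the number of left-capturing parentheses in the RegExp's pattern |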
22.2.2.2 Runtime Semantics: CompilePattern
The
- Let m be CompileSubpattern of Disjunction with arguments regexpRecord and forward.
- Return a new Abstract Closure with parameters (input, index) that captures regexpRecord and m and performs the following steps when called:
  - Assert: input is a List of characters.
  - Assert: 0 ≤ index ≤ the number of elements in input.
  - Let c be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let capability be a List of regexpRecord.[[CapturingGroupsCount]] undefined values, indexed 1 through regexpRecord.[[CapturingGroupsCount]].
  - Let x be the MatchState { [[Input]]: input, [[EndIndex]]: index, [[Captures]]: capability }.
  - Return m(x, c).
A Pattern compiles to an
22.2.2.3 Runtime Semantics: CompileSubpattern
The
This section is amended in
It is defined piecewise over the following productions:
- Let m1 be CompileSubpattern of Alternative with arguments regexpRecord and direction.
- Let m2 be CompileSubpattern of Disjunction with arguments regexpRecord and direction.
- Return MatchTwoAlternatives(m1, m2).
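The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,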
/a|ab/.exec("abc")
returns the result "a" and not "ab". Moreover,
/((a)|(ab))((c)|(bc))/.exec("abc")
returns the array
["abc", "a", "a", undefined, "bc", undefined, "bc"]
and not
["abc", "ab", undefined, "ab", "c", "c", undefined]
The order in which the two alternatives are tried is independent of the value of direction.
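The two examples above can be run as written. The helper sameArray is a hypothetical utility added here only to compare results that contain undefined entries.

```javascript
// Hypothetical helper: element-wise comparison that treats undefined entries as values.
function sameArray(actual, expected) {
  return actual.length === expected.length &&
    expected.every((value, i) => actual[i] === value);
}

// The left alternative matches first, so the shorter match wins.
const first = /a|ab/.exec("abc");

// Groups inside a skipped alternative are reported as undefined.
const groups = Array.from(/((a)|(ab))((c)|(bc))/.exec("abc"));
const expectedGroups = ["abc", "a", "a", undefined, "bc", undefined, "bc"];
const groupsMatch = sameArray(groups, expectedGroups);
```

The second result shows that the first group settles on "a" and the sequel then takes its second alternative, rather than the first group backtracking to "ab".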
- Return
EmptyMatcher ().
- Let m1 be CompileSubpattern of Alternative with arguments regexpRecord and direction.
- Let m2 be CompileSubpattern of Term with arguments regexpRecord and direction.
- Return MatchSequence(m1, m2, direction).
Consecutive
- Return
CompileAssertion ofAssertion with argument regexpRecord.
The resulting
- Return
CompileAtom ofAtom with arguments regexpRecord and direction.
- Let m be CompileAtom of Atom with arguments regexpRecord and direction.
- Let q be CompileQuantifier of Quantifier.
- Assert: q.[[Min]] ≤ q.[[Max]].
- Let parenIndex be CountLeftCapturingParensBefore(Term).
- Let parenCount be CountLeftCapturingParensWithin(Atom).
- Return a new Matcher with parameters (x, c) that captures m, q, parenIndex, and parenCount and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Return RepeatMatcher(m, q.[[Min]], q.[[Max]], q.[[Greedy]], x, c, parenIndex, parenCount).
22.2.2.3.1 RepeatMatcher ( m, min, max, greedy, matchState, continue, parenIndex, parenCount )
The abstract operation RepeatMatcher takes arguments m (a
- If max = 0, return continue(matchState).
- Let d be a new MatcherContinuation with parameters (y) that captures m, min, max, greedy, matchState, continue, parenIndex, and parenCount and performs the following steps when called:
  - Assert: y is a MatchState.
  - If min = 0 and y.[[EndIndex]] = matchState.[[EndIndex]], return failure.
  - If min = 0, let min2 be 0; else let min2 be min - 1.
  - If max = +∞, let max2 be +∞; else let max2 be max - 1.
  - Return RepeatMatcher(m, min2, max2, greedy, y, continue, parenIndex, parenCount).
- Let capability be a copy of matchState.[[Captures]].
- For each integer k in the inclusive interval from parenIndex + 1 to parenIndex + parenCount, set capability[k] to undefined.
- Let input be matchState.[[Input]].
- Let e be matchState.[[EndIndex]].
- Let xr be the MatchState { [[Input]]: input, [[EndIndex]]: e, [[Captures]]: capability }.
- If min ≠ 0, return m(xr, d).
- If greedy is false, then
  - Let z be continue(matchState).
  - If z is not failure, return z.
  - Return m(xr, d).
- Let z be m(xr, d).
- If z is not failure, return z.
- Return continue(matchState).
An
If the
Compare
/a[a-z]{2,4}/.exec("abcdefghi")
which returns "abcde" with
/a[a-z]{2,4}?/.exec("abcdefghi")
which returns "abc".
Consider also
/(aa|aabaac|ba|b|c)*/.exec("aabaac")
which, by the choice point ordering above, returns the array
["aaba", "ba"]
and not any of:
["aabaac", "aabaac"]
["aabaac", "c"]
The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:
"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1")
which returns the gcd in unary notation "aaaaa".
Step
/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")
which returns the array
["zaacbbbcac", "z", "ac", "a", undefined, "c"]
and not
["zaacbbbcac", "z", "ac", "a", "bbb", "c"]
because each iteration of the outermost * clears all captured Strings contained in the quantified
Step
/(a*)*/.exec("b")
or the slightly more complicated:
/(a*)b\1+/.exec("baaaac")
which returns the array
["b", ""]
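The examples in this subclause can be executed together. This is an informal check of the results quoted above, with variable names chosen for this example only.

```javascript
// Greedy and non-greedy forms of the same quantifier.
const greedy = /a[a-z]{2,4}/.exec("abcdefghi")[0];
const lazy = /a[a-z]{2,4}?/.exec("abcdefghi")[0];

// Choice point ordering inside a repeated group.
const choice = Array.from(/(aa|aabaac|ba|b|c)*/.exec("aabaac"));

// Greatest common divisor of 10 and 15 in unary notation.
const gcd = "aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/, "$1");

// Captures inside the quantified atom are cleared on each iteration.
const cleared = Array.from(/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac"));

// Once the minimum is satisfied, an iteration that matches the empty String is rejected.
const emptyIteration = Array.from(/(a*)b\1+/.exec("baaaac"));
```

In cleared, the fourth capture is undefined because the final iteration of the outer quantifier did not use the (b+) group, even though an earlier iteration did.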
22.2.2.3.2 EmptyMatcher ( )
The abstract operation EmptyMatcher takes no arguments and returns a
- Return a new Matcher with parameters (matchState, continue) that captures nothing and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Return continue(matchState).
22.2.2.3.3 MatchTwoAlternatives ( m1, m2 )
The abstract operation MatchTwoAlternatives takes arguments m1 (a
- Return a new Matcher with parameters (matchState, continue) that captures m1 and m2 and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let result be m1(matchState, continue).
  - If result is not failure, return result.
  - Return m2(matchState, continue).
22.2.2.3.4 MatchSequence ( m1, m2, direction )
The abstract operation MatchSequence takes arguments m1 (a
- If direction is forward, then
  - Return a new Matcher with parameters (matchState, continue) that captures m1 and m2 and performs the following steps when called:
    - Assert: matchState is a MatchState.
    - Assert: continue is a MatcherContinuation.
    - Let d be a new MatcherContinuation with parameters (y) that captures continue and m2 and performs the following steps when called:
      - Assert: y is a MatchState.
      - Return m2(y, continue).
    - Return m1(matchState, d).
- Assert: direction is backward.
- Return a new Matcher with parameters (matchState, continue) that captures m1 and m2 and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures continue and m1 and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return m1(y, continue).
  - Return m2(matchState, d).
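The continuation-passing structure of EmptyMatcher, MatchTwoAlternatives, and MatchSequence can be sketched with ordinary closures. This is a simplified model, not an implementation of the specification: FAILURE, charMatcher, and run are hypothetical names, captures are omitted, and a MatchState is modelled as a plain object with input and endIndex properties.

```javascript
const FAILURE = Symbol("failure");

// EmptyMatcher: consumes nothing and hands the state to the continuation.
const emptyMatcher = () => (state, cont) => cont(state);

// Hypothetical single-character matcher so the combinators have something to combine.
const charMatcher = (ch, direction = "forward") => (state, cont) => {
  const e = state.endIndex;
  const f = direction === "forward" ? e + 1 : e - 1;
  if (f < 0 || f > state.input.length) return FAILURE;
  if (state.input[Math.min(e, f)] !== ch) return FAILURE;
  return cont({ ...state, endIndex: f });
};

// MatchTwoAlternatives: try m1 with the full continuation, then fall back to m2.
const matchTwoAlternatives = (m1, m2) => (state, cont) => {
  const result = m1(state, cont);
  return result !== FAILURE ? result : m2(state, cont);
};

// MatchSequence: chain the second matcher into the first one's continuation;
// the order of evaluation is reversed when matching backward.
const matchSequence = (m1, m2, direction = "forward") => (state, cont) =>
  direction === "forward"
    ? m1(state, (y) => m2(y, cont))
    : m2(state, (y) => m1(y, cont));

// Runs a matcher from a given index with the identity continuation.
function run(matcher, text, index = 0) {
  return matcher({ input: Array.from(text), endIndex: index }, (y) => y);
}

// Model of (a|ab)c: the first alternative fails in the sequel, so the second is tried.
const aOrAb = matchTwoAlternatives(
  charMatcher("a"),
  matchSequence(charMatcher("a"), charMatcher("b"))
);
const pattern = matchSequence(aOrAb, charMatcher("c"));

// Model of the sequence "ab" matched backward, starting from the end of the input.
const backward = matchSequence(
  charMatcher("a", "backward"),
  charMatcher("b", "backward"),
  "backward"
);
```

Because each alternative receives the continuation for the rest of the pattern, backtracking falls out of ordinary function returns: a failure in the sequel propagates back to the nearest choice point.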
22.2.2.4 Runtime Semantics: CompileAssertion
The
This section is amended in
It is defined piecewise over the following productions:
- Return a new Matcher with parameters (matchState, continue) that captures regexpRecord and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let input be matchState.[[Input]].
  - Let e be matchState.[[EndIndex]].
  - If e = 0, or if regexpRecord.[[Multiline]] is true and the character input[e - 1] is matched by LineTerminator, then
    - Return continue(matchState).
  - Return failure.
Even when the y flag is used with a pattern, ^ always matches only at the beginning of input, or (if regexpRecord.[[Multiline]] is
- Return a new Matcher with parameters (matchState, continue) that captures regexpRecord and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let input be matchState.[[Input]].
  - Let e be matchState.[[EndIndex]].
  - Let inputLength be the number of elements in input.
  - If e = inputLength, or if regexpRecord.[[Multiline]] is true and the character input[e] is matched by LineTerminator, then
    - Return continue(matchState).
  - Return failure.
- Return a new Matcher with parameters (matchState, continue) that captures regexpRecord and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let input be matchState.[[Input]].
  - Let e be matchState.[[EndIndex]].
  - Let a be IsWordChar(regexpRecord, input, e - 1).
  - Let b be IsWordChar(regexpRecord, input, e).
  - If a is true and b is false, or if a is false and b is true, return continue(matchState).
  - Return failure.
- Return a new Matcher with parameters (matchState, continue) that captures regexpRecord and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let input be matchState.[[Input]].
  - Let e be matchState.[[EndIndex]].
  - Let a be IsWordChar(regexpRecord, input, e - 1).
  - Let b be IsWordChar(regexpRecord, input, e).
  - If a is true and b is true, or if a is false and b is false, return continue(matchState).
  - Return failure.
- Let m be CompileSubpattern of Disjunction with arguments regexpRecord and forward.
- Return a new Matcher with parameters (matchState, continue) that captures m and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let result be m(matchState, d).
  - If result is failure, return failure.
  - Assert: result is a MatchState.
  - Let capability be result.[[Captures]].
  - Let input be matchState.[[Input]].
  - Let xe be matchState.[[EndIndex]].
  - Let z be the MatchState { [[Input]]: input, [[EndIndex]]: xe, [[Captures]]: capability }.
  - Return continue(z).
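The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.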
For example,
/(?=(a+))/.exec("baaabac")
matches the empty String immediately after the first b and therefore returns the array:
["", "aaa"]
To illustrate the lack of backtracking into the lookahead, consider:
/(?=(a+))a*b\1/.exec("baaabac")
This expression returns
["aba", "a"]
and not:
["aaaba", "a"]
- Let m be CompileSubpattern of Disjunction with arguments regexpRecord and forward.
- Return a new Matcher with parameters (matchState, continue) that captures m and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let result be m(matchState, d).
  - If result is not failure, return failure.
  - Return continue(matchState).
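The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern to succeed. For example,
/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")
looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array: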
["baaabaac", "ba", undefined, "abaac"]
- Let m be CompileSubpattern of Disjunction with arguments regexpRecord and backward.
- Return a new Matcher with parameters (matchState, continue) that captures m and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let result be m(matchState, d).
  - If result is failure, return failure.
  - Assert: result is a MatchState.
  - Let capability be result.[[Captures]].
  - Let input be matchState.[[Input]].
  - Let xe be matchState.[[EndIndex]].
  - Let z be the MatchState { [[Input]]: input, [[EndIndex]]: xe, [[Captures]]: capability }.
  - Return continue(z).

- Let m be CompileSubpattern of Disjunction with arguments regexpRecord and backward.
- Return a new Matcher with parameters (matchState, continue) that captures m and performs the following steps when called:
  - Assert: matchState is a MatchState.
  - Assert: continue is a MatcherContinuation.
  - Let d be a new MatcherContinuation with parameters (y) that captures nothing and performs the following steps when called:
    - Assert: y is a MatchState.
    - Return y.
  - Let result be m(matchState, d).
  - If result is not failure, return failure.
  - Return continue(matchState).
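The two backward-direction algorithms above describe the lookbehind forms, written (?<= ) and (?<! ) in pattern source. The following sketch, using made-up sample strings, suggests how they behave: a lookbehind consumes no input, and its Disjunction is matched right to left, which affects how greedy captures are divided.

```javascript
// Positive lookbehind: digits preceded by "$"; the "$" is not part of the match.
const price = /(?<=\$)\d+/.exec("cost: $42");      // ["42"]

// Negative lookbehind: digits not preceded by "$" or by another digit.
const plain = /(?<![$\d])\d+/.exec("$42 and 17");   // ["17"]

// Backward matching: the rightmost group is matched first and is greedy,
// so it takes as many digits as it can while leaving one for the other group.
const split = /(?<=(\d+)(\d+))$/.exec("1053");      // ["", "1", "053"]
```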
22.2.2.4.1 IsWordChar ( regexpRecord, input, e )
The abstract operation IsWordChar takes arguments regexpRecord (a RegExp Record), input (a List of characters), and e (an integer) and returns a Boolean. It performs the following steps when called:
- Let inputLength be the number of elements in input.
- If e = -1 or e = inputLength, return false.
- Let char be the character input[e].
- If WordCharacters(regexpRecord) contains char, return true.
- Return false.
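The steps of IsWordChar, together with the word-boundary assertions that call it, can be sketched as follows. This is an illustrative approximation only: the helper names isWordChar and isWordBoundary are hypothetical, the input is a plain string, and the i flag is assumed to be absent so that WordCharacters reduces to the ASCII word characters.

```javascript
// Hypothetical helper mirroring IsWordChar when no case folding applies.
function isWordChar(input, e) {
  // Positions just outside the input are never word characters.
  if (e === -1 || e === input.length) return false;
  return /^[A-Za-z0-9_]$/.test(input[e]);
}

// \b succeeds at e exactly when the two sides disagree; \B when they agree.
function isWordBoundary(input, e) {
  return isWordChar(input, e - 1) !== isWordChar(input, e);
}

const text = "hi there";
const boundaries = [];
for (let e = 0; e <= text.length; e++) {
  if (isWordBoundary(text, e)) boundaries.push(e);
}
// boundaries is [0, 2, 3, 8], the same positions at which /\b/g matches.
```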
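22.2.2.5 Runtime Semantics: CompileQuantifier
The syntax-directed operation CompileQuantifier takes no arguments and returns a Record with fields [[Min]] (a non-negative integer), [[Max]] (a non-negative integer or +∞), and [[Greedy]] (a Boolean). It is defined piecewise over the following productions:
Quantifier :: QuantifierPrefix
- Let qp be CompileQuantifierPrefix of QuantifierPrefix.
- Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: true }.
Quantifier :: QuantifierPrefix ?
- Let qp be CompileQuantifierPrefix of QuantifierPrefix.
- Return the Record { [[Min]]: qp.[[Min]], [[Max]]: qp.[[Max]], [[Greedy]]: false }.
22.2.2.6 Runtime Semantics: CompileQuantifierPrefix
The syntax-directed operation CompileQuantifierPrefix takes no arguments and returns a Record with fields [[Min]] (a non-negative integer) and [[Max]] (a non-negative integer or +∞). It is defined piecewise over the following productions:
QuantifierPrefix :: *
- Return the Record { [[Min]]: 0, [[Max]]: +∞ }.
QuantifierPrefix :: +
- Return the Record { [[Min]]: 1, [[Max]]: +∞ }.
QuantifierPrefix :: ?
- Return the Record { [[Min]]: 0, [[Max]]: 1 }.
QuantifierPrefix :: { DecimalDigits }
- Let i be the MV of DecimalDigits (see 12.9.3).
- Return the Record { [[Min]]: i, [[Max]]: i }.
QuantifierPrefix :: { DecimalDigits ,}
- Let i be the MV of DecimalDigits.
- Return the Record { [[Min]]: i, [[Max]]: +∞ }.
QuantifierPrefix :: { DecimalDigits , DecimalDigits }
- Let i be the MV of the first DecimalDigits.
- Let j be the MV of the second DecimalDigits.
- Return the Record { [[Min]]: i, [[Max]]: j }.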
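The mapping from quantifier source text to the Records above can be sketched as a small function. This is a hedged illustration, not how an engine is required to work: compileQuantifier is a hypothetical name, the fields min, max, and greedy stand in for [[Min]], [[Max]], and [[Greedy]], and the argument is assumed to be a syntactically valid quantifier.

```javascript
// Hypothetical helper: quantifier source text -> { min, max, greedy }.
function compileQuantifier(source) {
  // A "?" that follows a complete prefix marks the quantifier as non-greedy.
  const lazy = source.length > 1 && source.endsWith("?");
  const prefix = lazy ? source.slice(0, -1) : source;
  const greedy = !lazy;
  if (prefix === "*") return { min: 0, max: Infinity, greedy };
  if (prefix === "+") return { min: 1, max: Infinity, greedy };
  if (prefix === "?") return { min: 0, max: 1, greedy };
  const m = /^\{(\d+)(,(\d*))?\}$/.exec(prefix);
  if (m === null) throw new SyntaxError(`not a quantifier: ${source}`);
  const min = Number(m[1]);
  if (m[2] === undefined) return { min, max: min, greedy };  // {n}
  if (m[3] === "") return { min, max: Infinity, greedy };     // {n,}
  return { min, max: Number(m[3]), greedy };                  // {n,m}
}

const star = compileQuantifier("*");          // { min: 0, max: Infinity, greedy: true }
const lazyOptional = compileQuantifier("??"); // { min: 0, max: 1, greedy: false }
const bounded = compileQuantifier("{2,5}");   // { min: 2, max: 5, greedy: true }
const openLazy = compileQuantifier("{3,}?");  // { min: 3, max: Infinity, greedy: false }
```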
22.2.2.7 Runtime Semantics: CompileAtom
The
This section is amended in
It is defined piecewise over the following productions:
- Let char be the character matched by
PatternCharacter . - Let charSet be a one-element
CharSet containing the character char. - Return
CharacterSetMatcher (regexpRecord, charSet,false , direction).
- Let charSet be
AllCharacters (regexpRecord). - If regexpRecord.[[DotAll]] is not
true , then- Remove from charSet all characters corresponding to a code point on the right-hand side of the
LineTerminator production.
- Remove from charSet all characters corresponding to a code point on the right-hand side of the
- Return
CharacterSetMatcher (regexpRecord, charSet,false , direction).
- Let cc be
CompileCharacterClass ofCharacterClass with argument regexpRecord. - Let cs be cc.[[CharSet]].
- If regexpRecord.[[UnicodeSets]] is
false or everyCharSetElement of cs consists of a single character (including if cs is empty), returnCharacterSetMatcher (regexpRecord, cs, cc.[[Invert]], direction). Assert : cc.[[Invert]] isfalse .- Let listOfMatchers be an empty
List ofMatchers . - For each
CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do- Let cs2 be a one-element
CharSet containing the last code point of s. - Let m2 be
CharacterSetMatcher (regexpRecord, cs2,false , direction). - For each code point c1 in s, iterating backwards from its second-to-last code point, do
- Let cs1 be a one-element
CharSet containing c1. - Let m1 be
CharacterSetMatcher (regexpRecord, cs1,false , direction). - Set m2 to
MatchSequence (m1, m2, direction).
- Let cs1 be a one-element
- Append m2 to listOfMatchers.
- Let cs2 be a one-element
- Let singles be the
CharSet containing everyCharSetElement of cs that consists of a single character. - Append
CharacterSetMatcher (regexpRecord, singles,false , direction) to listOfMatchers. - If cs contains the empty sequence of characters, append
EmptyMatcher () to listOfMatchers. - Let m2 be the last
Matcher in listOfMatchers. - For each
Matcher m1 of listOfMatchers, iterating backwards from its second-to-last element, do- Set m2 to
MatchTwoAlternatives (m1, m2).
- Set m2 to
- Return m2.
- Let m be
CompileSubpattern ofDisjunction with arguments regexpRecord and direction. - Let parenIndex be
CountLeftCapturingParensBefore (Atom ). - Return a new
Matcher with parameters (x, c) that captures direction, m, and parenIndex and performs the following steps when called:Assert : x is aMatchState .Assert : c is aMatcherContinuation .- Let d be a new
MatcherContinuation with parameters (y) that captures x, c, direction, and parenIndex and performs the following steps when called:Assert : y is aMatchState .- Let capability be a copy of y.[[Captures]].
- Let input be x.[[Input]].
- Let xe be x.[[EndIndex]].
- Let ye be y.[[EndIndex]].
- If direction is
forward , thenAssert : xe ≤ ye.- Let r be the
CaptureRange { [[StartIndex]]: xe, [[EndIndex]]: ye }.
- Else,
Assert : direction isbackward .Assert : ye ≤ xe.- Let r be the
CaptureRange { [[StartIndex]]: ye, [[EndIndex]]: xe }.
- Set capability[parenIndex + 1] to r.
- Let z be the
MatchState { [[Input]]: input, [[EndIndex]]: ye, [[Captures]]: capability }. - Return c(z).
- Return m(x, d).
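Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a non-zero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching Abstract Closure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.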
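A brief sketch of the difference between the two forms, using an arbitrary sample pattern: capturing groups contribute entries to the result array and can be referenced from a replace String, while (?: ) groups only group.

```javascript
// Two capturing groups and one non-capturing group.
const captured = /(\d+)-(?:x|y)-(\d+)/.exec("12-x-34"); // ["12-x-34", "12", "34"]

// Captures are available to a replace String as $1, $2, and so on.
const swapped = "12-x-34".replace(/(\d+)-(?:x|y)-(\d+)/, "$2-$1"); // "34-12"
```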
- Let addModifiers be the
source text matched by RegularExpressionModifiers . - Let removeModifiers be the empty String.
- Let modifiedRer be
UpdateModifiers (regexpRecord,CodePointsToString (addModifiers), removeModifiers). - Return
CompileSubpattern ofDisjunction with arguments modifiedRer and direction.
- Let addModifiers be the
source text matched by the firstRegularExpressionModifiers . - Let removeModifiers be the
source text matched by the secondRegularExpressionModifiers . - Let modifiedRer be
UpdateModifiers (regexpRecord,CodePointsToString (addModifiers),CodePointsToString (removeModifiers)). - Return
CompileSubpattern ofDisjunction with arguments modifiedRer and direction.
- Let n be the
CapturingGroupNumber ofDecimalEscape . Assert : n ≤ regexpRecord.[[CapturingGroupsCount]].- Return
BackreferenceMatcher (regexpRecord, « n », direction).
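An escape sequence of the form \ followed by a non-zero decimal number n matches the result of the nth set of capturing parentheses. It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.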
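The behaviour of numbered backreferences can be illustrated with a few checks. The sample patterns are arbitrary, and the u flag is used for the error case on the assumption that an implementation may otherwise apply web-compatibility extensions to out-of-range escapes.

```javascript
// \1 must repeat exactly what group 1 captured.
const repeated = /(["'])x\1/.test("'x'");     // true
const mismatched = /(["'])x\1/.test("'x\"");  // false

// Group 1 does not participate when the "b" alternative is taken,
// so \1 succeeds without consuming any input.
const unset = /(?:(a)|b)\1/.exec("b");         // ["b", undefined]

// Referring to a group that does not exist is rejected under the u flag.
let tooFewGroups = false;
try {
  new RegExp("(a)\\2", "u");
} catch (err) {
  tooFewGroups = err instanceof SyntaxError;   // true
}
```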
- Let charValue be the
CharacterValue ofCharacterEscape . - Let char be the character whose character value is charValue.
- Let charSet be a one-element
CharSet containing the character char. - Return
CharacterSetMatcher (regexpRecord, charSet,false , direction).
- Let cs be
CompileToCharSet ofCharacterClassEscape with argument regexpRecord. - If regexpRecord.[[UnicodeSets]] is
false or everyCharSetElement of cs consists of a single character (including if cs is empty), returnCharacterSetMatcher (regexpRecord, cs,false , direction). - Let listOfMatchers be an empty
List ofMatchers . - For each
CharSetElement s in cs containing more than 1 character, iterating in descending order of length, do- Let cs2 be a one-element
CharSet containing the last code point of s. - Let m2 be
CharacterSetMatcher (regexpRecord, cs2,false , direction). - For each code point c1 in s, iterating backwards from its second-to-last code point, do
- Let cs1 be a one-element
CharSet containing c1. - Let m1 be
CharacterSetMatcher (regexpRecord, cs1,false , direction). - Set m2 to
MatchSequence (m1, m2, direction).
- Let cs1 be a one-element
- Append m2 to listOfMatchers.
- Let cs2 be a one-element
- Let singles be the
CharSet containing everyCharSetElement of cs that consists of a single character. - Append
CharacterSetMatcher (regexpRecord, singles,false , direction) to listOfMatchers. - If cs contains the empty sequence of characters, append
EmptyMatcher () to listOfMatchers. - Let m2 be the last
Matcher in listOfMatchers. - For each
Matcher m1 of listOfMatchers, iterating backwards from its second-to-last element, do- Set m2 to
MatchTwoAlternatives (m1, m2).
- Set m2 to
- Return m2.
- Let matchingGroupSpecifiers be
GroupSpecifiersThatMatch (GroupName ). - Let parenIndices be a new empty
List . - For each
GroupSpecifier groupSpecifier of matchingGroupSpecifiers, do- Let parenIndex be
CountLeftCapturingParensBefore (groupSpecifier). - Append parenIndex to parenIndices.
- Let parenIndex be
- Return
BackreferenceMatcher (regexpRecord, parenIndices, direction).
22.2.2.7.1 CharacterSetMatcher ( regexpRecord, charSet, invert, direction )
The abstract operation CharacterSetMatcher takes arguments regexpRecord (a RegExp Record), charSet (a CharSet), invert (a Boolean), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:
- If regexpRecord.[[UnicodeSets]] is true, then
  - Assert: invert is false.
  - Assert: Every CharSetElement of charSet consists of a single character.
- Return a new Matcher with parameters (x, c) that captures regexpRecord, charSet, invert, and direction and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let input be x.[[Input]].
  - Let endIndex be x.[[EndIndex]].
  - If direction is forward, let f be endIndex + 1.
  - Else, let f be endIndex - 1.
  - Let inputLength be the number of elements in input.
  - If f < 0 or f > inputLength, return failure.
  - Let index be min(endIndex, f).
  - Let char be the character input[index].
  - Let cc be Canonicalize(regexpRecord, char).
  - If there exists a CharSetElement in charSet containing exactly one character a such that Canonicalize(regexpRecord, a) is cc, let found be true; else let found be false.
  - If invert is false and found is false, return failure.
  - If invert is true and found is true, return failure.
  - Let capability be x.[[Captures]].
  - Let y be the MatchState { [[Input]]: input, [[EndIndex]]: f, [[Captures]]: capability }.
  - Return c(y).
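The continuation-passing shape of this operation can be sketched in JavaScript. Everything here is an illustrative assumption rather than part of the specification: a MatchState is modelled as a plain object with input, endIndex, and captures fields, the CharSet as a Set of single characters, failure as a FAILURE symbol, and Canonicalize as an optional callback that defaults to the identity.

```javascript
const FAILURE = Symbol("failure");

// Hypothetical model of CharacterSetMatcher: returns a matcher (x, c).
function characterSetMatcher(charSet, invert, direction, canonicalize = (ch) => ch) {
  return (x, c) => {
    const { input, endIndex, captures } = x;
    const f = direction === "forward" ? endIndex + 1 : endIndex - 1;
    if (f < 0 || f > input.length) return FAILURE;
    // The character examined lies between endIndex and f.
    const index = Math.min(endIndex, f);
    const cc = canonicalize(input[index]);
    let found = false;
    for (const a of charSet) {
      if (canonicalize(a) === cc) { found = true; break; }
    }
    // Fail when (not inverted and not found) or (inverted and found).
    if (invert === found) return FAILURE;
    return c({ input, endIndex: f, captures });
  };
}

const identity = (y) => y;
const start = { input: [..."ab"], endIndex: 0, captures: [] };
const forward = characterSetMatcher(new Set(["a"]), false, "forward")(start, identity);
const backward = characterSetMatcher(new Set(["a"]), false, "backward")(forward, identity);
const inverted = characterSetMatcher(new Set(["a"]), true, "forward")(start, identity);
// forward.endIndex is 1, backward.endIndex is 0, inverted is FAILURE.
```

Passing the continuation c explicitly is what lets later parts of a pattern reject a state and force earlier parts to try another alternative.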
22.2.2.7.2 BackreferenceMatcher ( regexpRecord, ns, direction )
The abstract operation BackreferenceMatcher takes arguments regexpRecord (a RegExp Record), ns (a List of positive integers), and direction (forward or backward) and returns a Matcher. It performs the following steps when called:
- Return a new Matcher with parameters (x, c) that captures regexpRecord, ns, and direction and performs the following steps when called:
  - Assert: x is a MatchState.
  - Assert: c is a MatcherContinuation.
  - Let input be x.[[Input]].
  - Let capability be x.[[Captures]].
  - Let r be undefined.
  - For each integer n of ns, do
    - If capability[n] is not undefined, then
      - Assert: r is undefined.
      - Set r to capability[n].
  - If r is undefined, return c(x).
  - Let endIndex be x.[[EndIndex]].
  - Let rs be r.[[StartIndex]].
  - Let re be r.[[EndIndex]].
  - Let length be re - rs.
  - If direction is forward, let f be endIndex + length.
  - Else, let f be endIndex - length.
  - Let inputLength be the number of elements in input.
  - If f < 0 or f > inputLength, return failure.
  - Let g be min(endIndex, f).
  - If there exists an integer i in the interval from 0 (inclusive) to length (exclusive) such that Canonicalize(regexpRecord, input[rs + i]) is not Canonicalize(regexpRecord, input[g + i]), return failure.
  - Let y be the MatchState { [[Input]]: input, [[EndIndex]]: f, [[Captures]]: capability }.
  - Return c(y).
22.2.2.7.3 Canonicalize ( regexpRecord, char )
The abstract operation Canonicalize takes arguments regexpRecord (a RegExp Record) and char (a character) and returns a character. It performs the following steps when called:
- If HasEitherUnicodeFlag(regexpRecord) is true and regexpRecord.[[IgnoreCase]] is true, then
  - If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for char, return the result of applying that mapping to char.
  - Return char.
- If regexpRecord.[[IgnoreCase]] is false, return char.
- Assert: char is a UTF-16 code unit.
- Let codePoint be the code point whose numeric value is the numeric value of char.
- Let u be toUppercase(« codePoint »), according to the Unicode Default Case Conversion algorithm.
- Let uString be CodePointsToString(u).
- If the length of uString ≠ 1, return char.
- Let codeUnit be uString's single code unit element.
- If the numeric value of char ≥ 128 and the numeric value of codeUnit < 128, return char.
- Return codeUnit.
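In case-insignificant matches when HasEitherUnicodeFlag(regexpRecord) is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode Standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, ß (U+00DF LATIN SMALL LETTER SHARP S) to ss or SS. It may however map code points outside the Basic Latin block to code points within it; for example, ſ (U+017F LATIN SMALL LETTER LONG S) case-folds to s (U+0073 LATIN SMALL LETTER S) and K (U+212A KELVIN SIGN) case-folds to k (U+006B LATIN SMALL LETTER K). Strings containing those code points are matched by regular expressions such as /[a-z]/ui.
In case-insignificant matches when HasEitherUnicodeFlag(regexpRecord) is false, the mapping is based on the Unicode Default Case Conversion algorithm toUppercase rather than toCasefold, which results in some subtle differences. For example, Ω (U+2126 OHM SIGN) is mapped by toUppercase to itself but by toCasefold to ω (U+03C9 GREEK SMALL LETTER OMEGA) along with Ω (U+03A9 GREEK CAPITAL LETTER OMEGA), so "\u2126" is matched by /[ω]/ui and /[\u03A9]/ui but not by /[ω]/i or /[\u03A9]/i. Also, no code point outside the Basic Latin block is mapped to a code point within it, so strings such as "\u017F" (ſ) and "\u212A" (K) are not matched by /[a-z]/i.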
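The differences described in these two notes can be observed directly. The sketch below also includes canonicalizeLegacy, a hypothetical helper that approximates Canonicalize for a single UTF-16 code unit when i is set and neither u nor v is; it assumes that String.prototype.toUpperCase applies the same locale-insensitive uppercase mapping as toUppercase.

```javascript
// Hypothetical approximation of Canonicalize without the u or v flag.
function canonicalizeLegacy(ch) {
  const upper = ch.toUpperCase();
  // A multi-unit result is ignored, e.g. "ß" uppercases to "SS".
  if (upper.length !== 1) return ch;
  // A non-ASCII code unit is never mapped into ASCII, e.g. "ſ" to "S".
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

const checks = {
  longSUnicode: /[a-z]/ui.test("\u017F"),   // true
  longSLegacy: /[a-z]/i.test("\u017F"),     // false
  kelvinUnicode: /[a-z]/ui.test("\u212A"),  // true
  kelvinLegacy: /[a-z]/i.test("\u212A"),    // false
  ohmUnicode: /[ω]/ui.test("\u2126"),       // true
  ohmLegacy: /[ω]/i.test("\u2126"),         // false
};
```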
22.2.2.7.4 UpdateModifiers ( regexpRecord, add, remove )
The abstract operation UpdateModifiers takes arguments regexpRecord (a RegExp Record), add (a String), and remove (a String) and returns a RegExp Record. It performs the following steps when called:
- Assert: add and remove have no elements in common.
- Let ignoreCase be regexpRecord.[[IgnoreCase]].
- Let multiline be regexpRecord.[[Multiline]].
- Let dotAll be regexpRecord.[[DotAll]].
- Let unicode be regexpRecord.[[Unicode]].
- Let unicodeSets be regexpRecord.[[UnicodeSets]].
- Let capturingGroupsCount be regexpRecord.[[CapturingGroupsCount]].
- If remove contains "i", set ignoreCase to false.
- Else if add contains "i", set ignoreCase to true.
- If remove contains "m", set multiline to false.
- Else if add contains "m", set multiline to true.
- If remove contains "s", set dotAll to false.
- Else if add contains "s", set dotAll to true.
- Return the RegExp Record { [[IgnoreCase]]: ignoreCase, [[Multiline]]: multiline, [[DotAll]]: dotAll, [[Unicode]]: unicode, [[UnicodeSets]]: unicodeSets, [[CapturingGroupsCount]]: capturingGroupsCount }.
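The steps above amount to producing a copy of the record in which only the three modifiable flags may change. A minimal sketch follows, with the RegExp Record modelled as a plain object whose camelCase field names are an assumption made for this example.

```javascript
// Hypothetical model of UpdateModifiers over a plain-object record.
function updateModifiers(record, add, remove) {
  // Mirrors the assertion that add and remove share no flags.
  for (const flag of add) {
    if (remove.includes(flag)) throw new Error(`flag ${flag} is both added and removed`);
  }
  // Removal wins, then addition, otherwise the current value is kept.
  const pick = (flag, current) =>
    remove.includes(flag) ? false : add.includes(flag) ? true : current;
  return {
    ...record,
    ignoreCase: pick("i", record.ignoreCase),
    multiline: pick("m", record.multiline),
    dotAll: pick("s", record.dotAll),
  };
}

const base = {
  ignoreCase: false, multiline: true, dotAll: false,
  unicode: true, unicodeSets: false, capturingGroupsCount: 2,
};
const modified = updateModifiers(base, "i", "m");
// modified.ignoreCase is true and modified.multiline is false; base is unchanged.
```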
22.2.2.8 Runtime Semantics: CompileCharacterClass
The
- Let charSet be
CompileToCharSet ofClassContents with argument regexpRecord. - Return the
Record { [[CharSet]]: charSet, [[Invert]]:false }.
- Let charSet be
CompileToCharSet ofClassContents with argument regexpRecord. - If regexpRecord.[[UnicodeSets]] is
true , then- Return the
Record { [[CharSet]]:CharacterComplement (regexpRecord, charSet), [[Invert]]:false }.
- Return the
- Return the
Record { [[CharSet]]: charSet, [[Invert]]:true }.
22.2.2.9 Runtime Semantics: CompileToCharSet
The
This section is amended in
It is defined piecewise over the following productions:
- Return the empty
CharSet .
- Let charSet be CompileToCharSet of
ClassAtom with argument regexpRecord. - Let otherSet be CompileToCharSet of
NonemptyClassRangesNoDash with argument regexpRecord. - Return the union of
CharSets charSet and otherSet.
- Let charSet be CompileToCharSet of the first
ClassAtom with argument regexpRecord. - Let otherSet be CompileToCharSet of the second
ClassAtom with argument regexpRecord. - Let remainingSet be CompileToCharSet of
ClassContents with argument regexpRecord. - Let rangeSet be
CharacterRange (charSet, otherSet). - Return the union of rangeSet and remainingSet.
- Let charSet be CompileToCharSet of
ClassAtomNoDash with argument regexpRecord. - Let otherSet be CompileToCharSet of
NonemptyClassRangesNoDash with argument regexpRecord. - Return the union of
CharSets charSet and otherSet.
- Let charSet be CompileToCharSet of
ClassAtomNoDash with argument regexpRecord. - Let otherSet be CompileToCharSet of
ClassAtom with argument regexpRecord. - Let remainingSet be CompileToCharSet of
ClassContents with argument regexpRecord. - Let rangeSet be
CharacterRange (charSet, otherSet). - Return the union of rangeSet and remainingSet.
Even if the pattern ignores case, the case of the two ends of a range is significant in determining which characters belong to the range. Thus, for example, the pattern /[E-F]/i matches only the letters E, F, e, and f, while the pattern /[E-f]/i matches all uppercase and lowercase letters in the Unicode Basic Latin block as well as the symbols [, \, ], ^, _, and `.
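The two patterns from this note can be checked directly; the test characters below are arbitrary choices.

```javascript
const narrow = /[E-F]/i;
const wide = /[E-f]/i;
const rangeResults = {
  narrowLower: narrow.test("e"),   // true
  narrowOther: narrow.test("g"),   // false
  narrowSymbol: narrow.test("_"),  // false
  wideLetter: wide.test("z"),      // true
  wideSymbol: wide.test("_"),      // true
};
```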
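A - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassContents, the beginning or end limit of a range specification, or immediately follows a range specification.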
- Return the
CharSet containing the single character-U+002D (HYPHEN-MINUS).
- Return the
CharSet containing the character matched bySourceCharacter .
- Let charValue be the
CharacterValue of thisClassEscape . - Let char be the character whose character value is charValue.
- Return the
CharSet containing the single character char.
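A ClassAtom can use any of the escape sequences that are allowed in the rest of the regular expression except for \b, \B, and backreferences. Inside a CharacterClass, \b means the backspace character, while \B and backreferences raise errors. Using a backreference inside a ClassAtom causes an error.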
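A short sketch of these rules follows. The helper isRejected is hypothetical, and the u flag is used for the error cases on the assumption that an implementation may otherwise accept such escapes through web-compatibility extensions.

```javascript
// Inside a class, \b denotes U+0008 (BACKSPACE), not a word boundary.
const backspace = /[\b]/.test("\u0008");  // true
const boundary = /[\b]/.test("a b");       // false

// Hypothetical helper: is the pattern rejected with a SyntaxError under the u flag?
function isRejected(source) {
  try {
    new RegExp(source, "u");
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

const classB = isRejected("[\\B]");             // true
const classBackreference = isRejected("(a)[\\1]"); // true
```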
- Return the ten-element
CharSet containing the characters0,1,2,3,4,5,6,7,8, and9.
- Let charSet be the
CharSet returned by .CharacterClassEscape :: d - Return
CharacterComplement (regexpRecord, charSet).
- Return the
CharSet containing all characters corresponding to a code point on the right-hand side of theWhiteSpace orLineTerminator productions.
- Let charSet be the
CharSet returned by .CharacterClassEscape :: s - Return
CharacterComplement (regexpRecord, charSet).
- Return
MaybeSimpleCaseFolding (regexpRecord,WordCharacters (regexpRecord)).
- Let charSet be the
CharSet returned by .CharacterClassEscape :: w - Return
CharacterComplement (regexpRecord, charSet).
- Return CompileToCharSet of
UnicodePropertyValueExpression with argument regexpRecord.
- Let charSet be CompileToCharSet of
UnicodePropertyValueExpression with argument regexpRecord. Assert : charSet contains only single code points.- Return
CharacterComplement (regexpRecord, charSet).
- Let ps be the
source text matched by UnicodePropertyName . - Let p be
UnicodeMatchProperty (regexpRecord, ps). Assert : p is aUnicode property name or property alias listed in the “Property name and aliases ” column ofTable 64 .- Let vs be the
source text matched by UnicodePropertyValue . - Let v be
UnicodeMatchPropertyValue (p, vs). - Let charSet be the
CharSet containing all Unicode code points whose character database definition includes the property p with value v. - Return
MaybeSimpleCaseFolding (regexpRecord, charSet).
- Let s be the
source text matched by LoneUnicodePropertyNameOrValue . - If
UnicodeMatchPropertyValue (General_Category, s) is a Unicode property value or property value alias for the General_Category (gc) property listed inPropertyValueAliases.txt, then- Return the
CharSet containing all Unicode code points whose character database definition includes the property “General_Category” with value s.
- Return the
- Let p be
UnicodeMatchProperty (regexpRecord, s). Assert : p is a binary Unicode property or binary property alias listed in the “Property name and aliases ” column ofTable 65 , or a binary Unicode property of strings listed in the “Property name ” column ofTable 66 .- Let charSet be the
CharSet containing all CharSetElements whose character database definition includes the property p with value “True”. - Return
MaybeSimpleCaseFolding (regexpRecord, charSet).
- Let charSet be CompileToCharSet of
ClassSetRange with argument regexpRecord. - If
ClassUnion is present, then- Let otherSet be CompileToCharSet of
ClassUnion with argument regexpRecord. - Return the union of
CharSets charSet and otherSet.
- Let otherSet be CompileToCharSet of
- Return charSet.
- Let charSet be CompileToCharSet of
ClassSetOperand with argument regexpRecord. - If
ClassUnion is present, then- Let otherSet be CompileToCharSet of
ClassUnion with argument regexpRecord. - Return the union of
CharSets charSet and otherSet.
- Let otherSet be CompileToCharSet of
- Return charSet.
- Let charSet be CompileToCharSet of the first
ClassSetOperand with argument regexpRecord. - Let otherSet be CompileToCharSet of the second
ClassSetOperand with argument regexpRecord. - Return the intersection of
CharSets charSet and otherSet.
- Let charSet be CompileToCharSet of the
ClassIntersection with argument regexpRecord. - Let otherSet be CompileToCharSet of the
ClassSetOperand with argument regexpRecord. - Return the intersection of
CharSets charSet and otherSet.
- Let charSet be CompileToCharSet of the first
ClassSetOperand with argument regexpRecord. - Let otherSet be CompileToCharSet of the second
ClassSetOperand with argument regexpRecord. - Return the
CharSet containing the CharSetElements of charSet which are not also CharSetElements of otherSet.
- Let charSet be CompileToCharSet of the
ClassSubtraction with argument regexpRecord. - Let otherSet be CompileToCharSet of the
ClassSetOperand with argument regexpRecord. - Return the
CharSet containing the CharSetElements of charSet which are not also CharSetElements of otherSet.
- Let charSet be CompileToCharSet of the first
ClassSetCharacter with argument regexpRecord. - Let otherSet be CompileToCharSet of the second
ClassSetCharacter with argument regexpRecord. - Return
MaybeSimpleCaseFolding (regexpRecord,CharacterRange (charSet, otherSet)).
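The result will often consist of two or more ranges. When UnicodeSets is true and IgnoreCase is true, then MaybeSimpleCaseFolding([Ā-č]) will include only the odd-numbered code points of that range.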
- Let charSet be CompileToCharSet of
ClassSetCharacter with argument regexpRecord. - Return
MaybeSimpleCaseFolding (regexpRecord, charSet).
- Let charSet be CompileToCharSet of
ClassStringDisjunction with argument regexpRecord. - Return
MaybeSimpleCaseFolding (regexpRecord, charSet).
- Return CompileToCharSet of
NestedClass with argument regexpRecord.
- Return CompileToCharSet of
ClassContents with argument regexpRecord.
- Let charSet be CompileToCharSet of
ClassContents with argument regexpRecord. - Return
CharacterComplement (regexpRecord, charSet).
- Return CompileToCharSet of
CharacterClassEscape with argument regexpRecord.
- Return CompileToCharSet of
ClassStringDisjunctionContents with argument regexpRecord.
- Let s be
CompileClassSetString ofClassString with argument regexpRecord. - Return the
CharSet containing the one string s.
- Let s be
CompileClassSetString ofClassString with argument regexpRecord. - Let charSet be the
CharSet containing the one string s. - Let otherSet be CompileToCharSet of
ClassStringDisjunctionContents with argument regexpRecord. - Return the union of
CharSets charSet and otherSet.
- Let charValue be the
CharacterValue of thisClassSetCharacter . - Let char be the character whose character value is charValue.
- Return the
CharSet containing the single character char.
- Return the
CharSet containing the single character U+0008 (BACKSPACE).
22.2.2.9.1 CharacterRange ( charSet, otherSet )
The abstract operation CharacterRange takes arguments charSet (a CharSet) and otherSet (a CharSet) and returns a CharSet. It performs the following steps when called:
- Assert: charSet and otherSet each contain exactly one character.
- Let a be the one character in CharSet charSet.
- Let b be the one character in CharSet otherSet.
- Let i be the character value of character a.
- Let j be the character value of character b.
- Assert: i ≤ j.
- Return the CharSet containing all characters with a character value in the inclusive interval from i to j.
22.2.2.9.2 HasEitherUnicodeFlag ( regexpRecord )
The abstract operation HasEitherUnicodeFlag takes argument regexpRecord (a RegExp Record) and returns a Boolean. It performs the following steps when called:
- If regexpRecord.[[Unicode]] is true or regexpRecord.[[UnicodeSets]] is true, return true.
- Return false.
22.2.2.9.3 WordCharacters ( regexpRecord )
The abstract operation WordCharacters takes argument regexpRecord (a RegExp Record) and returns a CharSet. Returns a CharSet containing the characters considered to be "word characters" for the purposes of \b, \B, \w, and \W. It performs the following steps when called:
- Let basicWordChars be the CharSet containing every character in the ASCII word characters.
- Let extraWordChars be the CharSet containing all characters c such that c is not in basicWordChars but Canonicalize(regexpRecord, c) is in basicWordChars.
- Assert: extraWordChars is empty unless HasEitherUnicodeFlag(regexpRecord) is true and regexpRecord.[[IgnoreCase]] is true.
- Return the union of basicWordChars and extraWordChars.
22.2.2.9.4 AllCharacters ( regexpRecord )
The abstract operation AllCharacters takes argument regexpRecord (a RegExp Record) and returns a CharSet. It performs the following steps when called:
- If regexpRecord.[[UnicodeSets]] is true and regexpRecord.[[IgnoreCase]] is true, then
  - Return the CharSet containing all Unicode code points c that do not have a Simple Case Folding mapping (that is, scf(c)=c).
- If HasEitherUnicodeFlag(regexpRecord) is true, then
  - Return the CharSet containing all code point values.
- Return the CharSet containing all code unit values.
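22.2.2.9.5 MaybeSimpleCaseFolding ( regexpRecord, charSet )
The abstract operation MaybeSimpleCaseFolding takes arguments regexpRecord (a RegExp Record) and charSet (a CharSet) and returns a CharSet. If regexpRecord.[[UnicodeSets]] is false or regexpRecord.[[IgnoreCase]] is false, it returns charSet. Otherwise, it uses the Simple Case Folding (scf(cp)) definitions in the file CaseFolding.txt of the Unicode Character Database (each of which maps a single code point to another single code point) to map each CharSetElement of charSet character-by-character to a canonical form and returns the resulting CharSet. It performs the following steps when called:
- If regexpRecord.[[UnicodeSets]] is false or regexpRecord.[[IgnoreCase]] is false, return charSet.
- Let otherSet be a new empty CharSet.
- For each CharSetElement s of charSet, do
  - Let t be an empty sequence of characters.
  - For each single code point codePoint in s, do
    - Append scf(codePoint) to t.
  - Add t to otherSet.
- Return otherSet.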
22.2.2.9.6 CharacterComplement ( regexpRecord, complement )
The abstract operation CharacterComplement takes arguments regexpRecord (a RegExp Record) and complement (a CharSet) and returns a CharSet. It performs the following steps when called:
- Let charSet be AllCharacters(regexpRecord).
- Return the CharSet containing the CharSetElements of charSet which are not also CharSetElements of complement.
22.2.2.9.7 UnicodeMatchProperty ( regexpRecord, p )
The abstract operation UnicodeMatchProperty takes arguments regexpRecord (a RegExp Record) and p (ECMAScript source text) and returns a Unicode property name. It performs the following steps when called:
- If regexpRecord.[[UnicodeSets]] is true and p is a Unicode property name listed in the “Property name” column of Table 66, then
  - Return the List of Unicode code points p.
- Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 64 or Table 65.
- Let c be the canonical property name of p as given in the “Canonical property name” column of the corresponding row.
- Return the List of Unicode code points c.
Implementations must support the Unicode property names and aliases listed in Table 64, Table 65, and Table 66. To ensure interoperability, implementations must not support any other property names or aliases.
For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions or Scx aren't.
The listed properties form a superset of what UTS18 RL1.2 requires.
The spellings of entries in these tables (including casing) match the spellings used in the file PropertyAliases.txt in the Unicode Character Database. The precise spellings in that file are guaranteed to be stable.
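Table 64: Non-binary Unicode property aliases and their canonical property names
Property name and aliases | Canonical property name
General_Category, gc | General_Category
Script, sc | Script
Script_Extensions, scx | Script_Extensions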
Table 65: Binary Unicode property aliases and their canonical property names
Property name and aliases | Canonical property name
ASCII | ASCII
ASCII_Hex_Digit, AHex | ASCII_Hex_Digit
Alphabetic, Alpha | Alphabetic
Any | Any
Assigned | Assigned
Bidi_Control, Bidi_C | Bidi_Control
Bidi_Mirrored, Bidi_M | Bidi_Mirrored
Case_Ignorable, CI | Case_Ignorable
Cased | Cased
Changes_When_Casefolded, CWCF | Changes_When_Casefolded
Changes_When_Casemapped, CWCM | Changes_When_Casemapped
Changes_When_Lowercased, CWL | Changes_When_Lowercased
Changes_When_NFKC_Casefolded, CWKCF | Changes_When_NFKC_Casefolded
Changes_When_Titlecased, CWT | Changes_When_Titlecased
Changes_When_Uppercased, CWU | Changes_When_Uppercased
Dash | Dash
Default_Ignorable_Code_Point, DI | Default_Ignorable_Code_Point
Deprecated, Dep | Deprecated
Diacritic, Dia | Diacritic
Emoji | Emoji
Emoji_Component, EComp | Emoji_Component
Emoji_Modifier, EMod | Emoji_Modifier
Emoji_Modifier_Base, EBase | Emoji_Modifier_Base
Emoji_Presentation, EPres | Emoji_Presentation
Extended_Pictographic, ExtPict | Extended_Pictographic
Extender, Ext | Extender
Grapheme_Base, Gr_Base | Grapheme_Base
Grapheme_Extend, Gr_Ext | Grapheme_Extend
Hex_Digit, Hex | Hex_Digit
IDS_Binary_Operator, IDSB | IDS_Binary_Operator
IDS_Trinary_Operator, IDST | IDS_Trinary_Operator
ID_Continue, IDC | ID_Continue
ID_Start, IDS | ID_Start
Ideographic, Ideo | Ideographic
Join_Control, Join_C | Join_Control
Logical_Order_Exception, LOE | Logical_Order_Exception
Lowercase, Lower | Lowercase
Math | Math
Noncharacter_Code_Point, NChar | Noncharacter_Code_Point
Pattern_Syntax, Pat_Syn | Pattern_Syntax
Pattern_White_Space, Pat_WS | Pattern_White_Space
Quotation_Mark, QMark | Quotation_Mark
Radical | Radical
Regional_Indicator, RI | Regional_Indicator
Sentence_Terminal, STerm | Sentence_Terminal
Soft_Dotted, SD | Soft_Dotted
Terminal_Punctuation, Term | Terminal_Punctuation
Unified_Ideograph, UIdeo | Unified_Ideograph
Uppercase, Upper | Uppercase
Variation_Selector, VS | Variation_Selector
White_Space, space | White_Space
XID_Continue, XIDC | XID_Continue
XID_Start, XIDS | XID_Start
Table 66: Binary Unicode properties of strings
Property name
Basic_Emoji
Emoji_Keycap_Sequence
RGI_Emoji_Modifier_Sequence
RGI_Emoji_Flag_Sequence
RGI_Emoji_Tag_Sequence
RGI_Emoji_ZWJ_Sequence
RGI_Emoji
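The case-sensitive treatment of property names and aliases can be observed with a small check. The helper acceptsPattern is hypothetical, and the patterns use the u flag, under which property escapes are available.

```javascript
// Hypothetical helper: does the pattern compile under the given flags?
function acceptsPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const canonicalName = acceptsPattern("\\p{Script_Extensions=Greek}", "u"); // true
const alias = acceptsPattern("\\p{scx=Greek}", "u");                       // true
const wrongCaseName = acceptsPattern("\\p{script_extensions=Greek}", "u"); // false
const wrongCaseAlias = acceptsPattern("\\p{Scx=Greek}", "u");              // false
const binaryAlias = /\p{AHex}/u.test("f");                                 // true
```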
22.2.2.9.8 UnicodeMatchPropertyValue ( p, v )
The abstract operation UnicodeMatchPropertyValue takes arguments p (
Assert : p is a canonical, unaliasedUnicode property name listed in the “Canonical property name ” column ofTable 64 .Assert : v is a property value or property value alias for the Unicode property p listed inPropertyValueAliases.txt.- Let value be the canonical property value of v as given in the “Canonical property value” column of the corresponding row.
- Return the
List of Unicode code points value.
Implementations must support the Unicode property values and property value aliases listed in PropertyValueAliases.txt for the properties listed in
For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren't.
This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
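The strict matching described above can be observed from script. The following is a minimal sketch; `compilesWithUnicodeFlag` is a hypothetical helper introduced only for this illustration, and it assumes an engine that supports Unicode property escapes.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the "u" flag.
function compilesWithUnicodeFlag(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Exact spellings from PropertyValueAliases.txt are accepted.
const aliasAccepted = compilesWithUnicodeFlag("\\p{Script_Extensions=Xpeo}");
const longNameAccepted = compilesWithUnicodeFlag("\\p{Script_Extensions=Old_Persian}");

// Loose matching (different case, a space instead of LOW LINE) is rejected.
const lowerCaseAccepted = compilesWithUnicodeFlag("\\p{Script_Extensions=xpeo}");
const spaceAccepted = compilesWithUnicodeFlag("\\p{Script_Extensions=Old Persian}");

console.log(aliasAccepted, longNameAccepted, lowerCaseAccepted, spaceAccepted);
```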
22.2.2.10 Runtime Semantics: CompileClassSetString
The
- Return an empty sequence of characters.
- Return CompileClassSetString of NonEmptyClassString with argument regexpRecord.
- Let cs be CompileToCharSet of ClassSetCharacter with argument regexpRecord.
- Let s1 be the sequence of characters that is the single CharSetElement of cs.
- If NonEmptyClassString is present, then
  - Let s2 be CompileClassSetString of NonEmptyClassString with argument regexpRecord.
  - Return the concatenation of s1 and s2.
- Return s1.
22.2.3 Abstract Operations for RegExp Creation
22.2.3.1 RegExpCreate ( pattern, flags )
The abstract operation RegExpCreate takes arguments pattern (an
- Let obj be ! RegExpAlloc(%RegExp%).
- Return ? RegExpInitialize(obj, pattern, flags).
22.2.3.2 RegExpAlloc ( newTarget )
The abstract operation RegExpAlloc takes argument newTarget (a
- Let obj be ? OrdinaryCreateFromConstructor(newTarget, "%RegExp.prototype%", « [[OriginalSource]], [[OriginalFlags]], [[RegExpRecord]], [[RegExpMatcher]] »).
- Perform ! DefinePropertyOrThrow(obj, "lastIndex", PropertyDescriptor { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }).
- Return obj.
22.2.3.3 RegExpInitialize ( obj, pattern, flags )
The abstract operation RegExpInitialize takes arguments obj (an Object), pattern (an
- If pattern is undefined, set pattern to the empty String.
- Else, set pattern to ? ToString(pattern).
- If flags is undefined, set flags to the empty String.
- Else, set flags to ? ToString(flags).
- If flags contains any code unit other than "d", "g", "i", "m", "s", "u", "v", or "y", throw a SyntaxError exception.
- If flags contains any code unit more than once, throw a SyntaxError exception.
- If flags contains "i", let i be true; else let i be false.
- If flags contains "m", let m be true; else let m be false.
- If flags contains "s", let s be true; else let s be false.
- If flags contains "u", let u be true; else let u be false.
- If flags contains "v", let v be true; else let v be false.
- If u is true or v is true, then
  - Let patternText be StringToCodePoints(pattern).
- Else,
  - Let patternText be the result of interpreting each of pattern's 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
- Let parseResult be ParsePattern(patternText, u, v).
- If parseResult is a non-empty List of SyntaxError objects, throw a SyntaxError exception.
- Assert: parseResult is a Pattern Parse Node.
- Set obj.[[OriginalSource]] to pattern.
- Set obj.[[OriginalFlags]] to flags.
- Let capturingGroupsCount be CountLeftCapturingParensWithin(parseResult).
- Let regexpRecord be the RegExp Record { [[IgnoreCase]]: i, [[Multiline]]: m, [[DotAll]]: s, [[Unicode]]: u, [[UnicodeSets]]: v, [[CapturingGroupsCount]]: capturingGroupsCount }.
- Set obj.[[RegExpRecord]] to regexpRecord.
- Set obj.[[RegExpMatcher]] to CompilePattern of parseResult with argument regexpRecord.
- Perform ? Set(obj, "lastIndex", +0𝔽, true).
- Return obj.
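The validation performed by RegExpInitialize is visible through the RegExp constructor. The sketch below is illustrative only; `tryConstruct` is a hypothetical helper that returns either the constructed object or the name of the thrown error.

```javascript
// Hypothetical helper: construct a RegExp, or report the name of the error thrown.
function tryConstruct(pattern, flags) {
  try {
    return new RegExp(pattern, flags);
  } catch (err) {
    return err.name;
  }
}

const valid = tryConstruct("a", "gi");          // a RegExp object
const unknownFlag = tryConstruct("a", "x");     // flag outside d, g, i, m, s, u, v, y
const duplicateFlag = tryConstruct("a", "gg");  // same flag given twice
const bothUnicodeModes = tryConstruct("a", "uv"); // u and v may not be combined
const badPattern = tryConstruct("(", "");       // pattern text fails to parse

// An undefined pattern is treated as the empty String.
const fromUndefined = new RegExp(undefined);

console.log(valid.lastIndex, unknownFlag, duplicateFlag, bothUnicodeModes, badPattern);
```

A freshly initialized object always starts with a "lastIndex" of 0, whatever flags were supplied.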
22.2.3.4 Static Semantics: ParsePattern ( patternText, u, v )
The abstract operation ParsePattern takes arguments patternText (a sequence of Unicode code points), u (a Boolean), and v (a Boolean) and returns a
This section is amended in
It performs the following steps when called:
- If v is true and u is true, then
  - Let parseResult be a List containing one or more SyntaxError objects.
- Else if v is true, then
- Else if u is true, then
- Else,
- Return parseResult.
22.2.4 The RegExp Constructor
The RegExp constructor:
- is %RegExp%.
- is the initial value of the "RegExp" property of the global object.
- creates and initializes a new RegExp object when called as a constructor.
- when called as a function rather than as a constructor, returns either a new RegExp object, or the argument itself if the only argument is a RegExp object.
- may be used as the value of an extends clause of a class definition. Subclass constructors that intend to inherit the specified RegExp behaviour must include a super call to the RegExp constructor to create and initialize subclass instances with the necessary internal slots.
22.2.4.1 RegExp ( patternOrRegexp, flags )
This function performs the following steps when called:
- Let patternIsRegExp be ? IsRegExp(patternOrRegexp).
- If NewTarget is undefined, then
  - Let newTarget be the active function object.
  - If patternIsRegExp is true and flags is undefined, then
- Else,
  - Let newTarget be NewTarget.
- If patternOrRegexp is an Object and patternOrRegexp has a [[RegExpMatcher]] internal slot, then
  - Let patternSource be patternOrRegexp.[[OriginalSource]].
  - If flags is undefined, set flags to patternOrRegexp.[[OriginalFlags]].
- Else if patternIsRegExp is true, then
- Else,
  - Let patternSource be patternOrRegexp.
- Let obj be ? RegExpAlloc(newTarget).
- Return ? RegExpInitialize(obj, patternSource, flags).
If pattern is supplied using a
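The difference between calling RegExp as a function and as a constructor can be illustrated as follows. This is a small sketch with illustrative variable names, not part of the algorithm itself.

```javascript
const original = /ab+c/gi;

// Called as a function with a RegExp argument and no flags: the argument itself is returned.
const sameObject = RegExp(original);

// Called as a constructor: a new object is created with the same source and flags.
const copy = new RegExp(original);

// When flags are supplied, the source is taken from the argument and the flags are replaced.
const reflagged = new RegExp(original, "m");

console.log(sameObject === original, copy === original, copy.flags, reflagged.flags);
```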
22.2.5 Properties of the RegExp Constructor
The RegExp constructor:
- has a [[Prototype]] internal slot whose value is %Function.prototype%.
- has the following properties:
22.2.5.1 RegExp.escape ( string )
This function returns a copy of string in which characters that are potentially special in a regular expression
It performs the following steps when called:
- If string is not a String, throw a TypeError exception.
- Let escaped be the empty String.
- Let codePointList be StringToCodePoints(string).
- For each code point codePoint of codePointList, do
  - If escaped is the empty String and codePoint is matched by either DecimalDigit or AsciiLetter, then
    - NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a \0 character escape or a DecimalEscape such as \1 and still match string rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after \c.
    - Let numericValue be the numeric value of codePoint.
    - Let hex be Number::toString(𝔽(numericValue), 16).
    - Assert: The length of hex is 2.
    - Set escaped to the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), "x", and hex.
  - Else,
    - Set escaped to the string-concatenation of escaped and EncodeForRegExpEscape(codePoint).
- Return escaped.
Despite having similar names, EscapeRegExpPattern and RegExp.escape do not perform similar actions. The former escapes a pattern for representation as a string, while this function escapes a string for representation inside a pattern.
22.2.5.1.1 EncodeForRegExpEscape ( codePoint )
The abstract operation EncodeForRegExpEscape takes argument codePoint (a code point) and returns a String. It returns a String representing a
- If codePoint is matched by SyntaxCharacter or codePoint is U+002F (SOLIDUS), then
  - Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(codePoint).
- If codePoint is a code point listed in the “Code Point” column of Table 62, then
  - Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and the string in the “ControlEscape” column of the row whose “Code Point” column contains codePoint.
- Let otherPunctuators be the string-concatenation of ",-=<>#&!%:;@~'`" and the code unit 0x0022 (QUOTATION MARK).
- Let toEscape be StringToCodePoints(otherPunctuators).
- If toEscape contains codePoint, codePoint is matched by either WhiteSpace or LineTerminator, or codePoint has the same numeric value as a leading surrogate or trailing surrogate, then
  - Let codePointNumber be the numeric value of codePoint.
  - If codePointNumber ≤ 0xFF, then
    - Let hex be Number::toString(𝔽(codePointNumber), 16).
    - Return the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), "x", and StringPad(hex, 2, "0", start).
  - Let escaped be the empty String.
  - Let codeUnits be UTF16EncodeCodePoint(codePoint).
  - For each code unit codeUnit of codeUnits, do
    - Set escaped to the string-concatenation of escaped and UnicodeEscape(codeUnit).
  - Return escaped.
- Return UTF16EncodeCodePoint(codePoint).
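The steps of RegExp.escape and EncodeForRegExpEscape can be approximated in script. The following is a non-normative sketch: the names `regExpEscapeSketch` and `encodeForRegExpEscapeSketch` are hypothetical, the control escapes are assumed to be the five entries \t, \n, \v, \f and \r, and `\s` is used as a stand-in for the WhiteSpace and LineTerminator productions.

```javascript
const SYNTAX_CHARACTERS = new Set("^$\\.*+?()[]{}|/"); // SyntaxCharacter plus SOLIDUS
const CONTROL_ESCAPES = new Map([
  ["\t", "\\t"], ["\n", "\\n"], ["\v", "\\v"], ["\f", "\\f"], ["\r", "\\r"],
]);
const OTHER_PUNCTUATORS = new Set(",-=<>#&!%:;@~'`\"");

// Sketch of EncodeForRegExpEscape for one code point, given as a string.
function encodeForRegExpEscapeSketch(ch) {
  if (SYNTAX_CHARACTERS.has(ch)) return "\\" + ch;
  if (CONTROL_ESCAPES.has(ch)) return CONTROL_ESCAPES.get(ch);
  const codePoint = ch.codePointAt(0);
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  if (OTHER_PUNCTUATORS.has(ch) || /^\s$/u.test(ch) || isSurrogate) {
    if (codePoint <= 0xff) return "\\x" + codePoint.toString(16).padStart(2, "0");
    let escaped = "";
    for (let i = 0; i < ch.length; i++) {
      // One \uXXXX escape per UTF-16 code unit.
      escaped += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
    }
    return escaped;
  }
  return ch;
}

// Sketch of RegExp.escape: a leading ASCII digit or letter becomes a \xHH escape.
function regExpEscapeSketch(string) {
  if (typeof string !== "string") throw new TypeError("string expected");
  let escaped = "";
  for (const ch of string) { // iterates by code point
    if (escaped === "" && /^[0-9A-Za-z]$/.test(ch)) {
      escaped += "\\x" + ch.codePointAt(0).toString(16);
    } else {
      escaped += encodeForRegExpEscapeSketch(ch);
    }
  }
  return escaped;
}

console.log(regExpEscapeSketch("a.b")); // \x61\.b
```

The escaped text can then be embedded in a larger pattern, for example `new RegExp("^" + regExpEscapeSketch(input) + "$")`, where `input` is any user-supplied string.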
22.2.5.2 RegExp.prototype
The initial value of RegExp.prototype is the
This property has the attributes { [[Writable]]:
22.2.5.3 get RegExp [ %Symbol.species% ]
RegExp[%Symbol.species%] is an
- Return the this value.
The value of the
RegExp prototype methods normally use their
22.2.6 Properties of the RegExp Prototype Object
The RegExp prototype object:
- is %RegExp.prototype%.
- is an ordinary object.
- is not a RegExp instance and does not have a [[RegExpMatcher]] internal slot or any of the other internal slots of RegExp instance objects.
- has a [[Prototype]] internal slot whose value is %Object.prototype%.
The RegExp prototype object does not have a
22.2.6.1 RegExp.prototype.constructor
The initial value of RegExp.prototype.constructor is
22.2.6.2 RegExp.prototype.exec ( string )
This method searches string for an occurrence of the regular expression pattern and returns an Array containing the results of the match, or
It performs the following steps when called:
- Let regexp be the this value.
- Perform ? RequireInternalSlot(regexp, [[RegExpMatcher]]).
- Set string to ? ToString(string).
- Return ? RegExpBuiltinExec(regexp, string).
22.2.6.3 get RegExp.prototype.dotAll
RegExp.prototype.dotAll is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0073 (LATIN SMALL LETTER S).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.4 get RegExp.prototype.flags
RegExp.prototype.flags is an
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Let codeUnits be a new empty List.
- Let hasIndices be ToBoolean(? Get(regexp, "hasIndices")).
- If hasIndices is true, append the code unit 0x0064 (LATIN SMALL LETTER D) to codeUnits.
- Let global be ToBoolean(? Get(regexp, "global")).
- If global is true, append the code unit 0x0067 (LATIN SMALL LETTER G) to codeUnits.
- Let ignoreCase be ToBoolean(? Get(regexp, "ignoreCase")).
- If ignoreCase is true, append the code unit 0x0069 (LATIN SMALL LETTER I) to codeUnits.
- Let multiline be ToBoolean(? Get(regexp, "multiline")).
- If multiline is true, append the code unit 0x006D (LATIN SMALL LETTER M) to codeUnits.
- Let dotAll be ToBoolean(? Get(regexp, "dotAll")).
- If dotAll is true, append the code unit 0x0073 (LATIN SMALL LETTER S) to codeUnits.
- Let unicode be ToBoolean(? Get(regexp, "unicode")).
- If unicode is true, append the code unit 0x0075 (LATIN SMALL LETTER U) to codeUnits.
- Let unicodeSets be ToBoolean(? Get(regexp, "unicodeSets")).
- If unicodeSets is true, append the code unit 0x0076 (LATIN SMALL LETTER V) to codeUnits.
- Let sticky be ToBoolean(? Get(regexp, "sticky")).
- If sticky is true, append the code unit 0x0079 (LATIN SMALL LETTER Y) to codeUnits.
- Return the String value whose code units are the elements of the List codeUnits. If codeUnits has no elements, the empty String is returned.
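Because the getter reads the individual flag properties in a fixed order, the resulting String does not depend on the order in which flags were supplied, and the getter also works on ordinary objects. A brief sketch with illustrative names:

```javascript
// Flags are reported in the fixed order d, g, i, m, s, u, v, y.
const reordered = new RegExp("a", "yig").flags;

// The getter is generic: it only reads properties and applies ToBoolean to each.
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;
const fromPlainObject = flagsGetter.call({ global: true, sticky: 1, unicode: 0 });

console.log(reordered, fromPlainObject);
```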
22.2.6.4.1 RegExpHasFlag ( regexp, codeUnit )
The abstract operation RegExpHasFlag takes arguments regexp (an
- If regexp is not an Object, throw a TypeError exception.
- If regexp does not have an [[OriginalFlags]] internal slot, then
  - If SameValue(regexp, %RegExp.prototype%) is true, return undefined.
  - Throw a TypeError exception.
- Let flags be regexp.[[OriginalFlags]].
- If flags contains codeUnit, return true.
- Return false.
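The special case for %RegExp.prototype% can be observed through any of the flag accessors. The sketch below uses the "global" accessor; the variable names are illustrative.

```javascript
const globalGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "global").get;

const onGlobalRegExp = globalGetter.call(/a/g);        // flag present
const onPlainRegExp = globalGetter.call(/a/);          // flag absent
const onPrototype = globalGetter.call(RegExp.prototype); // no [[OriginalFlags]], but is the prototype

// Any other object without [[OriginalFlags]] causes a TypeError.
let errorName = null;
try {
  globalGetter.call({});
} catch (err) {
  errorName = err.name;
}

console.log(onGlobalRegExp, onPlainRegExp, onPrototype, errorName);
```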
22.2.6.5 get RegExp.prototype.global
RegExp.prototype.global is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0067 (LATIN SMALL LETTER G).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.6 get RegExp.prototype.hasIndices
RegExp.prototype.hasIndices is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0064 (LATIN SMALL LETTER D).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.7 get RegExp.prototype.ignoreCase
RegExp.prototype.ignoreCase is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0069 (LATIN SMALL LETTER I).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.8 RegExp.prototype [ %Symbol.match% ] ( string )
This method performs the following steps when called:
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Set string to ? ToString(string).
- Let flags be ? ToString(? Get(regexp, "flags")).
- If flags does not contain "g", return ? RegExpExec(regexp, string).
- If flags contains "u" or flags contains "v", let fullUnicode be true; else let fullUnicode be false.
- Perform ? Set(regexp, "lastIndex", +0𝔽, true).
- Let array be ! ArrayCreate(0).
- Let matchCount be 0.
- Repeat,
  - Let result be ? RegExpExec(regexp, string).
  - If result is null, then
    - If matchCount = 0, return null.
    - Return array.
  - Let matchString be ? ToString(? Get(result, "0")).
  - Perform ! CreateDataPropertyOrThrow(array, ! ToString(𝔽(matchCount)), matchString).
  - If matchString is the empty String, then
  - Set matchCount to matchCount + 1.
The value of the
The
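The two code paths of this method (with and without the "g" flag) produce differently shaped results, as the following sketch with illustrative inputs shows.

```javascript
const text = "a1b22";

// Without "g": the result is whatever RegExpExec returns (a match array with index, input, ...).
const single = text.match(/\d+/);

// With "g": only the matched substrings are collected, starting from lastIndex 0.
const all = text.match(/\d+/g);

// With "g" and no match at all, the result is null rather than an empty array.
const none = text.match(/z/g);

// Empty matches still advance through the string, so the loop terminates.
const empties = "abc".match(/x*/g);

console.log(single.index, all, none, empties.length);
```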
22.2.6.9 RegExp.prototype [ %Symbol.matchAll% ] ( string )
This method performs the following steps when called:
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Set string to ? ToString(string).
- Let speciesCtor be ? SpeciesConstructor(regexp, %RegExp%).
- Let flags be ? ToString(? Get(regexp, "flags")).
- Let matcher be ? Construct(speciesCtor, « regexp, flags »).
- Let lastIndex be ? ToLength(? Get(regexp, "lastIndex")).
- Perform ? Set(matcher, "lastIndex", lastIndex, true).
- If flags contains "g", let global be true.
- Else, let global be false.
- If flags contains "u" or flags contains "v", let fullUnicode be true.
- Else, let fullUnicode be false.
- Return CreateRegExpStringIterator(matcher, string, global, fullUnicode).
The value of the
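Since the iterator works on a freshly constructed matcher that inherits the current "lastIndex", the original object is left untouched. A short sketch with illustrative values:

```javascript
const re = /a/g;
re.lastIndex = 1; // matching on the cloned matcher starts here

const matches = [...re[Symbol.matchAll]("aaa")];
const indices = matches.map((m) => m.index);

// The original regular expression keeps its own lastIndex.
const lastIndexAfter = re.lastIndex;

console.log(indices, lastIndexAfter);
```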
22.2.6.10 get RegExp.prototype.multiline
RegExp.prototype.multiline is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x006D (LATIN SMALL LETTER M).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.11 RegExp.prototype [ %Symbol.replace% ] ( string, replaceValue )
This method performs the following steps when called:
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Set string to ? ToString(string).
- Let stringLength be the length of string.
- Let functionalReplace be IsCallable(replaceValue).
- If functionalReplace is false, then
  - Set replaceValue to ? ToString(replaceValue).
- Let flags be ? ToString(? Get(regexp, "flags")).
- If flags contains "g", let global be true; else let global be false.
- If global is true, then
  - Perform ? Set(regexp, "lastIndex", +0𝔽, true).
- Let results be a new empty List.
- Let done be false.
- Repeat, while done is false,
  - Let result be ? RegExpExec(regexp, string).
  - If result is null, then
    - Set done to true.
  - Else,
- Let accumulatedResult be the empty String.
- Let nextSourcePosition be 0.
- For each element result of results, do
  - Let resultLength be ? LengthOfArrayLike(result).
  - Let capturesCount be max(resultLength - 1, 0).
  - Let matched be ? ToString(? Get(result, "0")).
  - Let matchLength be the length of matched.
  - Let position be ? ToIntegerOrInfinity(? Get(result, "index")).
  - Set position to the result of clamping position between 0 and stringLength.
  - Let captures be a new empty List.
  - Let captureNumber be 1.
  - Repeat, while captureNumber ≤ capturesCount,
    - Let capture be ? Get(result, ! ToString(𝔽(captureNumber))).
    - If capture is not undefined, then
      - Set capture to ? ToString(capture).
    - Append capture to captures.
    - NOTE: When captureNumber = 1, the preceding step puts the first element into captures (at index 0). More generally, the captureNumber-th capture (the characters captured by the captureNumber-th set of capturing parentheses) is at captures[captureNumber - 1].
    - Set captureNumber to captureNumber + 1.
  - Let namedCaptures be ? Get(result, "groups").
  - If functionalReplace is true, then
    - Let replacerArgs be the list-concatenation of « matched », captures, and « 𝔽(position), string ».
    - If namedCaptures is not undefined, then
      - Append namedCaptures to replacerArgs.
    - Let replacementValue be ? Call(replaceValue, undefined, replacerArgs).
    - Let replacementString be ? ToString(replacementValue).
  - Else,
    - If namedCaptures is not undefined, then
      - Set namedCaptures to ? ToObject(namedCaptures).
    - Let replacementString be ? GetSubstitution(matched, string, position, captures, namedCaptures, replaceValue).
  - If position ≥ nextSourcePosition, then
    - NOTE: position should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of regexp. In such cases, the corresponding substitution is ignored.
    - Set accumulatedResult to the string-concatenation of accumulatedResult, the substring of string from nextSourcePosition to position, and replacementString.
    - Set nextSourcePosition to position + matchLength.
- If nextSourcePosition ≥ stringLength, return accumulatedResult.
- Return the string-concatenation of accumulatedResult and the substring of string from nextSourcePosition.
The value of the
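The argument list passed to a replacer function (the matched text, the captures, the position, the input string, and the named captures object when present) can be inspected directly. The following sketch uses illustrative group names.

```javascript
const seen = [];

// Functional replacement: record the arguments the replacer receives.
const replaced = "x 2020-01".replace(/(?<year>\d+)-(?<month>\d+)/, (...args) => {
  seen.push(args);
  return "DATE";
});

// String replacement: substitution patterns are expanded via GetSubstitution.
const swapped = "2020-01".replace(/(\d+)-(\d+)/, "$2/$1");

const [matched, firstCapture, secondCapture, position, input, groups] = seen[0];
console.log(replaced, swapped, matched, firstCapture, secondCapture, position, input, groups.year);
```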
22.2.6.12 RegExp.prototype [ %Symbol.search% ] ( string )
This method performs the following steps when called:
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Set string to ? ToString(string).
- Let previousLastIndex be ? Get(regexp, "lastIndex").
- If previousLastIndex is not +0𝔽, then
  - Perform ? Set(regexp, "lastIndex", +0𝔽, true).
- Let result be ? RegExpExec(regexp, string).
- Let currentLastIndex be ? Get(regexp, "lastIndex").
- If SameValue(currentLastIndex, previousLastIndex) is false, then
  - Perform ? Set(regexp, "lastIndex", previousLastIndex, true).
- If result is null, return -1𝔽.
- Return ? Get(result, "index").
The value of the
The
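The save-and-restore handling of "lastIndex" in the steps above means a search always starts at the beginning of the string and leaves the object as it found it. A minimal sketch with illustrative values:

```javascript
const re = /b/g;
re.lastIndex = 5; // deliberately beyond the end of the input

const foundAt = "abc".search(re);   // search starts from index 0 regardless of lastIndex
const lastIndexAfter = re.lastIndex; // the previous value is restored afterwards
const notFound = "abc".search(/z/);

console.log(foundAt, lastIndexAfter, notFound);
```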
22.2.6.13 get RegExp.prototype.source
RegExp.prototype.source is an
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- If regexp does not have an [[OriginalSource]] internal slot, then
  - If SameValue(regexp, %RegExp.prototype%) is true, return "(?:)".
  - Throw a TypeError exception.
- Assert: regexp has an [[OriginalFlags]] internal slot.
- Let source be regexp.[[OriginalSource]].
- Let flags be regexp.[[OriginalFlags]].
- Return EscapeRegExpPattern(source, flags).
22.2.6.13.1 EscapeRegExpPattern ( pattern, flags )
The abstract operation EscapeRegExpPattern takes arguments pattern (a String) and flags (a String) and returns a String. It performs the following steps when called:
- If flags contains "v", then
  - Let patternSymbol be Pattern[+UnicodeMode, +UnicodeSetsMode].
- Else if flags contains "u", then
  - Let patternSymbol be Pattern[+UnicodeMode, ~UnicodeSetsMode].
- Else,
  - Let patternSymbol be Pattern[~UnicodeMode, ~UnicodeSetsMode].
- Let escapedPattern be a String in the form of a patternSymbol equivalent to pattern interpreted as UTF-16 encoded Unicode code points (6.1.4), in which certain code points are escaped as described below. escapedPattern may or may not differ from pattern; however, the Abstract Closure that would result from evaluating escapedPattern as a patternSymbol must behave identically to the Abstract Closure given by the constructed object's [[RegExpMatcher]] internal slot. Multiple calls to this abstract operation using the same values for pattern and flags must produce identical results.
- The code points / or any LineTerminator occurring in the pattern shall be escaped in escapedPattern as necessary to ensure that the string-concatenation of "/", escapedPattern, "/", and flags can be parsed (in an appropriate lexical context) as a RegularExpressionLiteral that behaves identically to the constructed regular expression. For example, if pattern is "/", then escapedPattern could be "\/" or "\u002F", among other possibilities, but not "/", because /// followed by flags would be parsed as a SingleLineComment rather than a RegularExpressionLiteral. If pattern is the empty String, this specification can be met by letting escapedPattern be "(?:)".
- Return escapedPattern.
Despite having similar names, RegExp.escape and EscapeRegExpPattern do not perform similar actions. The former escapes a string for representation inside a pattern, while this function escapes a pattern for representation as a string.
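The effect of EscapeRegExpPattern is visible through the "source" accessor. Because the exact escaped text is left to the implementation, the sketch below only checks properties the steps above require; the variable names are illustrative.

```javascript
const slash = new RegExp("/");

// The exact text may vary by implementation, but it cannot be a bare "/".
const slashSource = slash.source;

// Whatever form is chosen must behave identically when used as a pattern again.
const roundTrip = new RegExp(slashSource).test("/");

// An empty pattern needs a non-empty source so that the literal form is not "//".
const emptySource = new RegExp("").source;

// The prototype object itself reports "(?:)".
const prototypeSource = RegExp.prototype.source;

console.log(slashSource, roundTrip, emptySource, prototypeSource);
```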
22.2.6.14 RegExp.prototype [ %Symbol.split% ] ( string, limit )
This method returns an Array into which substrings of the result of converting string to a String have been stored. The substrings are determined by searching from left to right for matches of the
For example, /a*?/[Symbol.split]("ab") evaluates to the array ["a", "b"], while /a*/[Symbol.split]("ab") evaluates to the array ["","b"].
If string is (or converts to) the empty String, the result depends on whether the regular expression can match the empty String. If it can, the result array contains no elements. Otherwise, the result array contains one element, which is the empty String.
If the regular expression contains capturing parentheses, then each time separator is matched the results (including any
/<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>")
evaluates to the array
["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]
If limit is not
This method performs the following steps when called:
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Set string to ? ToString(string).
- Let speciesCtor be ? SpeciesConstructor(regexp, %RegExp%).
- Let flags be ? ToString(? Get(regexp, "flags")).
- If flags contains "u" or flags contains "v", let unicodeMatching be true.
- Else, let unicodeMatching be false.
- If flags contains "y", let newFlags be flags.
- Else, let newFlags be the string-concatenation of flags and "y".
- Let splitter be ? Construct(speciesCtor, « regexp, newFlags »).
- Let array be ! ArrayCreate(0).
- Let lengthA be 0.
- If limit is undefined, let lim be 2^32 - 1; else let lim be ℝ(? ToUint32(limit)).
- If lim = 0, return array.
- If string is the empty String, then
  - Let matchResult be ? RegExpExec(splitter, string).
  - If matchResult is not null, return array.
  - Perform ! CreateDataPropertyOrThrow(array, "0", string).
  - Return array.
- Let size be the length of string.
- Let lastMatchEnd be 0.
- Let searchIndex be lastMatchEnd.
- Repeat, while searchIndex < size,
  - Perform ? Set(splitter, "lastIndex", 𝔽(searchIndex), true).
  - Let matchResult be ? RegExpExec(splitter, string).
  - If matchResult is null, then
    - Set searchIndex to AdvanceStringIndex(string, searchIndex, unicodeMatching).
  - Else,
    - Let matchEnd be ℝ(? ToLength(? Get(splitter, "lastIndex"))).
    - Set matchEnd to min(matchEnd, size).
    - If matchEnd = lastMatchEnd, then
      - Set searchIndex to AdvanceStringIndex(string, searchIndex, unicodeMatching).
    - Else,
      - Let substring be the substring of string from lastMatchEnd to searchIndex.
      - Perform ! CreateDataPropertyOrThrow(array, ! ToString(𝔽(lengthA)), substring).
      - Set lengthA to lengthA + 1.
      - If lengthA = lim, return array.
      - Set lastMatchEnd to matchEnd.
      - Let numberOfCaptures be ? LengthOfArrayLike(matchResult).
      - Set numberOfCaptures to max(numberOfCaptures - 1, 0).
      - Let captureIndex be 1.
      - Repeat, while captureIndex ≤ numberOfCaptures,
      - Set searchIndex to lastMatchEnd.
- Let substring be the substring of string from lastMatchEnd to size.
- Perform ! CreateDataPropertyOrThrow(array, ! ToString(𝔽(lengthA)), substring).
- Return array.
The value of the
This method ignores the value of the
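The behaviours described in this section (empty matches, spliced captures, the limit argument, and the empty-input cases) can be exercised together. This sketch reuses the inputs quoted in the prose above; the variable names are illustrative.

```javascript
// Only the first match at a given index is considered.
const lazy = /a*?/[Symbol.split]("ab");
const greedy = /a*/[Symbol.split]("ab");

// Captures (including undefined ones) are spliced into the output array.
const tagged = /<(\/)?([^<>]+)>/[Symbol.split]("A<B>bold</B>and<CODE>coded</CODE>");

// A limit truncates the output array.
const limited = /,/[Symbol.split]("a,b,c", 2);

// Empty input: the result depends on whether the pattern can match the empty String.
const emptyMatchable = /x*/[Symbol.split]("");
const emptyUnmatchable = /x/[Symbol.split]("");

console.log(lazy, greedy, tagged.length, limited, emptyMatchable.length, emptyUnmatchable.length);
```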
22.2.6.15 get RegExp.prototype.sticky
RegExp.prototype.sticky is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0079 (LATIN SMALL LETTER Y).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.16 RegExp.prototype.test ( string )
This method performs the following steps when called:
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Set string to ? ToString(string).
- Let match be ? RegExpExec(regexp, string).
- If match is null, return false.
- Return true.
22.2.6.17 RegExp.prototype.toString ( )
- Let regexp be the this value.
- If regexp is not an Object, throw a TypeError exception.
- Let pattern be ? ToString(? Get(regexp, "source")).
- Let flags be ? ToString(? Get(regexp, "flags")).
- Let result be the string-concatenation of "/", pattern, "/", and flags.
- Return result.
The returned String has the form of a
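Because the steps only read the "source" and "flags" properties, the method also works on objects that are not RegExp instances. A brief illustrative sketch:

```javascript
// For a RegExp instance the result has the form of a regular expression literal.
const literalForm = String(/a\/b/gi);

// The method is generic: any object with "source" and "flags" properties will do.
const generic = RegExp.prototype.toString.call({ source: "x", flags: "y" });

console.log(literalForm, generic);
```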
22.2.6.18 get RegExp.prototype.unicode
RegExp.prototype.unicode is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0075 (LATIN SMALL LETTER U).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.6.19 get RegExp.prototype.unicodeSets
RegExp.prototype.unicodeSets is an
- Let regexp be the this value.
- Let codeUnit be the code unit 0x0076 (LATIN SMALL LETTER V).
- Return ? RegExpHasFlag(regexp, codeUnit).
22.2.7 Abstract Operations for RegExp Matching
22.2.7.1 RegExpExec ( regexp, string )
The abstract operation RegExpExec takes arguments regexp (an Object) and string (a String) and returns either a
- Let exec be ? Get(regexp, "exec").
- If IsCallable(exec) is true, then
  - Let result be ? Call(exec, regexp, « string »).
  - If result is not an Object and result is not null, throw a TypeError exception.
  - Return result.
- Perform ? RequireInternalSlot(regexp, [[RegExpMatcher]]).
- Return ? RegExpBuiltinExec(regexp, string).
If a callable
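The lookup of a callable "exec" property can be demonstrated by overriding it on an instance. The sketch below is illustrative; the replacement functions are arbitrary stand-ins.

```javascript
const re = /x/;

// A user-supplied exec is used in preference to the built-in matcher.
re.exec = () => ({ 0: "fake", index: 0 });
const forcedResult = re.test("no match here");

// The result of a user-supplied exec must be an Object or null.
re.exec = () => 42;
let errorName = null;
try {
  re.test("x");
} catch (err) {
  errorName = err.name;
}

console.log(forcedResult, errorName);
```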
22.2.7.2 RegExpBuiltinExec ( regexp, string )
The abstract operation RegExpBuiltinExec takes arguments regexp (an initialized RegExp instance) and string (a String) and returns either a
- Let length be the length of string.
- Let lastIndex be ℝ(? ToLength(! Get(regexp, "lastIndex"))).
- Let flags be regexp.[[OriginalFlags]].
- If flags contains "g", let global be true; else let global be false.
- If flags contains "y", let sticky be true; else let sticky be false.
- If flags contains "d", let hasIndices be true; else let hasIndices be false.
- If global is false and sticky is false, set lastIndex to 0.
- Let matcher be regexp.[[RegExpMatcher]].
- If flags contains "u" or flags contains "v", let fullUnicode be true; else let fullUnicode be false.
- Let matchSucceeded be false.
- If fullUnicode is true, let input be StringToCodePoints(string); else let input be a List whose elements are the code units that are the elements of string.
- NOTE: Each element of input is considered to be a character.
- Repeat, while matchSucceeded is false,
  - If lastIndex > length, then
    - If global is true or sticky is true, then
      - Perform ? Set(regexp, "lastIndex", +0𝔽, true).
    - Return null.
  - Let inputIndex be the index into input of the character that was obtained from element lastIndex of string.
  - Let result be matcher(input, inputIndex).
  - If result is failure, then
    - If sticky is true, then
      - Perform ? Set(regexp, "lastIndex", +0𝔽, true).
      - Return null.
    - Set lastIndex to AdvanceStringIndex(string, lastIndex, fullUnicode).
  - Else,
    - Assert: result is a MatchState.
    - Set matchSucceeded to true.
- Let endIndex be result.[[EndIndex]].
- If fullUnicode is true, set endIndex to GetStringIndex(string, endIndex).
- If global is true or sticky is true, then
- Let capturingGroupsCount be the number of elements in result.[[Captures]].
- Assert: capturingGroupsCount = regexp.[[RegExpRecord]].[[CapturingGroupsCount]].
- Assert: capturingGroupsCount < 2^32 - 1.
- Let array be ! ArrayCreate(capturingGroupsCount + 1).
- Assert: The mathematical value of array's "length" property is capturingGroupsCount + 1.
- Perform ! CreateDataPropertyOrThrow(array, "index", 𝔽(lastIndex)).
- Perform ! CreateDataPropertyOrThrow(array, "input", string).
- Let match be the Match Record { [[StartIndex]]: lastIndex, [[EndIndex]]: endIndex }.
- Let indices be a new empty List.
- Let groupNames be a new empty List.
- Append match to indices.
- Let matchedSubstring be GetMatchString(string, match).
- Perform ! CreateDataPropertyOrThrow(array, "0", matchedSubstring).
- If regexp contains any GroupName, then
  - Let groups be OrdinaryObjectCreate(null).
  - Let hasGroups be true.
- Else,
  - Let groups be undefined.
  - Let hasGroups be false.
- Perform ! CreateDataPropertyOrThrow(array, "groups", groups).
- Let matchedGroupNames be a new empty List.
- For each integer i such that 1 ≤ i ≤ capturingGroupsCount, in ascending order, do
  - Let capture be the ith element of result.[[Captures]].
  - If capture is undefined, then
    - Let capturedValue be undefined.
    - Append undefined to indices.
  - Else,
    - Let captureStart be capture.[[StartIndex]].
    - Let captureEnd be capture.[[EndIndex]].
    - If fullUnicode is true, then
      - Set captureStart to GetStringIndex(string, captureStart).
      - Set captureEnd to GetStringIndex(string, captureEnd).
    - Let captureRecord be the Match Record { [[StartIndex]]: captureStart, [[EndIndex]]: captureEnd }.
    - Let capturedValue be GetMatchString(string, captureRecord).
    - Append captureRecord to indices.
  - Perform ! CreateDataPropertyOrThrow(array, ! ToString(𝔽(i)), capturedValue).
  - If the ith capture of regexp was defined with a GroupName, then
    - Let groupName be the CapturingGroupName of that GroupName.
    - If matchedGroupNames contains groupName, then
      - Assert: capturedValue is undefined.
      - Append undefined to groupNames.
    - Else,
      - If capturedValue is not undefined, append groupName to matchedGroupNames.
      - NOTE: If there are multiple groups named groupName, groups may already have a groupName property at this point. However, because groups is an ordinary object whose properties are all writable data properties, the call to CreateDataPropertyOrThrow is nevertheless guaranteed to succeed.
      - Perform ! CreateDataPropertyOrThrow(groups, groupName, capturedValue).
      - Append groupName to groupNames.
  - Else,
    - Append undefined to groupNames.
- If hasIndices is true, then
  - Let indicesArray be MakeMatchIndicesIndexPairArray(string, indices, groupNames, hasGroups).
  - Perform ! CreateDataPropertyOrThrow(array, "indices", indicesArray).
- Return array.
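The shape of the returned array and the handling of "lastIndex" can be seen with the "d", "g", and "y" flags. The sketch below assumes an engine that supports the "d" flag; the group name `tail` is illustrative.

```javascript
const re = /a(?<tail>b)?/dg;

// First call: scans forward from lastIndex 0 and finds "ab" at index 1.
const first = re.exec("xab");
const lastIndexAfterMatch = re.lastIndex;

// Second call: nothing is left to match, so null is returned and lastIndex resets to 0.
const second = re.exec("xab");
const lastIndexAfterFailure = re.lastIndex;

// Sticky: the match must begin exactly at lastIndex, with no forward scan.
const stickyResult = /a/y.exec("xa");

console.log(first.index, first[0], first.groups.tail, first.indices[0], first.indices.groups.tail);
console.log(lastIndexAfterMatch, second, lastIndexAfterFailure, stickyResult);
```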
22.2.7.3 AdvanceStringIndex ( string, index, unicode )
The abstract operation AdvanceStringIndex takes arguments string (a String), index (a non-negative
- Assert: index ≤ 2^53 - 1.
- If unicode is false, return index + 1.
- Let length be the length of string.
- If index + 1 ≥ length, return index + 1.
- Let codePoint be CodePointAt(string, index).
- Return index + codePoint.[[CodeUnitCount]].
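The steps above can be sketched in script as follows. `advanceStringIndexSketch` is a hypothetical name, and `String.prototype.codePointAt` is used as a stand-in for CodePointAt: a result above 0xFFFF indicates a surrogate pair, that is, a code unit count of 2.

```javascript
// Non-normative sketch of AdvanceStringIndex.
function advanceStringIndexSketch(string, index, unicode) {
  if (!unicode) return index + 1;
  if (index + 1 >= string.length) return index + 1;
  const codePoint = string.codePointAt(index);
  // A code point outside the BMP occupies two code units.
  return index + (codePoint > 0xffff ? 2 : 1);
}

const sample = "a\u{1F600}b"; // "a", a surrogate pair, "b": four code units

console.log(
  advanceStringIndexSketch(sample, 1, false), // steps over one code unit
  advanceStringIndexSketch(sample, 1, true),  // steps over the whole surrogate pair
  advanceStringIndexSketch(sample, 3, true)   // at the last code unit, returns index + 1
);
```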
22.2.7.4 GetStringIndex ( string, codePointIndex )
The abstract operation GetStringIndex takes arguments string (a String) and codePointIndex (a non-negative
- If string is the empty String, return 0.
- Let length be the length of string.
- Let codeUnitCount be 0.
- Let codePointCount be 0.
- Repeat, while codeUnitCount < length,
  - If codePointCount = codePointIndex, return codeUnitCount.
  - Let codePoint be CodePointAt(string, codeUnitCount).
  - Set codeUnitCount to codeUnitCount + codePoint.[[CodeUnitCount]].
  - Set codePointCount to codePointCount + 1.
- Return length.
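A minimal, non-normative sketch of this loop follows. `getStringIndex` is a hypothetical helper name, and `String.prototype.codePointAt` again approximates CodePointAt.

```javascript
// Hypothetical helper mirroring GetStringIndex: maps a code point index to a code unit index.
function getStringIndex(string, codePointIndex) {
  if (string === "") return 0;
  const length = string.length;
  let codeUnitCount = 0;
  let codePointCount = 0;
  while (codeUnitCount < length) {
    if (codePointCount === codePointIndex) return codeUnitCount;
    // A code point above 0xFFFF occupies two code units; everything else occupies one.
    const codePoint = string.codePointAt(codeUnitCount);
    codeUnitCount += codePoint > 0xffff ? 2 : 1;
    codePointCount += 1;
  }
  // The requested code point index lies at or beyond the end of the string.
  return length;
}

console.log(getStringIndex("a\u{1F600}b", 2));
```

For the string "a😀b", code point index 2 (the letter b) maps to code unit index 3, because the emoji occupies two code units.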
22.2.7.5 Match Records
A Match Record is a Record value used to encapsulate the start and end indices of a regular expression match or capture.
Match Records have the fields listed in the following table.
| Field Name | Value | Meaning |
|---|---|---|
| [[StartIndex]] | a non-negative integer | The number of code units from the start of a string at which the match begins (inclusive). |
| [[EndIndex]] | an integer ≥ [[StartIndex]] | The number of code units from the start of a string at which the match ends (exclusive). |
22.2.7.6 GetMatchString ( string, match )
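The abstract operation GetMatchString takes arguments string (a String) and match (a Match Record) and returns a String. It performs the following steps when called:
- Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of string.
- Return the substring of string from match.[[StartIndex]] to match.[[EndIndex]].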
22.2.7.7 GetMatchIndexPair ( string, match )
The abstract operation GetMatchIndexPair takes arguments string (a String) and match (a Match Record) and returns an Array. It performs the following steps when called:
- Assert: match.[[StartIndex]] ≤ match.[[EndIndex]] ≤ the length of string.
- Return CreateArrayFromList(« 𝔽(match.[[StartIndex]]), 𝔽(match.[[EndIndex]]) »).
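As a non-normative illustration, a Match Record can be modelled as a plain object, with the two operations above as small functions. Everything here is an assumption made for the sketch: the names `makeMatchRecord`, `getMatchString`, and `getMatchIndexPair`, the `startIndex` and `endIndex` property names, and the reading of GetMatchString as returning the substring delimited by the record.

```javascript
// Hypothetical model of a Match Record: two integer fields, start inclusive, end exclusive.
function makeMatchRecord(startIndex, endIndex) {
  if (!Number.isInteger(startIndex) || startIndex < 0) {
    throw new RangeError("startIndex must be a non-negative integer");
  }
  if (!Number.isInteger(endIndex) || endIndex < startIndex) {
    throw new RangeError("endIndex must be an integer >= startIndex");
  }
  return Object.freeze({ startIndex, endIndex });
}

// Corresponds to the shared assertion: start ≤ end ≤ the length of the string.
function assertWithinString(string, match) {
  if (!(match.startIndex <= match.endIndex && match.endIndex <= string.length)) {
    throw new RangeError("match record does not fit the string");
  }
}

// Assumed behaviour: the matched text is the substring between the two indices.
function getMatchString(string, match) {
  assertWithinString(string, match);
  return string.slice(match.startIndex, match.endIndex);
}

// The index pair is a fresh two-element array [start, end].
function getMatchIndexPair(string, match) {
  assertWithinString(string, match);
  return [match.startIndex, match.endIndex];
}

const yearRecord = makeMatchRecord(3, 7);
console.log(getMatchString("=> 2024-05", yearRecord), getMatchIndexPair("=> 2024-05", yearRecord));
```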
22.2.7.8 MakeMatchIndicesIndexPairArray ( string, indices, groupNames, hasGroups )
The abstract operation MakeMatchIndicesIndexPairArray takes arguments string (a String), indices (a List of either Match Records or undefined), groupNames (a List of either Strings or undefined), and hasGroups (a Boolean) and returns an Array. It performs the following steps when called:
- Let n be the number of elements in indices.
- Assert: n < 2^32 - 1.
- Assert: groupNames has n - 1 elements.
- NOTE: The groupNames List contains elements aligned with the indices List starting at indices[1].
- Let array be ! ArrayCreate(n).
- If hasGroups is true, then
  - Let groups be OrdinaryObjectCreate(null).
- Else,
  - Let groups be undefined.
- Perform ! CreateDataPropertyOrThrow(array, "groups", groups).
- For each integer i such that 0 ≤ i < n, in ascending order, do
  - Let matchIndices be indices[i].
  - If matchIndices is not undefined, then
    - Let matchIndexPair be GetMatchIndexPair(string, matchIndices).
  - Else,
    - Let matchIndexPair be undefined.
  - Perform ! CreateDataPropertyOrThrow(array, ! ToString(𝔽(i)), matchIndexPair).
  - If i > 0, then
    - Let name be groupNames[i - 1].
    - If name is not undefined, then
      - Assert: groups is not undefined.
      - NOTE: If there are multiple groups named name, groups may already have a name property at this point. However, because groups is an ordinary object whose properties are all writable data properties, the call to CreateDataPropertyOrThrow is nevertheless guaranteed to succeed.
      - Perform ! CreateDataPropertyOrThrow(groups, name, matchIndexPair).
- Return array.
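The array built by this operation is what scripts see as the `indices` property of a match result when the `d` flag is present. The following sketch assumes an engine that supports the `d` flag; the pattern, input, and variable names are illustrative only.

```javascript
// Illustrative pattern with the d flag so that match indices are recorded.
const indexedPattern = /(?<year>\d{4})-(?<month>\d{2})|(?<word>[a-z]+)/d;
const indexedResult = indexedPattern.exec("=> 2024-05");
const matchIndices = indexedResult.indices;

// Element 0 covers the whole match; elements 1..n cover the captures in order.
const wholeMatchPair = matchIndices[0];
const yearPair = matchIndices[1];
const monthPair = matchIndices[2];

// A capture that did not participate has undefined in place of an index pair.
const wordPair = matchIndices[3];

// Named groups are mirrored on a null-prototype groups object.
const namedYearPair = matchIndices.groups.year;

console.log(wholeMatchPair, yearPair, monthPair, wordPair, namedYearPair);
```

Each pair is `[start, end)` in code units, so `"=> 2024-05".slice(3, 7)` recovers the text of the year group.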
22.2.8 Properties of RegExp Instances
RegExp instances are ordinary objects that inherit properties from the RegExp prototype object.
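Prior to ECMAScript 2015, RegExp instances were specified as having the own data properties "source", "global", "ignoreCase", and "multiline". Those properties are now specified as accessor properties of RegExp.prototype.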
RegExp instances also have the following property:
22.2.8.1 lastIndex
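The value of the "lastIndex" property specifies the String index at which to start the next match. It is coerced to an integral Number when used. This property shall have the attributes { [[Writable]]: true, [[Enumerable]]: false, [[Configurable]]: false }.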
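The behaviour of `lastIndex` on a global regular expression can be traced as follows. This is an illustrative sketch, not normative text; the pattern, the sample string, and the variable names are assumptions chosen for the example.

```javascript
const vowel = /o/g;
const sample = "foo boo";

// lastIndex is an own data property of every RegExp instance.
const descriptor = Object.getOwnPropertyDescriptor(vowel, "lastIndex");

const trace = [];
trace.push(vowel.lastIndex);          // initial value, before any match

vowel.exec(sample);                   // matches the "o" at index 1
trace.push(vowel.lastIndex);

vowel.exec(sample);                   // matches the "o" at index 2
trace.push(vowel.lastIndex);

vowel.lastIndex = 4;                  // the property is writable: resume from index 4
const resumed = vowel.exec(sample);   // matches the "o" at index 5
trace.push(resumed.index, vowel.lastIndex);

vowel.lastIndex = sample.length;      // nothing left to match from here
const exhausted = vowel.exec(sample); // a failed match resets lastIndex to 0
trace.push(vowel.lastIndex);

console.log(descriptor, trace, exhausted);
```

Because the property is non-configurable but writable, scripts can reposition a search by assigning to it but cannot delete or redefine it as an accessor.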
22.2.9 RegExp String Iterator Objects
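A RegExp String Iterator is an object that represents a specific iteration over some specific String instance object, matching against some specific RegExp instance object. There is not a named constructor for RegExp String Iterator objects. Instead, RegExp String Iterator objects are created by calling certain methods of RegExp instance objects.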
22.2.9.1 CreateRegExpStringIterator ( regexp, string, global, fullUnicode )
The abstract operation CreateRegExpStringIterator takes arguments regexp (an Object), string (a String), global (a Boolean), and fullUnicode (a Boolean) and returns an Object. It performs the following steps when called:
- Let iterator be OrdinaryObjectCreate(%RegExpStringIteratorPrototype%, « [[IteratingRegExp]], [[IteratedString]], [[Global]], [[Unicode]], [[Done]] »).
- Set iterator.[[IteratingRegExp]] to regexp.
- Set iterator.[[IteratedString]] to string.
- Set iterator.[[Global]] to global.
- Set iterator.[[Unicode]] to fullUnicode.
- Set iterator.[[Done]] to false.
- Return iterator.
22.2.9.2 The %RegExpStringIteratorPrototype% Object
The %RegExpStringIteratorPrototype% object:
- has properties that are inherited by all RegExp String Iterator objects.
- is an ordinary object.
- has a [[Prototype]] internal slot whose value is %Iterator.prototype%.
- has the following properties:
22.2.9.2.1 %RegExpStringIteratorPrototype% .next ( )
- Let iteratorObj be the this value.
- If iteratorObj is not an Object, throw a TypeError exception.
- If iteratorObj does not have all of the internal slots of a RegExp String Iterator Object Instance (see 22.2.9.3), throw a TypeError exception.
- If iteratorObj.[[Done]] is true, then
  - Return CreateIteratorResultObject(undefined, true).
- Let regexp be iteratorObj.[[IteratingRegExp]].
- Let string be iteratorObj.[[IteratedString]].
- Let global be iteratorObj.[[Global]].
- Let fullUnicode be iteratorObj.[[Unicode]].
- Let match be ? RegExpExec(regexp, string).
- If match is null, then
  - Set iteratorObj.[[Done]] to true.
  - Return CreateIteratorResultObject(undefined, true).
- If global is false, then
  - Set iteratorObj.[[Done]] to true.
  - Return CreateIteratorResultObject(match, false).
- Let matchString be ? ToString(? Get(match, "0")).
- If matchString is the empty String, then
  - Let thisIndex be ℝ(? ToLength(? Get(regexp, "lastIndex"))).
  - Let nextIndex be AdvanceStringIndex(string, thisIndex, fullUnicode).
  - Perform ? Set(regexp, "lastIndex", 𝔽(nextIndex), true).
- Return CreateIteratorResultObject(match, false).
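The observable consequences of this method can be illustrated through `String.prototype.matchAll` and `RegExp.prototype[Symbol.matchAll]`, which create RegExp String Iterators. The sketch below is non-normative and assumes a conforming engine in which `lastIndex` is advanced past an empty match; the patterns and variable names are illustrative.

```javascript
// Empty matches do not loop forever: each one moves the search position forward.
const emptyMatchIndices = [..."abc".matchAll(/x*/g)].map((m) => m.index);

// One astral code point, stored as two UTF-16 code units.
const astral = "\u{1F600}";

// Without the u flag, an empty match advances by one code unit.
const byCodeUnit = [...astral.matchAll(/(?:)/g)].map((m) => m.index);

// With the u flag, an empty match advances by a whole code point.
const byCodePoint = [...astral.matchAll(/(?:)/gu)].map((m) => m.index);

// A non-global regular expression yields at most one match and is then done.
const singleShot = /a/[Symbol.matchAll]("aaa");
const firstResult = singleShot.next();
const secondResult = singleShot.next();

console.log(emptyMatchIndices, byCodeUnit, byCodePoint, firstResult.done, secondResult.done);
```

The non-global case calls `RegExp.prototype[Symbol.matchAll]` directly, because `String.prototype.matchAll` throws a TypeError when given a regular expression without the `g` flag.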
22.2.9.2.2 %RegExpStringIteratorPrototype% [ %Symbol.toStringTag% ]
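The initial value of the %Symbol.toStringTag% property is the String value "RegExp String Iterator".
This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.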
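Because %RegExpStringIteratorPrototype% has no global name, it can only be reached through an iterator instance. The following non-normative sketch shows one way to inspect it; the variable names are illustrative.

```javascript
// Any matchAll call produces a RegExp String Iterator.
const tagIterator = "aXbX".matchAll(/X/g);

// The intrinsic prototype is reachable only via an instance.
const iteratorProto = Object.getPrototypeOf(tagIterator);

// The tag is what Object.prototype.toString reports between "[object " and "]".
const tagString = Object.prototype.toString.call(tagIterator);
const tagDescriptor = Object.getOwnPropertyDescriptor(iteratorProto, Symbol.toStringTag);

// %Iterator.prototype% is also reachable through any built-in iterator, such as an Array iterator.
const arrayIteratorProto = Object.getPrototypeOf([][Symbol.iterator]());
const sharedIteratorProto = Object.getPrototypeOf(arrayIteratorProto);
const inheritsFromIteratorProto = Object.getPrototypeOf(iteratorProto) === sharedIteratorProto;

console.log(tagString, tagDescriptor, inheritsFromIteratorProto);
```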
22.2.9.3 Properties of RegExp String Iterator Instances
| Internal Slot | Type | Description |
|---|---|---|
| [[IteratingRegExp]] | an Object | The regular expression used for iteration. |
| [[IteratedString]] | a String | The String value being iterated upon. |
| [[Global]] | a Boolean | Indicates whether the [[IteratingRegExp]] is global or not. |
| [[Unicode]] | a Boolean | Indicates whether the [[IteratingRegExp]] is in Unicode mode or not. |
| [[Done]] | a Boolean | Indicates whether the iteration is complete or not. |