SAP REGEX PCRE - Syntax SPECIALS



ABAP_PCRE_REGEX - Special Characters
The following tables summarize the special characters in PCRE regular expressions.



See also the PCRE2 documentation (pcre2syntax man page).

Pattern Syntax

Quoting
Syntax        Description
\x            handle x as a literal if x has no special meaning
\Q...\E       handle enclosed characters as literal

Escaped Characters
Syntax        Description
\a            alarm (BEL character, 0x07)
\cx           control-x, where x is any ASCII printing character
\e            escape (0x1B)
\f            form feed (0x0C)
\n            line feed (by default 0x0A; depends on the active line-feed-mode)
\r            carriage return (0x0D)
\t            tab (0x09)
\0dd          character with octal code 0dd
\ddd          character with octal code ddd, or backreference
\o{ddd..}     character with octal code ddd..
\N{U+hh..}    character with Unicode code point hh..
\xhh          character with hex code hh
\x{hh..}      character with hex code hh..
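A few of these escapes can be tried outside ABAP. The following is a minimal sketch in Python, whose re module is a different engine from the PCRE2 library used by ABAP; it uses only escapes that both engines accept (\xhh, \0dd, \n, \a, \f). The variable names are illustrative.

```python
import re

# \x41 is "A" (hex 41), \011 is a tab (octal 011), \n is a line feed.
pattern = re.compile(r"\x41\011\n")
matched = pattern.fullmatch("A\t\n") is not None

# \a is the BEL character (0x07) and \f is a form feed (0x0C).
control_chars = re.findall(r"[\a\f]", "x\x07y\x0cz")

print(matched)
print(control_chars)
```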

Character Types
Syntax        Description
.             any character except line feed (unless in dotall mode, then any character)
\C            one code unit; only allowed in regular expressions created with the class CL_ABAP_REGEX with UNICODE_HANDLING set to RELAXED, as it could partially match UTF-16 characters otherwise
\d            a digit (respecting Unicode character properties)
\D            a character that is not a digit
\h            a horizontal white space character (respecting Unicode character properties)
\H            a character that is not a horizontal white space character
\N            a character that is not a line feed (depends on the active line-feed-mode)
\p{xx}        a character with the xx Unicode character property (see below)
\P{xx}        a character without the xx Unicode character property (see below)
\R            a line feed sequence; by default matches any Unicode line feed sequence
\s            a white space character (respecting Unicode character properties)
\S            a character that is not a white space character
\v            a vertical white space character (respecting Unicode character properties)
\V            a character that is not a vertical white space character
\w            a word character (respecting Unicode character properties)
\W            a non-word character
\X            a Unicode extended grapheme cluster
General Categories for Properties \p and \P
Based on the general categories as defined by the Unicode standard.
Category Identifier   Description
C                     Other
Cc                    Control
Cf                    Format
Cn                    Unassigned
Co                    Private use
Cs                    Surrogate
L                     Letter
Ll                    Lower case letter
Lm                    Modifier letter
Lo                    Other letter
Lt                    Title case letter
Lu                    Upper case letter
L&                    Ll, Lu, or Lt
M                     Mark
Mc                    Spacing mark
Me                    Enclosing mark
Mn                    Non-spacing mark
N                     Number
Nd                    Decimal number
Nl                    Letter number
No                    Other number
P                     Punctuation
Pc                    Connector punctuation
Pd                    Dash punctuation
Pe                    Close punctuation
Pf                    Final punctuation
Pi                    Initial punctuation
Po                    Other punctuation
Ps                    Open punctuation
S                     Symbol
Sc                    Currency symbol
Sk                    Modifier symbol
Sm                    Mathematical symbol
So                    Other symbol
Z                     Separator
Zl                    Line separator
Zp                    Paragraph separator
Zs                    Space separator
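The category identifiers above are the general category codes of the Unicode standard, so they can be looked up with any Unicode-aware tool. As a minimal sketch outside ABAP, Python's standard unicodedata module reports the same codes, which can help when choosing the xx for a property escape. The helper name in_category is hypothetical and only mimics the idea that a one-letter category covers its two-letter subcategories.

```python
import unicodedata

def in_category(char, category):
    # A one-letter category such as "L" covers all of its two-letter
    # subcategories (Ll, Lm, Lo, Lt, Lu), so a prefix test is enough here.
    return unicodedata.category(char).startswith(category)

# General category of a few sample characters.
samples = {ch: unicodedata.category(ch) for ch in "aA5 $_("}

print(samples)
print(in_category("A", "L"), in_category("5", "L"))
```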
Script Names for \p and \P
Adlam
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bhaiksuki
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Chorasmian
Common
Coptic
Cuneiform
Cypriot
Cyrillic
Deseret
Devanagari
Dives_Akuru
Dogra
Duployan
Egyptian_Hieroglyphs
Elbasan
Elymaic
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gunjala_Gondi
Gurmukhi
Han
Hangul
Hanifi_Rohingya
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kayah_Li
Kharoshthi
Khitan_Small_Script
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Makasar
Malayalam
Mandaic
Manichaean
Marchen
Masaram_Gondi
Medefaidrin
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
Nandinagari
New_Tai_Lue
Newa
Nko
Nushu
Nyakeng_Puachue_Hmong
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_Sogdian
Old_South_Arabian
Old_Turkic
Oriya
Osage
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sogdian
Sora_Sompeng
Soyombo
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Tangut
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Ugaritic
Vai
Wancho
Warang_Citi
Yezidi
Yi
Zanabazar_Square

Character Classes
Syntax        Description
[...]         positive character class
[^...]        negative character class
[x-y]         range
[[:xxx:]]     positive POSIX named set (see below)
[[:^xxx:]]    negative POSIX named set (see below)
Names for POSIX Named Sets
Syntax        Description
alnum         alphanumeric
alpha         alphabetic
ascii         0-127
blank         space or tab
cntrl         control character
digit         decimal digit
graph         printing, excluding space
lower         lower case letter
print         printing, including space
punct         printing, excluding alphanumeric
space         white space
upper         upper case letter
word          same as \w
xdigit        hexadecimal digit
POSIX named sets also make use of Unicode character properties if applicable.

Quantifiers
Syntax        Description
?             0 or 1, greedy
?+            0 or 1, possessive
??            0 or 1, lazy
*             0 or more, greedy
*+            0 or more, possessive
*?            0 or more, lazy
+             1 or more, greedy
++            1 or more, possessive
+?            1 or more, lazy
{n}           exactly n
{n,m}         at least n, no more than m, greedy
{n,m}+        at least n, no more than m, possessive
{n,m}?        at least n, no more than m, lazy
{n,}          n or more, greedy
{n,}+         n or more, possessive
{n,}?         n or more, lazy
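The difference between greedy and lazy quantifiers is easiest to see on a concrete subject. The following is a minimal sketch in Python, whose re module is not PCRE2 but treats the greedy and lazy forms in the same way; possessive quantifiers are left out because older Python versions do not support them. The subject strings are made up for illustration.

```python
import re

subject = "<b>bold</b>"

# Greedy: .+ takes as much as possible, then backs off to the last ">".
greedy = re.search(r"<.+>", subject).group()

# Lazy: .+? takes as little as possible, stopping at the first ">".
lazy = re.search(r"<.+?>", subject).group()

# Bounded repetition: greedy takes up to m items, lazy stops at n items.
bounded_greedy = re.search(r"\d{2,4}", "123456").group()
bounded_lazy = re.search(r"\d{2,4}?", "123456").group()

print(greedy, lazy, bounded_greedy, bounded_lazy)
```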

Anchors and Basic Assertions
Syntax        Description
\b            word boundary
\B            not a word boundary
^             start of subject (also after an internal line feed, that is a line feed that does not occur at the end of the subject, in multiline mode)
\A            start of subject (if matching on a subject is done with a starting offset, \A can never match)
$             end of subject and before a line feed at the end of the subject (also before line feed in multiline mode)
\Z            end of subject and before a line feed at the end of the subject
\z            end of subject
\G            first matching position in subject (true if the current matching position is at the start point of the matching process, which may differ from the start of the subject e.g. if a starting offset is specified)
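The effect of multiline mode on ^ and the behavior of $ before a final line feed can be sketched as follows. The example uses Python's re module, which is a different engine from PCRE2 but agrees with it for \b, ^, $ and (?m); \Z and \z are deliberately left out because Python assigns them different meanings. The subject strings are made up for illustration.

```python
import re

subject = "first line\nsecond line"

# Without multiline mode, ^ matches only at the start of the subject.
start_only = re.findall(r"^\w+", subject)

# With (?m), ^ also matches after the internal line feed.
each_line = re.findall(r"(?m)^\w+", subject)

# \b requires a word boundary on both sides, so "inline" is skipped.
whole_words = re.findall(r"\bline\b", "line inline line")

# $ also matches before a line feed at the very end of the subject.
before_final_lf = re.search(r"end$", "the end\n") is not None

print(start_only, each_line, whole_words, before_final_lf)
```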

Reported Match Point Setting
Syntax        Description
\K            set reported start of match; e.g. the regex foo\Kbar matches foobar but reports that it has matched only bar
\K is respected in positive assertions, but ignored in negative ones.

Alternation
Syntax        Description
|             start of alternative branch

Grouping and Capturing
Syntax            Description
(...)             capture group
(?<name>...)      named capture group (Perl style)
(?'name'...)      named capture group (Perl style)
(?P<name>...)     named capture group (Python style)
(?:...)           non-capture group
(?|...)           non-capture group; reset group numbers for capture groups in each alternative
(?>...)           atomic non-capture group
(*atomic:...)     atomic non-capture group
A name must not start with a digit. Unicode names are allowed.
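Named and non-capture groups can be illustrated with the Python-style syntax (?P<name>...), which PCRE accepts as listed above and which Python's own re module uses natively. This is a sketch in Python rather than ABAP; the date-like subject and the group names are assumptions chosen for the example.

```python
import re

# Two named capture groups, followed by an optional non-capture group
# that matches the day part without creating a third capture group.
pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?")

match = pattern.fullmatch("2024-07-15")

year = match.group("year")
month = match.group("month")
captured = match.groups()  # only the two capture groups appear here

print(year, month, captured)
```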

Comments
Syntax        Description
(?#...)       comment (cannot be nested)
#...          extended mode: comment
In extended mode, an unescaped # introduces a comment, which in this case continues to immediately after the next line feed character or character sequence in the pattern. This has to be a literal line feed character or character sequence; escape sequences that happen to represent a line feed, like \n, do not count.
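The rule about literal line feeds in extended-mode comments can be sketched as follows. Python's re module is used here as a stand-in; it is not PCRE2, but its verbose mode (?x) ends a # comment only at a literal line feed in the same way. The pattern strings are made up for illustration.

```python
import re

# Literal line feed in the pattern: the comment ends there,
# so [a-z]+ is still part of the expression.
with_literal_lf = "(?x) \\d+  # digits\n [a-z]+"

# Raw string: backslash + n is just an escape sequence inside the comment,
# so the comment runs to the end of the pattern and [a-z]+ is ignored.
with_escaped_lf = r"(?x) \d+  # digits\n [a-z]+"

literal_matches_12ab = re.fullmatch(with_literal_lf, "12ab") is not None
literal_matches_12 = re.fullmatch(with_literal_lf, "12") is not None
escaped_matches_12ab = re.fullmatch(with_escaped_lf, "12ab") is not None
escaped_matches_12 = re.fullmatch(with_escaped_lf, "12") is not None

# (?#...) works as an inline comment regardless of extended mode.
inline_comment = re.fullmatch(r"a(?#ignored)b", "ab") is not None

print(literal_matches_12ab, literal_matches_12)
print(escaped_matches_12ab, escaped_matches_12, inline_comment)
```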

Option Setting
Syntax        Description
(?i)          caseless / case-insensitive search
(?J)          allow duplicate named groups
(?m)          multiline mode
(?n)          no auto capture
(?s)          single line mode (dotall)
(?U)          default ungreedy quantifiers (lazy)
(?x)          extended mode: ignore white space except in classes
(?xx)         same as (?x) but also ignore space and tab in classes
(?-...)       unset option(s)
(?^)          unset i, m, n, s and x options
Changes of these options within a group are automatically cancelled at the end of the group.
Several options may be set at once, and a mixture of setting and unsetting such as (?i-x) is allowed, but there may be only one hyphen. Setting (but not unsetting) is allowed after (?^, for example (?^in). An option setting may appear at the start of a non-capture group, for example (?i:...).
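How an option applies to the whole pattern or only to one group can be sketched as follows. The example is written in Python, whose re module is not PCRE2 but supports (?i), (?s) and the scoped form (?i:...) with the same meaning; options such as (?J), (?U), (?xx) and (?^) are not available there and are not shown. The subject strings are made up for illustration.

```python
import re

# (?i:...) makes only the enclosed part case-insensitive.
scoped_ok = re.fullmatch(r"(?i:abc)def", "ABCdef") is not None
scoped_fail = re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None

# (?i) at the start applies to the whole pattern.
global_ok = re.fullmatch(r"(?i)abcdef", "ABCDEF") is not None

# (?s) lets the dot match a line feed as well (dotall).
dot_default = re.search(r"a.b", "a\nb") is not None
dot_dotall = re.search(r"(?s)a.b", "a\nb") is not None

print(scoped_ok, scoped_fail, global_ok, dot_default, dot_dotall)
```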

Special Control Verbs
Special control verbs are only recognized at the very start of a pattern.
Syntax                  Description
(*LIMIT_DEPTH=d)        set the backtracking limit to d
(*LIMIT_HEAP=d)         set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d)        set the match limit to d
(*NOTEMPTY)             lock out matching of empty strings entirely
(*NOTEMPTY_ATSTART)     lock out matching of empty strings at the start of the subject
(*NO_AUTO_POSSESS)      prevents quantifiers from automatically being made possessive when what follows cannot match the repeated item (e.g. by default, a+b is handled as a++b as an optimization)
(*NO_DOTSTAR_ANCHOR)    disable optimizations that apply to patterns whose top-level branches start with .*
(*NO_JIT)               do not JIT-compile this pattern
(*NO_START_OPT)         disable several optimizations for quickly reaching a no match result; this can be useful if you want callouts or backtracking control verbs to be executed in any case
(*UTF)                  enable UTF-mode; this verb cannot be used in regular expressions created with the class CL_ABAP_REGEX with UNICODE_HANDLING set to RELAXED, as it would clash with usages of \C
(*UCP)                  enable usage of Unicode character properties; for ABAP regular expressions this option is already enabled by default
The following special control verbs control the line break convention, i.e. what gets recognized as a line feed character. They do not affect \R:
Syntax                  Description
(*CR)                   carriage return only
(*LF)                   line feed only
(*CRLF)                 carriage return followed by line feed
(*ANYCRLF)              all three of the above
(*ANY)                  any Unicode line feed sequence
(*NUL)                  the NUL character (binary zero)
The following special control verbs control what \R matches:
Syntax                  Description
(*BSR_ANYCRLF)          CR, LF and CRLF
(*BSR_UNICODE)          any Unicode line feed sequence

Look-ahead and Look-behind Assertions
Syntax                        Description
(?=...)                       positive look-ahead
(*pla:...)                    positive look-ahead
(*positive_lookahead:...)     positive look-ahead
(?!...)                       negative look-ahead
(*nla:...)                    negative look-ahead
(*negative_lookahead:...)     negative look-ahead
(?<=...)                      positive look-behind
(*plb:...)                    positive look-behind
(*positive_lookbehind:...)    positive look-behind
(?<!...)                      negative look-behind
(*nlb:...)                    negative look-behind
(*negative_lookbehind:...)    negative look-behind
Each top-level branch of a look-behind must be of fixed length.
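The four classic assertion forms can be sketched as follows. The example uses Python's re module, which is a different engine from PCRE2 but supports (?=...), (?!...), (?<=...) and (?<!...) with the same meaning; the alphabetic forms such as (*pla:...) are PCRE-specific and not shown. The look-behinds used here have a fixed length of one character. The subject strings are made up for illustration.

```python
import re

prices = "10 EUR, 20 USD, 30 EUR"

# Positive look-ahead: numbers that are followed by " EUR".
followed_by_eur = re.findall(r"\d+(?= EUR)", prices)

# Negative look-ahead: whole numbers that are not followed by " EUR".
not_followed_by_eur = re.findall(r"\b\d+\b(?! EUR)", prices)

text = "cost $15 or 20 units"

# Positive look-behind: numbers directly preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative look-behind: numbers starting a word and not preceded by "$".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(followed_by_eur, not_followed_by_eur, after_dollar, not_after_dollar)
```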

Non-Atomic Look-around Assertions
Syntax                                   Description
(?*...)                                  non-atomic positive look-ahead
(*napla:...)                             non-atomic positive look-ahead
(*non_atomic_positive_lookahead:...)     non-atomic positive look-ahead
(?<*...)                                 non-atomic positive look-behind
(*naplb:...)                             non-atomic positive look-behind
(*non_atomic_positive_lookbehind:...)    non-atomic positive look-behind

Backreferences
Syntax        Description
\n            reference by number n (can be ambiguous, see octal escapes)
\gn           reference by number n
\g{n}         reference by number n
\g+n          relative reference by number n
\g-n          relative reference by number n
\g{+n}        relative reference by number n
\g{-n}        relative reference by number n
\k<name>      reference by name (Perl style)
\k'name'      reference by name (Perl style)
\g{name}      reference by name (Perl style)
\k{name}      reference by name (.NET style)
(?P=name)     reference by name (Python style)
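A backreference matches the same text that the referenced group captured, not the group's pattern. The following is a minimal sketch in Python, whose re module is not PCRE2 but supports the numbered form and the (?P=name) form with the same meaning; the \g and \k forms are not available there in patterns and are not shown. The subject strings are made up for illustration.

```python
import re

# Numbered backreference: find a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Named backreference: the closing quote must equal the opening quote.
quoted = re.compile(r"""(?P<q>['"]).*?(?P=q)""")
same_quotes = quoted.fullmatch("'abc'") is not None
mixed_quotes = quoted.fullmatch("'abc\"") is not None

print(doubled, same_quotes, mixed_quotes)
```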

Subroutine References
Syntax        Description
(?R)          recurse whole pattern
(?n)          call subroutine by absolute number n
\g<n>         call subroutine by absolute number n (Oniguruma style)
\g'n'         call subroutine by absolute number n (Oniguruma style)
(?+n)         call subroutine by relative number n
(?-n)         call subroutine by relative number n
\g<+n>        call subroutine by relative number n
\g'+n'        call subroutine by relative number n
\g<-n>        call subroutine by relative number n
\g'-n'        call subroutine by relative number n
(?&name)      call subroutine by name (Perl style)
(?P>name)     call subroutine by name (Python style)
\g<name>      call subroutine by name (Oniguruma style)
\g'name'      call subroutine by name (Oniguruma style)
Subroutine calls can be recursive. Left recursion is not possible however.

Executable example: Parsing with PCRE Regular Expression

Conditional Patterns
Syntax                                  Description
(?(condition)yes-pattern)               match the yes-pattern if the condition holds
(?(condition)yes-pattern|no-pattern)    match the yes-pattern if the condition holds, otherwise match the no-pattern
The condition can be one of the following or any other assertion, such as a look-ahead or look-behind assertion:
Syntax             Description
n                  absolute number n reference condition
+n                 relative number n reference condition
-n                 relative number n reference condition
<name>             named reference condition (Perl style)
'name'             named reference condition (Perl style)
R                  overall recursion condition
Rn                 specific number n group recursion condition
R&name             specific named group recursion condition
DEFINE             define groups for reference (always yields false)
VERSION[>]=n.m     test PCRE2 version
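A reference condition tests whether a given capture group took part in the match so far. The following is a minimal sketch in Python, whose re module is not PCRE2 but supports the absolute-number form (?(n)yes|no) with the same meaning; the relative, recursion, DEFINE and VERSION conditions are PCRE-specific and not shown. The patterns and subjects are made up for illustration.

```python
import re

# Yes-pattern only: a closing ">" is required exactly when "<" was matched.
tag = re.compile(r"(<)?\w+(?(1)>)")
tag_results = [tag.fullmatch(s) is not None
               for s in ("<tag>", "tag", "<tag", "tag>")]

# Yes- and no-pattern: ")" after an opening "(", otherwise ";".
number = re.compile(r"(\()?\d+(?(1)\)|;)")
number_results = [number.fullmatch(s) is not None
                  for s in ("(12)", "12;", "(12;", "12)")]

print(tag_results)
print(number_results)
```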

Backtracking Control
The following backtracking control verbs are triggered immediately when they are reached:
Syntax          Description
(*ACCEPT)       force successful match; if triggered inside a group that is called as a subroutine, only the group is ended successfully; if triggered inside a positive assertion, the assertion succeeds; if triggered inside a negative assertion, the assertion fails
(*FAIL)         force backtrack
(*F)            force backtrack
(?!)            force backtrack; actually a negative look-ahead for the empty string, which always matches, so the look-ahead always fails; (*FAIL) and (*F) are synonyms for (?!)
(*MARK:NAME)    mark a position with name NAME, see (*SKIP:NAME) below; synonym (*:NAME); NAME can contain any sequence of characters that does not include the closing parenthesis; an empty name will cause the mark to have no effect
The following backtracking control verbs are only triggered when a subsequent match failure causes a backtrack to reach them. All of them force a match failure, but differ in what happens afterwards:
Syntax          Description
(*COMMIT)       overall failure, no advance of starting point
(*PRUNE)        advance to next starting character; only applies if the pattern is not anchored, otherwise behaves the same as (*COMMIT)
(*SKIP)         advance to current matching position
(*SKIP:NAME)    advance to position corresponding to an earlier (*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN)         local failure, backtrack to next alternation
If one of these verbs is located in a group called as a subroutine, its effects are confined to the subroutine call.

Callouts
Syntax        Description
(?C)          callout (assumed number 0)
(?Cn)         callout with numeric data n
(?C'text')    callout with string data text

Executable example: PCRE Regular Expression with Callouts


Replacement Syntax

Capture Group Substitution
Syntax        Description
$id           substitute for the content of the capture group identified by id, $0 being the content of the whole match; id can either be a number referring to a capture group or the name of a named capture group
${id}         substitute for the content of the capture group identified by id, ${0} being the content of the whole match; id can either be a number referring to a capture group or the name of a named capture group
When referring to a capture group that is not set (i.e. was not participating) in the match, the empty string will be substituted.
When referring to a capture group that does not exist, it is assumed unset and thus the empty string is substituted.
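The substitution rules above, including the empty string for unset and non-existent groups, can be sketched with a small emulation. Python's re module uses a different replacement syntax, so the function expand below is a hypothetical helper written for this illustration and not part of any library; it handles only the $n and ${id} forms and ignores escapes and the conditional forms.

```python
import re

# Matches ${id} (number or name) or $n (number only); a simplification.
_REFERENCE = re.compile(r"\$(?:\{(\w+)\}|(\d+))")

def expand(template, match):
    """Expand $n and ${id} in template using the groups of match."""
    def lookup(ref):
        ident = ref.group(1) or ref.group(2)
        key = int(ident) if ident.isdigit() else ident
        try:
            value = match.group(key)
        except IndexError:
            # Group does not exist: treated as unset.
            value = None
        # Unset groups are replaced by the empty string.
        return value if value is not None else ""
    return _REFERENCE.sub(lookup, template)

# Group 2 is optional and stays unset for this subject; group 7 does not exist.
m = re.search(r"(?P<key>\w+)=(\d+)?", "size=")
expanded = expand("${key}:[$2][$7][$0]", m)

print(expanded)
```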

Conditional Substitution
Syntax                       Description
${id:+matched:unmatched}     substitute for matched if the capture group identified by id was set in the match, otherwise for unmatched; id can either be a number referring to a capture group or the name of a named capture group
${id:-default}               substitute for the content of the capture group identified by id if said capture group was set in the match, otherwise for default; shorthand for ${id:+${id}:default}; id can either be a number referring to a capture group or the name of a named capture group
When referring to a capture group that is not set (i.e. was not participating) in the match, the empty string will be substituted.
When referring to a capture group that does not exist, it is assumed unset and thus the empty string is substituted.

Case Conversion
Syntax        Description
\u            the first character after \u that is inserted into the replacement text is converted to uppercase
\U            all characters after \U up to the next \L or \E that are inserted into the replacement text are converted to uppercase
\l            the first character after \l that is inserted into the replacement text is converted to lowercase
\L            all characters after \L up to the next \U or \E that are inserted into the replacement text are converted to lowercase
\E            terminates the current upper- or lowercase transformation

Substituting Special Characters
Syntax        Description
\t            insert a tab (0x09)
\r            insert a carriage return (0x0D)
\n            insert a line feed
\f            insert a form feed (0x0C)
\xhh          insert the character with hex code hh
\x{hh..}      insert the character with hex code hh..

Quoting
Syntax        Description
\\            insert a literal \
\$            insert a literal $
\x            if x has no special meaning, insert a literal x