SAP REGEX PCRE - Syntax SPECIALS


ABAP_PCRE_REGEX - Special Characters
The following tables summarize the special characters in PCRE regular expressions.

Latest notes:
See also the PCRE2 documentation (pcre2syntax man page).

Pattern Syntax

Quoting
Syntax     Description
\x         handle x as a literal if x has no special meaning
\Q...\E    handle enclosed characters as literal

Escaped Characters
Syntax        Description
\a            alarm (BEL character, 0x07)
\cx           control-x, where x is any ASCII printing character
\e            escape (0x1B)
\f            form feed (0x0C)
\n            line feed (by default 0x0A; depends on the active line-feed mode)
\r            carriage return (0x0D)
\t            tab (0x09)
\0dd          character with octal code 0dd
\ddd          character with octal code ddd, or backreference
\o{ddd..}     character with octal code ddd..
\N{U+hh..}    character with Unicode code point hh..
\xhh          character with hex code hh
\x{hh..}      character with hex code hh..

Character Types
Syntax     Description
.          any character except line feed (unless in dotall mode, then any character)
\C         one code unit; only allowed in regular expressions created with the class CL_ABAP_REGEX with UNICODE_HANDLING set to RELAXED, as it could otherwise partially match UTF-16 characters
\d         a digit (respecting Unicode character properties)
\D         a character that is not a digit
\h         a horizontal white space character (respecting Unicode character properties)
\H         a character that is not a horizontal white space character
\N         a character that is not a line feed (depends on the active line-feed mode)
\p{xx}     a character with the xx Unicode character property (see below)
\P{xx}     a character without the xx Unicode character property (see below)
\R         a line feed sequence; by default matches any Unicode line feed sequence
\s         a white space character (respecting Unicode character properties)
\S         a character that is not a white space character
\v         a vertical white space character (respecting Unicode character properties)
\V         a character that is not a vertical white space character
\w         a word character (respecting Unicode character properties)
\W         a non-word character
\X         a Unicode extended grapheme cluster
General Categories for Properties \p and \P
Based on the general categories as defined by the Unicode standard.
Category Identifier    Description
C                      Other
Cc                     Control
Cf                     Format
Cn                     Unassigned
Co                     Private use
Cs                     Surrogate
L                      Letter
Ll                     Lower case letter
Lm                     Modifier letter
Lo                     Other letter
Lt                     Title case letter
Lu                     Upper case letter
L&                     Ll, Lu, or Lt
M                      Mark
Mc                     Spacing mark
Me                     Enclosing mark
Mn                     Non-spacing mark
N                      Number
Nd                     Decimal number
Nl                     Letter number
No                     Other number
P                      Punctuation
Pc                     Connector punctuation
Pd                     Dash punctuation
Pe                     Close punctuation
Pf                     Final punctuation
Pi                     Initial punctuation
Po                     Other punctuation
Ps                     Open punctuation
S                      Symbol
Sc                     Currency symbol
Sk                     Modifier symbol
Sm                     Mathematical symbol
So                     Other symbol
Z                      Separator
Zl                     Line separator
Zp                     Paragraph separator
Zs                     Space separator
Script Names for \p and \P

Adlam

Ahom

Anatolian_Hieroglyphs

Arabic

Armenian

Avestan

Balinese

Bamum

Bassa_Vah

Batak

Bengali

Bhaiksuki

Bopomofo

Brahmi

Braille

Buginese

Buhid

Canadian_Aboriginal

Carian

Caucasian_Albanian

Chakma

Cham

Cherokee

Chorasmian

Common

Coptic

Cuneiform

Cypriot

Cyrillic

Deseret

Devanagari

Dives_Akuru

Dogra

Duployan

Egyptian_Hieroglyphs

Elbasan

Elymaic

Ethiopic

Georgian

Glagolitic

Gothic

Grantha

Greek

Gujarati

Gunjala_Gondi

Gurmukhi

Han

Hangul

Hanifi_Rohingya

Hanunoo

Hatran

Hebrew

Hiragana

Imperial_Aramaic

Inherited

Inscriptional_Pahlavi

Inscriptional_Parthian

Javanese

Kaithi

Kannada

Katakana

Kayah_Li

Kharoshthi

Khitan_Small_Script

Khmer

Khojki

Khudawadi

Lao

Latin

Lepcha

Limbu

Linear_A

Linear_B

Lisu

Lycian

Lydian

Mahajani

Makasar

Malayalam

Mandaic

Manichaean

Marchen

Masaram_Gondi

Medefaidrin

Meetei_Mayek

Mende_Kikakui

Meroitic_Cursive

Meroitic_Hieroglyphs

Miao

Modi

Mongolian

Mro

Multani

Myanmar

Nabataean

Nandinagari

New_Tai_Lue

Newa

Nko

Nushu

Nyakeng_Puachue_Hmong

Ogham

Ol_Chiki

Old_Hungarian

Old_Italic

Old_North_Arabian

Old_Permic

Old_Persian

Old_Sogdian

Old_South_Arabian

Old_Turkic

Oriya

Osage

Osmanya

Pahawh_Hmong

Palmyrene

Pau_Cin_Hau

Phags_Pa

Phoenician

Psalter_Pahlavi

Rejang

Runic

Samaritan

Saurashtra

Sharada

Shavian

Siddham

SignWriting

Sinhala

Sogdian

Sora_Sompeng

Soyombo

Sundanese

Syloti_Nagri

Syriac

Tagalog

Tagbanwa

Tai_Le

Tai_Tham

Tai_Viet

Takri

Tamil

Tangut

Telugu

Thaana

Thai

Tibetan

Tifinagh

Tirhuta

Ugaritic

Vai

Wancho

Warang_Citi

Yezidi

Zanabazar_Square

Character Classes
Syntax        Description
[...]         positive character class
[^...]        negative character class
[x-y]         range
[[:xxx:]]     positive POSIX named set (see below)
[[:^xxx:]]    negative POSIX named set (see below)
Names for POSIX Named Sets
Syntax    Description
alnum     alphanumeric
alpha     alphabetic
ascii     0-127
blank     space or tab
cntrl     control character
digit     decimal digit
graph     printing, excluding space
lower     lower case letter
print     printing, including space
punct     printing, excluding alphanumeric
space     white space
upper     upper case letter
word      same as \w
xdigit    hexadecimal digit
POSIX named sets also make use of Unicode character properties if applicable.

Quantifiers
Syntax     Description
?          0 or 1, greedy
?+         0 or 1, possessive
??         0 or 1, lazy
*          0 or more, greedy
*+         0 or more, possessive
*?         0 or more, lazy
+          1 or more, greedy
++         1 or more, possessive
+?         1 or more, lazy
{n}        exactly n
{n,m}      at least n, no more than m, greedy
{n,m}+     at least n, no more than m, possessive
{n,m}?     at least n, no more than m, lazy
{n,}       n or more, greedy
{n,}+      n or more, possessive
{n,}?      n or more, lazy
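
The difference between greedy and lazy quantifiers is easiest to see on a small subject. The following is a minimal sketch only: it uses Python's re module as a stand-in, which is not the PCRE engine used by ABAP but treats the greedy, lazy, and {n,m} forms shown here in the same way. The sample strings and variable names are made up for illustration.

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.+>", subject).group(0)

# Lazy: .+? takes as little as possible, so the match stops at the first '>'.
lazy = re.search(r"<.+?>", subject).group(0)

# Bounded repetition: {2,3} matches at least 2 and at most 3 digits, greedily.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")

print(greedy)
print(lazy)
print(bounded)
```

Possessive forms such as ++ are left out of the sketch because older Python versions do not support them.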

Anchors and Basic Assertions
Syntax    Description
\b        word boundary
\B        not a word boundary
^         start of subject (also after an internal line feed, that is a line feed that does not occur at the end of the subject, in multiline mode)
\A        start of subject (if matching on a subject is done with a starting offset, \A can never match)
$         end of subject and before a line feed at the end of the subject (also before line feed in multiline mode)
\Z        end of subject and before a line feed at the end of the subject
\z        end of subject
\G        first matching position in subject (true if the current matching position is at the start point of the matching process, which may differ from the start of the subject, e.g. if a starting offset is specified)
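
As a rough illustration of ^ in multiline mode and of \b, the sketch below again uses Python's re module as a stand-in for the PCRE engine. Only ^ and \b are exercised, on a subject that does not end in a line feed, because the two engines differ in other anchors (for example, Python's \Z corresponds to PCRE's \z). The subject string is hypothetical.

```python
import re

subject = "alpha beta\ngamma"

# Without multiline mode, ^ matches only at the start of the subject.
default_starts = re.findall(r"^\w+", subject)

# With (?m), ^ also matches after the internal line feed.
multiline_starts = re.findall(r"(?m)^\w+", subject)

# \b matches between a word character and a non-word character.
whole_word = re.search(r"\bbeta\b", subject) is not None
inside_word = re.search(r"\bbet\b", subject) is not None

print(default_starts, multiline_starts, whole_word, inside_word)
```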

Reported Match Point Setting
Syntax    Description
\K        set reported start of match; e.g. the regex foo\Kbar matches foobar but reports that it has matched only bar
\K is respected in positive assertions, but ignored in negative ones.

Alternation
Syntax    Description
|         start of alternative branch

Grouping and Capturing
Syntax            Description
(...)             capture group
(?<name>...)      named capture group (Perl style)
(?'name'...)      named capture group (Perl style)
(?P<name>...)     named capture group (Python style)
(?:...)           non-capture group
(?|...)           non-capture group; reset group numbers for capture groups in each alternative
(?>...)           atomic non-capture group
(*atomic:...)     atomic non-capture group
A name must not start with a digit. Unicode names are allowed.
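
The following sketch shows a named capture group and a non-capture group in use. It relies on Python's re module as a stand-in for the PCRE engine, so only the Python-style (?P<name>...) form and (?:...) are shown, both of which the two engines share. The group names and sample strings are assumptions chosen for illustration.

```python
import re

# (?P<name>...) captures a substring and makes it available by name and by number.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date_pattern.search("released 2023-07")
year = m.group("year")
month = m.group("month")
first_by_number = m.group(1)

# (?:...) groups without capturing, so (c) is the only numbered group here.
n = re.search(r"(?:ab)+(c)", "ababc")
captured = n.groups()

print(year, month, first_by_number, captured)
```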

Comments
Syntax      Description
(?#...)     comment (cannot be nested)
#...        extended mode: comment
In extended mode, an unescaped # introduces a comment that continues to immediately after the next line feed character or character sequence in the pattern. This has to be a literal line feed character or character sequence; escape sequences that happen to represent a line feed, like \n, do not count.

Option Setting
Syntax      Description
(?i)        caseless / case-insensitive search
(?J)        allow duplicate named groups
(?m)        multiline mode
(?n)        no auto capture
(?s)        single line mode (dotall)
(?U)        default ungreedy quantifiers (lazy)
(?x)        extended mode: ignore white space except in classes
(?xx)       same as (?x) but also ignore space and tab in classes
(?-...)     unset option(s)
(?^)        unset i, m, n, s and x options
Changes of these options within a group are automatically cancelled at the end of the group.
Several options may be set at once, and a mixture of setting and unsetting such as (?i-x) is allowed, but there may be only one hyphen. Setting (but no unsetting) is allowed after (?^, for example (?^in). An option setting may appear at the start of a non-capture group, for example (?i:...).
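
To illustrate how an option applies to a whole pattern or only to a group, here is a minimal sketch. It uses Python's re module as a stand-in for the PCRE engine and is limited to (?i), (?s), and the scoped form (?i:...), which behave the same way in both; options such as (?J), (?U), (?xx), and (?^) are not available in Python and are not shown. All sample strings are hypothetical.

```python
import re

# (?i) at the start makes the whole pattern case-insensitive.
whole = re.search(r"(?i)abap", "ABAP") is not None

# (?i:...) limits the option to the group; " regex" stays case-sensitive.
scoped_hit = re.fullmatch(r"(?i:sap) regex", "SAP regex") is not None
scoped_miss = re.fullmatch(r"(?i:sap) regex", "SAP REGEX") is not None

# (?s) switches on dotall mode, so . also matches a line feed.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

print(whole, scoped_hit, scoped_miss, dot_default, dot_all)
```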

Special Control Verbs
Special control verbs are only recognized at the very start of a pattern.
Syntax                   Description
(*LIMIT_DEPTH=d)         set the backtracking limit to d
(*LIMIT_HEAP=d)          set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d)         set the match limit to d
(*NOTEMPTY)              lock out matching of empty strings entirely
(*NOTEMPTY_ATSTART)      lock out matching of empty strings at the start of the subject
(*NO_AUTO_POSSESS)       prevent quantifiers from automatically being made possessive when what follows cannot match the repeated item (e.g. by default, a+b is handled as a++b as an optimization)
(*NO_DOTSTAR_ANCHOR)     disable optimizations that apply to patterns whose top-level branches start with .*
(*NO_JIT)                do not JIT-compile this pattern
(*NO_START_OPT)          disable several optimizations for quickly reaching a no match result; this can be useful if you want callouts or backtracking control verbs to be executed in any case
(*UTF)                   enable UTF mode; this verb cannot be used in regular expressions created with the class CL_ABAP_REGEX with UNICODE_HANDLING set to RELAXED, as it would clash with usages of \C
(*UCP)                   enable usage of Unicode character properties; for ABAP regular expressions this option is already enabled by default
The following special control verbs control the line break convention, i.e. what gets recognized as a line feed character. They do not affect \R:
Syntax        Description
(*CR)         carriage return only
(*LF)         line feed only
(*CRLF)       carriage return followed by line feed
(*ANYCRLF)    all three of the above
(*ANY)        any Unicode line feed sequence
(*NUL)        the NUL character (binary zero)
The following special control verbs control what \R matches:
Syntax            Description
(*BSR_ANYCRLF)    CR, LF and CRLF
(*BSR_UNICODE)    any Unicode line feed sequence

Look-ahead and Look-behind Assertions
Syntax                          Description
(?=...)                         positive look-ahead
(*pla:...)                      positive look-ahead
(*positive_lookahead:...)       positive look-ahead
(?!...)                         negative look-ahead
(*nla:...)                      negative look-ahead
(*negative_lookahead:...)       negative look-ahead
(?<=...)                        positive look-behind
(*plb:...)                      positive look-behind
(*positive_lookbehind:...)      positive look-behind
(?<!...)                        negative look-behind
(*nlb:...)                      negative look-behind
(*negative_lookbehind:...)      negative look-behind
Each top-level branch of a look-behind must be of fixed length.
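
Look-around assertions check the text around the current position without making it part of the match. The sketch below shows this with Python's re module as a stand-in for the PCRE engine; only the classic forms (?=...), (?!...), and (?<=...) are used, since the (*pla:...) style spellings are not available in Python. The sample data is invented for illustration.

```python
import re

amounts = "100 EUR, 250 USD, 75 EUR"

# Positive look-ahead: digits that are followed by " EUR"; the currency is not part of the match.
eur = re.findall(r"\d+(?= EUR)", amounts)

# Negative look-ahead: complete numbers that are not followed by " EUR".
# The \b stops the engine from backtracking into a shorter number such as "10".
not_eur = re.findall(r"\d+\b(?! EUR)", amounts)

# Positive look-behind with a fixed-length branch: digits preceded by ", ".
after_comma = re.findall(r"(?<=, )\d+", amounts)

print(eur, not_eur, after_comma)
```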

Non-Atomic Look-around Assertions
Syntax                                   Description
(?*...)                                  non-atomic positive look-ahead
(*napla:...)                             non-atomic positive look-ahead
(*non_atomic_positive_lookahead:...)     non-atomic positive look-ahead
(?<*...)                                 non-atomic positive look-behind
(*naplb:...)                             non-atomic positive look-behind
(*non_atomic_positive_lookbehind:...)    non-atomic positive look-behind

Backreferences
Syntax        Description
\n            reference by number n (can be ambiguous, see octal escapes)
\gn           reference by number n
\g{n}         reference by number n
\g+n          relative reference by number n
\g-n          relative reference by number n
\g{+n}        relative reference by number n
\g{-n}        relative reference by number n
\k<name>      reference by name (Perl style)
\k'name'      reference by name (Perl style)
\g{name}      reference by name (Perl style)
\k{name}      reference by name (.NET style)
(?P=name)     reference by name (Python style)
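
A backreference matches the same text that the referenced group captured, not the group's pattern. The following sketch demonstrates this with Python's re module as a stand-in for the PCRE engine, using only \n and (?P=name), which both engines support; the \g and \k forms are not shown because Python does not accept them in patterns. The sample sentences and the group name q are assumptions.

```python
import re

# \1 must repeat exactly what group 1 captured: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# (?P=q) must repeat the quote character captured by the group named q,
# so a string opened with " is only closed by ", not by '.
quoted = re.search(r"""(?P<q>['"]).*?(?P=q)""", 'say "it\'s" now').group(0)

print(doubled)
print(quoted)
```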

Subroutine References
Syntax        Description
(?R)          recurse whole pattern
(?n)          call subroutine by absolute number n
\g<n>         call subroutine by absolute number n (Oniguruma style)
\g'n'         call subroutine by absolute number n (Oniguruma style)
(?+n)         call subroutine by relative number n
(?-n)         call subroutine by relative number n
\g<+n>        call subroutine by relative number n
\g'+n'        call subroutine by relative number n
\g<-n>        call subroutine by relative number n
\g'-n'        call subroutine by relative number n
(?&name)      call subroutine by name (Perl style)
(?P>name)     call subroutine by name (Python style)
\g<name>      call subroutine by name (Oniguruma style)
\g'name'      call subroutine by name (Oniguruma style)
Subroutine calls can be recursive. However, left recursion is not possible.

Executable example: Parsing with PCRE Regular Expression

Conditional Patterns
Syntax                                   Description
(?(condition)yes-pattern)                match the yes-pattern if the condition holds
(?(condition)yes-pattern|no-pattern)     match the yes-pattern if the condition holds, otherwise match the no-pattern
The condition can be one of the following or any other assertion, such as a look-ahead or look-behind assertion:
Syntax              Description
n                   absolute number n reference condition
+n                  relative number n reference condition
-n                  relative number n reference condition
<name>              named reference condition (Perl style)
'name'              named reference condition (Perl style)
R                   overall recursion condition
Rn                  specific number n group recursion condition
R&name              specific named group recursion condition
DEFINE              define groups for reference (always yields false)
VERSION[>]=n.m      test PCRE2 version
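
A reference condition tests whether a capture group has taken part in the match so far. The sketch below uses Python's re module as a stand-in for the PCRE engine and covers only the absolute number form (?(n)...), which both engines support; relative, recursion, DEFINE, and VERSION conditions are not available in Python. The patterns and subjects are made up for illustration.

```python
import re

# If group 1 matched an opening parenthesis, a closing one is required.
paren = re.compile(r"(\()?\d+(?(1)\))")
with_parens = paren.fullmatch("(42)") is not None
without_parens = paren.fullmatch("42") is not None
unbalanced_open = paren.fullmatch("(42") is not None
unbalanced_close = paren.fullmatch("42)") is not None

# With a no-pattern: close with '>' if '<' was seen, otherwise expect ';'.
tag = re.compile(r"(<)?\w+(?(1)>|;)")
angle = tag.fullmatch("<tag>") is not None
plain = tag.fullmatch("tag;") is not None
mixed = tag.fullmatch("tag>") is not None

print(with_parens, without_parens, unbalanced_open, unbalanced_close)
print(angle, plain, mixed)
```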

Backtracking Control
The following backtracking control verbs are triggered immediately when they are reached:
Syntax           Description
(*ACCEPT)        force successful match; if triggered inside a group that is called as a subroutine, only the group is ended successfully; if triggered inside a positive assertion, the assertion succeeds; if triggered inside a negative assertion, the assertion fails
(*FAIL)          force backtrack
(*F)             force backtrack
(?!)             force backtrack; this is actually a negative look-ahead for the empty string, which always matches, so the look-ahead always fails; (*FAIL) and (*F) are synonyms for (?!)
(*MARK:NAME)     mark a position with the name NAME, see (*SKIP:NAME) below; synonym (*:NAME); NAME can contain any sequence of characters that does not include the closing parenthesis; an empty name causes the mark to have no effect
The following backtracking control verbs are only triggered when a subsequent match failure causes a backtrack to reach them. All of them force a match failure, but differ in what happens afterwards:
Syntax           Description
(*COMMIT)        overall failure, no advance of starting point
(*PRUNE)         advance to next starting character; only applies if the pattern is not anchored, otherwise behaves the same as (*COMMIT)
(*SKIP)          advance to current matching position
(*SKIP:NAME)     advance to position corresponding to an earlier (*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN)          local failure, backtrack to next alternation
If one of these verbs is located in a group called as a subroutine, its effects are confined to the subroutine call.

Callouts
Syntax         Description
(?C)           callout (assumed number 0)
(?Cn)          callout with numeric data n
(?C'text')     callout with string data text

Executable example: PCRE Regular Expression with Callouts

Replacement Syntax

Capture Group Substitution
Syntax    Description
$id       substitute for the content of the capture group identified by id,