ABAP_REGEX - New Features in PCRE Compared to POSIX
While the topic Incompatibilities between POSIX and PCRE covers incompatibilities and missing features when migrating from POSIX to PCRE, it is also worth looking at the many new features that PCRE offers.
The following sections introduce some of these features; the list is far from complete.

Making Use of New Features for Patterns

Lazy Quantifiers
The most obvious downside of POSIX regular expressions in ABAP is the lack of lazy (also known as non-greedy or reluctant) quantifiers.
In PCRE, a quantifier can be made lazy by appending a ?:
Description | PCRE Syntax
0 or 1, preferring 0 | ??
0 or more, as few as possible | *?
1 or more, as few as possible | +?
at least n, no more than m, as few as possible | {n,m}?
at least n, as few as possible | {n,}?

ABAP_EXAMPLE_VX5
Difference between greedy and lazy (non-greedy) behavior.
ABEXA 01498
ABAP_EXAMPLE_END
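The following sketch shows the same contrast as a standalone program. It assumes a release that supports FIND PCRE and a classic system where CL_DEMO_OUTPUT is available. The program name ZDEMO_PCRE_LAZY is hypothetical.

```abap
REPORT zdemo_pcre_lazy.

DATA(text) = `<b>bold</b> and <b>more</b>`.

" Greedy: .* first consumes as much as possible and backtracks only
" as far as needed, so the match ends at the last </b>
FIND PCRE `<b>.*</b>` IN text MATCH LENGTH DATA(greedy_len).

" Lazy: .*? consumes as little as possible, so the match ends at the first </b>
FIND PCRE `<b>.*?</b>` IN text MATCH LENGTH DATA(lazy_len).

cl_demo_output=>display( |greedy: { greedy_len }, lazy: { lazy_len }| ).
```

The greedy match covers the whole string (27 characters). The lazy match covers only <b>bold</b> (11 characters).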

Look-behind Assertions
PCRE supports look-behind assertions, which check what precedes the current match position:
Description | PCRE Syntax
positive look-behind assertion; succeeds if the current match position is preceded by the given pattern | (?<=...)
negative look-behind assertion; succeeds if the current match position is not preceded by the given pattern | (?<!...)

ABAP_EXAMPLE_VX5
Like look-ahead assertions, look-behind assertions are not part of the actual match, regardless of whether they appear at the beginning or at the end of the pattern.
ABEXA 01499
ABAP_EXAMPLE_END
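Here is a minimal sketch of both assertion types. It assumes a release that supports FIND PCRE and that CL_DEMO_OUTPUT is available. The program name is hypothetical. The offsets returned by FIND show that the prefix USD is checked but not included in the match.

```abap
REPORT zdemo_pcre_lookbehind.

DATA(text) = `EUR 10, USD 20`.

" Positive look-behind: digits preceded by "USD "; the prefix is
" checked but not included in the match
FIND PCRE `(?<=USD )\d+` IN text
     MATCH OFFSET DATA(pos_off) MATCH LENGTH DATA(pos_len).

" Negative look-behind: first run of digits NOT preceded by "USD "
FIND PCRE `(?<!USD )\b\d+` IN text
     MATCH OFFSET DATA(neg_off) MATCH LENGTH DATA(neg_len).

cl_demo_output=>display(
  |positive: { substring( val = text off = pos_off len = pos_len ) }, | &&
  |negative: { substring( val = text off = neg_off len = neg_len ) }| ).
```

The positive assertion finds 20 at offset 12. The negative assertion finds 10 at offset 4.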

Multiline Mode
Some scenarios require line feeds to be respected during matching, for example when something should match only if it is located at the beginning of a line. PCRE offers extensive control over how multiple lines are handled during matching.
When a regular expression is created using the method CREATE_PCRE of the system class CL_ABAP_REGEX, multi line handling can be controlled with the following parameters:
Parameter | Description
DOT_ALL | single line mode; if enabled, the special character . also matches line feed characters
ENABLE_MULTILINE | multi line mode; if enabled, the special characters ^ and $ match not only at the start and end of the character string but also at the start and end of each line; a line is ended by a line feed character
NEWLINE_MODE | controls which characters are recognized as line feeds
Despite their names, single line and multi line mode are not mutually exclusive and can be combined.
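The parameters can be sketched with CREATE_PCRE and a matcher. The program name is hypothetical, and CL_DEMO_OUTPUT is assumed to be available. The same pattern ^b.+ is compiled three times: once with the defaults, once with multi line mode alone, and once with multi line and single line mode combined.

```abap
REPORT zdemo_pcre_create_multiline.

" Three lines separated by line feeds (\n in a string template)
DATA(text) = |a\nb1\nb2|.

" Default: ^ only matches at the start of the string, which begins with "a"
DATA(regex_plain) = cl_abap_regex=>create_pcre( pattern = `^b.+` ).

" ENABLE_MULTILINE: ^ also matches at the start of each line
DATA(regex_multi) = cl_abap_regex=>create_pcre( pattern          = `^b.+`
                                               enable_multiline = abap_true ).

" ENABLE_MULTILINE and DOT_ALL combined: . also matches the line feed,
" so .+ runs across the line end
DATA(regex_both) = cl_abap_regex=>create_pcre( pattern          = `^b.+`
                                              enable_multiline = abap_true
                                              dot_all          = abap_true ).

DATA(matches_plain) = regex_plain->create_matcher( text = text )->find_all( ).
DATA(matches_multi) = regex_multi->create_matcher( text = text )->find_all( ).
DATA(matches_both)  = regex_both->create_matcher( text = text )->find_all( ).

cl_demo_output=>display( |plain: { lines( matches_plain ) }, | &&
                         |multi: { lines( matches_multi ) }, | &&
                         |both: { lines( matches_both ) }| ).
```

With the defaults there is no match. Multi line mode alone finds b1 and b2 separately. The combination finds a single match, b1, the line feed, and b2, with a length of 5.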
These options can also be set directly in the pattern. This is especially useful for regular expressions specified after PCRE in the statements FIND and REPLACE, or after the parameter pcre in built-in functions:
Single line mode can be enabled using the option setting syntax (?s) anywhere in the pattern.
Multi line mode can be enabled using the option setting syntax (?m) anywhere in the pattern.
Which characters are recognized as line feeds can be controlled with the following settings, which can only appear at the start of the pattern:
(*CR) carriage return only
(*LF) linefeed only
(*CRLF) carriage return followed by linefeed
(*ANYCRLF) all three of the above
(*ANY) any Unicode line feed sequence
(*NUL) the NUL character (binary zero)

ABAP_EXAMPLE_VX5
While the first regular expression matches only at the beginning of the character string, the second one also matches at the beginning of each line. The line feeds are specified using the syntax \n in a string template.
ABEXA 01500
ABAP_EXAMPLE_END
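Here is a minimal sketch of the inline option settings in FIND statements. It assumes that FIND PCRE leaves both modes switched off by default and that CL_DEMO_OUTPUT is available. The program name is hypothetical.

```abap
REPORT zdemo_pcre_inline_options.

" Three lines separated by line feeds (\n in a string template)
DATA(text) = |first\nsecond\nthird|.

" Without (?m), ^ only matches at the start of the whole string
FIND ALL OCCURRENCES OF PCRE `^\w+` IN text RESULTS DATA(res_plain).

" (?m) enables multi line mode: ^ also matches after each line feed
FIND ALL OCCURRENCES OF PCRE `(?m)^\w+` IN text RESULTS DATA(res_multi).

" Without (?s), . does not match the line feed between the words
FIND PCRE `first.second` IN text.
DATA(rc_plain_dot) = sy-subrc.

" (?s) enables single line mode: . also matches the line feed
FIND PCRE `(?s)first.second` IN text.
DATA(rc_dotall) = sy-subrc.

cl_demo_output=>display(
  |line starts: { lines( res_plain ) } vs. { lines( res_multi ) }, | &&
  |dot across line feed: { rc_plain_dot } vs. { rc_dotall }| ).
```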

Named Capture Groups
PCRE supports named capture groups. A capture group can be given a name, for example using the syntax (?<name>...). The group can then be referenced by its name, for example in a backreference using the syntax \k<name>.

ABAP_EXAMPLE_VX5
The regular expression matches the character string. The capture group is used by its name to match further occurrences of the pattern defined for the group.
ABEXA 01501
ABAP_EXAMPLE_END
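The following sketch uses a named group q for an opening quote character and the backreference \k<q> to require the same character at the end. The program name is hypothetical, and CL_DEMO_OUTPUT is assumed to be available. Because the quantifier is lazy and the closing quote must repeat the opening one, the apostrophe inside the quoted word does not end the match.

```abap
REPORT zdemo_pcre_named_group.

DATA(text) = `say "it's" now`.
DATA quote TYPE string.

" (?<q>["']) captures the opening quote character in the group named q;
" \k<q> requires exactly that character again, so the apostrophe in it's
" cannot close a string that was opened with a double quote
FIND PCRE `(?<q>["']).*?\k<q>` IN text
     MATCH OFFSET DATA(off)
     MATCH LENGTH DATA(len)
     SUBMATCHES quote.

cl_demo_output=>display(
  |match: { substring( val = text off = off len = len ) }, quote: { quote }| ).
```

The match is "it's" including both double quotes (offset 4, length 6). The named group q is also capture group 1, so its content is returned as the first submatch.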

Subroutine Calls and Recursion
Apart from referring to the content of a group via backreferences, PCRE supports calling groups as subroutines using the syntax (?n). A named group can also be called as a subroutine, for example using the syntax (?&name).

ABAP_EXAMPLE_VX5
The example shows calling groups as subroutines in three blocks:
In the first block, the backreference \1 matches only what the first capture group actually matched most recently, rather than anything the group's pattern could match.
The second block shows how this behavior can be achieved by calling the group as a subroutine using the syntax (?n).
The third block shows how recursing over the whole pattern with the syntax (?R) in one branch of the alternation ensures a balanced but arbitrary number of opening ( and closing ) parentheses on either side of the digits. Note that the pattern uses the possessive quantifier ++, which acts like + but prevents backtracking into what the quantifier matched.
ABEXA 01502
ABAP_EXAMPLE_END
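Here is a minimal sketch that contrasts a backreference, a subroutine call, and recursion. The program name is hypothetical, and CL_DEMO_OUTPUT is assumed to be available. Unlike the documented example, this sketch recurses into group 1 with (?1) instead of the whole pattern with (?R). Both are standard PCRE syntax.

```abap
REPORT zdemo_pcre_subroutine.

DATA(same)       = `12-12`.
DATA(different)  = `12-34`.
DATA(balanced)   = `((42))`.
DATA(unbalanced) = `((42)`.

" \1 is a backreference: it repeats exactly the text group 1 matched
FIND PCRE `^(\d+)-\1$` IN same.
DATA(rc_backref_same) = sy-subrc.
FIND PCRE `^(\d+)-\1$` IN different.
DATA(rc_backref_different) = sy-subrc.

" (?1) calls group 1 as a subroutine: it matches anything the group's
" pattern \d+ can match, not just the previously captured text
FIND PCRE `^(\d+)-(?1)$` IN different.
DATA(rc_subroutine) = sy-subrc.

" (?1) inside group 1 makes the group recursive: either digits, or an
" opening parenthesis, the group itself, and a closing parenthesis
FIND PCRE `^(\((?1)\)|\d+)$` IN balanced.
DATA(rc_balanced) = sy-subrc.
FIND PCRE `^(\((?1)\)|\d+)$` IN unbalanced.
DATA(rc_unbalanced) = sy-subrc.

cl_demo_output=>display(
  |backreference: { rc_backref_same }/{ rc_backref_different }, | &&
  |subroutine: { rc_subroutine }, | &&
  |recursion: { rc_balanced }/{ rc_unbalanced }| ).
```

The backreference rejects 12-34, but the subroutine call accepts it. The recursive group accepts ((42)) and rejects ((42).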

Callouts
Callouts are another powerful feature. A callout invokes ABAP code from within the pattern during matching and passes data from the pattern to the callout routine.
Callouts are specified with the syntax (?C...). A callout routine can access not only the numeric data n provided by the callout (?Cn) or the string data str provided by the callout (?C'str'), but also many other properties of the current matcher state.

ABAP_EXAMPLE_ABEXA
PCRE Regular Expression with Callouts
ABAP_EXAMPLE_END

Making Use of New Features for Replacements

Conditional Substitution
PCRE's conditional substitution syntax makes it possible to check whether a specific capture group participated in the match and to specify different replacement strings for the cases where it did and did not participate.

ABAP_EXAMPLE_VX5
Conditional substitutions with ${id...}.
ABEXA 01503
ABAP_EXAMPLE_END
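Here is a minimal sketch of a conditional substitution. It assumes that REPLACE ... PCRE accepts the PCRE2 extended replacement syntax ${n:+true:false}, as described above, and that CL_DEMO_OUTPUT is available. The program name is hypothetical. Group 1 captures an optional minus sign, and the replacement inserts a different word depending on whether that group participated in the match.

```abap
REPORT zdemo_pcre_conditional_subst.

DATA(text) = `-5 and 7`.

" Group 1 captures an optional minus sign, group 2 the digits.
" ${1:+negative:positive} inserts "negative" if group 1 participated in
" the match and "positive" otherwise; $2 inserts the digits
REPLACE ALL OCCURRENCES OF PCRE `(-)?(\d+)` IN text
        WITH `${1:+negative:positive} $2`.

cl_demo_output=>display( text ).
```

The replacement pattern is written as a backquoted literal rather than a string template, so its braces are not interpreted as embedded expressions. The result is negative 5 and positive 7.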

Case Conversion
The modifiers \u, \U, \l, and \L in a PCRE replacement string convert the case of the inserted text to uppercase or lowercase. While \u and \l affect only the first character following them, \U and \L affect all following characters until a different case conversion modifier or the terminator \E is reached.
The case conversion syntax can also be combined with conditional substitution.

ABAP_EXAMPLE_VX5
Replacements with case conversions. The latter two use conditional substitutions.
ABEXA 01504
ABAP_EXAMPLE_END
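Here is a minimal sketch of \u, and of \U terminated by \E. It assumes that REPLACE ... PCRE supports these modifiers as described above and that CL_DEMO_OUTPUT is available. The program name is hypothetical.

```abap
REPORT zdemo_pcre_case_conversion.

DATA(text1) = `abap regex`.
DATA(text2) = `abap regex`.

" \u uppercases only the first character of the inserted match $0
REPLACE ALL OCCURRENCES OF PCRE `\w+` IN text1 WITH `\u$0`.

" \U uppercases all following characters until \E ends the conversion,
" so only the first word is converted
REPLACE PCRE `(\w+) (\w+)` IN text2 WITH `\U$1\E $2`.

cl_demo_output=>display( |{ text1 } / { text2 }| ).
```

The first replacement yields Abap Regex. The second yields ABAP regex, because \E stops the uppercase conversion before $2.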