Atomic groupings and possessive quantifiers
Atomic groupings and possessive quantifiers are a way of fine-tuning how much an NFA engine will backtrack for quantified parts of the regular expression.
- Atomic grouping: (?>foo*)
- Possessive quantifier: foo*+
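The effect on backtracking can be seen in a minimal sketch. It assumes Python's standard `re` module, which accepts both syntaxes only from version 3.11 onward; the sample text and the `HAS_ATOMIC` guard are illustrative choices, not part of any particular engine's documentation.

```python
import re
import sys

# Atomic groups and possessive quantifiers need Python 3.11 or newer.
HAS_ATOMIC = sys.version_info >= (3, 11)

text = "aaa"

# Ordinary greedy quantifier: a+ first takes "aaa", then gives one "a"
# back so that the trailing literal "a" can match. The match succeeds.
greedy = re.fullmatch(r"a+a", text)
print("a+a     ->", greedy.group() if greedy else None)

if HAS_ATOMIC:
    # Atomic group: once (?>a+) has taken "aaa", the engine may not
    # backtrack into it, so the trailing "a" can never match.
    atomic = re.fullmatch(r"(?>a+)a", text)
    # Possessive quantifier: a++ behaves the same way as (?>a+).
    possessive = re.fullmatch(r"a++a", text)
    print("(?>a+)a ->", atomic.group() if atomic else None)
    print("a++a    ->", possessive.group() if possessive else None)
```

Both restricted forms fail where the plain greedy form succeeds, because the text taken by the group or quantifier is never given back.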
Most implementations define a way to set flags on a regular expression, either globally or within subpatterns.
Following Perl, these are usually:
- i/I: case sensitivity. Matching is case-insensitive, or case-sensitive (default).
- u/U: greediness. Quantifiers are greedy (default) or ungreedy.
- s/S: dot-all mode. The dot ('.') matches every character including newlines, or every character except a newline (default).
- m/M: multiline mode. The caret ('^') and dollar sign ('$') match the beginning or end of each line, or only the beginning or end of the entire input string (default). \A and \Z always match the beginning and end of a piece of text, regardless of multiline mode.
- x/X: free-spacing mode. Whitespace in the pattern is ignored to increase readability, or is significant (default).
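As a rough illustration of what the dot-all, multiline and free-spacing modes change, here is a small sketch using Python's `re` module. Python names these flags `re.DOTALL`, `re.MULTILINE` and `re.VERBOSE` and has no upper-case "off" letters, so the mapping to the list above is an assumption of the example; the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Dot-all (s): by default '.' stops at a newline.
dot_default = re.search(r"first.*second", text)             # no match
dot_all = re.search(r"first.*second", text, re.DOTALL)      # matches

# Multiline (m): '^' matches at the start of every line, not only
# at the start of the whole string.
caret_default = re.findall(r"^\w+", text)                   # ['first']
caret_multi = re.findall(r"^\w+", text, re.MULTILINE)       # ['first', 'second']

# \A is unaffected by multiline mode.
anchor_multi = re.findall(r"\A\w+", text, re.MULTILINE)     # ['first']

# Free-spacing (x): whitespace and '#' comments in the pattern are ignored.
date = re.compile(r"""
    (\d{4}) - (\d{2})   # year and month, separated by a hyphen
""", re.VERBOSE)
date_groups = date.search("released 2024-05").groups()      # ('2024', '05')

print(dot_default, bool(dot_all), caret_default, caret_multi,
      anchor_multi, date_groups)
```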
These flags may be set when you define the pattern, or within the pattern or a subpattern using the following syntaxes:
- (?i) -- turns the 'i' flag on
- (?i:foo) -- turns the 'i' flag on for the 'foo' subpattern only. Equivalent to (?:(?i)foo); the group is non-capturing.
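The three ways of setting a flag can be compared side by side. The sketch below assumes Python's `re` module (scoped inline flags need Python 3.6 or newer, and a global inline flag must appear at the start of the pattern); the variable names are illustrative only.

```python
import re

# 1. Flag supplied when the pattern is defined.
compiled = re.compile(r"foo bar", re.IGNORECASE)
whole_flag = compiled.fullmatch("FOO BAR") is not None               # True

# 2. Inline flag: (?i) at the start applies to the whole pattern.
inline_global = re.fullmatch(r"(?i)foo bar", "FOO BAR") is not None  # True

# 3. Scoped flag: (?i:foo) applies only inside the group,
#    so " bar" must still match case-sensitively.
scoped_hit = re.fullmatch(r"(?i:foo) bar", "FOO bar") is not None    # True
scoped_miss = re.fullmatch(r"(?i:foo) bar", "FOO BAR") is not None   # False

# The scoped form does not capture: only (bar) produces a group.
groups = re.fullmatch(r"(?i:foo) (bar)", "Foo bar").groups()         # ('bar',)

print(whole_flag, inline_global, scoped_hit, scoped_miss, groups)
```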
Greediness determines whether a quantified part of a regular expression consumes as much text as possible, or as little as possible, before the next part of the expression is tried. By default a quantifier consumes as much as possible ("greedy"). The opposite is an "ungreedy" (or "lazy") quantifier.
Greediness can be controlled on a per-quantifier basis or, in some implementations (PCRE, for example), via a flag: u for greedy, U for ungreedy.
- Per quantifier: append a '?' to the quantifier. Example: <a>.*?</a>
- Flag: set the 'U' flag on the regular expression, or within a subpattern. Example (PCRE syntax, as used in PHP): /<a>.*<\/a>/U
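The difference is easiest to see on text that contains two candidate end points. This sketch assumes Python's `re` module, which supports the per-quantifier '?' form but has no ungreedy flag; the HTML snippet is made up for the example.

```python
import re

html = "<a>one</a><a>two</a>"

# Greedy: .* runs to the end of the text and backtracks only as far as
# the last "</a>", so both elements end up in a single match.
greedy = re.findall(r"<a>.*</a>", html)

# Ungreedy: .*? grows one character at a time and stops at the first
# "</a>" it can reach, so each element is matched separately.
lazy = re.findall(r"<a>.*?</a>", html)

print(greedy)  # ['<a>one</a><a>two</a>']
print(lazy)    # ['<a>one</a>', '<a>two</a>']
```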
There are four kinds of lookarounds:
- Lookahead: (?=foo)
- Negative lookahead: (?!foo)
- Lookbehind: (?<=foo)
- Negative lookbehind: (?<!foo)
Lookarounds are zero-width assertions, meaning they do not actually consume characters.
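The four lookarounds, and the fact that they consume nothing, can be sketched as follows. The example assumes Python's `re` module (whose lookbehinds must have a fixed width); the sample text and patterns are illustrative.

```python
import re

text = "price: $40, weight: 12 kg, tax: $5"

# Lookahead: digits that are followed by " kg".
ahead = re.findall(r"\d+(?= kg)", text)             # ['12']

# Negative lookahead: whole numbers NOT followed by " kg".
# The \b anchors stop the engine from settling for part of a number.
not_ahead = re.findall(r"\b\d+\b(?! kg)", text)     # ['40', '5']

# Lookbehind: digits that are preceded by "$".
behind = re.findall(r"(?<=\$)\d+", text)            # ['40', '5']

# Negative lookbehind: numbers NOT preceded by "$".
not_behind = re.findall(r"(?<!\$)\b\d+", text)      # ['12']

# Zero width: the lookahead checked " kg" but did not consume it,
# so the match is just "12" and " kg" is still the text that follows.
m = re.search(r"\d+(?= kg)", text)
matched = m.group()                                 # '12'
following = text[m.end():m.end() + 3]               # ' kg'

print(ahead, not_ahead, behind, not_behind, matched, repr(following))
```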
See atomic groupings, above.