PCRE(3) manual page

From 7fd9fe181150f166a098eaf4e006f878c28cb770 Mon Sep 17 00:00:00 2001 From: Gluzskiy Alexandr Date: Mon, 15 Feb 2010 05:51:01 +0300 Subject: sort --- Utilities/PCRE/man/html/pcreperform.3.html | 86 ++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) create mode 100644 Utilities/PCRE/man/html/pcreperform.3.html (limited to 'Utilities/PCRE/man/html/pcreperform.3.html') diff --git a/Utilities/PCRE/man/html/pcreperform.3.html b/Utilities/PCRE/man/html/pcreperform.3.html new file mode 100644 index 0000000..a4fea50 --- /dev/null +++ b/Utilities/PCRE/man/html/pcreperform.3.html @@ -0,0 +1,86 @@ + + + + + +PCRE(3) manual page + + +Table of Contents

+ +

Name

+PCRE - Perl-compatible regular expressions +

Pcre Performance

+Certain items +that may appear in regular expression patterns are more efficient than +others. It is more efficient to use a character class like [aeiou] than +a set of alternatives such as (a|e|i|o|u). In general, the simplest construction +that provides the required behaviour is usually the most efficient. Jeffrey +Friedl’s book contains a lot of useful general discussion about optimizing +regular expressions for efficient performance. This document contains a +few observations about PCRE.

+Using Unicode character properties (the \p, +\P, and \X escapes) is slow, because PCRE has to scan a structure that contains +data for over fifteen thousand characters whenever it needs a character’s +property. If you can find an alternative pattern that does not use character +properties, it will probably be faster.

+When a pattern begins with .* not +in parentheses, or in parentheses that are not the subject of a backreference, +and the PCRE_DOTALL option is set, the pattern is implicitly anchored by +PCRE, since it can match only at the start of a subject string. However, +if PCRE_DOTALL is not set, PCRE cannot make this optimization, because +the . metacharacter does not then match a newline, and if the subject string +contains newlines, the pattern may match from the character immediately +following one of them instead of from the very start. For example, the pattern +

+ .*second
+

+matches the subject "first\nand second" (where \n stands for a newline character), +with the match starting at the seventh character. In order to do this, PCRE +has to retry the match starting after every newline in the subject.

+If you +are using such a pattern with subject strings that do not contain newlines, +the best performance is obtained by setting PCRE_DOTALL, or starting the +pattern with ^.* to indicate explicit anchoring. That saves PCRE from having +to scan along the subject looking for a newline to restart at.

+Beware of +patterns that contain nested indefinite repeats. These can take a long time +to run when applied to a string that does not match. Consider the pattern +fragment

+ (a+)*
+

+This can match "aaaa" in 33 different ways, and this number increases +very rapidly as the string gets longer. (The * repeat can match 0, 1, 2, +3, or 4 times, and for each of those cases other than 0, the + repeats +can match different numbers of times.) When the remainder of the pattern +is such that the entire match is going to fail, PCRE has in principle to +try every possible variation, and this can take an extremely long time. +

+An optimization catches some of the more simple cases such as

+ (a+)*b
+

+where a literal character follows. Before embarking on the standard matching +procedure, PCRE checks that there is a "b" later in the subject string, +and if there is not, it fails the match immediately. However, when there +is no following literal this optimization cannot be used. You can see the +difference by comparing the behaviour of

+ (a+)*\d
+

+with the pattern above. The former gives a failure almost instantly when +applied to a whole line of "a" characters, whereas the latter takes an +appreciable time with strings longer than about 20 characters.

+In many cases, +the solution to this kind of performance issue is to use an atomic group +or a possessive quantifier.

+ Last updated: 09 September 2004
+Copyright (c) 1997-2004 University of Cambridge.

+ +

+Table of Contents

Name
Pcre Performance

+ + -- cgit v1.2.3