From 7fd9fe181150f166a098eaf4e006f878c28cb770 Mon Sep 17 00:00:00 2001 From: Gluzskiy Alexandr Date: Mon, 15 Feb 2010 05:51:01 +0300 Subject: sort --- Utilities/PCRE/man/html/pcre.3.html | 174 ++++++++++++++++++++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 Utilities/PCRE/man/html/pcre.3.html (limited to 'Utilities/PCRE/man/html/pcre.3.html') diff --git a/Utilities/PCRE/man/html/pcre.3.html b/Utilities/PCRE/man/html/pcre.3.html new file mode 100644 index 0000000..93f32fa --- /dev/null +++ b/Utilities/PCRE/man/html/pcre.3.html @@ -0,0 +1,174 @@ + + + + + +PCRE(3) manual page + + +Table of Contents

+ +

Name

+PCRE - Perl-compatible regular expressions +

Introduction

+

+The PCRE library +is a set of functions that implement regular expression pattern matching +using the same syntax and semantics as Perl, with just a few differences. +The current implementation of PCRE (release 5.x) corresponds approximately +with Perl 5.8, including support for UTF-8 encoded strings and Unicode general +category properties. However, this support has to be explicitly enabled; +it is not the default.

+PCRE is written in C and released as a C library. +A number of people have written wrappers and interfaces of various kinds. +A C++ class is included in these contributions, which can be found in the +Contrib directory at the primary FTP site, which is:

+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre +

+Details of exactly which Perl regular expression features are and are not +supported by PCRE are given in separate documents. See the pcrepattern + and pcrecompat pages.

+Some features of PCRE can be included, excluded, +or changed when the library is built. The pcre_config() function makes +it possible for a client to discover which features are available. The features +themselves are described in the pcrebuild page. Documentation about building +PCRE for various operating systems can be found in the README file in the +source distribution. +

User Documentation

+

+The user documentation for PCRE +comprises a number of different sections. In the "man" format, each of these +is a separate "man page". In the HTML format, each is a separate page, linked +from the index page. In the plain text format, all the sections are concatenated, +for ease of searching. The sections are as follows:

+ pcre +this document
+ pcreapi details of PCRE’s native API
+ pcrebuild options for building PCRE
+ pcrecallout details of the callout feature
+ pcrecompat discussion of Perl compatibility
+ pcregrep description of the pcregrep command
+ pcrepartial details of the partial matching facility
+ pcrepattern syntax and semantics of supported
+ regular expressions
+ pcreperform discussion of performance issues
+ pcreposix the POSIX-compatible API
+ pcreprecompile details of saving and re-using precompiled patterns
+ pcresample discussion of the sample program
+ pcretest description of the pcretest testing command
+

+In addition, in the "man" and HTML formats, there is a short page for +each library function, listing its arguments and results. +

Limitations

+ +

+There are some size limitations in PCRE but it is hoped that they will +never in practice be relevant.

+The maximum length of a compiled pattern +is 65539 (sic) bytes if PCRE is compiled with the default internal linkage +size of 2. If you want to process regular expressions that are truly enormous, +you can compile PCRE with an internal linkage size of 3 or 4 (see the README +file in the source distribution and the pcrebuild documentation for details). +In these cases the limit is substantially larger. However, the speed of +execution will be slower.

+All values in repeating quantifiers must be less +than 65536. The maximum number of capturing subpatterns is 65535.

+There is +no limit to the number of non-capturing subpatterns, but the maximum depth +of nesting of all kinds of parenthesized subpattern, including capturing +subpatterns, assertions, and other types of subpattern, is 200.

+The maximum +length of a subject string is the largest positive number that an integer +variable can hold. However, PCRE uses recursion to handle subpatterns and +indefinite repetition. This means that the available stack space may limit +the size of a subject string that can be processed by certain patterns. +

+ +

Utf-8 and Unicode Property Support

+

+From release 3.3, PCRE has had some +support for character strings encoded in the UTF-8 format. For release 4.0 +this was greatly extended to cover most common requirements, and in release +5.0 additional support for Unicode general category properties was added. +

+In order process UTF-8 strings, you must build PCRE to include UTF-8 support +in the code, and, in addition, you must call pcre_compile() with the +PCRE_UTF8 option flag. When you do this, both the pattern and any subject +strings that are matched against it are treated as UTF-8 strings instead +of just strings of bytes.

+If you compile PCRE with UTF-8 support, but do +not use it at run time, the library will be a bit bigger, but the additional +run time overhead is limited to testing the PCRE_UTF8 flag in several places, +so should not be very large.

+If PCRE is built with Unicode character property +support (which implies UTF-8 support), the escape sequences \p{..}, \P{..}, and +\X are supported. The available properties that can be tested are limited +to the general category properties such as Lu for an upper case letter +or Nd for a decimal number. A full list is given in the pcrepattern documentation. +The PCRE library is increased in size by about 90K when Unicode property +support is included.

+The following comments apply when PCRE is running in +UTF-8 mode:

+1. When you set the PCRE_UTF8 flag, the strings passed as patterns +and subjects are checked for validity on entry to the relevant functions. +If an invalid UTF-8 string is passed, an error return is given. In some situations, +you may already know that your strings are valid, and therefore want to +skip these checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK +flag at compile time or at run time, PCRE assumes that the pattern or subject +it is given (respectively) contains only valid UTF-8 codes. In this case, +it does not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 +string to PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. +Your program may crash.

+2. In a pattern, the escape sequence \x{...}, where the +contents of the braces is a string of hexadecimal digits, is interpreted +as a UTF-8 character whose code number is the given hexadecimal number, +for example: \x{1234}. If a non-hexadecimal digit appears between the braces, +the item is not recognized. This escape sequence can be used either as a +literal, or within a character class.

+3. The original hexadecimal escape +sequence, \xhh, matches a two-byte UTF-8 character if the value is greater +than 127.

+4. Repeat quantifiers apply to complete UTF-8 characters, not to +individual bytes, for example: \x{100}{3}.

+5. The dot metacharacter matches +one UTF-8 character instead of a single byte.

+6. The escape sequence \C can +be used to match a single byte in UTF-8 mode, but its use can lead to some +strange effects.

+7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly +test characters of any code value, but the characters that PCRE recognizes +as digits, spaces, or word characters remain the same set as before, all +with values less than 256. This remains true even when PCRE includes Unicode +property support, because to do otherwise would slow down PCRE in many +common cases. If you really want to test for a wider sense of, say, "digit", +you must use Unicode property tests such as \p{Nd}.

+8. Similarly, characters +that match the POSIX named character classes are all low-valued characters. +

+9. Case-insensitive matching applies only to characters whose values are +less than 128, unless PCRE is built with Unicode property support. Even +when Unicode property support is available, PCRE still uses its own character +tables when checking the case of low-valued characters, so as not to degrade +performance. The Unicode property information is used only for characters +with higher values. +

Author

+

+Philip Hazel <ph10@cam.ac.uk>
+University Computing Service,
+Cambridge CB2 3QG, England.
+Phone: +44 1223 334714

+ Last updated: 09 September 2004
+Copyright (c) 1997-2004 University of Cambridge.

+ +


+Table of Contents

+

+ + -- cgit v1.2.3