author Gluzskiy Alexandr <sss123next@gmail.com> 2010-02-15 05:51:01 +0300
committer Gluzskiy Alexandr <sss123next@gmail.com> 2010-02-15 05:51:01 +0300
commit 7fd9fe181150f166a098eaf4e006f878c28cb770
tree 093af9d26a08e6bac60112a9c5f2f870ddef0fe8 /Utilities/PCRE/man/html/pcre.3.html
parent 6f32ef233b95d78efb905c97081193a2a454590e
Diffstat (limited to 'Utilities/PCRE/man/html/pcre.3.html')
-rw-r--r-- Utilities/PCRE/man/html/pcre.3.html 174
1 files changed, 174 insertions, 0 deletions
diff --git a/Utilities/PCRE/man/html/pcre.3.html b/Utilities/PCRE/man/html/pcre.3.html
new file mode 100644
index 0000000..93f32fa
--- /dev/null
+++ b/Utilities/PCRE/man/html/pcre.3.html
@@ -0,0 +1,174 @@
+<!-- manual page source format generated by PolyglotMan v3.2, -->
+<!-- available at http://polyglotman.sourceforge.net/ -->
+
+<html>
+<head>
+<title>PCRE(3) manual page</title>
+</head>
+<body bgcolor='white'>
+<a href='#toc'>Table of Contents</a><p>
+
+<h2><a name='sect0' href='#toc0'>Name</a></h2>
+PCRE - Perl-compatible regular expressions
+<h2><a name='sect1' href='#toc1'>Introduction</a></h2>
+ <p>
+The PCRE library
+is a set of functions that implement regular expression pattern matching
+using the same syntax and semantics as Perl, with just a few differences.
+The current implementation of PCRE (release 5.x) corresponds approximately
+to Perl 5.8, including support for UTF-8 encoded strings and Unicode general
+category properties. However, this support has to be explicitly enabled when
+PCRE is built; it is not the default. <p>
+PCRE is written in C and released as a C library.
+A number of people have written wrappers and interfaces of various kinds.
+These contributions, which include a C++ class, can be found in the
+<i>Contrib</i> directory at the primary FTP site: <p>
+ ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
+<p>
+Details of exactly which Perl regular expression features are and are not
+supported by PCRE are given in separate documents. See the <b>pcrepattern</b>
+ and <b>pcrecompat</b> pages. <p>
+Some features of PCRE can be included, excluded,
+or changed when the library is built. The <b>pcre_config()</b> function makes
+it possible for a client to discover which features are available. The features
+themselves are described in the <b>pcrebuild</b> page. Documentation about building
+PCRE for various operating systems can be found in the <b>README</b> file in the
+source distribution.
+<h2><a name='sect2' href='#toc2'>User Documentation</a></h2>
+ <p>
+The user documentation for PCRE
+comprises a number of different sections. In the "man" format, each of these
+is a separate "man page". In the HTML format, each is a separate page, linked
+from the index page. In the plain text format, all the sections are concatenated,
+for ease of searching. The sections are as follows: <p>
+ pcre this document<br>
+ pcreapi details of PCRE&rsquo;s native API<br>
+ pcrebuild options for building PCRE<br>
+ pcrecallout details of the callout feature<br>
+ pcrecompat discussion of Perl compatibility<br>
+ pcregrep description of the <b>pcregrep</b> command<br>
+ pcrepartial details of the partial matching facility<br>
+ pcrepattern syntax and semantics of supported regular expressions<br>
+ pcreperform discussion of performance issues<br>
+ pcreposix the POSIX-compatible API<br>
+ pcreprecompile details of saving and re-using precompiled patterns<br>
+ pcresample discussion of the sample program<br>
+ pcretest description of the <b>pcretest</b> testing command<br>
+ <p>
+In addition, in the "man" and HTML formats, there is a short page for
+each library function, listing its arguments and results.
+<h2><a name='sect3' href='#toc3'>Limitations</a></h2>
+
+<p>
+There are some size limitations in PCRE, but it is hoped that in practice
+they will never be relevant. <p>
+The maximum length of a compiled pattern
+is 65539 (sic) bytes if PCRE is compiled with the default internal linkage
+size of 2. If you want to process regular expressions that are truly enormous,
+you can compile PCRE with an internal linkage size of 3 or 4 (see the <b>README</b>
+file in the source distribution and the <b>pcrebuild</b> documentation for details).
+In these cases the limit is substantially larger, but execution is
+slower. <p>
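+The effect of the link size can be sketched with a little arithmetic. The C
+fragment below assumes (an assumption about PCRE's internals, not something
+stated on this page) that each internal link is an offset into the compiled
+pattern, stored as an unsigned big-endian integer of link-size bytes. The helper
+names <b>put_link</b>, <b>get_link</b>, and <b>max_link_offset</b> are
+hypothetical and are not part of the PCRE API. Under that assumption a 2-byte
+link can address at most 65535 bytes, while 3- and 4-byte links raise the
+ceiling to 2^24-1 and 2^32-1 respectively.

```c
#include <stddef.h>

/* Hypothetical helper: store value as an unsigned big-endian integer
   occupying link_size bytes (assumed link sizes: 2, 3 or 4). */
static void put_link(unsigned char *p, int link_size, unsigned long value)
{
    for (int i = link_size - 1; i >= 0; i--) {
        p[i] = (unsigned char)(value & 0xFF); /* lowest byte goes last */
        value >>= 8;
    }
}

/* Hypothetical helper: read back a big-endian offset of link_size bytes. */
static unsigned long get_link(const unsigned char *p, int link_size)
{
    unsigned long value = 0;
    for (int i = 0; i < link_size; i++)
        value = (value << 8) | p[i];          /* highest byte comes first */
    return value;
}

/* Largest offset that fits in link_size bytes: 2^(8 * link_size) - 1.
   Valid for link_size 1 to 4 (the shift stays below 64 bits). */
static unsigned long long max_link_offset(int link_size)
{
    return (1ULL << (8 * link_size)) - 1;
}
```

+For example, <b>put_link(buf, 2, 0x1234)</b> stores the bytes 12 34, and
+<b>get_link(buf, 2)</b> reads 0x1234 back. The sketch only shows why a larger
+link size raises the limit; it does not account for the exact figure of 65539
+quoted above.
+<p>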
+All values in repeating quantifiers must be less
+than 65536. The maximum number of capturing subpatterns is 65535. <p>
+There is
+no limit to the number of non-capturing subpatterns, but the maximum depth
+of nesting of all kinds of parenthesized subpattern, including capturing
+subpatterns, assertions, and other types of subpattern, is 200. <p>
+The maximum
+length of a subject string is the largest positive number that an integer
+variable can hold. However, PCRE uses recursion to handle subpatterns and
+indefinite repetition. This means that the available stack space may limit
+the size of a subject string that can be processed by certain patterns.
+<p>
+
+<h2><a name='sect4' href='#toc4'>UTF-8 and Unicode Property Support</a></h2>
+ <p>
+From release 3.3, PCRE has had some
+support for character strings encoded in the UTF-8 format. For release 4.0
+this was greatly extended to cover most common requirements, and in release
+5.0 additional support for Unicode general category properties was added.
+<p>
+In order to process UTF-8 strings, you must build PCRE to include UTF-8 support
+in the code, and, in addition, you must call <b>pcre_compile()</b> with the
+PCRE_UTF8 option flag. When you do this, both the pattern and any subject
+strings that are matched against it are treated as UTF-8 strings instead
+of just strings of bytes. <p>
+If you compile PCRE with UTF-8 support but do
+not use it at run time, the library will be a bit bigger, but the additional
+run-time overhead is limited to testing the PCRE_UTF8 flag in several places,
+so it should not be very large. <p>
+If PCRE is built with Unicode character property
+support (which implies UTF-8 support), the escape sequences \p{..}, \P{..}, and
+\X are supported. The available properties that can be tested are limited
+to the general category properties such as Lu for an upper case letter
+or Nd for a decimal number. A full list is given in the <b>pcrepattern</b> documentation.
+The PCRE library is increased in size by about 90K when Unicode property
+support is included. <p>
+The following comments apply when PCRE is running in
+UTF-8 mode: <p>
+1. When you set the PCRE_UTF8 flag, the strings passed as patterns
+and subjects are checked for validity on entry to the relevant functions.
+If an invalid UTF-8 string is passed, an error return is given. In some situations,
+you may already know that your strings are valid, and therefore want to
+skip these checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK
+flag at compile time or at run time, PCRE assumes that the pattern or subject
+it is given (respectively) contains only valid UTF-8 codes. In this case,
+it does not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8
+string to PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined.
+Your program may crash. <p>
+2. In a pattern, the escape sequence \x{...}, where the
+contents of the braces are a string of hexadecimal digits, is interpreted
+as a UTF-8 character whose code number is the given hexadecimal number,
+for example: \x{1234}. If a non-hexadecimal digit appears between the braces,
+the item is not recognized. This escape sequence can be used either as a
+literal, or within a character class. <p>
+3. The original hexadecimal escape
+sequence, \xhh, matches a two-byte UTF-8 character if the value is greater
+than 127. <p>
+4. Repeat quantifiers apply to complete UTF-8 characters, not to
+individual bytes, for example: \x{100}{3}. <p>
+5. The dot metacharacter matches
+one UTF-8 character instead of a single byte. <p>
+6. The escape sequence \C can
+be used to match a single byte in UTF-8 mode, but its use can lead to some
+strange effects. <p>
+7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
+test characters of any code value, but the characters that PCRE recognizes
+as digits, spaces, or word characters remain the same set as before, all
+with values less than 256. This remains true even when PCRE includes Unicode
+property support, because to do otherwise would slow down PCRE in many
+common cases. If you really want to test for a wider sense of, say, "digit",
+you must use Unicode property tests such as \p{Nd}. <p>
+8. Similarly, characters
+that match the POSIX named character classes are all low-valued characters.
+<p>
+9. Case-insensitive matching applies only to characters whose values are
+less than 128, unless PCRE is built with Unicode property support. Even
+when Unicode property support is available, PCRE still uses its own character
+tables when checking the case of low-valued characters, so as not to degrade
+performance. The Unicode property information is used only for characters
+with higher values.
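+<p>
+Points 1 to 5 above concern how UTF-8 bytes relate to characters. The
+self-contained C sketch below illustrates them without calling PCRE:
+<b>utf8_encode</b> shows the bytes that \x{1234} or \xE9 stand for,
+<b>utf8_char_count</b> counts characters rather than bytes, and
+<b>utf8_valid</b> performs a validity check of the kind described in point 1.
+All three functions are hypothetical helpers, not part of the PCRE API. They
+follow the current UTF-8 definition (RFC 3629: no overlong forms, no
+surrogates, nothing above U+10FFFF); the exact rules applied by PCRE 5.x may
+differ.

```c
#include <stddef.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for a surrogate or a value
   above U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Return 1 if s[0..len) is well-formed UTF-8, else 0. */
static int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;                          /* number of continuation bytes */
        unsigned long cp;
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                     /* stray continuation or bad lead byte */
        if (len - i <= n) return 0;        /* sequence truncated by end of string */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
        i += n + 1;
    }
    return 1;
}

/* Count characters in valid UTF-8: every byte that is not a
   continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_char_count(const unsigned char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

+For example, <b>utf8_encode(0x1234, buf)</b> writes E1 88 B4, and
+<b>utf8_encode(0xE9, buf)</b> writes the two bytes C3 A9. A subject that
+\x{100}{3} matches is three characters but six bytes (C4 80 C4 80 C4 80), and
+the overlong sequence C0 80 is rejected by <b>utf8_valid</b>.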
+<h2><a name='sect5' href='#toc5'>Author</a></h2>
+ <p>
+Philip Hazel &lt;ph10@cam.ac.uk&gt; <br>
+University Computing Service, <br>
+Cambridge CB2 3QG, England. <br>
+Phone: +44 1223 334714 <p>
+ Last updated: 09 September 2004 <br>
+Copyright (c) 1997-2004 University of Cambridge. <p>
+
+<hr><p>
+<a name='toc'><b>Table of Contents</b></a><p>
+<ul>
+<li><a name='toc0' href='#sect0'>Name</a></li>
+<li><a name='toc1' href='#sect1'>Introduction</a></li>
+<li><a name='toc2' href='#sect2'>User Documentation</a></li>
+<li><a name='toc3' href='#sect3'>Limitations</a></li>
+<li><a name='toc4' href='#sect4'>UTF-8 and Unicode Property Support</a></li>
+<li><a name='toc5' href='#sect5'>Author</a></li>
+</ul>
+</body>
+</html>