diff --git a/Utilities/PCRE/man/html/pcreposix.3.html b/Utilities/PCRE/man/html/pcreposix.3.html
new file mode 100644
index 0000000..0e7cafd
--- /dev/null
+++ b/Utilities/PCRE/man/html/pcreposix.3.html
@@ -0,0 +1,187 @@
+<!-- manual page source format generated by PolyglotMan v3.2, -->
+<!-- available at http://polyglotman.sourceforge.net/ -->
+
+<html>
+<head>
+<title>PCRE(3) manual page</title>
+</head>
+<body bgcolor='white'>
+<a href='#toc'>Table of Contents</a><p>
+
+<h2><a name='sect0' href='#toc0'>Name</a></h2>
+PCRE - Perl-compatible regular expressions.
+<h2><a name='sect1' href='#toc1'>Synopsis of Posix API</a></h2>
+ <p>
+<b>#include
+&lt;pcreposix.h&gt;</b> <p>
+ <br>
+<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b> <b>int <i>cflags</i>);</b> <p>
+<br>
+<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b> <b>size_t <i>nmatch</i>, regmatch_t
+<i>pmatch</i>[], int <i>eflags</i>);</b> <p>
+<br>
+<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b> <b>char *<i>errbuf</i>, size_t
+<i>errbuf_size</i>);</b> <p>
+<br>
+<b>void regfree(regex_t *<i>preg</i>);</b>
+<h2><a name='sect2' href='#toc2'>Description</a></h2>
+ <p>
+This set of functions provides
+a POSIX-style API to the PCRE regular expression package. See the <b>pcreapi</b>
+ documentation for a description of PCRE&rsquo;s native API, which contains additional
+functionality. <p>
+The functions described here are just wrapper functions that
+ultimately call the PCRE native API. Their prototypes are defined in the
+<b>pcreposix.h</b> header file, and on Unix systems the library itself is called
+<b>libpcreposix.a</b>, so it can be accessed by adding <b>-lpcreposix</b> to the command for
+linking an application that uses them. Because the POSIX functions call
+the native ones, it is also necessary to add <b>-lpcre</b>. <p>
+I have implemented only
+those option bits that can be reasonably mapped to PCRE native options.
+In addition, the options REG_EXTENDED and REG_NOSUB are defined with the
+value zero. They have no effect, but since programs that are written to
+the POSIX interface often use them, this makes it easier to slot in PCRE
+as a replacement library. Other POSIX options are not even defined. <p>
+When
+PCRE is called via these functions, it is only the API that is POSIX-like
+in style. The syntax and semantics of the regular expressions themselves
+are still those of Perl, subject to the setting of various PCRE options,
+as described below. "POSIX-like in style" means that the API approximates
+the POSIX definition; it is not fully POSIX-compatible, and in multi-byte
+encoding domains it is probably even less compatible. <p>
+The header for these
+functions is supplied as <b>pcreposix.h</b> to avoid any potential clash with other
+POSIX libraries. It can, of course, be renamed or aliased as <b>regex.h</b>, which
+is the "correct" name. It provides two structure types, <i>regex_t</i> for compiled
+internal forms, and <i>regmatch_t</i> for returning captured substrings. It also
+defines some constants whose names start with "REG_"; these are used for
+setting options and identifying error codes. <p>
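+<p>
+As an illustration only, and not code from the PCRE distribution, the sketch
+below runs through the whole life cycle: compile, match, free. It includes the
+system <b>&lt;regex.h&gt;</b> so that it builds on any POSIX system; with this
+wrapper you would include <b>pcreposix.h</b> instead and link with
+<b>-lpcreposix -lpcre</b>. The helper <i>posix_matches()</i> is hypothetical.
+REG_EXTENDED is passed because a system library needs it for this syntax; PCRE
+defines it as zero, so it is harmless there.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h> and
   link with -lpcreposix -lpcre. */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if subject matches pattern, 0 if it
   does not, and -1 if the pattern fails to compile. */
int posix_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED is required by a system library; PCRE defines it as 0. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch == 0: we only want to know whether there is a match. */
    rc = regexec(&re, subject, 0, NULL, 0);

    regfree(&re);               /* release memory attached to re */
    return rc == 0 ? 1 : 0;     /* REG_NOMATCH (or another error) -> 0 */
}
```

+For example, <i>posix_matches("ab+c", "xxabbbcyy")</i> yields 1.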
+
+<h2><a name='sect3' href='#toc3'>Compiling a Pattern</a></h2>
+ <p>
+The function
+<b>regcomp()</b> is called to compile a pattern into an internal form. The pattern
+is a C string terminated by a binary zero, and is passed in the argument
+<i>pattern</i>. The <i>preg</i> argument is a pointer to a <b>regex_t</b> structure that is used
+as a base for storing information about the compiled expression. <p>
+The argument
+<i>cflags</i> is either zero, or contains one or more of the bits defined by the
+following macros: <p>
+ REG_ICASE<br>
+ <p>
+The PCRE_CASELESS option is set when the expression is passed for compilation
+to the native function. <p>
+ REG_NEWLINE<br>
+ <p>
+The PCRE_MULTILINE option is set when the expression is passed for compilation
+to the native function. Note that this does <i>not</i> mimic the defined POSIX
+behaviour for REG_NEWLINE (see the following section). <p>
+In the absence of
+these flags, no options are passed to the native function. This means that
+the regex is compiled with PCRE default semantics. In particular, the way
+it handles newline characters in the subject string is the Perl way, not
+the POSIX way. Note that setting PCRE_MULTILINE has only <i>some</i> of the effects
+specified for REG_NEWLINE. It does not affect the way newlines are matched
+by the dot metacharacter (they are not matched) or by a negated class such as [^a] (they are). <p>
+The yield
+of <b>regcomp()</b> is zero on success, and non-zero otherwise. The <i>preg</i> structure
+is filled in on success, and one member of the structure is public: <i>re_nsub</i>
+contains the number of capturing subpatterns in the regular expression.
+Various error codes are defined in the header file.
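+<p>
+A minimal sketch of REG_ICASE and <i>re_nsub</i>. It assumes the system
+<b>&lt;regex.h&gt;</b> stands in for <b>pcreposix.h</b>, and the helper names
+are hypothetical. Knowing <i>re_nsub</i> is useful for sizing the <i>pmatch</i>
+array passed to <b>regexec()</b>: holding the whole match plus every capture
+takes <i>re_nsub</i> + 1 elements.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h>. */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: number of capturing subpatterns, or -1 if the
   pattern does not compile. */
long count_groups_icase(const char *pattern)
{
    regex_t re;
    long n;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    n = (long)re.re_nsub;   /* the member this page documents as public */
    regfree(&re);
    return n;
}

/* Hypothetical helper: case-insensitive match test (1, 0, or -1). */
int matches_icase(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```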
+<h2><a name='sect4' href='#toc4'>Matching Newline Characters</a></h2>
+
+<p>
+This area is not simple, because POSIX and Perl take different views of
+things. It is not possible to get PCRE to obey POSIX semantics, but then
+PCRE was never intended to be a POSIX engine. The following table lists
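+the different possibilities for matching newline characters in PCRE: <p>
+<table border='1' cellpadding='3'>
+<tr><th></th><th>Default</th><th>Change with</th></tr>
+<tr><td>. matches newline</td><td>no</td><td>PCRE_DOTALL</td></tr>
+<tr><td>newline matches [^a]</td><td>yes</td><td>not changeable</td></tr>
+<tr><td>$ matches \n at end</td><td>yes</td><td>PCRE_DOLLAR_ENDONLY</td></tr>
+<tr><td>$ matches \n in middle</td><td>no</td><td>PCRE_MULTILINE</td></tr>
+<tr><td>^ matches \n in middle</td><td>no</td><td>PCRE_MULTILINE</td></tr>
+</table>
+<p>
+This is the equivalent table for POSIX: <p>
+<table border='1' cellpadding='3'>
+<tr><th></th><th>Default</th><th>Change with</th></tr>
+<tr><td>. matches newline</td><td>yes</td><td>REG_NEWLINE</td></tr>
+<tr><td>newline matches [^a]</td><td>yes</td><td>REG_NEWLINE</td></tr>
+<tr><td>$ matches \n at end</td><td>no</td><td>REG_NEWLINE</td></tr>
+<tr><td>$ matches \n in middle</td><td>no</td><td>REG_NEWLINE</td></tr>
+<tr><td>^ matches \n in middle</td><td>no</td><td>REG_NEWLINE</td></tr>
+</table>
+<p>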
+PCRE&rsquo;s behaviour is the same as Perl&rsquo;s, except that there is no equivalent
+for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way
+to stop newline from matching [^a]. <p>
+The default POSIX newline handling can
+be obtained by setting PCRE_DOTALL and PCRE_DOLLAR_ENDONLY through the native
+API (the POSIX wrapper has no flags that map to them), but there is
+no way to make PCRE behave exactly as for the REG_NEWLINE action.
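+<p>
+The two tables agree on the last two rows: ^ and $ match next to an internal
+newline only when REG_NEWLINE (mapped to PCRE_MULTILINE) is set. The sketch
+below, which assumes the system <b>&lt;regex.h&gt;</b> in place of
+<b>pcreposix.h</b> and uses a hypothetical helper, tests only those rows. The
+other rows differ either in their defaults or in what REG_NEWLINE changes, so a
+program that relies on them behaves differently depending on the library it is
+linked with.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h>. */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if pattern (compiled with REG_EXTENDED plus the
   extra cflags) matches subject, 0 if not, -1 on a compile error. */
int match_with_cflags(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | cflags) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

+With the subject <i>"a\nb"</i>, the pattern <i>^b</i> does not match by
+default but does match once REG_NEWLINE is passed.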
+<h2><a name='sect5' href='#toc5'>Matching a Pattern</a></h2>
+ <p>
+The function <b>regexec()</b> is called to match a compiled pattern
+<i>preg</i> against a given <i>string</i>, which is terminated by a zero byte, subject
+to the options in <i>eflags</i>. These can be: <p>
+ REG_NOTBOL<br>
+ <p>
+The PCRE_NOTBOL option is set when calling the underlying PCRE matching
+function. <p>
+ REG_NOTEOL<br>
+ <p>
+The PCRE_NOTEOL option is set when calling the underlying PCRE matching
+function. <p>
+The portion of the string that was matched, and also any captured
+substrings, are returned via the <i>pmatch</i> argument, which points to an array
+of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the members <i>rm_so</i> and
+<i>rm_eo</i>. These contain the offset to the first character of each substring
+and the offset to the first character after the end of each substring,
+respectively. The 0th element of the array relates to the entire portion
+of <i>string</i> that was matched; subsequent elements relate to the capturing
+subpatterns of the regular expression. Unused entries in the array have
+both structure members set to -1. <p>
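+<p>
+A sketch of reading one captured substring out of <i>pmatch</i>, using the
+hypothetical helper <i>get_group()</i> and the system <b>&lt;regex.h&gt;</b> in
+place of <b>pcreposix.h</b>. Because <i>rm_eo</i> is the offset of the first
+character <i>after</i> the substring, its length is simply <i>rm_eo</i> -
+<i>rm_so</i>. A group that did not take part in the match is recognized by
+<i>rm_so</i> == -1.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h>. */
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 10  /* element 0 = whole match, 1..9 = capture groups */

/* Hypothetical helper: copies capture `group` of the first match into out
   (truncated to outsz - 1 chars). Returns 1 if the group was set, 0 if there
   was no match or the group was unused, -1 on a compile error. */
int get_group(const char *pattern, const char *subject, size_t group,
              char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_SLOTS];
    size_t len;
    int ok = 0;

    if (outsz == 0 || group >= MAX_SLOTS)
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, subject, MAX_SLOTS, m, 0) == 0 && m[group].rm_so != -1) {
        /* rm_eo points one past the end, so the difference is the length. */
        len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len >= outsz)
            len = outsz - 1;
        memcpy(out, subject + m[group].rm_so, len);
        out[len] = '\0';
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```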
+A successful match yields a zero return;
+various error codes are defined in the header file, of which REG_NOMATCH
+is the "expected" failure code.
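+<p>
+REG_NOTBOL and REG_NOMATCH come together when every match in a string is
+wanted: after each match the subject pointer is advanced past it, and
+REG_NOTBOL stops ^ from matching at the new, mid-string start. This is a sketch
+with a hypothetical helper, built against the system <b>&lt;regex.h&gt;</b>
+rather than <b>pcreposix.h</b>. Stepping one character past an empty match is a
+design choice that guarantees the loop makes progress.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h>. */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts non-overlapping matches of pattern in
   subject. Returns -1 on a compile error or an unexpected regexec error. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = subject;
    int eflags = 0;   /* first call: p really is the start of the subject */
    int count = 0;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while ((rc = regexec(&re, p, 1, m, eflags)) == 0) {
        count++;
        if (m[0].rm_eo == m[0].rm_so) {       /* empty match */
            if (p[m[0].rm_eo] == '\0')
                break;                        /* nothing left to scan */
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;
        }
        eflags = REG_NOTBOL;  /* p is now mid-string: ^ must not match here */
    }

    regfree(&re);
    /* REG_NOMATCH is the expected way out of the loop. */
    return (rc == 0 || rc == REG_NOMATCH) ? count : -1;
}
```

+Without REG_NOTBOL, a pattern such as <i>^a</i> would match again at every
+new starting point in <i>"aaa"</i>; with it, the count is 1.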
+<h2><a name='sect6' href='#toc6'>Error Messages</a></h2>
+ <p>
+The <b>regerror()</b> function
+maps a non-zero error code from either <b>regcomp()</b> or <b>regexec()</b> to a printable
+message. If <i>preg</i> is not NULL, the error should have arisen from the use
+of that structure. A message terminated by a binary zero is placed in <i>errbuf</i>.
+The length of the message, including the zero, is limited to <i>errbuf_size</i>.
+The yield of the function is the size of buffer needed to hold the whole
+message.
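+<p>
+Because the yield of <b>regerror()</b> is the size needed for the whole message,
+a caller can ask for that size first and then allocate exactly enough. This
+sketch assumes the POSIX rule that a zero <i>errbuf_size</i> makes the function
+ignore <i>errbuf</i>; the helper name is hypothetical, and the system
+<b>&lt;regex.h&gt;</b> stands in for <b>pcreposix.h</b>.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h>. */
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd error message if pattern fails
   to compile, or NULL if it compiles. The caller frees the string. */
char *compile_error_message(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    size_t need;
    char *msg;

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* Size query: with errbuf_size == 0 the buffer is not touched, and the
       yield includes room for the terminating zero. */
    need = regerror(rc, &re, NULL, 0);
    msg = malloc(need);
    if (msg != NULL)
        regerror(rc, &re, msg, need);   /* &re: the structure that failed */
    return msg;
}
```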
+<h2><a name='sect7' href='#toc7'>Memory Usage</a></h2>
+ <p>
+Compiling a regular expression causes memory to
+be allocated and associated with the <i>preg</i> structure. The function <b>regfree()</b>
+frees all such memory, after which <i>preg</i> may no longer be used as a compiled
+expression.
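+<p>
+A sketch of releasing several compiled patterns when one of them fails to
+compile. Since the <i>preg</i> structure is filled in only on success, it
+conservatively assumes that a <b>regex_t</b> whose <b>regcomp()</b> call failed
+must not be passed to <b>regfree()</b>. The helper names are hypothetical, and
+the system <b>&lt;regex.h&gt;</b> stands in for <b>pcreposix.h</b>.

```c
/* Assumption: system <regex.h>; with PCRE, include <pcreposix.h>. */
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles pats[0..n-1] into res[]. Returns -1 if all
   compiled; otherwise frees the ones already compiled (never the failed one)
   and returns the index of the pattern that failed. */
int compile_all(regex_t *res, const char *const *pats, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (regcomp(&res[i], pats[i], REG_EXTENDED) != 0) {
            for (j = 0; j < i; j++)
                regfree(&res[j]);   /* res[j] must not be used after this */
            return (int)i;
        }
    }
    return -1;
}

/* Hypothetical helper: frees n successfully compiled patterns. */
void free_all(regex_t *res, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        regfree(&res[i]);
}
```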
+<h2><a name='sect8' href='#toc8'>Author</a></h2>
+ <p>
+Philip Hazel &lt;ph10@cam.ac.uk&gt; <br>
+University Computing Service, <br>
+Cambridge CB2 3QG, England. <p>
+ Last updated: 07 September 2004 <br>
+Copyright (c) 1997-2004 University of Cambridge. <p>
+
+<hr><p>
+<a name='toc'><b>Table of Contents</b></a><p>
+<ul>
+<li><a name='toc0' href='#sect0'>Name</a></li>
+<li><a name='toc1' href='#sect1'>Synopsis of Posix API</a></li>
+<li><a name='toc2' href='#sect2'>Description</a></li>
+<li><a name='toc3' href='#sect3'>Compiling a Pattern</a></li>
+<li><a name='toc4' href='#sect4'>Matching Newline Characters</a></li>
+<li><a name='toc5' href='#sect5'>Matching a Pattern</a></li>
+<li><a name='toc6' href='#sect6'>Error Messages</a></li>
+<li><a name='toc7' href='#sect7'>Memory Usage</a></li>
+<li><a name='toc8' href='#sect8'>Author</a></li>
+</ul>
+</body>
+</html>