Diffstat (limited to 'Utilities/PCRE/man/html/pcreapi.3.html')
-rw-r--r-- | Utilities/PCRE/man/html/pcreapi.3.html | 1069 |
1 files changed, 1069 insertions, 0 deletions
diff --git a/Utilities/PCRE/man/html/pcreapi.3.html b/Utilities/PCRE/man/html/pcreapi.3.html
new file mode 100644
index 0000000..a083204
--- /dev/null
+++ b/Utilities/PCRE/man/html/pcreapi.3.html
@@ -0,0 +1,1069 @@
+<!-- manual page source format generated by PolyglotMan v3.2, -->
+<!-- available at http://polyglotman.sourceforge.net/ -->
+
+<html>
+<head>
+<title>PCRE(3) manual page</title>
+</head>
+<body bgcolor='white'>
+<a href='#toc'>Table of Contents</a><p>
+
+<h2><a name='sect0' href='#toc0'>Name</a></h2>
+PCRE - Perl-compatible regular expressions
+<h2><a name='sect1' href='#toc1'>PCRE Native API</a></h2>
+ <p>
+<b>#include &lt;pcre.h&gt;</b>
+<p>
+<font size='-1'></font>
+ <br>
+<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b> <b>const char **<i>errptr</i>,
+int *<i>erroffset</i>,</b> <b>const unsigned char *<i>tableptr</i>);</b> <p>
+<br>
+<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b> <b>const char **<i>errptr</i>);</b>
+<p>
+<br>
+<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b> <b>const char
+*<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b> <b>int <i>options</i>, int *<i>ovector</i>, int
+<i>ovecsize</i>);</b> <p>
+<br>
+<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b> <b>const char *<i>subject</i>, int
+*<i>ovector</i>,</b> <b>int <i>stringcount</i>, const char *<i>stringname</i>,</b> <b>char *<i>buffer</i>, int
+<i>buffersize</i>);</b> <p>
+<br>
+<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b> <b>int <i>stringcount</i>,
+int <i>stringnumber</i>, char *<i>buffer</i>,</b> <b>int <i>buffersize</i>);</b> <p>
+<br>
+<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b> <b>const char *<i>subject</i>, int
+*<i>ovector</i>,</b> <b>int <i>stringcount</i>, const char *<i>stringname</i>,</b> <b>const char **<i>stringptr</i>);</b>
+<p>
+<br>
+<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b> <b>const char *<i>name</i>);</b> <p>
+<br>
+<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b> <b>int <i>stringcount</i>,
+int <i>stringnumber</i>,</b> <b>const char **<i>stringptr</i>);</b> <p>
+<br>
+<b>int pcre_get_substring_list(const char *<i>subject</i>,</b> <b>int *<i>ovector</i>, int <i>stringcount</i>,
+const char ***<i>listptr</i>);</b> <p>
+<br>
+<b>void pcre_free_substring(const char *<i>stringptr</i>);</b> <p>
+<br>
+<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b> <p>
+<br>
+<b>const unsigned char *pcre_maketables(void);</b> <p>
+<br>
+<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b> <b>int <i>what</i>,
+void *<i>where</i>);</b> <p>
+<br>
+<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b> <b>*<i>firstcharptr</i>);</b> <p>
+<br>
+<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b> <p>
+<br>
+<b>char *pcre_version(void);</b> <p>
+<br>
+<b>void *(*pcre_malloc)(size_t);</b> <p>
+<br>
+<b>void (*pcre_free)(void *);</b> <p>
+<br>
+<b>void *(*pcre_stack_malloc)(size_t);</b> <p>
+<br>
+<b>void (*pcre_stack_free)(void *);</b> <p>
+<br>
+<b>int (*pcre_callout)(pcre_callout_block *);</b>
+<h2><a name='sect2' href='#toc2'>PCRE API Overview</a></h2>
+ <p>
+PCRE has
+its own native API, which is described in this document. There is also a
+set of wrapper functions that correspond to the POSIX regular expression
+API. These are described in the <b>pcreposix</b> documentation. <p>
+The native API
+function prototypes are defined in the header file <b>pcre.h</b>, and on Unix systems
+the library itself is called <b>libpcre</b>. It can normally be accessed by adding
+<b>-lpcre</b> to the command for linking an application that uses PCRE. The header
+file defines the macros PCRE_MAJOR and PCRE_MINOR to contain the major
+and minor release numbers for the library. Applications can use these to
+include support for different releases of PCRE. <p>
+The functions <b>pcre_compile()</b>,
+<b>pcre_study()</b>, and <b>pcre_exec()</b> are used for compiling and matching regular
+expressions. A sample program that demonstrates the simplest way of using
+them is provided in the file called <i>pcredemo.c</i> in the source distribution.
+The <b>pcresample</b> documentation describes how to run it. <p>
+In addition to the
+main compiling and matching functions, there are convenience functions
+for extracting captured substrings from a matched subject string. They are:
+<p>
+ <b>pcre_copy_substring()</b><br>
+ <b>pcre_copy_named_substring()</b><br>
+ <b>pcre_get_substring()</b><br>
+ <b>pcre_get_named_substring()</b><br>
+ <b>pcre_get_substring_list()</b><br>
+ <b>pcre_get_stringnumber()</b><br>
+ <p>
+<b>pcre_free_substring()</b> and <b>pcre_free_substring_list()</b> are also provided,
+to free the memory used for extracted strings. <p>
+The function <b>pcre_maketables()</b>
+is used to build a set of character tables in the current locale for passing
+to <b>pcre_compile()</b> or <b>pcre_exec()</b>. This is an optional facility that is provided
+for specialist use. Most commonly, no special tables are passed, in which
+case internal tables that are generated when PCRE is built are used. <p>
+The
+function <b>pcre_fullinfo()</b> is used to find out information about a compiled
+pattern; <b>pcre_info()</b> is an obsolete version that returns only some of the
+available information, but is retained for backwards compatibility. The
+function <b>pcre_version()</b> returns a pointer to a string containing the version
+of PCRE and its date of release. <p>
+The global variables <b>pcre_malloc</b> and <b>pcre_free</b>
+initially contain the entry points of the standard <b>malloc()</b> and <b>free()</b>
+functions, respectively. PCRE calls the memory management functions via
+these variables, so a calling program can replace them if it wishes to
+intercept the calls. This should be done before calling any PCRE functions.
+<p>
+The global variables <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are also indirections
+to memory management functions. These special functions are used only when
+PCRE is compiled to use the heap for remembering data, instead of recursive
+function calls. This is a non-standard way of building PCRE, for use in environments
+that have limited stacks. Because of the greater use of memory management,
+it runs more slowly. Separate functions are provided so that special-purpose
+external code can be used for this case. When used, these functions are
+always called in a stack-like manner (last obtained, first freed), and always
+for memory blocks of the same size. <p>
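+The two guarantees above (last obtained, first freed; one block size) make
+a very simple allocator sufficient. The C sketch below is an illustration
+only, not code from PCRE: the names <b>stack_block_alloc</b>, <b>stack_block_free</b>
+and <b>stack_block_release_cache</b> are hypothetical, and the code assumes a
+single thread. It keeps freed blocks on a list and hands the most recently
+freed one back on the next request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; PCRE dictates only the two signatures. */
typedef struct cached_block {
    struct cached_block *next;      /* link used only while the block is cached */
} cached_block;

static cached_block *cache_head = NULL; /* released blocks kept for reuse */
static size_t block_size = 0;           /* the one size every request shares */

/* Has the type expected by pcre_stack_malloc: void *(*)(size_t). */
void *stack_block_alloc(size_t size)
{
    if (size < sizeof(cached_block))
        size = sizeof(cached_block);    /* leave room for the cache link */
    if (block_size == 0)
        block_size = size;              /* the first request fixes the size */
    assert(size == block_size);         /* relies on the same-size guarantee */

    if (cache_head != NULL) {           /* reuse the most recently freed block */
        cached_block *b = cache_head;
        cache_head = b->next;
        return b;
    }
    return malloc(size);
}

/* Has the type expected by pcre_stack_free: void (*)(void *). */
void stack_block_free(void *block)
{
    cached_block *b = block;
    if (b == NULL)
        return;
    b->next = cache_head;               /* push onto the cache, do not free */
    cache_head = b;
}

/* Return every cached block to the system, for example at program exit. */
void stack_block_release_cache(void)
{
    while (cache_head != NULL) {
        cached_block *next = cache_head->next;
        free(cache_head);
        cache_head = next;
    }
    block_size = 0;
}
```

+With a PCRE library built to use the heap, these functions would be installed
+by assigning them to <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> before any
+matching is done. A multithreaded program would need a per-thread cache or
+locking, which this sketch omits. <p>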
+The global variable <b>pcre_callout</b> initially
+contains NULL. It can be set by the caller to a "callout" function, which
+PCRE will then call at specified points during a matching operation. Details
+are given in the <b>pcrecallout</b> documentation.
+<h2><a name='sect3' href='#toc3'>Multithreading</a></h2>
+ <p>
+The PCRE
+functions can be used in multithreaded applications, with the proviso
+that the memory management functions pointed to by <b>pcre_malloc</b>, <b>pcre_free</b>,
+<b>pcre_stack_malloc</b>, and <b>pcre_stack_free</b>, and the callout function pointed
+to by <b>pcre_callout</b>, are shared by all threads. <p>
+The compiled form of a regular
+expression is not altered during matching, so the same compiled pattern
+can safely be used by several threads at once.
+<h2><a name='sect4' href='#toc4'>Saving Precompiled Patterns
+for Later Use</a></h2>
+ <p>
+The compiled form of a regular expression can be saved and
+re-used at a later time, possibly by a different program, and even on a
+host other than the one on which it was compiled. Details are given in the
+ <b>pcreprecompile</b> documentation.
+<h2><a name='sect5' href='#toc5'>Checking Build-time Options</a></h2>
+ <p>
+<b>int pcre_config(int
+<i>what</i>, void *<i>where</i>);</b> <p>
+The function <b>pcre_config()</b> makes it possible for a
+PCRE client to discover which optional features have been compiled into
+the PCRE library. The <b>pcrebuild</b> documentation has more details about these
+optional features. <p>
+The first argument for <b>pcre_config()</b> is an integer, specifying
+which information is required; the second argument is a pointer to a variable
+into which the information is placed. The following information is available:
+<p>
+ PCRE_CONFIG_UTF8<br>
+ <p>
+The output is an integer that is set to one if UTF-8 support is available;
+otherwise it is set to zero. <p>
+ PCRE_CONFIG_UNICODE_PROPERTIES<br>
+ <p>
+The output is an integer that is set to one if support for Unicode character
+properties is available; otherwise it is set to zero. <p>
+ PCRE_CONFIG_NEWLINE<br>
+ <p>
+The output is an integer that is set to the value of the code that is
+used for the newline character. It is either linefeed (10) or carriage return
+(13), and should normally be the standard character for your operating
+system. <p>
+ PCRE_CONFIG_LINK_SIZE<br>
+ <p>
+The output is an integer that contains the number of bytes used for internal
+linkage in compiled regular expressions. The value is 2, 3, or 4. Larger
+values allow larger regular expressions to be compiled, at the expense
+of slower matching. The default value of 2 is sufficient for all but the
+most massive patterns, since it allows the compiled pattern to be up to
+64K in size. <p>
+ PCRE_CONFIG_POSIX_MALLOC_THRESHOLD<br>
+ <p>
+The output is an integer that contains the threshold above which the POSIX
+interface uses <b>malloc()</b> for output vectors. Further details are given in
+the <b>pcreposix</b> documentation. <p>
+ PCRE_CONFIG_MATCH_LIMIT<br>
+ <p>
+The output is an integer that gives the default limit for the number of
+internal matching function calls in a <b>pcre_exec()</b> execution. Further details
+are given with <b>pcre_exec()</b> below. <p>
+ PCRE_CONFIG_STACKRECURSE<br>
+ <p>
+The output is an integer that is set to one if internal recursion is implemented
+by recursive function calls that use the stack to remember their state.
+This is the usual way that PCRE is compiled. The output is zero if PCRE
+was compiled to use blocks of data on the heap instead of recursive function
+calls. In this case, <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> are called to
+manage memory blocks on the heap, thus avoiding the use of the stack.
+
+<h2><a name='sect6' href='#toc6'>Compiling a Pattern</a></h2>
+ <p>
+<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
+ <b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b> <b>const unsigned char *<i>tableptr</i>);</b>
+<p>
+The function <b>pcre_compile()</b> is called to compile a pattern into an internal
+form. The pattern is a C string terminated by a binary zero, and is passed
+in the <i>pattern</i> argument. A pointer to a single block of memory that is obtained
+via <b>pcre_malloc</b> is returned. This contains the compiled code and related
+data. The <b>pcre</b> type is defined for the returned block; this is a typedef
+for a structure whose contents are not externally defined. It is up to the
+caller to free the memory when it is no longer required. <p>
+Although the compiled
+code of a PCRE regex is relocatable, that is, it does not depend on memory
+location, the complete <b>pcre</b> data block is not fully relocatable, because
+it may contain a copy of the <i>tableptr</i> argument, which is an address (see
+below). <p>
+The <i>options</i> argument contains independent bits that affect the compilation.
+It should be zero if no options are required. The available options are
+described below. Some of them (in particular, those that are compatible
+with Perl) can also be set and unset from within the pattern (see the detailed
+description in the <b>pcrepattern</b> documentation). For these options, the
+contents of the <i>options</i> argument specify their initial settings at the
+start of compilation and execution. The PCRE_ANCHORED option can be set
+at the time of matching as well as at compile time. <p>
+If <i>errptr</i> is NULL, <b>pcre_compile()</b>
+returns NULL immediately. Otherwise, if compilation of a pattern fails,
+<b>pcre_compile()</b> returns NULL, and sets the variable pointed to by <i>errptr</i>
+to point to a textual error message. The offset from the start of the pattern
+to the character where the error was discovered is placed in the variable
+pointed to by <i>erroffset</i>, which must not be NULL. If it is, an immediate
+error is given. <p>
+If the final argument, <i>tableptr</i>, is NULL, PCRE uses a default
+set of character tables that are built when PCRE is compiled, using the
+default C locale. Otherwise, <i>tableptr</i> must be an address that is the result
+of a call to <b>pcre_maketables()</b>. This value is stored with the compiled pattern,
+and used again by <b>pcre_exec()</b>, unless another table pointer is passed to
+it. For more discussion, see the section on locale support below. <p>
+This code
+fragment shows a typical straightforward call to <b>pcre_compile()</b>: <p>
+ pcre *re;<br>
+ const char *error;<br>
+ int erroffset;<br>
+ re = pcre_compile(<br>
+ "^A.*Z", /* the pattern */<br>
+ 0, /* default options */<br>
+ &error, /* for error message */<br>
+ &erroffset, /* for error offset */<br>
+ NULL); /* use default character tables */<br>
+ <p>
+The following names for option bits are defined in the <b>pcre.h</b> header file:
+<p>
+ PCRE_ANCHORED<br>
+ <p>
+If this bit is set, the pattern is forced to be "anchored", that is, it
+is constrained to match only at the first matching point in the string
+that is being searched (the "subject string"). This effect can also be achieved
+by appropriate constructs in the pattern itself, which is the only way
+to do it in Perl. <p>
+ PCRE_AUTO_CALLOUT<br>
+ <p>
+If this bit is set, <b>pcre_compile()</b> automatically inserts callout items,
+all with number 255, before each pattern item. For discussion of the callout
+facility, see the <b>pcrecallout</b> documentation. <p>
+ PCRE_CASELESS<br>
+ <p>
+If this bit is set, letters in the pattern match both upper and lower
+case letters. It is equivalent to Perl’s /i option, and it can be changed
+within a pattern by a (?i) option setting. When running in UTF-8 mode, case
+support for high-valued characters is available only when PCRE is built
+with Unicode character property support. <p>
+ PCRE_DOLLAR_ENDONLY<br>
+ <p>
+If this bit is set, a dollar metacharacter in the pattern matches only
+at the end of the subject string. Without this option, a dollar also matches
+immediately before the final character if it is a newline (but not before
+any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE
+is set. There is no equivalent to this option in Perl, and no way to set
+it within a pattern. <p>
+ PCRE_DOTALL<br>
+ <p>
+If this bit is set, a dot metacharacter in the pattern matches all characters,
+including newlines. Without it, newlines are excluded. This option is equivalent
+to Perl’s /s option, and it can be changed within a pattern by a (?s) option
+setting. A negative class such as [^a] always matches a newline character,
+independent of the setting of this option. <p>
+ PCRE_EXTENDED<br>
+ <p>
+If this bit is set, whitespace data characters in the pattern are totally
+ignored except when escaped or inside a character class. Whitespace does
+not include the VT character (code 11). In addition, characters between
+an unescaped # outside a character class and the next newline character,
+inclusive, are also ignored. This is equivalent to Perl’s /x option, and
+it can be changed within a pattern by a (?x) option setting. <p>
+This option
+makes it possible to include comments inside complicated patterns. Note,
+however, that this applies only to data characters. Whitespace characters
+may never appear within special character sequences in a pattern, for example
+within the sequence (?( which introduces a conditional subpattern. <p>
+ PCRE_EXTRA<br>
+ <p>
+This option was invented in order to turn on additional functionality
+of PCRE that is incompatible with Perl, but it is currently of very little
+use. When set, any backslash in a pattern that is followed by a letter that
+has no special meaning causes an error, thus reserving these combinations
+for future expansion. By default, as in Perl, a backslash followed by a
+letter with no special meaning is treated as a literal. There are at present
+no other features controlled by this option. It can also be set by a (?X)
+option setting within a pattern. <p>
+ PCRE_MULTILINE<br>
+ <p>
+By default, PCRE treats the subject string as consisting of a single line
+of characters (even if it actually contains newlines). The "start of line"
+metacharacter (^) matches only at the start of the string, while the "end
+of line" metacharacter ($) matches only at the end of the string, or before
+a terminating newline (unless PCRE_DOLLAR_ENDONLY is set). This is the same
+as Perl. <p>
+When PCRE_MULTILINE is set, the "start of line" and "end of
+line" constructs match immediately following or immediately before any
+newline in the subject string, respectively, as well as at the very start
+and end. This is equivalent to Perl’s /m option, and it can be changed within
+a pattern by a (?m) option setting. If there are no "\n" characters in a
+subject string, or no occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE
+has no effect. <p>
+ PCRE_NO_AUTO_CAPTURE<br>
+ <p>
+If this option is set, it disables the use of numbered capturing parentheses
+in the pattern. Any opening parenthesis that is not followed by ? behaves
+as if it were followed by ?: but named parentheses can still be used for
+capturing (and they acquire numbers in the usual way). There is no equivalent
+of this option in Perl. <p>
+ PCRE_UNGREEDY<br>
+ <p>
+This option inverts the "greediness" of the quantifiers so that they are
+not greedy by default, but become greedy if followed by "?". It is not compatible
+with Perl. It can also be set by a (?U) option setting within the pattern.
+<p>
+ PCRE_UTF8<br>
+ <p>
+This option causes PCRE to regard both the pattern and the subject as
+strings of UTF-8 characters instead of single-byte character strings. However,
+it is available only when PCRE is built to include UTF-8 support. If not,
+the use of this option provokes an error. Details of how this option changes
+the behaviour of PCRE are given in the section on UTF-8 support in the
+main <b>pcre</b> page. <p>
+ PCRE_NO_UTF8_CHECK<br>
+ <p>
+When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
+automatically checked. If an invalid UTF-8 sequence of bytes is found, <b>pcre_compile()</b>
+returns an error. If you already know that your pattern is valid, and you
+want to skip this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK
+option. When it is set, the effect of passing an invalid UTF-8 string as
+a pattern is undefined. It may cause your program to crash. Note that this
+option can also be passed to <b>pcre_exec()</b>, to suppress the UTF-8 validity
+checking of subject strings.
+<h2><a name='sect7' href='#toc7'>Studying a Pattern</a></h2>
+ <p>
+<b>pcre_extra *pcre_study(const
+pcre *<i>code</i>, int <i>options</i>,</b> <b>const char **<i>errptr</i>);</b> <p>
+If a compiled pattern is
+going to be used several times, it is worth spending more time analyzing
+it in order to speed up the time taken for matching. The function <b>pcre_study()</b>
+takes a pointer to a compiled pattern as its first argument. If studying
+the pattern produces additional information that will help speed up matching,
+<b>pcre_study()</b> returns a pointer to a <b>pcre_extra</b> block, in which the <i>study_data</i>
+field points to the results of the study. <p>
+The returned value from <b>pcre_study()</b>
+can be passed directly to <b>pcre_exec()</b>. However, a <b>pcre_extra</b> block also
+contains other fields that can be set by the caller before the block is
+passed; these are described below in the section on matching a pattern.
+<p>
+If studying the pattern does not produce any additional information, <b>pcre_study()</b>
+returns NULL. In that circumstance, if the calling program wants to pass
+any of the other fields to <b>pcre_exec()</b>, it must set up its own <b>pcre_extra</b>
+block. <p>
+The second argument of <b>pcre_study()</b> contains option bits. At present,
+no options are defined, and this argument should always be zero. <p>
+The third
+argument for <b>pcre_study()</b> is a pointer for an error message. If studying
+succeeds (even if no data is returned), the variable it points to is set
+to NULL. Otherwise it points to a textual error message. You should therefore
+test the error pointer for NULL after calling <b>pcre_study()</b>, to be sure
+that it has run successfully. <p>
+This is a typical call to <b>pcre_study()</b>: <p>
+
+pcre_extra *pe;<br>
+ pe = pcre_study(<br>
+ re, /* result of pcre_compile() */<br>
+ 0, /* no options exist */<br>
+ &error); /* set to NULL or points to a message */<br>
+ <p>
+At present, studying a pattern is useful only for non-anchored patterns
+that do not have a single fixed starting character. A bitmap of possible
+starting bytes is created.
+<h2><a name='sect8' href='#toc8'>Locale Support</a></h2>
+ <p>
+PCRE handles caseless matching,
+and determines whether characters are letters, digits, or whatever, by
+reference to a set of tables, indexed by character value. (When running
+in UTF-8 mode, this applies only to characters with codes less than 128.
+Higher-valued codes never match escapes such as \w or \d, but can be tested
+with \p if PCRE is built with Unicode character property support.) <p>
+An internal
+set of tables is created in the default C locale when PCRE is built. This
+is used when the final argument of <b>pcre_compile()</b> is NULL, and is sufficient
+for many applications. An alternative set of tables can, however, be supplied.
+These may be created in a different locale from the default. As more and
+more applications change to using Unicode, the need for this locale support
+is expected to die away. <p>
+External tables are built by calling the <b>pcre_maketables()</b>
+function, which has no arguments, in the relevant locale. The result can
+then be passed to <b>pcre_compile()</b> or <b>pcre_exec()</b> as often as necessary. For
+example, to build and use tables that are appropriate for the French locale
+(where accented characters with values greater than 128 are treated as
+letters), the following code could be used: <p>
+ setlocale(LC_CTYPE, "fr_FR");<br>
+ tables = pcre_maketables();<br>
+ re = pcre_compile(..., tables);<br>
+ <p>
+When <b>pcre_maketables()</b> runs, the tables are built in memory that is obtained
+via <b>pcre_malloc</b>. It is the caller’s responsibility to ensure that the memory
+containing the tables remains available for as long as it is needed. <p>
+The
+pointer that is passed to <b>pcre_compile()</b> is saved with the compiled pattern,
+and the same tables are used via this pointer by <b>pcre_study()</b> and normally
+also by <b>pcre_exec()</b>. Thus, by default, for any single pattern, compilation,
+studying and matching all happen in the same locale, but different patterns
+can be compiled in different locales. <p>
+It is possible to pass a table pointer
+or NULL (indicating the use of the internal tables) to <b>pcre_exec()</b>. Although
+not intended for this purpose, this facility could be used to match a pattern
+in a different locale from the one in which it was compiled. Passing table
+pointers at run time is discussed below in the section on matching a pattern.
+
+<h2><a name='sect9' href='#toc9'>Information About a Pattern</a></h2>
+ <p>
+<b>int pcre_fullinfo(const pcre *<i>code</i>, const
+pcre_extra *<i>extra</i>,</b> <b>int <i>what</i>, void *<i>where</i>);</b> <p>
+The <b>pcre_fullinfo()</b> function
+returns information about a compiled pattern. It replaces the obsolete <b>pcre_info()</b>
+function, which is nevertheless retained for backwards compatibility (and
+is documented below). <p>
+The first argument for <b>pcre_fullinfo()</b> is a pointer
+to the compiled pattern. The second argument is the result of <b>pcre_study()</b>,
+or NULL if the pattern was not studied. The third argument specifies which
+piece of information is required, and the fourth argument is a pointer
+to a variable to receive the data. The yield of the function is zero for
+success, or one of the following negative numbers: <p>
+ PCRE_ERROR_NULL the argument <i>code</i> was NULL<br>
+ the argument <i>where</i> was NULL<br>
+ PCRE_ERROR_BADMAGIC the "magic number" was not found<br>
+ PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid<br>
+ <p>
+The "magic number" is placed at the start of each compiled pattern as
+a simple check against passing an arbitrary memory pointer. Here is a typical
+call of <b>pcre_fullinfo()</b>, to obtain the length of the compiled pattern:
+<p>
+ int rc;<br>
+ size_t length;<br>
+ rc = pcre_fullinfo(<br>
+ re, /* result of pcre_compile() */<br>
+ pe, /* result of pcre_study(), or NULL */<br>
+ PCRE_INFO_SIZE, /* what is required */<br>
+ &length); /* where to put the data */<br>
+ <p>
+The possible values for the third argument are defined in <b>pcre.h</b>, and are
+as follows: <p>
+ PCRE_INFO_BACKREFMAX<br>
+ <p>
+Return the number of the highest back reference in the pattern. The fourth
+argument should point to an <b>int</b> variable. Zero is returned if there are
+no back references. <p>
+ PCRE_INFO_CAPTURECOUNT<br>
+ <p>
+Return the number of capturing subpatterns in the pattern. The fourth argument
+should point to an <b>int</b> variable. <p>
+ PCRE_INFO_DEFAULTTABLES<br>
+ <p>
+Return a pointer to the internal default character tables within PCRE.
+The fourth argument should point to an <b>unsigned char *</b> variable. This information
+call is provided for internal use by the <b>pcre_study()</b> function. External
+callers can cause PCRE to use its internal tables by passing a NULL table
+pointer. <p>
+ PCRE_INFO_FIRSTBYTE<br>
+ <p>
+Return information about the first byte of any matched string, for a non-anchored
+pattern. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
+is still recognized for backwards compatibility.) <p>
+If there is a fixed first
+byte, for example, from a pattern such as (cat|cow|coyote), it is returned
+in the integer pointed to by <i>where</i>. Otherwise, if either <p>
+(a) the pattern
+was compiled with the PCRE_MULTILINE option, and every branch starts with
+"^", or <p>
+(b) every branch of the pattern starts with ".*" and PCRE_DOTALL
+is not set (if it were set, the pattern would be anchored), <p>
+-1 is returned,
+indicating that the pattern matches only at the start of a subject string
+or after any newline within the string. Otherwise -2 is returned. For anchored
+patterns, -2 is returned. <p>
+ PCRE_INFO_FIRSTTABLE<br>
+ <p>
+If the pattern was studied, and this resulted in the construction of a
+256-bit table indicating a fixed set of bytes for the first byte in any
+matching string, a pointer to the table is returned. Otherwise NULL is returned.
+The fourth argument should point to an <b>unsigned char *</b> variable. <p>
+ PCRE_INFO_LASTLITERAL<br>
+ <p>
+Return the value of the rightmost literal byte that must exist in any
+matched string, other than at its start, if such a byte has been recorded.
+The fourth argument should point to an <b>int</b> variable. If there is no such
+byte, -1 is returned. For anchored patterns, a last literal byte is recorded
+only if it follows something of variable length. For example, for the pattern
+/^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value is
+-1. <p>
+ PCRE_INFO_NAMECOUNT<br>
+ PCRE_INFO_NAMEENTRYSIZE<br>
+ PCRE_INFO_NAMETABLE<br>
+ <p>
+PCRE supports the use of named as well as numbered capturing parentheses.
+The names are just an additional way of identifying the parentheses, which
+still acquire numbers. A convenience function called <b>pcre_get_named_substring()</b>
+is provided for extracting an individual captured substring by name. It
+is also possible to extract the data directly, by first converting the
+name to a number in order to access the correct pointers in the output
+vector (described with <b>pcre_exec()</b> below). To do the conversion, you need
+to use the name-to-number map, which is described by these three values. <p>
+The
+map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives
+the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each
+entry; both of these return an <b>int</b> value. The entry size depends on the
+length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the
+first entry of the table (a pointer to <b>char</b>). The first two bytes of each
+entry are the number of the capturing parenthesis, most significant byte
+first. The rest of the entry is the corresponding name, zero terminated.
+The names are in alphabetical order. For example, consider the following
+pattern (assume PCRE_EXTENDED is set, so white space - including newlines
+- is ignored): <p>
+ (?P&lt;date&gt; (?P&lt;year&gt;(\d\d)?\d\d) -<br>
+ (?P&lt;month&gt;\d\d) - (?P&lt;day&gt;\d\d) )<br>
+ <p>
+There are four named subpatterns, so the table has four entries, and each
+entry in the table is eight bytes long. The table is as follows, with non-printing
+bytes shown in hexadecimal, and undefined bytes shown as ??: <p>
+ 00 01 d a t e 00 ??<br>
+ 00 05 d a y 00 ?? ??<br>
+ 00 04 m o n t h 00<br>
+ 00 02 y e a r 00 ??<br>
+ <p>
+When writing code to extract data from named subpatterns using the name-to-number
+map, remember that the length of each entry is likely to be different for
+each compiled pattern. <p>
+ PCRE_INFO_OPTIONS<br>
+ <p>
+Return a copy of the options with which the pattern was compiled. The fourth
+argument should point to an <b>unsigned long int</b> variable. These option bits
+are those specified in the call to <b>pcre_compile()</b>, modified by any top-level
+option settings within the pattern itself. <p>
+A pattern is automatically anchored
+by PCRE if all of its top-level alternatives begin with one of the following:
+<p>
+ ^ unless PCRE_MULTILINE is set<br>
+ \A always<br>
+ \G always<br>
+ .* if PCRE_DOTALL is set and there are no back<br>
+ references to the subpattern in which .* appears<br>
+ <p>
+For such patterns, the PCRE_ANCHORED bit is set in the options returned
+by <b>pcre_fullinfo()</b>. <p>
+ PCRE_INFO_SIZE<br>
+ <p>
+Return the size of the compiled pattern, that is, the value that was passed
+as the argument to <b>pcre_malloc()</b> when PCRE was getting memory in which
+to place the compiled data. The fourth argument should point to a <b>size_t</b>
+variable. <p>
+ PCRE_INFO_STUDYSIZE<br>
+ <p>
+Return the size of the data block pointed to by the <i>study_data</i> field in
+a <b>pcre_extra</b> block. That is, it is the value that was passed to <b>pcre_malloc()</b>
+when PCRE was getting memory into which to place the data created by <b>pcre_study()</b>.
+The fourth argument should point to a <b>size_t</b> variable.
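+<p>
+The name-to-number map described under PCRE_INFO_NAMETABLE above can be walked
+with a few lines of C. The sketch below is an illustration only: the function
+name <b>lookup_group_number</b> is hypothetical, and the table is the one from
+the date example, with the undefined bytes filled with zeros. In a real
+program the table pointer, entry count and entry size would come from
+<b>pcre_fullinfo()</b>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search a name-to-number table of `count` entries, each
   `entry_size` bytes long. Every entry holds the group number in two bytes,
   most significant first, followed by the zero-terminated name.
   Returns the group number, or -1 if the name is not in the table. */
int lookup_group_number(const unsigned char *table, int count,
                        int entry_size, const char *name)
{
    int i;

    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;

        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* The four eight-byte entries from the example; ?? bytes are zero here. */
static const unsigned char example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

+A linear scan is used for brevity. Because the names are stored in
+alphabetical order, a binary search would also work for large tables.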
+<h2><a name='sect10' href='#toc10'>Obsolete Info Function</a></h2>
+
+<p>
+<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b> <b>*<i>firstcharptr</i>);</b> <p>
+The <b>pcre_info()</b>
+function is now obsolete because its interface is too restrictive to return
+all the available data about a compiled pattern. New programs should use
+<b>pcre_fullinfo()</b> instead. The yield of <b>pcre_info()</b> is the number of capturing
+subpatterns, or one of the following negative numbers: <p>
+ PCRE_ERROR_NULL the argument <i>code</i> was NULL<br>
+ PCRE_ERROR_BADMAGIC the "magic number" was not found<br>
+ <p>
+If the <i>optptr</i> argument is not NULL, a copy of the options with which the
+pattern was compiled is placed in the integer it points to (see PCRE_INFO_OPTIONS
+above). <p>
+If the pattern is not anchored and the <i>firstcharptr</i> argument is
+not NULL, it is used to pass back information about the first character
+of any matched string (see PCRE_INFO_FIRSTBYTE above).
+<h2><a name='sect11' href='#toc11'>Matching a Pattern</a></h2>
+
+<p>
+<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b> <b>const char
+*<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b> <b>int <i>options</i>, int *<i>ovector</i>, int
+<i>ovecsize</i>);</b> <p>
+The function <b>pcre_exec()</b> is called to match a subject string
+against a compiled pattern, which is passed in the <i>code</i> argument. If the
+pattern has been studied, the result of the study should be passed in the
+<i>extra</i> argument. <p>
+In most applications, the pattern will have been compiled
+(and optionally studied) in the same process that calls <b>pcre_exec()</b>. However,
+it is possible to save compiled patterns and study data, and then use them
+later in different processes, possibly even on different hosts. For a discussion
+about this, see the <b>pcreprecompile</b> documentation. <p>
+Here is an example of
+a simple call to <b>pcre_exec()</b>: <p>
+ int rc;<br>
+ int ovector[30];<br>
+ rc = pcre_exec(<br>
+ re, /* result of pcre_compile() */<br>
+ NULL, /* we didn’t study the pattern */<br>
+ "some string", /* the subject string */<br>
+ 11, /* the length of the subject string */<br>
+ 0, /* start at offset 0 in the subject */<br>
+ 0, /* default options */<br>
+ ovector, /* vector of integers for substring information */<br>
+ 30); /* number of elements in the vector (NOT size in bytes) */<br>
+
+<h3><a name='sect12' href='#toc12'>Extra data for <b>pcre_exec()</b></a></h3>
+ <p>
+If the <i>extra</i> argument is not NULL, it must
+point to a <b>pcre_extra</b> data block. The <b>pcre_study()</b> function returns such
+a block (when it doesn’t return NULL), but you can also create one for yourself,
+and pass additional information in it. The fields in a <b>pcre_extra</b> block
+are as follows: <p>
+ unsigned long int <i>flags</i>;<br>
+ void *<i>study_data</i>;<br>
+ unsigned long int <i>match_limit</i>;<br>
+ void *<i>callout_data</i>;<br>
+ const unsigned char *<i>tables</i>;<br>
+ <p>
+The <i>flags</i> field is a bitmap that specifies which of the other fields are
+set. The flag bits are: <p>
+ PCRE_EXTRA_STUDY_DATA<br>
+ PCRE_EXTRA_MATCH_LIMIT<br>
+ PCRE_EXTRA_CALLOUT_DATA<br>
+ PCRE_EXTRA_TABLES<br>
+ <p>
+Other flag bits should be set to zero. The <i>study_data</i> field is set in the
+<b>pcre_extra</b> block that is returned by <b>pcre_study()</b>, together with the appropriate
+flag bit. You should not set this yourself, but you may add to the block
+by setting the other fields and their corresponding flag bits. <p>
+The <i>match_limit</i>
+field provides a means of preventing PCRE from using up a vast amount of
+resources when running patterns that are not going to match, but which
+have a very large number of possibilities in their search trees. The classic
+example is the use of nested unlimited repeats. <p>
+Internally, PCRE uses a
+function called <b>match()</b> which it calls repeatedly (sometimes recursively).
+The limit is imposed on the number of times this function is called during
+a match, which has the effect of limiting the amount of recursion and backtracking
+that can take place. For patterns that are not anchored, the count starts
+from zero for each position in the subject string. <p>
+The default limit for
+the library can be set when PCRE is built; the built-in default is 10 million,
+which handles all but the most extreme cases. You can reduce the default
+by supplying <b>pcre_exec()</b> with a <b>pcre_extra</b> block in which <i>match_limit</i> is
+set to a smaller value, and PCRE_EXTRA_MATCH_LIMIT is set in the <i>flags</i>
+field. If the limit is exceeded, <b>pcre_exec()</b> returns PCRE_ERROR_MATCHLIMIT.
+<p>
+The <i>callout_data</i> field is used in conjunction with the "callout" feature,
+which is described in the <b>pcrecallout</b> documentation. <p>
+The <i>tables</i> field
+is used to pass a character tables pointer to <b>pcre_exec()</b>; this overrides
+the value that is stored with the compiled pattern. A non-NULL value is stored
+with the compiled pattern only if custom tables were supplied to <b>pcre_compile()</b>
+via its <i>tableptr</i> argument. If NULL is passed to <b>pcre_exec()</b> using this mechanism,
+it forces PCRE’s internal tables to be used. This facility is helpful when
+re-using patterns that have been saved after compiling with an external
+set of tables, because the external tables might be at a different address
+when <b>pcre_exec()</b> is called. See the <b>pcreprecompile</b> documentation for a
+discussion of saving compiled patterns for later use.
+<h3><a name='sect13' href='#toc13'>Option bits for <b>pcre_exec()</b></a></h3>
+
+<p>
+The unused bits of the <i>options</i> argument for <b>pcre_exec()</b> must be zero. The
+only bits that may be set are PCRE_ANCHORED, PCRE_NOTBOL, PCRE_NOTEOL,
+PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and PCRE_PARTIAL. <p>
+ PCRE_ANCHORED<br>
+ <p>
+The PCRE_ANCHORED option limits <b>pcre_exec()</b> to matching at the first matching
+position. If a pattern was compiled with PCRE_ANCHORED, or turned out to
+be anchored by virtue of its contents, it cannot be made unanchored at matching
+time. <p>
+ PCRE_NOTBOL<br>
+ <p>
+This option specifies that the first character of the subject string is not
+the beginning of a line, so the circumflex metacharacter should not match
+before it. Setting this without PCRE_MULTILINE (at compile time) causes
+circumflex never to match. This option affects only the behaviour of the
+circumflex metacharacter. It does not affect \A. <p>
+ PCRE_NOTEOL<br>
+ <p>
+This option specifies that the end of the subject string is not the end
+of a line, so the dollar metacharacter should not match it nor (except
+in multiline mode) a newline immediately before it. Setting this without
+PCRE_MULTILINE (at compile time) causes dollar never to match. This option
+affects only the behaviour of the dollar metacharacter. It does not affect
+\Z or \z. <p>
+ PCRE_NOTEMPTY<br>
+ <p>
+An empty string is not considered to be a valid match if this option is
+set. If there are alternatives in the pattern, they are tried. If all the
+alternatives match the empty string, the entire match fails. For example,
+if the pattern <p>
+ a?b?<br>
+ <p>
+is applied to a string not beginning with "a" or "b", it matches the empty
+string at the start of the subject. With PCRE_NOTEMPTY set, this match is
+not valid, so PCRE searches further into the string for occurrences of
+"a" or "b". <p>
+Perl has no direct equivalent of PCRE_NOTEMPTY, but it does
+make a special case of a pattern match of the empty string within its <b>split()</b>
+function, and when using the /g modifier. It is possible to emulate Perl’s
+behaviour after matching a null string by first trying the match again
+at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then if that
+fails by advancing the starting offset (see below) and trying an ordinary
+match again. There is some code that demonstrates how to do this in the
+<i>pcredemo.c</i> sample program. <p>
+ PCRE_NO_UTF8_CHECK<br>
+ <p>
+When PCRE_UTF8 is set at compile time, the validity of the subject as
+a UTF-8 string is automatically checked when <b>pcre_exec()</b> is subsequently
+called. The value of <i>startoffset</i> is also checked to ensure that it points
+to the start of a UTF-8 character. If an invalid UTF-8 sequence of bytes is
+found, <b>pcre_exec()</b> returns the error PCRE_ERROR_BADUTF8. If <i>startoffset</i>
+contains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned. <p>
+If you
+already know that your subject is valid, and you want to skip these checks
+for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when
+calling <b>pcre_exec()</b>. You might want to do this for the second and subsequent
+calls to <b>pcre_exec()</b> if you are making repeated calls to find all the matches
+in a single subject string. However, you should be sure that the value of
+<i>startoffset</i> points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK
+is set, the effect of passing an invalid UTF-8 string as a subject, or a
+value of <i>startoffset</i> that does not point to the start of a UTF-8 character,
+is undefined. Your program may crash. <p>
+ PCRE_PARTIAL<br>
+ <p>
+This option turns on the partial matching feature. If the subject string
+fails to match the pattern, but at some point during the matching process
+the end of the subject was reached (that is, the subject partially matches
+the pattern and the failure to match occurred only because there were not
+enough subject characters), <b>pcre_exec()</b> returns PCRE_ERROR_PARTIAL instead
+of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is used, there are restrictions
+on what may appear in the pattern. These are discussed in the <b>pcrepartial</b>
+ documentation.
+<h3><a name='sect14' href='#toc14'>The string to be matched by <b>pcre_exec()</b></a></h3>
+ <p>
+The subject string
+is passed to <b>pcre_exec()</b> as a pointer in <i>subject</i>, a length in <i>length</i>, and
+a starting byte offset in <i>startoffset</i>. In UTF-8 mode, the byte offset must
+point to the start of a UTF-8 character. Unlike the pattern string, the subject
+may contain binary zero bytes. When the starting offset is zero, the search
+for a match starts at the beginning of the subject, and this is by far
+the most common case. <p>
+A non-zero starting offset is useful when searching
+for another match in the same subject by calling <b>pcre_exec()</b> again after
+a previous success. Setting <i>startoffset</i> differs from just passing over a
+shortened string and setting PCRE_NOTBOL in the case of a pattern that
+begins with any kind of lookbehind. For example, consider the pattern <p>
+
+\Biss\B<br>
+ <p>
+which finds occurrences of "iss" in the middle of words. (\B matches only
+if the current position in the subject is not a word boundary.) When applied
+to the string "Mississippi" the first call to <b>pcre_exec()</b> finds the first
+occurrence. If <b>pcre_exec()</b> is called again with just the remainder of the
+subject, namely "issippi", it does not match, because \B is always false
+at the start of the subject, which is deemed to be a word boundary. However,
+if <b>pcre_exec()</b> is passed the entire string again, but with <i>startoffset</i>
+set to 4, it finds the second occurrence of "iss" because it is able to
+look behind the starting point to discover that it is preceded by a letter.
+<p>
+If a non-zero starting offset is passed when the pattern is anchored, one
+attempt to match at the given offset is made. This can only succeed if the
+pattern does not require the match to be at the start of the subject.
+<h3><a name='sect15' href='#toc15'>How
+<b>pcre_exec()</b> returns captured substrings</a></h3>
+ <p>
+In general, a pattern matches a
+certain portion of the subject, and in addition, further substrings from
+the subject may be picked out by parts of the pattern. Following the usage
+in Jeffrey Friedl’s book, this is called "capturing" in what follows, and
+the phrase "capturing subpattern" is used for a fragment of a pattern that
+picks out a substring. PCRE supports several other kinds of parenthesized
+subpattern that do not cause substrings to be captured. <p>
+Captured substrings
+are returned to the caller via a vector of integer offsets whose address
+is passed in <i>ovector</i>. The number of elements in the vector is passed in
+<i>ovecsize</i>, which must be a non-negative number. <b>Note</b>: this argument is NOT
+the size of <i>ovector</i> in bytes. <p>
+The first two-thirds of the vector is used
+to pass back captured substrings, each substring using a pair of integers.
+The remaining third of the vector is used as workspace by <b>pcre_exec()</b> while
+matching capturing subpatterns, and is not available for passing back information.
+The length passed in <i>ovecsize</i> should always be a multiple of three. If it
+is not, it is rounded down. <p>
+When a match is successful, information about
+captured substrings is returned in pairs of integers, starting at the beginning
+of <i>ovector</i>, and continuing up to two-thirds of its length at the most. The
+first element of a pair is set to the offset of the first character in
+a substring, and the second is set to the offset of the first character
+after the end of a substring. The first pair, <i>ovector[0]</i> and <i>ovector[1]</i>,
+identify the portion of the subject string matched by the entire pattern.
+The next pair is used for the first capturing subpattern, and so on. The
+value returned by <b>pcre_exec()</b> is the number of pairs that have been set.
+If there are no capturing subpatterns, the return value from a successful
+match is 1, indicating that just the first pair of offsets has been set.
+<p>
+Some convenience functions are provided for extracting the captured substrings
+as separate strings. These are described in the following section. <p>
+It is
+possible for a capturing subpattern number <i>n+1</i> to match some part of the
+subject when subpattern <i>n</i> has not been used at all. For example, if the
+string "abc" is matched against the pattern (a|(z))(bc) subpatterns 1 and
+3 are matched, but 2 is not. When this happens, both offset values corresponding
+to the unused subpattern are set to -1. <p>
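+The pair layout described above can be sketched in C as follows. This is
+only an illustration: the helper names <b>capture_length()</b> and <b>print_captures()</b>
+are hypothetical and are not part of PCRE, and the offsets used to try it
+out would be written by hand to mirror what the "abc" example is expected
+to produce (0,3 0,1 -1,-1 1,3 with a return value of 4). <p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a PCRE function).
   Returns the length of capture n, or -1 if n is out of range or the
   subpattern is unset. rc is the positive value returned by pcre_exec(). */
int capture_length(const int *ovector, int rc, int n)
{
    assert(ovector != NULL);
    if (n < 0 || n >= rc)
        return -1;                  /* no pair was set for this number */
    if (ovector[2 * n] < 0)
        return -1;                  /* unset subpattern: both offsets are -1 */
    return ovector[2 * n + 1] - ovector[2 * n];
}

/* Hypothetical helper: print every pair that was reported as set.
   fwrite() is used so that binary zeros in the subject are not a problem. */
void print_captures(const char *subject, const int *ovector, int rc)
{
    int n;
    for (n = 0; n < rc; n++) {
        int len = capture_length(ovector, rc, n);
        if (len < 0) {
            printf("%d: <unset>\n", n);
        } else {
            printf("%d: \"", n);
            fwrite(subject + ovector[2 * n], 1, (size_t)len, stdout);
            printf("\"\n");
        }
    }
}
```

+Pair 0 always describes the whole match, so a loop over the capturing
+subpatterns alone would start at 1. <p>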
+If a capturing subpattern is matched
+repeatedly, it is the last portion of the string that it matched that is
+returned. <p>
+If the vector is too small to hold all the captured substring
+offsets, it is used as far as possible (up to two-thirds of its length),
+and the function returns a value of zero. In particular, if the substring
+offsets are not of interest, <b>pcre_exec()</b> may be called with <i>ovector</i> passed
+as NULL and <i>ovecsize</i> as zero. However, if the pattern contains back references
+and the <i>ovector</i> is not big enough to remember the related substrings, PCRE
+has to get additional memory for use during matching. Thus it is usually
+advisable to supply an <i>ovector</i>. <p>
+Note that <b>pcre_info()</b> can be used to find
+out how many capturing subpatterns there are in a compiled pattern. The
+smallest size for <i>ovector</i> that will allow for <i>n</i> captured substrings, in
+addition to the offsets of the substring matched by the whole pattern,
+is (<i>n</i>+1)*3.
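+<p>
+The sizing arithmetic can be written down as two small C helpers. This is
+a sketch only; <b>min_ovecsize()</b> and <b>usable_pairs()</b> are hypothetical names
+chosen for illustration and are not provided by PCRE. <p>

```c
#include <assert.h>

/* Hypothetical helper: smallest ovecsize that leaves room for n captured
   substrings plus the pair for the whole match. */
int min_ovecsize(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Hypothetical helper: number of offset pairs that a vector of ovecsize
   elements can return. Only two-thirds of the vector carries results, two
   integers per pair, and a size that is not a multiple of three is
   rounded down. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;    /* integer division rounds down */
}
```

+With the 30-element vector used in the earlier example call, ten pairs are
+available: one for the whole match and nine for capturing subpatterns.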
+<h3><a name='sect16' href='#toc16'>Return values from <b>pcre_exec()</b></a></h3>
+ <p>
+If <b>pcre_exec()</b> fails, it returns
+a negative number. The following are defined in the header file: <p>
+ PCRE_ERROR_NOMATCH
+ (-1)<br>
+ <p>
+The subject string did not match the pattern. <p>
+ PCRE_ERROR_NULL
+ (-2)<br>
+ <p>
+Either <i>code</i> or <i>subject</i> was passed as NULL, or <i>ovector</i> was NULL and <i>ovecsize</i>
+was not zero. <p>
+ PCRE_ERROR_BADOPTION (-3)<br>
+ <p>
+An unrecognized bit was set in the <i>options</i> argument. <p>
+ PCRE_ERROR_BADMAGIC
+ (-4)<br>
+ <p>
+PCRE stores a 4-byte "magic number" at the start of the compiled code,
+to catch the case when it is passed a junk pointer and to detect when a
+pattern that was compiled in an environment of one endianness is run in
+an environment with the other endianness. This is the error that PCRE gives
+when the magic number is not present. <p>
+ PCRE_ERROR_UNKNOWN_NODE (-5)<br>
+ <p>
+While running the pattern match, an unknown item was encountered in the
+compiled pattern. This error could be caused by a bug in PCRE or by overwriting
+of the compiled pattern. <p>
+ PCRE_ERROR_NOMEMORY (-6)<br>
+ <p>
+If a pattern contains back references, but the <i>ovector</i> that is passed
+to <b>pcre_exec()</b> is not big enough to remember the referenced substrings,
+PCRE gets a block of memory at the start of matching to use for this purpose.
+If the call via <b>pcre_malloc()</b> fails, this error is given. The memory is
+automatically freed at the end of matching. <p>
+ PCRE_ERROR_NOSUBSTRING
+(-7)<br>
+ <p>
+This error is used by the <b>pcre_copy_substring()</b>, <b>pcre_get_substring()</b>,
+and <b>pcre_get_substring_list()</b> functions (see below). It is never returned
+by <b>pcre_exec()</b>. <p>
+ PCRE_ERROR_MATCHLIMIT (-8)<br>
+ <p>
+The recursion and backtracking limit, as specified by the <i>match_limit</i>
+field in a <b>pcre_extra</b> structure (or defaulted) was reached. See the description
+above. <p>
+ PCRE_ERROR_CALLOUT (-9)<br>
+ <p>
+This error is never generated by <b>pcre_exec()</b> itself. It is provided for
+use by callout functions that want to yield a distinctive error code. See
+the <b>pcrecallout</b> documentation for details. <p>
+ PCRE_ERROR_BADUTF8
+ (-10)<br>
+ <p>
+A string that contains an invalid UTF-8 byte sequence was passed as a subject.
+<p>
+ PCRE_ERROR_BADUTF8_OFFSET (-11)<br>
+ <p>
+The UTF-8 byte sequence that was passed as a subject was valid, but the
+value of <i>startoffset</i> did not point to the beginning of a UTF-8 character.
+<p>
+ PCRE_ERROR_PARTIAL (-12)<br>
+ <p>
+The subject string did not match, but it did match partially. See the
+<b>pcrepartial</b> documentation for details of partial matching. <p>
+ PCRE_ERROR_BAD_PARTIAL
+(-13)<br>
+ <p>
+The PCRE_PARTIAL option was used with a compiled pattern containing items
+that are not supported for partial matching. See the <b>pcrepartial</b> documentation
+for details of partial matching. <p>
+ PCRE_ERROR_INTERNAL (-14)<br>
+ <p>
+An unexpected internal error has occurred. This error could be caused by
+a bug in PCRE or by overwriting of the compiled pattern. <p>
+ PCRE_ERROR_BADCOUNT
+(-15)<br>
+ <p>
+This error is given if the value of the <i>ovecsize</i> argument is negative.
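+<p>
+One way a caller might sort these return values into broad cases is sketched
+below. The enum and both function names are assumptions made for this
+illustration and do not exist in PCRE. The numeric values -1 and -12 are
+taken from the list above so that the sketch compiles without <b>pcre.h</b>;
+real code would be expected to use the PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL
+macros instead. <p>

```c
#include <assert.h>

/* Hypothetical classification of a pcre_exec() return value. */
enum exec_outcome {
    EXEC_MATCH,            /* rc > 0: matched, rc pairs were set */
    EXEC_MATCH_TRUNCATED,  /* rc == 0: matched, but ovector was too small */
    EXEC_NOMATCH,          /* PCRE_ERROR_NOMATCH (-1) */
    EXEC_PARTIAL,          /* PCRE_ERROR_PARTIAL (-12) */
    EXEC_ERROR             /* any other negative value */
};

enum exec_outcome classify_exec_result(int rc)
{
    if (rc > 0)
        return EXEC_MATCH;
    if (rc == 0)
        return EXEC_MATCH_TRUNCATED;
    switch (rc) {
    case -1:  return EXEC_NOMATCH;
    case -12: return EXEC_PARTIAL;
    default:  return EXEC_ERROR;
    }
}

/* Hypothetical helper: how many offset pairs hold valid data after the call. */
int pairs_set(int rc, int ovecsize)
{
    assert(ovecsize >= 0);      /* a negative size yields PCRE_ERROR_BADCOUNT */
    if (rc > 0)
        return rc;
    if (rc == 0)
        return ovecsize / 3;    /* the vector was filled as far as possible */
    return 0;                   /* no match or an error: nothing was set */
}
```

+Treating a return of zero as a successful match with a truncated result,
+rather than as a failure, follows the description of undersized vectors
+given earlier.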
+
+<h2><a name='sect17' href='#toc17'>Extracting Captured Substrings by Number</a></h2>
+ <p>
+<b>int pcre_copy_substring(const
+char *<i>subject</i>, int *<i>ovector</i>,</b> <b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
+ <b>int <i>buffersize</i>);</b> <p>
+<br>
+<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b> <b>int <i>stringcount</i>,
+int <i>stringnumber</i>,</b> <b>const char **<i>stringptr</i>);</b> <p>
+<br>
+<b>int pcre_get_substring_list(const char *<i>subject</i>,</b> <b>int *<i>ovector</i>, int <i>stringcount</i>,
+const char ***<i>listptr</i>);</b> <p>
+Captured substrings can be accessed directly
+by using the offsets returned by <b>pcre_exec()</b> in <i>ovector</i>. For convenience,
+the functions <b>pcre_copy_substring()</b>, <b>pcre_get_substring()</b>, and <b>pcre_get_substring_list()</b>
+are provided for extracting captured substrings as new, separate, zero-terminated
+strings. These functions identify substrings by number. The next section
+describes functions for extracting named substrings. A substring that contains
+a binary zero is correctly extracted and has a further zero added on the
+end, but the result is not, of course, a C string. <p>
+The first three arguments
+are the same for all three of these functions: <i>subject</i> is the subject string
+that has just been successfully matched, <i>ovector</i> is a pointer to the vector
+of integer offsets that was passed to <b>pcre_exec()</b>, and <i>stringcount</i> is the
+number of substrings that were captured by the match, including the substring
+that matched the entire regular expression. This is the value returned by
+<b>pcre_exec()</b> if it is greater than zero. If <b>pcre_exec()</b> returned zero, indicating
+that it ran out of space in <i>ovector</i>, the value passed as <i>stringcount</i> should
+be the number of elements in the vector divided by three. <p>
+The functions
+<b>pcre_copy_substring()</b> and <b>pcre_get_substring()</b> extract a single substring,
+whose number is given as <i>stringnumber</i>. A value of zero extracts the substring
+that matched the entire pattern, whereas higher values extract the captured
+substrings. For <b>pcre_copy_substring()</b>, the string is placed in <i>buffer</i>, whose
+length is given by <i>buffersize</i>, while for <b>pcre_get_substring()</b> a new block
+of memory is obtained via <b>pcre_malloc</b>, and its address is returned via
+<i>stringptr</i>. The yield of the function is the length of the string, not including
+the terminating zero, or one of <p>
+ PCRE_ERROR_NOMEMORY (-6)<br>
+ <p>
+The buffer was too small for <b>pcre_copy_substring()</b>, or the attempt to
+get memory failed for <b>pcre_get_substring()</b>. <p>
+ PCRE_ERROR_NOSUBSTRING
+(-7)<br>
+ <p>
+There is no substring whose number is <i>stringnumber</i>. <p>
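+The documented behaviour of <b>pcre_copy_substring()</b> can be approximated
+by the following C sketch. It is not the library's source code: the function
+name <b>copy_substring_sketch()</b> and the two enum constants are hypothetical,
+and the values -6 and -7 are copied from the list above so that the sketch
+compiles without <b>pcre.h</b>. <p>

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING. */
enum { SKETCH_ERROR_NOMEMORY = -6, SKETCH_ERROR_NOSUBSTRING = -7 };

/* Hypothetical re-creation of the documented behaviour: copy substring
   number stringnumber into buffer and return its length, not counting
   the terminating zero. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, len;

    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start = ovector[2 * stringnumber];
    if (start < 0) {
        start = 0;              /* unset substring: yield an empty string */
        len = 0;
    } else {
        len = ovector[2 * stringnumber + 1] - start;
    }

    if (buffersize < len + 1)   /* need room for the terminating zero */
        return SKETCH_ERROR_NOMEMORY;

    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

+Because the length is returned separately, a caller can still handle a
+substring that contains binary zeros, even though the buffer is then not
+usable as an ordinary C string. <p>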
+The <b>pcre_get_substring_list()</b>
+function extracts all available substrings and builds a list of pointers
+to them. All this is done in a single block of memory that is obtained via
+<b>pcre_malloc</b>. The address of the memory block is returned via <i>listptr</i>, which
+is also the start of the list of string pointers. The end of the list is
+marked by a NULL pointer. The yield of the function is zero if all went
+well, or <p>
+ PCRE_ERROR_NOMEMORY (-6)<br>
+ <p>
+if the attempt to get the memory block failed. <p>
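+Since the list is terminated by a NULL pointer, it can be walked without
+knowing its length in advance. The sketch below shows this with a hypothetical
+helper, <b>count_list_entries()</b>, which is not part of PCRE; in a real program
+the list would come from <b>pcre_get_substring_list()</b> rather than being built
+by hand. <p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the strings in a NULL-terminated list of the
   kind that pcre_get_substring_list() returns via listptr. */
int count_list_entries(const char **list)
{
    int n = 0;

    assert(list != NULL);
    while (list[n] != NULL)     /* the NULL pointer marks the end */
        n++;
    return n;
}
```

+An unset substring appears in the list as an empty string, so it counts
+as an entry here. <p>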
+When any of these functions
+encounters a substring that is unset, which can happen when capturing subpattern
+number <i>n+1</i> matches some part of the subject, but subpattern <i>n</i> has not been
+used at all, they return an empty string. This can be distinguished from
+a genuine zero-length substring by inspecting the appropriate offset in
+<i>ovector</i>, which is negative for unset substrings. <p>
+The two convenience functions
+<b>pcre_free_substring()</b> and <b>pcre_free_substring_list()</b> can be used to free
+the memory returned by a previous call of <b>pcre_get_substring()</b> or <b>pcre_get_substring_list()</b>,
+respectively. They do nothing more than call the function pointed to by
+<b>pcre_free</b>, which of course could be called directly from a C program. However,
+PCRE is used in some situations where it is linked via a special interface
+to another programming language which cannot use <b>pcre_free</b> directly; it
+is for these cases that the functions are provided.
+<h2><a name='sect18' href='#toc18'>Extracting Captured
+Substrings by Name</a></h2>
+ <p>
+<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b> <b>const char
+*<i>name</i>);</b> <p>
+<br>
+<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b> <b>const char *<i>subject</i>, int
+*<i>ovector</i>,</b> <b>int <i>stringcount</i>, const char *<i>stringname</i>,</b> <b>char *<i>buffer</i>, int
+<i>buffersize</i>);</b> <p>
+<br>
+<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b> <b>const char *<i>subject</i>, int
+*<i>ovector</i>,</b> <b>int <i>stringcount</i>, const char *<i>stringname</i>,</b> <b>const char **<i>stringptr</i>);</b>
+<p>
+To extract a substring by name, you first have to find the associated number.
+For example, for this pattern <p>
+ (a+)b(?P&lt;xxx&gt;\d+)...<br>
+ <p>
+the number of the subpattern called "xxx" is 2. You can find the number
+from the name by calling <b>pcre_get_stringnumber()</b>. The first argument is
+the compiled pattern, and the second is the name. The yield of the function
+is the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no
+subpattern of that name. <p>
+Given the number, you can extract the substring
+directly, or use one of the functions described in the previous section.
+For convenience, there are also two functions that do the whole job. <p>
+Most
+of the arguments of <b>pcre_copy_named_substring()</b> and <b>pcre_get_named_substring()</b>
+are the same as those for the similarly named functions that extract by
+number. As these are described in the previous section, they are not re-described
+here. There are just two differences: <p>
+First, instead of a substring number,
+a substring name is given. Second, there is an extra argument, given at
+the start, which is a pointer to the compiled pattern. This is needed in
+order to gain access to the name-to-number translation table. <p>
+These functions
+call <b>pcre_get_stringnumber()</b>, and if it succeeds, they then call <b>pcre_copy_substring()</b>
+or <b>pcre_get_substring()</b>, as appropriate. <p>
+ Last updated: 09 September 2004
+<br>
+Copyright (c) 1997-2004 University of Cambridge. <p>
+
+<hr><p>
+<a name='toc'><b>Table of Contents</b></a><p>
+<ul>
+<li><a name='toc0' href='#sect0'>Name</a></li>
+<li><a name='toc1' href='#sect1'>Pcre Native API</a></li>
+<li><a name='toc2' href='#sect2'>Pcre API Overview</a></li>
+<li><a name='toc3' href='#sect3'>Multithreading</a></li>
+<li><a name='toc4' href='#sect4'>Saving Precompiled Patterns for Later Use</a></li>
+<li><a name='toc5' href='#sect5'>Checking Build-time Options</a></li>
+<li><a name='toc6' href='#sect6'>Compiling a Pattern</a></li>
+<li><a name='toc7' href='#sect7'>Studying a Pattern</a></li>
+<li><a name='toc8' href='#sect8'>Locale Support</a></li>
+<li><a name='toc9' href='#sect9'>Information About a Pattern</a></li>
+<li><a name='toc10' href='#sect10'>Obsolete Info Function</a></li>
+<li><a name='toc11' href='#sect11'>Matching a Pattern</a></li>
+<ul>
+<li><a name='toc12' href='#sect12'>Extra data for pcre_exec()</a></li>
+<li><a name='toc13' href='#sect13'>Option bits for pcre_exec()</a></li>
+<li><a name='toc14' href='#sect14'>The string to be matched by pcre_exec()</a></li>
+<li><a name='toc15' href='#sect15'>How pcre_exec() returns captured substrings</a></li>
+<li><a name='toc16' href='#sect16'>Return values from pcre_exec()</a></li>
+</ul>
+<li><a name='toc17' href='#sect17'>Extracting Captured Substrings by Number</a></li>
+<li><a name='toc18' href='#sect18'>Extracting Captured Substrings by Name</a></li>
+</ul>
+</body>
+</html>