PCRE(3) manual page

From 7fd9fe181150f166a098eaf4e006f878c28cb770 Mon Sep 17 00:00:00 2001 From: Gluzskiy Alexandr Date: Mon, 15 Feb 2010 05:51:01 +0300 Subject: sort --- Utilities/PCRE/man/html/pcreposix.3.html | 187 +++++++++++++++++++++++++++++++ 1 file changed, 187 insertions(+) create mode 100644 Utilities/PCRE/man/html/pcreposix.3.html (limited to 'Utilities/PCRE/man/html/pcreposix.3.html') diff --git a/Utilities/PCRE/man/html/pcreposix.3.html b/Utilities/PCRE/man/html/pcreposix.3.html new file mode 100644 index 0000000..0e7cafd --- /dev/null +++ b/Utilities/PCRE/man/html/pcreposix.3.html @@ -0,0 +1,187 @@ + + + + + +PCRE(3) manual page + + +Table of Contents

+ +

Name

+PCRE - Perl-compatible regular expressions. +

Synopsis of Posix API

+#include +<pcreposix.h>

+ +
+int regcomp(regex_t *preg, const char *pattern, int cflags);

+
+int regexec(regex_t *preg, const char *string, size_t nmatch, regmatch_t +pmatch[], int eflags);

+
+size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t +errbuf_size);

+
+void regfree(regex_t *preg); +

Description

+This set of functions provides +a POSIX-style API to the PCRE regular expression package. See the pcreapi + documentation for a description of PCRE’s native API, which contains additional +functionality.

+The functions described here are just wrapper functions that +ultimately call the PCRE native API. Their prototypes are defined in the +pcreposix.h header file, and on Unix systems the library itself is called +pcreposix.a, so can be accessed by adding -lpcreposix to the command for +linking an application that uses them. Because the POSIX functions call +the native ones, it is also necessary to add -lpcre.

+I have implemented only +those option bits that can be reasonably mapped to PCRE native options. +In addition, the options REG_EXTENDED and REG_NOSUB are defined with the +value zero. They have no effect, but since programs that are written to +the POSIX interface often use them, this makes it easier to slot in PCRE +as a replacement library. Other POSIX options are not even defined.

+When +PCRE is called via these functions, it is only the API that is POSIX-like +in style. The syntax and semantics of the regular expressions themselves +are still those of Perl, subject to the setting of various PCRE options, +as described below. "POSIX-like in style" means that the API approximates +to the POSIX definition; it is not fully POSIX-compatible, and in multi-byte +encoding domains it is probably even less compatible.

+The header for these +functions is supplied as pcreposix.h to avoid any potential clash with other +POSIX libraries. It can, of course, be renamed or aliased as regex.h, which +is the "correct" name. It provides two structure types, regex_t for compiled +internal forms, and regmatch_t for returning captured substrings. It also +defines some constants whose names start with "REG_"; these are used for +setting options and identifying error codes.

+ +

Compiling a Pattern

+The function +regcomp() is called to compile a pattern into an internal form. The pattern +is a C string terminated by a binary zero, and is passed in the argument +pattern. The preg argument is a pointer to a regex_t structure that is used +as a base for storing information about the compiled expression.

+The argument +cflags is either zero, or contains one or more of the bits defined by the +following macros:

+ REG_ICASE
+

+The PCRE_CASELESS option is set when the expression is passed for compilation +to the native function.

+ REG_NEWLINE
+

+The PCRE_MULTILINE option is set when the expression is passed for compilation +to the native function. Note that this does not mimic the defined POSIX +behaviour for REG_NEWLINE (see the following section).

+In the absence of +these flags, no options are passed to the native function. This means the +the regex is compiled with PCRE default semantics. In particular, the way +it handles newline characters in the subject string is the Perl way, not +the POSIX way. Note that setting PCRE_MULTILINE has only some of the effects +specified for REG_NEWLINE. It does not affect the way newlines are matched +by . (they aren’t) or by a negative class such as [^a] (they are).

+The yield +of regcomp() is zero on success, and non-zero otherwise. The preg structure +is filled in on success, and one member of the structure is public: re_nsub +contains the number of capturing subpatterns in the regular expression. +Various error codes are defined in the header file. +

Matching Newline Characters

+ +

+This area is not simple, because POSIX and Perl take different views of +things. It is not possible to get PCRE to obey POSIX semantics, but then +PCRE was never intended to be a POSIX engine. The following table lists +the different possibilities for matching newline characters in PCRE:

+ + Default Change with
+

+ . matches newline no PCRE_DOTALL
+ newline matches [^a] yes not changeable
+ $ matches \n at end yes PCRE_DOLLARENDONLY
+ $ matches \n in middle no PCRE_MULTILINE
+ ^ matches \n in middle no PCRE_MULTILINE
+

+This is the equivalent table for POSIX:

+ Default + Change with
+

+ . matches newline yes REG_NEWLINE
+ newline matches [^a] yes REG_NEWLINE
+ $ matches \n at end no REG_NEWLINE
+ $ matches \n in middle no REG_NEWLINE
+ ^ matches \n in middle no REG_NEWLINE
+

+PCRE’s behaviour is the same as Perl’s, except that there is no equivalent +for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way +to stop newline from matching [^a].

+The default POSIX newline handling can +be obtained by setting PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is +no way to make PCRE behave exactly as for the REG_NEWLINE action. +

Matching +a Pattern

+The function regexec() is called to match a compiled pattern +preg against a given string, which is terminated by a zero byte, subject +to the options in eflags. These can be:

+ REG_NOTBOL
+

+The PCRE_NOTBOL option is set when calling the underlying PCRE matching +function.

+ REG_NOTEOL
+

+The PCRE_NOTEOL option is set when calling the underlying PCRE matching +function.

+The portion of the string that was matched, and also any captured +substrings, are returned via the pmatch argument, which points to an array +of nmatch structures of type regmatch_t, containing the members rm_so and +rm_eo. These contain the offset to the first character of each substring +and the offset to the first character after the end of each substring, +respectively. The 0th element of the vector relates to the entire portion +of string that was matched; subsequent elements relate to the capturing +subpatterns of the regular expression. Unused entries in the array have +both structure members set to -1.

+A successful match yields a zero return; +various error codes are defined in the header file, of which REG_NOMATCH +is the "expected" failure code. +

Error Messages

+The regerror() function +maps a non-zero errorcode from either regcomp() or regexec() to a printable +message. If preg is not NULL, the error should have arisen from the use +of that structure. A message terminated by a binary zero is placed in errbuf. +The length of the message, including the zero, is limited to errbuf_size. +The yield of the function is the size of buffer needed to hold the whole +message. +

Memory Usage

+Compiling a regular expression causes memory to +be allocated and associated with the preg structure. The function regfree() +frees all such memory, after which preg may no longer be used as a compiled +expression. +

Author

+Philip Hazel <ph10@cam.ac.uk>
+University Computing Service,
+Cambridge CB2 3QG, England.

+ Last updated: 07 September 2004
+Copyright (c) 1997-2004 University of Cambridge.

+ +

+Table of Contents

Name
Synopsis of Posix API
Description
Compiling a Pattern
Matching Newline Characters
Matching a Pattern
Error Messages
Memory Usage
Author

+ + -- cgit v1.2.3