From 4b47e5a4bb656ebb5bd493d1ad6f79eaf4f298e1 Mon Sep 17 00:00:00 2001
From: Kirill Volinsky
+This document relates to PCRE releases that use the original API,
+with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
+first release of a new API, known as PCRE2, with release numbers starting at
+10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
+libraries (now called PCRE1) are still being maintained for bug fixes, but
+there will be no new development. New projects are advised to use the new PCRE2
+libraries.
+
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
@@ -115,7 +126,7 @@ clashes. In some environments, it is possible to control which external symbols
are exported when a shared library is built, and in these cases the
undocumented symbols are not exported.
If you are using PCRE in a non-UTF application that permits users to supply
arbitrary patterns for compilation, you should be aware of a feature that
@@ -149,7 +160,7 @@ against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the
pcreapi
page.
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
@@ -188,7 +199,7 @@ follows:
In the "man" and HTML formats, there is also a short page for each C library
function, listing its arguments and results.
Philip Hazel
-Last updated: 08 January 2014
+Last updated: 10 February 2015
Return to the PCRE index page.
diff --git a/libs/Pcre16/docs/doc/html/pcre_config.html b/libs/Pcre16/docs/doc/html/pcre_config.html
index bcdcdded70..72fb9caa1f 100644
--- a/libs/Pcre16/docs/doc/html/pcre_config.html
+++ b/libs/Pcre16/docs/doc/html/pcre_config.html
@@ -39,8 +39,10 @@ arguments are as follows:
where Points to where to put the data
The where argument must point to an integer variable, except for
-PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
-point to an unsigned long integer. The available codes are:
+PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and
+PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer,
+and for PCRE_CONFIG_JITTARGET, when it must point to a const char*.
+The available codes are:
-
-
INTRODUCTION
+
PLEASE TAKE NOTE
+
INTRODUCTION
SECURITY CONSIDERATIONS
+
SECURITY CONSIDERATIONS
USER DOCUMENTATION
+
USER DOCUMENTATION
AUTHOR
+
AUTHOR
@@ -202,11 +213,11 @@ Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my two initials, followed by the
two digits 10, at the domain cam.ac.uk.
REVISION
+
REVISION
-Copyright © 1997-2014 University of Cambridge.
+Copyright © 1997-2015 University of Cambridge.
PCRE_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no)
diff --git a/libs/Pcre16/docs/doc/html/pcre_fullinfo.html b/libs/Pcre16/docs/doc/html/pcre_fullinfo.html
index b88fc1155b..2b7c72b3b9 100644
--- a/libs/Pcre16/docs/doc/html/pcre_fullinfo.html
+++ b/libs/Pcre16/docs/doc/html/pcre_fullinfo.html
@@ -57,6 +57,10 @@ The following information is available:
PCRE_INFO_JITSIZE Size of JIT compiled code
PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
+ PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string,
+ 0 otherwise
+ PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_ERROR_UNSET
+ PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
@@ -72,6 +76,7 @@ The following information is available:
2 if the first character is at the start of the data
string or after a newline, and
0 otherwise
+ PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_REQUIREDCHAR Literal last data unit required
PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
@@ -79,14 +84,18 @@ The following information is available:
The where argument must point to an integer variable, except for the
following what values:
- PCRE_INFO_DEFAULT_TABLES const unsigned char *
- PCRE_INFO_FIRSTTABLE const unsigned char *
+ PCRE_INFO_DEFAULT_TABLES const uint8_t *
+ PCRE_INFO_FIRSTCHARACTER uint32_t
+ PCRE_INFO_FIRSTTABLE const uint8_t *
+ PCRE_INFO_JITSIZE size_t
+ PCRE_INFO_MATCHLIMIT uint32_t
PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
- PCRE_INFO_FIRSTCHARACTER uint32_t
+ PCRE_INFO_STUDYSIZE size_t
+ PCRE_INFO_RECURSIONLIMIT uint32_t
PCRE_INFO_REQUIREDCHAR uint32_t
The yield of the function is zero on success or:
@@ -95,6 +104,7 @@ The yield of the function is zero on success or:
the argument where was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of what was invalid
+ PCRE_ERROR_UNSET the option was not set
diff --git a/libs/Pcre16/docs/doc/html/pcreapi.html b/libs/Pcre16/docs/doc/html/pcreapi.html
index b401ecc76d..2d7adf185a 100644
--- a/libs/Pcre16/docs/doc/html/pcreapi.html
+++ b/libs/Pcre16/docs/doc/html/pcreapi.html
@@ -315,9 +315,8 @@ documentation for details of how to do this. It is a non-standard way of building
PCRE, for use in environments that have limited stacks. Because of the greater
use of memory management, it runs more slowly. Separate functions are provided
so that special-purpose external code can be used for this case. When
-used, these functions are always called in a stack-like manner (last obtained,
-first freed), and always for memory blocks of the same size. There is a
-discussion about PCRE's stack usage in the
+used, these functions always allocate memory blocks of the same size. There is
+a discussion about PCRE's stack usage in the
pcrestack
documentation.
@@ -2913,9 +2912,9 @@ Cambridge CB2 3QH, England.
-Last updated: 09 February 2014
+Last updated: 18 December 2015
-Copyright © 1997-2014 University of Cambridge.
+Copyright © 1997-2015 University of Cambridge.
Return to the PCRE index page.
diff --git a/libs/Pcre16/docs/doc/html/pcrecompat.html b/libs/Pcre16/docs/doc/html/pcrecompat.html
index 3e6226692e..d95570ef17 100644
--- a/libs/Pcre16/docs/doc/html/pcrecompat.html
+++ b/libs/Pcre16/docs/doc/html/pcrecompat.html
@@ -128,7 +128,7 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern
names is not as general as Perl's. This is a consequence of the fact that PCRE
works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),
+between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)),
where the two capturing parentheses have the same number but different names,
is not supported, and causes an error at compile time. If it were allowed, it
would not be possible to distinguish which parentheses matched, because both
diff --git a/libs/Pcre16/docs/doc/html/pcrejit.html b/libs/Pcre16/docs/doc/html/pcrejit.html
index 210f1da026..abb342522f 100644
--- a/libs/Pcre16/docs/doc/html/pcrejit.html
+++ b/libs/Pcre16/docs/doc/html/pcrejit.html
@@ -79,9 +79,12 @@ API that is JIT-specific.
If your program may sometimes be linked with versions of PCRE that are older
-than 8.20, but you want to use JIT when it is available, you can test
-the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such
-as PCRE_CONFIG_JIT, for compile-time control of your code.
+than 8.20, but you want to use JIT when it is available, you can test the
+values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such as
+PCRE_CONFIG_JIT, for compile-time control of your code. Also beware that the
+pcre_jit_exec() function was not available at all before 8.32,
+and may not be available at all if PCRE isn't compiled with
+--enable-jit. See the "JIT FAST PATH API" section below for details.
@@ -119,6 +122,20 @@ when you call pcre_study():
PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
+If using pcre_jit_exec() and supporting a pre-8.32 version of
+PCRE, you can insert:
+
+ #if PCRE_MAJOR > 8 || (PCRE_MAJOR == 8 && PCRE_MINOR >= 32)
+ pcre_jit_exec(...);
+ #else
+ pcre_exec(...);
+ #endif
+
+but, as described in the "JIT FAST PATH API" section below, this assumes
+that versions 8.32 and later are compiled with --enable-jit, which may
+not hold.
+
+Note that the pcre_jit_exec() function is not available in versions of
+PCRE before 8.32 (released in November 2012). If you need to support versions
+that old you must either use the slower pcre_exec(), or switch between
+the two codepaths by checking the values of PCRE_MAJOR and PCRE_MINOR.
+
+
+Due to an unfortunate implementation oversight, even in versions 8.32
+and later there will be no pcre_jit_exec() stub function defined
+when PCRE is compiled with --disable-jit, which is the default, and
+there's no way to detect whether PCRE was compiled with --enable-jit
+via a macro.
+
+
+If you need to support versions older than 8.32, or versions that may
+not build with --enable-jit, you must either use the slower
+pcre_exec(), or switch between the two codepaths by checking the
+values of PCRE_MAJOR and PCRE_MINOR.
+
+
+Switching between the two by checking the version assumes that all the
+versions being targeted are built with --enable-jit. To also support
+builds that may use --disable-jit, there are three options: use
+pcre_exec() throughout; add a compile-time check for JIT via pcre_config()
+(which assumes the runtime environment will be the same); or, as the Git
+project decided to do, simply assume that pcre_jit_exec() is present in
+8.32 or later unless a compile-time flag is provided. See the
+"grep: un-break building with PCRE >= 8.32 without --enable-jit"
+commit in git.git for an example of the last approach.
+
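The version-based switch discussed in this hunk can be sketched as a small C fragment. This is only an illustration under stated assumptions: `version_at_least()`, `PCRE_VERSION_AT_LEAST`, `APP_NO_PCRE_JIT`, `APP_USE_JIT_EXEC` and `app_uses_jit_exec()` are hypothetical names that are not part of PCRE, and the stand-in version numbers exist only so that the fragment compiles without `<pcre.h>`.

```c
/* Stand-in values so this sketch compiles on its own; a real program
   would get PCRE_MAJOR and PCRE_MINOR from <pcre.h> instead. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 32
#endif

/* Hypothetical helper: nonzero when major.minor is at least
   want_major.want_minor. The major number is compared first, so a
   release numbered 9.0 is not treated as older than 8.32. */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* The same comparison, usable in preprocessor conditions. */
#define PCRE_VERSION_AT_LEAST(maj, min) \
    (PCRE_MAJOR > (maj) || (PCRE_MAJOR == (maj) && PCRE_MINOR >= (min)))

/* APP_NO_PCRE_JIT is a hypothetical opt-out flag that a build system
   would define for PCRE builds made without --enable-jit. */
#if PCRE_VERSION_AT_LEAST(8, 32) && !defined(APP_NO_PCRE_JIT)
#define APP_USE_JIT_EXEC 1   /* this build would call pcre_jit_exec() */
#else
#define APP_USE_JIT_EXEC 0   /* this build would call pcre_exec() */
#endif

/* Reports which code path the preprocessor selected. */
int app_uses_jit_exec(void)
{
    return APP_USE_JIT_EXEC;
}
```

The opt-out flag is what covers the case the text warns about: the version check alone cannot tell whether a given 8.32 or later build actually provides `pcre_jit_exec()`.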
pcreapi(3)
@@ -443,9 +490,9 @@ Cambridge CB2 3QH, England.
-Last updated: 17 March 2013
+Last updated: 05 July 2017
-Copyright © 1997-2013 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.
Return to the PCRE index page.
diff --git a/libs/Pcre16/docs/doc/html/pcrepattern.html b/libs/Pcre16/docs/doc/html/pcrepattern.html
index c06d1e03f1..96fc72986f 100644
--- a/libs/Pcre16/docs/doc/html/pcrepattern.html
+++ b/libs/Pcre16/docs/doc/html/pcrepattern.html
@@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 \a alarm, that is, the BEL character (hex 07)
 \cx "control-x", where x is any ASCII character
@@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the perlebcdic document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; after \c the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
+
+
+Thus, apart from \c?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+
+
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \c? generate 95; otherwise it generates 255.
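The \c mapping described in the added paragraphs can be sketched as a small lookup. This is an illustration only: `ctrl_escape_value()` is a hypothetical helper, not a PCRE function, and the value that \c? should produce is passed in by the caller (127 in an ASCII environment, 255 or 95 in the EBCDIC variants) rather than detected.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: returns the code value encoded
   by \c followed by ch under the rules described above, or -1 if ch is
   not allowed after \c. question_value is the value \c? should yield. */
int ctrl_escape_value(int ch, int question_value)
{
    /* The position of a character in this string is its code value:
       '@' is 0, the letters are 1-26, and [ \ ] ^ _ are 27-31. */
    static const char order[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
    const char *hit;

    if (ch == '?')
        return question_value;
    if (ch <= 0 || ch > 255)
        return -1;   /* also stops strchr() from matching the terminator */

    /* Letters are accepted in either case. */
    hit = strchr(order, toupper((unsigned char)ch));
    return hit ? (int)(hit - order) : -1;
}
```

Looking the character up by position, instead of doing arithmetic on its code, keeps the sketch independent of whether the host character set stores the letters contiguously.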
After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
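The "up to two further octal digits" rule can be sketched as follows. `read_octal_after_zero()` is a hypothetical helper written for illustration and is not taken from the PCRE sources; it only models the digit-reading rule stated in the paragraph above.

```c
/* Hypothetical helper, not part of PCRE: s points just past "\0" in a
   pattern. Reads at most two further octal digits, stores the number of
   digits consumed in *used, and returns the resulting code value. */
int read_octal_after_zero(const char *s, int *used)
{
    int value = 0;
    int n = 0;

    while (n < 2 && s[n] >= '0' && s[n] <= '7') {
        value = value * 8 + (s[n] - '0');
        n++;
    }
    *used = n;
    return value;
}
```

With the text "15" following \0 the result is 13 (CR), as in the \015 example. With "75" the result is octal 75, which is why a BEL followed by a literal 5 has to be written as \0075 rather than \075.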
@@ -703,6 +718,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
+Bassa_Vah,
Batak,
Bengali,
Bopomofo,
@@ -712,6 +728,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
+Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
@@ -722,11 +739,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
+Duployan,
Egyptian_Hieroglyphs,
+Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
+Grantha,
Greek,
Gujarati,
Gurmukhi,
@@ -746,40 +766,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
+Khojki,
+Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
+Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
+Mahajani,
Malayalam,
Mandaic,
+Manichaean,
Meetei_Mayek,
+Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
+Modi,
Mongolian,
+Mro,
Myanmar,
+Nabataean,
New_Tai_Lue,
Nko,
Ogham,
+Ol_Chiki,
Old_Italic,
+Old_North_Arabian,
+Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
-Ol_Chiki,
Oriya,
Osmanya,
+Pahawh_Hmong,
+Palmyrene,
+Pau_Cin_Hau,
Phags_Pa,
Phoenician,
+Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
+Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
@@ -797,8 +833,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
+Tirhuta,
Ugaritic,
Vai,
+Warang_Citi,
Yi.
@@ -1474,13 +1512,8 @@ J, U and X respectively.
When one of these option changes occurs at top level (that is, not inside
subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE
-extracts it into the global options (and it will therefore show up in data
-extracted by the pcre_fullinfo() function).
-
-
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 (a(?i)b)c
@@ -2122,6 +2155,14 @@ capturing is carried out only for positive assertions. (Perl sometimes, but not
always, does do capturing in negative assertions.)
+WARNING: If a positive assertion containing one or more capturing subpatterns
+succeeds, but failure to match later in the pattern causes backtracking over
+this assertion, the captures within the assertion are reset only if no higher
+numbered captures are already set. This is, unfortunately, a fundamental
+limitation of the current implementation, and as PCRE1 is now in
+maintenance-only status, it is unlikely ever to change.
+
+
For compatibility with Perl, assertion subpatterns may be repeated; though
it makes no sense to assert the same thing several times, the side effect of
capturing parentheses may occasionally be useful. In practice, there are only three
@@ -3226,9 +3267,9 @@ Cambridge CB2 3QH, England.
REVISION
-Last updated: 08 January 2014
+Last updated: 23 October 2016
-Copyright © 1997-2014 University of Cambridge.
+Copyright © 1997-2016 University of Cambridge.
Return to the PCRE index page.
diff --git a/libs/Pcre16/docs/doc/html/pcresyntax.html b/libs/Pcre16/docs/doc/html/pcresyntax.html
index 89f35737b4..5896b9e068 100644
--- a/libs/Pcre16/docs/doc/html/pcresyntax.html
+++ b/libs/Pcre16/docs/doc/html/pcresyntax.html
@@ -171,6 +171,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
+Bassa_Vah,
Batak,
Bengali,
Bopomofo,
@@ -180,6 +181,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
+Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
@@ -190,11 +192,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
+Duployan,
Egyptian_Hieroglyphs,
+Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
+Grantha,
Greek,
Gujarati,
Gurmukhi,
@@ -214,40 +219,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
+Khojki,
+Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
+Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
+Mahajani,
Malayalam,
Mandaic,
+Manichaean,
Meetei_Mayek,
+Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
+Modi,
Mongolian,
+Mro,
Myanmar,
+Nabataean,
New_Tai_Lue,
Nko,
Ogham,
+Ol_Chiki,
Old_Italic,
+Old_North_Arabian,
+Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
-Ol_Chiki,
Oriya,
Osmanya,
+Pahawh_Hmong,
+Palmyrene,
+Pau_Cin_Hau,
Phags_Pa,
Phoenician,
+Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
+Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
@@ -265,8 +286,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
+Tirhuta,
Ugaritic,
Vai,
+Warang_Citi,
Yi.
CHARACTER CLASSES
diff --git a/libs/Pcre16/docs/doc/html/pcretest.html b/libs/Pcre16/docs/doc/html/pcretest.html
index 839fabf189..ba540d3c38 100644
--- a/libs/Pcre16/docs/doc/html/pcretest.html
+++ b/libs/Pcre16/docs/doc/html/pcretest.html
@@ -74,6 +74,11 @@ newline as data characters. However, in some Windows environments character 26
maximum portability, therefore, it is safest to use only ASCII characters in
pcretest input files.
+
+The input is processed using C's string functions, so it must not
+contain binary zeroes, even though in Unix-like environments, fgets()
+treats any bytes other than newline as data characters.
+
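The restriction on binary zeroes follows from how C string functions find the end of their input. The sketch below shows the effect on a buffer whose true length is known. `has_binary_zero()` and `visible_length()` are hypothetical helpers for illustration and are not taken from pcretest.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the first len bytes of buf contain a
   binary zero. */
int has_binary_zero(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}

/* Hypothetical helper: the number of bytes a C string function would
   treat as data, that is, everything before the first binary zero. */
size_t visible_length(const char *buf, size_t len)
{
    const char *nul = memchr(buf, '\0', len);
    return nul ? (size_t)(nul - buf) : len;
}
```

For a five-byte input line holding `ab`, a binary zero, and `cd`, only the first two bytes are visible to code that relies on `strlen()`, so the rest of the line would be silently ignored.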
PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
From release 8.30, two separate PCRE libraries can be built. The original one
@@ -1149,9 +1154,9 @@ Cambridge CB2 3QH, England.
REVISION
-Last updated: 09 February 2014
+Last updated: 23 February 2017
-Copyright © 1997-2014 University of Cambridge.
+Copyright © 1997-2017 University of Cambridge.
Return to the PCRE index page.
--
cgit v1.2.3