Diffstat (limited to 'libs/Pcre16/docs/doc/pcrepattern.3')
-rw-r--r-- | libs/Pcre16/docs/doc/pcrepattern.3 | 81 |
1 files changed, 60 insertions, 21 deletions
diff --git a/libs/Pcre16/docs/doc/pcrepattern.3 b/libs/Pcre16/docs/doc/pcrepattern.3
index f1c45cda5d..97df217fdb 100644
--- a/libs/Pcre16/docs/doc/pcrepattern.3
+++ b/libs/Pcre16/docs/doc/pcrepattern.3
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
+.TH PCREPATTERN 3 "23 October 2016" "PCRE 8.40"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -308,7 +308,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 .sp
 \ea alarm, that is, the BEL character (hex 07)
 \ecx "control-x", where x is any ASCII character
@@ -331,18 +332,30 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
 data item (byte or 16-bit value) following \ec has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 .P
-The \ec facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \ec. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
+generate the appropriate EBCDIC code values. The \ec escape is processed
+as specified for Perl in the \fBperlebcdic\fP document. The only characters
+that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \ec@ encodes
+character code 0; after \ec the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F).
+.P
+Thus, apart from \ec?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \ecG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+.P
+The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \ec? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \e0\ex\e07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \e0\ex\e015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 .P
@@ -708,6 +721,7 @@ Armenian,
 Avestan,
 Balinese,
 Bamum,
+Bassa_Vah,
 Batak,
 Bengali,
 Bopomofo,
@@ -717,6 +731,7 @@ Buginese,
 Buhid,
 Canadian_Aboriginal,
 Carian,
+Caucasian_Albanian,
 Chakma,
 Cham,
 Cherokee,
@@ -727,11 +742,14 @@ Cypriot,
 Cyrillic,
 Deseret,
 Devanagari,
+Duployan,
 Egyptian_Hieroglyphs,
+Elbasan,
 Ethiopic,
 Georgian,
 Glagolitic,
 Gothic,
+Grantha,
 Greek,
 Gujarati,
 Gurmukhi,
@@ -751,40 +769,56 @@ Katakana,
 Kayah_Li,
 Kharoshthi,
 Khmer,
+Khojki,
+Khudawadi,
 Lao,
 Latin,
 Lepcha,
 Limbu,
+Linear_A,
 Linear_B,
 Lisu,
 Lycian,
 Lydian,
+Mahajani,
 Malayalam,
 Mandaic,
+Manichaean,
 Meetei_Mayek,
+Mende_Kikakui,
 Meroitic_Cursive,
 Meroitic_Hieroglyphs,
 Miao,
+Modi,
 Mongolian,
+Mro,
 Myanmar,
+Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
+Ol_Chiki,
 Old_Italic,
+Old_North_Arabian,
+Old_Permic,
 Old_Persian,
 Old_South_Arabian,
 Old_Turkic,
-Ol_Chiki,
 Oriya,
 Osmanya,
+Pahawh_Hmong,
+Palmyrene,
+Pau_Cin_Hau,
 Phags_Pa,
 Phoenician,
+Psalter_Pahlavi,
 Rejang,
 Runic,
 Samaritan,
 Saurashtra,
 Sharada,
 Shavian,
+Siddham,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
@@ -802,8 +836,10 @@ Thaana,
 Thai,
 Tibetan,
 Tifinagh,
+Tirhuta,
 Ugaritic,
 Vai,
+Warang_Citi,
 Yi.
 .P
 Each character has exactly one Unicode general category property, specified by
@@ -1475,12 +1511,8 @@ J, U and X respectively.
 .P
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE
-extracts it into the global options (and it will therefore show up in data
-extracted by the \fBpcre_fullinfo()\fP function).
-.P
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 .sp
 (a(?i)b)c
 .sp
@@ -2135,6 +2167,13 @@ numbering the capturing subpatterns in the whole pattern. However, substring
 capturing is carried out only for positive assertions. (Perl sometimes, but not
 always, does do capturing in negative assertions.)
 .P
+WARNING: If a positive assertion containing one or more capturing subpatterns
+succeeds, but failure to match later in the pattern causes backtracking over
+this assertion, the captures within the assertion are reset only if no higher
+numbered captures are already set. This is, unfortunately, a fundamental
+limitation of the current implementation, and as PCRE1 is now in
+maintenance-only status, it is unlikely ever to change.
+.P
 For compatibility with Perl, assertion subpatterns may be repeated; though it
 makes no sense to assert the same thing several times, the side effect of
 capturing parentheses may occasionally be useful. In practice, there only three
@@ -3260,6 +3299,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 08 January 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 October 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
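
Note on the new EBCDIC \c rules: the mapping described in the added paragraphs can be checked by hand with a small Python sketch. This is only an illustration of the documented rule, not PCRE's implementation. The function name ebcdic_control_code and the posix_bc flag are hypothetical; per the text above, PCRE itself chooses between 255 and 95 for \c? based on whether certain other characters have POSIX-BC values, which this sketch does not model.

```python
import string

# Punctuation that the new text allows after \c (besides letters and '?'),
# in the order that maps to code values 27..31.
_PUNCT_27_TO_31 = "[\\]^_"


def ebcdic_control_code(ch, posix_bc=False):
    """Return the code value for \\c<ch> under the EBCDIC-mode rules in the diff.

    ch       -- the single character following \\c
    posix_bc -- hypothetical flag: True selects the POSIX-BC value for \\c?
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character after \\c")
    if ch == "@":
        return 0                      # \c@ encodes character code 0
    if ch in string.ascii_letters:
        # Letters in either case encode 1..26 (hex 01 to hex 1A).
        return ord(ch.upper()) - ord("A") + 1
    if ch in _PUNCT_27_TO_31:
        # [, \, ], ^ and _ encode 27..31 (hex 1B to hex 1F).
        return 27 + _PUNCT_27_TO_31.index(ch)
    if ch == "?":
        # APC: 95 (hex 5F) in POSIX-BC, 255 (hex FF) in other EBCDIC variants.
        return 95 if posix_bc else 255
    # Any other character provokes a compile-time error in PCRE.
    raise ValueError("character not allowed after \\c in EBCDIC mode: %r" % ch)


if __name__ == "__main__":
    for c in "@aGZ[_?":
        print("\\c%s -> %d" % (c, ebcdic_control_code(c)))
```

For example, ebcdic_control_code("G") returns 7, the value the text describes as BEL in ASCII but DEL in EBCDIC, while ebcdic_control_code("{") raises ValueError, mirroring the compile-time error.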