Diffstat (limited to 'libs/Pcre16/docs/doc/html/pcresyntax.html')
-rw-r--r-- | libs/Pcre16/docs/doc/html/pcresyntax.html | 561 |
1 files changed, 0 insertions, 561 deletions
diff --git a/libs/Pcre16/docs/doc/html/pcresyntax.html b/libs/Pcre16/docs/doc/html/pcresyntax.html deleted file mode 100644 index 5896b9e068..0000000000 --- a/libs/Pcre16/docs/doc/html/pcresyntax.html +++ /dev/null @@ -1,561 +0,0 @@ -<html> -<head> -<title>pcresyntax specification</title> -</head> -<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB"> -<h1>pcresyntax man page</h1> -<p> -Return to the <a href="index.html">PCRE index page</a>. -</p> -<p> -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -<br> -<ul> -<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a> -<li><a name="TOC2" href="#SEC2">QUOTING</a> -<li><a name="TOC3" href="#SEC3">CHARACTERS</a> -<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a> -<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a> -<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a> -<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a> -<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a> -<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a> -<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a> -<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a> -<li><a name="TOC12" href="#SEC12">ALTERNATION</a> -<li><a name="TOC13" href="#SEC13">CAPTURING</a> -<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a> -<li><a name="TOC15" href="#SEC15">COMMENT</a> -<li><a name="TOC16" href="#SEC16">OPTION SETTING</a> -<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a> -<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a> -<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a> -<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a> -<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a> -<li><a name="TOC22" 
href="#SEC22">CONDITIONAL PATTERNS</a> -<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a> -<li><a name="TOC24" href="#SEC24">CALLOUTS</a> -<li><a name="TOC25" href="#SEC25">SEE ALSO</a> -<li><a name="TOC26" href="#SEC26">AUTHOR</a> -<li><a name="TOC27" href="#SEC27">REVISION</a> -</ul> -<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br> -<P> -The full syntax and semantics of the regular expressions that are supported by -PCRE are described in the -<a href="pcrepattern.html"><b>pcrepattern</b></a> -documentation. This document contains a quick-reference summary of the syntax. -</P> -<br><a name="SEC2" href="#TOC1">QUOTING</a><br> -<P> -<pre> - \x where x is non-alphanumeric is a literal x - \Q...\E treat enclosed characters as literal -</PRE> -</P> -<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br> -<P> -<pre> - \a alarm, that is, the BEL character (hex 07) - \cx "control-x", where x is any ASCII character - \e escape (hex 1B) - \f form feed (hex 0C) - \n newline (hex 0A) - \r carriage return (hex 0D) - \t tab (hex 09) - \0dd character with octal code 0dd - \ddd character with octal code ddd, or backreference - \o{ddd..} character with octal code ddd.. - \xhh character with hex code hh - \x{hhh..} character with hex code hhh.. -</pre> -Note that \0dd is always an octal code, and that \8 and \9 are the literal -characters "8" and "9". -</P> -<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br> -<P> -<pre> - . 
any character except newline; - in dotall mode, any character whatsoever - \C one data unit, even in UTF mode (best avoided) - \d a decimal digit - \D a character that is not a decimal digit - \h a horizontal white space character - \H a character that is not a horizontal white space character - \N a character that is not a newline - \p{<i>xx</i>} a character with the <i>xx</i> property - \P{<i>xx</i>} a character without the <i>xx</i> property - \R a newline sequence - \s a white space character - \S a character that is not a white space character - \v a vertical white space character - \V a character that is not a vertical white space character - \w a "word" character - \W a "non-word" character - \X a Unicode extended grapheme cluster -</pre> -By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode -or in the 16-bit and 32-bit libraries. However, if locale-specific matching is -happening, \s and \w may also match characters with code points in the range -128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences -is changed to use Unicode properties and they match many more characters. 
-</P> -<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br> -<P> -<pre> - C Other - Cc Control - Cf Format - Cn Unassigned - Co Private use - Cs Surrogate - - L Letter - Ll Lower case letter - Lm Modifier letter - Lo Other letter - Lt Title case letter - Lu Upper case letter - L& Ll, Lu, or Lt - - M Mark - Mc Spacing mark - Me Enclosing mark - Mn Non-spacing mark - - N Number - Nd Decimal number - Nl Letter number - No Other number - - P Punctuation - Pc Connector punctuation - Pd Dash punctuation - Pe Close punctuation - Pf Final punctuation - Pi Initial punctuation - Po Other punctuation - Ps Open punctuation - - S Symbol - Sc Currency symbol - Sk Modifier symbol - Sm Mathematical symbol - So Other symbol - - Z Separator - Zl Line separator - Zp Paragraph separator - Zs Space separator -</PRE> -</P> -<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br> -<P> -<pre> - Xan Alphanumeric: union of properties L and N - Xps POSIX space: property Z or tab, NL, VT, FF, CR - Xsp Perl space: property Z or tab, NL, VT, FF, CR - Xuc Universally-named character: one that can be - represented by a Universal Character Name - Xwd Perl word: property Xan or underscore -</pre> -Perl and POSIX space are now the same. Perl added VT to its space character set -at release 5.18 and PCRE changed at release 8.34. 
-</P> -<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br> -<P> -Arabic, -Armenian, -Avestan, -Balinese, -Bamum, -Bassa_Vah, -Batak, -Bengali, -Bopomofo, -Brahmi, -Braille, -Buginese, -Buhid, -Canadian_Aboriginal, -Carian, -Caucasian_Albanian, -Chakma, -Cham, -Cherokee, -Common, -Coptic, -Cuneiform, -Cypriot, -Cyrillic, -Deseret, -Devanagari, -Duployan, -Egyptian_Hieroglyphs, -Elbasan, -Ethiopic, -Georgian, -Glagolitic, -Gothic, -Grantha, -Greek, -Gujarati, -Gurmukhi, -Han, -Hangul, -Hanunoo, -Hebrew, -Hiragana, -Imperial_Aramaic, -Inherited, -Inscriptional_Pahlavi, -Inscriptional_Parthian, -Javanese, -Kaithi, -Kannada, -Katakana, -Kayah_Li, -Kharoshthi, -Khmer, -Khojki, -Khudawadi, -Lao, -Latin, -Lepcha, -Limbu, -Linear_A, -Linear_B, -Lisu, -Lycian, -Lydian, -Mahajani, -Malayalam, -Mandaic, -Manichaean, -Meetei_Mayek, -Mende_Kikakui, -Meroitic_Cursive, -Meroitic_Hieroglyphs, -Miao, -Modi, -Mongolian, -Mro, -Myanmar, -Nabataean, -New_Tai_Lue, -Nko, -Ogham, -Ol_Chiki, -Old_Italic, -Old_North_Arabian, -Old_Permic, -Old_Persian, -Old_South_Arabian, -Old_Turkic, -Oriya, -Osmanya, -Pahawh_Hmong, -Palmyrene, -Pau_Cin_Hau, -Phags_Pa, -Phoenician, -Psalter_Pahlavi, -Rejang, -Runic, -Samaritan, -Saurashtra, -Sharada, -Shavian, -Siddham, -Sinhala, -Sora_Sompeng, -Sundanese, -Syloti_Nagri, -Syriac, -Tagalog, -Tagbanwa, -Tai_Le, -Tai_Tham, -Tai_Viet, -Takri, -Tamil, -Telugu, -Thaana, -Thai, -Tibetan, -Tifinagh, -Tirhuta, -Ugaritic, -Vai, -Warang_Citi, -Yi. -</P> -<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br> -<P> -<pre> - [...] positive character class - [^...] 
negative character class - [x-y] range (can be used for hex characters) - [[:xxx:]] positive POSIX named set - [[:^xxx:]] negative POSIX named set - - alnum alphanumeric - alpha alphabetic - ascii 0-127 - blank space or tab - cntrl control character - digit decimal digit - graph printing, excluding space - lower lower case letter - print printing, including space - punct printing, excluding alphanumeric - space white space - upper upper case letter - word same as \w - xdigit hexadecimal digit -</pre> -In PCRE, POSIX character set names recognize only ASCII characters by default, -but some of them use Unicode properties if PCRE_UCP is set. You can use -\Q...\E inside a character class. -</P> -<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br> -<P> -<pre> - ? 0 or 1, greedy - ?+ 0 or 1, possessive - ?? 0 or 1, lazy - * 0 or more, greedy - *+ 0 or more, possessive - *? 0 or more, lazy - + 1 or more, greedy - ++ 1 or more, possessive - +? 1 or more, lazy - {n} exactly n - {n,m} at least n, no more than m, greedy - {n,m}+ at least n, no more than m, possessive - {n,m}? at least n, no more than m, lazy - {n,} n or more, greedy - {n,}+ n or more, possessive - {n,}? n or more, lazy -</PRE> -</P> -<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br> -<P> -<pre> - \b word boundary - \B not a word boundary - ^ start of subject - also after internal newline in multiline mode - \A start of subject - $ end of subject - also before newline at end of subject - also before internal newline in multiline mode - \Z end of subject - also before newline at end of subject - \z end of subject - \G first matching position in subject -</PRE> -</P> -<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br> -<P> -<pre> - \K reset start of match -</pre> -\K is honoured in positive assertions, but ignored in negative ones. -</P> -<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br> -<P> -<pre> - expr|expr|expr... 
-</PRE> -</P> -<br><a name="SEC13" href="#TOC1">CAPTURING</a><br> -<P> -<pre> - (...) capturing group - (?<name>...) named capturing group (Perl) - (?'name'...) named capturing group (Perl) - (?P<name>...) named capturing group (Python) - (?:...) non-capturing group - (?|...) non-capturing group; reset group numbers for - capturing groups in each alternative -</PRE> -</P> -<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br> -<P> -<pre> - (?>...) atomic, non-capturing group -</PRE> -</P> -<br><a name="SEC15" href="#TOC1">COMMENT</a><br> -<P> -<pre> - (?#....) comment (not nestable) -</PRE> -</P> -<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br> -<P> -<pre> - (?i) caseless - (?J) allow duplicate names - (?m) multiline - (?s) single line (dotall) - (?U) default ungreedy (lazy) - (?x) extended (ignore white space) - (?-...) unset option(s) -</pre> -The following are recognized only at the very start of a pattern or after one -of the newline or \R options with similar syntax. More than one of them may -appear. -<pre> - (*LIMIT_MATCH=d) set the match limit to d (decimal number) - (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) - (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS) - (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) - (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) - (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) - (*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32) - (*UTF) set appropriate UTF mode for the library in use - (*UCP) set PCRE_UCP (use Unicode properties for \d etc) -</pre> -Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the -limits set by the caller of pcre_exec(), not increase them. -</P> -<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br> -<P> -These are recognized only at the very start of the pattern or after option -settings with a similar syntax. 
-<pre> - (*CR) carriage return only - (*LF) linefeed only - (*CRLF) carriage return followed by linefeed - (*ANYCRLF) all three of the above - (*ANY) any Unicode newline sequence -</PRE> -</P> -<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br> -<P> -These are recognized only at the very start of the pattern or after option -setting with a similar syntax. -<pre> - (*BSR_ANYCRLF) CR, LF, or CRLF - (*BSR_UNICODE) any Unicode newline sequence -</PRE> -</P> -<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br> -<P> -<pre> - (?=...) positive look ahead - (?!...) negative look ahead - (?<=...) positive look behind - (?<!...) negative look behind -</pre> -Each top-level branch of a look behind must be of a fixed length. -</P> -<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br> -<P> -<pre> - \n reference by number (can be ambiguous) - \gn reference by number - \g{n} reference by number - \g{-n} relative reference by number - \k<name> reference by name (Perl) - \k'name' reference by name (Perl) - \g{name} reference by name (Perl) - \k{name} reference by name (.NET) - (?P=name) reference by name (Python) -</PRE> -</P> -<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br> -<P> -<pre> - (?R) recurse whole pattern - (?n) call subpattern by absolute number - (?+n) call subpattern by relative number - (?-n) call subpattern by relative number - (?&name) call subpattern by name (Perl) - (?P>name) call subpattern by name (Python) - \g<name> call subpattern by name (Oniguruma) - \g'name' call subpattern by name (Oniguruma) - \g<n> call subpattern by absolute number (Oniguruma) - \g'n' call subpattern by absolute number (Oniguruma) - \g<+n> call subpattern by relative number (PCRE extension) - \g'+n' call subpattern by relative number (PCRE extension) - \g<-n> call subpattern by relative number (PCRE extension) - \g'-n' call subpattern by relative number (PCRE extension) -</PRE> -</P> -<br><a name="SEC22" 
href="#TOC1">CONDITIONAL PATTERNS</a><br> -<P> -<pre> - (?(condition)yes-pattern) - (?(condition)yes-pattern|no-pattern) - - (?(n)... absolute reference condition - (?(+n)... relative reference condition - (?(-n)... relative reference condition - (?(<name>)... named reference condition (Perl) - (?('name')... named reference condition (Perl) - (?(name)... named reference condition (PCRE) - (?(R)... overall recursion condition - (?(Rn)... specific group recursion condition - (?(R&name)... specific recursion condition - (?(DEFINE)... define subpattern for reference - (?(assert)... assertion condition -</PRE> -</P> -<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br> -<P> -The following act immediately they are reached: -<pre> - (*ACCEPT) force successful match - (*FAIL) force backtrack; synonym (*F) - (*MARK:NAME) set name to be passed back; synonym (*:NAME) -</pre> -The following act only when a subsequent match failure causes a backtrack to -reach them. They all force a match failure, but they differ in what happens -afterwards. Those that advance the start-of-match point do so only if the -pattern is not anchored. -<pre> - (*COMMIT) overall failure, no advance of starting point - (*PRUNE) advance to next starting character - (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE) - (*SKIP) advance to current matching position - (*SKIP:NAME) advance to position corresponding to an earlier - (*MARK:NAME); if not found, the (*SKIP) is ignored - (*THEN) local failure, backtrack to next alternation - (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) -</PRE> -</P> -<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br> -<P> -<pre> - (?C) callout - (?Cn) callout with data n -</PRE> -</P> -<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br> -<P> -<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3), -<b>pcrematching</b>(3), <b>pcre</b>(3). 
-</P> -<br><a name="SEC26" href="#TOC1">AUTHOR</a><br> -<P> -Philip Hazel -<br> -University Computing Service -<br> -Cambridge CB2 3QH, England. -<br> -</P> -<br><a name="SEC27" href="#TOC1">REVISION</a><br> -<P> -Last updated: 08 January 2014 -<br> -Copyright © 1997-2014 University of Cambridge. -<br> -<p> -Return to the <a href="index.html">PCRE index page</a>. -</p> |
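A few of the constructs summarized in the deleted page can be tried out quickly. The sketch below is only an illustration and makes one loud assumption: it uses Python's standard `re` module, which is not PCRE and accepts only a subset of this syntax. The constructs shown (Python-style named groups and named backreferences, lazy quantifiers, lookahead and lookbehind) are spelled the same way in both. The subject strings and variable names are made up for the example.

```python
import re

# NOTE: Python's `re` module is used here as a stand-in; it is not PCRE.
# Only constructs that the summary lists and that `re` also accepts appear.

# Named capturing group (?P<name>...) and backreference by name (?P=name).
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
match = doubled.search("this is is a test")
repeated_word = match.group("word")

# Greedy (+) versus lazy (+?) quantifier on the same subject.
greedy = re.search(r"<.+>", "<a><b>").group(0)
lazy = re.search(r"<.+?>", "<a><b>").group(0)

# Positive lookahead (?=...) and negative lookbehind (?<!...).
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")
not_after_minus = re.findall(r"(?<!-)\b\d+", "5 -6 7")

print(repeated_word, greedy, lazy, usd_amounts, not_after_minus)
```

Constructs such as possessive quantifiers, \K, (?|...), subroutine calls and the backtracking control verbs are deliberately left out, because their support outside PCRE varies by engine and version.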