Diffstat (limited to 'libs/Pcre16/docs/doc/html/pcresyntax.html')
-rw-r--r-- | libs/Pcre16/docs/doc/html/pcresyntax.html | 538 |
1 files changed, 538 insertions, 0 deletions
diff --git a/libs/Pcre16/docs/doc/html/pcresyntax.html b/libs/Pcre16/docs/doc/html/pcresyntax.html
new file mode 100644
index 0000000000..89f35737b4
--- /dev/null
+++ b/libs/Pcre16/docs/doc/html/pcresyntax.html
@@ -0,0 +1,538 @@
+<html>
+<head>
+<title>pcresyntax specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcresyntax man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
+<li><a name="TOC2" href="#SEC2">QUOTING</a>
+<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
+<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
+<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
+<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
+<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
+<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
+<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
+<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
+<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
+<li><a name="TOC13" href="#SEC13">CAPTURING</a>
+<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
+<li><a name="TOC15" href="#SEC15">COMMENT</a>
+<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
+<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
+<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
+<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
+<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
+<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
+<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
+<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
+<li><a name="TOC26" href="#SEC26">AUTHOR</a>
+<li><a name="TOC27" href="#SEC27">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
+<P>
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation. This document contains a quick-reference summary of the syntax.
+</P>
+<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
+<P>
+<pre>
+  \x         where x is non-alphanumeric is a literal x
+  \Q...\E    treat enclosed characters as literal
+</PRE>
+</P>
+<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
+<P>
+<pre>
+  \a         alarm, that is, the BEL character (hex 07)
+  \cx        "control-x", where x is any ASCII character
+  \e         escape (hex 1B)
+  \f         form feed (hex 0C)
+  \n         newline (hex 0A)
+  \r         carriage return (hex 0D)
+  \t         tab (hex 09)
+  \0dd       character with octal code 0dd
+  \ddd       character with octal code ddd, or backreference
+  \o{ddd..}  character with octal code ddd..
+  \xhh       character with hex code hh
+  \x{hhh..}  character with hex code hhh..
+</pre>
+Note that \0dd is always an octal code, and that \8 and \9 are the literal
+characters "8" and "9".
+</P>
+<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
+<P>
+<pre>
+  .          any character except newline;
+               in dotall mode, any character whatsoever
+  \C         one data unit, even in UTF mode (best avoided)
+  \d         a decimal digit
+  \D         a character that is not a decimal digit
+  \h         a horizontal white space character
+  \H         a character that is not a horizontal white space character
+  \N         a character that is not a newline
+  \p{<i>xx</i>}     a character with the <i>xx</i> property
+  \P{<i>xx</i>}     a character without the <i>xx</i> property
+  \R         a newline sequence
+  \s         a white space character
+  \S         a character that is not a white space character
+  \v         a vertical white space character
+  \V         a character that is not a vertical white space character
+  \w         a "word" character
+  \W         a "non-word" character
+  \X         a Unicode extended grapheme cluster
+</pre>
+By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
+or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
+happening, \s and \w may also match characters with code points in the range
+128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
+is changed to use Unicode properties and they match many more characters.
+</P>
+<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
+<P>
+<pre>
+  C          Other
+  Cc         Control
+  Cf         Format
+  Cn         Unassigned
+  Co         Private use
+  Cs         Surrogate
+
+  L          Letter
+  Ll         Lower case letter
+  Lm         Modifier letter
+  Lo         Other letter
+  Lt         Title case letter
+  Lu         Upper case letter
+  L&         Ll, Lu, or Lt
+
+  M          Mark
+  Mc         Spacing mark
+  Me         Enclosing mark
+  Mn         Non-spacing mark
+
+  N          Number
+  Nd         Decimal number
+  Nl         Letter number
+  No         Other number
+
+  P          Punctuation
+  Pc         Connector punctuation
+  Pd         Dash punctuation
+  Pe         Close punctuation
+  Pf         Final punctuation
+  Pi         Initial punctuation
+  Po         Other punctuation
+  Ps         Open punctuation
+
+  S          Symbol
+  Sc         Currency symbol
+  Sk         Modifier symbol
+  Sm         Mathematical symbol
+  So         Other symbol
+
+  Z          Separator
+  Zl         Line separator
+  Zp         Paragraph separator
+  Zs         Space separator
+</PRE>
+</P>
+<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
+<P>
+<pre>
+  Xan        Alphanumeric: union of properties L and N
+  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
+  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
+  Xuc        Universally-named character: one that can be
+               represented by a Universal Character Name
+  Xwd        Perl word: property Xan or underscore
+</pre>
+Perl and POSIX space are now the same. Perl added VT to its space character set
+at release 5.18 and PCRE changed at release 8.34.
+</P>
+<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<P>
+Arabic,
+Armenian,
+Avestan,
+Balinese,
+Bamum,
+Batak,
+Bengali,
+Bopomofo,
+Brahmi,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Carian,
+Chakma,
+Cham,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Egyptian_Hieroglyphs,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Imperial_Aramaic,
+Inherited,
+Inscriptional_Pahlavi,
+Inscriptional_Parthian,
+Javanese,
+Kaithi,
+Kannada,
+Katakana,
+Kayah_Li,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Lepcha,
+Limbu,
+Linear_B,
+Lisu,
+Lycian,
+Lydian,
+Malayalam,
+Mandaic,
+Meetei_Mayek,
+Meroitic_Cursive,
+Meroitic_Hieroglyphs,
+Miao,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Old_South_Arabian,
+Old_Turkic,
+Ol_Chiki,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Rejang,
+Runic,
+Samaritan,
+Saurashtra,
+Sharada,
+Shavian,
+Sinhala,
+Sora_Sompeng,
+Sundanese,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tai_Tham,
+Tai_Viet,
+Takri,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Vai,
+Yi.
+</P>
+<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
+<P>
+<pre>
+  [...]       positive character class
+  [^...]      negative character class
+  [x-y]       range (can be used for hex characters)
+  [[:xxx:]]   positive POSIX named set
+  [[:^xxx:]]  negative POSIX named set
+
+  alnum       alphanumeric
+  alpha       alphabetic
+  ascii       0-127
+  blank       space or tab
+  cntrl       control character
+  digit       decimal digit
+  graph       printing, excluding space
+  lower       lower case letter
+  print       printing, including space
+  punct       printing, excluding alphanumeric
+  space       white space
+  upper       upper case letter
+  word        same as \w
+  xdigit      hexadecimal digit
+</pre>
+In PCRE, POSIX character set names recognize only ASCII characters by default,
+but some of them use Unicode properties if PCRE_UCP is set. You can use
+\Q...\E inside a character class.
+</P>
+<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
+<P>
+<pre>
+  ?           0 or 1, greedy
+  ?+          0 or 1, possessive
+  ??          0 or 1, lazy
+  *           0 or more, greedy
+  *+          0 or more, possessive
+  *?          0 or more, lazy
+  +           1 or more, greedy
+  ++          1 or more, possessive
+  +?          1 or more, lazy
+  {n}         exactly n
+  {n,m}       at least n, no more than m, greedy
+  {n,m}+      at least n, no more than m, possessive
+  {n,m}?      at least n, no more than m, lazy
+  {n,}        n or more, greedy
+  {n,}+       n or more, possessive
+  {n,}?       n or more, lazy
+</PRE>
+</P>
+<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<P>
+<pre>
+  \b          word boundary
+  \B          not a word boundary
+  ^           start of subject
+               also after internal newline in multiline mode
+  \A          start of subject
+  $           end of subject
+               also before newline at end of subject
+               also before internal newline in multiline mode
+  \Z          end of subject
+               also before newline at end of subject
+  \z          end of subject
+  \G          first matching position in subject
+</PRE>
+</P>
+<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
+<P>
+<pre>
+  \K          reset start of match
+</pre>
+\K is honoured in positive assertions, but ignored in negative ones.
+</P>
+<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
+<P>
+<pre>
+  expr|expr|expr...
+</PRE>
+</P>
+<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
+<P>
+<pre>
+  (...)           capturing group
+  (?<name>...)    named capturing group (Perl)
+  (?'name'...)    named capturing group (Perl)
+  (?P<name>...)   named capturing group (Python)
+  (?:...)         non-capturing group
+  (?|...)         non-capturing group; reset group numbers for
+                   capturing groups in each alternative
+</PRE>
+</P>
+<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
+<P>
+<pre>
+  (?>...)         atomic, non-capturing group
+</PRE>
+</P>
+<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
+<P>
+<pre>
+  (?#....)        comment (not nestable)
+</PRE>
+</P>
+<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
+<P>
+<pre>
+  (?i)            caseless
+  (?J)            allow duplicate names
+  (?m)            multiline
+  (?s)            single line (dotall)
+  (?U)            default ungreedy (lazy)
+  (?x)            extended (ignore white space)
+  (?-...)         unset option(s)
+</pre>
+The following are recognized only at the very start of a pattern or after one
+of the newline or \R options with similar syntax. More than one of them may
+appear.
+<pre>
+  (*LIMIT_MATCH=d)      set the match limit to d (decimal number)
+  (*LIMIT_RECURSION=d)  set the recursion limit to d (decimal number)
+  (*NO_AUTO_POSSESS)    no auto-possessification (PCRE_NO_AUTO_POSSESS)
+  (*NO_START_OPT)       no start-match optimization (PCRE_NO_START_OPTIMIZE)
+  (*UTF8)               set UTF-8 mode: 8-bit library (PCRE_UTF8)
+  (*UTF16)              set UTF-16 mode: 16-bit library (PCRE_UTF16)
+  (*UTF32)              set UTF-32 mode: 32-bit library (PCRE_UTF32)
+  (*UTF)                set appropriate UTF mode for the library in use
+  (*UCP)                set PCRE_UCP (use Unicode properties for \d etc)
+</pre>
+Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
+limits set by the caller of pcre_exec(), not increase them.
+</P>
+<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
+<P>
+These are recognized only at the very start of the pattern or after option
+settings with a similar syntax.
+<pre>
+  (*CR)           carriage return only
+  (*LF)           linefeed only
+  (*CRLF)         carriage return followed by linefeed
+  (*ANYCRLF)      all three of the above
+  (*ANY)          any Unicode newline sequence
+</PRE>
+</P>
+<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
+<P>
+These are recognized only at the very start of the pattern or after option
+setting with a similar syntax.
+<pre>
+  (*BSR_ANYCRLF)  CR, LF, or CRLF
+  (*BSR_UNICODE)  any Unicode newline sequence
+</PRE>
+</P>
+<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<P>
+<pre>
+  (?=...)         positive look ahead
+  (?!...)         negative look ahead
+  (?<=...)        positive look behind
+  (?<!...)        negative look behind
+</pre>
+Each top-level branch of a look behind must be of a fixed length.
+</P>
+<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
+<P>
+<pre>
+  \n              reference by number (can be ambiguous)
+  \gn             reference by number
+  \g{n}           reference by number
+  \g{-n}          relative reference by number
+  \k<name>        reference by name (Perl)
+  \k'name'        reference by name (Perl)
+  \g{name}        reference by name (Perl)
+  \k{name}        reference by name (.NET)
+  (?P=name)       reference by name (Python)
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<P>
+<pre>
+  (?R)            recurse whole pattern
+  (?n)            call subpattern by absolute number
+  (?+n)           call subpattern by relative number
+  (?-n)           call subpattern by relative number
+  (?&name)        call subpattern by name (Perl)
+  (?P>name)       call subpattern by name (Python)
+  \g<name>        call subpattern by name (Oniguruma)
+  \g'name'        call subpattern by name (Oniguruma)
+  \g<n>           call subpattern by absolute number (Oniguruma)
+  \g'n'           call subpattern by absolute number (Oniguruma)
+  \g<+n>          call subpattern by relative number (PCRE extension)
+  \g'+n'          call subpattern by relative number (PCRE extension)
+  \g<-n>          call subpattern by relative number (PCRE extension)
+  \g'-n'          call subpattern by relative number (PCRE extension)
+</PRE>
+</P>
+<br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<P>
+<pre>
+  (?(condition)yes-pattern)
+  (?(condition)yes-pattern|no-pattern)
+
+  (?(n)...        absolute reference condition
+  (?(+n)...       relative reference condition
+  (?(-n)...       relative reference condition
+  (?(<name>)...   named reference condition (Perl)
+  (?('name')...   named reference condition (Perl)
+  (?(name)...     named reference condition (PCRE)
+  (?(R)...        overall recursion condition
+  (?(Rn)...       specific group recursion condition
+  (?(R&name)...   specific recursion condition
+  (?(DEFINE)...   define subpattern for reference
+  (?(assert)...   assertion condition
+</PRE>
+</P>
+<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<P>
+The following act immediately they are reached:
+<pre>
+  (*ACCEPT)       force successful match
+  (*FAIL)         force backtrack; synonym (*F)
+  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
+</pre>
+The following act only when a subsequent match failure causes a backtrack to
+reach them. They all force a match failure, but they differ in what happens
+afterwards. Those that advance the start-of-match point do so only if the
+pattern is not anchored.
+<pre>
+  (*COMMIT)       overall failure, no advance of starting point
+  (*PRUNE)        advance to next starting character
+  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
+  (*SKIP)         advance to current matching position
+  (*SKIP:NAME)    advance to position corresponding to an earlier
+                  (*MARK:NAME); if not found, the (*SKIP) is ignored
+  (*THEN)         local failure, backtrack to next alternation
+  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
+</PRE>
+</P>
+<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
+<P>
+<pre>
+  (?C)      callout
+  (?Cn)     callout with data n
+</PRE>
+</P>
+<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
+<b>pcrematching</b>(3), <b>pcre</b>(3).
+</P>
+<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC27" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 08 January 2014
+<br>
+Copyright © 1997-2014 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
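A few entries from the summary above (greedy versus lazy quantifiers, Python-style named groups and backreferences, and fixed-length lookbehind) can be tried out with a short sketch. This is an illustration under a stated assumption, not PCRE itself: it uses Python's standard `re` module, which is a different engine that happens to accept these particular constructs with the same spelling. The sample strings and variable names are hypothetical.

```python
import re

# ASSUMPTION: Python's `re` is used as a stand-in; it is not PCRE and supports
# only a subset of the syntax listed in the summary.
text = "<b>bold</b> and <i>italic</i>"

# "*" is greedy: it runs from the first "<" to the last ">".
greedy = re.search(r"<.*>", text).group(0)

# "*?" is lazy: it stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.*?>", text).group(0)

# (?P<name>...) is the Python-style named group and (?P=name) the matching
# backreference, so the closing tag must repeat the opening tag's name.
tag = re.search(r"<(?P<name>\w+)>.*?</(?P=name)>", text)

# (?<=...) is a positive lookbehind; its branch "USD " has a fixed length.
price = re.search(r"(?<=USD )\d+", "total: USD 42")

print(greedy)
print(lazy)
print(tag.group("name"), tag.group(0))
print(price.group(0))
```

Constructs marked as Perl, .NET, Oniguruma, or PCRE extensions in the tables are generally not accepted by `re`, so they need a real PCRE binding to test.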