<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="SciTE" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>
SciTE Regular Expressions
</title>
<style type="text/css">
h3 {
background-color: #FEC;
}
.ref {
color: #80C;
}
code {
font-weight: bold;
}
dt {
margin-top: 15px;
}
</style>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<table bgcolor="#000000" width="100%" cellspacing="0" cellpadding="0" border="0">
<tr>
<td>
<img src="SciTEIco.png" border="3" height="64" width="64" alt="Scintilla icon" />
</td>
<td>
<a href="index.html" style="color:white;text-decoration:none"><font size="5">
Regular Expressions</font></a>
</td>
</tr>
</table>
<h2>
Regular Expressions in SciTE
</h2>
<h3>Purpose</h3>
<p>
Regular expressions can be used for searching for patterns
rather than literals. For example, it is possible to
search for variables in SciTE property files,
which look like $(name.subname) with the regular expression:<br />
<code>\$([a-z.]+)</code> (or <code>\$\([a-z.]+\)</code> in posix mode).
</p>
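<p>
As a rough illustration only, the posix-mode form of this pattern can be tried
with Python's <code>re</code> module, which uses the same parenthesis convention.
Python's engine is not the one SciTE uses, and the sample line below is made up
for the example.
</p>
<pre>
```python
import re

# Posix-mode form of the pattern from the text above.
VARIABLE = re.compile(r"\$\([a-z.]+\)")

# Hypothetical line in the style of a property file (an assumption).
line = "x=$(name.subname) and $(other.name)"

found = VARIABLE.findall(line)
print(found)
```
</pre>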
<p>
Replacement with regular expressions allows complex
transformations with the use of tagged expressions.
For example, pairs of numbers separated by a ',' could
be reordered by replacing the regular expression:<br />
<code>\([0-9]+\),\([0-9]+\)</code> (or <code>([0-9]+),([0-9]+)</code>
in posix mode, or even <code>(\d+),(\d+)</code>)<br />
with:<br />
<code>\2,\1</code>
</p>
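<p>
The same reordering can be sketched with Python's <code>re.sub</code>, assuming
the <code>(\d+),(\d+)</code> form of the pattern. This only approximates what
SciTE's Replace dialog does. The input string is invented for the example.
</p>
<pre>
```python
import re

text = "10,20 and 3,4"  # sample input (an assumption)

# \1 and \2 in the replacement refer to the first and second tagged groups.
swapped = re.sub(r"(\d+),(\d+)", r"\2,\1", text)
print(swapped)
```
</pre>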
<h3>Syntax</h3>
<p>
Regular expression syntax depends on the <code>find.replace.regexp.posix</code> property.<br />
If set to 0, syntax uses the old Unix style where <code>\(</code> and <code>\)</code>
mark capturing sections while <code>(</code> and <code>)</code> are themselves.<br />
If set to 1, syntax uses the more common style where <code>(</code> and <code>)</code>
mark capturing sections while <code>\(</code> and <code>\)</code> are plain parentheses.
</p>
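<p>
The difference between the two styles amounts to swapping bare and backslashed
parentheses. The Python helper below is a hypothetical sketch of that conversion
and is not part of SciTE. It leaves every other escape untouched and, as a
simplification, does not treat parentheses inside <code>[</code>set<code>]</code> specially.
</p>
<pre>
```python
def toggle_paren_style(pattern):
    """Swap bare and backslashed parentheses (hypothetical helper)."""
    out = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # An escaped parenthesis becomes bare; other escapes pass through.
            out.append(nxt if nxt in ("(", ")") else ch + nxt)
        elif ch in ("(", ")"):
            # A bare parenthesis becomes escaped.
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


old_style = r"\([0-9]+\),\([0-9]+\)"
posix_style = toggle_paren_style(old_style)
print(posix_style)
```
</pre>
<p>
Applying the helper twice returns the original pattern, since the swap is its own inverse.
</p>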
<dl><dt><span class="ref">[1]</span> char</dt>
<dd>matches itself, unless it is a special character
(metachar): <code>. \ [ ] * + ^ $</code> and <code>( )</code> in posix mode.
</dd><dt><span class="ref">[2]</span> <code>.</code></dt>
<dd>matches any character.
</dd><dt><span class="ref">[3]</span> <code>\</code></dt>
<dd>matches the character following it, except:
<ul><li><code>\a</code>, <code>\b</code>, <code>\f</code>,
<code>\n</code>, <code>\r</code>, <code>\t</code>, <code>\v</code>
match the corresponding C escape char,
respectively BEL, BS, FF, LF, CR, TAB and VT;<br />
Note that <code>\r</code> and <code>\n</code> are never matched because, in Scintilla,
regular expression searches are performed line by line (with end-of-line characters stripped).
</li><li>if not in posix mode, when followed by a left or right round bracket (see <span class="ref">[7]</span>);
</li><li>when followed by a digit 1 to 9 (see <span class="ref">[8]</span>);
</li><li>when followed by a left or right angle bracket (see <span class="ref">[9]</span>);
</li><li>when followed by d, D, s, S, w or W (see <span class="ref">[10]</span>);
</li><li>when followed by x and two hexadecimal digits (see <span class="ref">[11]</span>);
</li></ul>
Backslash is used as an escape character for all other meta-characters, and itself.
</dd><dt><span class="ref">[4]</span> <code>[</code>set<code>]</code></dt>
<dd>matches one of the characters in the set.
If the first character in the set is <code>^</code>, it matches the characters NOT in the set,
i.e. complements the set. A shorthand <code>S-E</code> (start dash end) is
used to specify a set of characters S up to E, inclusive. The special characters <code>]</code> and
<code>-</code> have no special meaning if they appear as the first chars in the set. To include both,
put - first: <code>[-]A-Z]</code> (or just backslash them).
<table><tr><td>example</td><td>match</td></tr>
<tr><td><code>[-]|]</code></td><td>matches these 3 chars,</td></tr>
<tr><td><code>[]-|]</code></td><td>matches from ] to | chars</td></tr>
<tr><td><code>[a-z]</code></td><td>any lowercase alpha</td></tr>
<tr><td><code>[^-]]</code></td><td>any char except - and ]</td></tr>
<tr><td><code>[^A-Z]</code></td><td>any char except uppercase alpha</td></tr>
<tr><td><code>[a-zA-Z]</code></td><td>any alpha</td></tr>
</table>
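<p>
A few of these sets can be checked with Python's <code>re</code> module as a
stand-in. Python does not treat a leading <code>-</code> followed by <code>]</code>
the same way, so this sketch uses the backslashed forms mentioned above. The
sample strings are assumptions.
</p>
<pre>
```python
import re

# Backslashed forms of the sets from the table.
three_chars = re.compile(r"[\-\]|]")          # the chars -, ] and |
not_dash_or_bracket = re.compile(r"[^\-\]]")  # any char except - and ]
bracket_to_bar = re.compile(r"[\]-|]")        # range from ] (0x5D) to | (0x7C)

found_three = three_chars.findall("a-b]c|d")
found_others = not_dash_or_bracket.findall("a-]b")
in_range = bool(bracket_to_bar.fullmatch("a"))      # 'a' is 0x61, inside the range
out_of_range = bool(bracket_to_bar.fullmatch("Z"))  # 'Z' is 0x5A, just below it

print(found_three, found_others, in_range, out_of_range)
```
</pre>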
</dd><dt><span class="ref">[5]</span> <code>*</code></dt>
<dd>any regular expression form <span class="ref">[1]</span> to <span class="ref">[4]</span>
(except <span class="ref">[7]</span>, <span class="ref">[8]</span> and <span class="ref">[9]</span>
forms of <span class="ref">[3]</span>),
followed by the closure character (<code>*</code>) matches zero or more occurrences of that form.
</dd><dt><span class="ref">[6]</span> <code>+</code></dt>
<dd>same as <span class="ref">[5]</span>, except it matches one or more.
Both <span class="ref">[5]</span> and <span class="ref">[6]</span> are greedy (they match as much as possible).
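<p>
Greedy matching can be observed with Python's <code>re</code> module, whose
<code>*</code> and <code>+</code> are also greedy. This is an illustration under
that assumption, not a test of SciTE's own engine.
</p>
<pre>
```python
import re

# Greedy: .* runs to the last 'b' on the line, not the first.
greedy = re.search(r"a.*b", "aXbXb").group()

# * accepts zero occurrences, + needs at least one.
star = re.search(r"ab*", "a").group()
plus = re.search(r"ab+", "a")

print(greedy, star, plus)
```
</pre>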
</dd><dt><span class="ref">[7]</span></dt>
<dd>a regular expression in the form <span class="ref">[1]</span> to <span class="ref">[12]</span>, enclosed
as <code>\(<i>form</i>\)</code> (or <code>(<i>form</i>)</code> with posix flag) matches
what <i>form</i> matches.
The enclosure creates a set of tags, used for <span class="ref">[8]</span> and for
pattern substitution. The tagged forms are numbered starting from 1.
</dd><dt><span class="ref">[8]</span></dt>
<dd>a <code>\</code> followed by a digit 1 to 9 matches whatever a
previously tagged regular expression (<span class="ref">[7]</span>) matched.
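<p>
A typical use is finding a repeated word. The sketch below uses Python's
<code>re</code> module and the posix-style parentheses; the sample text is an
assumption.
</p>
<pre>
```python
import re

# \1 must match exactly the text captured by the first tagged group.
doubled = re.compile(r"([a-z]+) \1")

hit = doubled.search("hello hello world")
miss = doubled.search("hello world")

print(hit.group(0), "|", hit.group(1), "|", miss)
```
</pre>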
</dd><dt><span class="ref">[9]</span> <code>\&lt; \&gt;</code></dt>
<dd>a regular expression starting with a <code>\&lt;</code> construct
and/or ending with a <code>\&gt;</code> construct restricts the
pattern matching to the beginning of a word and/or
the end of a word. A word is a string made up of the characters
A-Z, a-z, 0-9 and _; Scintilla allows a user setting to extend
this set. The word must also be preceded and/or
followed by a character outside that set.
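<p>
Python's <code>re</code> module has no <code>\&lt;</code> or <code>\&gt;</code>
constructs. Its closest equivalent is <code>\b</code>, which matches at either
edge of a word. The sketch below uses it only to illustrate whole-word matching;
the sample text is an assumption.
</p>
<pre>
```python
import re

text = "cat concat cats cat_1 cat"

# \b stands in for the word-start and word-end constructs described above.
# The underscore counts as a word character, so "cat_1" is not a match.
starts = [m.start() for m in re.finditer(r"\bcat\b", text)]
print(starts)
```
</pre>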
</dd><dt><span class="ref">[10]</span> <code>\l</code></dt>
<dd>a backslash followed by d, D, s, S, w or W,
becomes a character class (both inside and outside sets []).
<ul><li>d: decimal digits
</li><li>D: any char except decimal digits
</li><li>s: whitespace (space, \t \n \r \f \v)
</li><li>S: any char except whitespace (see above)
</li><li>w: alphanumeric &amp; underscore (changed by user setting)
</li><li>W: any char except alphanumeric &amp; underscore (see above)
</li></ul>
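<p>
Python's <code>re</code> module offers classes with the same letters, which
allows a quick sketch. The <code>re.ASCII</code> flag is an assumption made here
to keep the classes close to the definitions above, and the sample text is invented.
</p>
<pre>
```python
import re

text = "x1 = 42;\ty_2 = 3.5"

digits = re.findall(r"\d+", text, re.ASCII)       # runs of decimal digits
words = re.findall(r"\w+", text, re.ASCII)        # alphanumerics and underscore
numbers = re.findall(r"[\d.]+", text, re.ASCII)   # a class used inside a set
only_digits = re.sub(r"\D", "", text, flags=re.ASCII)  # drop every non-digit

print(digits, words, numbers, only_digits)
```
</pre>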
</dd><dt><span class="ref">[11]</span> <code>\xHH</code></dt>
<dd>a backslash followed by x and two hexadecimal digits
becomes the character whose ASCII code is equal
to these digits. If it is not followed by two hexadecimal digits,
it matches the 'x' character itself.
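<p>
For instance, <code>\x41</code> stands for 'A' (code 0x41). The sketch below
checks this with Python's <code>re</code> module. Unlike the behaviour described
above, Python rejects a <code>\x</code> that is not followed by two hexadecimal digits.
</p>
<pre>
```python
import re

# \x41 is the character with code 0x41, that is 'A'.
letter = re.search(r"\x41", "xAy").group()

# Hex escapes can also serve as range endpoints: 0x30 to 0x39 are '0' to '9'.
year = re.findall(r"[\x30-\x39]+", "id 2024")

print(letter, chr(int("41", 16)), year)
```
</pre>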
</dd><dt><span class="ref">[12]</span></dt>
<dd>a composite regular expression xy where x and y
are in the form <span class="ref">[1]</span> to <span class="ref">[10]</span> matches the longest
match of x followed by a match for y.
</dd><dt><span class="ref">[13]</span> <code>^ $</code></dt>
<dd>a regular expression starting with a ^ character
and/or ending with a $ character restricts the
pattern matching to the beginning of the line
and/or the end of the line (these are anchors). Elsewhere in the
pattern, ^ and $ are treated as ordinary characters.
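<p>
The sketch below imitates a line-by-line search in Python, applying anchored
patterns to each line after the end-of-line characters have been stripped. It is
an approximation: Python's <code>re</code> module treats ^ and $ as special
everywhere in a pattern, and the sample text is an assumption.
</p>
<pre>
```python
import re

text = "# comment\nvalue = 1  # trailing\n# another"

# splitlines() drops the end-of-line characters, as described for Scintilla.
lines = text.splitlines()

starts_with_hash = [line for line in lines if re.search(r"^#", line)]
ends_with_g = [line for line in lines if re.search(r"g$", line)]

print(starts_with_hash, ends_with_g)
```
</pre>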
</dd></dl>
<h3>Acknowledgments</h3>
<p>
Most of this documentation was originally written by Ozan S. Yigit.<br />
Additions by Neil Hodgson and Philippe Lhoste.<br />
All of this document is in the public domain.
</p>
</body>
</html>