<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="SciTE" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>
SciTE Regular Expressions
</title>
<style type="text/css">
h3 {
background-color: #FEC;
}
.ref {
color: #80C;
}
code {
font-weight: bold;
}
dt {
margin-top: 15px;
}
</style>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<table bgcolor="#000000" width="100%" cellspacing="0" cellpadding="0" border="0">
<tr>
<td>
<img src="SciTEIco.png" border="3" height="64" width="64" alt="Scintilla icon" />
</td>
<td>
<a href="index.html" style="color:white;text-decoration:none"><font size="5">
Regular Expressions</font></a>
</td>
</tr>
</table>
<h2>
Regular Expressions in SciTE
</h2>
<h3>Purpose</h3>
<p>
Regular expressions can be used to search for patterns
rather than literals. For example, it is possible to
search for variables in SciTE property files,
which look like <code>$(name.subname)</code>, with the regular expression:<br />
<code>\$([a-z.]+)</code> (or <code>\$\([a-z.]+\)</code> in posix mode).
</p>
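<p>
The same search can be tried outside the editor. The sketch below uses Python's
<code>re</code> module, not SciTE's own engine; Python treats round brackets as grouping,
as posix mode does, so the posix form of the pattern is used. The sample line is
hypothetical.
</p>

```python
import re

# Python's re groups with ( and ), like SciTE's posix mode,
# so the literal parentheses of $(...) must be escaped.
VARIABLE = re.compile(r"\$\([a-z.]+\)")

# Hypothetical property-file line, used only for illustration.
line = "command.go.*.py=python $(file.name) $(arg.one)"

found = VARIABLE.findall(line)
print(found)  # ['$(file.name)', '$(arg.one)']
```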
<p>
Replacement with regular expressions allows complex
transformations through the use of tagged expressions.
For example, pairs of numbers separated by a ',' can
be reordered by replacing the regular expression:<br />
<code>\([0-9]+\),\([0-9]+\)</code> (or <code>([0-9]+),([0-9]+)</code>
in posix mode, or even <code>(\d+),(\d+)</code>)<br />
with:<br />
<code>\2,\1</code>
</p>
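<p>
As an illustration, the reordering can be reproduced with Python's <code>re.sub</code>,
which happens to accept the same <code>\2,\1</code> replacement text. This is a sketch
with Python's engine, not SciTE's, and the input string is made up.
</p>

```python
import re

# Two tagged groups of digits separated by a comma (posix-style grouping).
PAIR = re.compile(r"(\d+),(\d+)")

text = "12,345 and 7,89"

# \2 and \1 refer to the second and first tagged group of each match.
swapped = PAIR.sub(r"\2,\1", text)
print(swapped)  # 345,12 and 89,7
```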
<h3>Syntax</h3>
<p>
Regular expression syntax depends on the property <code>find.replace.regexp.posix</code>.<br />
If set to 0, the syntax uses the old Unix style where <code>\(</code> and <code>\)</code>
mark capturing sections while <code>(</code> and <code>)</code> match themselves.<br />
If set to 1, the syntax uses the more common style where <code>(</code> and <code>)</code>
mark capturing sections while <code>\(</code> and <code>\)</code> match plain parentheses.
</p>
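<p>
The two styles differ only in which spelling of the round brackets is the special one,
so a pattern can be moved from one style to the other by swapping them. The helper
below is a hypothetical sketch (<code>swap_paren_style</code> is not part of SciTE) and
it deliberately ignores one case: brackets written inside a set such as
<code>[()]</code> are swapped too, although they are literal in both styles.
</p>

```python
import re

def swap_paren_style(pattern):
    # Hypothetical helper: converts a pattern between the old Unix style
    # and the posix style by swapping escaped and bare round brackets.
    def flip(match):
        escaped, bare = match.group(1), match.group(2)
        if bare is not None:
            return "\\" + bare      # a bare bracket gains a backslash
        if escaped in "()":
            return escaped          # an escaped bracket loses its backslash
        return match.group(0)       # every other escape is kept unchanged

    # First alternative: a backslash plus the character it escapes.
    # Second alternative: a bare round bracket.
    return re.sub(r"\\(.)|([()])", flip, pattern)

print(swap_paren_style(r"\([0-9]+\),\([0-9]+\)"))  # ([0-9]+),([0-9]+)
print(swap_paren_style(r"\$([a-z.]+)"))            # \$\([a-z.]+\)
```

<p>
Applying the helper twice returns the original pattern, which is a quick way to check a
conversion.
</p>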
<dl><dt><span class="ref">[1]</span> char</dt>
<dd>matches itself, unless it is a special character
(metachar): <code>. \ [ ] * + ^ $</code> and <code>( )</code> in posix mode.
</dd><dt><span class="ref">[2]</span> <code>.</code></dt>
<dd>matches any character.
</dd><dt><span class="ref">[3]</span> <code>\</code></dt>
<dd>matches the character following it, except:
<ul><li><code>\a</code>, <code>\b</code>, <code>\f</code>,
<code>\n</code>, <code>\r</code>, <code>\t</code>, <code>\v</code>
match the corresponding C escape char,
respectively BEL, BS, FF, LF, CR, TAB and VT;<br />
Note that <code>\r</code> and <code>\n</code> are never matched because in Scintilla,
regular expression searches are made line by line (stripped of end-of-line chars).
</li><li>if not in posix mode, when followed by a left or right round bracket (see <span class="ref">[7]</span>);
</li><li>when followed by a digit 1 to 9 (see <span class="ref">[8]</span>);
</li><li>when followed by a left or right angle bracket (see <span class="ref">[9]</span>);
</li><li>when followed by d, D, s, S, w or W (see <span class="ref">[10]</span>);
</li><li>when followed by x and two hexadecimal digits (see <span class="ref">[11]</span>).
</li></ul>
Backslash is used as an escape character for all other meta-characters, and itself.
</dd><dt><span class="ref">[4]</span> <code>[</code>set<code>]</code></dt>
<dd>matches one of the characters in the set.
If the first character in the set is <code>^</code>, it matches the characters NOT in the set,
i.e. complements the set. A shorthand <code>S-E</code> (start dash end) is
used to specify a set of characters S up to E, inclusive. The special characters <code>]</code> and
<code>-</code> have no special meaning if they appear as the first chars in the set. To include both,
put - first: <code>[-]A-Z]</code> (or just escape them with a backslash).
<table><tr><td>example</td><td>match</td></tr>
<tr><td><code>[-]|]</code></td><td>matches these 3 chars</td></tr>
<tr><td><code>[]-|]</code></td><td>matches from ] to | chars</td></tr>
<tr><td><code>[a-z]</code></td><td>any lowercase alpha</td></tr>
<tr><td><code>[^-]]</code></td><td>any char except - and ]</td></tr>
<tr><td><code>[^A-Z]</code></td><td>any char except uppercase alpha</td></tr>
<tr><td><code>[a-zA-Z]</code></td><td>any alpha</td></tr>
</table>
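<p>
The table rows can be checked character by character. The sketch below uses Python's
<code>re</code> module as a stand-in, which differs from SciTE in one detail: a
<code>]</code> is only literal when it is the very first character of the set, so the
sets that start with <code>-</code> are written with a backslash before the
<code>]</code>. The helper name <code>chars_matching</code> is made up for this example.
</p>

```python
import re

def chars_matching(set_pattern, candidates):
    # Return the candidate characters accepted by a one-character set.
    return "".join(c for c in candidates if re.fullmatch(set_pattern, c))

sample = "a-]|Z5"

# SciTE's [-]|] is written [-\]|] for Python (see the note above).
print(chars_matching(r"[-\]|]", sample))    # -]|
print(chars_matching(r"[^-\]]", sample))    # a|Z5
print(chars_matching(r"[a-zA-Z]", sample))  # aZ
print(chars_matching(r"[^A-Z]", sample))    # a-]|5
```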
</dd><dt><span class="ref">[5]</span> <code>*</code></dt>
<dd>any regular expression form <span class="ref">[1]</span> to <span class="ref">[4]</span>
(except <span class="ref">[7]</span>, <span class="ref">[8]</span> and <span class="ref">[9]</span>
forms of <span class="ref">[3]</span>),
followed by the closure char (<code>*</code>) matches zero or more occurrences of that form.
</dd><dt><span class="ref">[6]</span> <code>+</code></dt>
<dd>same as <span class="ref">[5]</span>, except it matches one or more.
Both <span class="ref">[5]</span> and <span class="ref">[6]</span> are greedy (they match as much as possible).
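<p>
Greedy matching is easiest to see on a string with several possible end points. The
following sketch uses Python's <code>re</code> module, whose <code>*</code> and
<code>+</code> are greedy in the same way; the test strings are arbitrary.
</p>

```python
import re

# * is greedy: the match runs to the last b, not the first one.
star = re.search(r"a.*b", "aXbYb").group()

# + is greedy too and needs at least one occurrence.
plus = re.search(r"ab+", "abbbc").group()

# * accepts zero occurrences, + does not.
zero = re.search(r"ab*", "ac").group()
none = re.search(r"ab+", "ac")

print(star, plus, zero, none)  # aXbYb abbb a None
```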
</dd><dt><span class="ref">[7]</span></dt>
<dd>a regular expression in the form <span class="ref">[1]</span> to <span class="ref">[12]</span>, enclosed
as <code>\(<i>form</i>\)</code> (or <code>(<i>form</i>)</code> with posix flag) matches
what <i>form</i> matches.
The enclosure creates a set of tags, used for <span class="ref">[8]</span> and for
pattern substitution. The tagged forms are numbered starting from 1.
</dd><dt><span class="ref">[8]</span></dt>
<dd>a <code>\</code> followed by a digit 1 to 9 matches whatever a
previously tagged regular expression (<span class="ref">[7]</span>) matched.
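<p>
A typical use of a tag inside the pattern itself is finding a repeated word. The sketch
below is written for Python's <code>re</code> module (posix-style grouping) and the
sentence is invented for the example.
</p>

```python
import re

# ([a-z]+) tags a word; \1 then requires the same text again after a space.
DOUBLED = re.compile(r"([a-z]+) \1")

match = DOUBLED.search("say hello hello world")
print(match.group(0))  # hello hello
print(match.group(1))  # hello
```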
</dd><dt><span class="ref">[9]</span> <code>\&lt; \&gt;</code></dt>
<dd>a regular expression starting with a <code>\&lt;</code> construct
and/or ending with a <code>\&gt;</code> construct restricts the
pattern matching to the beginning of a word and/or
the end of a word. A word is defined to be a character
string beginning and/or ending with the characters
A-Z a-z 0-9 and _. Scintilla extends this definition
by user setting. The word must also be preceded and/or
followed by any character outside those mentioned.
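<p>
Python's <code>re</code> module has no <code>\&lt;</code> and <code>\&gt;</code>
constructs, so the sketch below approximates them with <code>\b</code> plus a lookahead.
The names <code>WORD_START</code> and <code>WORD_END</code> are assumptions made for this
example, and Python's fixed notion of a word character does not follow Scintilla's user
setting.
</p>

```python
import re

# Approximations only: a word boundary followed by a word character marks
# the start of a word; a boundary not followed by one marks the end.
WORD_START = r"\b(?=\w)"
WORD_END = r"\b(?!\w)"

text = "cat concat catalog cat_1"

starts = re.findall(WORD_START + r"cat", text)             # cat, catalog, cat_1
whole = re.findall(WORD_START + r"cat" + WORD_END, text)   # only the first word
ends = [m.start() for m in re.finditer(r"cat" + WORD_END, text)]

print(len(starts), len(whole), ends)  # 3 1 [0, 7]
```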
</dd><dt><span class="ref">[10]</span> <code>\l</code></dt>
<dd>a backslash followed by d, D, s, S, w or W
becomes a character class (both inside and outside sets <code>[]</code>).
<ul><li>d: decimal digits
</li><li>D: any char except decimal digits
</li><li>s: whitespace (space, \t \n \r \f \v)
</li><li>S: any char except whitespace (see above)
</li><li>w: alphanumeric &amp; underscore (changed by user setting)
</li><li>W: any char except alphanumeric &amp; underscore (see above)
</li></ul>
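<p>
The classes above have the same letters in Python's <code>re</code> module, which makes
a quick experiment possible. This is only a sketch: Python's <code>\w</code> is not
affected by any SciTE setting, and the sample text is arbitrary.
</p>

```python
import re

text = "x_1 = 42\t;"

digits = re.findall(r"\d", text)    # decimal digits
words = re.findall(r"\w+", text)    # runs of alphanumerics and underscore
blanks = re.findall(r"\s", text)    # whitespace characters

# The classes also work inside a set: anything that is neither \w nor \s.
others = re.findall(r"[^\w\s]", text)

print(digits)  # ['1', '4', '2']
print(words)   # ['x_1', '42']
print(blanks)  # [' ', ' ', '\t']
print(others)  # ['=', ';']
```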
</dd><dt><span class="ref">[11]</span> <code>\xHH</code></dt>
<dd>a backslash followed by x and two hexadecimal digits
becomes the character whose ASCII code is equal
to these digits. If not followed by two such digits,
it is the 'x' char itself.
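<p>
A short sketch with Python's <code>re</code> module, which accepts the same
<code>\xHH</code> notation, shows that the resulting character is taken literally even
when it would otherwise be a metachar. Unlike the fallback described above, Python
rejects a <code>\x</code> that is not followed by two hexadecimal digits.
</p>

```python
import re

# \x41 is 'A' and \x2A is '*'; the star written this way is a plain
# character, not a closure.
HEX = re.compile(r"\x41\x2A")

literal = bool(HEX.fullmatch("A*"))
closure = bool(HEX.fullmatch("AAA"))
print(literal, closure)  # True False
```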
</dd><dt><span class="ref">[12]</span></dt>
<dd>a composite regular expression xy where x and y
are in the form <span class="ref">[1]</span> to <span class="ref">[10]</span> matches the longest
match of x followed by a match for y.
</dd><dt><span class="ref">[13]</span> <code>^ $</code></dt>
<dd>a regular expression starting with a ^ character
and/or ending with a $ character restricts the
pattern matching to the beginning of the line
and/or the end of the line (these are the anchors). Elsewhere in the
pattern, ^ and $ are treated as ordinary characters.
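<p>
Because searches are made line by line, the anchors always refer to a single line. The
sketch below imitates that with Python's <code>re</code> module by splitting the text
first; <code>find_in_lines</code> is a hypothetical helper, and Python, unlike the rule
above, keeps treating ^ and $ as special in the middle of a pattern.
</p>

```python
import re

def find_in_lines(pattern, text):
    # Search each line separately, with end-of-line chars stripped,
    # and return the 1-based numbers of the lines that match.
    compiled = re.compile(pattern)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if compiled.search(line):
            hits.append(number)
    return hits

sample = "foo bar\nbar foo\nfoo\n"

print(find_in_lines(r"^foo", sample))   # [1, 3]
print(find_in_lines(r"foo$", sample))   # [2, 3]
print(find_in_lines(r"^foo$", sample))  # [3]
```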
</dd></dl>
<h3>Acknowledgments</h3>
<p>
Most of this documentation was originally written by Ozan S. Yigit.<br />
Additions by Neil Hodgson and Philippe Lhoste.<br />
All of this document is in the public domain.
</p>
</body>
</html>