REGULAR EXPRESSIONS Multi-threaded NewsWatcher
Go back to the page on filtering. See the page on creating and editing filters.
REGULAR EXPRESSIONS
What are regular expressions?
Regular expressions are a system for matching patterns in text data. They
are widely used on UNIX systems, and occasionally on personal computers
as well. They provide a very powerful, but also rather obscure, set of tools
for finding particular words or combinations of characters in strings.
On first reading, this may all seem complicated and of little use beyond the standard string matching provided in the Edit Filters dialog (Word matching, for example). In fact, NewsWatcher converts those standard string matching criteria into regular expressions when it applies filters to articles. You can start with some of the simpler matching criteria (some examples are suggested below) and gradually build up the complexity of the regular expressions that you use.
One point to note is that regular expressions are not wildcards. The
regular expression '
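The difference between wildcards and regular expressions can be sketched in Python. The fnmatch and re modules are used here only as stand-ins: NewsWatcher's own matching engine may differ in detail, and the pattern c*t and the sample strings are made up for illustration.

```python
import fnmatch
import re

pattern = "c*t"

# As a wildcard, "*" stands for any run of characters, so "c*t" means
# "c, then anything at all, then t".
wildcard_cat = fnmatch.fnmatchcase("cat", pattern)        # True
wildcard_ct = fnmatch.fnmatchcase("ct", pattern)          # True

# As a regular expression, "*" repeats the item before it, so "c*t" means
# "zero or more c characters, then t".
regex_cat = re.fullmatch(pattern, "cat") is not None      # False: the "a" is not allowed
regex_ccct = re.fullmatch(pattern, "ccct") is not None    # True
regex_t = re.fullmatch(pattern, "t") is not None          # True: zero c characters
```

The same three characters therefore accept different strings depending on whether they are read as a wildcard or as a regular expression.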
Information sources
The information here is an amalgamation of the documentation of regular expressions in the Metrowerks CodeWarrior IDE, and of a chapter in the book UNIX Power Tools (Peek, O'Reilly & Loukides). Online information (often the man pages for UNIX utilities) is available by using one of the search engines (e.g. InfoSeek) to search for 'regular expressions'.
REGEX BASICS
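Matching simple expressions
Most characters match themselves. The only exceptions are called
special characters, such as the backslash (\), asterisk (*), and period (.).
The sections below describe these and the other special characters.
For example, the expression "cat" matches any text that contains the characters c, a, and t in that order.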
Matching any character
A period (.) matches any character except a newline character.
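As a rough illustration of ordinary characters and the period, here is a small sketch using Python's re module. This is an assumption made for the sake of a runnable example: NewsWatcher's engine is not Python's, and the use of a backslash to switch off a special character is standard regular expression behaviour rather than something stated on this page. The sample strings are invented.

```python
import re

# Ordinary characters match themselves, anywhere in the text.
literal_hit = re.search("cat", "concatenate") is not None    # True

# A period matches any single character except a newline.
dot_hit = re.search("c.t", "cut") is not None                # True
dot_needs_char = re.search("c.t", "ct") is not None          # False: nothing for "." to match
dot_newline = re.search("c.t", "c\nt") is not None           # False: "." skips newlines

# A backslash removes the special meaning of the character after it.
unescaped_hit = re.search("3.5", "345") is not None          # True: "." matched the "4"
escaped_miss = re.search(r"3\.5", "345") is not None         # False: a real period is required
escaped_hit = re.search(r"3\.5", "3.5") is not None          # True
```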
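Repeating expressions
You can repeat expressions with an asterisk, plus sign, or question mark.
A regular expression followed by an asterisk (*) matches zero or more occurrences of that expression.
A regular expression followed by a plus sign (+) matches one or more occurrences of that expression.
A regular expression followed by a question mark (?) matches zero or one occurrence of that expression.
For example, ab*c matches ac, abc, and abbbc; ab+c matches abc and abbbc, but not ac; ab?c matches ac and abc, but not abbc.
So to match any series of zero or more characters, use ".*".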
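The three repetition characters can be compared side by side. This is a minimal sketch in Python's re module, used here as a stand-in for NewsWatcher's engine; the helper name matches and the test strings are hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("ac", "abc", "abbbc")

# "*" allows zero or more "b" characters.
star = [matches("ab*c", s) for s in samples]    # [True, True, True]
# "+" requires at least one "b".
plus = [matches("ab+c", s) for s in samples]    # [False, True, True]
# "?" allows at most one "b".
opt = [matches("ab?c", s) for s in samples]     # [True, True, False]

# ".*" matches any series of zero or more characters, including none.
any_run = matches(".*", "") and matches(".*", "anything at all")   # True
```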
Grouping expressions
If an expression is enclosed in parentheses, ( and ), it is
treated as one expression, and any asterisk (*) or plus (+) that follows
applies to the whole expression.
For example, (ab)*c matches c, abc, and ababc.
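The effect of grouping is easiest to see by comparing a pattern with and without parentheses. The following sketch uses Python's re module as a stand-in for NewsWatcher's engine; the patterns are invented for illustration.

```python
import re

# Without parentheses the asterisk applies only to the "b",
# so "ab*" means "a, then zero or more b characters".
ungrouped = re.fullmatch("ab*", "abab") is not None         # False

# With parentheses the asterisk applies to the whole group "ab".
grouped = re.fullmatch("(ab)*", "abab") is not None         # True

# The same holds for "+": at least one whole "ab" is required.
grouped_plus = re.fullmatch("(ab)+", "ababab") is not None  # True
grouped_plus_empty = re.fullmatch("(ab)+", "") is not None  # False
```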
Choosing one character from many
A string of characters enclosed in square brackets ([]) matches any one character
in that string. If the first character in the brackets is a caret (^), it matches
any character except those in the string. For example, [abc] matches a, b, or c,
but not x, y, or z. However, [^abc] matches x, y, or z, but not a, b, or c.
A minus sign (-) within square brackets indicates a range of consecutive ASCII
characters. For example, [0-9] is the same as [0123456789].
If a right square bracket is immediately after a left square bracket, it does
not terminate the string but is considered to be one of the characters to match.
If any special character, such as backslash (
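The bracket rules above can be tried out with a short sketch. Python's re module is assumed here in place of NewsWatcher's engine, and the helper name first and the sample strings are hypothetical.

```python
import re

def first(pattern, text):
    """Return the first piece of text that matches pattern, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# Any one character from the set.
in_set = first("[abc]", "xyzb")          # "b"
# A leading caret inverts the set.
not_in_set = first("[^abc]", "abcx")     # "x"
# A minus sign gives a range of consecutive characters.
digit = first("[0-9]", "issue 42")       # "4"
# A right bracket directly after the left bracket is an ordinary member.
bracket = first("[]x]", "a]b")           # "]"
```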
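Matching the beginning or end of a line
You can specify that a regular expression match only at the beginning or end of the line.
In NewsWatcher, a line is the whole field that is being matched, for example the
author or subject fields. The characters that do this are called anchor characters:
If a caret (^) is at the beginning of the expression, the match must start at the beginning of the line.
If a dollar sign ($) is at the end of the expression, the match must finish at the end of the line.
If an entire regular expression is enclosed by a caret and dollar sign, it must match the entire line.
For example, to match a line that consists of just one character, use "^.$".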
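Anchors can be illustrated with a made-up subject field. This sketch assumes Python's re module rather than NewsWatcher's own engine, and the subject text is invented.

```python
import re

subject = "Re: regular expressions"

# "^" ties the match to the start of the field.
starts = re.search("^Re:", subject) is not None           # True
not_start = re.search("^regular", subject) is not None    # False: "regular" is not at the start

# "$" ties the match to the end of the field.
ends = re.search("expressions$", subject) is not None     # True
not_end = re.search("regular$", subject) is not None      # False

# With both anchors the expression must account for the whole field.
one_char = re.search("^.$", "x") is not None              # True
two_chars = re.search("^.$", "xy") is not None            # False
```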
REGEX EXTENSIONS
Matching words
You can specify that a regular expression match parts of words with \<
(match the start of a word) and \> (match the end of a word).
An expression like "\<app" will match "apple" and "application", while
"ing\>" will match all words ending in -ing. To match a whole word,
use an expression like "\<this\>".
NewsWatcher provides facilities for doing word matches (which use these expressions internally), but if you want more flexibility, these come in useful. For example, you might want
to match
MS Excel, Microsoft Excel, Microsquish Excel etc. To remind you, the
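Word anchors can be approximated in Python, with one important caveat: Python's re module does not support \< and \>, so the sketch below substitutes \b, which matches at either edge of a word. The helper name has, the sample strings, and the Excel pattern are all hypothetical and are not taken from NewsWatcher.

```python
import re

def has(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# r"\bapp" plays the role of "\<app": "app" at the start of a word.
start_hit = has(r"\bapp", "an application")       # True
start_miss = has(r"\bapp", "unhappy")             # False

# r"ing\b" plays the role of "ing\>": "ing" at the end of a word.
end_hit = has(r"ing\b", "filtering news")         # True
end_miss = has(r"ing\b", "ingot")                 # False

# Both edges together give a whole-word match.
whole_hit = has(r"\bthis\b", "read this page")    # True
whole_miss = has(r"\bthis\b", "thistle")          # False

# One possible pattern for the Excel variations: a word starting with "M",
# any further characters, a space, then the whole word "Excel".
excel = r"\bM.* Excel\b"
names = ("MS Excel", "Microsoft Excel", "Microsquish Excel")
excel_hits = [has(excel, name) for name in names]  # [True, True, True]
```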
Alternatives
You can define an expression like (cash|money) to match
strings which contain either the word 'cash', or the word 'money', or both.
Note that the parentheses around the expression are required.
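Alternatives, and the role the parentheses play, can be sketched as follows. Python's re module is assumed as a stand-in; unlike NewsWatcher as described above, Python does not insist on the parentheses, which makes it possible to show what they change. The sample strings are invented.

```python
import re

pattern = "(cash|money)"

cash = re.search(pattern, "send cash now") is not None          # True
money = re.search(pattern, "make money fast") is not None       # True
both = re.search(pattern, "cash or money") is not None          # True
neither = re.search(pattern, "free offer") is not None          # False

# The parentheses limit how far "|" reaches. With them, "fast" is
# followed by either alternative; without them, the choice is between
# "fast cash" and "money".
scoped = re.fullmatch("fast (cash|money)", "fast money") is not None   # True
unscoped = re.fullmatch("fast cash|money", "fast money") is not None   # False
```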
REGEX EXAMPLES
Examples
Here are some sample regular expressions that I've found useful.