Next: Pattern Memory Up: Regular Expressions Previous: Character Classes

Quantifiers

Perl provides several different quantifiers that let you specify how many times a given component must be present before the match is true. They are used when you don't know in advance how many characters need to be matched.

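The six types of quantifiers are:

    Quantifier   Description
    *            The component must be present zero or more times.
    +            The component must be present one or more times.
    ?            The component must be present zero or one times.
    {n}          The component must be present exactly n times.
    {n,}         The component must be present at least n times.
    {n,m}        The component must be present at least n times and no more than m times.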
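The short program below tries each quantifier in turn. The sample strings are made up for illustration. Every pattern is anchored with ^ and $, so the quantifier has to account for every b between the a and the c:

```perl
use strict;
use warnings;

# Each pair is a pattern and a made-up sample string to test it against.
my @tests = (
    [ '^ab*c$',     'ac'     ],  # *     zero or more b's  -> matches
    [ '^ab+c$',     'ac'     ],  # +     one or more b's   -> no match
    [ '^ab?c$',     'abbc'   ],  # ?     zero or one b     -> no match
    [ '^ab{3}c$',   'abbbc'  ],  # {n}   exactly three b's -> matches
    [ '^ab{2,}c$',  'abbbbc' ],  # {n,}  two or more b's   -> matches
    [ '^ab{2,3}c$', 'abbbbc' ],  # {n,m} two or three b's  -> no match
);

my %result;    # pattern => 1 if it matched its sample string, else 0
for my $t (@tests) {
    my ($pattern, $string) = @$t;
    $result{$pattern} = ($string =~ m/$pattern/) ? 1 : 0;
    printf "%-12s %-8s %s\n", $pattern, $string,
        $result{$pattern} ? 'matches' : 'does not match';
}
```

Without the anchors, several of the failing cases would succeed. For example, /ab?c/ would fail on "abbc" but /b{2,3}/ alone would find two b's inside "abbbbc".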

If you need to match a word whose length is unknown, use the + quantifier. You can't use * because it would also accept a zero-length word, which makes no sense. So, the match statement might look like this:

m/\w+/;

This pattern will match "QQQ" and "AAAAA" but not "". Because the pattern is not anchored, it will also match " BBB": the match simply skips the leading space and finds BBB. If the word has to appear at the start of the string, anchor the pattern with ^. In that case, m/^\w+/ will not match " BBB". To allow for leading white space, which may or may not be present, use the asterisk (*) quantifier together with the \s symbolic character class in the following way:

m/^\s*\w+/;
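You can check the effect of anchoring directly. The sketch below tries \w+, ^\w+ and ^\s*\w+ against the strings mentioned above. It records the text each successful match leaves in $&. The loop and the %matched hash are only scaffolding for illustration:

```perl
use strict;
use warnings;

my @patterns = ('\w+', '^\w+', '^\s*\w+');
my @strings  = ('QQQ', 'AAAAA', '', ' BBB');

my %matched;    # "pattern|string" => text of the match ($&), if any
for my $p (@patterns) {
    for my $s (@strings) {
        if ($s =~ m/$p/) {
            $matched{"$p|$s"} = $&;
            print "m/$p/ matches '$s', \$& is '$&'\n";
        }
        else {
            print "m/$p/ does not match '$s'\n";
        }
    }
}
```

The unanchored \w+ finds "BBB" inside " BBB". The anchored ^\w+ rejects it, and ^\s*\w+ accepts it with the leading space included in $&.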

Tip: Be careful when using the * quantifier, because it can match an empty string, which might not be your intention. The pattern /b*/ will match any string, even one without any b characters.
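The tip is easy to confirm. In the sketch below, m/b*/ succeeds on every string, including one with no b at all. Even on 'abba' it succeeds with an empty match at offset 0. A zero-length match at the start of the string is found before the regex engine ever reaches the b's. The offset comes from $-[0], Perl's built-in record of where the last successful match started:

```perl
use strict;
use warnings;

my %found;    # string => what m/b*/ matched in it
for my $s ('abba', 'xyz', '') {
    if ($s =~ m/b*/) {
        $found{$s} = $&;
        print "m/b*/ matches '$s' at offset ", $-[0], ", \$& is '$&'\n";
    }
}

# With + instead of *, at least one b is required.
my $plus = ('abba' =~ m/b+/) ? $& : undef;
print "m/b+/ on 'abba' matched '$plus'\n";
```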

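At times, you may need to match an exact number of components. The following match statement will be true if five words are present in the $_ variable (quant1.pl):

$_ = "AA AB AC AD AE";
m/(\w+\s*){5}/;

In this example, we are matching at least one word character followed by zero or more white space characters. The {5} quantifier ensures that this combination of components is present five times. The white space must be optional (\s*, not \s+), because the last word in the string is not followed by any white space. The pattern is also fairly loose. It is not anchored and the white space is optional, so it also matches a string with more than five words, or a single run of five word characters such as "ABCDE".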
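To see how loose a five-repetition pattern is, this sketch compares the \s* and \s+ forms with a stricter, anchored pattern. The "exact" pattern and the test strings are illustrative assumptions and are not part of quant1.pl:

```perl
use strict;
use warnings;

my %patterns = (
    # "word, optional white space" five times, unanchored
    loose  => qr/(\w+\s*){5}/,
    # same, but white space required after every word
    spaced => qr/(\w+\s+){5}/,
    # hypothetical stricter check: exactly five words, anchored at both ends
    exact  => qr/^\s*(\w+\s+){4}\w+\s*$/,
);

my %ok;    # $ok{$name}{$string} is 1 if the pattern matched
for my $s ('AA AB AC AD AE', 'ABCDE', 'AA AB AC AD AE AF') {
    for my $name (sort keys %patterns) {
        $ok{$name}{$s} = $s =~ $patterns{$name} ? 1 : 0;
        print "$name: '$s' => $ok{$name}{$s}\n";
    }
}
```

The \s+ form fails on the five-word string, because there are only four spaces to go around. The loose form accepts all three strings. Only the anchored form accepts exactly five words.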

The * and + quantifiers are greedy: they match as many characters as possible while still allowing the overall match to succeed. This may not always be the behavior that you need. You can make a component non-greedy by following its quantifier with a ?.
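The ? suffix is not limited to * and +. It can follow any quantifier, including {n,m}. Here is a small sketch using a made-up string of four a characters:

```perl
use strict;
use warnings;

my $s = 'aaaa';
my %got;    # pattern => text matched ($&)
for my $p ('a+', 'a+?', 'a*?', 'a{2,3}', 'a{2,3}?') {
    if ($s =~ m/$p/) {
        $got{$p} = $&;
        print "m/$p/ on '$s' matched '$&'\n";
    }
}
```

Each greedy form takes as many a's as its upper limit allows (four for a+, three for a{2,3}). Each non-greedy form takes its lower limit (one, zero, and two respectively).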

Use the following file specification in order to look at the * and + quantifiers more closely:

$_ = '/user/Jackie/temp/names.dat';

The regular expression .* will match the entire file specification. This can be seen in the following small program (quant2.pl):

$_ = '/user/Jackie/temp/names.dat';
m/.*/;
print $&;

This program displays

/user/Jackie/temp/names.dat

You can see that the * quantifier is greedy. It matched the whole string. If you add the ? modifier to make the .* component non-greedy, what do you think the program would display (quant3.pl)?

$_ = '/user/Jackie/temp/names.dat';
m/.*?/;
print $&;

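This program displays nothing. The non-greedy .*? matches as few characters as possible, and the fewest characters that * can match is zero. If we change the * to a + (making the component .+?), the program will display /, the single character at the start of the string.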
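The same file specification also shows why non-greedy matching is useful when something follows the quantified component. This sketch is not one of the numbered programs. Both of its patterns look for text between two slashes. The greedy one stretches to the last slash, and the non-greedy one stops at the first:

```perl
use strict;
use warnings;

$_ = '/user/Jackie/temp/names.dat';

# Greedy: .* runs to the end, then backs off to the LAST slash.
m/\/.*\// or die "no greedy match";
my $greedy = $&;
print "greedy:     $greedy\n";      # /user/Jackie/temp/

# Non-greedy: .*? stops at the FIRST slash it can reach.
m/\/.*?\// or die "no non-greedy match";
my $non_greedy = $&;
print "non-greedy: $non_greedy\n";  # /user/
```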

Next, let's look at the concept of pattern memory, which lets you keep bits of matched string around after the match is complete.


dave@cs.cf.ac.uk