Here are some handy uses of the match operator:
m/(.)\1/;
This pattern uses pattern memory to store a single character. Then a back-reference
(\1) is used to repeat the first character. The back-reference
is used to reference the pattern memory while still inside the pattern. Anywhere
else in the program, use the $1 variable. After this statement, $1
will hold the repeated character. This pattern will match any non-newline
character that appears twice in a row.
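As a small illustration of the back-reference, the sketch below looks for the first doubled character in a few words. The sample words and the %firstPair hash are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for illustration.
my @words = ("balloon", "cat", "bookkeeper");
my %firstPair;

foreach my $word (@words) {
    if ($word =~ m/(.)\1/) {
        # Outside the pattern, $1 holds the character that was repeated.
        $firstPair{$word} = $1;
        print("$word: first doubled character is '$1'\n");
    }
    else {
        print("$word: no doubled character\n");
    }
}
```

Only the leftmost doubled character is reported, because the match operator stops at the first match.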
m/^\s*(\w+)/;
After this statement, $1 will hold the first word in the string. Any
whitespace at the beginning of the string will be skipped by the
\s* meta-character sequence. Then the \w+
meta-character sequence will match the next word. Note that the *, which
matches zero or more, is used to match the whitespace because there may not be
any. The +, which matches one or more, is used for the word.
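The following sketch applies the pattern to a few made-up strings. It tests the result of the match before reading $1, because $1 keeps its old value when a match fails.

```perl
use strict;
use warnings;

my @firstWords;

# Hypothetical sample strings; the last one contains only whitespace.
foreach my $text ("   hello world", "goodbye", "   ") {
    if ($text =~ m/^\s*(\w+)/) {
        push(@firstWords, $1);
    }
    else {
        push(@firstWords, "(no word)");
    }
}

print(join(", ", @firstWords), "\n");
```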
m/
    (\w+)      (?# Match a word, store its value into pattern memory)
    [.!?]?     (?# Some strings might hold a sentence. If so, this)
               (?# component will match zero or one punctuation)
               (?# characters)
    \s*        (?# Match trailing whitespace using the * because there)
               (?# might not be any)
    $          (?# Anchor the match to the end of the string)
/x;
After this statement, $1 will hold the last word in the string. You may need to expand the character class, [.!?], by adding more punctuation characters if your strings can end with other marks.
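To show why the character class might need to grow, here is a sketch with a made-up string that ends in two punctuation characters. The expanded class and the * quantifier in the second pattern are assumptions for this example, not part of the original pattern.

```perl
use strict;
use warnings;

# Hypothetical sample: the sentence ends with "!" followed by a quote.
my $text = 'He said "stop!"';

# The original pattern allows at most one of . ! ? after the word.
my $original = ($text =~ m/(\w+)[.!?]?\s*$/) ? $1 : undef;

# An expanded class, repeated with *, also allows quotes and parentheses.
my $expanded = ($text =~ m/(\w+)[.!?"')]*\s*$/) ? $1 : undef;

print("original: ", defined($original) ? $original : "(no match)", "\n");
print("expanded: ", defined($expanded) ? $expanded : "(no match)", "\n");
```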
m/^(\w+)\W+(\w+)$/x;
After this statement, $1 will hold the first word and $2 will
hold the second word, assuming that the pattern matches. The pattern starts with
a caret and ends with a dollar sign, which means that the entire string must match
the pattern. The \w+ meta-character sequence matches one
word. The \W+ meta-character sequence matches the whitespace,
or any other run of non-word characters, between words. You can test for
additional words by adding one \W+(\w+) meta-character sequence for each
additional word to match.
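As a sketch of that extension, the pattern below has one extra \W+(\w+) sequence so that it matches exactly three words. The sample string is made up, and it uses punctuation between the words to show that \W+ is not limited to whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample string with punctuation between the words.
my $text = "red, green; blue";

# In list context the match operator returns the captured values.
my ($first, $second, $third) = ($text =~ m/^(\w+)\W+(\w+)\W+(\w+)$/);

print("$first $second $third\n");
```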
m/^\s*(\w+)\W+(\w+)\s*$/;
After this statement, $1 will hold the first word and $2 will
hold the second word, assuming that the pattern matches. The \s* meta-character
sequences will match any leading or trailing whitespace.
$_ = "This is the way to San Jose.";
$word   = '\w+';   # match a whole word.
$space  = '\W+';   # match at least one non-word character,
                   # such as whitespace.
$string = '.*';    # match any number of anything except
                   # for the newline character.
($one, $two, $rest) = (m/^($word) $space ($word) $space ($string)/x);
After this statement, $one will hold the first word, $two will hold the second word, and $rest will hold everything else in the $_ variable. This example uses variable interpolation to, hopefully, make the match pattern easier to read. This technique also emphasizes which meta-character sequences are used to match words and whitespace. It lets the reader focus on the whole of the pattern rather than the individual pattern components by adding a level of abstraction.
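A variation on the same idea, offered only as a sketch: the qr// operator compiles each piece as a pattern instead of storing it as a plain string. The use of qr// is an assumption of this example and is not part of the original code.

```perl
use strict;
use warnings;

my $text = "This is the way to San Jose.";

# Named pieces, compiled with qr// (an assumption of this sketch).
my $word   = qr/\w+/;   # one word
my $space  = qr/\W+/;   # one or more non-word characters
my $string = qr/.*/;    # anything except a newline

my ($one, $two, $rest) =
    ($text =~ m/^($word) $space ($word) $space ($string)/x);

print("one=$one two=$two rest=$rest\n");
```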
$result = m/
    ^          (?# Anchor the pattern to the start of the string)
    [\$\@\%]   (?# Use a character class to match the first)
               (?# character of a variable name)
    [a-z]      (?# Use a character class to ensure that the)
               (?# first character of the name is a letter)
    \w*        (?# Use a meta-character sequence to ensure that the)
               (?# rest of the variable name is either an)
               (?# alphanumeric or an underscore character)
    $          (?# Anchor the pattern to the end of the)
               (?# string. This means that for the pattern to)
               (?# match, the variable name must be the only)
               (?# value in $_.)
/ix;  # Use the /i option so that the search is
      # case-insensitive and use the /x option to
      # allow whitespace and comments in the pattern.
After this statement, $result will be true if $_ contains a legal variable name and false if it does not.
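The sketch below runs the same pattern, written on one line, against a few made-up candidates. The isVariableName helper is hypothetical. Note that the pattern is a simplification: it rejects names such as $_ whose first character after the type prefix is not a letter.

```perl
use strict;
use warnings;

# Hypothetical helper that wraps the pattern from the text.
sub isVariableName {
    my ($name) = @_;
    return ($name =~ m/^[\$\@\%][a-z]\w*$/i) ? 1 : 0;
}

my %results;
foreach my $candidate ('$count', '@list2', '%h_1', '$9lives', 'count', '$_') {
    $results{$candidate} = isVariableName($candidate);
    print($candidate, $results{$candidate} ? " matches\n" : " does not match\n");
}
```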
$result = m/
    (?# First check for just numbers in $_)
    ^          (?# Anchor to the start of the string)
    \d+        (?# Match one or more digits)
    $          (?# Anchor to the end of the string)
    |          (?# or)
    (?# Now check for hexadecimal numbers)
    ^          (?# Anchor to the start of the string)
    0x         (?# The "0x" sequence starts a hexadecimal number)
    [\da-f]+   (?# Match one or more hexadecimal characters)
    $          (?# Anchor to the end of the string)
/ix;  # The /x option is needed so that the whitespace
      # in the pattern is ignored.
After this statement, $result will be true if $_ contains an integer literal and false if it does not.
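@results = m/\b(?:0x[\da-f]+|\d+)\b/gi;
After this statement, @results will contain a list of all decimal and hexadecimal integer literals in $_. @results will contain an empty list if no literals are found. The pattern uses \b word boundaries instead of the ^ and $ anchors because a pattern anchored to both ends of the string can match at most once.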
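As a sketch of collecting several literals from one string, the example below compares a pattern that is anchored at both ends with one that uses \b word boundaries. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text that holds two literals and one non-literal.
my $text = "Offsets 0x1F and 42, but not x12y.";

# Anchored at both ends: matches only if the whole string is a literal.
my @anchored = ($text =~ m/^\d+$|^0[x][\da-f]+$/gi);

# Word boundaries: finds each literal inside the longer string.
my @literals = ($text =~ m/\b(?:0x[\da-f]+|\d+)\b/gi);

print("anchored found ", scalar(@anchored), " literals\n");
print("boundaries found: @literals\n");
```

The digits in x12y are skipped because there is no word boundary between the x and the 1.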
m/\w\W/;
After this statement is executed, $& will hold the last character of the first word and the character that follows it. If you want only the last character, use pattern memory,
m/(\w)\W/;
Then $1 will be equal to the last character of the first word. If you use the global option together with pattern memory,
@array = m/(\w)\W/g;
then you can create an array that holds the last character of each word that is followed by a non-word character.
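The sketch below compares three variations on a made-up string. The \b version is an addition for this example: it also catches a word at the very end of the string, where no non-word character follows.

```perl
use strict;
use warnings;

my $text = "one two three";

# Without pattern memory, each element is the whole two-character match.
my @pairs = ($text =~ m/\w\W/g);         # ("e ", "o ")

# With pattern memory, each element is only the captured character.
my @lastChars = ($text =~ m/(\w)\W/g);   # ("e", "o")

# A word boundary also matches at the end of the string.
my @allLast = ($text =~ m/(\w)\b/g);     # ("e", "o", "e")

print("pairs: ", scalar(@pairs), "\n");
print("last characters: @lastChars\n");
print("with word boundary: @allLast\n");
```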
m/\W\w/;
After this statement, $& will hold the first character of the second
word and the non-word character (usually whitespace) that immediately precedes it.
While this pattern is the opposite of the pattern that matches the end of words,
it will not match the beginning of the first word! This is because the \W
meta-character must match a character, and no character precedes a word that
starts the string. Simply adding a * meta-character to the pattern after the \W
does not help, because then it would match on zero non-word characters and
therefore match every word character in the string.
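One way around this limitation, shown here only as a sketch, is the \b word-boundary assertion, which matches a position rather than a character. The sample string is made up, and \b is an addition that the original pattern does not use.

```perl
use strict;
use warnings;

my $text = "one two three";

# \W must consume a character, so the first word is skipped.
my @withW = ($text =~ m/\W\w/g);       # (" t", " t")

# \W* may match nothing, so every word character produces a match.
my @withStar = ($text =~ m/\W*\w/g);

# \b matches a position, so the first word is included.
my @first = ($text =~ m/\b(\w)/g);     # ("o", "t", "t")

print("with \\W:  ", scalar(@withW), " matches\n");
print("with \\W*: ", scalar(@withStar), " matches\n");
print("with \\b:  @first\n");
```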
$_ = '/user/Jackie/temp/names.dat';
m!^.*/(.*)!;
After this match statement, $1 will equal names.dat. The match is anchored to the beginning of the string, and the .* component matches everything up to the last slash because regular expressions are greedy. Then the next (.*) matches the file name and stores it into pattern memory. You can store the file path into pattern memory by placing parentheses around the first .* component.
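As a sketch of that last suggestion, the pattern below has parentheses around both .* components, so the path lands in $1 and the file name in $2. The variable names are made up for this example.

```perl
use strict;
use warnings;

my $fullName = '/user/Jackie/temp/names.dat';
my ($path, $file);

# The greedy first group backtracks only as far as the last slash.
if ($fullName =~ m!^(.*)/(.*)!) {
    ($path, $file) = ($1, $2);
    print("path: $path\n");
    print("file: $file\n");
}
```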
m/(?:rock|monk)fish/x;
The alternation meta-character (|) is used to say that either rock or monk followed by fish needs to be found. If you need to know which alternative was found, then use regular parentheses in place of the non-capturing (?:...) group. After the match, $1 will be equal to either rock or monk.
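A brief sketch of the capturing form, using made-up sample strings and a hypothetical %kind hash to record which alternative was found:

```perl
use strict;
use warnings;

my %kind;

# Hypothetical sample strings; the last one matches neither alternative.
foreach my $text ("fresh monkfish", "rockfish stew", "goldfish") {
    if ($text =~ m/(rock|monk)fish/) {
        $kind{$text} = $1;
        print("$text: found $1\n");
    }
    else {
        print("$text: no match\n");
    }
}
```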
# Read the whole file into memory.
open(FILE, "<fndstr.dat") or die("Cannot open fndstr.dat: $!");
@array = <FILE>;
close(FILE);

# Specify which string to find.
$stringToFind = "A";

# Iterate over the array looking for the string.
for ($index = 0; $index <= $#array; $index++) {
    last if $array[$index] =~ /$stringToFind/;
}

# Use $index to print two lines before and two lines after
# the line that contains the match. The range is clipped so
# that it stays inside the array.
if ($index <= $#array) {
    $first = $index - 2 < 0 ? 0 : $index - 2;
    $last  = $index + 2 > $#array ? $#array : $index + 2;
    foreach $line ($first .. $last) {
        print("$line: $array[$line]");
    }
}
There are many ways to perform this type of search, and this is just one of them. This technique is only good for relatively small files because the entire file is read into memory at once. If the input file does not contain the string that you are looking for, the program prints nothing.
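For larger files, one alternative is to read a line at a time and remember only the most recent lines. The sketch below is one such approach under stated assumptions: the findWithContext routine and its parameters are hypothetical, and the sample data is read through an in-memory filehandle so that the example does not depend on fndstr.dat.

```perl
use strict;
use warnings;

# Return the first matching line plus up to $context lines on either
# side, reading one line at a time instead of loading the whole file.
sub findWithContext {
    my ($fh, $pattern, $context) = @_;
    my @before;          # the most recent lines seen before a match
    my @found;           # the lines to return
    my $remaining = -1;  # lines still wanted after the match; -1 = no match yet

    while (my $line = <$fh>) {
        if ($remaining < 0) {
            if ($line =~ /$pattern/) {
                @found = (@before, $line);
                $remaining = $context;
            }
            else {
                push(@before, $line);
                shift(@before) if @before > $context;
            }
        }
        else {
            last if $remaining == 0;
            push(@found, $line);
            $remaining--;
        }
    }
    return @found;
}

# Hypothetical sample data: nine numbered lines.
my $data = join("", map { "line $_\n" } 1 .. 9);
open(my $fh, '<', \$data) or die("Cannot open in-memory file: $!");
my @lines = findWithContext($fh, 'line 5', 2);
close($fh);

print(@lines);
```

Only the first match is reported, which mirrors the behavior of the array-based version.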