In this section we will develop a Perl script that opens a log file and iterates over its lines. It is usually unwise to read an entire log file into memory because log files can get quite large, even over 113 megabytes!
Regardless of how you'd like to process the data, you must open a log file and read it. You can read each entry into one variable for processing, or you can split the entry into its components. To read each line into a single variable, use the following code sample:
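    $LOGFILE = "access.log";
    open(LOGFILE, '<', $LOGFILE) or die("Could not open log file: $!");
    # while reads one line per iteration; foreach would load the whole file first.
    while ($line = <LOGFILE>) {
        chomp($line);    # remove the newline from $line.
        # do line-by-line processing.
    }
    close(LOGFILE);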
Note: If you don't have your own server logs, you can use the file server.log that is included on the CD-ROM that accompanies this book.
The code snippet will open the log file for reading and will access the file one line at a time, loading the line into the $line variable. This type of processing is pretty limiting because you need to deal with the entire log entry at once.
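A more popular way to read the log file is to split the contents of the entry into different variables. For example, logfile.pl, shown below, uses the split() function and some additional processing to assign values to 11 variables. It operates as follows: each entry is split on whitespace into ten fields; the time is copied out of the date field with substr(); the date field is trimmed to remove the leading '[' character; the leading double-quote is removed from the request method; and chop() removes the trailing ']' from the time zone offset and the trailing double-quote from the protocol.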
The Perl code for logfile.pl is:
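    #!/usr/bin/perl -w

    $LOGFILE = "access.log";
    open(LOGFILE, '<', $LOGFILE) or die("Could not open log file: $!");
    while ($line = <LOGFILE>) {
        # split the entry on whitespace into the ten raw fields.
        ($site, $logName, $fullName, $date, $gmt,
         $req, $file, $proto, $status, $length) = split(' ', $line);
        $time = substr($date, 13);       # the time follows "[dd/Mon/yyyy:"
        $date = substr($date, 1, 11);    # drop the leading '['
        $req  = substr($req, 1);         # drop the leading double-quote
        chop($gmt);                      # drop the trailing ']'
        chop($proto);                    # drop the trailing double-quote
        # do line-by-line processing.
    }
    close(LOGFILE);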
If you print out the variables, you might get a display like this:
    $site     = ros.algonet.se
    $logName  = -
    $fullName = -
    $date     = 09/Aug/1996
    $time     = 08:30:52
    $gmt      = -0500
    $req      = GET
    $file     = /~jltinche/songs/rib_supp.gif
    $proto    = HTTP/1.0
    $status   = 200
    $length   = 1543
You can see that after the split is done, further manipulation is needed in order to "clean up" the values inside the variables. At the very least, the square brackets and the double-quotes need to be removed.
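The split-and-trim steps can also be collected into a subroutine, which makes it easy to try them on a single entry without a log file. The following is only a sketch: the name parse_entry and the use of a hash are illustrative assumptions, not part of logfile.pl, and the sample entry is assembled from the values in the display above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_entry is a hypothetical helper name. It applies the same
# split-and-trim steps as logfile.pl to a single log entry and
# returns a reference to a hash of the 11 values.
sub parse_entry {
    my ($line) = @_;
    my %entry;

    # split(' ', ...) splits on runs of whitespace. The trailing newline
    # counts as whitespace, so it does not end up in the last field.
    @entry{qw(site logName fullName date gmt req file proto status length)}
        = split(' ', $line);

    $entry{time} = substr($entry{date}, 13);       # the time follows "[dd/Mon/yyyy:"
    $entry{date} = substr($entry{date}, 1, 11);    # drop the leading '['
    $entry{req}  = substr($entry{req}, 1);         # drop the leading double-quote
    chop($entry{gmt});                             # drop the trailing ']'
    chop($entry{proto});                           # drop the trailing double-quote

    return \%entry;
}

# Sample entry assembled from the values shown in the display above.
my $sample = 'ros.algonet.se - - [09/Aug/1996:08:30:52 -0500] '
           . '"GET /~jltinche/songs/rib_supp.gif HTTP/1.0" 200 1543' . "\n";

my $entry = parse_entry($sample);
print "$_ = $entry->{$_}\n"
    for qw(site date time gmt req file proto status length);
```

Returning a hash reference keeps the 11 values together, so they can be passed to other subroutines as a single argument. Note that the fixed substr() offsets assume the date field always has the form [dd/Mon/yyyy:hh:mm:ss.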
You could also use a regular expression to extract the information from the log file entries. This approach is more straightforward, assuming that you are comfortable with regular expressions (and you should be by now).
The code for logfile_regex.pl is as follows:
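    #!/usr/bin/perl -w

    $LOGFILE = "access.log";
    open(LOGFILE, '<', $LOGFILE) or die("Could not open log file: $!");
    $w = "(.+?)";    # pattern for one captured field
    while ($line = <LOGFILE>) {
        # Skip entries that do not match. The final $ anchors the last
        # field; without it, the non-greedy $w captures only one character.
        next unless $line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/;
        $site     = $1;
        $logName  = $2;
        $fullName = $3;
        $date     = $4;
        $time     = $5;
        $gmt      = $6;
        $req      = $7;
        $file     = $8;
        $proto    = $9;
        $status   = $10;
        $length   = $11;
        # do line-by-line processing.
    }
    close(LOGFILE);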
The main advantage to using regular expressions to extract information is the ease with which you can adjust the pattern to account for different log file formats. If you use a server that delimits the date/time item with curly brackets, you only need to change the line with the matching operator to accommodate the different format.
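As an illustration of that point, here is a sketch of the same match adjusted for a date/time item delimited by curly brackets. The sample entry is hypothetical (the values displayed earlier, with the square brackets swapped for curly ones), and collecting the captures into the array @fields is an assumption made for brevity. Compared with the square-bracket pattern, only the two bracket characters change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $w = "(.+?)";    # pattern for one captured field

# Hypothetical entry: the date/time item is delimited by { } instead of [ ].
my $line = 'ros.algonet.se - - {09/Aug/1996:08:30:52 -0500} '
         . '"GET /~jltinche/songs/rib_supp.gif HTTP/1.0" 200 1543';

# Same pattern as for the square-bracket format, except that
# \[ and \] have become \{ and \}. The final $ anchors the last field.
my @fields = $line =~ m/^$w $w $w \{$w:$w $w\} "$w $w $w" $w $w$/;

if (@fields) {
    my ($site, $logName, $fullName, $date, $time, $gmt,
        $req, $file, $proto, $status, $length) = @fields;
    print "date = $date, time = $time, gmt = $gmt\n";
    print "status = $status, length = $length\n";
}
else {
    print "Entry did not match the expected format.\n";
}
```

In list context the match operator returns the captured values, so an empty @fields means the entry did not match and can be skipped.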