One easy and useful analysis is to count how many times each document at your site has been visited. The access.pl program reports the access counts of documents whose names begin with the letter s.
Note: The parseLogEntry() function reads the log entry from $_ instead of taking it as a parameter. This eliminates the need to pass parameters, but relying on a global variable is generally considered bad programming practice. In a program this small, it is perhaps acceptable.
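access.pl operates as follows:
1. Turn on warnings with the -w switch.
2. Define the format for the report's detail line.
3. Define the format for the report's heading.
4. Define the parseLogEntry() function, which splits the log entry in $_ into its fields.
5. Open the log file.
6. For each entry, extract the file specification and then the filename.
7. If a filename was found and it begins with s, increment the count for that file specification.
8. Close the log file.
9. Sort the file specifications and write one report line for each.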
The Perl code for access.pl is as follows:
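#!/usr/bin/perl -w

# Detail line: document name on the left, access count on the right.
format =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @>>>>>>>
$document,                              $count
.

# Page heading: centered title and page number ($%).
format STDOUT_TOP =
@||||||||||||||||||||||||||||||||||||  Pg @<
"Access Counts for S* Documents",      $%

Document                                Access Count
--------------------------------------- ------------
.

# Split the log entry in $_ into its eleven fields.
sub parseLogEntry {
    my($w) = "(.+?)";
    # The trailing $ makes the last field capture the whole byte count.
    m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/ or return();
    return($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11);
}

$LOGFILE = "access.log";
open(LOGFILE) or die("Could not open log file.");
foreach (<LOGFILE>) {
    $fileSpec = (parseLogEntry())[7];
    next unless defined($fileSpec);    # skip entries that do not parse

    # Assigning the match in list context leaves $fileName undefined
    # when the match fails, instead of relying on $1.
    ($fileName) = $fileSpec =~ m!.+/(.+)!;

    # some requests don't specify a filename, just a directory.
    if (defined($fileName)) {
        $docList{$fileSpec}++ if $fileName =~ m/^s/i;
    }
}
close(LOGFILE);

foreach $document (sort(keys(%docList))) {
    $count = $docList{$document};
    write;
}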
This program displays:
   Access Counts for S* Documents       Pg 1

Document                                Access Count
--------------------------------------  ------------
/~bamohr/scapenow.gif                              1
/~jltinche/songs/song2.gif                         5
/~mtmortoj/mortoja_html/song.html                  1
/~scmccubb/pics/shock.gif                          1
Two points about this program deserve comment. First, the program takes advantage of the fact that Perl's variables default to global scope. The main loop assigns each log file entry to $_, and parseLogEntry() reads $_ directly. This is acceptable in a small program, but in larger programs you should pass parameters and use local variables. Second, it takes two steps to select files that start with a particular letter: the filename must first be extracted from $fileSpec, and only then can it be tested inside the if statement. If a request names only a directory, the server will probably serve a default document such as index.html. This program does not take that into account; it simply ignores the log file entry when no file was explicitly requested.
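Both points can be illustrated in one short sketch. It is not a replacement for access.pl: parseLogLine() and documentSpec() are hypothetical names, the sample log lines are invented, and treating a directory request as a request for index.html is an assumption about how the server is configured. For brevity, the sketch counts every document rather than only those beginning with s.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical variant of parseLogEntry() that takes the log line as a
# parameter instead of reading the global $_. In list context it returns
# the eleven fields, or an empty list if the line does not match.
sub parseLogLine {
    my($line) = @_;
    my($w) = "(.+?)";
    return($line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/);
}

# Hypothetical helper: assume the server serves index.html when only a
# directory is requested, and return the completed file specification.
sub documentSpec {
    my($fileSpec) = @_;
    $fileSpec .= "index.html" if $fileSpec =~ m!/$!;
    return($fileSpec);
}

# Invented sample entries; a real program would read access.log.
my @sampleLines = (
    'ppp1.example.com - - [24/Aug/1996:10:15:02 -0400] "GET /~user/songs/song2.gif HTTP/1.0" 200 3121',
    'ppp2.example.com - - [24/Aug/1996:10:16:40 -0400] "GET /~user/ HTTP/1.0" 200 512',
    'ppp3.example.com - - [24/Aug/1996:10:17:11 -0400] "GET /~user/index.html HTTP/1.0" 200 512',
);

my %docList;
foreach my $line (@sampleLines) {
    my @fields = parseLogLine($line);
    next unless @fields;                  # skip lines that do not parse
    $docList{documentSpec($fields[7])}++;
}

foreach my $document (sort(keys(%docList))) {
    printf("%-40s %8d\n", $document, $docList{$document});
}
```

With these sample lines, the directory request and the explicit request for index.html are counted as the same document. Every variable is declared with my(), so nothing leaks into the global scope.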
You can use this same counting technique to display the most frequent remote sites that contact your server. You can also check the status code to see how many requests have been rejected. The next section looks at status codes.
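As a rough sketch of the remote-site count mentioned above, the same hash-based technique can be applied to the first field of each entry. The sample log lines and the names parseLogLine() and %siteList are invented for illustration; a real program would read the entries from access.log.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical parameter-passing version of parseLogEntry(); returns the
# eleven fields in list context, or an empty list if there is no match.
sub parseLogLine {
    my($line) = @_;
    my($w) = "(.+?)";
    return($line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/);
}

# Invented sample entries.
my @sampleLines = (
    'ppp1.example.com - - [24/Aug/1996:10:15:02 -0400] "GET /a.html HTTP/1.0" 200 3121',
    'ppp2.example.com - - [24/Aug/1996:10:16:40 -0400] "GET /b.html HTTP/1.0" 200 512',
    'ppp1.example.com - - [24/Aug/1996:10:17:11 -0400] "GET /c.html HTTP/1.0" 200 870',
);

# Count requests per remote site; the site is field 0 of each entry.
my %siteList;
foreach my $line (@sampleLines) {
    my @fields = parseLogLine($line);
    next unless @fields;                  # skip lines that do not parse
    $siteList{$fields[0]}++;
}

# Sort by descending count, breaking ties alphabetically by site name.
my @sites = sort { $siteList{$b} <=> $siteList{$a} or $a cmp $b }
            keys(%siteList);

foreach my $site (@sites) {
    printf("%-30s %5d\n", $site, $siteList{$site});
}
```

Sorting on the hash values rather than the keys is the only real change from access.pl; the counting loop is the same.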