next up previous contents
Next: Reading a Log File Up: Using Perl with Web Previous: Using Perl with Web

Server Log Files

The most useful tool to assist in understanding how and when your Web site pages and applications are being accessed is the log file generated by your Web server. This log file contains, among other things, which pages are being accessed, by whom, and when.

Each Web server will provide some form of log file that records who and what accesses a specific HTML page or graphic. A terrific site to get an overall comparison of the major Web servers can be found at

http://www.webcompare.com/.

From this site you can see which Web servers follow the CERN/NCSA common log format that is detailed below. You can also find out which servers can customize their log files or write to multiple log files. You might be surprised at the number of Web servers on the market.

Understanding the contents of the server log files is a worthwhile endeavor. And in this section, you'll see several ways that the information in the log files can be manipulated. However, if you're like most people, you'll use one of the log file analyzers that you'll read about in the section "Existing Log File Analyzing Programs" to do most of your work. After all, you don't want to create a program that others are giving away for free.

Note: This section about server log files is one that you can read when the need arises. If you are not actively running a Web server now, you won't be able to get full value from the examples. The CD-ROM that accompanies this book has a sample log file for you to experiment on, but it is very limited in size and scope.

Nearly all of the major Web servers use a common format for their log files. These log files contain information such as the IP address of the remote host, the document that was requested, and a timestamp. The syntax for each line of a log file is:

site logName fullName [date:time GMToffset] "req file proto" status length

Because that line of syntax is relatively meaningless, here is a line from a real log file:

204.31.113.138 - - [03/Jul/1996:06:56:12 -0800]
    "GET /PowerBuilder/Compny3.htm HTTP/1.0" 200 5593

Even though the line is split in two here, remember that inside the log file it is a single line.
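As a rough illustration of how the eleven items in the syntax map onto the sample line, the following Perl sketch splits the line with a regular expression. The variable names simply mirror the syntax line; the pattern itself is an assumption that fits this sample and may need loosening for log files written by other servers.

```perl
use strict;
use warnings;

# The sample entry, joined back into the single line it occupies in the log.
my $line = '204.31.113.138 - - [03/Jul/1996:06:56:12 -0800] '
         . '"GET /PowerBuilder/Compny3.htm HTTP/1.0" 200 5593';

# One capture group per item in the syntax line (eleven in all).
my ($site, $logName, $fullName, $date, $time, $gmtOffset,
    $req, $file, $proto, $status, $length) = $line =~ m{
        ^(\S+)\s+(\S+)\s+(\S+)\s+         # site logName fullName
        \[([^:]+):(\S+)\s+([^\]]+)\]\s+   # [date:time GMToffset]
        "(\S+)\s+(\S+)\s+([^"]+)"\s+      # "req file proto"
        (\S+)\s+(\S+)\s*\z                # status length
    }x or die "Line does not match the expected format\n";

print "site=$site date=$date time=$time file=$file status=$status length=$length\n";
```

For the sample line, this should print the remote address, the date and time of the request, the requested file, the status code 200, and the length 5593. The two dashes in the sample end up in $logName and $fullName.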

Each of the eleven items listed in the above syntax and example is described in the following list.

The requested file (the file item in the syntax) can be given in one of three forms:

Implied Path and Filename: accesses a file in a user's home directory. For example, /foo/ could be expanded into /user/foo/homepage.html. The /user/foo directory is the home directory for the user foo, and homepage.html is the default file name for any user's home page. Implied paths are hard to analyze because you need to know how the server is set up and because the server's setup may change.

Relative Path and Filename: accesses a file in a directory that is specified relative to a user's home directory. For example, /foo/cooking.html will be expanded into /user/foo/cooking.html.

Full Path and Filename: accesses a file by explicitly stating the full directory and filename. For example, /user/foo/biking/mountain/index.html.
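To make the three forms more concrete, here is a minimal Perl sketch that classifies a logged path and expands it the way the examples above suggest. The settings $userRoot and $defaultFile and the function expandPath are hypothetical names invented for this illustration; a real server takes these values from its own configuration, and its expansion rules may differ.

```perl
use strict;
use warnings;

# Hypothetical server settings, chosen to match the examples in the text.
my $userRoot    = '/user';
my $defaultFile = 'homepage.html';

# Return the kind of path and the expanded file name for a logged request.
sub expandPath {
    my ($path) = @_;

    # Full path: already names the complete directory and file.
    return ('full', $path) if $path =~ m{^\Q$userRoot\E/};

    # Implied path: ends in a slash, so the default file name is appended.
    return ('implied', "$userRoot$path$defaultFile") if $path =~ m{/$};

    # Relative path: resolved against the user's home directory.
    return ('relative', "$userRoot$path");
}

for my $path ('/foo/', '/foo/cooking.html',
              '/user/foo/biking/mountain/index.html') {
    my ($kind, $expanded) = expandPath($path);
    print "$path ($kind) -> $expanded\n";
}
```

Because the result depends entirely on the two assumed settings, any analysis built on a routine like this needs to be updated whenever the server's setup changes.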

Web servers can have many different types of log files. For example, you might see a proxy access log or an error log. In this chapter, we'll focus on the access log, where the Web server tracks every access to your Web site.


dave@cs.cf.ac.uk