Tracking Web Usage with Network Flight Recorder

Chad Childers
Ford Motor Company
chad@ford.com

Linda Bangert
Quantum Solutions, Inc.
lmb@interguide.com

Mike O'Connor
Silicon Graphics, Inc.
mjo@dojo.mi.org

Abstract

The hot topic in security circles these days is Marcus Ranum's Network Flight Recorder [1], which is normally thought of as a tool to track attempted break-ins. The NFR software is designed to sit on a separate computer from your server and analyze network traffic. Like a flight recorder, NFR survives even if your system doesn't. It can recover or monitor online transaction records, keep historical statistics about how the network grows, and generate detailed breakdowns of how your network services are being used and by whom.

The great strength of NFR lies in its configurability. You can choose to track only HTTP packets, and then you have a tool for validating HTTP log consistency. Because NFR is not connected with your Web server, issues with virtual hosts are easily tracked, and you are not filling up space on your Web server or using valuable CPU cycles for log analysis. Tracking unique IP addresses is more important in Web usage analysis than hit count, and NFR is ideally suited for this. Once you understand the potential problems in HTTP serving, NFR can also serve as a tool for heading off those problems before they cause user complaints.

Sniffers and Security - the design intent behind NFR

The Internet is based on TCP/IP, a packet-switching protocol suite. On a shared local network such as classic Ethernet, this means that all data to or from the attached computers travel across the same wire, taking turns. Each packet is labeled with a header telling which computer sent the packet and which computer should receive it.

Each computer on a network has a unique address, and packet switching lets multiple pairs of computers communicate across a shared network by dividing each conversation into small packets and having the computers take turns sending them. Normal IP stacks ignore packets that are not addressed to them. [2] A network sniffer is a device, hardware or software, whose network interface is placed in promiscuous mode, meaning that it keeps track of all the packets going by on the network, not just those whose headers say they are for it.
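To make the header idea concrete, here is a minimal Python sketch that pulls the source and destination addresses out of a raw 20-byte IPv4 header and applies the "is this packet for me?" test. It is a simplification: real interfaces filter on the Ethernet (MAC) address before the IP stack ever sees a packet, and the names `parse_ipv4_header` and `accept` are our own hypothetical helpers, not part of any sniffer's API.

```python
import socket
import struct
from typing import NamedTuple


class IPv4Header(NamedTuple):
    src: str
    dst: str
    protocol: int  # 6 = TCP, 17 = UDP


def parse_ipv4_header(packet: bytes) -> IPv4Header:
    """Extract addresses and protocol from the fixed part of an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Byte 9 holds the protocol number; bytes 12-15 and 16-19 hold the
    # source and destination addresses in network byte order.
    protocol = packet[9]
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return IPv4Header(socket.inet_ntoa(src), socket.inet_ntoa(dst), protocol)


def accept(header: IPv4Header, my_address: str, promiscuous: bool) -> bool:
    """A normal stack keeps only packets addressed to it; a sniffer keeps all."""
    return promiscuous or header.dst == my_address
```

The only difference between an ordinary host and a sniffer in this model is the `promiscuous` flag: the same packet is dropped by one and kept by the other.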

Sniffers were developed to track network problems, and are good for that, but for a long time, network "sniffers" weren't exceptionally helpful to people working with Web servers. You could certainly use tools such as Network General Sniffer [3] or a Unix box running snoop to dump the network data coming along your interface into a file. If you had a good deal of knowledge about how TCP/IP worked, you could stare at data after the fact and make some general determinations about your network. But until recently, it's been very difficult to apply sniffer data toward a particular network protocol, like HTTP or SSL, in any meaningful way. The tools just weren't there, or were prohibitively expensive and complex.

Enter Network Flight Recorder (NFR) and other intrusion detection systems:

The NFR uses a promiscuous packet interface to pass visible traffic into an internally meta-programmed decision engine which routes information about packets and their contents into statistical or logging backends. In addition to packet analysis and collection, the NFR's internal architecture permits network managers to sample interesting portions of network traffic for logging or statistical analysis. The NFR programming language is simple, but powerful enough that you can perform reasonable analysis on traffic before choosing to record it. For example, you might analyze SMTP transactions but only choose to record those relating to a user who is sending spam or abusive E-mail. The analysis language includes a capability for generating alert messages which the rest of the system queues, multiplexes, and delivers. A simplified hyper-query interface allows extensive browsing of the NFR's stored datasets and statistics from any Java-enabled browser. [4]

Network Flight Recorder fills an important void in the security tool arena. It is a general purpose network traffic recorder. It is perfect for sites that need to verify their security measures. NFR has a meta-programmed decision engine that is extremely flexible. The NFR N-code language is designed to deal with streams of network traffic, with data types like "source IP address" and "url" along with more general purpose programmatic functions. The NFR inserts your N-code into a high performance filtering engine. For example, one of the simple programs included with NFR is "Watch what clients send to web servers". The user can get different reports by simply filling out an on-screen form, or perform more advanced reporting by changing the N-code. The N-code implementation is shown below.

Figure 4: filter N-code for showing Web client requests

# Copyright(C) 1997 Network Flight Recorder, Inc.
# All rights reserved.
#
# Use and distribution of this software and its source code
# are governed by the terms and conditions of the
# Network Flight Recorder Software License ("LICENSE.TXT")
#
# By Mark Sienkiewicz / NFR
#

# This filter serves two purposes: to record client requests
# made to your web servers, and to serve as example in the LISA paper.
#

watchservers_schema = library_schema:new ( 1,
[ "time", "int", "ip", "ip", "str" ] , scope() );

# list of web servers to watch. List IP address of servers or a netmask
# that matches all. use 0.0.0.0:0.0.0.0 to match any server

my_web_servers = [ 0.0.0.0:0.0.0.0 ] ;

# gather data the client sends to a web server. This will only see
# web servers on port 80. If I had web servers on other ports,
# I would make this more elaborate. (coming soon?)

filter watch tcp ( client, dport: 80 )
{

## [ code deleted here for publication brevity ]
# save the time, the connection hash, the client,
# the server, and the command to a list
record system.time, tcp.connHash, tcp.connSrc, tcp.connDst,
$t to watchservers_list;

## [ code deleted here for publication brevity ]

}

watchservers_list = recorder ("bin/list packages/web/watchservers.cfg",
"watchservers_schema" );
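For readers who want to experiment without an NFR installation, the logic of the watch filter can be approximated in Python. This is a hypothetical re-creation, not NFR's API: `ClientData`, `watch`, and the row layout are our own names, chosen to mirror the schema's five columns (time, connection hash, client, server, command). Splitting the client data into one row per line is an assumption based on the output in Figure 8; the code deleted from the published filter is not reconstructed here.

```python
import ipaddress
import time
from typing import List, NamedTuple, Optional, Tuple


class ClientData(NamedTuple):
    """Hypothetical stand-in for client-to-server data on one TCP connection."""
    conn_hash: int   # plays the role of tcp.connHash
    client: str      # plays the role of tcp.connSrc
    server: str      # plays the role of tcp.connDst
    dport: int
    payload: bytes


# Same meaning as my_web_servers; 0.0.0.0/0 matches any server.
MY_WEB_SERVERS = [ipaddress.ip_network("0.0.0.0/0")]

# One row per request line: time, connection hash, client, server, command.
Row = Tuple[int, int, str, str, str]


def watch(data: ClientData, rows: List[Row], now: Optional[int] = None) -> None:
    """Append one row per line the client sent to a watched server on port 80."""
    if data.dport != 80:
        return
    server = ipaddress.ip_address(data.server)
    if not any(server in net for net in MY_WEB_SERVERS):
        return
    stamp = int(time.time()) if now is None else now
    lines = data.payload.decode("latin-1").split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final CRLF
    for line in lines:
        rows.append((stamp, data.conn_hash, data.client, data.server, line))
```

As in the N-code, the port and server tests come first, so uninteresting traffic is discarded before anything is recorded.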

NFR is free for noncommercial and research use. It can be installed [5] on an Intel-based server running Linux or BSD, or a Sun server running Solaris. In addition, NFR, Inc. promises to keep the source code freely available. It is a great addition to the security administrator's toolkit.

In their efforts to create tools to do post-mortem analysis of hacker attacks, the authors of NFR have also created better tools for network monitoring that have some intelligence about the protocols they monitor, are more programmable, and have nice visual interfaces to help show you data that's important to you. The marriage of this emerging technology to the Web can lead to new tools to qualify and quantify Web behavior.

The Web Connection

I.  HTTP log integrity: If you want to make sure that the logging mechanisms for your particular httpd are correct, or that you're seeing particular behavior, or if you're an ISP that bills by number of hits, it would be convenient to have an additional, independent logging mechanism to ensure the integrity of your data.

Figure 5: Typical HTTP access_log

Dialup153.tcs.tulane.edu - - [22/Mar/1998:20:44:26 -0500] "GET /stilyagi/book.html HTTP/1.0" 200 4565
crawl4.atext.com - - [22/Mar/1998:21:02:12 -0500] "GET /~chad/gene/D0003/156.html HTTP/1.0" 200 868
grampa.gendex.com - - [22/Mar/1998:21:10:39 -0500] "GET /~chad/gene/GENDEX.txt HTTP/1.0" 304 -
tc3-35.dialup.coast.net - - [22/Mar/1998:21:23:16 -0500] "GET /~chad/wwwnfr.html HTTP/1.0" 200 12505
tc3-35.dialup.coast.net - - [22/Mar/1998:21:23:22 -0500] "GET /~chad/packet.gif HTTP/1.0" 200 1961
tc3-35.dialup.coast.net - - [22/Mar/1998:21:23:22 -0500] "GET /~chad/sniff.gif HTTP/1.0" 200 4688
ww-wc01.proxy.aol.com - - [22/Mar/1998:21:31:51 -0500] "GET /~sharon/resume.html HTTP/1.0" 200 3100 
199.221.88.74 - - [22/Mar/1998:22:11:51 -0500] "GET /cons/confusion.html HTTP/1.0" 200 3461
199.221.88.74 - - [22/Mar/1998:22:11:58 -0500] "GET /pic/alien.g.gif HTTP/1.0" 200 1200
199.221.88.74 - - [22/Mar/1998:22:12:50 -0500] "GET /stilyagi/about.html HTTP/1.0" 200 16499

Typical log analysis tools like Analog [6] can turn the common log format into something easier to read, but can only track what an HTTP server can log.
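The basic counting that such tools do can be sketched in a few lines of Python. This is an illustration, not Analog's code; the regular expression assumes the common log format shown in Figure 5, and `parse_clf` and `summarize` are hypothetical names. Note that one host name, such as an AOL proxy, may stand for many users, so distinct hosts are only an approximation of distinct visitors.

```python
import re
from collections import Counter
from typing import Dict, Iterable, NamedTuple

# host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


class Hit(NamedTuple):
    host: str
    when: str
    request: str
    status: int
    size: int


def parse_clf(line: str) -> Hit:
    """Parse one common-log-format line; a size of "-" counts as 0 bytes."""
    m = CLF.match(line)
    if m is None:
        raise ValueError(f"not a common-log-format line: {line!r}")
    host, when, request, status, size = m.groups()
    return Hit(host, when, request, int(status), 0 if size == "-" else int(size))


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Total hits, distinct hosts, and bytes served for a batch of log lines."""
    hits = [parse_clf(line) for line in lines if line.strip()]
    per_host = Counter(hit.host for hit in hits)
    return {
        "hits": len(hits),
        "distinct_hosts": len(per_host),
        "bytes": sum(hit.size for hit in hits),
    }
```

Run over the ten lines of Figure 5, this gives 10 hits from only 6 distinct hosts, which is exactly the gap between hit counts and unique addresses discussed earlier.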
Figure 6: Typical Web traffic analysis w/ analog
Web Server Statistics

Program started at Sun-15-Mar-1998 23:50 local time.
Analysed requests from Fri-11-Nov-1997 09:04 to Sun-15-Jan-1998 22:30 (65.6 days).

Total successful requests: 156 613 154 (9 660 543)
Average successful requests per day: 632 626 (1 380 077)
Total successful requests for pages: 9 680 303 (569 546)
Total failed requests: 315 167 (15 439)
Total redirected requests: 2 911 322 (280 036)
Number of distinct files requested: 62 272 (28 069)
Approximate number of distinct hosts served: 123 835 (57 321)
Approximate number of new hosts served in last 7 days: 2 017
Total data transferred: 276 381 Mbytes (13 871 Mbytes)
Average data transferred per day: 1 116 Mbytes (1 982 Mbytes)
(Figures in parentheses refer to the last 7 days).

Monthly report

   month:  #reqs: 
--------  ------  
Nov 1997: 119865: 
Dec 1997: 121214: 
Jan 1998: 224960: 


Using tools like NFR, it is possible to do much more. For example, you can look at all traffic on your network on the basis of port number (for instance, port 80, the default for HTTP), list all web servers on your network (by scanning all TCP traffic for HTTP headers), and look at all HTTP requests going across your network on port 80. These functions can be combined to show all web traffic on any port. Any sniffer could give you this data, but NFR combines ease of use with a variety of output formats, including useful graphics such as bar charts, histograms, etc.

At this point, it may be worth mentioning that cookies are readily viewed by NFR or any other sniffer. Personal data is often stored in a cookie for the purpose of personalizing a page, listing a stock portfolio, or saving login and password information. Even if the cookie is encrypted, it may be vulnerable to electronic privacy invasion. [7]  To check for a privacy invasion, you could log what goes into httpd as well as what comes out of httpd and compare.

Figure 8: NFR Query of Web client requests
Time                      TCP Hash  Client           Server          Command
Sat Jul 18 06:40:34 1998  244       141.217.139.132  204.71.177.172  GET / HTTP/1.0
Sat Jul 18 06:40:34 1998  244       141.217.139.132  204.71.177.172  User-Agent: Mozilla/4.03 [en] (Win95; I)
Sat Jul 18 06:40:34 1998  244       141.217.139.132  204.71.177.172  Host: my.yahoo.com
Sat Jul 18 06:40:34 1998  244       141.217.139.132  204.71.177.172  Cookie: Y=v=1&n=45femu12ukc9n&l=3h_bkl_/o&p=m281f291131v; M=dp=sum&lg=us; T=z=35af3d15
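Pulling the cookies out of a captured request takes very little code, which is rather the point. The sketch below is a hypothetical helper, not an NFR feature; it splits any Cookie header in a request into its name=value pairs. Applied to the request in Figure 8, it yields the three cookies Y, M, and T with their values intact.

```python
from typing import Dict


def cookies_from_request(request: str) -> Dict[str, str]:
    """Return the name=value pairs from any Cookie: header in an HTTP request."""
    jar: Dict[str, str] = {}
    for line in request.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "cookie":
            for pair in value.split(";"):
                key, eq, val = pair.strip().partition("=")
                if eq:
                    jar[key] = val  # keep everything after the first "="
    return jar
```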

Figure 9: NFR Query of TCP packets, source port 80
Source Port  Source Host     Destination Host  Bytes   Packets
80           24.1.109.239    141.217.139.109   384164  322
80           24.4.86.182     141.217.139.109   60685   45
80           24.92.105.152   141.217.139.65    464703  413
80           38.15.19.75     141.217.139.109   351     4
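A tally like Figure 9 is a simple aggregation. As a sketch, assuming packets have already been decoded into (source port, source host, destination host, length) tuples, which is our own hypothetical representation, bytes and packet counts can be summed per flow for one source port:

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# Hypothetical decoded packet: (source port, source host, destination host, bytes).
Packet = Tuple[int, str, str, int]
FlowKey = Tuple[int, str, str]


def tally_by_source_port(
    packets: Iterable[Packet], port: int = 80
) -> Dict[FlowKey, Tuple[int, int]]:
    """Sum bytes and packets per (port, source, destination) for one source port."""
    totals: DefaultDict[FlowKey, List[int]] = defaultdict(lambda: [0, 0])
    for sport, src, dst, length in packets:
        if sport == port:
            entry = totals[(sport, src, dst)]
            entry[0] += length  # bytes
            entry[1] += 1       # packets
    return {key: (nbytes, count) for key, (nbytes, count) in totals.items()}
```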

II. Logging ephemeral data that might not otherwise be logged: Most Web responses are so short that they fit entirely in the TCP/IP socket send buffer. The server's write call returns as soon as the data has been copied into that buffer, so the httpd logs the transaction as complete without knowing whether the data ever reached the client. This makes it very difficult to determine whether even the small answers to Web server requests actually made it to the other end. You could make the TCP/IP send buffer smaller, but then your Web server has to do more work. Answer: use an NFR to see what kind of bandwidth your WWW server really takes.
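The socket-buffer effect is easy to demonstrate on a single machine. In this Python sketch (the helper name is hypothetical), a client connects to a listener that accepts the connection but never reads from it, yet `send` still reports every byte as sent, because the data only had to reach the kernel's buffers. Exact buffer sizes vary by operating system; a few kilobytes fit comfortably on typical systems.

```python
import socket


def bytes_accepted_by_kernel(payload: bytes) -> int:
    """Send payload to a peer that never reads; return what send() reports."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port on loopback
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()      # connection accepted, never read
    try:
        # Succeeds once the bytes are in the kernel's buffers, even though
        # no application on the other end has received them.
        return client.send(payload)
    finally:
        client.close()
        peer.close()
        listener.close()
```

On a typical system, `bytes_accepted_by_kernel(b"x" * 4096)` reports all 4096 bytes as sent, which is all a Web server knows when it writes its log entry.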

With NFR, you can visualize data that isn't going to be obvious with some of the WWW logs. You can see how much of that 1.5GB JPEG the user with the 14.4k modem really downloaded before getting impatient and canceling the download. You can see 'dropped' connections. You can look for retransmits. In general, you can see what kind of bandwidth your WWW server is really using, data which won't be obvious by just counting transactions within the httpd logs.
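Spotting retransmits in a capture comes down to watching TCP sequence numbers. A minimal heuristic, assuming one direction of one connection has been reduced to (sequence number, payload length) pairs in capture order: a data segment that ends at or below the highest byte already seen was probably sent before. Sequence-number wraparound is ignored, and a late segment that fills a gap left by reordering would also be counted, so treat the result as an estimate. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def count_retransmits(segments: Iterable[Tuple[int, int]]) -> int:
    """Count data segments that add no bytes beyond those already seen.

    segments: (sequence number, payload length) pairs in capture order for
    one direction of one TCP connection. Wraparound is not handled.
    """
    highest_end: Optional[int] = None  # one past the highest byte seen so far
    retransmits = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        end = seq + length
        if highest_end is not None and end <= highest_end:
            retransmits += 1
        else:
            highest_end = end  # this segment carried new data
    return retransmits
```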

III.  Finding Web servers on your local network on ports other than 80: Many people run Web servers on ports other than port 80 for a variety of reasons. They may not have root on their Unix box, and so cannot bind to ports below 1024; they may want to hide a massive archive of pictures [8]; or they may be running an old, insecure Web server with CGIs. You might be interested in recording who's running a Web server on your network, but may not want to set off alarms and waste cycles by probing every TCP/IP port on every machine with SATAN [9] and hoping that each Web server's banner identifies it as one. How can you tell if a given service is actually a WWW server? Use an NFR that looks for real HTTP traffic.
Figure 11: NFR track of HTTP traffic from a list recorder
Hash   Client          Server            Command
 18 208.218.124.77   208.218.124.42   GET / HTTP/1.0                 
 18 208.218.124.77   208.218.124.42   If-Modified-Since: Wednesday, 06-Nov-96 12:32:03 GMT; length=530 
 18 208.218.124.77   208.218.124.42   Connection: Keep-Alive         
 18 208.218.124.77   208.218.124.42   User-Agent: Mozilla/3.0Gold (X11; I; BSD/OS 3.0 i386) 
 18 208.218.124.77   208.218.124.42   Host: cornfed                  
 18 208.218.124.77   208.218.124.42   Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */* 
 18 208.218.124.77   208.218.124.42                                  
 18 208.218.124.77   208.218.124.42   GET /apache_pb.gif HTTP/1.0    
 18 208.218.124.77   208.218.124.42   If-Modified-Since: Wednesday, 03-Jul-96 06:18:15 GMT; length=2326 
 18 208.218.124.77   208.218.124.42   Referer: http://cornfed/       
 18 208.218.124.77   208.218.124.42   Connection: Keep-Alive         
 18 208.218.124.77   208.218.124.42   User-Agent: Mozilla/3.0Gold (X11; I; BSD/OS 3.0 i386) 
 18 208.218.124.77   208.218.124.42   Host: cornfed                  
 18 208.218.124.77   208.218.124.42   Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg 

You may simply want to log the version strings of the Web servers used on your network, so that you can notify machine owners when new httpd releases fix security holes (www.trouble.org).
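Recognizing real HTTP traffic on an arbitrary port, and picking up the Server version string at the same time, can be sketched as follows. The flow representation, the list of request methods, and the regular expressions are our assumptions, not NFR's; an NFR filter would do the equivalent work in N-code.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumed signatures of HTTP/1.0 and HTTP/1.1 traffic.
REQUEST_LINE = re.compile(rb"^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE) \S+ HTTP/1\.[01]\r?\n")
STATUS_LINE = re.compile(rb"^HTTP/1\.[01] \d{3}")
SERVER_HEADER = re.compile(rb"^Server:[ \t]*(.+?)\r?$", re.IGNORECASE | re.MULTILINE)

# Hypothetical flow record: server host, server port,
# bytes the client sent, bytes the server sent back.
Flow = Tuple[str, int, bytes, bytes]


def find_web_servers(flows: Iterable[Flow]) -> Dict[Tuple[str, int], Optional[str]]:
    """Map (host, port) of each server that really spoke HTTP to its Server: string."""
    found: Dict[Tuple[str, int], Optional[str]] = {}
    for host, port, to_server, from_server in flows:
        if not (REQUEST_LINE.match(to_server) and STATUS_LINE.match(from_server)):
            continue  # not an HTTP exchange, whatever the port number
        end = from_server.find(b"\r\n\r\n")
        headers = from_server if end < 0 else from_server[:end]
        m = SERVER_HEADER.search(headers)
        found[(host, port)] = m.group(1).decode("latin-1") if m else None
    return found
```

Requiring both a request line from the client and a status line from the server avoids being fooled by a service that merely prints an HTTP-like banner.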

IV.  Finding out what your Web client sends: Some people are still surprised that a Web browser sends, by default, information such as the kind of browser you use (the User-Agent header) and the page you came from (the Referer header, which CGI scripts see as HTTP_REFERER). You may want to know whether your so-called "encrypted session" really is sending out credit card numbers in the clear; for that matter, you may want to know if you have users who are inadvertently sending credit card information over an unsecured WWW channel. NFR and similar programs can help you find out.
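One way to look for card numbers leaking over plain HTTP is to scan payloads for runs of 13 to 16 digits that pass the Luhn checksum used by card numbers. This Python sketch is hypothetical and deliberately simple, and the 13-to-16 digit range is our assumption: it misses numbers written with spaces or dashes, and it will flag some digit strings that merely happen to pass the checksum.

```python
import re
from typing import List

# Runs of 13 to 16 digits that are not part of a longer digit run.
CANDIDATE = re.compile(rb"(?<!\d)\d{13,16}(?!\d)")


def luhn_ok(digits: str) -> bool:
    """True if the digit string passes the Luhn (mod 10) checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_like_numbers(payload: bytes) -> List[str]:
    """Return digit runs in a captured payload that look like card numbers."""
    found = []
    for m in CANDIDATE.finditer(payload):
        digits = m.group().decode("ascii")
        if luhn_ok(digits):
            found.append(digits)
    return found
```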

V.  Doing the analysis on a separate box: HTTP logs can become unmanageably huge very quickly. A common problem in log analysis is simply getting the logs off the server and onto another box, which may then spend longer running the analysis than the server takes to build up the next huge set of logs. Because NFR does not run on your Web server, this problem is easier to manage. It is extremely easy to tell NFR to track all HTTP accesses (that is, accesses on port 80) for a particular machine, or an entire network, and graph them as you like.

Figure 12: NFR graph of Web traffic from a single server
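The raw material for a graph like Figure 12 is just a count of requests per time bucket. Assuming the recorded times are Unix timestamps, a Python sketch of hourly bucketing (in UTC, an arbitrary choice) with a crude text bar chart might look like this; both function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable


def requests_per_hour(timestamps: Iterable[int]) -> Dict[str, int]:
    """Count requests per UTC hour, keyed like "1998-07-18 06:00"."""
    buckets = Counter(
        datetime.fromtimestamp(t, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        for t in timestamps
    )
    return dict(sorted(buckets.items()))


def bar_chart(counts: Dict[str, int], width: int = 40) -> str:
    """Render counts as a text bar chart scaled to the busiest hour."""
    if not counts:
        return ""
    peak = max(counts.values())
    return "\n".join(
        f"{hour}  {'#' * max(1, round(n * width / peak))} {n}"
        for hour, n in counts.items()
    )
```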

VI.  Tracking problems with your Web server: When users are constantly canceling a page during loading, because they are frustrated with the wait, the Web server administrator should take some action - but it is relatively difficult to determine when that is happening, and really unusual to do that kind of tracking with current tools. This leads to some really interesting ideas... the author of NFR read an earlier version of this paper and suggested tracking packet timing issues and actually using NFR to find out what bandwidth your users have, and the latency of the connection. Determining user bandwidth is something of a "holy grail" in Web circles, because with that information (which is often impossible to determine without asking the user directly) you can decide what size graphics are appropriate, and whether to use graphics and Java at all! When a robot is out there causing problems by inefficient or inappropriate searches, or something has hung on your network, you would want to know this, and NFR can be a tool to make you aware before the user complaints start coming in.
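The bandwidth and latency idea can be sketched with two small calculations. Seen from a sensor next to the server, the gap between the server's SYN-ACK and the client's ACK approximates one round trip; and bytes acknowledged over elapsed time gives the throughput the user actually achieved, which is only a lower bound on their bandwidth, since a slow server or a small TCP window can hold it down. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def handshake_rtt(synack_sent: float, ack_seen: float) -> float:
    """Round-trip estimate: server's SYN-ACK out to the client's ACK back in."""
    if ack_seen < synack_sent:
        raise ValueError("the ACK cannot precede the SYN-ACK")
    return ack_seen - synack_sent


def delivered_bits_per_second(samples: Sequence[Tuple[float, int]]) -> float:
    """samples: (timestamp, bytes acknowledged so far) for one download."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples spanning some time")
    return (b1 - b0) * 8 / (t1 - t0)
```

For instance, 18,000 bytes acknowledged over 10 seconds works out to 14,400 bits per second, about what a 14.4k modem delivers.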

VII.  Network attacks: Of course, it doesn't hurt to use NFR technologies for their intended purpose. There may be malicious folks trying all sorts of network-level attacks on your WWW server, or people sending bogus input to CGI scripts, just waiting to exploit some bug or another that compromises your system.
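In the spirit of NFR's original purpose, flagging suspicious request lines might be sketched as below. The signature list is illustrative only, our own choice rather than anything shipped with NFR; the phf entry refers to the widely exploited phf CGI hole of the mid-1990s, whose attacks smuggled an encoded newline (%0a) into the query string to run shell commands.

```python
import re
from typing import List

# Illustrative signatures only; a real deployment would maintain its own list.
SIGNATURES = {
    "phf CGI probe": re.compile(r"^GET /cgi-bin/phf", re.IGNORECASE),
    "directory traversal": re.compile(r"\.\./"),
    "encoded newline in query": re.compile(r"%0a", re.IGNORECASE),
}


def flag_request(request_line: str) -> List[str]:
    """Return the names of all signatures that match an HTTP request line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(request_line)]
```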

Conclusions

In actuality, the best part about these emerging technologies is that they're flexible and extendable, and have good visual tools so you can take sniffer data and put it in a WWW page fairly easily. The purpose of this paper is not so much to present specific applications for modern sniffers as to raise awareness of their existence, of the fact that they are free to the academic community, and of their potential as a great research tool.


[1] Network Flight Recorder, Inc. at http://www.nfr.com/
[2] Comer, Douglas E. The Internet Book: Everything You Need to Know About Computer Networking and How the Internet Works. Prentice Hall, August 1, 1997
[3] Network General at http://www.nai.com/
[4] Ranum et al., "Implementing A Generalized Tool For Network Monitoring", Proceedings of the 11th Systems Administration Conference (LISA '97), San Diego, California, October 26-31, 1997
[5] Installing NFR is not difficult if the system administrator performing the install has a good understanding of IP, MAC addresses, and the Web server configuration on the system. See the NFR Mailing List Archives for help with the install, at http://www.nfr.net/nfr/mail-archive/. Once NFR is installed, using it does not require a technical background.
[6] Analog: a WWW server logfile analysis program at http://www.statslab.cam.ac.uk/~sret1/analog/
[7] Childers, Bangert, et al., "Open Standard Content Cookies", Proceedings of the WebNet 97 Conference, Toronto, Ontario.
[8] There are many such archives on the Net, from http://abc.com:1234/ to http://xyz.edu:6789/. If you find one which is particularly interesting, please email the authors.
[9] Security Administrator's Tool for Analyzing Networks at http://www.trouble.org/