April 14, 2001
Time for Lunch!
The simple task of performing a network baseline can occasionally uncover some very interesting trends. Team Uptime was looking through some utilization charts that detailed absolute network utilization over a week?s time on a client?s token ring network. This chart detailed the utilization of the network as a percentage from one to one hundred, with averages taken every five minutes. A Network General (now Network Associates) Sniffer generated this utilization value, so the process of compiling these statistics did not impact the network utilization as a RMON probe would.
Most of the token ring networks had very low utilization, although we usually expected the normal ?bump? of a few percentage points when folks logged into the network between 8:30am and 9:00am. Most of these rings were used for a mainframe application that was accessed every few minutes and transferred a couple of screens from the mainframe to the PC workstations. These workstations were in a warehouse, so there were very few devices per ring. None of these rings were connected to the Internet, so utilization rates remained consistently low.
However, this ring had spikes of up to seventeen percent, and some of these spikes lasted for over an hour! These spikes also occurred at unusual times, such as lunchtime or the middle of the afternoon. Although these utilization values were well under our warning threshold of forty percent, we were intrigued enough by these strange statistics to do some detective work.
We returned to the ring, turned on our Sniffer, and started a capture session. Initial statistics were normal, with utilization percentages between one and two percent. We configured a Sniffer trigger, which told the Sniffer to wait until the utilization spiked above 10% utilization for 20% of a minute (12 seconds). This trigger would automatically save this trace to disk after capturing a full buffer of information, which was about 120 megabytes of information. The trap was set.
We returned the following day, unplugged the Sniffer, and brought it back to our desks. The Sniffer had captured five traces, thanks to our trigger. A quick glance at the trace showed very few Expert problems, and nothing that would be considered a major network problem (other than the utilization thresholds that we set low enough to activate the trigger).
The trace file was much more interesting. The increase in utilization was found to be nothing more than a file transfer! The odd part of this file transfer was the amazing amount of data transferred in a short period of time. In a single second, about 175K was transferred! This single workstation was transferring enough data to increase the utilization of the network from a normal 3 percent to over 15 percent!
As we looked deeper into the protocol trace, we noticed this wasn?t a normal file transfer. The following trace tells the story (this trace can be downloaded here):
Frame Source Address Dest. Address Size Delta Time Summary 1 1005F99.0004AC766 1005FD8.1 67 0.000.000 NCP: C F=03F2F1 8803 0 Read 1038 at 4263 2 1005FD8.1 1005F99.0004AC766 1096 0.002.231 NCP: R OK, 1038 bytes read 3 1005F99.0004AC766 1005FD8.1 67 0.001.338 NCP: C F=03F2F1 8803 0 Read 1038 at 5301 4 1005FD8.1 1005F99.0004AC766 1096 0.001.638 NCP: R OK, 1038 bytes read 5 1005F99.0004AC766 1005FD8.1 67 0.001.341 NCP: C F=03F2F1 8803 0 Read 1038 at 6339 6 1005FD8.1 1005F99.0004AC766 1096 0.001.642 NCP: R OK, 1038 bytes read 7 1005F99.0004AC766 1005FD8.1 67 0.001.336 NCP: C F=03F2F1 8803 0 Read 1038 at 7377 8 1005FD8.1 1005F99.0004AC766 1096 0.001.643 NCP: R OK, 1038 bytes read 9 1005F99.0004AC766 1005FD8.1 67 0.001.369 NCP: C F=03F2F1 8803 0 Read 1038 at 8415 10 1005FD8.1 1005F99.0004AC766 1096 0.001.645 NCP: R OK, 1038 bytes read 11 1005F99.0004AC766 1005FD8.1 67 0.001.331 NCP: C F=03F2F1 8803 0 Read 1038 at 9453 12 1005FD8.1 1005F99.0004AC766 1096 0.001.646 NCP: R OK, 1038 bytes read 13 1005F99.0004AC766 1005FD8.1 67 0.001.336 NCP: C F=03F2F1 8803 0 Read 527 at 10491 14 1005FD8.1 1005F99.0004AC766 585 0.000.975 NCP: R OK, 527 bytes read 15 1005F99.0004AC766 1005FD8.1 67 0.024.029 Expert: File Retransmission NCP: C F=03F2F1 8803 0 Read 1038 at 4263 16 1005FD8.1 1005F99.0004AC766 1096 0.001.646 NCP: R OK, 1038 bytes read 17 1005F99.0004AC766 1005FD8.1 67 0.001.338 NCP: C F=03F2F1 8803 0 Read 1038 at 5301 18 1005FD8.1 1005F99.0004AC766 1096 0.001.646 NCP: R OK, 1038 bytes read 19 1005F99.0004AC766 1005FD8.1 67 0.001.330 NCP: C F=03F2F1 8803 0 Read 1038 at 6339 20 1005FD8.1 1005F99.0004AC766 1096 0.001.642 NCP: R OK, 1038 bytes read 21 1005F99.0004AC766 1005FD8.1 67 0.001.460 NCP: C F=03F2F1 8803 0 Read 1038 at 7377 22 1005FD8.1 1005F99.0004AC766 1096 0.001.639 NCP: R OK, 1038 bytes read 23 1005F99.0004AC766 1005FD8.1 67 0.001.341 NCP: C F=03F2F1 8803 0 Read 1038 at 8415 24 1005FD8.1 1005F99.0004AC766 1096 0.001.644 NCP: R OK, 1038 bytes read 25 1005F99.0004AC766 1005FD8.1 67 0.001.336 NCP: C F=03F2F1 8803 0 Read 1038 at 9453 26 1005FD8.1 1005F99.0004AC766 1096 0.001.640 NCP: R OK, 1038 bytes read 27 1005F99.0004AC766 1005FD8.1 67 0.001.338 NCP: C F=03F2F1 8803 0 Read 527 at 10491 28 1005FD8.1 1005F99.0004AC766 585 0.000.973 NCP: R OK, 527 bytes read
The analysis of the trace showed an interesting conversation, beginning with frame 1:
Frame Source Address Dest. Address Size Delta Time Summary 1 1005F99.0004AC766 1005FD8.1 67 0.000.000 NCP: C F=03F2F1 8803 0 Read 1038 at 4263This first frame showed the source address of a workstation on the network, and the destination address of a NetWare file server.
The OSI Layer 3 network addresses of a Novell NetWare network are a combination of an IPX network address and an IPX node address. The IPX network address is a four-byte address, and is usually followed by a period and the six-byte node address. The node address is usually the same as the MAC address of a workstation, although some node addresses are specific to particular NetWare devices. For example, a file server?s node address is always 1.
Since the destination address had a 1 as the node address, we knew the workstation was communicating to one of the NetWare file servers. The decode showed that the frame data contained a NetWare Core Protocol (NCP) command (denoted with a C) referencing a specific file handle.
NCP uses file handles to reference files instead of using file names.
The NCP command in frame 1 requested the file server to allow the workstation to read 1,038 bytes offset 4,263 bytes into the file. This means that the workstation wanted the file server to read 4,263 bytes into the frame, and send the next 1,038 bytes.
Frame 2 showed the response from the file server:
Frame Source Address Dest. Address Size Delta Time Summary 2 1005FD8.1 1005F99.0004AC766 1096 0.002.231 NCP: R OK, 1038 bytes read
The NCP response (note the R) was the 1,038 bytes of the file transferred to the workstation.
This process continued until seven frames had been transferred for a total of 7,630 bytes. This process then immediately repeats, reading the same bytes from the same file at the same offset. The Sniffer Expert system makes a note of this, marking a File Retransmission when the process begins again in frame 15.
This process continued for a random amount of time. In some traces, this process repeated for a few minutes. Other traces showed the process repeat for over twenty minutes.
One feature of this issue was unique, because it always occurred during the lunchtime hours. After identifying the location of the workstation, we waited until lunch to see the results. Just after noon, the Sniffer showed the network utilization increase to over fifteen percent, so we walked over to the workstation.
Before arriving, we could tell the user wasn?t at their desk (obviously having lunch). We hit a key on the keyboard to clear the screen saver and looked through the processes running on the workstation. No process running on the workstation could have caused this problem! We were stumped.
We returned to the Sniffer, and saw that the network utilization was back to normal. Fifteen minutes later, the problem returned. This time, one person stayed with the Sniffer while another looked at the workstation. We hit a key on the keyboard to stop the screen saver, and the utilization dropped to normal! We were finally able to correlate the network utilization with the workstation?s screen saver.
Looking further, we saw that the screen saver wasn?t a normal Windows-installed screen saver, but a screen-saver package from a third party. More investigation showed that the program was installed onto the user?s private directory on the file server instead of the local hard drive! Each time the user left their workstation for more than fifteen minutes, the screen-saver automatically started and the network utilization increased.
The network administrator removed the software, and reprimanded the user for installing unauthorized third-party software onto the network. The last time we saw the network, it was humming along at peak efficiency.
The painful lessons:
* A network baseline can be a very important method of researching network efficiency.
* Check the right's for user directories on the network. If possible, make sure the user doesn't have execute rights from their private network directory.
* When network problems occur, order lunch in.
Posted by james_messer at April 14, 2001 06:39 PM