April 14, 2001

Time for Lunch!

The simple task of performing a network baseline can occasionally uncover some very interesting trends. Team Uptime was looking through some utilization charts that detailed absolute network utilization over a week?s time on a client?s token ring network. This chart detailed the utilization of the network as a percentage from one to one hundred, with averages taken every five minutes. A Network General (now Network Associates) Sniffer generated this utilization value, so the process of compiling these statistics did not impact the network utilization as a RMON probe would.

Most of the token ring networks had very low utilization, although we usually expected the normal ?bump? of a few percentage points when folks logged into the network between 8:30am and 9:00am. Most of these rings were used for a mainframe application that was accessed every few minutes and transferred a couple of screens from the mainframe to the PC workstations. These workstations were in a warehouse, so there were very few devices per ring. None of these rings were connected to the Internet, so utilization rates remained consistently low.

However, this ring had spikes of up to seventeen percent, and some of these spikes lasted for over an hour! These spikes also occurred at unusual times, such as lunchtime or the middle of the afternoon. Although these utilization values were well under our warning threshold of forty percent, we were intrigued enough by these strange statistics to do some detective work.

lunch.gif (7503 bytes)

We returned to the ring, turned on our Sniffer, and started a capture session. Initial statistics were normal, with utilization percentages between one and two percent. We configured a Sniffer trigger, which told the Sniffer to wait until the utilization spiked above 10% utilization for 20% of a minute (12 seconds). This trigger would automatically save this trace to disk after capturing a full buffer of information, which was about 120 megabytes of information. The trap was set.

We returned the following day, unplugged the Sniffer, and brought it back to our desks. The Sniffer had captured five traces, thanks to our trigger. A quick glance at the trace showed very few Expert problems, and nothing that would be considered a major network problem (other than the utilization thresholds that we set low enough to activate the trigger).

The trace file was much more interesting. The increase in utilization was found to be nothing more than a file transfer! The odd part of this file transfer was the amazing amount of data transferred in a short period of time. In a single second, about 175K was transferred! This single workstation was transferring enough data to increase the utilization of the network from a normal 3 percent to over 15 percent!

As we looked deeper into the protocol trace, we noticed this wasn?t a normal file transfer. The following trace tells the story (this trace can be downloaded here):

Frame Source Address    Dest. Address      Size Delta Time Summary
     1 1005F99.0004AC766 1005FD8.1            67 0.000.000  NCP: C F=03F2F1 8803 0 Read 1038 at 4263
     2 1005FD8.1         1005F99.0004AC766  1096 0.002.231  NCP: R OK, 1038 bytes read
     3 1005F99.0004AC766 1005FD8.1            67 0.001.338  NCP: C F=03F2F1 8803 0 Read 1038 at 5301
     4 1005FD8.1         1005F99.0004AC766  1096 0.001.638  NCP: R OK, 1038 bytes read
     5 1005F99.0004AC766 1005FD8.1            67 0.001.341  NCP: C F=03F2F1 8803 0 Read 1038 at 6339
     6 1005FD8.1         1005F99.0004AC766  1096 0.001.642  NCP: R OK, 1038 bytes read
     7 1005F99.0004AC766 1005FD8.1            67 0.001.336  NCP: C F=03F2F1 8803 0 Read 1038 at 7377
     8 1005FD8.1         1005F99.0004AC766  1096 0.001.643  NCP: R OK, 1038 bytes read
     9 1005F99.0004AC766 1005FD8.1            67 0.001.369  NCP: C F=03F2F1 8803 0 Read 1038 at 8415
    10 1005FD8.1         1005F99.0004AC766  1096 0.001.645  NCP: R OK, 1038 bytes read
    11 1005F99.0004AC766 1005FD8.1            67 0.001.331  NCP: C F=03F2F1 8803 0 Read 1038 at 9453
    12 1005FD8.1         1005F99.0004AC766  1096 0.001.646  NCP: R OK, 1038 bytes read
    13 1005F99.0004AC766 1005FD8.1            67 0.001.336  NCP: C F=03F2F1 8803 0 Read 527 at 10491
    14 1005FD8.1         1005F99.0004AC766   585 0.000.975  NCP: R OK, 527 bytes read
    15 1005F99.0004AC766 1005FD8.1            67 0.024.029  Expert: File Retransmission
                                                            NCP: C F=03F2F1 8803 0 Read 1038 at 4263
    16 1005FD8.1         1005F99.0004AC766  1096 0.001.646  NCP: R OK, 1038 bytes read
    17 1005F99.0004AC766 1005FD8.1            67 0.001.338  NCP: C F=03F2F1 8803 0 Read 1038 at 5301
    18 1005FD8.1         1005F99.0004AC766  1096 0.001.646  NCP: R OK, 1038 bytes read
    19 1005F99.0004AC766 1005FD8.1            67 0.001.330  NCP: C F=03F2F1 8803 0 Read 1038 at 6339
    20 1005FD8.1         1005F99.0004AC766  1096 0.001.642  NCP: R OK, 1038 bytes read
    21 1005F99.0004AC766 1005FD8.1            67 0.001.460  NCP: C F=03F2F1 8803 0 Read 1038 at 7377
    22 1005FD8.1         1005F99.0004AC766  1096 0.001.639  NCP: R OK, 1038 bytes read
    23 1005F99.0004AC766 1005FD8.1            67 0.001.341  NCP: C F=03F2F1 8803 0 Read 1038 at 8415
    24 1005FD8.1         1005F99.0004AC766  1096 0.001.644  NCP: R OK, 1038 bytes read
    25 1005F99.0004AC766 1005FD8.1            67 0.001.336  NCP: C F=03F2F1 8803 0 Read 1038 at 9453
    26 1005FD8.1         1005F99.0004AC766  1096 0.001.640  NCP: R OK, 1038 bytes read
    27 1005F99.0004AC766 1005FD8.1            67 0.001.338  NCP: C F=03F2F1 8803 0 Read 527 at 10491
    28 1005FD8.1         1005F99.0004AC766   585 0.000.973  NCP: R OK, 527 bytes read          

The analysis of the trace showed an interesting conversation, beginning with frame 1:

 Frame Source Address    Dest. Address      Size Delta Time Summary
     1 1005F99.0004AC766 1005FD8.1            67 0.000.000  NCP: C F=03F2F1 8803 0 Read 1038 at 4263
This first frame showed the source address of a workstation on the network, and the destination address of a NetWare file server.

NetWare Tip
The OSI Layer 3 network addresses of a Novell NetWare network are a combination of an IPX network address and an IPX node address. The IPX network address is a four-byte address, and is usually followed by a period and the six-byte node address. The node address is usually the same as the MAC address of a workstation, although some node addresses are specific to particular NetWare devices. For example, a file server?s node address is always 1.

Since the destination address had a 1 as the node address, we knew the workstation was communicating to one of the NetWare file servers. The decode showed that the frame data contained a NetWare Core Protocol (NCP) command (denoted with a C) referencing a specific file handle.

NCP uses file handles to reference files instead of using file names.

The NCP command in frame 1 requested the file server to allow the workstation to read 1,038 bytes offset 4,263 bytes into the file. This means that the workstation wanted the file server to read 4,263 bytes into the frame, and send the next 1,038 bytes.

Frame 2 showed the response from the file server:

 Frame Source Address    Dest. Address      Size Delta Time Summary
     2 1005FD8.1         1005F99.0004AC766  1096 0.002.231  NCP: R OK, 1038 bytes read          

The NCP response (note the R) was the 1,038 bytes of the file transferred to the workstation.

This process continued until seven frames had been transferred for a total of 7,630 bytes. This process then immediately repeats, reading the same bytes from the same file at the same offset. The Sniffer Expert system makes a note of this, marking a File Retransmission when the process begins again in frame 15.

This process continued for a random amount of time. In some traces, this process repeated for a few minutes. Other traces showed the process repeat for over twenty minutes.

One feature of this issue was unique, because it always occurred during the lunchtime hours. After identifying the location of the workstation, we waited until lunch to see the results. Just after noon, the Sniffer showed the network utilization increase to over fifteen percent, so we walked over to the workstation.

Before arriving, we could tell the user wasn?t at their desk (obviously having lunch). We hit a key on the keyboard to clear the screen saver and looked through the processes running on the workstation. No process running on the workstation could have caused this problem! We were stumped.

We returned to the Sniffer, and saw that the network utilization was back to normal. Fifteen minutes later, the problem returned. This time, one person stayed with the Sniffer while another looked at the workstation. We hit a key on the keyboard to stop the screen saver, and the utilization dropped to normal! We were finally able to correlate the network utilization with the workstation?s screen saver.

Looking further, we saw that the screen saver wasn?t a normal Windows-installed screen saver, but a screen-saver package from a third party. More investigation showed that the program was installed onto the user?s private directory on the file server instead of the local hard drive! Each time the user left their workstation for more than fifteen minutes, the screen-saver automatically started and the network utilization increased.

The network administrator removed the software, and reprimanded the user for installing unauthorized third-party software onto the network. The last time we saw the network, it was humming along at peak efficiency.

The painful lessons:

* A network baseline can be a very important method of researching network efficiency.

* Check the right's for user directories on the network. If possible, make sure the user doesn't have execute rights from their private network directory.

* When network problems occur, order lunch in.

Posted by james_messer at April 14, 2001 06:39 PM



Comments
Post a comment

Thanks for signing in, . Now you can comment. (sign out)

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)


Remember me?