March 18, 2001

Network Uptime - March 18, 2001

===================== Network Uptime =====================

The Resource for Network Management and Protocol Analysis Professionals
A Newsletter of http://www.NetworkUptime.com
Issue 02 00 00 01 00 03 01 08
March 18, 2001
ISSN: 1529-6938

This Issue:

* Starting Delimiter - Is Protocol Analysis Too Difficult?
* Surf Report - Optimizing Windows Network Traffic
* Network Uptime Case Study - How NOT to Troubleshoot a Network
* Network Uptime Tip-Of-The-Month - Finding CRC Errors in a TCP Header
* Ending Delimiter

*** *** *** *** *** *** *** *** *** *** *** *** *** *** ***


====
Starting Delimiter
- Is Protocol Analysis too Difficult? -
====

I've read a couple of posts on the Internet over the last few weeks that have piqued my interest in the thoughts of network-analysis-minded folks, and the people who would like to become knowledgeable in network management and network analysis. The messages I read were posted in the comp.dcom.net-analysis Usenet newsgroup, which isn't a very busy newsgroup. The few messages that pass through the newsgroup, however, are usually quite interesting.

The thread of messages I read was started by a networking neophyte who posted one of the usual 'How do I get started in protocol analysis' questions. A few of the responses to this initial post were of the opinion that network analysis was an increasingly difficult science, and that the people capable of providing such a complex analysis of network traffic were a dying breed.

In my short career associated with network analysis, I've always found the technical requirements surrounding network analysis to be challenging, stimulating, and a great learning experience. I can't remember ever finding a network problem and simply giving up on the solution because I wasn't knowledgeable enough to continue. There are too many resources available in print and on the web to simply abandon a troubleshooting effort because it is 'too difficult.'

However, my experiences must have been completely different from those of the writers on the newsgroup. They suggest that network analysis is a very complex and mysterious science in which only the true alchemists can dabble. Indeed, there are occasionally those network problems that can haunt you for weeks, but even those nasty ongoing problems are eventually conquered. In fact, I would say that most of the problems that I solved using network analysis have been the slap-your-forehead-I-can't-believe-how-simple-this-problem-was solutions. Perhaps that's just my slanted retrospective, and not a true representation of the problem as it was occurring.

One of Laura Chappell's 'Ten Truths of Network Troubleshooting' states that network troubleshooting is a combination of good people skills and good deductive reasoning skills. Perhaps it's this combination of skills that is often our biggest enemy when fighting a network problem.

(Make sure you read Laura's '10 Truths of Network Troubleshooting' at her web site: http://www.packet-level.com)

As an example of our difficulties surrounding network analysis, the Case Study in this issue is a report of what we did wrong during a network troubleshooting session instead of a study of how everything was handled perfectly. Hopefully, you can learn from our careless mistakes.

If you have comments about the complexity of network troubleshooting, we'd love to hear them. Email me, and I'll include the most interesting comments in our next newsletter (with the proper level of anonymity, of course).

I'd also like to apologize for the delay of this month's newsletter. Our third child, James III, was born just before we put this month's Network Uptime to bed. It's taken a few extra days to get this issue in one piece, and I am again remembering what it's like to live my daily life with sleep deprivation. It's a good feeling.

- James 'Daddy**3' Messer
Editor, Network Uptime
James@NetworkUptime.com

*** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

====
Network Uptime Surf Report
- Optimizing Windows Network Traffic -
====

This month's surf report references Microsoft TechNet documents that provide an excellent overview of Windows 98 and Windows NT-based traffic optimization. Although the articles have an obvious slant towards Microsoft's Network Monitor, the basic information on DHCP conversations, WINS registration and resolution, and other Windows-based protocols is invaluable.

http://www.microsoft.com/technet/winnt/winntas/tips/net405ef.asp

http://www.microsoft.com/TechNet/network/networka.asp

*** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

====
Network Uptime Case Study
- How NOT to Troubleshoot a Network -
James Messer, Network Uptime
James@NetworkUptime.com
====

I was recently asked to assist in providing an overall `health check' of an existing network for a client-of-a-client. This was presented as a great opportunity to meet a new group of network people, as well as provide some basic network information about the client's network.

Because this was done as a third-party engagement, I wasn't able to communicate directly with the client's network group prior to my arrival. All of the information I received about this job came directly from the middleman.

** Tip #1: Don't take the word of a middleman, no matter how nice they are. Most of the time, the middleman isn't given the entire story, and the middleman isn't usually interested in getting the entire story; that's why you're there. Often, a `simple' network engagement really means `fix all problems on our network.' This particular situation was a very casual engagement provided as a favor. If this had been a formal engagement, there would have been the usual paperwork that sets the client's expectations. **

I've done these `favors' before, and I was prepared for a very different story once I arrived on-site. At least I took my own advice in this regard, because I unfortunately dismissed many other network troubleshooting fundamentals once I arrived.

The client was interested in basic network health statistics, but as we unpacked our equipment they stated that there was `this one network problem that perhaps you could look at?' Now that the other shoe had dropped, I said that I'd do whatever I could to help, and I'd offer any suggestions that might become apparent.

The client's problem was the universal network ailment, `the network is slow.' A user in the accounting department had been using an application for a number of months, but the application had become steadily slower and slower. A call to the software manufacturer was met with a common theme - it must be the network! The network team was up against the wall: they had to prove that the network wasn't involved, and also show where the problem was actually occurring.

This problem sounded easy enough, so we mapped out our game plan. We'd place the Sniffer in the closet containing the user's workstation connection, and we would use the traffic redirection capabilities of the switch to provide the stream of network packets for the Sniffer. Once the initial trace was made, we'd determine the logical next step for analysis. Sounds easy, doesn't it?

** Tip #2: Always have a plan B. In this client's environment, the station wiring was done by one team, the access switches were installed by another team, and a third entity was responsible for the physical security of the data closet. We had to communicate with three separate entities to gain access to the closet and begin our analysis task. **

In this case, everything that could have gone wrong did go very wrong. After some dead time standing around in the hallway while the client went to find the key to the data closet, we were given access to the closet. We wrote down the user's workstation jack number, and searched through the usual maze of wires until we found the switch port that appeared to serve the user. Unfortunately, there were no available ports on the switch for the Sniffer, so we unplugged a port that wasn't in use to make room for our Sniffer connection.

As I searched through cables, the client's switch expert was having problems logging into the switch. A combination of laptop serial port inconsistencies, multiple access passwords to the company's switches, and a bad hub created a series of delays that found us one hour later without any further progress. We weren't able to log into the switch to redirect the network traffic, we found that the cable that we had mapped back to the user's workstation really wasn't the user's, and the hub that we brought as a backup didn't work properly either.

** Tip #3: Make sure your tool kit has the proper tools, and that they work properly! It's not helpful to have a laptop to use for portable terminal access if the serial port is flaky. Hubs are only helpful for network taps if the proper cables are included, and they are in optimal condition. Coordination of an enterprise's switch configurations can be time-consuming on the front-end, but it will reap many benefits when problems are occurring. **

We punted, and decided to get a new hub and connect it in the user's office. This connectivity solution was not as professional as a closet link, but at least we had access to the data as it traversed the network, and we knew that this data definitely belonged to the user!

We started a capture session, and asked the user to move through his usual slow-down procedures. Sure enough, we immediately had slowdowns! After over two hours of fumbling around the network, we had a small level of success.

The resulting protocol decode showed that a request was sent from the user's workstation, and the workstation waited about 45 seconds to receive a response to the initial request. From this information, it was obvious that we needed to investigate either the server's role in this slowdown or the workstation's configuration.
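
A wait like this can be pulled out of a long trace by pairing each request with the reply that follows it. The Python below is only a rough sketch of that idea, not a Sniffer feature. It assumes the trace has already been exported as (timestamp, source, destination) tuples and filtered down to the application's requests and replies. The station names 'ws' and 'db' and all of the timestamps are made up for illustration.

```python
def slow_responses(frames, client, server, threshold=1.0):
    """Return (request_time, wait) for every reply slower than threshold seconds.

    frames: iterable of (timestamp_seconds, source, destination) tuples,
            in capture order (an assumed export format, not a Sniffer one).
    """
    slow = []
    pending = None  # timestamp of the request still waiting for a reply
    for timestamp, src, dst in frames:
        if src == client and dst == server:
            # Remember only the first frame of an unanswered request.
            if pending is None:
                pending = timestamp
        elif src == server and dst == client and pending is not None:
            wait = timestamp - pending
            if wait >= threshold:
                slow.append((pending, wait))
            pending = None
    return slow


# Hypothetical trace: one quick exchange, then one 45-second wait.
frames = [
    (0.00, "ws", "db"),
    (0.02, "db", "ws"),
    (5.00, "ws", "db"),
    (50.00, "db", "ws"),
]
print(slow_responses(frames, "ws", "db"))
```

Only the second exchange is reported, because the first reply arrived well under the one-second threshold.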

We didn't have any information about the inner workings of this application, and we didn't completely understand the relationship of the client to the server, or any of the back-end processes related to this application. If this study were done in different circumstances, we'd probably consider getting a representative from the development team involved to provide us with an overview of the application's basic structure.

We needed to view the application's data movement from the perspective of the client and the server simultaneously. Although we had the equipment on-site, we hadn't set everything up for the first trace.

** Tip #4: Smoke `em if you've got `em! We dragged all of this equipment on-site, and didn't even use it for our initial trace. Many network problems are intermittent, and there was no guarantee that this problem would be reproducible during our engagement. If you're going to make a trace, make sure all of your equipment is recording at all times! **

Fortunately, we were able to connect our Gigabit Sniffer Pro to a central backbone switch and redirect all traffic to and from the central database server to the Sniffer Pro analyzer. Since we were now bringing all of our guns to bear on this problem, we also started server-based traces with Microsoft's Performance Monitor application.

We knew that this was a judgment call. By starting PerfMon recordings on the server, we could possibly change the results of our troubleshooting by changing the operation of the database server. In retrospect, it may have been better to perform another end-to-end analysis of the slowdown before enabling PerfMon and running the test again. At the time, however, we weren't really sure how reproducible the problem would be, and we wanted to make sure we got everything on this final run.

We ran the trace again, and fortunately saw the slow-down symptom appear again (sometimes it _IS_ better to be lucky than good). This trace verified that the server received the request, and the request bounced around for 45 seconds inside of the database server before sending the reply. Because we were capturing all traffic into the server and out of the server, we knew that no additional processes were externally affecting our application slowdown.
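
The reasoning behind that conclusion is simple arithmetic. The wait seen at the workstation is the time spent inside the server plus the time spent on the wire. The Python below is a minimal sketch of that split, assuming a single request/reply pair was captured by both analyzers. The function name and all timestamps are invented for illustration. Each analyzer's timestamps are only subtracted from its own, so the two clocks don't need to be synchronized.

```python
def split_delay(client_req, client_resp, server_req, server_resp):
    """Split one request/reply wait into server time and network time.

    client_* are timestamps from the analyzer near the workstation,
    server_* are timestamps from the analyzer near the server.
    All values are in seconds.
    """
    total = client_resp - client_req          # wait seen by the user
    server_time = server_resp - server_req    # time spent inside the server
    network_time = total - server_time        # round trip left for the network
    return total, server_time, network_time


# Hypothetical timestamps for one slow transaction.
total, server_time, network_time = split_delay(10.0, 55.2, 203.1, 248.1)
print(f"total={total:.1f}s server={server_time:.1f}s network={network_time:.1f}s")
```

When nearly all of the total lands in the server column, as it did in our trace, the network is off the hook.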

In the end, the client was pleased with our results. Although we didn't solve the application slowdown, we were able to definitively provide an explanation for the slowdown and suggest some further studies to help resolve the problem. More importantly for the client, they had proof that their network was running at top speed, and the heat was off the network team.

We'd probably work through this problem a bit differently next time, but that's why it's called a learning process. Perfection is a great goal, but where's the fun in that?

*** *** *** *** *** *** *** *** *** *** *** *** *** *** ***


====
Network Uptime Tip-Of-The-Month
- Finding CRC Errors in a TCP Header -
====

If you've ever used Network Associates' Sniffer Pro to locate a CRC error in a TCP header (strictly speaking, the TCP header carries a 16-bit checksum rather than a CRC), you've probably realized that there isn't an obvious way to pull those frames out of a decode. There are no predefined filters in Sniffer Pro that will quickly group the TCP-based CRC errors, but there are search functions that can help you locate the frames.
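
For background, the value in the TCP header is a 16-bit ones' complement checksum, and a corrected value is presumably produced by recomputing that sum and comparing it to the field in the frame. The Python below is a minimal sketch of the generic Internet checksum arithmetic, not Sniffer Pro's own code. For a real TCP segment the sum also covers a pseudo-header (source and destination IP addresses, protocol number, and TCP length) along with the TCP header and data. That assembly step is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sample bytes chosen for illustration only.
sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
print(f"{checksum:04X}")

# Data with its correct checksum appended sums to zero.
print(internet_checksum(sample + checksum.to_bytes(2, "big")))
```

If the recomputed value doesn't match the field carried in the frame, the frame was damaged or altered somewhere after the sender calculated it.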

Sniffer Pro includes a 'Find' function on the Display pull-down menu. To use the Find capability of Sniffer Pro, you must be displaying a decode in Sniffer Pro.

* To use the Find command, load your trace file and select the 'Decode' tab.

* From the 'Display' pull-down menu, choose 'Find Frame'

* In the 'Find Frame' dialog box, select the 'Text' tab

* In the search field type "(should be". This is without the quotes, and it _is_ with the preceding open parenthesis - there is no closing parenthesis. This is how it should look in the search field:

(should be

* Choose the radio button for 'Detail Text', and start searching!

We don't use the term 'should be' without the parenthesis because there are some decode strings that use that series of words to describe information. By using the open parenthesis, you are assured that the search will only stop on the text that describes the incorrect CRC.

This search capability can also be used for any text that Sniffer Pro might add to the decode's Detail screens. Search away!
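
The same search works outside the analyzer if the decode's detail text has been saved to a plain text file. The Python below is a rough sketch under that assumption. The sample decode lines are invented to show the idea and are not actual Sniffer Pro output.

```python
def find_detail_text(lines, needle="(should be"):
    """Return (line_number, line) for every line containing needle."""
    return [(n, line) for n, line in enumerate(lines, start=1) if needle in line]


# Hypothetical decode text, for illustration only.
decode = [
    "TCP: Source port = 1045",
    "TCP: Reserved bits should be zero",
    "TCP: Checksum = 1A2B (should be 3C4D)",
]

for number, line in find_detail_text(decode):
    print(number, line)
```

Searching for "should be" without the open parenthesis would also stop on the second line, which is exactly the false hit the parenthesis avoids.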


*** *** *** *** *** *** *** *** *** *** *** *** *** *** ***


====
Ending Delimiter
====

If you're reading a forwarded copy of Network Uptime, sign up for your own FREE subscription:

http://www.NetworkUptime.com/newsletter/


Promote Network Uptime! Add Network Uptime graphics and banners to your web page:

http://www.NetworkUptime.com/graphics


To unsubscribe from Network Uptime, use the above URL, or email Majordomo@NetworkUptime.com with the following text in the body of the message:

unsubscribe NetworkUptime


For questions or comments, email us at James@NetworkUptime.com or visit the Network Uptime web page at http://www.NetworkUptime.com!

==== End of Network Uptime ISSN: 1529-6938 Issue 02 00 00 01 00 03 01 08 (c)2001, NetworkUptime.com, Inc. http://www.NetworkUptime.com ====

Posted by james_messer at March 18, 2001 02:38 PM