February 14, 2001
Complex Simplicity
We received a call from a small and growing company that was having intermittent problems with their network switch. This problem had everything a network analyst loves to hate: intermittent problems and hardware issues. We packed heavy for this visit. We carried our laptops, a Sniffer, RMON probes, and the cables needed to connect to the management port of the LAN switch. We knew this was a small company, but we were intrigued by the company's network configuration. This client was reported to have about 50-60 workstations running Novell NetWare and Windows NT across both Ethernet and token ring topologies. Why would such a small company have such a mish-mosh of networking technologies? The grim foreboding of this visit hung in the air.
Just to make sure we believed what we were reading, we gave a call to the client. Sure enough, their network was full of these technologies, plus a few more servers and devices. We couldn't wait to see this network, if only to say that we'd seen such a thing!
When we arrived, we got the details from the company's server administrator. He helped explain the network's legacy and complexity. The company was small and was the creation of two companies that had merged. One company had Ethernet, the other had token ring. One company had Windows NT, the other company ran Novell NetWare. Like many small companies, there was a plan in place for conversion to Ethernet and Windows NT, but the current interim stage was a necessary evil.
Because of these topology requirements, a chassis-based OSI Layer 2 LAN switch had been placed into the center of the network. One slot on the switch contained an Ethernet switching card and the other slot contained a token ring switching card. Since important servers and devices existed on both sides of this split network, this LAN switch provided the translational bridging necessary to communicate from one side of the network to the other. Since an Ethernet hub and token ring MAU connected the users on each side of this network, a single port was used on each slot of the switch to provide this important connectivity.

The big problem revolved around this switch. When everyone went home for the day, the network was running without problem. However, each morning the network connection between the token ring and Ethernet networks was inoperable. The company had zero in the way of network management and analysis tools, so they took the easy way out and rebooted the switch each morning. Although this wasn't optimal (this company had clients who required dial-in access and connectivity to the entire network during the evening), it wasn't causing enough trouble to call in some network help. Unfortunately, the problem was beginning to occur during the day and the downtime was affecting enough upper-level managers to call the Network Uptime team.
We were taken to the network room, and found the network switch carefully placed in one of their 19" equipment racks. The room was in perfect condition, with not a cable out of place. We're always finding large corporate infrastructures where the cable plant is best described as 'cable-gami', but this client was meticulous in their network cabling runs and equipment installations.
This situation was also compounded by the switch's history. Although the chassis-based switch had one company's name emblazoned upon it, we were told that the switch company was recently acquired by a large PC manufacturer. We were unsure if the switch was still supported by the manufacturer, but we decided to jump in with both feet. We felt sure that this fix wasn't going to be an easy one.
The file server administrator told us that the switch's token ring port seemed to be the offending party. Each morning, the light on the token ring port was out, yet the Ethernet port always stayed lit. From this piece of information, we began to consider that some unknown token ring error during the evening was causing the switch to remove itself from the ring, causing the problems each morning.
We added RMON probes to both the Ethernet and token ring networks, and added a Sniffer to the token ring side of the network. With this short and long-term network visibility, we felt sure we would catch any network hiccup that might occur. So, we waited. And waited. And waited a little more. We had gotten to the network just after five o'clock, so things were pretty quiet on both sides of the network. The Sniffer showed few problems on the token ring side, and we saw nothing that might cause a cataclysmic failure. Since most folks turned their workstations off before leaving, the only devices still active on the network were those in the network equipment room.
Just before we decided to call it quits for the evening and let the RMON probes work through the night, my associate called me around to the other side of the equipment racks. On the wall behind the equipment racks, the network punch-down blocks were mounted in a perfect line. As I gave my associate a puzzled look, he motioned me closer to the blocks.
Just like the rest of the room, the punch-down blocks were mounted and cabled with perfection. Each cable was attached and gathered into a group with precision. Whoever did this cabling did exceptionally neat work. However, every cable was punched down improperly and completely out of specification!
We shook our heads and looked again. Every cable had eight inches of the outer jacket removed, exposing the four twisted pairs. Each of the twisted pairs was untwisted, straightened into a small bunch, and folded over itself. This small bundle of folded wires was then secured together with a couple of wire ties, and the ends were punched into the voice-grade punch-down blocks.
Suddenly, we were unconcerned with the switch problems and other assorted network infrastructure issues. We had found our problem, and it was under our noses the entire time. Before we could begin any detailed analysis of the network issues, we had to qualify the foundation of the network: the cabling plant.
We returned the next day with some high-end category 5 cable testers. A quick glance at the Sniffer showed plenty of physical-layer issues, now that every user was on the network and communicating. We ran some cable tests, and found nearly every connection out of specification for a 10 megabit Ethernet network, and fewer still passed the 16 megabit token ring requirements.
We concluded that the tape backups which started during the early morning hours may have been the process that created enough physical layer issues to remove the switch from the token ring network. We gave the client our detailed cabling analysis, and provided recommendations for cabling equipment and procedures. Once the client changed the wiring plant, their intermittent network problems disappeared.
Morals of the story:
* Sometimes the problems that seem the most difficult have a very simple cause.
* Examine all portions of the network infrastructure before bombarding the network with testing and analysis equipment. Your eyes can sometimes be your best test equipment.
* Make sure your arsenal of network analysis equipment contains a cabling and fiber tester. It's impossible to quantify a physical network issue without the proper equipment.
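To put a rough number on how far out of specification those punch-downs were, here is a minimal Python sketch. It assumes the commonly cited limit of 0.5 inch of untwisted pair at a category 5 termination, and it assumes the full eight inches of exposed wire in the story were untwisted. The constant and function names are hypothetical and only for illustration; check the limit against the cabling standard you are certifying to.

```python
# Assumption: 0.5 in is the commonly cited maximum untwist at a
# category 5 termination. Verify against the standard you certify to.
MAX_UNTWIST_IN = 0.5


def untwist_ratio(untwisted_in: float, limit_in: float = MAX_UNTWIST_IN) -> float:
    """Return the measured untwist as a multiple of the allowed limit."""
    if limit_in <= 0:
        raise ValueError("limit_in must be positive")
    return untwisted_in / limit_in


def within_limit(untwisted_in: float, limit_in: float = MAX_UNTWIST_IN) -> bool:
    """Return True when the measured untwist does not exceed the limit."""
    return untwisted_in <= limit_in


if __name__ == "__main__":
    # Assumption: all eight inches of exposed wire were untwisted.
    observed_in = 8.0
    ratio = untwist_ratio(observed_in)
    status = "pass" if within_limit(observed_in) else "fail"
    print(f"{observed_in} in untwisted is {ratio:.0f}x the assumed limit ({status})")
```

A check like this is no substitute for a cable tester, but it shows why a visual inspection was enough to raise the alarm before any measurements were taken.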
Posted by james_messer at February 14, 2001 06:50 PM
