April 16, 2001
Redundancy from a Network Perspective
The goal of most IT organizations is to create a networking environment that provides the maximum amount of uptime. Many environments strive for 'five nines,' or 99.999% uptime over a measured period. Over a year, that works out to only about five minutes and fifteen seconds of total downtime!
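If you want to check the five nines arithmetic yourself, here is a quick Python sketch that converts an availability percentage into allowed downtime per year. It assumes a 365-day year and ignores leap years; the function names are my own.

```python
# Allowed downtime per year for a given availability target.
# Assumption: a 365-day year (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the maximum downtime per year, in seconds, for an uptime percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_YEAR * unavailable_fraction


def as_minutes_seconds(total_seconds: float) -> str:
    """Format a duration in seconds as 'M min S.s s'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)} min {seconds:.1f} s"


for target in (99.9, 99.99, 99.999):
    print(target, as_minutes_seconds(allowed_downtime_seconds(target)))
```

For 99.999% this comes out to roughly 315 seconds, which is the five minutes and fifteen seconds mentioned above.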
To meet such a demanding uptime requirement, many servers and devices provide levels of redundancy that help deliver this additional availability. These methods take many forms, such as duplicate storage, alternate network paths, redundant network interface cards, duplicate routers and switches, or clusters of servers that can pick up the load should one member of the cluster fail.
On the surface, these products quietly provide redundancy while we go about our normal network tasks. Because of this plug-and-play redundancy, I was caught unprepared when I saw some unusual traffic on my network analyzer:
Source        Dest.      Size  Rel. Time    Summary
0008C7C59DA7  Broadcast  60    0:00:00.000  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA8  Broadcast  60    0:00:00.000  LOOP: UNKNOWN CODE TYPE = 0
0008C707AA58  Broadcast  60    0:00:00.081  LOOP: UNKNOWN CODE TYPE = 0
0008C7D21E4A  Broadcast  60    0:00:00.123  LOOP: UNKNOWN CODE TYPE = 0
0008C7EA468F  Broadcast  60    0:00:00.201  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A041  Broadcast  60    0:00:00.307  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A042  Broadcast  60    0:00:00.307  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B14F  Broadcast  60    0:00:00.336  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B758  Broadcast  60    0:00:00.340  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B88C  Broadcast  60    0:00:00.571  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5E5C  Broadcast  60    0:00:00.588  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5C78  Broadcast  60    0:00:00.588  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFB  Broadcast  60    0:00:00.697  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFC  Broadcast  60    0:00:00.697  LOOP: UNKNOWN CODE TYPE = 0
0008C707C5EA  Broadcast  60    0:00:00.731  LOOP: UNKNOWN CODE TYPE = 0
0008C72B6904  Broadcast  60    0:00:00.804  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5CD2  Broadcast  60    0:00:00.804  LOOP: UNKNOWN CODE TYPE = 0
0008C707C80C  Broadcast  60    0:00:00.830  LOOP: UNKNOWN CODE TYPE = 0
0008C7735C9A  Broadcast  60    0:00:00.846  LOOP: UNKNOWN CODE TYPE = 0
0008C7089835  Broadcast  60    0:00:00.909  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA7  Broadcast  60    0:00:00.988  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA8  Broadcast  60    0:00:00.988  LOOP: UNKNOWN CODE TYPE = 0
0008C708C275  Broadcast  60    0:00:01.061  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5D04  Broadcast  60    0:00:01.061  LOOP: UNKNOWN CODE TYPE = 0
0008C7E9483D  Broadcast  60    0:00:01.257  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A041  Broadcast  60    0:00:01.295  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A042  Broadcast  60    0:00:01.295  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CB1  Broadcast  60    0:00:01.319  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CB2  Broadcast  60    0:00:01.319  LOOP: UNKNOWN CODE TYPE = 0
0008C7EAA028  Broadcast  60    0:00:01.344  LOOP: UNKNOWN CODE TYPE = 0
0008C7EA4A74  Broadcast  60    0:00:01.345  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B82A  Broadcast  60    0:00:01.351  LOOP: UNKNOWN CODE TYPE = 0
0008C707C819  Broadcast  60    0:00:01.399  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B880  Broadcast  60    0:00:01.432  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A3CB  Broadcast  60    0:00:01.687  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A3CC  Broadcast  60    0:00:01.687  LOOP: UNKNOWN CODE TYPE = 0
0008C707C2A8  Broadcast  60    0:00:01.699  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFB  Broadcast  60    0:00:01.699  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFC  Broadcast  60    0:00:01.699  LOOP: UNKNOWN CODE TYPE = 0
0008C7E95D27  Broadcast  60    0:00:01.805  LOOP: UNKNOWN CODE TYPE = 0
0008C7E99AAF  Broadcast  60    0:00:01.825  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DAD  Broadcast  60    0:00:01.991  LOOP: UNKNOWN CODE TYPE = 0
Since I was using an older network analyzer, it didn't automatically recognize the Organizationally Unique Identifier (OUI) in these addresses. The OUI is the first three bytes of an Ethernet MAC address and identifies the organization the IEEE assigned it to, usually the hardware manufacturer. A quick trip to the 'IEEE OUI and Company_id Assignments' page was very helpful:
http://standards.ieee.org/regauth/oui/index.shtml
The OUI, 00-08-C7, was assigned to Compaq Computer Corporation, which meant that Compaq NICs were the source of these loop frames! I knew that we had quite a few Compaq cards on the network, and in just two seconds of capture I had found approximately 50 of these frames. My backbone was full of these broadcasts!
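Picking the OUI out of an address is easy to automate. Here's a minimal Python sketch; the KNOWN_OUIS table is a hypothetical stand-in that holds only the Compaq entry from this trace, where a real tool would load the full IEEE registry.

```python
import re
from collections import Counter

# Hypothetical lookup table. Its only entry is the Compaq OUI seen in this trace;
# a real tool would load the complete IEEE OUI registry instead.
KNOWN_OUIS = {"00-08-C7": "Compaq Computer Corporation"}


def oui_of(mac: str) -> str:
    """Return the OUI (first three bytes) of a MAC address as 'XX-XX-XX'."""
    # Strip separators such as ':', '-' or '.', keeping only hex digits.
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return "-".join(hex_digits[i:i + 2] for i in range(0, 6, 2))


def vendor_of(mac: str) -> str:
    """Look up the organization an address's OUI is assigned to."""
    return KNOWN_OUIS.get(oui_of(mac), "unknown")


# Tally frames per vendor for a few source addresses from the trace above.
sources = ["0008C7C59DA7", "0008C7C59DA8", "0008C707AA58"]
print(Counter(vendor_of(mac) for mac in sources))
```

Run against a full capture, a tally like this makes it obvious when one vendor's cards dominate the broadcast traffic.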
When I filtered the trace on an individual MAC address, a pattern emerged:
Source        Dest.      Size  Rel. Time    Summary
0008C7C59DA7  Broadcast  60    0:00:00.000  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA7  Broadcast  60    0:00:00.988  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA7  Broadcast  60    0:00:01.991  LOOP: UNKNOWN CODE TYPE = 0
The loop frame appeared approximately once per second, which pointed to a 'heartbeat' or some other regularly scheduled event. These Compaq NICs weren't sending the frames at random.
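The "approximately once per second" observation is simple to quantify. This Python sketch measures the gaps between the three frames from 0008C7C59DA7 in the filtered trace; the to_seconds helper is my own and assumes the analyzer's H:MM:SS.mmm relative-time format.

```python
from statistics import mean


def to_seconds(rel_time: str) -> float:
    """Convert an analyzer relative time such as '0:00:01.991' to seconds."""
    hours, minutes, seconds = rel_time.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Relative times of the frames from 0008C7C59DA7 in the filtered trace.
times = [to_seconds(t) for t in ("0:00:00.000", "0:00:00.988", "0:00:01.991")]

# Gaps between consecutive frames; a steady gap suggests a timer-driven heartbeat.
intervals = [later - earlier for earlier, later in zip(times, times[1:])]
print([round(gap, 3) for gap in intervals], mean(intervals))
```

The gaps come out to 0.988 and 1.003 seconds. A gap that steady points to a timer on the sending NIC rather than traffic driven by what users happen to be doing.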
The next logical step was a visit to the Compaq web page, where I found a white paper that provided the information I needed to put this issue into perspective:
Compaq Advanced Network Error Correction Support using PCI Hot Plug with Microsoft Windows NT
ftp://ftp.compaq.com/pub/supportinformation/papers/ecg0570897.pdf
This white paper confirmed my suspicions: the Compaq NICs were sending a 'heartbeat' as part of their network fault tolerance. Unfortunately, this heartbeat was an 'all ones' broadcast (FF-FF-FF-FF-FF-FF), so our switches flooded it across the entire switched network! Based on the information in Compaq's white paper, we were able to make changes to our infrastructure that moved these broadcasts off the backbone and onto a private 'redundancy' network.
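Why does an 'all ones' destination matter so much? A switch forwards a frame for a known unicast destination only toward the port where it learned that address, but it floods broadcasts out of every port in the VLAN. This Python sketch classifies a destination address; the function name is illustrative.

```python
def classify_destination(mac: str) -> str:
    """Classify a 48-bit destination MAC address as broadcast, multicast, or unicast."""
    octets = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Check broadcast first: all ones also has the group bit set.
    if octets == b"\xff" * 6:
        return "broadcast"  # all ones: flooded to every port in the VLAN
    if octets[0] & 0x01:
        return "multicast"  # individual/group bit set: a group address
    return "unicast"


for address in ("FF-FF-FF-FF-FF-FF", "01-80-C2-00-00-00", "00-08-C7-C5-9D-A7"):
    print(address, classify_destination(address))
```

For contrast, Spanning Tree BPDUs go to the multicast address 01-80-C2-00-00-00, which 802.1D bridges consume themselves rather than forwarding.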
After this issue occurred, I thought about other situations where availability and redundancy are associated with network traffic. An obvious example is Spanning Tree, where Bridge Protocol Data Units (BPDUs) are sent every two seconds by default (the hello time) as a 'heartbeat' between LAN bridges.
http://www.cisco.com/univercd/cc/td/doc/product/lan/cat5000/rel_5_2/config/spantree.htm
Another is Cisco's Hot Standby Router Protocol (HSRP), where routers exchange hello messages with each other to provide network redundancy:
http://www.cisco.com/warp/public/619/index.shtml
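Spanning Tree and HSRP rely on the same basic idea: a peer sends hellos on a fixed timer, and a neighbor that hears nothing within a longer hold period assumes the peer has failed. Here's a generic Python sketch of that timeout logic. HeartbeatMonitor is a hypothetical name, and this is not how Cisco actually implements either protocol; the example simply uses HSRP's commonly documented defaults of a 3-second hello and a 10-second hold time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical failure detector: a peer is 'up' while heartbeats keep arriving."""
    hold_time: float                     # seconds of silence before declaring the peer down
    last_heard: Optional[float] = None   # time of the most recent heartbeat, in seconds

    def heard(self, now: float) -> None:
        """Record a heartbeat received at time `now`."""
        self.last_heard = now

    def peer_is_up(self, now: float) -> bool:
        """True if a heartbeat arrived less than hold_time seconds before `now`."""
        if self.last_heard is None:
            return False
        return now - self.last_heard < self.hold_time


# Hellos every 3 seconds with a 10-second hold time.
monitor = HeartbeatMonitor(hold_time=10.0)
for hello_time in (0.0, 3.0, 6.0):
    monitor.heard(hello_time)

print(monitor.peer_is_up(9.0))   # 3 s since the last hello: still up
print(monitor.peer_is_up(16.5))  # 10.5 s of silence: peer considered down
```

Making the hold time a multiple of the hello interval means a single lost hello doesn't trigger a failover.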
I've also seen Microsoft NT servers send a network 'heartbeat' to verify the uptime of file servers:
Source        Dest.      Delta Time  Summary
020100000000  Broadcast  0.000.000   Ethertype=886F (Unknown)
020100000000  Broadcast  0.494.338   Ethertype=886F (Unknown)
020100000000  Broadcast  0.400.089   Ethertype=886F (Unknown)
020100000000  Broadcast  0.601.940   Ethertype=886F (Unknown)
020100000000  Broadcast  0.407.756   Ethertype=886F (Unknown)
020100000000  Broadcast  0.500.744   Ethertype=886F (Unknown)
020100000000  Broadcast  0.499.443   Ethertype=886F (Unknown)
020100000000  Broadcast  0.601.962   Ethertype=886F (Unknown)
020100000000  Broadcast  0.497.765   Ethertype=886F (Unknown)
020100000000  Broadcast  0.500.004   Ethertype=886F (Unknown)
020100000000  Broadcast  0.400.685   Ethertype=886F (Unknown)
020100000000  Broadcast  0.499.306   Ethertype=886F (Unknown)
020100000000  Broadcast  0.501.121   Ethertype=886F (Unknown)
This issue was a little more difficult to research, because the analyzer could only report the frame's EtherType, 886F, as unknown. A visit to the IEEE EtherType Field Registration Authority showed that the 886F EtherType is registered to Microsoft.
http://standards.ieee.org/regauth/ethertype/index.html
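The EtherType lives in bytes 12 and 13 of an Ethernet II header, right after the destination and source addresses, so pulling it out of a raw frame takes only a few lines. This Python sketch builds a hypothetical 60-byte frame shaped like the ones in the trace (broadcast destination, source 020100000000, EtherType 886F, zero padding) and parses it; the helper names are my own.

```python
import struct


def mac_str(raw: bytes) -> str:
    """Format six address bytes the way the analyzer displays them."""
    return raw.hex().upper()


def parse_ethernet_header(frame: bytes):
    """Return (destination, source, EtherType) from an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than a 14-byte Ethernet header")
    # Network byte order: 6-byte destination, 6-byte source, 2-byte type field.
    dest, src, type_field = struct.unpack("!6s6sH", frame[:14])
    if type_field < 0x0600:
        # Values of 1500 or less are an IEEE 802.3 length, not an EtherType.
        raise ValueError(f"type field {type_field:#06x} is not an EtherType")
    return mac_str(dest), mac_str(src), type_field


# Hypothetical frame modeled on the trace, zero-padded to the 60-byte minimum
# (64 bytes on the wire once the 4-byte FCS is added).
frame = bytes.fromhex("FFFFFFFFFFFF" "020100000000" "886F") + bytes(46)

dest, src, ethertype = parse_ethernet_header(frame)
print(dest, src, f"{ethertype:04X}")
```

Incidentally, the source address 020100000000 has the locally administered bit (0x02 in the first byte) set, so it isn't a manufacturer OUI at all, and an OUI lookup like the one that identified the Compaq frames wouldn't have helped here.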
Additional searches on Microsoft's web site found many documents on redundancy and load balancing. Here's one of them:
Microsoft - Network Load Balancing Technical Overview
http://www.microsoft.com/TechNet/win2000/nlbovw.asp
There are many other examples of server-based redundancy affecting network traffic, and many more methods will be created as system availability increases in importance and as our technology changes. Are you aware of the 'heartbeats' traversing your network? A simple network analysis trace may uncover network traffic that might surprise you!
Posted by james_messer at April 16, 2001 11:14 PM
