April 16, 2001

Redundancy from a Network Perspective

The goal of most IT organizations is to create a networking environment that provides the maximum amount of uptime. Many environments strive for 'five nines,' or 99.999% uptime over a measured period. Over a year, this works out to only five minutes and fifteen seconds of total downtime!
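
As a rough check of that arithmetic, here is a minimal Python sketch. The function name downtime_budget is made up for illustration, and the calculation assumes a 365-day year.

```python
from datetime import timedelta


def downtime_budget(availability_percent: float,
                    period: timedelta = timedelta(days=365)) -> timedelta:
    """Return the maximum downtime allowed over `period` at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


# Compare a few common availability targets over one (365-day) year.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {downtime_budget(target)} of downtime per year")
```

Each additional nine cuts the allowance by a factor of ten, which is why five nines leaves so little room for unplanned outages.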

To meet such a high uptime requirement, many servers and devices build in redundancy to improve availability. This redundancy can take many forms, such as duplicate storage, alternate network paths, redundant network interface cards, duplicated routers and switches, or clusters of servers that can pick up the load should a piece of the cluster fail.

On the surface, these products quietly provide redundancy while we go about our normal network tasks. Because of this plug-and-play redundancy, I was caught unprepared when I saw some unusual traffic on my network analyzer:

Source        Dest.      Size   Rel. Time  Summary
0008C7C59DA7  Broadcast    60 0:00:00.000  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA8  Broadcast    60 0:00:00.000  LOOP: UNKNOWN CODE TYPE = 0
0008C707AA58  Broadcast    60 0:00:00.081  LOOP: UNKNOWN CODE TYPE = 0
0008C7D21E4A  Broadcast    60 0:00:00.123  LOOP: UNKNOWN CODE TYPE = 0
0008C7EA468F  Broadcast    60 0:00:00.201  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A041  Broadcast    60 0:00:00.307  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A042  Broadcast    60 0:00:00.307  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B14F  Broadcast    60 0:00:00.336  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B758  Broadcast    60 0:00:00.340  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B88C  Broadcast    60 0:00:00.571  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5E5C  Broadcast    60 0:00:00.588  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5C78  Broadcast    60 0:00:00.588  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFB  Broadcast    60 0:00:00.697  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFC  Broadcast    60 0:00:00.697  LOOP: UNKNOWN CODE TYPE = 0
0008C707C5EA  Broadcast    60 0:00:00.731  LOOP: UNKNOWN CODE TYPE = 0
0008C72B6904  Broadcast    60 0:00:00.804  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5CD2  Broadcast    60 0:00:00.804  LOOP: UNKNOWN CODE TYPE = 0
0008C707C80C  Broadcast    60 0:00:00.830  LOOP: UNKNOWN CODE TYPE = 0
0008C7735C9A  Broadcast    60 0:00:00.846  LOOP: UNKNOWN CODE TYPE = 0
0008C7089835  Broadcast    60 0:00:00.909  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA7  Broadcast    60 0:00:00.988  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA8  Broadcast    60 0:00:00.988  LOOP: UNKNOWN CODE TYPE = 0
0008C708C275  Broadcast    60 0:00:01.061  LOOP: UNKNOWN CODE TYPE = 0
0008C72B5D04  Broadcast    60 0:00:01.061  LOOP: UNKNOWN CODE TYPE = 0
0008C7E9483D  Broadcast    60 0:00:01.257  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A041  Broadcast    60 0:00:01.295  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A042  Broadcast    60 0:00:01.295  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CB1  Broadcast    60 0:00:01.319  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CB2  Broadcast    60 0:00:01.319  LOOP: UNKNOWN CODE TYPE = 0
0008C7EAA028  Broadcast    60 0:00:01.344  LOOP: UNKNOWN CODE TYPE = 0
0008C7EA4A74  Broadcast    60 0:00:01.345  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B82A  Broadcast    60 0:00:01.351  LOOP: UNKNOWN CODE TYPE = 0
0008C707C819  Broadcast    60 0:00:01.399  LOOP: UNKNOWN CODE TYPE = 0
0008C7D9B880  Broadcast    60 0:00:01.432  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A3CB  Broadcast    60 0:00:01.687  LOOP: UNKNOWN CODE TYPE = 0
0008C7C5A3CC  Broadcast    60 0:00:01.687  LOOP: UNKNOWN CODE TYPE = 0
0008C707C2A8  Broadcast    60 0:00:01.699  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFB  Broadcast    60 0:00:01.699  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59CFC  Broadcast    60 0:00:01.699  LOOP: UNKNOWN CODE TYPE = 0
0008C7E95D27  Broadcast    60 0:00:01.805  LOOP: UNKNOWN CODE TYPE = 0
0008C7E99AAF  Broadcast    60 0:00:01.825  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DAD  Broadcast    60 0:00:01.991  LOOP: UNKNOWN CODE TYPE = 0


Since I was using an older network analyzer, it didn't automatically recognize the Organizationally Unique Identifier (OUI) of the address, which is defined as the first three bytes of an Ethernet MAC address. A quick trip to the 'IEEE OUI and Company_id Assignments' page was very helpful:

http://standards.ieee.org/regauth/oui/index.shtml

The OUI was assigned to Compaq Computer Corporation, which meant that many of our Compaq NICs were sending these loop packets! I knew that we had quite a few Compaq NICs, and in just two seconds I had found approximately 50 of these loopback frames. My backbone was full of these broadcasts!
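
For an analyzer that doesn't decode the vendor, the OUI check is easy to script. The sketch below is only an illustration: the helper name oui and the one-entry KNOWN_OUIS table are hypothetical, and a real lookup would be made against the IEEE registry cited above.

```python
from collections import Counter


def oui(mac: str) -> str:
    """Return the first three bytes (six hex digits) of a MAC address, uppercased."""
    digits = "".join(ch for ch in mac if ch not in ":-.").upper()
    if len(digits) != 12 or any(ch not in "0123456789ABCDEF" for ch in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6]


# A few source addresses copied from the trace above.
sources = ["0008C7C59DA7", "0008C7C59DA8", "0008C707AA58", "0008C7D21E4A"]

# Hypothetical hand-built table holding only the assignment found in this article.
KNOWN_OUIS = {"0008C7": "Compaq Computer Corporation"}

# Count frames per vendor prefix to see who is generating the traffic.
counts = Counter(oui(mac) for mac in sources)
for prefix, frames in counts.items():
    print(prefix, KNOWN_OUIS.get(prefix, "unknown vendor"), frames)
```

Grouping by prefix rather than by full address makes it obvious when one vendor's adapters account for most of the broadcasts.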

When I filtered on an individual MAC address, I found more information:

Source        Dest.      Size   Rel. Time  Summary
0008C7C59DA7  Broadcast    60 0:00:00.000  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA7  Broadcast    60 0:00:00.988  LOOP: UNKNOWN CODE TYPE = 0
0008C7C59DA7  Broadcast    60 0:00:01.991  LOOP: UNKNOWN CODE TYPE = 0

The loopback frame was occurring approximately every second, which suggested a 'heartbeat' or some other regularly scheduled event. These frames weren't being sent at random by the Compaq NICs.
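
The spacing can be checked directly from the relative timestamps in the filtered trace. This is a minimal sketch; the helper parse_rel_time is a name made up for illustration and assumes the analyzer's 'H:MM:SS.mmm' format shown above.

```python
def parse_rel_time(stamp: str) -> float:
    """Convert an 'H:MM:SS.mmm' relative timestamp to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Relative times of the three frames from 0008C7C59DA7 in the filtered trace.
stamps = ["0:00:00.000", "0:00:00.988", "0:00:01.991"]
times = [parse_rel_time(s) for s in stamps]

# Gap between each frame and the one before it, rounded to the millisecond.
intervals = [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
mean_interval = sum(intervals) / len(intervals)
print(intervals, round(mean_interval, 3))
```

Gaps that stay this close to a fixed value point to a timer-driven sender rather than traffic triggered by user activity.
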
The next logical step was a visit to the Compaq web page, where I found a white paper that provided the information I needed to put this issue into perspective:

Compaq Advanced Network Error Correction Support using
PCI Hot Plug with Microsoft Windows NT
ftp://ftp.compaq.com/pub/supportinformation/papers/ecg0570897.pdf

This white paper confirmed my suspicions - the Compaq NICs were sending a 'heartbeat' as part of their network fault tolerance. Unfortunately, this heartbeat was an 'all ones' broadcast across our switched network! Based on the information contained in Compaq's white paper, we were able to make changes to our infrastructure that moved these broadcasts from the backbone to a private 'redundancy' network.


After this issue occurred, I thought about other situations where availability and redundancy were tied to the network. An obvious example is Spanning Tree, where Bridge Protocol Data Units (BPDUs) are sent every couple of seconds (by default) as a 'heartbeat' between LAN bridges.

http://www.cisco.com/univercd/cc/td/doc/product/lan/cat5000/rel_5_2/config/spantree.htm


Another might be Cisco's Hot Standby Router Protocol, where routers communicate with each other to provide network redundancy:

http://www.cisco.com/warp/public/619/index.shtml


I've also seen Microsoft NT Servers send a network 'heartbeat' to obtain uptime verification from file servers:

Source        Dest.      Delta Time Summary
020100000000  Broadcast  0.000.000  Ethertype=886F (Unknown)
020100000000  Broadcast  0.494.338  Ethertype=886F (Unknown)
020100000000  Broadcast  0.400.089  Ethertype=886F (Unknown)
020100000000  Broadcast  0.601.940  Ethertype=886F (Unknown)
020100000000  Broadcast  0.407.756  Ethertype=886F (Unknown)
020100000000  Broadcast  0.500.744  Ethertype=886F (Unknown)
020100000000  Broadcast  0.499.443  Ethertype=886F (Unknown)
020100000000  Broadcast  0.601.962  Ethertype=886F (Unknown)
020100000000  Broadcast  0.497.765  Ethertype=886F (Unknown)
020100000000  Broadcast  0.500.004  Ethertype=886F (Unknown)
020100000000  Broadcast  0.400.685  Ethertype=886F (Unknown)
020100000000  Broadcast  0.499.306  Ethertype=886F (Unknown)
020100000000  Broadcast  0.501.121  Ethertype=886F (Unknown)

This issue was a little more difficult to research because the analyzer didn't recognize the EtherType of 886F in the frame. A visit to the IEEE EtherType Field Registration Authority showed that Microsoft was the owner of the 886F EtherType.

http://standards.ieee.org/regauth/ethertype/index.html
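
When an analyzer reports an unknown EtherType, the value can be read straight from the frame header: the type field is the two bytes that follow the destination and source addresses. The sketch below is illustrative only. The function name parse_ethernet_header is hypothetical, and the frame is rebuilt by hand from the trace above with a zero-filled payload, since the article doesn't show the real payload.

```python
import struct


def parse_ethernet_header(frame: bytes) -> tuple[str, str, int]:
    """Return (destination MAC, source MAC, EtherType) from the first 14 bytes."""
    if len(frame) < 14:
        raise ValueError("an Ethernet II header is 14 bytes long")
    # Network byte order: 6-byte destination, 6-byte source, 16-bit type field.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex().upper(), src.hex().upper(), ethertype


# Broadcast destination, source 020100000000, EtherType 0x886F (from the trace).
# The 46 zero bytes are placeholder payload, not captured data.
frame = bytes.fromhex("FFFFFFFFFFFF" "020100000000" "886F") + bytes(46)

dst, src, ethertype = parse_ethernet_header(frame)
# Values of 0x0600 and above are EtherTypes; smaller values are 802.3 lengths.
print(dst, src, f"0x{ethertype:04X}")
```

With the hex value in hand, the registry lookup is the same kind of search as the OUI lookup earlier.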


Additional searches on Microsoft's web site found many documents on redundancy and load balancing. Here's one of them:

Microsoft - Network Load Balancing Technical Overview
http://www.microsoft.com/TechNet/win2000/nlbovw.asp


There are many other examples of server-based redundancy affecting network traffic, and many more methods will be created as system availability increases in importance and as our technology changes. Are you aware of the 'heartbeats' traversing your network? A simple network analysis trace may uncover network traffic that might surprise you!

Posted by james_messer at April 16, 2001 11:14 PM


