According to Abraham Lincoln:
“If we could first know where we are and whither we are tending, we could better judge what to do and how to do it.”
An important step in top-down network design is to examine a customer’s existing network to better judge how to meet expectations for network scalability, performance, and availability. Examining the existing network includes learning about the topology and physical structure and assessing the network’s performance.
By developing an understanding of the existing network’s structure, uses, and behavior, you can determine whether a customer’s design goals are realistic. You can document any bottlenecks or network performance problems, and identify internetworking devices and links that will need to be replaced because the number of ports or capacity is insufficient for the new design. Identifying performance problems can help you select solutions to solve problems and develop a baseline for future measurements of performance.
Most network designers do not design networks from scratch. Instead, they design enhancements to existing networks. Developing a successful network design requires that you develop skills in characterizing an incumbent network to ensure interoperability between the existing and anticipated networks. This chapter describes techniques and tools to help you develop those skills. This chapter concludes with a Network Health checklist that documents typical thresholds for diagnosing a network as “healthy.”
Characterizing the Network Infrastructure
Characterizing the infrastructure of a network means developing a set of network maps and learning the location of major internetworking devices and network segments. It also includes documenting the names and addresses of major devices and segments, and identifying any standard methods for addressing and naming. Documenting the types and lengths of physical cabling and investigating architectural and environmental constraints are also important aspects of characterizing the network infrastructure. Architectural and environmental constraints are becoming increasingly important in modern network designs that must accommodate wireless networking, which may not work if the signal is blocked by cement walls, for example.
Developing a Network Map
Learning the location of major hosts, interconnection devices, and network segments is a good way to start developing an understanding of traffic flow. Coupled with data on the performance characteristics of network segments, location information gives you insight into where users are concentrated and the level of traffic that a network design must support.
At this point in the network design process, your goal is to obtain a map (or set of maps) of the existing network. Some design customers might have maps for the new network design as well. If that is the case, you might be one step ahead, but be careful of any assumptions that are not based on your detailed analysis of business and technical requirements.
To develop a network drawing, you should invest in a good network-diagramming tool. Tools include IBM’s Tivoli products, WhatsUp Gold from Ipswitch, and LANsurveyor from SolarWinds. The Microsoft Visio Professional product is also highly recommended for network diagramming. For large enterprises and service providers, Visionael Corporation offers client/server network documentation products.
Note Tools that automatically diagram a network can be helpful, but the generated maps might require a lot of cleanup to make them useful.
Characterizing Large Internetworks
Developing a single network map might not be possible for large internetworks. There are many approaches to solving this problem, including simply developing many maps, one for each location. Another approach is to apply a top-down method. Start with a map or set of maps that shows the following high-level information:
■ Geographical information, such as countries, states or provinces, cities, and campuses
■ WAN connections between countries, states, and cities
■ WAN and LAN connections between buildings and between campuses
For each campus network, you can develop more precise maps that show the following more detailed information:
■ Buildings and floors, and possibly rooms or cubicles
■ The location of major servers or server farms
■ The location of routers and switches
■ The location of firewalls, Network Address Translation (NAT) devices, intrusion detection systems (IDS), and intrusion prevention systems (IPS)
■ The location of mainframes
■ The location of major network-management stations
■ The location and reach of virtual LANs (VLAN)
■ Some indication of where workstations reside, although not necessarily the explicit location of each workstation
Another method for characterizing large, complex networks is to use a top-down approach that is influenced by the OSI reference model. First, develop a logical map that shows applications and services used by network users. This map can call out internal web, email, FTP, and print and file-sharing servers. It can also include external web, email, and FTP servers.
Note Be sure to show web caching servers on your network maps because they can affect traffic flow. Documenting the location of web caching servers will make it easier to troubleshoot any problems reaching web servers during the implementation and operation phases of the network design cycle.
Next, develop a map that shows network services. This map might depict the location of security servers; for example, Terminal Access Controller Access Control System (TACACS) and Remote Authentication Dial-In User Service (RADIUS) servers. Other network services include Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), and Simple Network Management Protocol (SNMP) and other management services. The location and reach of any virtual private networks (VPN) that connect corporate sites via a service provider’s WAN or the Internet can be depicted, including major VPN devices, such as VPN concentrators. Dial-in and dial-out servers can be shown on this map as well.
You may also want to develop a map that depicts the Layer 3 topology of the internetwork. This map can leave out switches and hubs, but should depict routers, logical links between the routers, and high-level routing protocol configuration information (for example, the location of the desired designated router [DR] if Open Shortest Path First [OSPF] is being used).
Layer 3 drawings should also include router interface names in Cisco shorthand nomenclature (such as s0/0) if Cisco routers are used. Other useful information includes Hot Standby Router Protocol (HSRP) router groupings, redistribution points between routing protocols, and demarcation points where route filters occur. The Layer 3 drawing should also include the location and high-level configuration of firewalls and NAT, IDS, and IPS devices.
A map or set of maps that shows detailed information about data link layer links and devices is often extremely helpful. This map reveals LAN devices and interfaces connected to public or private WANs. This map may hide the logical Layer 3 routing topology, which is shown in the previous map(s), but it should provide a good characterization of the physical topology. A data link layer map includes the following information:
■ An indication of the data link layer technology for WANs and LANs (Frame Relay, Point-to-Point Protocol [PPP], VPN, 100-Mbps or 1000-Mbps Ethernet, and so on)
■ The name of the service provider for WANs
■ WAN circuit IDs
■ The location and high-level configuration information for LAN switches (for example, the location of the desired root bridge if the Spanning Tree Protocol [STP] is used)
■ The location and reach of any VLANs and VLAN Trunking Protocol (VTP) configurations
■ The location and high-level configuration of trunks between LAN switches
■ The location and high-level configuration of any Layer 2 firewalls
Characterizing the Logical Architecture
While documenting the network infrastructure, take a step back from the diagrams you develop and try to characterize the logical topology of the network and the physical components. The logical topology illustrates the architecture of the network, which can be hierarchical or flat, structured or unstructured, layered or not, and other possibilities. The logical topology also describes methods for connecting devices in a geometric shape (for example, a star, ring, bus, hub and spoke, or mesh).
When characterizing the logical topology, look for “ticking time bombs” or implementations that might hinder scalability. Ticking time bombs include large Layer 2 STP domains that will take a long time to converge and overly complex or oversized networks that might lead to Enhanced Interior Gateway Routing Protocol (EIGRP) stuck-in-active (SIA) problems and other routing problems. If the customer has fully redundant network equipment and cabling but the servers are all single-homed (attached to a single switch), keep this in mind as you plan your redesign of the network. This could be another ticking time bomb that can be fixed with a redesign.
The logical topology can affect your ability to upgrade a network. For example, a flat topology does not scale as well as a hierarchical topology. A typical hierarchical topology that does scale is a core layer of high-end routers and switches that are optimized for availability and performance, a distribution layer of routers and switches that implement policies, and an access layer that connects users via hubs, switches, and other devices. Logical topologies are discussed in more detail in Chapter 5, “Designing a Network Topology.”
Figure 3-1 shows a high-level network diagram for an electronics manufacturing company. The drawing shows a physical topology, but it is not hard to step back and visualize that the logical topology is a hub-and-spoke shape with three layers. The core layer of the network is a Gigabit Ethernet network. The distribution layer includes routers and switches, and Frame Relay and T1 links. The access layer is composed of 10-Mbps and 100-Mbps Ethernet networks. An Ethernet network hosts the company’s web server. As you can see from the figure, the network included some rather old design components. The company required design consultation to select new technologies and to meet new goals for high availability and security.
[Figure labels: Medford (100-Mbps Ethernet, 50 users); Ashland (100-Mbps Ethernet, 30 users); two Frame Relay links (CIR = 56 Kbps, DLCI = 5 and DLCI = 4); Grants Pass HQ (100-Mbps Ethernet, 75 users; Gigabit Ethernet; front-end processor [FEP]; IBM mainframe; web/FTP server); Eugene (10-Mbps Ethernet, 20 users); T1 links, including a T1 link to the Internet.]
Figure 3-1 Network Diagram for an Electronics Manufacturing Company
Developing a Modular Block Diagram
In addition to developing a set of detailed maps, it is often helpful to draw a simplified block diagram of the network or parts of the network. The diagram can depict the major functions of the network in a modular fashion. Figure 3-2 shows a block, modularized network topology map that is based on the Cisco Enterprise Composite Network Model.
[Figure labels: Enterprise Campus (Building Access, Building Distribution, Campus Backbone, Server Farm, Edge Distribution, Network Management); Enterprise Edge (E-Commerce, Internet Connectivity, VPN/Remote Access, WAN); Service Provider Edge (ISP A, ISP B, PSTN, Frame Relay/ATM).]
Figure 3-2 Modularized Network Topology Example
Characterizing Network Addressing and Naming
Characterizing the logical infrastructure of a network involves documenting any strategies your customer has for network addressing and naming. Addressing and naming are discussed in greater detail in Part II of this book, “Logical Network Design.”
When drawing detailed network maps, include the names of major sites, routers, network segments, and servers. Also document any standard strategies your customer uses for naming network elements. For example, some customers name sites using airport codes (San Francisco = SFO, Oakland = OAK, and so on). You might find that a customer suffixes names with an alias that describes the type of device (for example, RTR for router). Some customers use a standard naming system, such as DNS, for IP networks, or NetBIOS Windows Internet Naming Service (WINS) on Windows networks. In such cases, you should document the location of the DNS and WINS servers and relevant high-level configuration information.
You should also investigate the network layer addresses your customer uses. Your customer’s addressing scheme (or lack of any scheme) can influence your ability to adapt the network to new design goals. For example, your customer might use unregistered IP addresses that will need to be changed or translated before connecting to the Internet. As another example, current IP subnet masking might limit the number of nodes in a LAN or VLAN.
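Both checks can be roughed out with Python’s standard ipaddress module. The following is a minimal sketch, not a prescribed method: the function name describe_subnet and the sample prefixes are hypothetical, and the host count assumes a traditional IPv4 subnet that reserves its network and broadcast addresses.

```python
import ipaddress


def describe_subnet(cidr: str) -> dict:
    """Summarize one IPv4 subnet: mask, private-space flag, usable hosts."""
    net = ipaddress.ip_network(cidr, strict=True)
    # Assumption: a traditional subnet reserves the network and broadcast
    # addresses, so usable hosts = total addresses - 2 (never below zero).
    usable = max(net.num_addresses - 2, 0)
    return {
        "network": str(net),
        "mask": str(net.netmask),
        "private": net.is_private,  # True for unregistered (private) space
        "usable_hosts": usable,
    }


if __name__ == "__main__":
    # Hypothetical sample prefixes, chosen only for illustration.
    for cidr in ("10.108.16.0/24", "192.168.49.0/26"):
        print(describe_subnet(cidr))
```

A subnet flagged as private would need translation (or renumbering) before its hosts reach the Internet, and a small usable-host count signals that the current mask could constrain growth in that LAN or VLAN.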
Your customer might have a goal of using route summarization, which is also called route aggregation or supernetting. Route summarization reduces routes in a routing table, routing-table update traffic, and overall router overhead. Route summarization also improves network stability and availability, because problems in one part of a network are less likely to affect the whole internetwork. Summarization is most effective when address prefixes have been assigned in a consistent and contiguous manner, which is often not the case.
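To see why contiguous assignment matters, the following Python sketch collapses a list of prefixes with the standard library’s ipaddress.collapse_addresses function. It is an illustration only; the helper name summarize and the sample subnets are assumptions, and real routers summarize according to their own routing protocol configuration.

```python
import ipaddress


def summarize(prefixes):
    """Collapse CIDR prefixes into the fewest prefixes that cover them."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    # Sixteen contiguous /24 subnets: 10.108.16.0 through 10.108.31.0.
    contiguous = [f"10.108.{i}.0/24" for i in range(16, 32)]
    print(summarize(contiguous))

    # Removing one subnet from the middle leaves a gap, so a single
    # summary route is no longer possible.
    gapped = [p for p in contiguous if p != "10.108.20.0/24"]
    print(summarize(gapped))
```

The contiguous block reduces to one advertisement, whereas the block with a gap needs several, which is the routing-table growth that inconsistent address assignment causes.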
Your customer’s existing addressing scheme might affect the routing protocols you can select. Some routing protocols do not support classless addressing, variable-length subnet masking (VLSM), or discontiguous subnets. A discontiguous subnet is a subnet that is divided, as shown in Figure 3-3. Subnet 108 of network 10 is divided into two areas that are separated by network 192.168.49.0.
[Figure labels: Area 0 (network 192.168.49.0) separates Area 1 (subnets 10.108.16.0 through 10.108.31.0) from Area 2 (subnets 10.108.32.0 through 10.108.47.0).]
Figure 3-3 Example of a Discontiguous Subnet
Characterizing Wiring and Media
To help you meet scalability and availability goals for your new network design, it is important to understand the cabling design and wiring of the existing network. Documenting the existing cabling design can help you plan for enhancements and identify any potential problems. If possible, you should document the types of cabling in use as well as cable distances. Distance information is useful when selecting data link layer technologies based on distance restrictions.
While exploring the cabling design, assess how well equipment and cables are labeled in the current network. The extent and accuracy of labeling will affect your ability to implement and test enhancements to the network.
Your network diagram should document the connections between buildings. The diagram should include information on the number of pairs of wires and the type of wiring (or wireless technology) in use. The diagram should also indicate how far buildings are from one another. Distance information can help you select new cabling. For example, if you plan to upgrade from copper to fiber cabling, the distance between buildings can be much longer.
The wiring (or wireless technology) between buildings is probably one of the following:
■ Single-mode fiber
■ Multimode fiber
■ Shielded twisted-pair (STP) copper
■ Unshielded twisted-pair (UTP) copper
■ Coaxial cable
■ Microwave
■ Laser
■ Radio
■ Infrared
Within buildings, try to locate telecommunications wiring closets, cross-connect rooms, and any laboratories or computer rooms. If possible, determine the type of cabling that is installed between telecommunications closets and in work areas. (Some technologies, such as 100BASE-TX Ethernet, require Category 5 or later cabling, so be sure to document the existence of any Category 3 cabling that needs to be replaced.) Gather information about both vertical and horizontal wiring. As shown in Figure 3-4, vertical wiring runs between floors. Horizontal wiring runs from telecommunications closets to wallplates in cubicles or offices. Work-area wiring runs from the wallplate to a workstation in a cubicle or office.
[Figure labels: Building A (Headquarters) and Building B, joined by the Campus Backbone; Work-Area Wiring; Wallplate; Horizontal Wiring; Telecommunications Wiring Closet; Vertical Wiring (Building Backbone); Main Cross-Connect Room (or Main Distribution Frame); Intermediate Cross-Connect Room (or Intermediate Distribution Frame).]
Figure 3-4 Example of Campus Network Wiring
In most buildings, the cabling from a telecommunications closet to a workstation is no more than approximately 100 meters (about 328 feet), including the work-area wiring, which is usually just a few meters. If you have any indication that the cabling might be longer than 100 meters, you should use a time-domain reflectometer (TDR) to verify your suspicions. (TDR functionality is included in most cable testers.) Many network designs are based on the assumption that workstations are no more than 100 meters from the telecommunications closet.
Ask the client for a copy of the copper or fiber certification tests that were completed when the cabling was first installed. Test results will help you learn the type of cabling that was installed, its certification, and the warranty period for the installation work. Many modern network designers do just that one step of verifying that the cable was tested and certified rather than going through a detailed analysis of the cabling infrastructure. On the other hand, many network designers still focus on cabling because they have learned the hard way that meeting availability goals can be difficult when the cabling was not installed properly.
For each building, you can fill out the chart shown in Table 3-1. The data that you fill in depends on how much time you have to gather information and how important you think cabling details will be to your network design. If you do not have a lot of information, just put an X for each type of cabling present and document any assumptions (for example, an assumption that workstations are no more than 100 meters from the telecommunications closet). If you have time to gather more details, include information on the length and number of pairs of cables. If you prefer, you can document building wiring information in a network diagram instead of in a table.
Table 3-1 Building Wiring
Building name:
Location of telecommunications closets:
Location of cross-connect rooms and demarcations to external networks:
Logical wiring topology (structured, star, bus, ring, centralized, distributed, mesh, tree, or whatever fits):

Vertical Wiring  | Coaxial | Fiber | STP | Category 3 UTP | Category 5 or 6 UTP | Other
Vertical shaft 1 |         |       |     |                |                     |
Vertical shaft 2 |         |       |     |                |                     |
Vertical shaft n |         |       |     |                |                     |

Horizontal Wiring | Coaxial | Fiber | STP | Category 3 UTP | Category 5 or 6 UTP | Other
Floor 1           |         |       |     |                |                     |
Floor 2           |         |       |     |                |                     |
Floor 3           |         |       |     |                |                     |
Floor n           |         |       |     |                |                     |

Work-Area Wiring | Coaxial | Fiber | STP | Category 3 UTP | Category 5 or 6 UTP | Other
Floor 1          |         |       |     |                |                     |
Floor 2          |         |       |     |                |                     |
Floor 3          |         |       |     |                |                     |
Floor n          |         |       |     |                |                     |
Checking Architectural and Environmental Constraints
When investigating cabling, pay attention to such environmental issues as the possibility that cabling will run near creeks that could flood, railroad tracks or highways where traffic could jostle cables, or construction or manufacturing areas where heavy equipment or digging could break cables.
Be sure to determine if there are any legal right-of-way issues that must be dealt with before cabling can be put into place. For example, will cabling need to cross a public street? Will it be necessary to run cables through property owned by other companies? For line-of-sight technologies, such as laser or infrared, make sure there aren’t any obstacles blocking the line of sight.
Within buildings, pay attention to architectural issues that could affect the feasibility of implementing your network design. Make sure the following architectural elements are sufficient to support your design:
■ Air conditioning
■ Heating
■ Ventilation
■ Power
■ Protection from electromagnetic interference
■ Doors that can lock
■ Space for:
  ■ Cabling conduits
  ■ Patch panels
  ■ Equipment racks
  ■ Work areas for technicians installing and troubleshooting equipment
Note Keep in mind that cabling and power are highly influenced by human factors. Installing new cabling might require working with labor unions, for example. Maintaining the reliability of cabling might require monitoring the infamous backhoe operator or the janitor who knocks cables around. It’s also not unheard of for security guards to lean against a wall late at night and accidentally activate emergency power off (EPO) or discharge fire suppressant. To avoid problems, make sure EPO and fire suppressant buttons have safety covers and are out of the way.
Checking a Site for a Wireless Installation
A common goal for modern campus network designs is to install a wireless LAN (WLAN) based on IEEE 802.11 standards. An important aspect of inspecting the architectural and environmental constraints of a site is determining the feasibility of using wireless transmission. The term wireless site survey is often used to describe the process of analyzing a site to see if it will be appropriate for wireless transmission.
In some ways, doing a wireless site survey is no different from checking an architecture for wired capabilities, where you might need to document obstructions or areas that have water leaks, for example. But in many ways, a wireless site survey is quite different from a wired site survey because the transmission isn’t going through guided wires; it’s being sent in radio frequency (RF) waves through air. Learning RF transmission theory in depth requires a lot of time and a good background in physics. For complex RF designs and concerns, it often makes sense to hire an RF expert. To do a basic site survey, you might not need help, though.
A site survey starts with a draft WLAN design. Using a floor plan or blueprint for the site, the designer decides on the initial placement of the wireless access points. An access point is a station that transmits and receives data for users of the WLAN. It usually also serves as the point of interconnection between the WLAN and the wired Ethernet network. A network designer can decide where to place access points for initial testing based on some knowledge of where the users will be located, characteristics of the access points’ antennas, and the location of major obstructions.
The initial placement of an access point is based on an estimate of the signal loss that will occur between the access point and the users of the access point. The starting point for an estimate depends on how much loss in power a signal would experience in the vacuum of space, without any obstructions or other interference. This is called the free space path loss and is specified in decibels (dB). The estimate is tuned with an understanding that the actual expected signal loss depends on the medium through which the signal will travel, which is undoubtedly not a vacuum. An RF signal traveling through objects of various sorts can be affected by many different problems, including the following:
■ Reflection: Reflection causes the signal to bounce back on itself. The signal can interfere with itself in the air and affect the receiver’s capability to discriminate between the signal and noise in the environment. Reflection is caused by metal surfaces such as steel girders, scaffolding, shelving units, steel pillars, and metal doors. As an example, implementing a WLAN across a parking lot can be tricky because of metal cars (sources of reflection) that come and go.
■ Absorption: Some of the electromagnetic energy of the signal can be absorbed by the material in objects through which it passes, resulting in a reduced signal level. Water has significant absorption properties, and objects such as trees or thick wooden structures can have a high water content. Implementing a WLAN in a coffee shop can be tricky if there are large canisters of liquid coffee. Coffee shop WLAN users have also noticed that people coming and going can affect the signal level. (On Star Trek, a nonhuman character once called a human “an ugly giant bag of mostly water”!)
■ Refraction: When an RF signal passes from a medium with one density into a medium with another density, the signal can be bent, much like light passing through a prism. The signal changes direction and might interfere with the nonrefracted signal. It can take a different path and encounter other, unexpected obstructions and arrive at recipients damaged or later than expected. As an example, a water tank not only introduces absorption, but also the difference in density between the atmosphere and the water can bend the RF signal.
■ Diffraction: Diffraction, which is similar to refraction, results when a region through which the RF signal can pass easily is adjacent to a region in which reflective obstructions exist. As with refraction, the RF signal is bent around the edge of the diffractive region and can then interfere with the part of the RF signal that is not bent.
The designers of 802.11 transmitting devices attempt to compensate for variable environmental factors that might cause reflection, absorption, refraction, or diffraction by boosting the power level above what would be required if free space path loss were the only consideration. The additional power added to a transmission is called the fade margin.
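A first-pass estimate of free space path loss can be computed from the standard formula FSPL (dB) = 20 log10(4πdf/c), where d is the distance in meters, f is the frequency in hertz, and c is the speed of light. The Python sketch below is illustrative only: the function names, the 20-dBm transmit power, and the extra_loss_db allowance for obstructions are assumptions, and antenna gains are ignored.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def free_space_path_loss_db(distance_m: float, frequency_hz: float) -> float:
    """Free space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    if distance_m <= 0 or frequency_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    ratio = 4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT_M_S
    return 20.0 * math.log10(ratio)


def estimated_rx_dbm(tx_dbm: float, distance_m: float, frequency_hz: float,
                     extra_loss_db: float = 0.0) -> float:
    """Rough received level: transmit power minus FSPL minus extra losses.

    extra_loss_db is a hypothetical allowance for walls and other
    obstructions; antenna gains are deliberately left out of this sketch.
    """
    loss = free_space_path_loss_db(distance_m, frequency_hz)
    return tx_dbm - loss - extra_loss_db


if __name__ == "__main__":
    # Assumed example: a 2.4-GHz access point transmitting at 20 dBm.
    for d in (10, 50, 100):
        loss = free_space_path_loss_db(d, 2.4e9)
        rx = estimated_rx_dbm(20.0, d, 2.4e9, extra_loss_db=10.0)
        print(f"{d:>4} m  FSPL = {loss:5.1f} dB  estimated RX = {rx:6.1f} dBm")
```

Each doubling of distance adds about 6 dB of free space loss, which is why the estimate is only a starting point that the site survey then confirms or corrects.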
Performing a Wireless Site Survey
A site survey confirms signal propagation, strength, and accuracy in different locations. Many wireless network interface cards (NIC) ship with utilities that enable you to measure signal strength. Cisco 802.11 NICs ship with the Cisco Aironet Client Utility (ACU), which is a graphical tool for configuring, monitoring, and managing the NIC and its wireless environment. A site survey can be as simple as walking around with a wireless notebook computer and using the utility to measure signal strength.
Signal strength can also be determined with a protocol analyzer. The WildPackets AiroPeek analyzer, for example, presents the signal strength for each frame received.
An access point typically sends a beacon frame every 100 milliseconds (ms). You can divide the area being surveyed into a grid, and then move your protocol analyzer from gridpoint to gridpoint and plot on a diagram the signal strength of the beacon frames.
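The grid-based survey can be recorded in a simple data structure. The following Python sketch assumes that beacon signal strengths have already been exported from a survey tool as dBm values per gridpoint; the function name, the sample readings, and the -70 dBm threshold are hypothetical and should be replaced with values appropriate to the design.

```python
def weak_gridpoints(readings, threshold_dbm=-70.0):
    """Return gridpoints whose mean beacon signal is below the threshold.

    readings maps (row, column) gridpoints to lists of dBm samples.
    Averaging dBm values arithmetically is a simplification, because
    dBm is a logarithmic unit; it is adequate for a rough survey plot.
    """
    weak = {}
    for point, samples in readings.items():
        if not samples:
            continue  # no beacons captured at this gridpoint
        mean_dbm = sum(samples) / len(samples)
        if mean_dbm < threshold_dbm:
            weak[point] = round(mean_dbm, 1)
    return weak


if __name__ == "__main__":
    # Hypothetical survey data: gridpoint -> beacon signal samples (dBm).
    survey = {
        (0, 0): [-45, -47],
        (0, 1): [-72, -76],
        (1, 0): [-69, -70],
    }
    print(weak_gridpoints(survey))
```

Gridpoints returned by the function are candidates for moving an access point or adding another one.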
When evaluating the various metrics that are provided by wireless utilities, be sure to measure frame corruption and not just signal strength. With a protocol analyzer, capture frames and check for cyclic redundancy check (CRC) errors. CRC errors are the result of corruption from environmental noise or collisions between frames.
You can also indirectly measure signal quality by determining if frames are being lost in transmission. If your protocol analyzer is capturing relatively close to an access point and a mobile client is pinging a server, through the access point, onto the wired Ethernet, you can determine whether ping packets are getting lost.
As part of your site survey, you can also look at acknowledgments (ACK) and frame retries after a missing ACK. With 802.11 WLANs, both the client and the access point send ACKs to each other. An ACK frame is one of six special frames called control frames. All directed traffic (frames addressed to any nonbroadcast, nonmulticast destination) is positively acknowledged with an ACK. Clients and access points use ACKs to implement a retransmission mechanism not unlike the Ethernet retry process that occurs after a collision.
In a wired Ethernet, the transmitting station detects collisions through the rules of carrier sense multiple access with collision detection (CSMA/CD). 802.11 uses carrier sense multiple access with collision avoidance (CSMA/CA) as the access method and does not depend on collision detection to operate. Instead, an ACK control frame is returned to a sender for each directed packet received. If a directed frame does not receive an ACK, the frame is retransmitted.
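Frame corruption and retries can be summarized from a capture with a few lines of code. This Python sketch assumes the capture has already been reduced to one record per frame with two flags, whether the CRC check passed and whether the frame was marked as a retry; the CapturedFrame type and its field names are hypothetical and do not correspond to any particular analyzer’s export format.

```python
from dataclasses import dataclass


@dataclass
class CapturedFrame:
    crc_ok: bool  # True if the frame passed its cyclic redundancy check
    retry: bool   # True if the frame was marked as a retransmission


def frame_quality(frames):
    """Return the CRC error and retry percentages for a list of frames.

    The retry percentage is computed over frames that passed the CRC
    check, because flags inside a corrupted frame cannot be trusted.
    """
    total = len(frames)
    good = [f for f in frames if f.crc_ok]
    crc_error_pct = 100.0 * (total - len(good)) / total if total else 0.0
    retries = sum(1 for f in good if f.retry)
    retry_pct = 100.0 * retries / len(good) if good else 0.0
    return {"crc_error_pct": crc_error_pct, "retry_pct": retry_pct}


if __name__ == "__main__":
    # Hypothetical capture: 100 frames, 5 corrupted, 19 retries among the rest.
    capture = (
        [CapturedFrame(crc_ok=False, retry=False)] * 5
        + [CapturedFrame(crc_ok=True, retry=True)] * 19
        + [CapturedFrame(crc_ok=True, retry=False)] * 76
    )
    print(frame_quality(capture))
```

Comparing these percentages across candidate access point placements gives a view of signal quality that signal strength alone does not provide.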
Wireless networking is covered again in later chapters, but remember to consider it early in your design planning. Using a wireless utility, such as the Cisco ACU, WildPackets OmniPeek, or NetStumbler, check signal strength and accuracy with potential access point placements to determine if the architecture of the physical site will be a problem. Performing a basic wireless site survey is an important part of the top-down network design process of checking for architectural and environmental constraints.
Checking the Health of the Existing Internetwork
Studying the performance of the existing internetwork gives you a baseline measurement from which to measure new network performance. Armed with measurements of the present internetwork, you can demonstrate to your customer how much better the new internetwork performs once your design is implemented.
Many of the network-performance goals discussed in Chapter 2, “Analyzing Technical Goals and Tradeoffs,” are overall goals for an internetwork. Because the performance of existing network segments will affect overall performance, you need to study the performance of existing segments to determine how to meet overall network performance goals.
If an internetwork is too large to study all segments, you should analyze the segments that will interoperate the most with the new network design. Pay particular attention to backbone networks and networks that connect old and new areas.
In some cases, a customer’s goals might be at odds with improving network performance. The customer might want to reduce costs, for example, and not worry about performance. In this case, you will be glad that you documented the original performance so that you can prove that the network was not optimized to start with and your new design has not made performance worse.
By analyzing existing networks, you can also recognize legacy systems that must be incorporated into the new design. Sometimes customers are not aware that older protocols are still running on their internetworks. By capturing network traffic with a protocol analyzer as part of your baseline analysis, you can identify which protocols are actually running on the network and not rely on customers’ beliefs.
Developing a Baseline of Network Performance
Developing an accurate baseline of a network’s performance is not an easy task. One challenging aspect is selecting a time to do the analysis. It is important that you allocate a lot of time (multiple days) if you want the baseline to be accurate. If measurements are made over too short a timeframe, temporary errors appear more significant than they are.
In addition to allocating sufficient time for a baseline analysis, it is also important to find a typical time period to do the analysis. A baseline of normal performance should not include atypical problems caused by exceptionally large traffic loads. For example, at some companies, end-of-the-quarter sales processing puts an abnormal load on the network. In a retail environment, network traffic can increase fivefold around Christmas time. Network traffic to a web server can unexpectedly increase tenfold if the website gets linked to other popular sites or listed in search engines.
In general, errors, packet/cell loss, and latency increase with load. To get a meaningful measurement of typical accuracy and delay, try to do your baseline analysis during periods of normal traffic load. On the other hand, if your customer’s main goal is to improve performance during peak load, be sure to study performance during peak load. The decision whether to measure normal performance, performance during peak load, or both, depends on the goals of the network design.
Some customers do not recognize the value of studying the existing network before designing and implementing enhancements. Your customer’s expectations for a speedy design proposal might make it difficult for you to take a step back and insist on time to develop a baseline of performance on the existing network. Also, your other job tasks and goals, especially if you are a sales engineer, might make it impractical to spend days developing a precise baseline.
The work you do before the baseline step in the top-down network design methodology can increase your efficiency in developing a baseline. A good understanding of your customer’s technical and business goals can help you decide how thorough to make your study. Your discussions with your customer on business goals can help you identify segments that are important to study because they carry critical and/or backbone traffic. You can also ask your customer to help you identify typical segments from which you can extrapolate other segments.
Analyzing Network Availability
To document availability characteristics of the existing network, gather any statistics that the customer has on the mean time between failure (MTBF) and mean time to repair (MTTR) for the internetwork as a whole and major network segments. Compare these statistics with information you have gathered on MTBF and MTTR goals, as discussed in Chapter 2. Does the customer expect your new design to increase MTBF and decrease MTTR? Are the customer’s goals realistic considering the current state of the network?
Talk to the network engineers and technicians about the root causes of the most recent and most disruptive periods of downtime. Assuming the role of a forensic investigator, try to get many sides to the story. Sometimes myths develop about what caused a network outage. (You can usually get a more accurate view of problem causes from engineers and technicians than from users and managers.)
You can use Table 3-2 to document availability characteristics of the current network.
Table 3-2 Availability Characteristics of the Current Network
| Segment | MTBF | MTTR | Date and Duration of Last Major Downtime | Cause of Last Major Downtime | Fix for Last Major Downtime |
| Enterprise (as a whole) | | | | | |
| Segment 1 | | | | | |
| Segment 2 | | | | | |
| Segment 3 | | | | | |
| Segment n | | | | | |
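The MTBF and MTTR figures gathered for Table 3-2 can be turned into an availability percentage with the commonly used ratio MTBF / (MTBF + MTTR). The following is a minimal sketch of that arithmetic. The function name and the sample figures are illustrative assumptions, not customer data.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical segment: fails about every 4000 hours, takes 1 hour to repair.
example = availability(4000, 1)
print(f"Availability: {example:.4%}")
```

Running the same calculation for each row of the table makes it easier to compare the current state of each segment with the customer's stated goals.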
Analyzing Network Utilization
Network utilization is a measurement of the amount of bandwidth that is in use during a specific time interval. Utilization is commonly specified as a percentage of capacity. If a network-monitoring tool says that network utilization on a Fast Ethernet segment is 70 percent, for example, this means that 70 percent of the 100-Mbps capacity is in use, averaged over a specified timeframe or window.
Different tools use different averaging windows for computing network utilization. Some tools let the user change the window. Using a long interval can be useful for reducing the amount of statistical data that must be analyzed, but granularity is sacrificed. As Figure 3-5 shows, it can be informative (though tedious) to look at a chart that shows network utilization averaged every minute.
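As a rough illustration of the definition above, utilization can be computed from the number of bytes observed during a window and the capacity of the segment. This is a sketch under simple assumptions: the function name is hypothetical, and the byte count is assumed to come from a monitoring tool or interface counter covering exactly the stated window.

```python
def utilization_percent(bytes_seen: int, window_seconds: float,
                        capacity_bps: float) -> float:
    """Percentage of capacity in use, averaged over the window."""
    if window_seconds <= 0 or capacity_bps <= 0:
        raise ValueError("window and capacity must be positive")
    bits_seen = bytes_seen * 8
    return 100.0 * bits_seen / (window_seconds * capacity_bps)


# Hypothetical reading: 525,000,000 bytes in 60 seconds on Fast Ethernet.
print(utilization_percent(525_000_000, 60, 100e6))
```

With these sample numbers the result works out to 70 percent, matching the Fast Ethernet example in the text.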
[Chart: network utilization in percent (scale 0 to 7), averaged per minute, from 16:40:00 to 17:10:00]
Figure 3-5 Network Utilization in Minute Intervals
Figure 3-6 shows the same data averaged over 1-hour intervals. Note that the network was not very busy, so neither chart goes above 7 percent utilization. Note also that changing to a long interval can be misleading because peaks in traffic get averaged out (the detail is lost). In Figure 3-5, you can see that the network was relatively busy around 4:50 p.m. You cannot see this in Figure 3-6, when the data was averaged on an hourly basis.
In general, you should record network utilization with sufficient granularity in time to see short-term peaks in network traffic so that you can accurately assess the capacity requirements of devices and segments. Changing the interval to a very small amount of time, say a fraction of a second, can also be misleading, however. To understand the concern, consider a packet-sized window. At a time when a station is sending traffic, the utilization in that window is 100 percent. That is exactly what is wanted, because a station should send at the full rate of the medium, so the reading says nothing about whether the segment is overloaded.
The size of the averaging window for network utilization measurements depends on your goals. When troubleshooting network problems, keep the interval small, either minutes or seconds. A small interval helps you recognize peaks caused by problems such as broadcast storms or stations retransmitting quickly due to a misconfigured timer. For performance analysis and baselining purposes, use an interval of 1 to 5 minutes. For long-term load analysis, to determine peak hours, days, or months, set the interval to 10 minutes.
[Chart: network utilization in percent (scale 0 to 4.5), averaged per hour, from 13:00:00 to 17:00:00]
Figure 3-6 Network Utilization in Hour Intervals
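The effect of the averaging window can be reproduced with a few lines of code. The sketch below uses made-up per-minute samples (not the data behind the figures) and a hypothetical helper, average_over_windows, to show how a short burst that is obvious at 1-minute granularity shrinks in a 5-minute average and nearly disappears in an hourly average.

```python
from statistics import mean


def average_over_windows(samples, window):
    """Average consecutive groups of `window` samples.

    The last group may be shorter if the sample count is not a multiple.
    """
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical per-minute utilization (percent) for one hour with a short burst.
per_minute = [1.0] * 60
per_minute[10:13] = [7.0, 6.0, 7.0]

five_minute = average_over_windows(per_minute, 5)
hourly = average_over_windows(per_minute, 60)

print("peak at 1-minute granularity:", max(per_minute))
print("peak at 5-minute granularity:", max(five_minute))
print("hourly average:", round(hourly[0], 2))
```

Gathering the fine-grained samples first and summarizing afterward keeps both views available, which is harder to do if only the long-interval averages were recorded.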
When developing a baseline, it is usually a good idea to err on the side of gathering too much data. You can always summarize the data later. When characterizing network utilization, use protocol analyzers or other monitoring tools to measure utilization in 1- to 5-minute intervals on each major network segment. If practical, leave the monitoring tools running for at least 1 or 2 typical days. If the customer’s goals include improving performance during peak times, measure utilization during peak times and typical times. To determine if the measured utilization is healthy, use the Network Health checklist that appears at the end of this chapter.
Measuring Bandwidth Utilization by Protocol
Developing a baseline of network performance should also include measuring utilization from broadcast traffic versus unicast traffic, and by each major protocol. As discussed in Chapter 4, “Characterizing Network Traffic,” some protocols send excessive broadcast traffic, which can seriously degrade performance, especially on switched networks.
To measure bandwidth utilization by protocol, place a protocol analyzer or remote monitoring (RMON) probe on each major network segment and fill out a chart such as the one shown in Table 3-3. If the analyzer supports relative and absolute percentages, specify the bandwidth used by protocols as relative and absolute. Relative usage specifies how much bandwidth is used by the protocol in comparison to the total bandwidth currently in use on the segment. Absolute usage specifies how much bandwidth is used by the protocol in comparison to the total capacity of the segment (for example, in comparison to 100 Mbps on Fast Ethernet).
Table 3-3 Bandwidth Utilization by Protocol
| Protocol | Relative Network Utilization | Absolute Network Utilization | Broadcast/Multicast Rate |
| Protocol 1 | | | |
| Protocol 2 | | | |
| Protocol 3 | | | |
| Protocol n | | | |
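If your analyzer reports only raw per-protocol bit counts, the relative and absolute columns of Table 3-3 can be derived as follows. This is a minimal sketch; the function name, protocol names, and counts are assumptions chosen for illustration.

```python
def protocol_utilization(bits_by_protocol, window_seconds, capacity_bps):
    """Return {protocol: (relative_percent, absolute_percent)}.

    Relative: share of the bandwidth currently in use on the segment.
    Absolute: share of the total capacity of the segment.
    """
    total_bits = sum(bits_by_protocol.values())
    capacity_bits = window_seconds * capacity_bps
    result = {}
    for name, bits in bits_by_protocol.items():
        relative = 100.0 * bits / total_bits if total_bits else 0.0
        absolute = 100.0 * bits / capacity_bits
        result[name] = (relative, absolute)
    return result


# Hypothetical 60-second capture on a 100-Mbps Fast Ethernet segment.
usage = protocol_utilization({"HTTP": 1.2e9, "DNS": 0.3e9}, 60, 100e6)
for name, (relative, absolute) in usage.items():
    print(f"{name}: relative {relative:.1f}%, absolute {absolute:.1f}%")
```

In this sample, HTTP accounts for most of the traffic in use but only a fifth of the capacity of the segment, which is the distinction the two columns are meant to capture.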
Analyzing Network Accuracy
Chapter 2 talked about specifying network accuracy as a bit error rate (BER). You can use a BER tester (also called a BERT) on serial lines to test the number of damaged bits compared to total bits. As discussed in the “Checking the Status of Major Routers, Switches, and Firewalls” section later in this chapter, you can also use Cisco show commands to gain an understanding of errors on a serial interface, which is a more common practice on modern networks than using a BERT.
With packet-switched networks, it makes more sense to measure frame (packet) errors because a whole frame is considered bad if a single bit is changed or dropped. In packet-switched networks, a sending station calculates a CRC based on the bits in a frame. The sending station places the value of the CRC in the frame. A receiving station determines if a bit has been changed or dropped by calculating the CRC again and comparing the result to the CRC in the frame. A frame with a bad CRC is dropped and must be retransmitted by the sender. Usually an upper-layer protocol has the job of retransmitting frames that do not get acknowledged.
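The send-and-verify sequence just described can be sketched in a few lines. The example uses zlib.crc32 from the Python standard library, which implements the same CRC-32 polynomial that Ethernet uses. The framing here (a 4-byte big-endian CRC appended to the payload) and the function names are simplifying assumptions and do not reproduce the exact bit ordering of a real Ethernet frame check sequence.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Sender side: compute the CRC over the payload and append it."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it to the one received."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received


frame = append_crc(b"sample frame contents")
print(crc_ok(frame))  # frame arrived intact

# Flip a single bit in the first byte to simulate damage in transit.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
print(crc_ok(damaged))  # receiver would drop this frame
```

A single changed bit is enough to make the check fail, which is why frame errors rather than bit errors are the practical unit of measurement on packet-switched networks.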
A protocol analyzer can check the CRC on received frames. As part of your baseline analysis, you should track the number of frames received with a bad CRC every hour for 1 or 2 days. Because it is normal for errors to increase with utilization, document errors as a function of the number of bytes seen by the monitoring tool. A good rule-of-thumb threshold for considering errors unhealthy is that a network should not have more than one bad frame per megabyte of data. (Calculating errors this way lets you simulate a serial BERT. Simply calculating a percentage of bad frames compared to good frames does not account for the size of frames and hence does not give a good indication of how many bits are actually getting damaged.)
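The rule of thumb above is easy to apply to the hourly counts you collect. The following sketch assumes a megabyte of 1,000,000 bytes and uses hypothetical function names and sample counts.

```python
BYTES_PER_MEGABYTE = 1_000_000  # assumption: decimal megabyte


def bad_frames_per_megabyte(crc_errors: int, bytes_seen: int) -> float:
    """Bad frames normalized by the volume of data the monitor saw."""
    if bytes_seen <= 0:
        raise ValueError("bytes_seen must be positive")
    return crc_errors / (bytes_seen / BYTES_PER_MEGABYTE)


def error_rate_is_healthy(crc_errors: int, bytes_seen: int,
                          threshold: float = 1.0) -> bool:
    """True if there is no more than `threshold` bad frames per megabyte."""
    return bad_frames_per_megabyte(crc_errors, bytes_seen) <= threshold


# Hypothetical hourly sample: 12 bad frames in 3.6 gigabytes of traffic.
print(bad_frames_per_megabyte(12, 3_600_000_000))
print(error_rate_is_healthy(12, 3_600_000_000))
```

Normalizing by bytes rather than by frame count is what lets the result account for frame size, as the text explains.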
In addition to tracking data link layer errors, such as CRC errors, a baseline analysis should include information on upper-layer problems. A protocol analyzer that includes an expert system, such as CACE Technologies’ Wireshark analyzer or WildPackets’ OmniPeek analyzer, speeds the identification of upper-layer problems by automatically generating diagnoses and symptoms for network conversations and applications.
Accuracy should also include a measurement of lost packets. You can measure lost packets while measuring response time, which is covered later in this chapter in the “Analyzing Delay and Response Time” section. When sending packets to measure how long it takes to receive a response, document any packets that do not receive a response, presumably because either the request or the response got lost. Correlate the information about lost packets with other performance measurements to determine if the lost packets indicate a need to increase bandwidth, decrease CRC errors, or upgrade internetworking devices. You can also measure lost packets by looking at statistics kept by routers on the number of packets dropped from input or output queues.
Analyzing Errors on Switched Ethernet Networks
Switches have replaced hubs in most campus networks. A switch port that is in half-duplex mode follows the normal rules of CSMA/CD. The port checks the medium for any traffic by watching the carrier sense signal, defers to traffic if necessary, detects collisions, backs off, and retransmits. Whether a collision can occur depends on what is connected to the switched port. If a shared medium is connected to the switch, collisions can occur. A good rule of thumb is that fewer than 0.1 percent of frames should encounter collisions. There should be no late collisions. Late collisions are collisions that happen after a port or interface has sent the first 64 bytes of a frame. Late collisions indicate bad cabling, cabling that is longer than the 100-meter standard, a bad NIC, or a duplex mismatch.
If the switch port connects a single device, such as another switch, a server, or a single workstation, both ends of this point-to-point link should be configured for full duplex. In this case, collisions should never occur. Full-duplex Ethernet isn’t CSMA/CD. There are only two stations that can send because full duplex requires a point-to-point link, and each station has its own private transmit channel. So full duplex isn’t multiple access (MA). There’s no need for a station to check the medium to see if someone else is sending on its transmit channel. There isn’t anyone else. So full duplex doesn’t use carrier sense (CS). There are no collisions. Both stations sending at the same time is normal. Receiving while sending is normal. So, there is no collision detection (CD) either.
Unfortunately, the autonegotiation of half versus full duplex has been fraught with problems over the years, resulting in one end of a point-to-point link being set to half duplex and the other being set to full duplex. This is a misconfiguration and must be fixed. Autonegotiation problems can result from hardware incompatibilities and old or defective Ethernet software drivers. Some vendors’ NICs or switches do not conform exactly to the IEEE 802.3u specification, which results in incompatibilities. Hardware incompatibility can also occur when vendors add advanced features, such as autopolarity, that are not in the IEEE 802.3u specification. (Autopolarity corrects reversed polarity on the transmit and receive twisted pairs.)
The autonegotiation of speed isn’t usually a problem. If the speed doesn’t negotiate correctly, the interface doesn’t work, and the administrator hopefully notices and corrects the problem immediately. Manually configuring the speed for 10 Mbps, 100 Mbps, or 1000 Mbps usually isn’t necessary (except for cases where the user interface requires this before it will allow manual configuration of duplex mode). If a LAN still has Category 3 cabling, manually configuring the speed to 10 Mbps is recommended, however. Errors can increase on a LAN that has autonegotiated for 100 Mbps or 1000 Mbps if there is Category 3 cabling that does not support the high-frequency signal used on 100- or 1000-Mbps Ethernet.
Duplex negotiation happens after the speed is negotiated. Problems with duplex negotiation are harder to detect because any performance impact is dependent on the link partners transmitting at the same time. A workstation user who doesn’t send much traffic might not notice a problem, whereas a server could be severely impacted by a duplex mismatch. As part of analyzing the performance of the existing network, be sure to check for duplex mismatch problems. A surprisingly high number of networks have been hobbling along for years with performance problems related to a duplex mismatch.
To detect a duplex mismatch, look at the number and type of errors on either end of the link. You can view errors with the show interface or show port command on Cisco routers and switches. Look for CRC and runt errors on one side and collisions on the other side of the link. The side that is set for full duplex can send whenever it wants. It doesn’t need to check for traffic. The side that is set for half duplex does check for traffic and will stop transmitting if it detects a simultaneous transmission from the other side. It will back off, retransmit, and report a collision. The result of the half-duplex station’s stopping transmission is usually a runt frame (shorter than 64 bytes) and is always a CRC-errored frame.
The full-duplex side receives runts and CRC-errored frames and reports these errors. The half-duplex station reports collisions. Most of these will be legal collisions; some might be illegal late collisions. When checking the health of Ethernet LANs, check for these errors. Notice the asymmetry of the errors when there is a duplex mismatch. If you see collisions and CRC errors on both sides of the link, the problem is probably something other than a duplex mismatch, perhaps a wiring problem or bad NIC.
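The asymmetry described above lends itself to a simple automated check once you have recorded the error counters from both ends of a link. The sketch below is a heuristic only. The PortCounters structure, the function name, and the returned labels are assumptions made for illustration; the counters themselves would be copied by hand or by script from whatever your devices report.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Error counters recorded for one end of a point-to-point link."""
    crc_errors: int
    runts: int
    collisions: int


def classify_link(a: PortCounters, b: PortCounters) -> str:
    """Apply the asymmetry rule of thumb to the two ends of a link."""
    def rx_errors(p: PortCounters) -> int:
        return p.crc_errors + p.runts

    # One side sees only CRC/runt errors, the other sees only collisions.
    for full, half in ((a, b), (b, a)):
        if (rx_errors(full) > 0 and full.collisions == 0
                and half.collisions > 0 and rx_errors(half) == 0):
            return "duplex mismatch suspected"

    # Collisions and CRC errors on both sides point elsewhere.
    if all(rx_errors(p) > 0 and p.collisions > 0 for p in (a, b)):
        return "other fault suspected (wiring or NIC)"

    if all(rx_errors(p) == 0 and p.collisions == 0 for p in (a, b)):
        return "no errors seen"
    return "inconclusive"


# Hypothetical counters: a server port (full duplex) and a switch port (half).
server = PortCounters(crc_errors=120, runts=300, collisions=0)
switch = PortCounters(crc_errors=0, runts=0, collisions=450)
print(classify_link(server, switch))
```

A result of "inconclusive" is a prompt to look at the interfaces directly rather than a diagnosis in itself.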
Until recently, most engineers recommended avoiding autonegotiation, but that is changing. Improvements in the interoperability of autonegotiation and the maturity of the technology mean that it is generally safer to rely on autonegotiation than to not rely on it.
There are numerous problems with not using autonegotiation. The most obvious one is human error. The network engineer sets one end of the link and forgets to set the other end. Another problem is that some NICs and switch ports don’t participate in autonegotiation if manually set. This means they don’t send the link pulses to report their setting. How should the partner react to such a situation? The answer is undefined. Some NICs and switch ports assume the other side is too old to understand full duplex and must be using half. This causes the NIC or switch port to set itself to half. This is a serious problem if the other side is manually configured to full. On the other hand, there are cases where autonegotiation simply does not work, and you might need to carefully configure the mode manually.
Analyzing Network Efficiency
Chapter 2 talked about the importance of using maximum frame sizes to increase network efficiency. Bandwidth utilization is optimized for efficiency when applications and protocols are configured to send large amounts of data per frame, thus minimizing the number of frames and round-trip delays required for a transaction. The number of frames per transaction can also be minimized if the receiver is configured with a large receive window allowing it to accept multiple frames before it must send an acknowledgment. The goal is to maximize the number of data bytes compared to the number of bytes in headers and in acknowledgment packets sent by the other end of a conversation.
Changing frame and receive window sizes on clients and servers can result in improved efficiency. Increasing the maximum transmission unit (MTU) on router interfaces can also improve efficiency, although doing this is not appropriate on low-bandwidth links that are used for voice or other real-time traffic. (As Chapter 2 mentioned, you don’t want to increase serialization delay.)
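Two quick calculations illustrate the tradeoff in the preceding paragraphs: the share of each frame that is application data, and the time needed to clock a frame onto a link. The sketch assumes Ethernet II framing with IPv4 and TCP headers without options, and it ignores the preamble and interframe gap. The constant and function names are illustrative.

```python
ETHERNET_OVERHEAD = 18  # 14-byte Ethernet II header + 4-byte FCS
IP_TCP_HEADERS = 40     # 20-byte IPv4 header + 20-byte TCP header, no options


def payload_efficiency(frame_bytes: int) -> float:
    """Fraction of the frame that carries application data."""
    payload = frame_bytes - ETHERNET_OVERHEAD - IP_TCP_HEADERS
    return max(payload, 0) / frame_bytes


def serialization_delay_ms(frame_bytes: int, link_bps: float) -> float:
    """Time to place one frame on the link, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000


for size in (64, 576, 1518):
    print(f"{size}-byte frame: {payload_efficiency(size):.1%} data, "
          f"{serialization_delay_ms(size, 64_000):.2f} ms at 64 kbps, "
          f"{serialization_delay_ms(size, 100e6):.4f} ms at 100 Mbps")
```

Large frames are far more efficient, but on a slow link a single maximum-size frame occupies the line for a long time, which is why large MTUs are a poor fit for low-bandwidth links carrying voice.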
On the other hand, increasing the MTU is sometimes necessary on router interfaces that use tunnels. Problems can occur when the extra header added by the tunnel causes frames to be larger than the default MTU, especially in cases where an application sets the IP Don’t Fragment (DF) bit and a firewall is blocking the Internet Control Message Protocol (ICMP) packets that notify the sender of the need to fragment. A typical symptom of this problem is that users can ping and telnet but not use HTTP, FTP, and other protocols that use large frames. A solution is to increase the MTU on the router interface.
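The arithmetic behind the tunnel problem is simple to check. The sketch below assumes a basic GRE tunnel over IPv4, which adds a 20-byte outer IP header and a 4-byte GRE header, and a 1500-byte MTU on the physical interface. Other tunnel types add different amounts, and the function names are hypothetical.

```python
GRE_OVERHEAD = 24  # 20-byte outer IPv4 header + 4-byte basic GRE header


def fits_after_encapsulation(packet_bytes: int, link_mtu: int = 1500,
                             overhead: int = GRE_OVERHEAD) -> bool:
    """True if the packet still fits in the link MTU once tunneled."""
    return packet_bytes + overhead <= link_mtu


def max_inner_packet(link_mtu: int = 1500,
                     overhead: int = GRE_OVERHEAD) -> int:
    """Largest inner packet that avoids fragmentation."""
    return link_mtu - overhead


print(fits_after_encapsulation(84))    # small ping packet
print(fits_after_encapsulation(1500))  # full-size HTTP or FTP data packet
print(max_inner_packet())
```

Small packets pass through unchanged while full-size packets no longer fit, which matches the symptom of ping and telnet working when bulk transfers do not.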
To determine if your customer’s goals for network efficiency are realistic, you should use a protocol analyzer to examine the current frame sizes on the network. Many protocol analyzers let you output a chart, such as the one in Figure 3-7, that documents how many frames fall into standard categories for frame sizes. Figure 3-7 shows packet sizes at an Internet service provider (ISP). Many of the frames were 64-byte acknowledgments. A lot of the traffic was HTTP, which used 1500-byte packets in most cases, but also sent 500- and 600-byte packets. If many web-hosting customers had been transferring pages to a web server using a file-transfer or file-sharing protocol, there would have been many more 1500-byte frames. The other traffic consisted of DNS lookups and replies and Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP), and Address Resolution Protocol (ARP) packets.
A simple way to determine an average frame size is to divide the total number of megabytes seen on a segment by the total number of frames in a specified timeframe. Unfortunately, this is a case in which a simple statistical technique does not result in useful data. The average frame size is not a meaningful piece of information. On most networks, there are many small frames, many large frames, but few average-sized frames. Small frames consist of acknowledgments and control information. Data frames fall into the large frame-size categories. Frame sizes typically fall into what is called a bimodal distribution, also known as a camel-back distribution. A “hump” is on either side of the average but not many values are near the average.
Figure 3-7 Graph of Packet Sizes on an Internet Service Provider’s Ethernet Backbone
Note Network performance data is often bimodal, multimodal, or skewed from the mean. (Mean is another word for average.) Frame size is often bimodal. Response times from a server can also be bimodal, if sometimes the data is quickly available from RAM cache and sometimes the data is retrieved from a slow mechanical disk drive.
When network-performance data is bimodal, multimodal, or skewed from the mean, you should document a standard deviation with any measurements of the mean. Standard deviation is a measurement of how widely data disperses from the mean.
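A small made-up sample shows why the mean alone is uninformative for bimodal data and why the standard deviation belongs next to it. The frame sizes below are invented for illustration: half are 64-byte frames and half are 1518-byte frames.

```python
from statistics import mean, pstdev

# Hypothetical capture: 500 minimum-size frames and 500 maximum-size frames.
frame_sizes = [64] * 500 + [1518] * 500

average = mean(frame_sizes)
spread = pstdev(frame_sizes)  # population standard deviation

# Count how many frames are actually within 100 bytes of the average.
near_average = sum(1 for size in frame_sizes if abs(size - average) <= 100)

print("mean frame size:", average)
print("standard deviation:", spread)
print("frames within 100 bytes of the mean:", near_average)
```

The mean lands between the two humps, where no frames exist at all, and the large standard deviation is the signal that the mean should not be read as a typical frame size.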
Analyzing frame sizes can help you understand the health of a network, not just the efficiency. For example, an excessive number of Ethernet runt frames (less than 64 bytes) can indicate too many collisions on a shared Ethernet segment. It is normal for collisions to increase with utilization that results from access contention. If collisions increase even when utilization does not increase or even when only a few nodes are transmitting, there could be a more serious problem, such as a bad NIC or a duplex mismatch problem.
Analyzing Delay and Response Time
To verify that performance of a new network design meets a customer’s requirements, you need to measure response time between significant network devices before and after a new network design is implemented. Response time can be measured many ways. Using a protocol analyzer, you can look at the amount of time between frames and get a rough estimate of response time at the data link layer, transport layer, and application layer. (This is a rough estimate because packet arrival times on an analyzer can only approximate packet arrival times on end stations.)
A more common way to measure response time is to send ping packets and measure the round-trip time (RTT) to send a request and receive a response. While measuring RTT, you can also measure an RTT variance. Variance measurements are important for applications that cannot tolerate much jitter (for example, voice and video applications). You can also document any loss of packets.
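Once you have a list of RTT samples, the mean, variance, and loss figures can be summarized together. The sketch below works on values you have already collected and does not send any packets itself. The function name, the convention of recording a lost probe as None, and the sample values are assumptions made for illustration.

```python
from statistics import mean, pvariance


def summarize_rtt(samples_ms):
    """Summarize RTT samples in milliseconds; None marks a lost probe."""
    answered = [s for s in samples_ms if s is not None]
    lost = len(samples_ms) - len(answered)
    return {
        "sent": len(samples_ms),
        "lost": lost,
        "loss_percent": 100.0 * lost / len(samples_ms) if samples_ms else 0.0,
        "mean_ms": mean(answered) if answered else None,
        "variance_ms2": pvariance(answered) if answered else None,
        "max_ms": max(answered) if answered else None,
    }


# Hypothetical probes between two nodes; the third probe got no response.
summary = summarize_rtt([20.0, 22.0, None, 18.0, 20.0])
for key, value in summary.items():
    print(key, value)
```

Keeping the loss count alongside the timing figures makes it easier to correlate lost packets with the other measurements in the baseline.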
You can use Table 3-4 to document response time measurements. The table uses the term node to mean router, server, client, or mainframe.
Table 3-4 Response-Time Measurements
| | Node A | Node B | Node C | Node D |
| Node A | X | | | |
| Node B | | X | | |
| Node C | | | X | |
| Node D | | | | X |
Depending on the amount of time you have for your analysis and depending on your customer’s network design goals, you should also measure response time from a user’s point of view. On a typical workstation, run some representative applications and measure how long it takes to get a response for typical operations, such as checking email, sending a file to a server, downloading a web page, updating a sales order, printing a report, and so on.
Sometimes applications or protocol implementations are notoriously slow or poorly written. Some peripherals are known to cause extra delay because of incompatibilities with operating systems or hardware. By joining mailing lists and newsgroups and reading information in journals and on the World Wide Web, you can learn about causes of response-time problems. Be sure to do some testing on your own also, though, because every environment is different.
In addition to testing user applications, test the response time for network-services protocols (for example, DNS queries, DHCP requests for an IP address, RADIUS authentication requests, and so on). Chapter 4 covers protocol issues in more detail.
You should also measure how much time a workstation takes to boot. Some workstation operating systems take a long time to boot due to the amount of network traffic that they send and receive while booting. You can include boot time measurements in your analysis of the existing network so that you have a baseline. When the new network design is implemented, you can compare the amount of time a workstation takes to boot with the baseline time. Hopefully you can use this data to prove that your design is an improvement.
Although your customer might not give you permission to simulate network problems, it makes sense to do some testing of response times when the network is experiencing problems or change. For example, if possible, measure response times while routing protocols are converging after a link has gone down. Measure response times during convergence again, after your new design is implemented, to see if the results have improved. As covered in Chapter 12, “Testing Your Network Design,” you can test network problems on a pilot implementation.
Checking the Status of Major Routers, Switches, and Firewalls
The final step in characterizing the existing internetwork is to check the behavior of the internetworking devices in the internetwork. This includes routers and switches that connect layers of a hierarchical topology, and devices that will have the most significant roles in your new network design. It’s not necessary to check every LAN switch, just the major switches, routers, and firewalls.
Checking the behavior and health of an internetworking device includes determining how busy the device is (CPU utilization), how many packets it has processed, how many packets it has dropped, and the status of buffers and queues. Your method for assessing the health of an internetworking device depends on the vendor and architecture of the device. In the case of Cisco routers, switches, and firewalls, you can use the following Cisco IOS commands:
■ show buffers: Displays information on buffer sizes, buffer creation and deletion, buffer usage, and a count of successful and unsuccessful attempts to get buffers when needed.
■ show cdp neighbors detail: Displays information about neighbor devices, including which protocols are enabled, network addresses for enabled protocols, the number and type of interfaces, the type of platform and its capabilities, and the version of Cisco IOS Software.
■ show environment: Displays temperature, voltage, and blower information on the Cisco 7000 series, Cisco 7200 series, and Cisco 7500 series routers, and the Cisco 12000 series Gigabit Switch Router.
■ show interfaces: Displays statistics for interfaces, including the input and output rate of packets, a count of packets dropped from input and output queues, the size and usage of queues, a count of packets ignored due to lack of I/O buffer space on a card, CRC errors, collision counts, and how often interfaces have restarted.
■ show ip cache flow: Displays information about NetFlow, a Cisco technology that collects and measures data as it enters router and switch interfaces, including source and destination IP addresses, source and destination TCP or UDP port numbers, differentiated services codepoint (DSCP) values, packet and byte counts, start and end time stamps, input and output interface numbers, and routing information (next-hop address, source and destination autonomous system numbers, and source and destination prefix masks).
■ show memory: Displays statistics about system memory, including total bytes, used bytes, and free bytes. Also shows detailed information about memory blocks.
■ show processes: Displays CPU utilization for the last 5 seconds, 1 minute, and 5 minutes, and the percentage of CPU used by various processes, including routing protocol processes, buffer management, and user-interface processes. (The show processes cpu and show processes cpu history commands are both useful variations of the show processes command.)
■ show running-config: Displays the router’s configuration stored in memory and currently in use.
■ show startup-config: Displays the configuration the router will use upon the next reboot.
■ show version: Displays software version and features, the names and sources of configuration files, the boot images, the configuration register, router uptime, and the reason for the last reboot.
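If you save the output of these commands to text files during the study, small scripts can pull out the figures you want to record. The sketch below extracts the 5-second, 1-minute, and 5-minute CPU figures from the summary line that show processes cpu typically prints. The sample line is illustrative rather than captured from a real device, the exact wording can vary by platform and software release, and the function name is hypothetical, so treat the pattern as an assumption to verify against your own output.

```python
import re

# Assumed format of the summary line; verify against your device's output.
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_line(text: str):
    """Return CPU percentages from saved command output, or None if absent."""
    match = CPU_LINE.search(text)
    if match is None:
        return None
    five_sec_total, five_sec_interrupt, one_min, five_min = map(int, match.groups())
    return {
        "five_seconds_total": five_sec_total,
        "five_seconds_interrupt": five_sec_interrupt,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


# Illustrative line only, not captured from a real router.
sample = "CPU utilization for five seconds: 8%/2%; one minute: 6%; five minutes: 5%"
print(parse_cpu_line(sample))
```

The 5-minute figure is the one to compare against the router CPU threshold in the Network Health checklist.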
Network Health Checklist
You can use the following Network Health checklist to assist you in verifying the health of an existing internetwork. The Network Health checklist is generic in nature and documents a best-case scenario. The thresholds might not apply to all networks.
❑ The network topology and physical infrastructure are well documented.
❑ Network addresses and names are assigned in a structured manner and are well documented.
❑ Network wiring is installed in a structured manner and is well labeled.
❑ Network wiring has been tested and certified.
❑ Network wiring between telecommunications closets and end stations is no more than 100 meters.
❑ Network availability meets current customer goals.
❑ Network security meets current customer goals.
❑ No LAN or WAN segments are becoming saturated (70 percent average network utilization in a 10-minute window).
❑ There are no collisions on Ethernet full-duplex links.
❑ Broadcast traffic is less than 20 percent of all traffic on each network segment. (Some networks are more sensitive to broadcast traffic and should use a 10 percent threshold.)
❑ Wherever possible and appropriate, frame sizes have been optimized to be as large as possible for the data link layer in use.
❑ No routers are overused (5-minute CPU utilization is under 75 percent).
❑ On average, routers are not dropping more than 1 percent of packets. (For networks that are intentionally oversubscribed to keep costs low, a higher threshold can be used.)
❑ Up-to-date router, switch, and other device configurations have been collected, archived, and analyzed as part of the design study.
❑ The response time between clients and hosts is generally less than 100 ms (1/10th of a second).
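The numeric thresholds in the checklist can be collected in one place and compared against your baseline measurements. The sketch below covers only the items that have a number attached. The key names and the sample measurements are assumptions, and, as the checklist itself notes, the limits might need adjusting for a particular network.

```python
# (limit, inclusive): inclusive=True means a value equal to the limit passes.
CHECKS = {
    "segment_utilization_pct": (70.0, False),  # 10-minute average, below 70
    "full_duplex_collisions": (0, True),       # no collisions at all
    "broadcast_pct": (20.0, False),            # less than 20 percent of traffic
    "router_cpu_5min_pct": (75.0, False),      # under 75 percent
    "router_drop_pct": (1.0, True),            # not more than 1 percent
    "response_time_ms": (100.0, False),        # generally less than 100 ms
}


def failed_checks(measurements):
    """Return the names of measured items that exceed their threshold."""
    failed = []
    for name, (limit, inclusive) in CHECKS.items():
        if name not in measurements:
            continue  # item was not measured
        value = measurements[name]
        passed = value <= limit if inclusive else value < limit
        if not passed:
            failed.append(name)
    return failed


# Hypothetical baseline results for one segment and its router.
baseline = {
    "segment_utilization_pct": 72.0,
    "broadcast_pct": 8.0,
    "router_cpu_5min_pct": 40.0,
    "router_drop_pct": 1.0,
    "response_time_ms": 130.0,
}
print(failed_checks(baseline))
```

The items that have no numeric threshold, such as documentation and wiring certification, still need to be reviewed by hand.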
Summary
This chapter covered techniques and tools for characterizing a network before designing enhancements to the network. Characterizing an existing network is an important step in top-down network design because it helps you verify that a customer’s technical design goals are realistic. It also helps you understand the current topology and locate existing network segments and equipment, which will be useful information when the time comes to install new equipment.
As part of the task of characterizing the existing network, you should develop a baseline of current performance. Baseline performance measurements can be compared to new measurements once your design is implemented to demonstrate to your customer that your new design (hopefully) improves performance.