5.1 Network Troubleshooting Methodology
A systematic approach to network troubleshooting ensures efficient problem resolution and prevents recurring issues through proper documentation and analysis.
The Structured Troubleshooting Process
Identify the Problem
Gather Information: Collect detailed information about the issue from multiple sources
- Question users: What exactly is not working? When did it start?
- Identify symptoms: Error messages, performance issues, connectivity failures
- Determine changes: Recent updates, configuration changes, new equipment
- Duplicate the problem: Reproduce the issue to understand its scope
- Approach individually: Handle multiple problems separately to avoid confusion
Establish a Theory of Probable Cause
Question the Obvious: Start with simple, common causes before complex scenarios
Consider Multiple Approaches:
- Top-to-bottom OSI: Start at Application layer, work down to Physical
- Bottom-to-top OSI: Start at Physical layer, work up to Application
- Divide and conquer: Isolate problem domain by testing different segments
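The bottom-to-top and divide-and-conquer approaches above can be sketched as two search strategies over an ordered list of layer checks. This is a minimal illustration only: the check names and the stubbed results are hypothetical, and the divide-and-conquer version assumes that a fault at one layer also breaks every layer above it.

```python
from typing import Callable, Optional

# Hypothetical layer checks, ordered from Physical up to Application.
# Each callable returns True when that layer looks healthy.
Check = Callable[[], bool]


def first_failing_layer(checks: list[tuple[str, Check]]) -> Optional[str]:
    """Bottom-to-top walk: return the first layer whose check fails."""
    for name, check in checks:
        if not check():
            return name
    return None


def divide_and_conquer(checks: list[tuple[str, Check]]) -> Optional[str]:
    """Binary search for the lowest failing layer.

    Assumes results look like [ok, ok, FAIL, FAIL, ...], i.e. a fault
    at one layer makes every layer above it fail as well.
    """
    lo, hi = 0, len(checks)      # lowest failing index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if checks[mid][1]():     # healthy here: the fault is higher up
            lo = mid + 1
        else:                    # failing here: the fault is here or lower
            hi = mid
    return checks[lo][0] if lo < len(checks) else None


# Stubbed example: link, MAC learning, and gateway ping are fine,
# but the TCP connection (and everything above it) fails.
example = [
    ("Physical (link light)", lambda: True),
    ("Data Link (MAC learned)", lambda: True),
    ("Network (ping gateway)", lambda: True),
    ("Transport (TCP connect)", lambda: False),
    ("Application (HTTP response)", lambda: False),
]

print(first_failing_layer(example))
print(divide_and_conquer(example))
```

Both functions point at the same layer here; the binary search simply runs fewer checks when the list is long.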
Test the Theory to Determine Cause
If the theory is confirmed: Determine next steps to resolve the problem
If the theory is not confirmed: Establish a new theory or escalate to higher expertise
• Use non-disruptive tests when possible
• Document test results for future reference
• Consider impact on production systems
Establish Plan of Action
Develop a comprehensive plan to resolve the problem and identify its potential effects:
- Resource requirements: Personnel, equipment, time
- Impact assessment: Affected users, systems, services
- Rollback plan: How to undo changes if needed
- Communication plan: Notify stakeholders of planned changes
Implement Solution or Escalate
Execute the planned solution or escalate to appropriate personnel:
- Follow change management: Proper approval and scheduling
- Escalation criteria: Complex issues, critical systems, time constraints
- Implementation timing: Consider maintenance windows
Verify Full System Functionality
Confirm the problem is resolved and implement preventive measures:
- End-to-end testing: Verify complete functionality
- User validation: Confirm users can perform required tasks
- Preventive measures: Monitoring, alerts, configuration changes
- Performance baseline: Establish new performance metrics
Document Findings and Lessons Learned
Create comprehensive documentation throughout the process:
- Problem description: Symptoms, scope, impact
- Root cause analysis: What caused the problem
- Solution implemented: Step-by-step resolution
- Lessons learned: Prevention strategies, knowledge transfer
• Always have a backup plan before making changes
• Test changes in a lab environment when possible
• Keep detailed logs of all troubleshooting steps
• Communicate clearly with users and stakeholders
• Learn from each troubleshooting experience
5.2 Cabling & Physical Interface Issues
Cable Issues
Incorrect Cable Types
Fiber Optic Mismatch:
- Single-mode: Long distance, narrow core (9μm)
- Multi-mode: Shorter distance, wider core (50μm/62.5μm)
- Problem: Mode mismatch causes signal loss
- Detection: High attenuation, intermittent connectivity
Ethernet Cable Categories:
- Cat 5e: 1Gbps, 100MHz
- Cat 6: 1Gbps (10Gbps up to 55m), 250MHz
- Cat 6A: 10Gbps, 500MHz
- Cat 7/8: Higher frequencies (Cat 7: 600MHz; Cat 8: 2000MHz, 25/40Gbps up to 30m), specialized applications
Shielding Issues:
- STP: Shielded Twisted Pair - better EMI protection
- UTP: Unshielded Twisted Pair - more common, cost-effective
- Problem: Using UTP in high-EMI environments
Signal Degradation Issues
Crosstalk
Electromagnetic interference between wire pairs:
- NEXT: Near-End Crosstalk - interference at transmitting end
- FEXT: Far-End Crosstalk - interference at receiving end
- Causes: Damaged cable, poor termination, bundle proximity
- Testing: Cable analyzer with crosstalk measurements
Electromagnetic Interference (EMI)
External interference affecting signal quality:
- Sources: Motors, fluorescent lights, wireless devices
- Symptoms: Intermittent connectivity, performance issues
- Mitigation: Shielded cables, proper grounding, route separation
- Testing: Spectrum analyzer, EMI detector
Attenuation
Signal strength loss over distance:
- Causes: Cable length, poor connections, cable degradation
- Limits: 100m for copper Ethernet, varies for fiber
- Solutions: Repeaters, switches, shorter cable runs
- Testing: Cable tester, optical power meter
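Attenuation is usually expressed in decibels. As a worked sketch, loss can be computed from a power ratio, or by subtracting two dBm readings such as those an optical power meter reports. The function names and the loss budget in the usage lines are illustrative assumptions, not values from any particular standard.

```python
import math


def loss_db(power_in_mw: float, power_out_mw: float) -> float:
    """Attenuation in dB from input and output power in milliwatts."""
    return 10 * math.log10(power_in_mw / power_out_mw)


def link_loss_db(tx_dbm: float, rx_dbm: float) -> float:
    """With dBm readings, loss is a simple subtraction."""
    return tx_dbm - rx_dbm


def within_budget(tx_dbm: float, rx_dbm: float, max_loss_db: float) -> bool:
    """True if the measured loss fits the allowed loss budget (assumed value)."""
    return link_loss_db(tx_dbm, rx_dbm) <= max_loss_db


# Halving the power corresponds to roughly 3 dB of loss.
print(round(loss_db(1.0, 0.5), 2))        # 3.01
# Transmitter at -3.0 dBm, receiver reads -9.5 dBm: 6.5 dB lost on the link.
print(link_loss_db(-3.0, -9.5))           # 6.5
print(within_budget(-3.0, -9.5, 7.0))     # True
```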
Termination & Polarity Issues
Improper Termination
Incorrect cable connector installation:
- RJ45 issues: Wrong wire order, poor crimping, exposed conductors
- Fiber issues: Poor polish, contamination, excessive insertion loss
- Standards: T568A vs T568B wiring standards
- Testing: Wire map testing, continuity testing
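A wire map test compares the pin order at each end of the cable. The sketch below encodes the T568A and T568B pinouts (pins 1 to 8) and classifies a cable from the two observed ends; the `classify_cable` helper is a hypothetical name used only for illustration.

```python
# Conductor colors for pins 1-8 under each wiring standard.
T568A = ["white/green", "green", "white/orange", "blue",
         "white/blue", "orange", "white/brown", "brown"]
T568B = ["white/orange", "orange", "white/green", "blue",
         "white/blue", "green", "white/brown", "brown"]


def classify_cable(end_a, end_b):
    """Classify a cable from the pin order seen at each end.

    Same standard on both ends -> straight-through.
    T568A on one end, T568B on the other -> crossover.
    Anything else -> miswired.
    """
    ends = (list(end_a), list(end_b))
    if any(end not in (T568A, T568B) for end in ends):
        return "miswired"
    return "straight-through" if ends[0] == ends[1] else "crossover"


print(classify_cable(T568B, T568B))   # straight-through
print(classify_cable(T568A, T568B))   # crossover

# Pins 1 and 2 swapped at one end: a typical termination mistake.
bad_end = ["orange", "white/orange"] + T568B[2:]
print(classify_cable(T568B, bad_end))  # miswired
```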
TX/RX Transposition
Transmit and receive pairs swapped:
- Fiber optic: TX connected to TX and RX to RX instead of TX to RX (no communication)
- Copper: Wrong pair assignments
- Detection: No link light, failed connectivity
- Solution: Swap fiber connections, rewire copper pairs
Interface Issues & Counters
Error Counters
CRC (Cyclic Redundancy Check): Frame corruption errors
- Causes: Physical layer issues, electromagnetic interference
- Impact: Retransmissions, performance degradation
Runts: Frames smaller than 64 bytes
- Causes: Collisions, faulty NICs, duplex mismatch
Giants: Frames larger than maximum size (1518 bytes, or 1522 bytes with an 802.1Q tag)
- Causes: Faulty drivers, incorrect MTU settings
Drops: Discarded packets due to buffer overflow
- Causes: Congestion, insufficient bandwidth
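The size thresholds for runts and giants, and the idea of an error rate derived from interface counters, can be sketched as follows. The function names are illustrative, and the counter values in the usage lines are made up for the example rather than taken from a real device.

```python
MIN_FRAME = 64      # bytes: smallest valid Ethernet frame
MAX_FRAME = 1518    # bytes: largest untagged Ethernet frame


def classify_frame(length_bytes: int, max_frame: int = MAX_FRAME) -> str:
    """Label a frame as 'runt', 'giant', or 'ok' by its length."""
    if length_bytes < MIN_FRAME:
        return "runt"
    if length_bytes > max_frame:
        return "giant"
    return "ok"


def error_rate_percent(error_count: int, total_frames: int) -> float:
    """Share of frames that hit an error counter (e.g. CRC), in percent."""
    if total_frames == 0:
        return 0.0
    return 100.0 * error_count / total_frames


print(classify_frame(60))                    # runt
print(classify_frame(1518))                  # ok
print(classify_frame(1519))                  # giant
print(classify_frame(1522, max_frame=1522))  # ok (802.1Q-tagged limit)
print(error_rate_percent(5, 1000))           # 0.5
```

A rising CRC rate on one interface usually points to the cable, connector, or transceiver on that link rather than to congestion.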
Port Status Issues
Port States
Error Disabled: Port automatically shut down due to error condition
- Triggers: Port security violation, STP protection (e.g., BPDU guard), duplex mismatch
- Recovery: Clear error condition, manually re-enable port
Administratively Down: Port manually disabled
- Commands: shutdown/no shutdown
- Reasons: Security, maintenance, troubleshooting
Suspended: Port temporarily disabled by switch
- Causes: Policy violations, security restrictions
Hardware Issues
Power over Ethernet (PoE)
Power Budget Exceeded: Switch cannot provide enough power
- PoE (802.3af): 15.4W per port
- PoE+ (802.3at): 30W per port
- PoE++ (802.3bt): 60W or 100W per port
- Solution: PoE budget management, external power injectors
Incorrect PoE Standard: Mismatch between device requirements and switch capability
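A rough PoE budget check adds up the per-port maximums listed above and compares the total with the switch's power budget. This sketch assumes the switch reserves the full per-port maximum for each powered device, which is a simplification; the dictionary keys and the 370 W budget are hypothetical.

```python
# Per-port maximum power by standard, from the figures above.
# The key names are labels chosen for this example.
POE_MAX_WATTS = {
    "802.3af": 15.4,
    "802.3at": 30.0,
    "802.3bt-60": 60.0,
    "802.3bt-100": 100.0,
}


def poe_budget_report(budget_watts: float, port_standards: list[str]) -> dict:
    """Worst-case allocation: assume every port draws its standard's maximum."""
    allocated = sum(POE_MAX_WATTS[s] for s in port_standards)
    return {
        "allocated": allocated,
        "remaining": budget_watts - allocated,
        "exceeded": allocated > budget_watts,
    }


# Twelve PoE+ devices and two PoE devices on a switch with a 370 W budget.
ports = ["802.3at"] * 12 + ["802.3af"] * 2
report = poe_budget_report(370.0, ports)
print(report["exceeded"])              # True
print(round(report["allocated"], 1))   # 390.8
```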
Transceiver Issues
Transceiver Mismatch: Incompatible SFP/QSFP modules
- Speed mismatch: 1G SFP in 10G port
- Protocol mismatch: Ethernet vs Fibre Channel
- Vendor compatibility: Third-party vs OEM transceivers
Signal Strength Issues:
- Fiber attenuation: Signal loss over distance
- Dirty connectors: Contamination affecting light transmission
- Testing: Optical power meter, OTDR testing
5.3 Network Services Issues
Switching Issues
Spanning Tree Protocol (STP) Issues
Network Loops: Multiple paths causing broadcast storms
- Symptoms: High CPU utilization, slow network performance
- Detection: Rapidly changing MAC address tables
- Prevention: Proper STP configuration, loop prevention
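A rapidly changing MAC address table shows up as the same MAC address being learned on different ports in quick succession. The sketch below counts such moves from a time-ordered list of learn events; the event format, the threshold, and the sample data are assumptions for illustration, not the output of any real switch.

```python
from collections import defaultdict


def find_flapping_macs(events, min_moves=3):
    """Return {mac: move_count} for MACs that changed port at least min_moves times.

    events: iterable of (mac, port) tuples in the order they were learned.
    """
    last_port = {}
    moves = defaultdict(int)
    for mac, port in events:
        if mac in last_port and last_port[mac] != port:
            moves[mac] += 1
        last_port[mac] = port
    return {mac: count for mac, count in moves.items() if count >= min_moves}


# Hypothetical learn events: one MAC bounces between two ports (a loop symptom),
# the other stays where it is.
sample_events = [
    ("aa:aa:aa:aa:aa:01", "Gi0/1"),
    ("aa:aa:aa:aa:aa:01", "Gi0/2"),
    ("bb:bb:bb:bb:bb:02", "Gi0/3"),
    ("aa:aa:aa:aa:aa:01", "Gi0/1"),
    ("aa:aa:aa:aa:aa:01", "Gi0/2"),
    ("bb:bb:bb:bb:bb:02", "Gi0/3"),
]

print(find_flapping_macs(sample_events))   # {'aa:aa:aa:aa:aa:01': 3}
```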
Root Bridge Selection Issues:
- Problem: Suboptimal root bridge selection
- Solution: Manual priority configuration
- Best practice: Set core switches as root bridge
Port Roles & States:
- Roles: Root, designated, alternate, backup
- States: Discarding, learning, forwarding (RSTP); legacy 802.1D STP uses blocking, listening, learning, forwarding
- Issues: Slow convergence, incorrect role assignment
VLAN Issues
Incorrect VLAN Assignment:
- Symptoms: Cannot communicate with other hosts
- Troubleshooting: Verify port VLAN membership
- Commands: show vlan, show interface switchport
- Common causes: Wrong access VLAN, trunk misconfiguration
VLAN Trunking Issues:
- Native VLAN mismatch: Different native VLANs on trunk
- Allowed VLAN list: Required VLANs not permitted
- Encapsulation mismatch: 802.1Q vs ISL
Access Control List (ACL) Issues
Improperly configured ACLs blocking legitimate traffic:
- Order dependency: First match wins principle
- Implicit deny: Default deny at end of ACL
- Direction: Inbound vs outbound application
- Troubleshooting: ACL hit counters, log analysis
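The first-match and implicit-deny behavior can be illustrated with a small simulation. This is a simplified model that matches on source address only, using Python's standard `ipaddress` module; the rule format and the sample rule lists are hypothetical and do not reflect any vendor's ACL syntax.

```python
import ipaddress


def evaluate_acl(rules, src_ip):
    """Return the action of the first matching rule, or 'deny' if none match.

    rules: ordered list of (action, network) tuples, e.g. ("permit", "10.1.1.0/24").
    """
    addr = ipaddress.ip_address(src_ip)
    for action, network in rules:
        if addr in ipaddress.ip_network(network):
            return action          # first match wins; later rules are ignored
    return "deny"                  # implicit deny at the end of every ACL


# Broad deny placed first: the specific permit below it is never reached.
rules_bad = [("deny", "10.0.0.0/8"), ("permit", "10.1.1.0/24")]
# Specific permit first, then the broad deny.
rules_good = [("permit", "10.1.1.0/24"), ("deny", "10.0.0.0/8")]

print(evaluate_acl(rules_bad, "10.1.1.5"))      # deny
print(evaluate_acl(rules_good, "10.1.1.5"))     # permit
print(evaluate_acl(rules_good, "192.168.1.1"))  # deny (implicit)
```

The same two rules produce opposite results depending on order, which is why hit counters per rule are a useful first check.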
Routing Issues
Route Selection Problems
Routing Table Issues:
- Missing routes: No path to destination network
- Incorrect routes: Wrong next-hop or interface
- Route preference: Administrative distance conflicts
- Commands: show ip route, show route
Default Route Problems:
- Missing default route: Cannot reach unknown networks
- Incorrect default gateway: Wrong next-hop address
- Multiple defaults: Conflicting default routes
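Route selection can be sketched as a longest-prefix match, with administrative distance as the tie-breaker between routes to the same prefix. The model below is deliberately simplified (it ignores metrics and treats the table as a flat list), and the route entries are hypothetical.

```python
import ipaddress


def select_route(routes, destination):
    """Pick the route a router would use for `destination`, or None.

    routes: list of dicts with 'prefix', 'next_hop', and 'admin_distance'.
    Longest prefix wins; among equal prefixes, lowest administrative distance wins.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None                # missing route and no default route
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen,
                       r["admin_distance"]),
    )


# Hypothetical routing table.
table = [
    {"prefix": "0.0.0.0/0", "next_hop": "192.0.2.1", "admin_distance": 1},
    {"prefix": "10.1.0.0/16", "next_hop": "10.0.0.2", "admin_distance": 110},
    {"prefix": "10.1.0.0/16", "next_hop": "10.0.0.3", "admin_distance": 90},
    {"prefix": "10.1.2.0/24", "next_hop": "10.0.0.4", "admin_distance": 120},
]

print(select_route(table, "10.1.2.9")["next_hop"])   # 10.0.0.4 (most specific)
print(select_route(table, "10.1.9.9")["next_hop"])   # 10.0.0.3 (lower distance)
print(select_route(table, "8.8.8.8")["next_hop"])    # 192.0.2.1 (default route)
print(select_route(table[1:], "8.8.8.8"))            # None (no default route)
```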
IP Address Configuration Issues
DHCP Issues
Address Pool Exhaustion: No available IP addresses in DHCP scope
- Symptoms: Clients receive APIPA addresses (169.254.x.x)
- Causes: Undersized scope, long lease times, scope depletion
- Solutions: Expand scope, reduce lease time, reclaim unused addresses
- Monitoring: DHCP scope utilization, lease tracking
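Scope utilization and APIPA detection are both simple address arithmetic. The sketch below uses the standard `ipaddress` module; the scope range, lease count, and function names are illustrative assumptions.

```python
import ipaddress

APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def scope_utilization(first_ip: str, last_ip: str, active_leases: int) -> float:
    """Percentage of a DHCP scope's addresses that are currently leased."""
    size = int(ipaddress.ip_address(last_ip)) - int(ipaddress.ip_address(first_ip)) + 1
    return 100.0 * active_leases / size


def is_apipa(address: str) -> bool:
    """True if the client fell back to a self-assigned 169.254.x.x address."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


# A 100-address scope with 95 active leases is close to exhaustion.
print(scope_utilization("192.168.1.100", "192.168.1.199", 95))   # 95.0
print(is_apipa("169.254.12.7"))    # True
print(is_apipa("192.168.1.50"))    # False
```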
IP Configuration Errors
Incorrect Default Gateway:
- Symptoms: Local network access works, Internet doesn't
- Testing: ping default gateway, traceroute
- Common causes: Wrong IP, gateway down, routing issues
Incorrect IP Address:
- Wrong subnet: IP not in correct network range
- Duplicate IP: Two devices with same IP address
- Detection: ARP conflicts, ping responses
Incorrect Subnet Mask:
- Symptoms: Communication issues with some hosts
- Impact: Incorrect network/broadcast calculation
- Testing: Verify network connectivity patterns
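The effect of a wrong subnet mask can be shown by computing the network and broadcast addresses a host derives from its own settings. This sketch uses the standard `ipaddress` module; the addresses and helper names are made up for the example.

```python
import ipaddress


def describe(host: str, mask: str) -> tuple[str, str]:
    """Return (network address, broadcast address) as the host computes them."""
    net = ipaddress.ip_interface(f"{host}/{mask}").network
    return str(net.network_address), str(net.broadcast_address)


def same_subnet(host_a: str, host_b: str, mask: str) -> bool:
    """True if host_a, using `mask`, treats host_b as directly reachable."""
    net = ipaddress.ip_interface(f"{host_a}/{mask}").network
    return ipaddress.ip_address(host_b) in net


# Correct mask on a /24 network.
print(describe("192.168.1.130", "255.255.255.0"))      # ('192.168.1.0', '192.168.1.255')
# Same host misconfigured with a /25 mask.
print(describe("192.168.1.130", "255.255.255.128"))    # ('192.168.1.128', '192.168.1.255')

# With the wrong mask, the host believes 192.168.1.20 is remote and sends the
# traffic to its gateway, while hosts in the upper half still look local.
print(same_subnet("192.168.1.130", "192.168.1.20", "255.255.255.0"))     # True
print(same_subnet("192.168.1.130", "192.168.1.20", "255.255.255.128"))   # False
print(same_subnet("192.168.1.130", "192.168.1.200", "255.255.255.128"))  # True
```

This matches the symptom pattern above: communication fails with some hosts but not others.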
• Verify physical connectivity first
• Check configuration consistency across devices
• Use appropriate show commands for each service
• Monitor logs for error messages and patterns
• Test connectivity at each network layer
5.4 Performance Issues
Network Congestion & Capacity
Congestion/Contention
Multiple devices competing for limited network resources:
- Symptoms: Slow response times, packet loss, timeouts
- Causes: Insufficient bandwidth, shared collision domains
- Detection: Interface utilization monitoring, queue depths
- Solutions: Bandwidth upgrades, traffic shaping, load balancing
Bottlenecking
Single point of constraint limiting overall performance:
- Common locations: Uplinks, Internet connections, servers
- Identification: Performance monitoring, traffic analysis
- CPU bottlenecks: High device CPU utilization
- Memory bottlenecks: Insufficient buffer space
Bandwidth vs Throughput
Bandwidth: Maximum theoretical capacity of network link
Throughput: Actual data transfer rate achieved
- Factors affecting throughput: Protocol overhead, errors, congestion
- Measurement: iperf, speed tests, flow monitoring
- Optimization: Protocol tuning, error reduction
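The gap between bandwidth and throughput can be worked through numerically. The sketch below computes measured throughput from a transfer, link utilization, and a rough upper bound on TCP goodput over Ethernet. The overhead figures are assumptions stated in the comments (IPv4 and TCP headers without options, standard Ethernet framing), and the function names are illustrative.

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Measured throughput in megabits per second."""
    return bytes_transferred * 8 / seconds / 1_000_000


def utilization_percent(throughput: float, bandwidth: float) -> float:
    """Throughput as a share of link bandwidth (same units for both)."""
    return 100.0 * throughput / bandwidth


def max_tcp_goodput_mbps(link_mbps: float, mtu: int = 1500) -> float:
    """Rough ceiling on TCP payload rate over Ethernet.

    Assumes 20-byte IPv4 and 20-byte TCP headers (no options) and 38 bytes of
    per-frame Ethernet overhead: 14 header + 4 FCS + 8 preamble + 12 inter-frame gap.
    """
    payload = mtu - 40
    on_wire = mtu + 38
    return link_mbps * payload / on_wire


# 125 MB moved in 10 seconds over a 1 Gbps link.
rate = throughput_mbps(125_000_000, 10)
print(rate)                                    # 100.0
print(utilization_percent(rate, 1000.0))       # 10.0
print(round(max_tcp_goodput_mbps(1000.0), 1))  # 949.3
```

Even an error-free, uncongested 1 Gbps link therefore tops out at about 949 Mbps of application data under these assumptions.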
Network Performance Metrics
Latency
Time delay for data to travel from source to destination:
- Types: Propagation, transmission, processing, queuing delay
- Measurement: ping, traceroute, synthetic transactions
- Typical values: LAN <1ms, WAN 20-100ms, Satellite 500-600ms
- Impact: Application responsiveness, user experience
Packet Loss
Percentage of packets that fail to reach destination:
- Causes: Congestion, buffer overflow, link errors
- Impact: Retransmissions, reduced throughput
- Acceptable levels: <0.1% for most applications
- Testing: Extended ping tests, performance monitoring
Jitter
Variation in packet arrival times (latency variation):
- Causes: Network congestion, route changes, queuing
- Impact: Voice quality degradation, video stuttering
- Measurement: VoIP quality metrics, variation between consecutive latency samples
- Mitigation: QoS implementation, traffic prioritization
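Packet loss and jitter can both be derived from a series of probe results, such as an extended ping test. The sketch below uses a simple definition of jitter (the mean absolute difference between consecutive latency samples), which is an assumption made for clarity; VoIP tools typically use a smoothed estimate instead. The sample values are made up.

```python
from statistics import mean


def packet_loss_percent(sent: int, received: int) -> float:
    """Share of probes that never came back, in percent."""
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent


def mean_jitter_ms(latencies_ms: list[float]) -> float:
    """Mean absolute difference between consecutive latency samples."""
    if len(latencies_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:]))


# Hypothetical round-trip times in milliseconds.
samples = [20.0, 22.0, 21.0, 25.0]
print(round(mean_jitter_ms(samples), 2))       # 2.33
print(packet_loss_percent(1000, 999))          # 0.1
```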
Wireless Performance Issues
RF Interference
Channel Overlap: Adjacent wireless networks using overlapping channels
- 2.4GHz: Use channels 1, 6, 11 for non-overlap
- 5GHz: More channels available, less congestion
- Detection: Wi-Fi analyzer, spectrum analysis
- Solution: Channel planning, power adjustment
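The reason channels 1, 6, and 11 do not overlap follows from the channel spacing: 2.4GHz channel centers sit 5MHz apart, while each transmission occupies roughly 20 to 22MHz. The sketch below assumes a 22MHz channel width and covers channels 1 through 13 only; the function names are illustrative.

```python
def center_mhz_24(channel: int) -> int:
    """Center frequency in MHz of a 2.4GHz Wi-Fi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers channels 1-13 only")
    return 2407 + 5 * channel


def channels_overlap(a: int, b: int, width_mhz: int = 22) -> bool:
    """True if two channels' occupied bands overlap at the assumed width."""
    return abs(center_mhz_24(a) - center_mhz_24(b)) < width_mhz


print(center_mhz_24(1), center_mhz_24(6), center_mhz_24(11))   # 2412 2437 2462
print(channels_overlap(1, 6))    # False: centers are 25MHz apart
print(channels_overlap(6, 11))   # False
print(channels_overlap(1, 3))    # True: centers are only 10MHz apart
```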
External Interference:
- Sources: Microwave ovens, Bluetooth devices, baby monitors
- Symptoms: Intermittent connectivity, slow speeds
- Mitigation: Channel change, 5GHz migration
Signal & Coverage Issues
Signal Degradation/Loss:
- Causes: Distance, obstacles, multipath fading
- Symptoms: Low signal strength, high retry rates
- Solutions: Additional APs, antenna adjustments
Insufficient Wireless Coverage:
- Dead zones: Areas with no signal
- Weak signal areas: Poor performance zones
- Solution: Site survey, AP placement optimization
Client Connection Issues
Client Disassociation:
- Causes: Weak signal, interference, authentication issues
- Symptoms: Frequent disconnections, failed connections
- Troubleshooting: Client logs, AP association logs
Roaming Misconfiguration:
- Sticky clients: Not roaming to stronger signal
- Excessive roaming: Constantly switching APs
- Solutions: Power adjustment, roaming thresholds
• Establish baseline performance metrics
• Monitor key performance indicators continuously
• Use appropriate tools for each type of analysis
• Consider application requirements when evaluating performance
• Implement QoS for critical applications
5.5 Troubleshooting Tools & Protocols
Software Troubleshooting Tools
Protocol Analyzer
Deep packet inspection and network traffic analysis:
- Wireshark: Most popular open-source packet analyzer
- Capabilities: Capture, filter, analyze network traffic
- Use cases: Protocol debugging, security analysis, performance troubleshooting
- Features: Real-time capture, offline analysis, protocol decoding
Network Discovery Tools
Nmap: Network discovery and security auditing tool
- Host discovery: Find active devices on network
- Port scanning: Identify open ports and services
- OS detection: Fingerprint operating systems
- Script scanning: Automated vulnerability detection
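LLDP/CDP: Link Layer Discovery Protocol (vendor-neutral, IEEE 802.1AB) and Cisco Discovery Protocol (Cisco proprietary); both advertise neighbor device and port information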
Command Line Tools
Command | Purpose | Example Usage | Key Information |
---|---|---|---|
ping | Test connectivity and latency | ping 8.8.8.8 | Round-trip time, packet loss |
traceroute/tracert | Trace path to destination | traceroute google.com | Hop-by-hop latency, routing path |
nslookup | DNS troubleshooting | nslookup example.com | DNS resolution, record types |
dig | Advanced DNS lookup | dig @8.8.8.8 example.com MX | Detailed DNS information |
tcpdump | Command-line packet capture | tcpdump -i eth0 port 80 | Network traffic analysis |
netstat | Network connection status | netstat -an | Active connections, listening ports |
ip/ifconfig/ipconfig | Interface configuration | ip addr show | IP addresses, interface status |
arp | ARP table management | arp -a | MAC-to-IP mappings |
Hardware Troubleshooting Tools
Physical Layer Testing
Toner/Probe: Cable tracing and identification
- Function: Locate cables in walls, patch panels
- Components: Tone generator and inductive probe
- Use cases: Cable mapping, fault isolation
Cable Tester: Verify cable integrity and performance
- Basic testing: Continuity, wire mapping, length
- Advanced testing: Crosstalk, attenuation, delay skew
- Certification: Category compliance testing
Network Analysis Hardware
Network Taps: Hardware devices for traffic monitoring
- Function: Copy network traffic for analysis
- Types: Passive optical, active electrical
- Benefits: No impact on network performance
Wi-Fi Analyzer: Wireless network analysis tools
- Function: RF spectrum analysis, signal strength mapping
- Capabilities: Channel utilization, interference detection
- Site surveys: Coverage planning and optimization
Visual Fault Locator: Fiber optic troubleshooting
- Function: Inject visible light into fiber
- Detection: Breaks, bends, connectors
- Range: Typically up to 5km
Performance Testing
Speed Tester: Bandwidth and throughput measurement
- Web-based: Speedtest.net, Fast.com
- Command-line: iperf, iperf3
- Dedicated hardware: Professional testing equipment
- Metrics: Download/upload speeds, latency, jitter
Network Device Commands
Layer 2 Troubleshooting Commands
show mac-address-table: Display MAC address mappings
- Information: MAC addresses, VLANs, ports
- Troubleshooting: MAC flapping, learning issues
show vlan: Display VLAN configuration and status
- Information: VLAN ID, name, ports, status
- Verification: Port assignments, VLAN existence
show arp: Display ARP table entries
- Information: IP-to-MAC mappings, interface associations
- Troubleshooting: Address resolution issues
Layer 3 & Interface Commands
show route: Display routing table
- Information: Networks, next-hops, metrics, protocols
- Verification: Path selection, reachability
show interface: Display interface status and statistics
- Information: Status, utilization, errors, counters
- Troubleshooting: Physical layer issues, performance
show config: Display device configuration
- Information: Running vs startup configuration
- Verification: Configuration consistency
show power: Display PoE status and budgets
- Information: Power consumption, available power
- Troubleshooting: PoE delivery issues
• Choose appropriate tool for the problem scope
• Start with simple tools before complex analysis
• Use multiple tools to correlate findings
• Consider impact on production systems
• Document tool results for future reference