Thursday, 26 February 2026

 BGP Troubleshooting — Step-by-Step Practical Guide


When BGP fails, traffic stops, routes disappear, and users feel it immediately. Here is a structured method I follow in production networks.

1️⃣ Verify Basic Reachability
BGP runs over TCP port 179. If IP connectivity fails, BGP cannot work.
✔ Ping the neighbor's loopback or interface address, sourced from the address BGP will use
✔ Check routing table for neighbor IP
✔ Verify no ACL or firewall blocking TCP 179
Key idea: No IP reachability = No BGP session.
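If you have a Linux jump host or monitoring box on the same path, a quick TCP probe separates "filtered" from "reachable". Here is a minimal Python sketch, assuming you can run Python on such a host (the function name `probe_tcp` is just for illustration):

```python
import socket


def probe_tcp(host: str, port: int = 179, timeout: float = 3.0) -> str:
    """Try a TCP handshake to host:port and classify the result.

    "open"    -> handshake completed (something is listening)
    "refused" -> a RST came back: the IP path works, but the connection was not accepted
    "timeout" -> no answer at all: often an ACL/firewall or a routing black hole
    "error"   -> any other socket error (for example, no route to host)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


# Example (hypothetical neighbor address):
# print(probe_tcp("192.0.2.2"))
```

A router usually answers a TCP 179 connection from an address that is not a configured neighbor with a reset, so "refused" still shows that packets got there and back. "timeout" is the result that points at an ACL or firewall.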

2️⃣ Check BGP Neighbor State
Run:

show ip bgp summary

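Look at the State/PfxRcd column for each neighbor:
• Idle → BGP is not trying to connect (neighbor shut down, no route to the peer, max-prefix shutdown, or backing off after an error)
• Connect / Active → BGP is trying to open the TCP session, but the handshake is not completing
• OpenSent / OpenConfirm → TCP is up and OPEN messages are being exchanged; these should only be seen briefly
• A number → Session is Established; the number is the count of prefixes received
If the session is not Established, focus on configuration mismatch or network reachability.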
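On Cisco IOS, an Established neighbor shows a prefix count in the State/PfxRcd column instead of a state name. The hypothetical Python helper below turns that column into a state plus a hint. The sample rows are made up, and the column layout is an assumption that varies by platform and release (IPv4 neighbors only):

```python
import re

# Made-up sample of "show ip bgp summary" neighbor rows (layout is an assumption).
SAMPLE = """\
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.2        4        65002    1203    1198       42    0    0 18:02:11       12
10.0.0.3        4        65003       0       0        1    0    0 never    Active
10.0.0.4        4        65004       0       0        1    0    0 00:04:10 Idle (Admin)
"""

HINTS = {
    "Idle": "not trying to connect: check shutdown, route to peer, max-prefix",
    "Connect": "TCP handshake not completing: check reachability and TCP 179",
    "Active": "TCP handshake not completing: check reachability and TCP 179",
    "OpenSent": "TCP up, OPEN exchange in progress: check AS and parameters",
    "OpenConfirm": "TCP up, OPEN exchange in progress: check AS and parameters",
}

IPV4_ROW = re.compile(r"^\d{1,3}(\.\d{1,3}){3}\s")


def parse_summary(text: str) -> dict:
    """Map neighbor IP -> (state, prefixes_received or None)."""
    result = {}
    for line in text.splitlines():
        if not IPV4_ROW.match(line):
            continue  # skip the header and any non-neighbor lines
        fields = line.split(None, 9)  # 9 fixed columns, then State/PfxRcd as the remainder
        neighbor, state = fields[0], fields[9].strip()
        if state.isdigit():
            result[neighbor] = ("Established", int(state))  # a number means Established
        else:
            result[neighbor] = (state, None)
    return result


for nbr, (state, pfx) in parse_summary(SAMPLE).items():
    base = state.split()[0]  # "Idle (Admin)" -> "Idle"
    hint = HINTS.get(base, f"session up, {pfx} prefixes received")
    print(f"{nbr:<12} {state:<14} {hint}")
```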

3️⃣ Validate Neighbor Configuration
Most BGP issues are configuration mistakes.
✔ Correct neighbor IP
✔ Correct remote-AS
✔ Update-source configured (for loopback peering)
✔ ebgp-multihop configured for eBGP peers that are not directly connected (iBGP sessions do not need it)

Small typo = session down.
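Most of these checks boil down to one question: does each side's config describe the other side correctly? Below is a sketch with a hypothetical `BgpPeerConfig` model (not any vendor's data format) that cross-checks both ends. `disable-connected-check`, peer groups and confederations are deliberately left out:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BgpPeerConfig:
    """Hypothetical, simplified view of one side of a BGP session."""
    local_as: int
    source_ip: str                        # address BGP sources from (update-source)
    neighbor_ip: str                      # address in "neighbor x.x.x.x ..."
    remote_as: int                        # value in "neighbor x.x.x.x remote-as ..."
    neighbor_connected: bool = True       # is neighbor_ip on a directly connected subnet?
    ebgp_multihop: Optional[int] = None   # TTL from ebgp-multihop, None if not configured


def check_side(me: BgpPeerConfig, peer: BgpPeerConfig, name: str) -> List[str]:
    problems = []
    if me.neighbor_ip != peer.source_ip:
        problems.append(f"{name}: neighbor {me.neighbor_ip} but peer sources from {peer.source_ip}")
    if me.remote_as != peer.local_as:
        problems.append(f"{name}: remote-as {me.remote_as} but peer is AS {peer.local_as}")
    is_ebgp = me.local_as != peer.local_as
    if is_ebgp and not me.neighbor_connected and me.ebgp_multihop is None:
        problems.append(f"{name}: eBGP neighbor not directly connected, no ebgp-multihop")
    return problems


def check_session(a: BgpPeerConfig, b: BgpPeerConfig) -> List[str]:
    """Check both directions; an empty list means no mismatch was found."""
    return check_side(a, b, "A") + check_side(b, a, "B")


# Loopback-to-loopback eBGP peering where side B forgot ebgp-multihop.
a = BgpPeerConfig(65001, "192.0.2.1", "192.0.2.2", 65002, neighbor_connected=False, ebgp_multihop=2)
b = BgpPeerConfig(65002, "192.0.2.2", "192.0.2.1", 65001, neighbor_connected=False)
print(check_session(a, b))
```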

4️⃣ Authentication Problems
If MD5 authentication is configured:
✔ Password must match on both sides
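✔ Check logs for authentication failures (for example, invalid or missing MD5 digest messages)
Mismatch = the router silently drops the peer's TCP segments, so the session never establishes; an already established session drops once its hold timer expires.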
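Why does a password mismatch look like a dead link rather than an error? TCP MD5 (RFC 2385) adds a digest to every segment, and the receiver discards any segment whose digest it cannot reproduce, so even the SYN never gets answered. A simplified Python sketch of the digest: IPv4 only, TCP options excluded, and the segment length taken as header plus data, which is my reading of the pseudo-header rule (`tcp_md5_digest` is a hypothetical name):

```python
import hashlib
import socket
import struct


def tcp_md5_digest(src_ip: str, dst_ip: str, tcp_header: bytes, payload: bytes, key: bytes) -> bytes:
    """Simplified RFC 2385 digest for IPv4.

    Hashes, in order: the TCP pseudo-header, the 20-byte TCP header with its
    checksum field zeroed (options excluded), the segment data, and the key.
    """
    header = bytearray(tcp_header[:20])
    header[16:18] = b"\x00\x00"                 # checksum is treated as zero
    seg_len = len(tcp_header) + len(payload)    # assumption: full segment length
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, seg_len))  # zero pad, protocol 6 (TCP), length
    return hashlib.md5(pseudo + bytes(header) + payload + key).digest()


# A SYN from 198.51.100.1:40000 to 198.51.100.2:179 (illustrative values).
syn = struct.pack("!HHIIBBHHH", 40000, 179, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
sender = tcp_md5_digest("198.51.100.1", "198.51.100.2", syn, b"", b"s3cret")
receiver = tcp_md5_digest("198.51.100.1", "198.51.100.2", syn, b"", b"S3cret")
print(sender == receiver)  # False: the receiver discards the SYN
```

One character of difference in the key is enough; nothing is sent back, which is why the neighbor just sits in Active or Idle.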

5️⃣ Check Route Advertisement
Session up but routes missing? Then check policy.
✔ Network statements present, with an exact prefix/mask match in the routing table
✔ Route redistribution configured
✔ Route-map / prefix-list not blocking routes
✔ Next-hop reachable

Command:

show ip bgp neighbors x.x.x.x advertised-routes
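A classic reason a route never appears in advertised-routes: on Cisco IOS the network statement only originates a prefix that exists in the routing table with exactly the same prefix and mask. A small Python sketch of that rule (the function name is hypothetical, and auto-summary behaviour on older releases is ignored):

```python
import ipaddress
from typing import Iterable, List


def unadvertised_networks(network_statements: Iterable[str], rib_prefixes: Iterable[str]) -> List[str]:
    """Return network statements with no exact prefix/mask match in the RIB.

    "network 10.1.0.0 mask 255.255.0.0" needs 10.1.0.0/16 itself in the
    routing table; a more specific route such as 10.1.1.0/24 does not count.
    """
    rib = {ipaddress.ip_network(p) for p in rib_prefixes}
    return [n for n in network_statements if ipaddress.ip_network(n) not in rib]


rib = ["10.1.1.0/24", "10.2.0.0/16"]
print(unadvertised_networks(["10.1.0.0/16", "10.2.0.0/16"], rib))  # ['10.1.0.0/16']
```

A common fix for an aggregate like 10.1.0.0/16 is a static route for that exact prefix pointing to Null0, so the network statement has something to match.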

6️⃣ Investigate Route Filtering
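Many "missing route" problems come from filtering policy.
✔ Prefix-list applied in the right direction (in / out)
✔ Route-map deny statements (and the implicit deny at the end of every route-map and prefix-list)
✔ Maximum-prefix limit reached (by default this tears the whole session down, not just the extra routes)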

Policy errors silently drop routes.
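Prefix-lists are evaluated top-down by sequence number, the first match wins, and anything that matches nothing is denied. The `ge`/`le` rules are where most mistakes happen, so here is a Python sketch of the matching logic, assuming Cisco-style semantics (`PrefixListEntry` and `evaluate` are hypothetical names):

```python
import ipaddress
from typing import List, NamedTuple, Optional


class PrefixListEntry(NamedTuple):
    seq: int
    action: str                 # "permit" or "deny"
    prefix: str                 # e.g. "10.0.0.0/8"
    ge: Optional[int] = None
    le: Optional[int] = None


def entry_matches(entry: PrefixListEntry, route: str) -> bool:
    net = ipaddress.ip_network(entry.prefix)
    rt = ipaddress.ip_network(route)
    if rt.version != net.version or not rt.subnet_of(net):
        return False  # route is not inside the entry's prefix
    if entry.ge is None and entry.le is None:
        return rt.prefixlen == net.prefixlen  # no ge/le: exact length only
    low = entry.ge if entry.ge is not None else net.prefixlen
    high = entry.le if entry.le is not None else rt.max_prefixlen
    return low <= rt.prefixlen <= high


def evaluate(prefix_list: List[PrefixListEntry], route: str) -> str:
    """First match in sequence order wins; no match means implicit deny."""
    for entry in sorted(prefix_list, key=lambda e: e.seq):
        if entry_matches(entry, route):
            return entry.action
    return "deny"  # implicit deny at the end of every prefix-list


pl = [PrefixListEntry(5, "permit", "10.0.0.0/8", le=24)]
for route in ("10.1.0.0/16", "10.1.1.0/25", "172.16.0.0/16"):
    print(route, evaluate(pl, route))
```

For example, `permit 10.0.0.0/8 le 24` accepts 10.1.0.0/16 but drops 10.1.1.0/25, and `permit 10.0.0.0/8` with no ge/le matches only the /8 itself.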

7️⃣ Check BGP Attributes and Path Selection
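If a route is received but not selected as best:
✔ Next-hop reachability (a path with an unreachable next hop is not considered at all)
✔ Weight (Cisco-specific, local to the router; higher wins)
✔ Local Preference (higher wins)
✔ AS Path length (shorter wins)
✔ MED value (lower wins; by default compared only between paths from the same neighboring AS)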

Best path selection determines traffic flow.
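To reason about why one path beats another, it helps to model the decision. This is a heavily simplified Python sketch of the attributes listed above, applied in Cisco's order. It skips locally originated routes, origin, eBGP vs iBGP, IGP metric to the next hop and the final tie-breakers, so treat it as a thinking aid, not the real algorithm (`Path` and `best_path` are hypothetical names):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    name: str
    next_hop_reachable: bool
    weight: int = 0          # Cisco-specific, local to the router
    local_pref: int = 100    # default local preference
    as_path: tuple = ()
    med: int = 0

    @property
    def neighbor_as(self):
        return self.as_path[0] if self.as_path else None


def best_path(paths: List[Path]) -> Optional[Path]:
    """Simplified subset of the Cisco best-path steps."""
    candidates = [p for p in paths if p.next_hop_reachable]  # unreachable next hop: ignored
    steps = [
        lambda p: -p.weight,       # higher weight wins
        lambda p: -p.local_pref,   # higher local preference wins
        lambda p: len(p.as_path),  # shorter AS path wins
    ]
    for key in steps:
        if len(candidates) <= 1:
            break
        best = min(key(p) for p in candidates)
        candidates = [p for p in candidates if key(p) == best]
    # MED only when every remaining path comes from the same neighboring AS
    if len(candidates) > 1 and len({p.neighbor_as for p in candidates}) == 1:
        best_med = min(p.med for p in candidates)
        candidates = [p for p in candidates if p.med == best_med]
    return candidates[0] if candidates else None  # remaining tie-breakers omitted


a = Path("via-A", True, local_pref=200, as_path=(65010, 65020))
b = Path("via-B", True, as_path=(65030,))
c = Path("via-C", False, weight=500, as_path=(65040,))  # huge weight, but next hop unreachable
print(best_path([a, b, c]).name)  # via-A
```

Note how via-A wins despite the longer AS path: local preference is checked first, which is a frequent source of "why is traffic taking the long way?" questions.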

8️⃣ Monitor Logs and Debug Carefully
Logs give the real story.
✔ Neighbor reset reason
✔ Hold timer expiry
✔ Policy rejection
Use debug only in a maintenance window, and limit it to the affected neighbor where possible.
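A reset almost always comes with a NOTIFICATION message, and its code/subcode tells you why. The sketch below decodes error codes from RFC 4271 and Cease subcodes from RFC 4486. The log format it matches is an assumption modelled on Cisco IOS `%BGP-3-NOTIFICATION` lines and will need adjusting for other platforms (`decode` is a hypothetical name):

```python
import re

# Error codes from RFC 4271
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Selected subcodes: OPEN errors (RFC 4271) and Cease (RFC 4486)
SUBCODES = {
    (2, 1): "Unsupported Version Number",
    (2, 2): "Bad Peer AS",
    (2, 3): "Bad BGP Identifier",
    (2, 4): "Unsupported Optional Parameter",
    (2, 6): "Unacceptable Hold Time",
    (6, 1): "Maximum Number of Prefixes Reached",
    (6, 2): "Administrative Shutdown",
    (6, 3): "Peer De-configured",
    (6, 4): "Administrative Reset",
    (6, 5): "Connection Rejected",
    (6, 6): "Other Configuration Change",
    (6, 7): "Connection Collision Resolution",
    (6, 8): "Out of Resources",
}

# Assumed log shape; some releases insert "active" or "passive" before the code.
NOTIFICATION = re.compile(
    r"%BGP-\d-NOTIFICATION: (sent to|received from) neighbor (\S+)"
    r"(?: (?:active|passive))? (\d+)/(\d+)"
)


def decode(line: str):
    """Return (neighbor, direction, reason) or None if the line is not a NOTIFICATION."""
    m = NOTIFICATION.search(line)
    if not m:
        return None
    direction, neighbor = m.group(1), m.group(2)
    code, sub = int(m.group(3)), int(m.group(4))
    reason = ERROR_CODES.get(code, f"unknown code {code}")
    detail = SUBCODES.get((code, sub))
    return neighbor, direction, reason + (f" / {detail}" if detail else "")


print(decode("%BGP-3-NOTIFICATION: sent to neighbor 10.0.0.2 4/0 (hold time expired) 0 bytes"))
# ('10.0.0.2', 'sent to', 'Hold Timer Expired')
```

Direction matters: "sent to" means your router gave up on the peer, "received from" means the peer gave up on you.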

9️⃣ Check Physical and L2 Issues
Sometimes the problem is not BGP at all.
✔ Interface flapping
✔ Duplex mismatch
✔ VLAN or trunk issue
✔ High CPU or memory

Transport instability breaks BGP: lost keepalives let the hold timer expire, and the session resets.
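The mechanism is the hold timer. Each side proposes a hold time in its OPEN message and both use the smaller value (RFC 4271; zero disables it, otherwise it must be at least 3 seconds). If nothing arrives from the peer for that long, the session is torn down. A Python sketch of both rules; the function names and the message timeline are illustrative:

```python
from typing import List, Optional


def negotiated_hold_time(local: int, remote: int) -> int:
    """RFC 4271: use the smaller of the local and the peer's proposed hold time."""
    for value in (local, remote):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local, remote)


def hold_expiry(rx_times: List[float], hold: int, now: float) -> Optional[float]:
    """Return when the hold timer first expires, or None if it has not by `now`.

    rx_times: arrival times (seconds) of KEEPALIVE/UPDATE messages from the peer,
    starting with the moment the session came up. A hold time of 0 disables the timer.
    """
    if hold == 0 or not rx_times:
        return None
    times = sorted(rx_times) + [now]
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev >= hold:
            return prev + hold  # silence lasted a full hold time
    return None


hold = negotiated_hold_time(180, 90)  # peer proposed 90 s, so both sides use 90
# Keepalives every 30 s, then a CPU spike starves the session.
print(hold, hold_expiry([0, 30, 60], hold, now=200))  # 90 150
```

Cisco IOS defaults to a 60-second keepalive and 180-second hold time, so a flapping link or a CPU spike has to starve the session for three minutes before it drops. Lower timers detect failures faster but tolerate short hiccups less.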

🔟 Compare With Working Peer
Best practical trick:
👉 Compare working neighbor vs failing neighbor
👉 Spot configuration differences quickly
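The comparison is easy to script once the neighbor IP is normalised away. A Python sketch over a made-up running-config excerpt; peer groups and templates are not expanded, so real configs may need more work (`neighbor_lines` and `compare` are hypothetical names):

```python
import re
from typing import List, Tuple

# Made-up running-config excerpt: 10.0.0.2 works, 10.0.0.3 does not.
CONFIG = """\
router bgp 65001
 neighbor 10.0.0.2 remote-as 65002
 neighbor 10.0.0.2 update-source Loopback0
 neighbor 10.0.0.2 ebgp-multihop 2
 neighbor 10.0.0.2 route-map RM-IN in
 neighbor 10.0.0.3 remote-as 65003
 neighbor 10.0.0.3 ebgp-multihop 2
 neighbor 10.0.0.3 route-map RM-IN in
"""


def neighbor_lines(config: str, ip: str) -> List[str]:
    """Collect 'neighbor <ip> ...' lines with the IP replaced by a placeholder."""
    pattern = re.compile(r"^\s*neighbor " + re.escape(ip) + r" (.*)$")
    lines = []
    for line in config.splitlines():
        m = pattern.match(line)
        if m:
            lines.append("neighbor <peer> " + m.group(1).strip())
    return sorted(lines)


def compare(config: str, good_ip: str, bad_ip: str) -> Tuple[List[str], List[str]]:
    """Return (lines only on the working peer, lines only on the failing peer)."""
    good = set(neighbor_lines(config, good_ip))
    bad = set(neighbor_lines(config, bad_ip))
    return sorted(good - bad), sorted(bad - good)


missing, extra = compare(CONFIG, "10.0.0.2", "10.0.0.3")
print("only on working peer:", missing)
print("only on failing peer:", extra)
```

Here the remote-as difference is expected, while the missing update-source on the failing peer is the real finding.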

Final Tip:
Always troubleshoot in layers: Physical → IP → TCP → BGP → Policy.

This structured approach reduces MTTR (mean time to repair): each layer is confirmed before you move up to the next, so you fix the real cause instead of guessing.
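The layered approach can be expressed as an ordered list of checks that stops at the first failure. A minimal Python sketch; the lambdas are hypothetical placeholders for real tests such as interface counters, a ping, a TCP 179 probe, the neighbor state and the policy output:

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Run checks bottom-up and stop at the first layer that fails."""
    for layer, check in checks:
        if not check():
            return layer
    return None  # every layer passed


# Hypothetical stand-ins for real checks.
checks = [
    ("Physical", lambda: True),
    ("IP", lambda: True),
    ("TCP", lambda: False),   # e.g. TCP 179 filtered
    ("BGP", lambda: True),
    ("Policy", lambda: True),
]
print(first_failing_layer(checks))  # TCP
```

Stopping at the first failure is the point: there is no value in staring at route-maps while TCP 179 is still blocked.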


No comments:

Post a Comment

Why do many Palo Alto engineers open a TAC case immediately… without checking anything first?

A production issue happens. Application team says “network issue.” Users say “firewall problem.” And within minutes someone says: “Let’s ope...