Sunday, 4 January 2026

BGP Case Study: When “Routes Are Up” but Traffic Is Still Broken



Last week, I was troubleshooting a Palo Alto firewall where BGP showed as Established and prefixes were being learned, yet traffic was intermittently blackholed.

At first glance, everything looked healthy:
BGP state: Established
Routes: Present in routing table
No obvious errors or flaps
But users were still complaining.

What was actually happening?
After deeper analysis, the issue turned out to be a combination of design gaps and uncontrolled BGP behavior:
🔹 Asymmetric routing
Inbound traffic preferred ISP-A
Outbound traffic exited via ISP-B
The stateful firewall dropped the return traffic
🔹 Missing BGP attribute control
No Local Preference tuning
Default best-path selection led to unpredictable routing
🔹 Over-reliance on BGP state
“BGP up” was assumed to mean “traffic healthy”
No traffic validation or session-aware checks
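
To make the first failure mode concrete, here is a toy Python model of why asymmetry breaks a stateful device. It is a simplification under stated assumptions, not how PAN-OS actually implements sessions: the hypothetical ToyStatefulFirewall class remembers which ISP link a session arrived on and only allows the reply if routing sends it back out the same link. The IP addresses come from documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """A minimal 5-tuple; frozen so it can be used as a dict key."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: str = "tcp"

    def reversed(self) -> "Packet":
        # The reply direction swaps source and destination.
        return Packet(self.dst, self.src, self.dport, self.sport, self.proto)


class ToyStatefulFirewall:
    """Hypothetical model: each session records the ISP link it arrived on,
    and replies are only allowed if they are routed back out that link."""

    def __init__(self) -> None:
        self.sessions: dict[Packet, str] = {}

    def inbound(self, pkt: Packet, link: str) -> None:
        # Create a session for new inbound traffic, remembering the arrival link.
        self.sessions[pkt] = link

    def outbound_reply(self, reply: Packet, egress_link: str) -> str:
        expected = self.sessions.get(reply.reversed())
        if expected is None:
            return "drop: no matching session"
        if egress_link != expected:
            return f"drop: reply routed via {egress_link}, session expects {expected}"
        return "allow"


fw = ToyStatefulFirewall()
request = Packet("198.51.100.10", "203.0.113.5", 50000, 443)
fw.inbound(request, link="ISP-A")  # inbound path preferred ISP-A

# Routing picks ISP-B for the reply: the asymmetric case.
print(fw.outbound_reply(request.reversed(), egress_link="ISP-B"))
# drop: reply routed via ISP-B, session expects ISP-A

# Reply leaves via ISP-A: symmetric, so it is allowed.
print(fw.outbound_reply(request.reversed(), egress_link="ISP-A"))
# allow
```

The point of the model is that nothing in it looks at BGP: the control plane can be perfectly healthy while the session check still rejects the reply.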

The Fix (Architect mindset)
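✔ Explicit Local Preference for the primary path (controls which ISP outbound traffic uses)
✔ Controlled AS-Path prepending toward the backup ISP (discourages inbound traffic via the backup link)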
✔ Route verification using:
show routing route
Traffic logs (session end reasons)
Packet capture for asymmetric flows
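
The two policy knobs in the fix can be illustrated with a deliberately reduced best-path function. This is a sketch that assumes only three steps of the decision process: highest Local Preference, then shortest AS path, then lowest router ID as a stand-in for all remaining tie-breakers. Real BGP implementations, PAN-OS included, evaluate more attributes. Route and best_path are hypothetical names, and the ASNs (64500, 64501, 64510) and prefixes come from documentation ranges.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    via: str                  # ISP / peer the route was learned from
    as_path: list[int]
    local_pref: int = 100     # 100 is a common vendor default LOCAL_PREF
    router_id: str = "0.0.0.0"


def best_path(candidates: list[Route]) -> Route:
    """Reduced BGP decision process (an assumption, not the full algorithm):
    1) highest LOCAL_PREF, 2) shortest AS_PATH, 3) lowest router ID."""
    def rank(r: Route) -> tuple:
        rid = tuple(int(octet) for octet in r.router_id.split("."))
        return (-r.local_pref, len(r.as_path), rid)
    return min(candidates, key=rank)


# Outbound: both ISPs send a default route with the same AS-path length.
isp_a = Route("0.0.0.0/0", "ISP-A", [64500], router_id="192.0.2.2")
isp_b = Route("0.0.0.0/0", "ISP-B", [64501], router_id="192.0.2.1")
print(best_path([isp_a, isp_b]).via)  # ISP-B, chosen only by the router-ID tie-break

isp_a.local_pref = 200                # explicit Local Preference for the primary path
print(best_path([isp_a, isp_b]).via)  # ISP-A

# Inbound, as seen by a remote network: we prepend our ASN (64510) toward ISP-B.
via_a = Route("203.0.113.0/24", "ISP-A", [64500, 64510])
via_b = Route("203.0.113.0/24", "ISP-B", [64501, 64510, 64510, 64510])
print(best_path([via_a, via_b]).via)  # ISP-A, the shorter AS path
```

Note the two directions. Local Preference is set on our side and steers outbound traffic, while prepending only asks remote networks to prefer ISP-A for inbound traffic. A remote AS that applies its own Local Preference can still override it, which is why the route, log and capture checks remain necessary after the policy change.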
Key takeaway

BGP being UP does not mean your application path is correct.
Always validate control plane + data plane together.
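
One way to act on this is to refuse to call a path healthy unless a data-plane signal agrees with the BGP state. The sketch below is a heuristic under stated assumptions: SessionRecord is a hypothetical, simplified view of a traffic-log entry, and the 5% threshold is arbitrary. It flags sessions that ended with the aged-out reason while receiving no return packets, a pattern consistent with one-way (asymmetric or blackholed) flows.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical, simplified view of one firewall traffic-log entry."""
    end_reason: str         # session end reason, e.g. "tcp-fin" or "aged-out"
    packets_received: int   # packets seen in the return (server-to-client) direction


def one_way_ratio(records: list[SessionRecord]) -> float:
    """Fraction of sessions that aged out without any return packets."""
    if not records:
        return 0.0
    one_way = sum(1 for r in records
                  if r.end_reason == "aged-out" and r.packets_received == 0)
    return one_way / len(records)


def path_health(bgp_established: bool, records: list[SessionRecord],
                threshold: float = 0.05) -> str:
    # Control plane first: no BGP session means no usable path at all.
    if not bgp_established:
        return "control plane down"
    # Data plane second: BGP can be Established while flows are one-way.
    ratio = one_way_ratio(records)
    if ratio > threshold:
        return f"control plane up, data plane suspect ({ratio:.0%} one-way sessions)"
    return "healthy"


logs = [SessionRecord("tcp-fin", 12)] * 8 + [SessionRecord("aged-out", 0)] * 2
print(Counter(r.end_reason for r in logs))
print(path_health(True, logs))  # control plane up, data plane suspect (20% one-way sessions)
```

In practice the records would come from exported traffic logs and the BGP flag from the peer state; both inputs here are stand-ins.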

As network architects, our job isn’t just to make protocols work —
it’s to make traffic predictable and resilient.
Have you faced a BGP issue where everything looked fine but wasn’t?
Let’s discuss...
