Thursday, 26 February 2026

🚨 Why BGP Flaps Occur in Palo Alto Firewalls (And How to Prevent Them)



Last week, during troubleshooting, we observed an intriguing pattern:

👉 Routes appearing and disappearing
👉 Intermittent traffic drops
👉 Frequent BGP state changes

This is a classic case of BGP flapping.

However, here’s a crucial truth that many engineers overlook:

BGP flaps are rarely just a BGP problem.

🔍 What is a BGP Flap?

A BGP flap occurs when the BGP neighbor relationship repeatedly transitions between:

➡️ Established → Down → Established → Down

Each flap triggers several consequences:

• Route withdrawals
• Route re-advertisements
• Potential traffic disruptions
• Increased control-plane churn

Even minor flaps can cause significant headaches in production environments.
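To put a number on "how often is this peer flapping?", here is a minimal sketch that counts flaps in an ordered list of peer states. The state list is hypothetical sample data, not output from any specific firewall command, and the sketch treats any state other than Established (for example Idle) as "down".

```python
from typing import Iterable


def count_flaps(states: Iterable[str]) -> int:
    """Count how many times the peer leaves the Established state."""
    flaps = 0
    previous = None
    for state in states:
        # A flap is a transition from Established to anything else.
        if previous == "Established" and state != "Established":
            flaps += 1
        previous = state
    return flaps


# Hypothetical sample: peer state recorded at successive polls.
observed = ["Established", "Idle", "Established", "Idle", "Established"]
print(count_flaps(observed))  # 2
```

Polling the state on a schedule and feeding it into something like this makes it easier to tell a single reset from a recurring pattern.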

⚙️ Common Causes in Palo Alto Environments

Based on field experience, these are the primary culprits:

1️⃣ Aggressive BGP Timers

If keepalive/hold timers are set too low, the following issues arise:

• Minor packet loss leads to session drops
• The control plane becomes overly sensitive
• Neighbor resets occur frequently

✅ Check:
Navigate to “Network > Virtual Router > BGP > Peer Group” and review the keepalive interval and hold time configured for each peer.
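As a rough sanity check on timer values, the sketch below flags two things: a hold time that is less than three keepalive intervals (RFC 4271 suggests a keepalive of one third of the hold time, with 90 s and 30 s as suggested values), and a hold time below a floor. The 30-second floor is purely my illustrative assumption, not a vendor recommendation, so adjust it to your environment.

```python
from typing import List


def timer_warnings(keepalive_s: int, hold_s: int, min_hold_s: int = 30) -> List[str]:
    """Return human-readable warnings for a keepalive/hold timer pair.

    min_hold_s is an illustrative threshold, not an official limit.
    """
    if keepalive_s <= 0 or hold_s <= 0:
        raise ValueError("this sketch expects positive timers")
    warnings = []
    # Fewer than three keepalive intervals per hold window leaves little margin.
    if hold_s < 3 * keepalive_s:
        warnings.append(
            f"hold {hold_s}s is less than 3x keepalive {keepalive_s}s"
        )
    # Very short hold times make the session sensitive to brief disruptions.
    if hold_s < min_hold_s:
        warnings.append(
            f"hold {hold_s}s is below the illustrative floor of {min_hold_s}s"
        )
    return warnings


print(timer_warnings(30, 90))  # []
print(timer_warnings(3, 9))
print(timer_warnings(10, 20))
```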


2️⃣ Underlying Interface Instability

BGP runs over TCP, so it depends on stable interfaces and IP reachability to the peer. If the underlying interface bounces, the BGP session flaps with it.

Typical causes include:

• Physical link issues
• HA failovers
• VLAN/zone misconfiguration
• Cloud ENI instability

✅ Verify:
Check the interface and system logs first, rather than relying only on the BGP logs.
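One way to make that comparison concrete is to line up timestamps. The sketch below is only an illustration: the timestamps are made up, the 10-second window is an arbitrary assumption, and pulling the times out of your actual logs is left to you.

```python
from datetime import datetime, timedelta
from typing import List


def drops_explained_by_link_events(
    bgp_drops: List[datetime],
    link_events: List[datetime],
    window_s: int = 10,
) -> List[datetime]:
    """Return BGP drop times that occur within window_s seconds after a link/HA event."""
    window = timedelta(seconds=window_s)
    return [
        drop
        for drop in bgp_drops
        if any(timedelta(0) <= drop - event <= window for event in link_events)
    ]


# Hypothetical timestamps, as if extracted from system and routing logs.
link_events = [datetime(2026, 2, 19, 10, 0, 0)]
bgp_drops = [datetime(2026, 2, 19, 10, 0, 4), datetime(2026, 2, 19, 11, 30, 0)]

explained = drops_explained_by_link_events(bgp_drops, link_events)
print(len(explained), "of", len(bgp_drops), "BGP drops follow a link/HA event")
```

If most drops line up with link or HA events, the BGP configuration is probably not where the fix is.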


3️⃣ Path Monitoring / Static Route Withdrawals

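In Palo Alto, when path monitoring fails on a static route that the BGP peer is reached through, the following sequence of events occurs:

➡️ Static route is removed
➡️ The BGP peer (next hop) becomes unreachable
➡️ BGP sessions drop

This one catches many engineers off guard, because the BGP configuration itself looks fine.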

✅ Check:
- Network > Virtual Router > Static Route > Path Monitoring

4️⃣ Control Plane Resource Stress

If the firewall is busy, it may experience:

- High CPU usage
- Packet buffer pressure
- Session table stress

This can lead to delayed BGP keepalives, causing neighbor resets.

✅ Monitor:
- “show system resources” (management plane) and “show running resource-monitor” (dataplane)

5️⃣ MTU or Fragmentation Issues (Silent Killer)

These issues are commonly encountered in:

- IPSec tunnels
- Cloud VPNs
- GRE overlays

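Symptoms include:

- TCP handshake and initial session establishment work correctly
- The session drops once larger packets (such as big UPDATE messages) are exchanged; keepalives stall behind the stuck data and the hold timer expires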

✅ Test:
- Ping the peer with the DF (don't fragment) bit set, increasing the packet size until it fails.
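The arithmetic behind that test is simple: for IPv4 without IP options, the largest ICMP payload that fits in a given MTU is the MTU minus 20 bytes of IP header and 8 bytes of ICMP header. Here is a small sketch of that calculation, plus a binary search for the largest payload that gets through. The `probe` function is a stand-in I made up to simulate a path; in practice it would wrap a real DF-bit ping, whose exact syntax depends on your platform.

```python
from typing import Callable

IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def largest_passing_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload size for which probe(size) succeeds.

    Assumes that if a size passes, every smaller size passes too.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed; check basic reachability")
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print(icmp_payload_for_mtu(1500))  # 1472

# Hypothetical path with an effective MTU of 1400 bytes (e.g. tunnel overhead).
def simulated_probe(size: int) -> bool:
    return size + IPV4_HEADER_BYTES + ICMP_HEADER_BYTES <= 1400

print(largest_passing_payload(simulated_probe))  # 1372
```

If the largest passing payload is well below 1472 on a path you expected to carry 1500-byte packets, tunnel overhead is a likely suspect.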

🛠️ How I Usually Troubleshoot (Real-World Flow)

Instead of immediately diving into BGP configuration, follow this order:

1️⃣ Check interface stability
2️⃣ Review system logs for link/HA events
3️⃣ Verify path monitoring
4️⃣ Assess CPU and control plane resources
5️⃣ Only then tune BGP timers
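
The order matters more than the tooling. As a sketch of the idea, the snippet below runs checks in sequence and stops at the first one that fails. The check names mirror the steps above, and the lambdas are placeholders I invented; real checks would query your firewall or monitoring system.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the name of the first unhealthy one, if any."""
    for name, is_healthy in checks:
        if not is_healthy():
            return name
    return None


# Placeholder checks returning fixed results, for illustration only.
checks: List[Check] = [
    ("interface stability", lambda: True),
    ("link/HA events in system logs", lambda: True),
    ("path monitoring", lambda: False),
    ("CPU and control plane resources", lambda: True),
    ("BGP timers", lambda: True),
]

print(first_failing_check(checks))  # path monitoring
```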

