Wednesday, 4 February 2026

ISP

We had 2 ISPs.

Primary went down. Secondary was UP.
Still, internet was not working.

On paper, this looked like a perfect design:
2 ISPs
SD-WAN profile for path selection
Automatic failover expected

Reality: When Primary failed, traffic had nowhere to go.
Why?
Because the Secondary ISP never advertised a default route to the firewall.
So even though the link was UP,
the firewall had no route to the internet through it.

What went wrong (design mistake):
We assumed:
“If SD-WAN is configured, failover will just work.”
But SD-WAN only selects between existing routes.
It does NOT create routes.
No route = no forwarding = no internet.
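
To make the "selects, does not create" point concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not how any vendor's SD-WAN is implemented. The names Route, candidate_interfaces, select_path, isp1 and isp2 are hypothetical, and the model assumes that routes through a down link are simply withdrawn.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str     # destination prefix, e.g. "0.0.0.0/0"
    interface: str  # egress interface (hypothetical names: "isp1", "isp2")


def candidate_interfaces(routes, dest):
    """Return the interfaces whose routes match dest with the longest prefix."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(r.prefix), r.interface)
        for r in routes
        if addr in ipaddress.ip_network(r.prefix)
    ]
    if not matches:
        return []
    best = max(net.prefixlen for net, _ in matches)
    return [iface for net, iface in matches if net.prefixlen == best]


def select_path(routes, link_up, dest):
    """Toy path selection: pick the first routed interface whose link is up.

    It only chooses among interfaces that already have a route.
    It never invents a route for a link that merely happens to be up.
    """
    # Assumption: routes through a down link are withdrawn from the table.
    usable = [r for r in routes if link_up.get(r.interface, False)]
    candidates = candidate_interfaces(usable, dest)
    return candidates[0] if candidates else None


# Only the primary has a default route, as in the outage described above.
routes = [Route("0.0.0.0/0", "isp1")]
print(select_path(routes, {"isp1": True, "isp2": True}, "8.8.8.8"))   # isp1
print(select_path(routes, {"isp1": False, "isp2": True}, "8.8.8.8"))  # None
```

In this model, the second call returns None even though isp2 is up, because nothing in the table points at it. Adding Route("0.0.0.0/0", "isp2") is what makes failover possible.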

The real root cause: We never tested Secondary in isolation.
Primary was always healthy,
so Secondary stayed “theoretical HA”.
Until the day it became production.

Architect takeaway:
High Availability is not about having backup links.
It’s about proving backup paths actually work.

If you’ve never:
Pulled the primary cable
Or disabled the primary route
Then your secondary is not HA.
It’s just hope with an interface.
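
Before pulling the cable, a paper check can catch the obvious gap. The sketch below assumes you have exported the firewall's routing table into simple (prefix, interface) pairs. That format and the names has_default_route and isolation_report are hypothetical, chosen for illustration. It asks one question per uplink: if this were the only link left, would there be a default route through it?

```python
import ipaddress

DEFAULT_V4 = ipaddress.ip_network("0.0.0.0/0")


def has_default_route(routes, interface):
    """True if the table holds an IPv4 default route via this interface."""
    return any(
        ipaddress.ip_network(prefix) == DEFAULT_V4 and iface == interface
        for prefix, iface in routes
    )


def isolation_report(routes, uplinks):
    """For each uplink, report whether it could carry internet traffic alone."""
    return {uplink: has_default_route(routes, uplink) for uplink in uplinks}


# Hypothetical exported table: the secondary has a route, but not a default one.
routes = [("0.0.0.0/0", "isp1"), ("10.0.0.0/8", "isp2")]
for uplink, ok in isolation_report(routes, ["isp1", "isp2"]).items():
    print(uplink, "OK" if ok else "NO DEFAULT ROUTE")
```

A check like this only proves the route exists on paper. It does not prove the ISP actually forwards traffic, so it complements the real failover test and does not replace it.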
