Wednesday, 4 February 2026

ISP

We had two ISPs.

Primary went down. Secondary was UP.
Still, internet was not working.

On paper, this looked like a perfect design:
2 ISPs
SD-WAN profile for path selection
Automatic failover expected

Reality: When Primary failed, traffic had nowhere to go.
Why?
Because the Secondary ISP never advertised a default route to the firewall.
So even though the link was UP,
the firewall had no route to the internet.

What went wrong (design mistake):
We assumed:
“If SD-WAN is configured, failover will just work.”
But SD-WAN only selects between existing routes.
It does NOT create routes.
No route = no forwarding = no internet.
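
To make that concrete, here is a minimal Python sketch of the idea. It is a toy model, not any vendor's SD-WAN logic: the function select_path, the interface names wan1 and wan2, and the routing table are all assumptions made up for illustration. The point it shows is that the preference order only gets a say among interfaces that already hold a matching route.

```python
import ipaddress

def select_path(routes, link_up, preference, dest):
    """Return the egress interface for dest, or None if no usable route exists.

    routes:     list of (prefix, interface) tuples (hypothetical routing table)
    link_up:    dict of interface -> bool
    preference: interface names in SD-WAN preference order
    """
    addr = ipaddress.ip_address(dest)
    # Only routes that match the destination AND sit on a live link are usable.
    usable = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in routes
        if addr in ipaddress.ip_network(prefix) and link_up.get(iface, False)
    ]
    if not usable:
        return None  # no route = no forwarding
    longest = max(net.prefixlen for net, _ in usable)
    members = {iface for net, iface in usable if net.prefixlen == longest}
    # The SD-WAN step: choose among interfaces that already have a route.
    for iface in preference:
        if iface in members:
            return iface
    return sorted(members)[0]


# Routing table as in the incident: default route via primary (wan1) only.
routes = [("0.0.0.0/0", "wan1")]
preference = ["wan1", "wan2"]

healthy = select_path(routes, {"wan1": True, "wan2": True}, preference, "8.8.8.8")
failed = select_path(routes, {"wan1": False, "wan2": True}, preference, "8.8.8.8")

# Same failure, but now the secondary (wan2) also has a default route.
fixed_routes = routes + [("0.0.0.0/0", "wan2")]
fixed = select_path(fixed_routes, {"wan1": False, "wan2": True}, preference, "8.8.8.8")

print(healthy, failed, fixed)  # wan1 None wan2
```

In the failed case wan2 is UP and listed in the preference order, yet the result is None, because there is nothing for the selection step to choose from.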

The real root cause: We never tested Secondary in isolation.
Primary was always healthy,
so Secondary stayed “theoretical HA”.
Until the day it became production.

Architect takeaway:
High Availability is not about having backup links.
It’s about proving backup paths actually work.

If you’ve never:
Pulled the primary cable
Or disabled the primary route
Then your secondary is not HA.
It’s just hope with an interface.
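
A cheap pre-check before the real pull-the-cable test could look like the sketch below. It assumes you have already exported the routing table into a simple list of (prefix, interface) pairs. That format, the function name and the interface names are hypothetical, not the output of any specific firewall. It only flags WAN links that hold no default route, so it complements the live failover test rather than replacing it.

```python
DEFAULT_PREFIXES = {"0.0.0.0/0", "::/0"}

def links_without_default_route(routes, wan_interfaces):
    """Return WAN interfaces that could not carry internet traffic on their own.

    routes:         list of (prefix, interface) tuples (hypothetical export)
    wan_interfaces: names of the links that are supposed to provide HA
    """
    has_default = {iface for prefix, iface in routes if prefix in DEFAULT_PREFIXES}
    return [iface for iface in wan_interfaces if iface not in has_default]


# Example data modelled on the incident: only the primary has a default route.
routes = [
    ("0.0.0.0/0", "wan1"),
    ("10.0.0.0/8", "lan1"),
]
missing = links_without_default_route(routes, ["wan1", "wan2"])

for iface in missing:
    print(f"{iface}: link may be UP, but it has no default route")
```

Here the check reports wan2, which is exactly the gap that stayed hidden while Primary was healthy.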
