How LoginRadius Maintained 100% Uptime During the AWS US-EAST-1 Outage

Outages Happen — But Your Service Shouldn’t Stop.
First published: 2025-11-21      |      Last updated: 2025-11-21

Last month, AWS had a major service disruption in its US-EAST-1 (N. Virginia) region. The impact was felt across the internet, with businesses facing everything from latency spikes to full downtime.

Meanwhile, LoginRadius stayed completely operational. No downtime. No degraded performance. No surprises for our customers.

For us, this wasn’t luck. It was our reliability-by-design architecture doing exactly what it was built to do.

Context on What Happened at AWS

The incident started late on October 19, 2025 (11:49 PM PDT) and stretched well into the next day. AWS reported increased error rates across several services in the US-EAST-1 region, which they traced back to DNS resolution failures for regional DynamoDB endpoints.

This issue cascaded through other AWS systems. AWS mitigated the core DNS problem by 2:24 AM PDT on October 20, but full recovery wasn’t confirmed until 3:01 PM PDT. That is roughly 15 hours of intermittent or limited service for the largest AWS region.

Resilient by Design

While the cloud ecosystem was scrambling back to normal, the LoginRadius platform kept running smoothly. Our monitoring showed no anomalies, no latency spikes, and no API errors across any of our environments.

Key outcomes during the outage:

  • Zero Downtime: Customer-facing APIs and services stayed online throughout.

  • Consistent Performance: We maintained normal API traffic and delivered 500 ms response times on 100% of requests, with no latency spikes.

  • Data Integrity: No data loss, no consistency issues.

  • Automated Validation: Health checks across all regions confirmed that failover mechanisms were ready, though not needed, thanks to efficient automated routing.

How We Stayed Resilient: The Architecture

Our ability to absorb the US-EAST-1 disruption wasn’t reactive. It was the result of deliberate architectural decisions. LoginRadius runs an active-active, multi-region infrastructure built on four core layers of defense:

1. Global Traffic Management with Two-Tier Failover

Every request passes through two layers of routing to ensure:

  • resilience

  • low latency

  • seamless failover

Our first layer of defense reroutes traffic away from unhealthy regions. If more granular failover is needed, a second layer of application-level routing kicks in. Combined, this ensures every request has a healthy path.
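To make the two-tier idea concrete, here is a minimal sketch in Python. It is an illustration only, not our production routing code: the Region class, the pick_region function, and the region names are hypothetical, and health is assumed to arrive as simple boolean flags.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    healthy: bool = True  # tier 1 signal: region-level health (assumed boolean)
    # tier 2 signal: per-service health inside the region (assumed boolean)
    services: dict[str, bool] = field(default_factory=dict)


def pick_region(regions: list[Region], service: str) -> str:
    """Return the first region that passes both routing tiers."""
    # Tier 1: drop regions whose overall health check is failing.
    candidates = [r for r in regions if r.healthy]
    # Tier 2: among the survivors, require the requested service to be healthy.
    for region in candidates:
        if region.services.get(service, False):
            return region.name
    raise RuntimeError(f"no healthy path for {service}")


regions = [
    Region("region-a", healthy=False, services={"auth": True}),
    Region("region-b", healthy=True, services={"auth": False}),
    Region("region-c", healthy=True, services={"auth": True}),
]
print(pick_region(regions, "auth"))  # region-c
```

In this sketch, region-a is filtered out by the first tier, region-b passes the first tier but fails the service-level check, and the request lands on region-c.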

2. Multi-Region Compute Clusters

Our stateless microservice architecture runs across multiple independent AWS regions. Because our services are stateless, they can scale up or down on demand. When the primary EKS cluster was impacted, the multi-tier routing kicked in and redirected traffic to the secondary cluster, ensuring continuous service.

3. Distributed Data Layer

We have active-active replication so data stays synchronized in real time across regions. This redundancy ensured complete consistency and zero data loss, even as one region experienced issues.
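As a rough illustration of how two regions that both accept writes can converge on the same state, the sketch below uses a last-writer-wins rule. This is one common technique chosen for illustration, not a description of our data layer: the Replica and Version classes, the sync function, and the timestamp-based tie-breaking are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: int  # write time; assumed comparable across regions
    origin: str     # region that accepted the write; breaks timestamp ties


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, Version] = {}

    def write(self, key: str, value: str, timestamp: int) -> Version:
        """Accept a local write and record which region took it."""
        version = Version(value, timestamp, self.name)
        self.apply(key, version)
        return version

    def apply(self, key: str, incoming: Version) -> None:
        """Keep whichever version is newer; ties go to the larger origin name."""
        current = self.store.get(key)
        if current is None or (incoming.timestamp, incoming.origin) > (
            current.timestamp,
            current.origin,
        ):
            self.store[key] = incoming


def sync(a: Replica, b: Replica) -> None:
    """Exchange every key in both directions so the replicas converge."""
    for key, version in list(a.store.items()):
        b.apply(key, version)
    for key, version in list(b.store.items()):
        a.apply(key, version)


east, west = Replica("east"), Replica("west")
east.write("user:42:email", "old@example.com", timestamp=1)
west.write("user:42:email", "new@example.com", timestamp=2)
sync(east, west)
print(east.store == west.store)  # True
```

After the sync, both replicas hold the later write. Last-writer-wins is simple but discards the older of two concurrent writes, so real systems often pair it with other safeguards.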

4. Continuous, Multi-Layered Monitoring

Visibility is foundational to resilience. Our observability stack brings together a real-time APM solution, a public endpoint health check tool, a dedicated logs and anomaly detection platform, and an alerting and escalation tool.
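The sketch below shows, in simplified form, how a health check can feed alerting without flapping on a single failed probe. It is a hypothetical example rather than our monitoring stack: the HealthMonitor class, the failure_threshold of 3, and the injected probe and alert callables are assumptions made for illustration.

```python
from typing import Callable


class HealthMonitor:
    def __init__(
        self,
        probe: Callable[[], bool],
        alert: Callable[[str], None],
        failure_threshold: int = 3,  # assumed value, tune per endpoint
    ) -> None:
        self.probe = probe
        self.alert = alert
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.healthy = True

    def tick(self) -> bool:
        """Run one probe, update state, and return the current health verdict."""
        if self.probe():
            if not self.healthy:
                self.alert("recovered")
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            # Only flip to unhealthy after enough failures in a row.
            if self.healthy and self.consecutive_failures >= self.failure_threshold:
                self.healthy = False
                self.alert("unhealthy")
        return self.healthy


# Simulated probe results stand in for real endpoint checks.
results = iter([True, False, False, False, True])
alerts: list[str] = []
monitor = HealthMonitor(lambda: next(results), alerts.append)
states = [monitor.tick() for _ in range(5)]
print(states)  # [True, True, True, False, True]
print(alerts)  # ['unhealthy', 'recovered']
```

Requiring several consecutive failures trades a slightly slower reaction for fewer false alarms, and the same verdict could drive both alerting and traffic routing.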

What This Means for Our Customers

This outage is a reminder of a simple truth: cloud incidents are inevitable. But your service doesn’t have to go down because of them.

LoginRadius is designed to handle failures beneath the surface through:

  • automated traffic steering

  • redundant compute across regions

  • distributed, consistent data layers

  • continuous global monitoring and failover automation

Your authentication flows, APIs, and hosted pages stay fast, secure, and uninterrupted—even when underlying infrastructure isn’t.

You can focus on growing your business knowing your identity platform is built on resilient engineering.

Looking Ahead: The Journey Continues

While this event was a tremendous validation of our engineering principles, resilience is a journey, not a milestone.

We’ll continue to:

  • expand disaster recovery playbooks with simulated regional failures

  • enhance observability, ensuring that health metrics and anomaly detection remain predictive, not reactive

  • strengthen multi-cloud readiness to provide even greater operational independence and flexibility

Our vision stays the same: to deliver a global identity platform that’s resilient by design. So your business stays online, even when parts of the internet don’t.

