We want to share the following Root Cause Analysis with you.
Update: 25.02.2026 - Initial publication
What happened?
On February 24, 2026, during a scheduled maintenance window intended to improve network resiliency in the FRA region ([Link]), a configuration deployment led to a loss of connectivity for public services.
Internal monitoring detected the outage shortly after the announced maintenance began, once the configuration change had been applied. Our network engineering teams identified link failures between critical hardware components. By 20:55 UTC, a manual fix had been applied to the affected devices, and the initially affected services were restored by 21:00 UTC.
How could this happen? (Root Cause)
The incident was caused by configuration drift between our central software repository and the live hardware settings in the FR7 production environment. This drift went undetected before the rollout.
Specifically, the outage involved Forward Error Correction (FEC) settings: link-level parameters that allow networking hardware from different vendors to communicate reliably.
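For illustration, a pre-rollout check for this kind of drift could compare the FEC mode recorded in the repository with the mode reported by the live device, interface by interface. The following is a minimal Python sketch under stated assumptions, not the tooling used in production: the function `find_fec_drift`, the `Drift` record, the interface names, and the FEC mode strings are all hypothetical, and the sample values do not describe the actual FR7 configuration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drift:
    """One interface whose FEC mode differs between repository and device."""
    interface: str
    intended: Optional[str]  # mode stored in the repository (None if absent)
    live: Optional[str]      # mode reported by the device (None if absent)


def find_fec_drift(intended: dict[str, str], live: dict[str, str]) -> list[Drift]:
    """Return every interface whose FEC mode is not identical in both views.

    Interfaces that exist on only one side are reported as drift as well.
    """
    drifts = []
    for interface in sorted(intended.keys() | live.keys()):
        want = intended.get(interface)
        have = live.get(interface)
        if want != have:
            drifts.append(Drift(interface, want, have))
    return drifts


# Hypothetical sample data (assumption, not taken from the incident).
intended = {"eth1/1": "rs-fec", "eth1/2": "rs-fec"}
live = {"eth1/1": "rs-fec", "eth1/2": "none"}

drifts = find_fec_drift(intended, live)
for d in drifts:
    print(f"{d.interface}: repository={d.intended} device={d.live}")
```

A deployment pipeline could refuse to proceed while `find_fec_drift` returns a non-empty list, so that a mismatch is resolved before any configuration is pushed to production devices.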
What are we doing to prevent recurrence?
We are committed to ensuring this specific failure mode does not happen again. Our engineering teams have initiated the following corrective actions:
The scheduled maintenance was planned to cause only a few seconds of service disruption; however, due to the issues described above, its impact was significantly greater. This maintenance was part of our ongoing initiative to improve stability and performance in our data centers, and our network team remains committed to driving this initiative forward. We understand the impact this incident has caused and are working with diligence and urgency to incorporate the lessons learned and further reduce risks during maintenance operations on our core network components.