Network service degraded

Incident Report for IONOS Cloud

Postmortem

We want to share the following Root Cause Analysis with you

Update: 25.02.2026 - Initial publishing

What happened?

On February 24, 2026, during a scheduled maintenance window intended to improve network resiliency in the FRA region ([Link]), a configuration deployment led to a loss of connectivity for public services.

Internal monitoring detected the outage shortly after the start of the announced maintenance, once the configuration change was applied. Our network engineering teams identified link failures between critical hardware components. By 20:55 UTC, a manual fix was applied to the affected devices, and initially affected services were restored by 21:00 UTC.

How could this happen? (Root Cause)

The incident was caused by a configuration state drift between our central software repository and the live hardware settings in the FR7 production environment that went undetected before the rollout.

Specifically, the outage involved Forward Error Correction (FEC) settings—parameters that allow different brands of networking hardware to communicate reliably.

  • The Discrepancy: Unlike the staging environments, the production environment contained unique settings required for multi-vendor hardware interoperability. Although the configuration changes were validated under a "four-eyes" principle, the rendered output did not make this discrepancy visible, so the difference went unnoticed.
  • The Trigger: The automated deployment performed a "full rebuild" of the device configuration. Because the repository did not contain the specific FEC settings (the discrepancy), it omitted them during the rebuild.
  • The Result: Once the new configuration was pushed, the lack of FEC parity caused the physical links between mismatched devices to fail, dropping traffic for all public services in the region.
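The failure mode described above can be sketched in a few lines. This is a hypothetical illustration, not IONOS's actual deployment tooling; the parameter names (including the `fec` value) are invented for the example. The key point is that a "full rebuild" renders the device configuration solely from the repository's intent, so any setting that exists only on the live device is silently dropped:

```python
# Hypothetical sketch of the failure mode: a "full rebuild" renders the
# entire device configuration from the repository's intent, so a setting
# that exists only on the live device (out-of-band drift) is dropped.

# Intent stored in the central repository; the FEC override was never recorded here.
repository_intent = {
    "interface": "eth1/1",
    "speed": "100G",
    "mtu": 9216,
}

# Live configuration on the production device, including the FEC setting
# required for multi-vendor interoperability (illustrative values).
running_config = {
    "interface": "eth1/1",
    "speed": "100G",
    "mtu": 9216,
    "fec": "rs544",  # present only on the device, not in the repository
}

def full_rebuild(intent):
    """Rebuild the whole config from intent alone; nothing else survives."""
    return dict(intent)

new_config = full_rebuild(repository_intent)

# The rebuild omits the FEC parameter; after the push, the two ends of the
# link no longer agree on error correction and the physical link fails.
dropped = set(running_config) - set(new_config)
print(sorted(dropped))  # → ['fec']
```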

What are we doing to prevent recurrence?

We are committed to ensuring this specific failure mode does not happen again. Our engineering teams have initiated the following corrective actions:

  • Comprehensive Configuration Audit: We are performing a full audit of all production devices to identify and resolve any "drifts" where live settings (like FEC) differ from our central repository. (to be completed within Q1 2026)
  • Improved Validation Checks: We are implementing an automated pre-flight check that compares the "intended" configuration against the "running" configuration to clearly flag any potential omissions before a change is finalized. This increases the visibility of drifts and unexpected discrepancies and reduces the surface area for human error. (to be completed within Q1 2026)
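The intent of the second corrective action can be illustrated with a minimal sketch. This is an assumption about how such a pre-flight check might look, not IONOS's actual implementation: compare the intended configuration against the running configuration and block the rollout if any live parameter would be silently removed.

```python
# Illustrative pre-flight check (not actual IONOS tooling): diff the
# "intended" configuration against the "running" configuration and flag
# any parameter that the rollout would silently remove.

def preflight_check(intended, running):
    """Return parameters present on the device but absent from intent.

    A non-empty result means the change would delete live settings
    (configuration drift) and should be held for human review.
    """
    return {key: value for key, value in running.items() if key not in intended}

# Example values mirroring the incident: the FEC setting exists only
# on the device, so the check flags it before anything is pushed.
intended = {"speed": "100G", "mtu": 9216}
running = {"speed": "100G", "mtu": 9216, "fec": "rs544"}

omissions = preflight_check(intended, running)
if omissions:
    print(f"BLOCKED: rollout would remove {sorted(omissions)}")  # → BLOCKED: rollout would remove ['fec']
```

With such a check in the pipeline, the drift surfaces as an explicit failure before the change is finalized, rather than as a link outage after it is pushed.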

The scheduled maintenance was planned to cause only a few seconds of service disruption; due to the issues described above, the actual impact was significantly higher. This maintenance was part of our ongoing initiative to improve stability and performance in our data centers, and our network team remains committed to driving that initiative forward. We understand the impact this incident has caused and are working with due diligence and urgency to incorporate the lessons learned, further reducing risk during maintenance operations on our core network components.

Posted Feb 25, 2026 - 11:56 UTC

Resolved

The issue was linked to the scheduled network maintenance announced here: https://status.ionos.cloud/incidents/y3q7703x5fg4.
We have confirmed that the problem is now resolved. Our team is continuing the investigation to determine the root cause, and we will publish the RCA on this page as soon as it becomes available.
Posted Feb 24, 2026 - 21:44 UTC

Monitoring

A fix has been implemented, and customers should see traffic and connectivity return to normal at this point.
Posted Feb 24, 2026 - 21:05 UTC

Identified

The team has now identified the source and is investigating remediation steps.
Posted Feb 24, 2026 - 21:00 UTC

Update

We are continuing to investigate this issue.
Posted Feb 24, 2026 - 20:53 UTC

Update

We are continuing to investigate this issue.
Posted Feb 24, 2026 - 20:40 UTC

Investigating

We are writing to inform you that we are experiencing sporadic connection issues and substantial delays in packet delivery. Network technicians began working on the incident immediately after detection and are isolating the problem to resolve it as quickly as possible. In the meantime, some degradation in connection quality may affect individual virtual resources. We will inform you as soon as full functionality has been restored.
Posted Feb 24, 2026 - 20:39 UTC
This incident affected: Location DE/FRA (Network).