S3 outage and high error rate in Frankfurt

Incident Report for IONOS Cloud

Postmortem

What happened?

We experienced a network incident resulting in connectivity issues between our FRA and FRE data centers. This led to a high error rate for services hosted on a segment of S3 hosts in the FRA region.

How could this happen? (Technical Root Cause)

The incident was caused by a software fault within the Border Gateway Protocol (BGP) control plane on a specific network switch. This fault led to a Layer 3 routing failure, which resulted in outgoing traffic from a segment of S3 hosts being blackholed (silently and unexpectedly dropped).

A network redundancy mechanism designed to prevent traffic blackholing in failure scenarios did not activate: it triggers only when physical links go down, and in this case the physical network connections remained operational even though the critical control plane service had failed. As a result, the redundancy mechanism was bypassed. The issue was ultimately resolved by rebooting the affected network switch.
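The failure mode above can be illustrated with a short sketch (hypothetical names and logic, not IONOS's actual failover implementation): a failover trigger keyed only on physical link state never fires when the link stays up but the BGP control plane has died, so traffic keeps flowing into a black hole, whereas a trigger that also checks control plane health would fire.

```python
from dataclasses import dataclass

@dataclass
class SwitchState:
    """Simplified, illustrative view of one switch."""
    link_up: bool          # physical layer status
    bgp_established: bool  # control plane (BGP session) health

def link_based_failover_needed(s: SwitchState) -> bool:
    # Failover keyed ONLY on physical link state -- the kind of
    # mechanism described in the postmortem.
    return not s.link_up

def health_based_failover_needed(s: SwitchState) -> bool:
    # Failover that also considers control plane health would
    # have caught this incident's failure mode.
    return (not s.link_up) or (not s.bgp_established)

# The incident scenario: link stays up, BGP control plane is dead.
incident = SwitchState(link_up=True, bgp_established=False)

print(link_based_failover_needed(incident))    # False -> failover bypassed, traffic blackholed
print(health_based_failover_needed(incident))  # True  -> failover would have triggered
```

The sketch only shows why the trigger condition matters; real fabrics typically close this gap with liveness protocols such as BFD that test the forwarding path itself rather than the link state.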

What are we doing to prevent this from happening again?

To prevent a recurrence and improve network stability, we are implementing the following measures:

  • Improved Monitoring: Enhancing BGP and container monitoring on all network devices to detect control plane failures faster. (January 2026)
  • Vendor Support: Engaging with our network switch supplier through an open vendor support case to address the underlying software issue. (ONGOING)
  • Resiliency Investigation: Investigating and implementing improvements to enhance the overall resiliency of our network fabric devices and efficacy of failover mechanisms and triggers. (ONGOING)
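As a sketch of what the improved BGP monitoring could look like (peer addresses, state names, and the polling source are assumptions; a real deployment would read peer state via SNMP or the switch vendor's API), the check below flags any peer that leaves the Established state between two polls:

```python
# Hypothetical sketch: detect BGP peers that dropped out of the
# Established state between two successive polls of a switch.
def bgp_state_alerts(previous: dict, current: dict) -> list:
    """Return alert strings for peers no longer Established."""
    alerts = []
    for peer, old_state in previous.items():
        # A peer missing from the new poll is also a failure signal.
        new_state = current.get(peer, "Missing")
        if old_state == "Established" and new_state != "Established":
            alerts.append(f"BGP peer {peer}: Established -> {new_state}")
    return alerts

prev = {"10.0.0.1": "Established", "10.0.0.2": "Established"}
curr = {"10.0.0.1": "Established", "10.0.0.2": "Idle"}

print(bgp_state_alerts(prev, curr))
# ['BGP peer 10.0.0.2: Established -> Idle']
```

Alerting on the state transition rather than on link status is what would have surfaced this incident's failure mode, since the physical links never went down.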
Posted Dec 23, 2025 - 08:41 UTC

Resolved

This incident has been resolved.
Posted Dec 22, 2025 - 11:22 UTC

Update

The team is continuing to run repairs and monitor the situation. Some customers may notice degraded service while systems return to normal. Customers may also notice that backups and other scheduled tasks did not run overnight. As mentioned, the RCA will follow with more information.
Posted Dec 22, 2025 - 04:29 UTC

Update

We are continuing to monitor for any further issues.
Posted Dec 22, 2025 - 03:18 UTC

Monitoring

The issue has been resolved, and services should be returning to normal. The team is continuing to monitor the system recovery. We will publish the RCA as soon as the team has reviewed the records and determined the root cause.
Posted Dec 22, 2025 - 03:16 UTC

Update

We are continuing to work on a fix for this issue.
Posted Dec 22, 2025 - 02:54 UTC

Identified

The team has identified the source of the issue and has begun troubleshooting and remediation.
Posted Dec 22, 2025 - 02:42 UTC

Update

Early indications point to an internal network issue, impacting communication between Buckets. External traffic is not impacted, and we do not see other customer-facing resources affected either.
Posted Dec 22, 2025 - 02:22 UTC

Investigating

The S3 team is currently working on an issue impacting S3 in Frankfurt. Based on the latest reports, we are seeing high error rates likely caused by internal communication issues. We will post updates regularly as the issue is investigated and resolved.
Posted Dec 22, 2025 - 01:54 UTC
This incident affected: Location DE/FRA (Network, Object Storage).