Kubernetes service restricted

Incident Report for IONOS Cloud

Postmortem

Incident Summary
On March 2, 2026, some Managed Kubernetes clusters in our Frankfurt (DE/FRA) region became stuck in an UPDATING or DEPLOYING state. This caused intermittent communication issues between control plane components and worker nodes.
Root Cause
The instability was caused by our control plane management system running out of memory and crashing under high load. When the system restarted, it was unable to properly resume its interrupted tasks, which left several clusters stuck mid-update.
Resolution
Our engineering teams quickly stabilized the system by increasing its memory allocation. Once stable, engineers manually recovered the remaining clusters that were stuck to restore full service.
Prevention
To prevent this from happening again, we are working to address an upstream software bug related to how the system handles memory-related crashes. Additionally, we are fixing an internal alerting issue that failed to notify our on-call team, ensuring a much faster response should a similar issue arise in the future.

Posted Mar 06, 2026 - 16:12 UTC

Resolved

This incident has been resolved.
Posted Mar 02, 2026 - 14:55 UTC

Identified

We are currently experiencing instability affecting Kubernetes clusters hosted in Frankfurt. Some clusters are stuck in an updating/deploying state, which is causing intermittent communication issues between control plane components and worker nodes. The issue may cause brief interruptions or degraded performance across impacted services. Our engineering teams are actively investigating and working to restore full stability as quickly as possible.
Posted Mar 02, 2026 - 12:03 UTC
This incident affected: Location DE/FRA (Managed Kubernetes).