AI Model Hub Service Degradation and Increased Error Rate

Incident Report for IONOS Cloud

Postmortem

What happened: On November 26, 2025, at 12:19 UTC, we observed a decline in usage-related metrics in our AI Model Hub. This was a result of connectivity issues in the secure connection to our backend inference engines.

What we did: After resolving the configuration issue the environment recovered.

What we are doing to avoid recurrence:

  • To spot a decline in usage metrics more quickly, we have updated our alerting thresholds.
  • Automatic health checks will be expanded.
Posted Nov 26, 2025 - 14:24 UTC

Resolved

All metrics have returned to baseline. We are marking this incident as resolved. We will publish a Root Cause Analysis here shortly.
Posted Nov 26, 2025 - 13:46 UTC

Monitoring

The Infrastructure Team has deployed the fix, and we are seeing improvements in availability. We are monitoring the situation until all metrics return to expected baselines.
Posted Nov 26, 2025 - 13:28 UTC

Identified

We have identified a connectivity issue preventing communication to the inferencing backends. The Infrastructure Team is currently working on a fix. We will provide the next update by 13:45 UTC.
Posted Nov 26, 2025 - 13:15 UTC

Update

We are updating the impact to outage. The AI Model Hub Team is currently investigating and is working to restore the service as quickly as possible. We will post the next update here 13:15 UTC latest.
Posted Nov 26, 2025 - 12:43 UTC

Update

We are continuing to investigate this issue.
Posted Nov 26, 2025 - 12:40 UTC

Investigating

We are currently investigating increased error rate in our AI Model Hub.
Posted Nov 26, 2025 - 12:40 UTC
This incident affected: Global Services (AI Model Hub).