AI Model Hub - Service Degradations

Incident Report for IONOS CLOUD

Identified

The AI Model Hub is currently experiencing high demand. Because the Model Hub runs on shared infrastructure, this temporary surge in utilization may result in increased latency and inconsistent performance across the platform.

Current Impacts

- Increased Latency (TTFT): High GPU utilization may increase the Time to First Token (TTFT), that is, the delay before a model request returns its first output token. We are continuously managing platform resources to maximize availability across all concurrent workloads.

- Llama 405B Constraints: Due to its size and high compute requirements, the Llama 405B model is particularly sensitive to traffic spikes, which can lead to timeouts or higher latency. For time-critical workloads that require faster response times, we recommend switching to an alternative model.

- Collections: Collections can also be negatively affected by high demand and utilization. Please see the notes on new deployments in our documentation: https://docs.ionos.com/cloud/ai/ai-model-hub/how-tos/document-collections

Recommendations

- To mitigate temporary performance or availability issues, we encourage our customers and partners to implement retry mechanisms with exponential backoff in their projects and pipelines. Usage peaks that lead to timeouts and degraded performance are usually transient.
- We kindly ask our customers not to open additional support tickets for questions or reports about model performance in the Model Hub. Our Product and Tech Teams are aware of the issue and are actively monitoring and improving the service to meet demand.
- We ask Llama 405B users who are experiencing issues to consider switching to GPT-OSS-120B instead. In most real-world use cases, this model is more cost-efficient and delivers more robust performance.
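A minimal Python sketch of the recommended exponential backoff, using only the standard library. It is not an IONOS client: `retry_with_backoff` and `TransientError` are hypothetical names, and which failures count as transient (for example timeouts or HTTP 429/5xx responses) is an assumption each client has to decide for itself. Delays grow as 1 s, 2 s, 4 s and so on up to a cap, with "full jitter" (a random wait between zero and the current ceiling) so that many clients recovering from the same spike do not retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. timeouts, HTTP 429/5xx)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with capped exponential backoff and full jitter.

    Any other exception propagates immediately, since retrying it would not help.
    """
    attempt = 0
    while True:
        try:
            return call()
        except TransientError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential growth of the ceiling: base, 2*base, 4*base, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time in [0, ceiling] to spread out retries.
            sleep(random.uniform(0, ceiling))
```

To use it, wrap the actual model request in a zero-argument callable, for example `retry_with_backoff(lambda: send_request(payload))`, where `send_request` is a placeholder for your own client code that raises `TransientError` on retryable failures. The `sleep` parameter exists only so the waiting can be replaced in tests.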

Outlook

The following measures are currently underway to improve the performance of the AI Model Hub:

- Various model-specific optimizations are being rolled out on a regular basis
- Improved cross-GPU load balancing (ETA July)
- Adjustments to error codes (replacing HTTP 5xx with HTTP 429) so that clients can better recognize and react to overload situations
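Signaling overload with HTTP 429 (Too Many Requests) lets a client separate capacity limits from errors that will never succeed on retry. The Python sketch below shows one way a client might decide whether and when to retry based on the status code. It assumes, though this report does not confirm it, that responses may carry a standard `Retry-After` header (either a number of seconds or an HTTP-date). The function names and the set of retryable 5xx codes are hypothetical choices, not part of any IONOS API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Assumption: these 5xx codes are treated as transient; adjust to your own error policy.
RETRYABLE_5XX = {500, 502, 503, 504}


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After hint in seconds, or None if absent or unparseable."""
    # HTTP header names are case-insensitive, so compare them in lowercase.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return None
    value = value.strip()
    # Form 1: a delay in whole seconds, e.g. "Retry-After: 5".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP-date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable."""
    if status != 429 and status not in RETRYABLE_5XX:
        return None  # other errors (bad request, authentication, ...) will not succeed on retry
    hint = parse_retry_after(headers)
    if hint is not None:
        return hint  # the server said how long to wait: honor it as-is
    # No hint: fall back to capped exponential backoff (1 s, 2 s, 4 s, ...).
    return min(max_delay, base_delay * 2 ** attempt)
```

The fallback here is deterministic for readability; a production client would typically add random jitter to the computed delay so that many clients do not retry at the same moment.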

Posted Jun 04, 2026 - 14:34 UTC

This incident affects: Global Services (AI Model Hub).