Update - We want to share another note on the recovery efforts. While the incident is resolved at the infrastructure level, we recommend checking the status of the virtual machines and resources deployed in your environment.
Depending on how operating systems and/or applications handled the loss of connectivity to the storage, crashes or freezes may have occurred that require manual intervention (e.g., a restart) to resolve.
Unfortunately, we cannot perform this action for you, so please carry out this check to make sure your environment is fully operational after this incident.
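For illustration only, the following is a minimal sketch of such a check, assuming you have SSH access to your servers. The hostnames are placeholders (not part of our infrastructure), and the remote commands used (ssh, findmnt) are standard Linux tools; adapt it to your own environment.

    #!/usr/bin/env python3
    # Minimal post-incident health sweep (illustrative sketch; hostnames are placeholders).
    # Flags servers that are unreachable over SSH, or whose root filesystem was
    # remounted read-only after losing connectivity to the storage backend;
    # both are candidates for a manual reboot.
    import subprocess

    HOSTS = ["vm-01.example.com", "vm-02.example.com"]  # replace with your own servers

    def ssh(host: str, command: str) -> subprocess.CompletedProcess:
        # Run a command on the remote host non-interactively, with a short timeout.
        return subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, command],
            capture_output=True, text=True,
        )

    for host in HOSTS:
        if ssh(host, "true").returncode != 0:
            print(f"{host}: unreachable -> reboot via your provider's console/API")
            continue
        # A root filesystem remounted read-only is a common symptom after storage loss.
        mount_opts = ssh(host, "findmnt -no OPTIONS /").stdout.strip()
        if "ro" in mount_opts.split(","):
            print(f"{host}: root filesystem is read-only -> reboot recommended")
        else:
            print(f"{host}: reachable, root filesystem writable")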
The team is currently compiling the Root Cause Analysis; we expect to share a preliminary version with you today.
Nov 18, 2025 - 01:43 UTC
Monitoring - All of the remaining volumes have been successfully recovered, and redundancy is being re-established for the last volumes as well. We have also seen services recover. We are closely monitoring the situation and completing the final tasks to close the incident.
We have also started investigating the root cause of the incident and will publish the analysis here.
We want to thank you for your patience thus far and commit to a transparent and thorough investigation.
Nov 18, 2025 - 01:01 UTC
Update - We believe we have identified a solution for the volumes that are still unavailable (approx. 10%) and are currently rolling out the fix to them. In parallel, we are working to restore redundancy to the already-recovered volumes and are making good progress in this area as well.
Nov 18, 2025 - 00:27 UTC
Update - The recovery of the affected volumes has progressed well overall. For a subset of volumes, recovery is taking longer than expected. We are currently working on restoring the remaining volumes and will then restore redundancy.
Nov 18, 2025 - 00:11 UTC
Update - As recovery efforts progress, we are beginning to see improvements in service availability. While we are still actively mitigating the incident, we want to share a preliminary, high-level overview of the situation:
We have identified a machine that caused a widespread network disruption in FRA. This disruption cascaded, degrading the performance of a network that provided connectivity to a storage server. The resulting loss of connectivity led to service disruptions and degradation for compute resources in the datacenter.
We will keep you updated about the progress of the recovery as well as insights into the root cause of the incident.
Nov 17, 2025 - 23:44 UTC
Update - We have had to expand the scope of the incident. A recovery plan has been put together, and recovery efforts are already underway. We are monitoring the process closely and will share more details as they become available. Another update will be provided by 00:00 UTC at the latest.
Nov 17, 2025 - 23:31 UTC
Update - We are continuing to work on a fix for this issue.
Nov 17, 2025 - 23:27 UTC
Identified - We are currently recovering from a storage server outage in the datacenter and are migrating the affected storage volumes/VMs away from the host. We will provide another status update before 23:30 UTC.
Nov 17, 2025 - 23:14 UTC
Investigating - We are currently investigating a storage issue.
Nov 17, 2025 - 22:55 UTC