Resolved
Severity Level: Critical
Our engineers have identified the root cause of the issue and are actively working on remediation. Starbase and other storage-dependent services in us-west-3 have been restored and are operating normally. The primary remaining issue is that some customer VMs still have stale or empty NFS mounts as a result of the earlier power event and mass reboots.
We recommend that affected customers reboot their VMs to restore file system access; the sketch below may help confirm whether a VM is affected. If a VM reboot does not resolve the issue, a hypervisor reboot may be necessary.
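For customers who want to confirm whether a VM is affected before rebooting, the following is a minimal, illustrative sketch (not an official tool; the helper names are ours, and paths and error behavior vary by image). On a Linux VM it reads /proc/mounts, lists NFS mounts, and flags any that return a stale file handle or appear empty.

```python
#!/usr/bin/env python3
"""Illustrative check for stale or empty NFS mounts on a Linux VM.

Assumptions: Python 3, a Linux guest with /proc/mounts, and NFS/NFSv4
client mounts. Treat the output as a hint only; a VM reboot (or, if that
does not help, a hypervisor reboot) is the recommended remediation.
"""
import errno
import os


def nfs_mount_points():
    """Yield mount points whose filesystem type starts with 'nfs' (nfs, nfs4)."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) >= 3 and fields[2].startswith("nfs"):
                yield fields[1]


def check(mount_point):
    """Return a short status string for one NFS mount point."""
    try:
        entries = os.listdir(mount_point)
    except OSError as err:
        if err.errno == errno.ESTALE:
            return "STALE file handle - reboot the VM"
        return f"ERROR ({err.strerror})"
    return "EMPTY - may still need a reboot" if not entries else "OK"


if __name__ == "__main__":
    found = False
    for mp in nfs_mount_points():
        found = True
        print(f"{mp}: {check(mp)}")
    if not found:
        print("No NFS mounts found in /proc/mounts")
```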
If you are unsure whether your service is impacted by this incident, please open a support ticket and we will get back to you as soon as possible.
To be automatically notified when the status of this incident changes, please click on the “Subscribe to updates” button.
Thank you for your patience and understanding during this time.
Identified
Severity Level: Critical
All nodes in AUS01 have been powered up and have established network connectivity.
The team completed physical checks and is now verifying system status before proceeding with logical startup tasks for storage. This is an important step toward restoring NFS and dependent services, but customer-facing storage is not yet confirmed available.
To be automatically notified when the status of this incident changes, please click on the “Subscribe to updates” button.
Thank you for your patience and understanding during this time.
Monitoring
Severity Level: Critical
A power event at the AUS01 datacenter caused power loss, triggering reboots of 246 GPU nodes, all 4 spine switches, mCPU nodes, and VAST storage infrastructure.
We have restored compute fabric and InfiniBand connectivity, bringing all spine switches and most leaf switches back online. Partial UPS redundancy has also been restored, and work is ongoing to fully recover storage and protected power.
To be automatically notified when the status of this incident changes, please click on the “Subscribe to updates” button.
Thank you for your patience and understanding during this time.
Investigating
Severity Level: Critical
Our engineers are currently investigating a power event at the AUS01 datacenter. All GPU nodes in the datacenter simultaneously rebooted at 21:06 UTC on February 14, 2026, affecting multiple customers. The nodes came back online quickly, but the reboots were unexpected and unplanned.
Additionally, storage for these nodes has disconnected, leading to failed workloads. Thank you for your patience as we continue troubleshooting the cause and restoring service.
If you are unsure whether your service is impacted by this incident, please open a support ticket and we will get back to you as soon as possible.
To be automatically notified when the status of this incident changes, please click on the “Subscribe to updates” button.
Thank you for your patience and understanding during this time.