Post-Mortem: Dallas, TX Data Center Outage – December 28, 2025
Incident Summary
On December 28, 2025, the Dallas, TX data center experienced an outage affecting TX1. The initial outage began at approximately 6:06 PM Eastern Time, when a brief utility interruption caused the HVAC systems to fail. This led to elevated temperatures and automatic shutdowns of critical equipment. After the facility was stabilized, TX1 was found to be still offline because of a failed SFP module on the switch, and it remained down for a total of 16 hours and 48 minutes. The data center took an extended period to acknowledge the issue and replace the faulty component, resulting in prolonged downtime for this server.
We have opened an investigation regarding the data center’s delay in restoring network connectivity for TX1 after the HVAC was restored. Additional updates will be provided as we work through our internal processes for evaluating incidents such as this.
Timeline of Events
- 6:06 PM ET – NOC observed multiple servers going offline at the Dallas location. Investigation began immediately.
- 6:13 PM ET – High temperatures detected on core network devices. Monitoring continued.
- 6:43 PM ET – Elevated temperatures persisted. HVAC failure suspected. Facility contacted; on-call technicians dispatched.
- 6:52 PM ET – Facility technicians arrived and began restoring CRAC and rooftop HVAC units. Cooling estimated to take ~30 minutes.
- 7:25 PM ET – Facility still warm (~89°F). Estimated 15 more minutes until safe for equipment power-up.
- 7:43 PM ET – Facility reached ~80°F. Network equipment began powering up gradually.
- 8:21 PM ET – All network devices restored. Servers being powered up in controlled sequence.
- 9:27 PM ET – Most servers restored. However, TX1 remained offline due to a faulty SFP on the switch. Extended downtime continued until the SFP was replaced the following morning.
- 11:32 AM ET (Dec 29) – TX1 fully restored after the SFP replacement and verification.
Impact
- Affected Server: TX1
- Downtime: ~17 hours (6:37 PM ET Dec 28 – 11:32 AM ET Dec 29)
- Customer Impact: Extended loss of service for TX1
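The downtime figure above can be checked with a little date arithmetic. The following is a minimal sketch, not part of any monitoring tooling: the helper name downtime_hm is hypothetical, and the start and end times are simply the ones quoted in the Downtime line. Both are Eastern Time and no daylight-saving change falls between them, so plain (naive) datetimes are assumed to be sufficient.

```python
from datetime import datetime

# Times taken from the Downtime line above (both Eastern Time).
# Assumption: no DST transition between them, so naive datetimes are fine.
start = datetime(2025, 12, 28, 18, 37)  # 6:37 PM ET, Dec 28
end = datetime(2025, 12, 29, 11, 32)    # 11:32 AM ET, Dec 29


def downtime_hm(start, end):
    """Return the elapsed time between start and end as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


hours, minutes = downtime_hm(start, end)
print(f"{hours} h {minutes} min")  # prints "16 h 55 min"
```

The window works out to 16 hours 55 minutes, which rounds to the ~17 hours stated.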
Root Cause
The outage had two main contributing factors:
1. Data Center HVAC Failure: A utility interruption caused the HVAC systems to fail, which led to elevated temperatures and automatic shutdowns of critical equipment.
2. Faulty SFP on Switch for TX1: After temperatures normalized, TX1 remained offline because the SFP module on the switch had failed. The data center was slow to acknowledge and replace the faulty hardware, so TX1 was not restored until 11:32 AM ET on December 29.
Additional notes from the facility:
- Core networking and servers shut down automatically due to high temperatures
- Some servers experienced temporary connectivity issues during recovery
A full Root Cause Analysis (RCA) will be provided by the facility.
Resolution
- CRAC and rooftop HVAC units restored and operating normally
- Facility temperature returned to safe levels (~80°F)
- Network equipment powered on first, followed by servers
- TX1 restored after SFP replacement (~17 hours total downtime)
