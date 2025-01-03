OpenAI published an incident report detailing the cause of last week's ChatGPT outage and what the company is doing to prevent a repeat. The outage began on December 26th, 2024 at 10:40 AM and was mostly resolved by 3:11 PM, except for ChatGPT, which was fully recovered by 6:20 PM.

The following services were impacted:

ChatGPT

Sora video creation

APIs: agents, realtime speech, batch, and DALL-E

Cause of OpenAI Outage

The outage was caused by a cloud provider data center failure that affected OpenAI's databases. Although the databases are mirrored across regions, switching to a backup database required the cloud provider to manually redirect operations to a data center in another region. OpenAI cited that manual intervention as the fix and attributed the length of the recovery to the scale of the project.

Failover is the process of switching to a backup system when the primary system fails, ideally without manual steps. OpenAI announced that it is planning infrastructure changes to improve its response to future cloud database failures.

OpenAI explained:

“In the coming weeks, we will embark on a major infrastructure initiative to ensure our systems are resilient to an extended outage in any region of any of our cloud providers by adding a layer of indirection under our control in between our applications and our cloud databases. This will allow significantly faster failover.”
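To make the idea of "a layer of indirection" concrete, here is a minimal Python sketch of how such a layer could behave. It is an illustration only and not OpenAI's design: the incident report as quoted gives no implementation details. The names RegionalDatabase, FailoverRouter, and DatabaseUnavailable, and the region labels, are hypothetical. The point it shows is that applications talk to a router they control, and the router moves traffic to a healthy replica on its own rather than waiting for a manual redirect.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class DatabaseUnavailable(Exception):
    """Raised when a region (or every region) cannot serve a request."""


@dataclass
class RegionalDatabase:
    """Hypothetical stand-in for a mirrored database in one cloud region."""
    region: str
    healthy: bool = True
    rows: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> str:
        if not self.healthy:
            raise DatabaseUnavailable(self.region)
        return self.rows[key]


class FailoverRouter:
    """Indirection layer: applications call the router, never a region directly."""

    def __init__(self, replicas: List[RegionalDatabase]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self.replicas = replicas
        self.active = 0  # index of the replica currently serving traffic

    def get(self, key: str) -> str:
        # Try the active replica first, then each remaining replica in order.
        for offset in range(len(self.replicas)):
            index = (self.active + offset) % len(self.replicas)
            try:
                value = self.replicas[index].get(key)
            except DatabaseUnavailable:
                continue  # this region is down, try the next one
            self.active = index  # keep using the replica that answered
            return value
        raise DatabaseUnavailable("all regions are down")


if __name__ == "__main__":
    primary = RegionalDatabase("region-a", rows={"user:1": "alice"})
    backup = RegionalDatabase("region-b", rows={"user:1": "alice"})
    router = FailoverRouter([primary, backup])

    primary.healthy = False          # simulate the regional failure
    print(router.get("user:1"))      # still answered, now by the backup
    print(router.replicas[router.active].region)
```

A real system would also need health checks, replication lag handling, and safe write routing, none of which this sketch attempts.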

Significant ChatGPT Outage

OpenAI said the ChatGPT outage was caused by a regional cloud provider database failure, but the effect was global, as evidenced by user reports on social media from across Europe and North America.

Google Trends, which tracks search volume, indicates that this may have been the largest such event, with more people searching for information about it than for any previous outage.

