On 6 December at approximately 22:49 GMT we were alerted to a disruption in the UK Service. The full outage lasted until 22:59 GMT. After the initial outage, Service response and stability continued to fluctuate until approximately 02:51 GMT on 7 December. During this window our instrumentation showed that some customers were able to use the Service, while others may have experienced delays or problems authenticating.
Our investigation revealed that the outage began with a network disruption lasting approximately 2 minutes. This triggered additional safety systems, which protected several VMs from corruption. The remaining 8 minutes of the initial outage were taken up by restarting those VMs. The issues that continued through 02:51 GMT were intermittent connection problems with our web and database servers, resulting from the initial incident.
We are performing a thorough analysis of the events that led to the initial network disruption and have already implemented adjustments to our traffic monitoring and alerting systems. These adjustments would have altered how we responded to the incident, and we are confident that they, along with an additional process change, will prevent this type of disruption from recurring.
We apologize for any inconvenience this disruption may have caused.