November 26 Update: Power has been completely restored to the building where our data center is located and the situation is now considered back to normal. More details on the status blog.
For those following our status blog and Twitter feed, the events of the last two weeks have been hard to miss. We wanted to post a quick summary, current status, and a couple of thoughts.
On Monday, October 29th, we posted that all systems were up and we were actively monitoring the situation. Peer1 was ready to run on backup generator power for days if necessary. That evening, the storm surge from Sandy, assisted by a rising tide and full moon, flooded the basement of our data center, cutting off the fuel supply to the backup generator. The next morning, Peer1 informed us of an impending emergency shutdown of the generator. We executed a protective shutdown to prevent loss or corruption of customer data. Later, when we’d secured confirmation from Peer1 that there was no imminent danger of power loss, we restarted our systems. The total duration of this unplanned, voluntary downtime was 3 hours.
The extraordinary efforts of Fog Creek, Squarespace, Stack Exchange and Peer1 over the next three days have been covered by several news outlets. For a blow-by-blow account, check out the latest Stack Exchange podcast.
Currently, the rooftop generator has a steady supply of fuel. A separate “roll-up” generator (the size of a shipping container) has been parked next to the building to provide a second source of power. Power at the data center has been switched back and forth between these two generators to confirm that continuous, redundant power is available.
The building is still not on city power. Thirty feet of seawater had to be pumped out of the basement, leaving behind damaged electrical systems that must be repaired and replaced before the building can be safely reconnected to the grid.
We take the trust of our customers seriously. We’ve been gratified by the expressions of support and appreciation. We’ve been inspired by our colleagues’ competence and dedication. However, we believe that extraordinary efforts should not be necessary to maintain smooth operations.
We’ve taken many lessons from this downtime, and we’re still processing them. The actions we’re taking cannot be summarized in one post-mortem, but the changes will be obvious over the next several months.
We don’t want your money if you’re not amazingly happy. If you were materially affected by this downtime, please email us at email@example.com and we will make it right.
The Fog Creek Sys Admin Team