Back in October the New York harbor paid an unwelcome visit to the datacenter that houses our servers. Followers of this space are aware of the heroic efforts that literally kept the lights on. Those events were inspiring and made us all proud to be part of Fog Creek. But as Sys Admins, our job is to view heroic efforts as a failure in planning, preparation and architecture. So as soon as the flood waters receded we set ourselves on a path to improve the continuity and availability of FogBugz and Kiln, with the ultimate goal of a second, geographically diverse datacenter. As with all lofty goals, the path has several large phases and milestones. Now some of those milestones are upon us, and it is time for an update.
The first major step is moving all of our servers from six individual colocation cabinets to seven racks in a cage. The move itself covers only about 10 meters, but it represents an upgrade in almost every way. These infrastructure upgrades will enable us to support new releases the FogBugz and Kiln teams are about to ship (teaser alert: we’ve been dogfooding them for the last couple of months and they are awesome; contact us if you want to participate in the beta). The cage will give Fog Creek a solid foundation for the future and let us free up enough gear to equip a second datacenter.
We are striving to make this move as transparent to you as possible. However, during March we will have three significant Saturday night maintenance windows during which FogBugz and Kiln will be unavailable. Our goal is to keep these outages as short as possible, as few as possible, and scheduled during a time of low usage for the majority of customers. To paraphrase Moltke, “No project plan survives first contact with the server room,” so more details on the timing and scheduling will be forthcoming. Watch this space and our status blog for updates. We are committed to the Fog Creek guarantee of not wanting your money if you are not amazingly happy. These upgrades are a huge part of our commitment to live up to that guarantee. Please let us know if these outages materially impact your business and we will make it right.
Here are some questions you are probably asking, and that we have asked ourselves over the past months:
Why stay in Peer1?
Why stay in a datacenter located on one of the most expensive pieces of real estate, in a flood zone, in a shared building? All excellent questions, but as we debated the various answers we realized there was nothing wrong with our current datacenter that a second datacenter wouldn’t fix. In a disaster, natural or man-made, a second geographically diverse datacenter with a tested and practiced failover procedure is our best option for providing our customers with continued service. Fog Creek has grown up with Peer1 over the past 10 years, and we have a great relationship with Mike and Scott (the real heroes of Sandy), who operate the facility. So we decided to stay and make Peer1 part of our overall strategy.
Why not move to the cloud like Trello?
We aren’t parochial about choosing technology solutions. We want the best technology to solve the problem at hand (just search for cloud vs. dedicated server if you are looking for parochial arguments). Before Sandy arrived we had begun the process of moving Trello to AWS. A technology stack well suited to horizontal scaling and stellar growth made Trello an excellent candidate for cloud hosting. FogBugz and Kiln use different technology stacks and are more I/O intensive than Trello. For these and many other reasons, the cloud isn’t the right solution for FogBugz and Kiln at this time.
We will have before-and-after pictures and some other fun vignettes from the process. If you want to work on awesome, exciting projects, we are looking for experienced, unstoppable Sys Admins!