There was recent bellyaching in the blogosphere again about Twitter being down. Dave Winer grumbles, “What other basic form of communication goes down for 12 hours at a time?” Various comments followed, and in the end it was apparently about their moving ISPs. Twitter themselves had this to say:
Twitter is humming along now after a late night. Our team worked earnestly into the night and morning on our largest and most complex maintenance project ever. Everything went pretty much according to plan except for one thing: an incorrect switch.
The switch in question caps traffic at an unacceptable level. In order to correct this, we’ll need to get some hardware installed. Unfortunately, that means we’re not done with our datacenter move just yet. This type of work can be frustrating but it’s all towards Twitter’s highest goal: reliability.
Such moves are never easy. They always include a hitch of some kind, and the Twitter customer base is hopelessly addicted to the medium, so Twitter hears about it whenever they turn the thing off for any period of time. I look at this and for me it’s just one more reason I wouldn’t want to own a datacenter.
Suppose your service, or maybe even Twitter, were built on Amazon’s Cloud or some other Utility Computing solution. You don’t own the servers; you rent them. If loads go up, you can simply rent more in direct proportion to the loads and on 10 minutes’ notice. A recent High Scalability article on scaling Twitter shows they don’t really have all that many servers:
10 boxes, in other words. Now it comes time to upgrade. Much pain and frustration. To do it well, and without interruption, they really need two complete copies of their infrastructure. This way, they can prepare the new version and start cutting users over to it while leaving the old one running. When everyone is over, the old system can be decommissioned. For many startups, owning twice as much hardware as they use is just out of the question. The more successful they become, the more expensive it becomes to entertain such a luxury. Not so on a utility computing service like Amazon’s. Rent twice as many servers for just as long as a successful upgrade takes, and then cut them loose afterward.
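To make the cutover idea concrete, here is a minimal sketch of how users might be moved gradually from the old copy of the infrastructure to the new one, plus the arithmetic for renting the second copy only for the length of the upgrade. Everything in it is an assumption for illustration: the function names, the hash-based bucketing, and especially the hourly rate are hypothetical, not anything Twitter or Amazon has published. Only the count of 10 boxes comes from the article above.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, percent_on_new: int) -> str:
    """Decide which copy of the infrastructure serves this user.

    Raising percent_on_new from 0 to 100 moves users over in stages.
    A user who has been moved to "new" stays there as the percentage
    grows, because the bucket for a given user never changes.
    """
    return "new" if bucket(user_id) < percent_on_new else "old"


def rental_cost(servers: int, hourly_rate: float, hours: float) -> float:
    """Cost of renting extra servers only for the upgrade window."""
    return servers * hourly_rate * hours


if __name__ == "__main__":
    # Hypothetical numbers: a second set of 10 boxes for a 12-hour
    # upgrade window at an assumed rate of $0.10 per server-hour.
    print(rental_cost(servers=10, hourly_rate=0.10, hours=12))
    for pct in (0, 25, 50, 100):
        print(pct, route("some_user", pct))
```

The point of the stable bucketing is that the old system keeps serving everyone who has not been cut over yet, so a hitch on the new side means dialing the percentage back down instead of a 12-hour outage.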
There are detractors to the Amazon approach out there, but do we really think it would make Twitter much less reliable? What if it made it much more reliable?
Here’s another thought that runs rampant: how well would Amazon’s new SimpleDB work for a service like Twitter? It seems tailor-made. Certainly the notion of a “texty” database with up to 1024 characters per field seems like a fit. It would be fascinating to see some of the Twitterati put up a Twitter clone on Amazon’s Web Services using SimpleDB, just to see how well it works and how quickly it could be put together. Given the platform and the requirements of the application, the experiment does not seem like it would be that hard to do. It would certainly make for an interesting test of how well Amazon’s infrastructure really works.
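As a rough sense of why the fit looks good, here is a small sketch of what one status update might look like as a bag of text fields. It makes no calls to Amazon at all: a plain in-memory dictionary stands in for the data store, and the field names (user, text, created_at) and helper functions are my own hypothetical choices, not SimpleDB's API. The only number taken from above is the 1024-character limit per field.

```python
from datetime import datetime, timezone

MAX_FIELD_CHARS = 1024  # per-field limit mentioned above


def make_status_item(user: str, text: str, when: datetime) -> dict[str, str]:
    """Build one status update as a set of string fields.

    Every value is stored as text, including the timestamp, which is
    written in a fixed-width format so plain string comparison sorts
    updates chronologically.
    """
    item = {
        "user": user,
        "text": text,
        "created_at": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    for name, value in item.items():
        if len(value) > MAX_FIELD_CHARS:
            raise ValueError(f"field {name!r} exceeds {MAX_FIELD_CHARS} characters")
    return item


def timeline(store: dict[str, dict[str, str]], user: str) -> list[str]:
    """Return one user's updates, newest first."""
    items = [item for item in store.values() if item["user"] == user]
    items.sort(key=lambda item: item["created_at"], reverse=True)
    return [item["text"] for item in items]


if __name__ == "__main__":
    store: dict[str, dict[str, str]] = {}  # stand-in for the real data store
    now = datetime.now(timezone.utc)
    store["1"] = make_status_item("alice", "Moving datacenters tonight.", now)
    print(timeline(store, "alice"))
```

A 140-character update sits comfortably inside a 1024-character field, which is the whole appeal. Whether the real service holds up under Twitter-sized load is exactly what the experiment would have to show.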