There’s a great story over on the High Scalability blog about how Twitter became 10000% faster:
For us, it’s really about scaling horizontally – to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.
This is the story I wanted to tell in Multicore Language Timetable: a faster language pales in comparison to a more scalable language. In this case, Twitter didn’t have that luxury. Ruby wasn’t more scalable, but it did have sufficient facilities that they could rearchitect their app for horizontal scaling, which amounts to using more cores. It’s also the story of how the Multicore Crisis is here today, and many of you have already experienced it.
Twitter learned several interesting lessons along the way that I’ve been hearing more and more:
- Don’t let the database be a bottleneck. We had the same view at Callidus, the Enterprise Software company I last worked at. We built a grid architecture and managed to offload enough that a typical configuration was 75% Java grid computing array and 25% database. This was for a radically more database-intensive (financial processing) business application than most Web 2.0 apps like Twitter.
- You have to build it yourself. There’s a lot of “almost” technology out there that doesn’t quite work. That’s really unfortunate, because everyone keeps hitting this horizontal scaling problem and having to reinvent the wheel: 70% of the software you write is still wasted.
- Conventional database wisdom often tragically impairs scalability. More and more companies are denormalizing to minimize joins and leaving relational integrity as a problem to be solved outside the database.
- “Most performance comes not from language but from application design.” That’s a quote from the article, but I maintain it is also an artifact of using languages designed for a fundamentally different problem than what web scale applications face today. Because the languages aren’t meant to solve the scaling problem, we shouldn’t be surprised that they don’t.
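To make the denormalization point concrete, here is a minimal sketch of what "relational integrity solved outside the database" can look like. It is not taken from the article or from Twitter's code: the table layout and the helper names (`build_db`, `post_message`, `rename_user`, `timeline`) are all hypothetical, and SQLite stands in for whatever database is actually in use. The author's name is copied into every message so that reading the timeline needs no join, and the application, not a foreign key, keeps the copies in sync.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a denormalized messages table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- author_name duplicates users.name so timeline reads skip the join.
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            author_name TEXT NOT NULL,
            body TEXT NOT NULL
        );
    """)
    return db


def post_message(db, user_id, body):
    """Copy the author's current name into the new message row."""
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    db.execute(
        "INSERT INTO messages (user_id, author_name, body) VALUES (?, ?, ?)",
        (user_id, row[0], body),
    )


def rename_user(db, user_id, new_name):
    """Integrity is the application's job: update both copies in one transaction."""
    with db:  # commits on success, rolls back if either statement fails
        db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        db.execute(
            "UPDATE messages SET author_name = ? WHERE user_id = ?",
            (new_name, user_id),
        )


def timeline(db):
    """Read path: a single-table scan, no join against users."""
    return db.execute("SELECT author_name, body FROM messages ORDER BY id").fetchall()
```

The trade is visible in `rename_user`: the read gets cheaper, while every write that touches the duplicated field has to know about all of its copies.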
Interestingly, Twitter still has a single MySQL DB for everything. It is heavily supported by in-memory caches that run on many machines, but at some point it can become the bottleneck too. They’ve worked hard to de-emphasize it, but ultimately they have to figure out how to horizontally scale that DB.
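The pattern of putting in-memory caches in front of a single database can be sketched roughly as follows. This is an illustration only, not Twitter's implementation: `CachedStore` and its methods are hypothetical names, and a plain in-process dict stands in for a cache that would really be spread across many machines.

```python
class CachedStore:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, db_lookup):
        self._db_lookup = db_lookup  # callable that queries the single database
        self._cache = {}             # stand-in for a distributed in-memory cache
        self.db_reads = 0            # counts how often the database is actually hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: the database is not touched
        self.db_reads += 1
        value = self._db_lookup(key)  # cache miss: go to the database
        self._cache[key] = value      # remember the answer for later reads
        return value

    def invalidate(self, key):
        """Call after a write so the next read fetches fresh data."""
        self._cache.pop(key, None)
```

Repeated reads of the same key cost one database query in total, which is how the caches take load off the database. Every miss and every write still lands on the one database, so it remains the eventual bottleneck.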