Of course, you should never go up in a burning building; go out instead. Amazon’s Werner Vogels sees the Multicore Crisis in much the same way:
Only focusing on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.
Vogels is writing there about Michael Stonebraker’s claim that he can demonstrate a database architecture that outperforms conventional databases by a factor of 50X. Stonebraker is no one to take lightly: he’s accomplished a lot of innovation in his career so far and he isn’t nearly done. He advocates replacing the Oracle (and MySQL) style databases (which he calls legacy databases) with a collection of special-purpose databases that are optimized for particular tasks such as OLTP or data warehousing. It’s not unlike the idea others and I have talked about: that the one-language-fits-all paradigm is all wrong and you’d do better to adopt polyglot programming.
I like Stonebraker’s work. While I want the ability to scale out to any level, as Vogels suggests, I will take the 50X improvement as a basic building block and then scale that out if I can. That’s a significant scaling factor even when looked at in terms of the Multicore Language Timetable: 50X is about 5.6 doublings, which works out to roughly 8 years of Moore’s cycles if you assume a doubling every 18 months or so. I’m also mindful that databases are the doorway to the I/O side of the equation, which is often a lot harder to scale out. Backing an engine that’s 50X faster at sucking the bits off the disk with memcached ought to lead to some pretty amazing performance.
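For anyone who wants to check the Moore’s cycles arithmetic, here is a quick sketch. The function name `moore_years` and the doubling periods are my own assumptions for illustration; the answer moves quite a bit depending on whether you count a doubling as 18 or 24 months.

```python
import math


def moore_years(speedup, months_per_doubling=18):
    """Years of steady doubling needed to match a given speedup."""
    doublings = math.log2(speedup)  # 50X is about 5.64 doublings
    return doublings * months_per_doubling / 12


# Assumed doubling periods: 18 months and 24 months.
years_18 = moore_years(50)
years_24 = moore_years(50, months_per_doubling=24)

print(f"{years_18:.1f} years at 18 months per doubling")
print(f"{years_24:.1f} years at 24 months per doubling")
```

Either way, a 50X jump is the better part of a decade of waiting on hardware alone.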
But Vogels is right: in the long term we need to see different beasts than the elephants. It was with that thought in mind that I’ve been reading with interest articles about Sequoia, an open source database clustering technology that makes a collection of database servers look like one more powerful server. It can be used to increase performance and reliability. It’s worth noting that Sequoia can be installed for any Java app using JDBC without modifying the app. Their clever moniker for the technology is RAIDb: Redundant Array of Inexpensive Databases. There are different levels of RAIDb, just as there are RAID levels, that allow for partitioning, mirroring, and replication. The choice of level or combination of levels governs whether your application gets more performance, more reliability, or both.
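To make the partitioning versus mirroring trade-off concrete, here is a toy model. It is emphatically not Sequoia’s API: `ToyCluster`, its `"partition"` and `"mirror"` modes, and the in-memory dictionaries standing in for database servers are all hypothetical names I made up to show how a controller might route reads and writes.

```python
import itertools


class ToyCluster:
    """Toy controller in front of several in-memory 'databases'.

    Hypothetical illustration only; this is not Sequoia's API.
    """

    def __init__(self, n_backends, mode, placement=None):
        if mode not in ("partition", "mirror"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        # Each backend is a dict of table name -> {key: value}.
        self.backends = [{} for _ in range(n_backends)]
        # Partition mode: which single backend owns each table.
        self.placement = placement or {}
        # Mirror mode: rotate reads across all backends.
        self._next_reader = itertools.cycle(range(n_backends))

    def write(self, table, key, value):
        """Apply a write and return the backend indices it touched."""
        if self.mode == "partition":
            targets = [self.placement[table]]
        else:
            targets = list(range(len(self.backends)))
        for i in targets:
            self.backends[i].setdefault(table, {})[key] = value
        return targets

    def read(self, table, key):
        """Return (backend index, value) for a read."""
        if self.mode == "partition":
            i = self.placement[table]
        else:
            i = next(self._next_reader)
        return i, self.backends[i][table][key]


# Mirroring: every write goes everywhere, reads are spread out.
mirror = ToyCluster(3, "mirror")
mirror_writes = mirror.write("orders", 1, "book")
mirror_reads = [mirror.read("orders", 1)[0] for _ in range(3)]

# Partitioning: each table lives on exactly one backend.
part = ToyCluster(2, "partition", placement={"orders": 0, "customers": 1})
part_writes = part.write("customers", 7, "alice")
```

In this sketch mirroring gives you redundancy and spreads reads across every server, but each write costs you a trip to all of them. Partitioning keeps writes cheap and spreads the data around, but a table is only as available as the one server that holds it. Mixing the two is how you would aim for both performance and reliability.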
Sequoia is not a panacea, but for some types of benchmarks, such as TPC-W, it shows a nearly linear speedup as more CPUs are added. It seems likely that a combination of approaches, such as Stonebraker’s specialized databases for particular niches and clustering approaches like Sequoia, all running on a utility computing fabric such as Amazon’s EC2, will finally break the multicore logjam for databases.