Site icon SmoothSpan Blog

Memcached: When You Absolutely Positively Have to Get It To Scale the Next Day

Why should “Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0” care about a piece of technology like memcached?  Because it just might save your bacon, that’s why. 

Suppose your team have just finished working feverishly to implement “Virus-a-Go-Go”, your new Facebook widget that is guaranteed to soar to the top of the charts.  You launch it, and sure enough you were right.  Zillions of hits are suddenly raining down on your new widget. 

But you were also wrong.  You woefully underestimated in your architecture what it would take to handle the traffic that is now beating your pitiful little servers into oblivion.  Angry email is flowing so fast it threatens to overwhelm your mail servers.  Worse, someone has published your phone number in a blog post, and now it rings continuously.  Meanwhile, your software developers are telling you that your problem is in your use of the database (seems to be the problem so much of the time, doesn’t it?), and the architecture is inherently not scalable.  Translation:  they want a lot of time to make things faster, time you don’t have.  What to do, what to do?

With appologies to Federal Express, the point of my title is that memcached may be one of the fastest things you can retrofit to your software to make it scale.  Memcached when properly used has the potential to increase performance by hundreds or sometimes even thousands of times.  I don’t know if you can quite manage it overnight, but desperate times call for desperate measures.  Hopefully next time you are headed for trouble, you’ll start out with a memcached architecture in advance and buy yourself more time before you hit a scaling crunch.  Meanwhile, let me tell you more about this wonder drug, memcached.

What is memcached? 

Simply put, memcached sits between your database and whatever is generating too many queries on it and attempts to avoid repetitious queries by caching the answer in memory.  If you ask it for something it already knows, it retrieves it very quickly from memory.  If you ask it for something it doesn’t know, it gets it from the database, copies it into the cache for future reference and hands it over.  Someone once said, it doesn’t matter how smart you are, if the other guy already knows the answer to a hard question, he can blurt it out before you can figure it out.  And this is exactly what memcached does for you. 

The beautiful thing about memcached is that it can usually be added to your software without huge structural changes being necessary.  It sits as a relatively transparent layer that does the same thing your software has always done, but just a whole lot faster.  Most of the big sites use memcached to good effect.  Facebook, for example, uses 200 quad core machines that each have 16GB of RAM to create a 3 Terabyte memcached that apparently has a 99% hit rate.

Here’s another beautiful thought: memcached gives you a way to leverage lots of machines easily instead of rewriting your software to eliminate the scalability bottleneck.  You can run it on every spare machine you can lay hands on, and it soaks up available memory on those machines to get smarter and smarter and faster and faster.  Cool!  Think of it as a short term bandaid to help you overcome your own personal Multicore Crisis.

What’s Needed

Getting the memcached software is easy.  The next step I would take is to go read some case studies from folks using tools similar to yours and see how they’ve integrated memcache into their architectural fabric.  Next, you need to look for the right place to apply leverage with memcache within your application.  Memcached takes a string to use as a key, and returns a result associated with the key.  Some possibilities for keys include:

–  A sql query string

–  A name that makes sense:  <SocialNetID>.<List of Friends>

The point is that you are giving memcached a way to identify the result you are looking for.  Some keys are better than others–think about your choice of key carefully!

Now you can insert calls to memcached into your code in strategic places and it will begin to search the cache.  You’ll also need to handle the case where the cache has no entry by detecting it and telling memcached to add the missing entry.  Be careful not to create a race condition!  This happens when multiple hits on the cache (you’re using a cache because you expect multiple hits, right?) cause several processes to compete with who gets to put the answer into memcached.  There are straightforward solutions available, so don’t panic.

Last step?  You need to know when the answer is no longer valid and be prepared to tell memcached about it so it doesn’t give you back an answer that’s out of date.  This can sometimes be the hardest part, and giving careful thought to what sort of key you’re using is important to making this step easier.  Think carefully about how often its worth updating the cache too.  Sometimes availability is more important than consistency.  The more you update, the fewer hits on the cache will be made between updates, which will slow down your system.  Sometimes things don’t have to be right all the time.  For example, do you really need to be able to lookup which of your friends are online at the moment and be right every millisecond?  Perhaps it is good enough to be right every two minutes.  It’s certainly much easier to cache something changing on a two minute interval.

Memcached is Not Just for DB Scaling

The DB is often the heart of your scalability problem, but memcached can be used to cache all sorts of things.  An alternate example would be to cache some computation that is expensive, must be performed fairly often, and whose answer doesn’t change nearly as often as it is asked for.   Sessions can also be an excellent candidate for storage in memcached although strictly speaking, this is typically more DB caching.

Downsides to memcached

Alternatives to memcached

Conclusion

Memcached would seem to be an absolutely essential tool for scaling web software, but I noticed something funny while researching this article.  Scaling gets 765,013 hits on Google Blog Search.  Multicore (related to scaling) gets 88,960 hits.  Memcached only gets 15,062.  I can only hope that means most people already know about it and find the concepts easy enough to grasp that they don’t consider it worth writing about.  If only 1 in  50 of those writing about scaling know about memcached, its time more learned.

Go forward and scale out using memcached, but remember that its a tactical weapon.  You should also be thinking about how to take the next step, which is breaking down your DB architecture so it scales more easily using techniques like sharding, partitioning, and federated architectures.  This will let you apply multiple cores to the DB problem.  I’ll write more about that in a later post.

Submit to Digg | Submit to Del.icio.us | Submit to StumbleUpon

Exit mobile version