SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.


Degrees of Multi-Tenancy (Degrees of Green Crystals)

Posted by Bob Warfield on June 20, 2008

Phil Wainewright is hosting a great discussion on multi-tenancy that now spans two postings.  I encourage you to read through both if you have an interest in SaaS or multi-tenancy.  The discussions really underscore how much confusion there is around the term.  I wanted to make a couple of points to try to dispel some of the, um, cloudiness around this holiest-of-holy cloud computing/SaaS tenet.

First, you have to keep in mind that multi-tenancy is much more of a marketing event than a technology event.  Whoa!  That sort of thing will get me excommunicated from the SaaS Church of Benioff.  Well, I’m sorry, but it’s true.  Multi-tenancy is all about what we used to call “green crystals marketing” at Borland, a term I first heard from my friend (and then VP of Product Management) Rob Dickerson.

What is Green Crystals Marketing?  When you’re having a hard time differentiating, you find something unique and make it your green crystals.  They provide a reason to believe your offering is better even if they aren’t the whole reason or even most of the reason.  In those days, VROOM (Virtual Real Time Object Oriented Memory) was Borland’s Green Crystals.  It reached a hilarious level of success when Bill Gates was left sputtering at one user group presentation when a member of the audience suggested he needed to license VROOM from Borland if he ever expected Windows to run on the machines of the day.  VROOM was in fact a very sophisticated overlay and memory manager, and a neat piece of technology, but its marketing presence was far larger than its technology reality.  It was written by Istvan Cseri (now runs a big part of MSFT SQL Server) and other really bright people, so I don’t mean to take anything away from it.  It delivered real value, but not necessarily as much as the hype would imply.

BTW, it’s called green crystals marketing due to soap advertising.  Why is our soap better?  Because it has green crystals. 

Multi-tenancy was Marc Benioff’s Green Crystals for SaaS.  He had to differentiate his offering from the ASPs of the day, and we again see the ASP curse word applied to companies who do not sufficiently comply with the vision of multi-tenancy.

Now let’s move on to the technology and a little more hard edged view of the realities.  We’re going to leave the marketing aside.

Multi-tenancy is ultimately about cost, when we look at what it delivers to the business.  It is more cost-effective in two ways.  First, it reduces machine resource requirements: CPU, memory, and disk.  Second, it reduces operational costs (but it isn’t the silver bullet many have claimed).  Because its goal is to reduce the number of instances, and to align everyone’s schemas, it becomes cheaper and easier to manage, and fewer admins are required.  Let’s look at each in turn, and focus on different variants of “multi-tenancy”, including some that many may not regard as “pure” enough to be called multi-tenant.

On the machine resource side, one can look at the various components, cpu, memory, and disk, and reach some conclusions.  Let’s start out with putting a single customer on one or more machines.  This is one everyone agrees is not multi-tenant.  If you use this model, you’re not sharing any resources and every customer needs enough resources for their maximum usage.  It also means a lot more machines for administrators to touch in order to keep things humming along, do backups, do upgrades, and whatever else comes up.  Take this one as a baseline we can improve upon.

Next up is to employ virtualization.  It still looks like every customer has their own complete set of software, but we’re able to put multiple customers on a single configuration (could be multiple machines if we run clustering) and thereby share some resources.  Note that this sharing will largely be variable cost sharing.  Fixed costs will still mount up.  What do I mean by fixed versus variable?  Assume every customer gets a copy of MySQL or Oracle.  There is a fixed cost to bring up an empty MySQL or Oracle schema that is charged against every customer.  However, variable costs are easily shared among the various virtual instances that exist on a single configuration, so costs are lower.  Virtualization helps administrators a bit, given that there are fewer physical boxes to touch, but there are still a lot of instances to keep up with.

Okay, let’s jump to one of the “pure” multi-tenant models.  The classic one.  In this model, we have multi-tenancy right down to the tables.  Let’s say we have an “Accounts” table that lists companies in a CRM system.  Each row corresponds to a company.  There is a column that designates which tenant owns the row.  Software is carefully written so that the column is always checked and no tenant can see another tenant’s rows (you can see there is some potential for a mistake here though).  Efficiency is much greater because we eliminate the fixed cost overhead.  All the tenants that can run on a single instance of MySQL or Oracle get to share those fixed costs instead of being charged for them over and over again.  There really is just one schema for administrators to look after, so the model is a lot cheaper.
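For the curious, here is a minimal sketch of what the tenant column looks like in practice.  It uses Python’s built-in sqlite3 module purely as a stand-in for MySQL or Oracle, and the class name, the table layout, and the company names are all hypothetical.  The point is that tenants never write their own SQL; every query goes through a wrapper that always adds the tenant filter.

```python
import sqlite3

# One shared table for every tenant; tenant_id marks who owns each row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " tenant_id INTEGER NOT NULL,"
    " company TEXT NOT NULL)"
)


class TenantScopedAccounts:
    """Hypothetical wrapper: every statement is bound to one tenant_id."""

    def __init__(self, conn, tenant_id):
        self.conn = conn
        self.tenant_id = tenant_id

    def add(self, company):
        self.conn.execute(
            "INSERT INTO accounts (tenant_id, company) VALUES (?, ?)",
            (self.tenant_id, company),
        )

    def companies(self):
        # The WHERE clause is the whole isolation story in this model.
        rows = self.conn.execute(
            "SELECT company FROM accounts WHERE tenant_id = ? ORDER BY company",
            (self.tenant_id,),
        )
        return [row[0] for row in rows]


tenant_one = TenantScopedAccounts(conn, tenant_id=1)
tenant_two = TenantScopedAccounts(conn, tenant_id=2)
tenant_one.add("Initech")
tenant_one.add("Hooli")
tenant_two.add("Umbrella")

print(tenant_one.companies())
print(tenant_two.companies())
```

Forget the WHERE clause in just one query and a tenant sees someone else’s rows, which is exactly the potential for a mistake mentioned above.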

Is this the best possible model?  Perhaps.  It does have a drawback or two. For example, the cost of the column to identify the tenant is now being charged on every single row of every table.  Very likely it isn’t a big cost, but it is there.  Tables will get bigger too, as all the tenants are piled in.  Presumably this can lead to scaling issues sooner.  We can federate the tables by breaking them apart into sub-tables that still have groups of tenants.  Another important consideration is that if we ever needed to do reporting on data from multiple tenants, that’s pretty easy.   We may even use our notion of “tenant” to include the divisions or business units of a larger organization.
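Federating might look something like the sketch below.  This is only one way to do it: the modulo rule for assigning tenants to groups, the group count, and the table names are assumptions for illustration, and sqlite3 again stands in for a real database server.  Each sub-table still holds several tenants, so the tenant column stays.

```python
import sqlite3

NUM_GROUPS = 2  # hypothetical number of sub-tables

conn = sqlite3.connect(":memory:")
for group in range(NUM_GROUPS):
    conn.execute(
        f"CREATE TABLE accounts_g{group} ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id INTEGER NOT NULL,"
        " company TEXT NOT NULL)"
    )


def group_table(tenant_id):
    """Map a tenant to its sub-table (assumed rule: tenant_id modulo group count)."""
    return f"accounts_g{tenant_id % NUM_GROUPS}"


def add_account(tenant_id, company):
    conn.execute(
        f"INSERT INTO {group_table(tenant_id)} (tenant_id, company) VALUES (?, ?)",
        (tenant_id, company),
    )


def companies(tenant_id):
    # Tenants still share a sub-table, so the tenant filter is still required.
    rows = conn.execute(
        f"SELECT company FROM {group_table(tenant_id)}"
        " WHERE tenant_id = ? ORDER BY company",
        (tenant_id,),
    )
    return [row[0] for row in rows]


add_account(1, "Initech")
add_account(2, "Umbrella")
add_account(3, "Hooli")  # lands in the same sub-table as tenant 1
```

A real system would more likely keep an explicit tenant-to-group lookup so that a tenant can be moved without changing its id, but the routing idea is the same.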

One last model I want to mention:  multiple-schemas-on-a-server.  In this model, we don’t commingle tenants within a table.  Each tenant has their own set of tables.  Scaling is easy: we can just move the tables onto new servers.  There are some fixed costs to having more tables, but they’re often less than the cost of the extra column, and they’re way less than virtualization-style fixed costs because we are still stacking multiple tenants within a database.  This is actually a pretty powerful model.  It gives the ability to manage scaling pretty easily.  It is slightly harder to roll out changes because you roll them out to a bunch of tables rather than a single table, but that still is not too bad.   This model can also be done with less fundamental rearchitecting than a columnar multi-tenant model.
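Here is a rough sketch of the multi-table model, including what “roll them out to a bunch of tables” means.  SQLite has no per-tenant schemas inside a single in-memory database, so a table-name prefix stands in for a schema here; the tenant names, the helper functions, and the added “region” column are all made up for the example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def tenant_table(tenant):
    """Return the tenant's own Accounts table name (prefix stands in for a schema)."""
    # Table names cannot be bound as SQL parameters, so validate them strictly.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", tenant):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    return f"{tenant}_accounts"


def provision(tenant):
    conn.execute(
        f"CREATE TABLE {tenant_table(tenant)} ("
        " id INTEGER PRIMARY KEY,"
        " company TEXT NOT NULL)"
    )


def add_account(tenant, company):
    conn.execute(
        f"INSERT INTO {tenant_table(tenant)} (company) VALUES (?)", (company,)
    )


def companies(tenant):
    rows = conn.execute(
        f"SELECT company FROM {tenant_table(tenant)} ORDER BY company"
    )
    return [row[0] for row in rows]


def roll_out(tenants, ddl_template):
    """Apply one schema change to every tenant's table, one table at a time."""
    for tenant in tenants:
        conn.execute(ddl_template.format(table=tenant_table(tenant)))


tenants = ["acme", "globex"]
for tenant in tenants:
    provision(tenant)
add_account("acme", "Initech")
add_account("globex", "Umbrella")

# The same change has to be applied once per tenant.
roll_out(tenants, "ALTER TABLE {table} ADD COLUMN region TEXT")
```

Notice there is no tenant column and no WHERE clause to forget.  The price is the loop in roll_out, which in real life also has to cope with a change failing halfway through the list.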

This brings me to a definition for multi-tenancy that is the only one that makes much sense to me:

Multi-tenancy is software that makes it transparent to the third-party server that there is more than one tenant running there.

My database server has no idea whether I have 1, 20, or 200 tenants, or whether I run columnar or multi-table.  Hence I see it as multi-tenant.  Virtualization, which may be just fine economically, is not really multi-tenant because we’re just sharing the hardware, not the software.  I don’t see these two models in terms of “purity” or “degree” (Phil Wainewright has First, Second, and Lesser Degrees in his discussion) because I can show you advantages for either of these two over the other, but both of these have significant advantages over the other models I’ve seen.

So what’s cheaper?  The latter two models, either columnar or multi-table multi-tenancy, will be cheaper unless you run so few tenants per machine it doesn’t matter.  This is likely a function of the size of the deals you’re closing.  Salesforce averages 20-odd seats per deal, so they want to cram a lot of tenants onto a single schema.  Others may run large enough deals that virtualization is fine, and I have certainly talked to some such.
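To see why the tenant count matters, here is a back-of-the-envelope sketch.  The memory figures are invented round numbers, not measurements of MySQL or Oracle, and the model is deliberately crude: a fixed cost paid once per database instance plus a variable cost paid once per tenant.

```python
import math


def total_memory_mb(tenants, tenants_per_instance, fixed_mb, variable_mb):
    """Fixed overhead is paid per database instance, variable cost per tenant."""
    instances = math.ceil(tenants / tenants_per_instance)
    return instances * fixed_mb + tenants * variable_mb


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
FIXED_MB = 500     # memory to bring up one empty database instance
VARIABLE_MB = 50   # memory that scales with one tenant's data and usage

for tenants in (2, 200):
    # Virtualization: every tenant gets its own database instance.
    virtualized = total_memory_mb(tenants, 1, FIXED_MB, VARIABLE_MB)
    # Columnar or multi-table: up to 200 tenants share one instance.
    shared = total_memory_mb(tenants, 200, FIXED_MB, VARIABLE_MB)
    print(f"{tenants} tenants: virtualized={virtualized} MB, shared={shared} MB")
```

With these made-up numbers, two big tenants cost 1,100 MB virtualized versus 600 MB shared, which hardly matters.  Two hundred small tenants cost 110,000 MB versus 10,500 MB, which matters a lot.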

There’s just one problem with all this:  the machine resources, fixed and variable, are not the lion’s share of the cost to deliver a service.  It’s Operations headcount.  While these models do somewhat ameliorate those costs, they are not the final word.  The final word is relentless automation of operations.  Facebook manages to administer 1800 MySQL servers per DBA.  I would venture to say most SaaS vendors are nowhere close to that level of efficiency regardless of which model they run.  I certainly haven’t talked to anyone who was.  If you had sufficiently automated your operations, you could run any of the models I mentioned and still get relatively cheap costs.  This automation is the real driver of SaaS efficiency, but it isn’t sexy.  It isn’t green crystals, so nobody talks about it much.
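The headcount arithmetic is easy to sketch.  Only the 1800-servers-per-DBA ratio comes from the Facebook figure above; the fleet size and the 50-servers-per-DBA ratio are assumptions picked for illustration.

```python
import math


def dbas_needed(servers, servers_per_dba):
    """Round up: a fraction of a DBA still means hiring a whole one."""
    return math.ceil(servers / servers_per_dba)


FLEET = 3600  # hypothetical number of database servers

automated = dbas_needed(FLEET, 1800)  # ratio quoted for Facebook
manual = dbas_needed(FLEET, 50)       # assumed ratio for a lightly automated shop

print(f"automated: {automated} DBAs, manual: {manual} DBAs")
```

Under those assumptions the same fleet takes 2 DBAs or 72, a gap that no choice of tenancy model is going to close on its own.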
