SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘grid’ Category

Single Tenant, Multitenant, Private and Public Clouds: Oh My!

Posted by Bob Warfield on August 27, 2010

My head is starting to hurt with all the back and forth among my Enterprise Irregulars buddies about the relationships between the complex concepts of Multitenancy, Private, and Public Clouds.  A set of disjoint conversations and posts came together like the whirlpool in the bottom of a tub when it drains.  I was busy with other things and didn’t get a chance to really respond until I was well and truly sucked into the vortex.  Apologies for the long post, but so many wonderful cans of worms finally got opened that I just have to try to deal with a few of them.  That’s why I love these Irregulars!

To start, let me rehash some of the many memes that had me preparing to respond:

–  Josh Greenbaum’s assertion that Multitenancy is a Vendor, not a Customer Issue.  This post includes some choice observations like:

While the benefits that multi-tenancy can provide are manifold for the vendor, these rationales don’t hold water on the user side.

That is not to say that customers can’t benefit from multi-tenancy. They can, but the effects of multi-tenancy for users are side-benefits, subordinate to the vendors’ benefits. This means, IMO, that a customer that looks at multi-tenancy as a key criteria for acquiring a new piece of functionality is basing their decision on factors that are not directly relevant to their TCO, all other factors being equal.

and:

Multi-tenancy promises to age gracelessly as this market matures.

Not to mention:

Most of the main benefits of multi-tenancy – every customer is on the same version and is updated simultaneously, in particular – are vendor benefits that don’t intrinsically benefit customers directly.

The implication being that someone somewhere will provide an alternate technology very soon that works just as well as or better than multitenancy.  Wow.  Lots to disagree with there.  My ears are still ringing from the sound of the steel gauntlet that was thrown down.

–  Phil Wainewright took a little of the edge off my ire with his response post to Josh, “Single Tenancy, the DEC Rainbow of SaaS.”  Basically, Phil says that any would-be SaaS vendor trying to create an offering without multitenancy is doomed just as the DEC Rainbow was.  They have something that sort of walks and quacks like a SaaS offering but that can’t really deliver the goods.

–  Well of course Josh had to respond with a post that ends with:

I think the pricing and services pressure of the multi-tenant vendors will force single-tenant vendors to make their offerings as compatible as possible. But as long as they are compatible with the promises of multi-tenancy, they don’t need to actually be multi-tenant to compete in the market.

That’s kind of like saying, “I’m right so long as nothing happens to make me wrong.”  Where are the facts that show this counter case is anything beyond imagination?  Who has built a SaaS application that does not include multitenancy but that delivers all the benefits?

Meanwhile back at the ranch (we EI’s need a colorful name for our private community where the feathers really start to fly as we chew the bones of some good debates), still more fascinating points and counterpoints were being made as the topic of public vs private clouds came up (paraphrasing):

–  Is there any value in private clouds?

–  Do public clouds result in less lock-in than private clouds?

–  Are private clouds and single tenant (sic) SaaS apps just Old School vendors’ attempts to hang on while the New Era dawns?  Attempts that will ultimately prove terribly flawed?

–  Can the economics of private clouds ever compete with public?

–  BTW, eBay now uses Amazon for “burst” loads and purchases servers for a few hours at a time on their peak periods.  Cool!

–  Companies like Eucalyptus and Nimbula are trying to make Private Clouds that are completely fungible with Public Clouds.  If you believe in private cloud frameworks like these, you have to believe companies are going to be running / owning their own servers for a long time to come, even if the public cloud guys take over a number of compute workloads.  The Nimbula guys built EC2 and they’re no dummies, so if they believe in this, there must be something to it.

–  There are two kinds of clouds – real and virtual.  Real clouds are multi-tenant.  Virtual clouds are not.  Virtualization is an amazing technology, but it can’t compete with bottom-up multi-tenant platforms and apps.

Stop!  Let me off this merry-go-round and let’s talk.

What It Is and Why Multitenancy Matters

Sorry Josh, but Multitenancy isn’t marketing like Intel Inside (BTW, did you notice Intel wound up everywhere anyway?  That wasn’t marketing either), and it matters to more than just vendors.  Why?

Push aside all of the partisan definitions of multitenancy (all your customers go in the same table or not).   Let’s look at the fundamental difference between virtualization and multitenancy, since these two seem to be fighting it out.

Virtualization takes multiple copies of your entire software stack and lets them coexist on the same machine.  Whereas before you had one OS, one DB, and one copy of your app, now you may have 10 of each.  Each of the 10 may be a different version entirely.  Each may be a different customer entirely, as they share a machine.  For each of them, life is just like they had their own dedicated server.  Cool.  No wonder VMWare is so successful.  That’s a handy thing to do.

Multitenancy is a little different.  Instead of 10 copies of the OS, 10 copies of the DB, and 10 copies of the app, it has 1 OS, 1 DB, and 1 app on the server.  But, through judicious modifications to the app, it allows those 10 customers to all peacefully coexist within the app just as though they had it entirely to themselves.
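To make that difference concrete, here is a minimal sketch (my own illustration, not any particular vendor's code) of the kind of judicious modification involved: one schema, one app instance, and a tenant id scoping every query so tenants never see each other's rows.

```python
import sqlite3

# One OS, one DB, one app: tenants share a single schema, and every
# query is scoped by tenant_id so no tenant can see another's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (tenant_id TEXT, name TEXT, balance REAL)")

def add_account(tenant_id, name, balance):
    conn.execute("INSERT INTO accounts VALUES (?, ?, ?)",
                 (tenant_id, name, balance))

def accounts_for(tenant_id):
    # This WHERE clause is the heart of the multitenant modification.
    return conn.execute(
        "SELECT name, balance FROM accounts WHERE tenant_id = ?",
        (tenant_id,)).fetchall()

add_account("acme", "Ops", 100.0)
add_account("globex", "Ops", 250.0)
print(accounts_for("acme"))   # [('Ops', 100.0)] -- only Acme's row
```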

Can you see the pros and cons of each?  Let’s start with cost.  Every SaaS vendor that has multitenancy crows about this, because it’s true.  Don’t believe me?  Plug in your VM software, go install Oracle 10 times across 10 different virtual machines.  Now add up how much disk space that uses, how much RAM it uses when all 10 are running, and so on.  This is before you’ve put a single byte of information into Oracle or even started up an app.  Compare that to having installed 1 copy of Oracle on a machine, but not putting any data into it.  Dang!  That VM has used up a heck of a lot of resources before I even get started!

If you don’t think that the overhead of 10 copies of the stack has an impact on TCO, you either have in mind a very interesting application + customer combination (some do exist, and I have written about them), or you just don’t understand.  10x the hardware to handle the “before you put in data” requirements is not cheap.  Whatever overhead is involved in making all of that more cumbersome to automate is not cheap.  Heck, 10x more Oracle licenses is very not cheap.  I know SaaS companies who complain their single biggest ops cost is their Oracle licenses.
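Back-of-envelope, the overhead argument looks like this (the resource numbers are illustrative assumptions of mine, not measured figures):

```python
# Illustrative assumption: each full stack copy (OS + DB + app) idles
# at 4 GB of RAM and 20 GB of disk before any customer data goes in.
STACK_RAM_GB, STACK_DISK_GB, TENANTS = 4, 20, 10

vm_ram, vm_disk = TENANTS * STACK_RAM_GB, TENANTS * STACK_DISK_GB
mt_ram, mt_disk = STACK_RAM_GB, STACK_DISK_GB   # one shared stack

print(f"Virtualized: {vm_ram} GB RAM, {vm_disk} GB disk before any data")
print(f"Multitenant: {mt_ram} GB RAM, {mt_disk} GB disk before any data")
# ...and that is before you count 10x the database licenses.
```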

However, if all works well, that’s a fixed cost to have all those copies, and you can start adding data by customer to each virtual Oracle, and things will be okay from that point on.  But, take my word for it, there is no free lunch.  The VM world will be slower and less nimble at sharing resources between the different Virtual Machines than a Multitenant App can be.  The reason is that by the time it knows it even needs to share, it is too late.  Shifting things around to take resources from one VM and give them to another takes time.  By contrast, the Multitenant App knows what is going on inside the App because it is the App.  It can even anticipate needs (e.g. that customer is in the UK and they’re going to wake up x hours before my customers in the US, so I will put them on the same machine because they mostly use the machine at different times).
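As a toy illustration of that kind of anticipation (my own sketch, not anyone's actual scheduler), a multitenant app could pack together tenants whose peak hours don't overlap:

```python
# Toy heuristic: place tenants whose business-hour peaks (expressed as
# UTC offsets) are farthest apart on the same server, so they share the
# box but mostly use it at different times of day.
tenants = {"uk_corp": 0, "us_west": -8, "tokyo": 9, "ny_bank": -5}

def peak_distance(a, b):
    diff = abs(tenants[a] - tenants[b]) % 24
    return min(diff, 24 - diff)   # hours between the two usage peaks

pairs = sorted(
    ((a, b) for a in tenants for b in tenants if a < b),
    key=lambda pair: -peak_distance(*pair))
print(pairs[0])   # the most complementary pair: ('ny_bank', 'tokyo')
```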

So, no, there is not some magic technology that will make multitenant obsolete.  There may be some new marketing label on some technology that makes multitenancy automatic and implicit, but if it does what I describe, it is multitenant.  It will age gracefully for a long time to come despite the indignities that petty competition and marketing labels will bring to bear on it.

What’s the Relationship of Clouds and Multitenancy?

Must Real Clouds be Multitenant?

Sorry, but Real Clouds are not Multitenant, because they’re based on Virtualization, not Multitenancy in any sense like the one I just defined.  In fact, EC2 doesn’t share a core among multiple virtual machines because it can’t.  If one of the VM’s started sucking up all the cycles, the others would suffer terrible performance, and the hypervisors don’t really have a way to deal with that.  Imagine having to shut down one of the virtual machines and move it onto other hardware to load balance.  That’s not a simple or fast operation.  Multi-tasking operating systems expect a context switch to be as fast as possible, and that’s what we’re talking about.  That’s part of what I mean by the VM solution being less nimble.  So instead, cores get allocated to a particular VM.  That doesn’t mean a server can’t have multiple tenants, just that at the granularity of a core, things have to be kept clean and not dynamically moved around.

Note to rocket scientists and entrepreneurs out there–if you could create a new hardware architecture that was really fast at the Virtual Machine load balancing, you would have a winner.  So far, there is no good hardware architecture to facilitate a tenant swap inside a core at a seamless enough granularity to allow the sharing.  In the Multicore Era, this would be the Killer Architecture for Cloud Computing.  If you get all the right patents, you’ll be rich and Intel will be sad.  OTOH, if Intel and VMWare got their heads together and figured it out, it would be like ole Jack Burton said, “You can go off and rule the universe from beyond the grave.”

But, it isn’t quite so black and white.  While EC2 is not multitenant at the core level, it sort of is at the server level as we discussed.  And, services like S3 are multitenant through and through.  Should we cut them some slack?  In a word, “No.”  Even though an awful lot of the overall stack cost (network, cpu, and storage) is pretty well multitenant, I still wind up installing those 10 copies of Oracle and I still have the same economic disadvantage as the VM scenario.  Multitenancy is an Application characteristic, or at the very least, a deep platform characteristic.  If I build my app on Force.com, it is automatically multitenant.  If I build it on Amazon Web Services, it is not automatic.

But isn’t there Any Multitenant-like Advantage to the Cloud?  And how do Public and Private Compare?

Yes, there are tons of benefits to the Cloud, and through an understanding and definition of them, we will tease out the relationship of Public and Private Clouds.  Let me explain…

There are two primary advantages to the Cloud:  it is a Software Service and it is Elastic.  If you don’t have those advantages, you don’t have a Cloud.  Let’s drill down.

The Cloud is a Software Service, first and foremost.  I can spin up and control a server entirely through a set of API’s.  I never have to go into a Data Center cage.  I never have to ask someone at the Data Center to go into the Cage (though that would be a Service, just not a Software Service, an important distinction).  This is powerful for basically the same reasons that SaaS is powerful versus doing it yourself with On-prem software.  Think Cloud = SaaS and Data Center = On Prem and extrapolate and you’ll have it. 
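To give a flavor of spinning up a server entirely through APIs, here is a sketch using the Python boto library's EC2 interface from that era; the region, AMI id, and credentials are placeholders, not real values:

```python
import time
import boto.ec2

# Region, AMI id, and credentials below are placeholders.
conn = boto.ec2.connect_to_region(
    "us-east-1",
    aws_access_key_id="YOUR_KEY",
    aws_secret_access_key="YOUR_SECRET")

# "Push a button, get a server": no cage, no crates, no patch cables.
reservation = conn.run_instances("ami-12345678", instance_type="m1.small")
instance = reservation.instances[0]

while instance.update() != "running":   # poll until the instance is up
    time.sleep(5)
print(instance.id, instance.public_dns_name)

# And hand it back just as easily when you're done paying for it.
conn.terminate_instances([instance.id])
```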

Since Cloud is a standardized service, we expect all the same benefits as SaaS:

– They know their service better than I do since it is their whole business.  So I should expect they will run it better and more efficiently.

– Upgrades to that service are transparent and painless (try that on your own data center, buddy!).

– When one customer has a problem, the Service knows and often fixes it before the others even know it exists.  Yes Josh, there is value in SaaS running everyone on the same release.  I surveyed Tech Support managers one time and asked them one simple question:  How many open problems in your trouble ticketing system are fixed in the current release?  The answers were astounding–40 to 80%.  Imagine a world where your customers see 40 to 80% fewer problems.  It’s a good thing!

– That service has economic buying power that you don’t have because it is aggregated across many customers.  They can get better deals on their hardware and order so much of it that the world will build it precisely to their specs.  They can get stuff you can’t, and they can invest in R&D you can’t.  Again, because it is aggregated across many customers.  A Startup running in the Amazon Cloud can have multiple redundant data centers on multiple continents.  Most SaaS companies don’t get to building multiple data centers until they are way past having gone public.

–  Because it is a Software Service, you can invest your Ops time in automation, rather than in crawling around Data Center cages.  You don’t need to hire anyone who knows how to hot swap a disk or take a backup.  You need peeps who know how to write automation scripts.  Those scripts are a leverageable asset that will permanently lower your costs in a dramatic way.  You have reallocated your costs from basic Data Center grubbing around (where does this patch cable go, Bruce?), an expense, to actually building an asset.

The list goes on.

The second benefit is Elasticity.  It’s another form of aggregation benefit.  They have spare capacity because everyone doesn’t use all the hardware all the time.  Whatever percentage isn’t utilized, it is still a large amount of hardware, because it is aggregated.  It’s more than you can afford to have sitting around idle in your own data center.  Because of that, they don’t have to sell it to you in perpetuity.  You can rent it as you need it, just like eBay does for bursting.  There are tons of new operational strategies that are suddenly available to you by taking advantage of Elasticity.

Let me give you just one.  For SaaS companies, it is really easy to do Beta Tests.  You don’t have to buy 2x the hardware in perpetuity.  You just need to rent it for the duration of the Beta Test and every single customer can access their instance with their data to their heart’s content.  Trust me, they will like that.
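The arithmetic is easy to sanity-check.  Here is an illustrative comparison using placeholder prices of my own, not any provider's actual rate card:

```python
# Owning vs renting 20 extra servers for a two-week Beta Test.
# Prices are placeholder assumptions, not any provider's actual rates.
SERVERS, HOURS = 20, 14 * 24
BUY_PER_SERVER = 3000.00        # capex per box that sits idle afterward
RENT_PER_SERVER_HOUR = 0.10     # on-demand hourly rate

own = SERVERS * BUY_PER_SERVER
rent = SERVERS * HOURS * RENT_PER_SERVER_HOUR
print(f"Buy and keep:      ${own:,.0f}")    # $60,000
print(f"Rent for the Beta: ${rent:,.0f}")   # $672
```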

What about Public Versus Private Clouds?

Hang on, we’re almost there, and it seems like it has been a worthwhile journey.

Start with, “What’s a Private Cloud?”  Let’s take all the technology of a Public Cloud (heck, the Nimbula guys built EC2, so they know how to do this), and create a Private Cloud.  The Private Cloud is one restricted to a single customer.  It’d be kind of like taking a copy of Salesforce.com’s software, and installing it at Citibank for their private use.  Multitenant with only one tenant.  Do you hear the sound of one hand clapping yet?  Yep, it hurts my head too, just thinking about it.  But we must.

Pawing through the various advantages we’ve discussed for the Cloud, there are still some that accrue to a Cloud of One Customer:

–  It is still a Software Service that we can control via API’s, so we can invest in Ops Automation.  In a sense, you can spin up a new Virtual Data Center (I like that term better than Private Cloud, because it’s closer to the truth) on 10 minutes’ notice.  No waiting for servers to be shipped.  No uncrating and testing.  No shoving into racks and connecting cables.  Push a button, get a Data Center.

–  You get the buying power advantages of the Cloud Vendor if they supply your Private Cloud, though not if you buy software and build your Private Cloud.  Hmmm, wonder what terminology is needed to make that distinction?  Forrester says it’s either a Private Cloud (company owns their own Cloud) or a Hosted Virtual Private Cloud.  Cumbersome.

But, and this is a huge one, the granularity is huge, and there is way less Elasticity.  Sure, you can spin up a Data Center, but depending on its size, it’s a much bigger on/off switch.  You likely will have to commit to buy more capacity for a longer time at a bigger price in order for the Cloud Provider to recoup giving you so much more control.  They have to clear other customers away from a larger security zone before you can occupy it, instead of letting your VM’s commingle with other VM’s on the same box.  You may lose the more multitenant-like advantages of the storage cluster and the network infrastructure (remember, only EC2 was stuck being pure virtual). 

What Does it All Mean, and What Should My Company Do?

Did you see Forrester’s conclusion that most companies are not yet ready to embrace the Cloud and won’t be for a long time?

I love the way Big Organizations think about things (not!).  Since their goal is preservation of wealth and status, it’s all about risk mitigation whether that is risk to the org or to the individual career.  A common strategy is to take some revolutionary thing (like SaaS, Multitenancy, or the Cloud), and break it down into costs and benefits.  Further, there needs to be a phased modular approach that over time, captures all the benefits with as little cost as possible.  And each phase has to have a defined completion so we can stop, evaluate whether we succeeded, celebrate the success, punish those who didn’t play the politics well enough, check in with stakeholders, and sing that Big Company Round of Kumbaya.  Yay!

In this case, we have a 5 year plan for CIO’s.  Do you remember anything else, maybe from the Cold War, that used to work on 5 year plans?  Never mind.

It asserts that before you are ready for the Cloud, you have to cross some of those modular hurdles:

A company will need a standardized operating procedure, fully-automated deployment and management (to avoid human error) and self-service access for developers. It will also need each of its business divisions – finance, HR, engineering, etc – to be sharing the same infrastructure.  In fact, there are four evolutionary stages that it takes to get there, starting with an acclimation stage where users are getting used to and comfortable with online apps, working to convince leaders of the various business divisions to be guinea pigs. Beyond that, there’s the rollout itself and then the optimization to fine-tune it.

Holy CYA, Batman!  Do you think eBay spent 5 years figuring out whether it could benefit from bursting to the Cloud before it just did it?

There’s a part of me that says if your IT org is so behind the times it needs 5 years just to understand it all, then you should quit doing anything on-premise and get it all into the hands of SaaS vendors.  They’re already so far beyond you that they must have a huge advantage.  There is another part that says, “Gee guys, you don’t have to be able to build an automobile factory as good as Toyota to be able to drive a car.”

But then sanity and Political Correctness prevail, I come back down to Earth, and I realize we are ready to summarize.  There are 4 levels of Cloud Maturity (Hey, I know the Big Co IT Guys are feeling more comfortable already, they can deal with a Capability and Maturity Model, right?):

Level 1:  Dabbling.  You are using some Virtualization or Cloud technology a little bit at your org in order to learn.  You now know what a Machine Image is, and you have at least seen a server that can run them and swapped a few in and out so that you experience the pleasures of doing amazing things without crawling around the Data Center Cage.

Level 2:  Private Cloud.  You were impressed enough by Level 1 that you want the benefits of Cloud Technology for as much of your operation as you can as fast as you can get it.  But, you are not yet ready to relinquish much of any control.  For Early Level 2, you may very well insist on a Private Cloud you own entirely.  Later stage Level 2 and you will seek a Hosted Virtual Private Cloud.

Level 3:  Public Cloud.  This has been cool, but you are ready to embrace Elasticity.  You tripped into it with a little bit of Bursting like eBay, but you are gradually realizing that the latency between your Data Center and the Cloud is really painful.  To fix that, you went to a Hosted Virtual Private Cloud.  Now that your data is in that Cloud and Bursting works well, you are realizing that the data is already stepping outside your Private Cloud pretty often anyway.  And you’ve had to come to terms with it.  So why not go the rest of the way and pick up some Elasticity?

Level 4:  SaaS Multitenant.  Eventually, you conclude that you’re still micromanaging your software too much and it isn’t adding any value unique to your organization.  Plus, most of the software you can buy and run in your Public Cloud world is pretty darned antiquated anyway.  It hasn’t been rearchitected since the late 80’s and early 90’s.  Not really.  What would an app look like if it was built from the ground up to live in the Cloud, to connect Customers the way the Internet has been going, to be Social, to do all the rest?  Welcome to SaaS Multitenant.  Now you can finally get completely out of Software Operations and start delivering value.

BTW, you don’t have to take the levels one at a time.  It will cost you a lot more and be a lot more painful if you do.  That’s my problem with the Forrester analysis.  Pick the level that is as far along as you can possibly stomach, add one to that, and go.  Ironically, not only is it cheaper to go directly to the end game, but each level is cheaper for you on a wide scale usage basis all by itself.  In other words, it’s cheaper for you to do Public Cloud than Private Cloud.  And it’s WAY cheaper to go Public Cloud than to try Private Cloud for a time and then go Public Cloud.  Switching to a SaaS Multitenant app is cheaper still.

Welcome to the crazy world of learning how to work and play well together while efficiently sharing your computing resources with friends and strangers!

Posted in amazon, cloud, data center, ec2, enterprise software, grid, multicore, platforms, saas, service | 15 Comments »

What’s Hadoop Good For?

Posted by Bob Warfield on June 24, 2010

Hadoop, for those who haven’t heard of it, is an Open Source version of Google’s MapReduce distributed computing framework.  After reading that Adobe has agreed to Open Source their Puppet modules for managing Hadoop, I got curious about what Adobe might be doing with it.  It didn’t take long on Google to find a cool Wiki page showing what a whole bunch of companies use Hadoop for.

I went in thinking (actually without too much thinking, LOL) that Hadoop implied some sort of search engine work.  I knew it was more versatile, but just hadn’t thought about it.  A quick read of the Wiki shows all sorts of companies using it, and it seems like one of the most common applications is log analysis.  The other quasi-surprising thing is that it often seems to be used with far fewer nodes than I would have thought.  After all, it is a massively parallel algorithm.  However, it is apparently also pretty handy for 10-15 node problems.  Hence much smaller organizations and problems are benefiting.
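For a flavor of the log-analysis use case, here is a minimal Hadoop Streaming job of my own devising (the Apache-style log format and file names are assumptions) that counts HTTP status codes across a pile of logs:

```python
#!/usr/bin/env python3
# mapper.py -- emit "status<TAB>1" for each access log line.
# (Field position 8 assumes the Apache common/combined log format.)
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 8:
        print("%s\t1" % fields[8])
```

```python
#!/usr/bin/env python3
# reducer.py -- sum the counts for each status code.
# Hadoop Streaming delivers mapper output sorted by key.
import sys

current, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current:
        if current is not None:
            print("%s\t%s" % (current, count))
        current, count = key, 0
    count += int(value)
if current is not None:
    print("%s\t%s" % (current, count))
```

You would run it with the streaming jar (the jar path varies by distribution): `hadoop jar hadoop-streaming.jar -input logs/ -output status-counts -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py`.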

My conclusion, if any, is that it must be a really handy toolkit for throwing together analysis of all sorts of things that take a little grid computing (that term is probably no longer popular) in an elastic Cloud world.

Cool beans!  I love the idea of scaling up a quick Hadoop run to crank out a report of some kind and then scaling the servers back down so you don’t have to pay for them.  Makes sense.

Posted in cloud, grid | Leave a Comment »

Google Reports iPhone Usage 50x Other Handsets; Amazon S3 Goes Down: Low Friction Has a Cost

Posted by Bob Warfield on February 15, 2008

As I write this post there are two articles that caught my eye.  For most, the iPhone and Amazon’s Web Services have little to do with one another, but I see a bit of a pattern here that’s interesting.

Slash Lane of Apple Insider reports that Google was shocked that it was seeing 50 times more search requests coming from Apple iPhones than any other mobile handset — a revelation so astonishing that the company originally suspected it had made an error culling its own data.  It’s an amazing statistic, really.  But I can attest to hitting Google quite a lot myself whenever I’m out and about and killing time before the next meeting.  In fact, I am very pleased to have my bookmarks out on a web page rather than in my browser so I can easily access all of my favorite sites from whatever device is at hand.  The iPhone is quite a credible web browser.  I can’t wait for the 3G version and higher speeds.

Following closely on my read of the iPhone piece is Nick Carr’s article about an Amazon S3 outage.  Nothing all that earth-shattering or unexpected, just that S3 was out for several hours this morning, beginning at 7:30am EST.  The gist of the article is that while the outage was to be expected, Amazon did a poor job keeping users informed of what was going on and providing explanations after the fact.  Carr is right, of course, but businesses are always embarrassed when things go wrong, and the first (and wrong) human instinct is to be shy about details.

Why do these two go together?  I’ll give you a hint:  the tales of Facebook applications reaching millions of users in an incredibly short time also go with the theme I’m thinking of.  That theme has to do with friction.  Friction is my word for all the factors that slow adoption.  The time needed for word of mouth, decision-making, purchase, installation, getting through the learning curve, and finally being a first class citizen of whatever community results is governed by the degree of friction.

One of the things the Internet does is reduce friction.  In its most extreme, friction actually reverses and becomes a propelling force.  We call that viral marketing.  Most of the innovations in this second Internet round (post-bubble) have been focused on reducing friction.  Social Networks, for example, dramatically reduce the friction of networking.  Twitter dramatically reduces the friction of blogging, right down to limiting the article length to 140 characters so you don’t have to labor over the wordsmithing.

While it’s harder, the web is also a powerful means of reducing friction for more physical things.  The iPhone and Amazon Web Services are two great examples.  In an extremely short time the iPhone has racked up 50x the usage of other competing handsets for the Internet.  The traffic to AWS in approximately the same short time now exceeds the combined traffic for all other Amazon properties.

While the web itself helped to spread the word, I think it is no coincidence that these two have a lot to do with the web and offer a lot of value back to the web.  It’s what some folks call a virtuous circle.  Look for more of these as time goes on.

Now, about that cost side.  These growth rates are not predictable.  Nobody would have guessed that either business would get so big so fast.  In fact, many guessed just the opposite.  Even if you did guess it could happen, it would only be a guess that it could, not that it would.  A prudent business would not invest in infrastructure built to the level and assumption that it would happen.  That means there will be painful outages from time to time.  Hopefully, the infrastructure owners will take those outages as signs that it’s time to double down and extend their projections of what might happen much further up and to the right.  Those that succeed in keeping hold of the Tiger by the Tail will survive and prosper.

Posted in amazon, data center, grid, Marketing, multicore, Web 2.0 | 5 Comments »

When Do The SaaS Acquisition Games Begin? (A Primer on Cloud Computing Market Segments)

Posted by Bob Warfield on February 12, 2008

The Yahoo/Microsoft business has turned to utter farce.  Michael Arrington’s line left me in stitches:

Wait. Yahoo and AOL? I Was Looking Forward To Something More…Fierce.

Mathew Ingram calls it “desperation squared.”  We have now moved from the factual to the sublime: a sure signal to Yahoo that they need to get on with being acquired.  When most of the world is laughing at you, and you are a huge company, it means you’ve lost it.  You’re way past the point of no return.  But this is not why we’re here, for the Giants are thinking of dipping into another branch of the Cloud Computing Tree.

Tom Foremski says that Oracle recently approached Salesforce.com to gauge their interest in a possible $75/share offer.  Duncan Riley at Techcrunch finds the rumor plausible, as do I.  I won’t spend a lot more time on this particular scenario.  It will be a question of Oracle’s resolve to buy versus Salesforce’s resolve to remain independent.  But I will say this.  Oracle typically spends 7-8x maintenance revenue to buy companies.  If the rumor is true, they’re offering 13x trailing twelve months total revenue for Salesforce.  It just goes to show the awesome financial power of a good SaaS business.  It’s likely worth that much.  After all, if Oracle is ever going to get started on the road to SaaS (yes, I know, they have a SaaS business already, yada, yada, but not really), starting from a seed as close to $1B a year as possible would help accelerate things.  That’s a real problem, BTW: there just aren’t all that many SaaS properties out there yet for acquirers to choose from.  The space isn’t very far along, and is still very young.

And yet there are machinations going on as various players try to position themselves for the coming battles.  Some of these maneuvers are visible, some are just off the edge where the light is pretty dim.  It’s important to segment the Cloud Computing and SaaS market to gain a better understanding of the terrain.  We’ll leave aside the Web 2.0 world of Facebook et al, though the infrastructure at the bottom of the market segmentation model I present is the same for the Consumer/Web 2.0 world.  Markets tend to consolidate from the bottom of the technology stack up.  The reason is that the bottom layers have been around a lot longer, there are more big players, and momentum there has often slowed.  These are sure signs that a consolidation is in order.  It’s important to know where you are in the stack because it equates to where you are in the M&A food chain.  Consequently, VC’s often try to evaluate how near the bottom an idea is versus how late in the day it’s getting.  Being too low in the stack when the market is very mature is usually a bad thing.  Being high up early is oddly almost never a bad thing.  The very top of the stack is apps, and it takes apps to propel the other layers forward.

All things considered, if you have a killer idea for an app, that’s where you should place your bets.  That would be another reason for Oracle to pay a premium for Salesforce.  The other thing to keep in mind is that the line of safety keeps moving upward.  The snapshot I’ll portray today has that line hovering at the Value Added Hoster level.  It won’t be long before it moves up a notch to encompass the Virtualizers.

The Battle for SaaS Hosting and Platform Dominance

At the very bottom of the SaaS stack are the hosters and platform builders.  There are several armies on the battlefield jockeying already.  There are roughly three market segments:

[Figure: SaaS hosting market segments, showing the Red Zone (Cage and Pipe hosters), the Yellow Zone (Value Added Hosters), and the Green Zone (Virtualizers)]

First are the old-school hosters that basically offer raw machines and Internet connectivity: “A Cage and a Pipe.”  These guys are very long in the tooth for the current Cloud Computing era.  The trouble is they are experts on the physical plant but don’t add much value otherwise, and their expertise is now heavily commoditized.  If they don’t learn to offer more value soon, their days are numbered, hence they’re in the “red” zone.

Next up are the value added hosters.  Start with a Cage and a Pipe and add Some Service.  Perhaps that’s as simple as providing system administrators and DBA’s.  Service can become more elaborate.  This group is currently a very popular choice for SaaS startups I talk to.  Very few of these companies are considering the Red Zone.  But the Value Added Hosters need to move upstream as fast as they can, lest they start to go red too.  The services they offer are not hard for the Cage and Pipe crowd to bring on.  There is so far minimal proprietary technology adding value.  Aside from the problem that others can add services, it creates a secondary problem that the cost to deliver the service is higher.  We’ve talked before about how much more efficient SaaS players have to be than conventional users of Enterprise Software.  The Yellow Zone is borderline in that respect.

It shouldn’t be surprising, therefore, when we read things like OpSource’s acquisition of billing company LeCayla.  It gives them technology and a new service to inch them closer to the Green Zone.

This brings us to the Green Zone, which I have dubbed “The Virtualizers.”  Virtualization is their chief technology differentiator, although there is often a whole lot more.  These players want to bring on as many generic components as they can to complete a full Platform as a Service offering.  This is the most interesting and vigorous space, and I predict it represents the future.  If the Red and Yellow Zones can’t find a way to get there, they’ll find themselves increasingly commoditized and marginalized, making their segments very tough businesses indeed.  The Green Zone brings a number of essential advantages, although every player doesn’t offer every advantage. 

One of the big advantages is true On-demand computing.  With Amazon and many others you can buy servers by the hour as needed to deal with load spikes of various kinds.  This leads to a tremendous savings for most organizations, and makes it possible for startups to pay the big bucks only if they’re successful and have the big bucks.  It’s a radical reduction in friction, and that almost always leads to radical growth.  So it is here.  Amazon recently reported more web traffic going to Amazon Web Services than the rest of Amazon’s properties combined.

Companies like 3Tera (check out my 3Tera interview posts) and Q-Layer offer such virtualized data centers in the form of software.  Buy their software and you can create a virtual datacenter.  Or you can buy the hosting as well from these companies and their partners.  They’re very important players because they represent the means by which the Red and Yellow Zones can become Green.

Sun deserves special mention after their purchase of MySQL.  If I were being completely objective, Sun is still very much in the Yellow Zone.  I’m giving Sun and Jonathan Schwartz the benefit of the doubt in terms of where they’re going.  They do offer Sun Grid, and they certainly have the wherewithal.  Whether the organization can really pull together and get it done remains to be seen, but MySQL is a very promising new jewel in that crown.

SaaS Tools

The level above the platform consists of Tools.  First thing to note about this category is that “Tool” is a dirty word among the VC’s and other money mongering intelligentsia.  The story goes that nobody ever got rich on tools, the world now expects tools to be given away, yada, yada.  BTW, I disagree with that sentiment.  There have been lots of very successful tools companies.  I think the real issue is that it’s hard for the Money Men to evaluate tools.  Everyone promises to be able to turn a noobie programmer into a powerhouse of productivity that can single handedly reproduce SAP’s entire suite over the weekend.  Unless you are extremely technical and immersed continuously in the world of Tools, it’s very hard to separate the hype from the reality and the religion from the irrelevant.  Nevertheless, this is a real category, and there’s actually a lot going on here. 

I break this market into three segments:

[Figure: SaaS tool market segments, showing Systems Software, Languages, and Enterprise Tools]

At the bottom, just above the Virtualizers from the prior diagram, we have Systems Software, which I’m classifying here as Databases and App Servers.  Normally we would include operating systems, but they’re spread all around and largely play in the Virtualizer category.  In other words, what’s interesting about Operating Systems vis a vis SaaS and Cloud Computing is virtualization features.  This area is dangerously close to the Platforms where most of the Giants are.  Sun has already set a big foot down here with MySQL.  Amazon is trying to change the game entirely with SimpleDB.  There are some players, such as Elastra, that are trying to skate between Amazon and the rest of the world by offering MySQL on Amazon.  My take is that such plays need to get big really fast or diversify into other services because the window here has to be closing.  There is already so much traffic on Amazon, and so many folks using MySQL there, that it seems likely a single solution will emerge and Amazon is in a good position to dictate what that will be.  I can hear Amazon on the phone call now:

Really, you don’t want to sell to us?  Well, we’re going to deliver your product on AWS in about 6 months and it will be the preferred solution for the platform.

Or that call could be to a MySQL competitor.  There are several, and some say products like PostgreSQL are better for various reasons such as scalability.  What would it mean to Sun if Amazon acquired one and built it into their fabric?  What does it mean to others lower in the stack if all the good DB’s get bought and incorporated into the fabric of Giants?  Definite strategic maneuvering possibilities here.

Next up are the Languages.  Since the dawn of computing, there have been Language Wars.  A lot of this is about separating the religion from the irrelevant, BTW.  Nevertheless, we have the new school of scripting languages circling the castle of traditional curly braced languages like Java and C++ (not that the new guys are bereft of curly braces!).  Their battering rams are pummeling the iron doors of performance ceaselessly with the promise of productivity.  Chief among these are PHP, Python, and Ruby on Rails.  There are successes and failures to point to for all of them.  PHP is largely what powers Yahoo and many older web properties.  Python, while Open Source, seems to be the one championed by Google.  After all, they got Guido.  Ruby on Rails is one that I find interesting, because it doesn’t yet have a big power partner.  It’s Open Source, but without the partner, it remains something of a Free Spirit.  Perhaps that makes it an ideal nucleus for an upstart wanting to take on the Cloud Computing Giants.  Heroku would be one such possibility.  I’ve seen a demo, and it surely did seem pretty cool.  The Ruby brand is still strong, and could propel the right offering far.  Zend is working hard to have a go at PHP as well.  BTW, I would put Force squarely in the language category.  Yes, it is all of the layers below too, but there is a rich set of functionality that adds language and framework, not to mention you must use their proprietary language.

I can’t move on from Languages without mentioning Salesforce’s Force either.  They view it as a Platform-as-a-Service, but it offers so much more than something like Amazon (so far at least) that it deserves a spot higher in the stack.  Force includes a language that is Java-like, but proprietary to Salesforce.  Most developers these days have a problem with proprietary.  They prefer Open Source.  But that’s not even the real Achilles Heel.  Force is currently priced way too high to be practical for ISV’s.  As I’ve discussed many times, your Cost of Service needs to be as far below 50% as you can get.  With Force starting out at $50 a seat month, customers must charge $100-200 a seat month to achieve reasonable margins.  That’s largely not possible for ISV’s, so Force is mostly an IT phenomenon.  That makes it less strategic, but perhaps a better cash cow for Salesforce.
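The margin arithmetic behind that claim is simple enough to check, using the post's own numbers:

```python
# Cost of Service as a fraction of revenue with Force at $50/seat/month.
platform_cost = 50.0
for price in (100.0, 200.0):
    print(f"${price:.0f}/seat/month -> Cost of Service = "
          f"{platform_cost / price:.0%}")
# $100 -> 50% (right at the danger line); $200 -> 25% (healthier)
```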

What’s this Enterprise Tools category?

Enterprise IT is used to having a rich ecosystem that fills in the gaps.  When you think about it, purchasing the software application is just a small piece of the overall organism that is created when that app goes into production.  There are many products bolstering and augmenting the application’s functionality.  Don’t like the reporting provided out of the box?  Plug in a Business Intelligence Tool.  Need to integrate the application with other applications without writing too much custom code?  There’s everything from ETL tools a la Informatica to shift data between tables to complex messaging systems from companies like Tibco.  Need help managing logon information and implementing single sign on (SSO)?  There’s LDAP, Active Directory, and a ton of other products out there.

Almost all of that is gone with Cloud Computing.  As someone quipped, “It isn’t that the data is in THE cloud, it just isn’t in MY data center anymore.”  And in fact, THE cloud is really many clouds: one for each data center of each provider you’re doing business with.  Even more interesting, a lot of the Old School providers of this stuff have technology that isn’t really relevant to the Cloud Computing Era, and many of them have been bought so they can be milked.  Witness all the BI vendors that have been absorbed.  Their time of innovation is done.

That’s actually great news.  The SaaS Enterprise Tools category is the lowest true Green Field opportunity in this model.  Nobody owns it.  The Giants are mostly absent.  And there are even surprisingly few startups about.  Perhaps it just doesn’t seem sexy enough, but there are real problems here that need solving.  I had lunch the other day with Mike Hoskins of Pervasive.  Among many other areas, they do a good business out of software that pumps data out of Salesforce and into your local data center so you can apply your BI tools to it.  I’ve interviewed Ken Rudin of LucidEra for this blog.  They provide BI solutions in the SaaS model, largely based on data from Salesforce again.  Another great example is EMC’s recent acquisition of SaaS backup vendor Mozy.

These are good opportunities in this segment.  There are customers with real pain and minimal competition so far.  The Giants are ill-positioned to jump in because of the disruptive business model that is SaaS.  I would expect to see a lot more action here before it’s over, but there is a very interesting move that just took place that seems to have largely been ignored.  Workday, Duffield’s Peoplesoft Version Two, has just acquired SOA integration tool vendor Cape Clear.  I think this is really an interesting move.  Yes, I’m sure they needed to be able to easily integrate a lot of systems outside Workday to sell their application, but I wonder if there is more going on here?  For example, at some point, I expect to see fine grain network effects emerge from the topology of the clouds.  These will be a function of the need to shift data between applications to integrate them.  There’s a real speeds and feeds issue there that has to be addressed.  It will be advantageous to run your software in the same cloud as what it integrates with.   This will favor really big clouds like Amazon’s.  I could also see it triggering partnerships bolstered by high speed dedicated links between data centers.  One example is Joyent’s dedicated link to the Facebook data center, which gives them a real advantage hosting Facebook applets.

Is Workday trying to lock in a part of that future integration pie?  Not clear, but there sure isn’t much else beyond Cape Clear in the space right now and Workday’s application is the kind that wants to be the system of record nexus for everything else.  Dana Gardner discusses how increasingly, it is the Service and not the Software that drives acquisitions like this.  After the merger, you won’t be able to buy Cape Clear except as a Service (now dubbed “Integration as a Service”).  Given that it was a very high quality offering, Cape Clear gives Workday an interesting and valuable differentiator, if nothing else.  One of the big puzzles of SaaS is how to get the more complex domains installed much more cheaply than conventional Enterprise Software.  Integrating with a bunch of Legacy systems can make that really hard unless you have a toolset like Cape Clear to simplify the job.  To the extent the tool is bought to integrate other SaaS vendors, it can serve as valuable lead generation to go sell the primary Workday Suite into Enterprises that clearly have SaaS underway.  All in all, I would rate this as a canny and highly strategic move that Workday has made.

SaaS Enterprise and Desktop Applications

This brings us finally to the topmost slices of the layer cake, applications.  I include here both desktop and Enterprise applications, so it’s everything from spreadsheets and word processors in the cloud to Salesforce.com.  That’s a lot of ground to cover, and it has barely been penetrated.  There are numerous application categories for which there are not yet any SaaS offerings, and many of the offerings that are available are still in their early days.  Most of the application companies I talk to are seeing unbridled demand.  It seems likely that for early markets there are enough customers out there in the SaaS early adopter crowd that you can go pretty far just because your offering is SaaS, assuming it works, of course.

What’s Strategic and Who’s Being Left Out?

First, there is an overall megatrend at work here, and that is the move from proprietary to open.  Companies will over time be less and less inclined to run datacenters.  Giant Cloud Centers like Amazon Web Services will be the new black and the New Open for that world.  That Openness will drive throughout the stack in an expanding wavefront, because Open wants to connect to Open.  That makes All Things Open strategic in this Cloud Computing Era.

Second, let’s talk briefly about acquisition strategy.  If your goal is to acquire SaaS market share and scale, there isn’t much available.  Salesforce is the largest pure SaaS vendor and they’re still under a billion in annual revenues, although they’re closing in on it.  That means acquisitions at this stage in the market should be more focused on capturing Strategic Choke Points than cubic dollars.

Let’s review potential choke points:

– Hosting and Platforms:  Look at the 3Tera and Q-Layer offerings as a means of supercharging data centers into the Cloud Era.  There are probably other players I’ve missed, but these guys give a flavor.  Be aware that virtualization is all the rage.  I personally have met 2 different Entrepreneurs in Residence at major Silicon Valley VC’s in just the last month who are focused on virtualization.  There’s a lot of attention here, and we can’t forget VMWare, nor the fact that the OS makers all want to build it into the OS.  The nice thing about something like a 3Tera is that it’s a lot more than just virtualization.  The real answer is to recast virtualization as a solution, and thereby move up the stack.  Simply Continuous, for example, offers a Disaster Recovery solution based on virtualization.  Those EIRs I mention are also interested in solutions more than generic virtualization.

– Systems Software:  Sun’s purchase of MySQL signalled that consolidation has begun here.  We’re going to see the clash of the Relational DB’s versus the new era SimpleDB-style systems.  I have to expect that all the action over at Amazon will flush others out of the woodwork some time this year, especially Microsoft and Google.  The former may be overly preoccupied with Yahoo and therefore delayed.  As for App Servers, look for Dark Horses specific to the new languages.  Someone who does something really great for Cloud Computing may have a leg up, but I’m not sure how long it will last.  If you want to hang out in this layer, be focused outside the limelight.  LucidEra took over an open source column store DB and focused it around SaaS BI needs.  That’s safely out of the line of fire between MySQL and the SimpleDB’s of the world.  In fact, there are likely more opportunities in the BI-specific space.  Certainly BI was very late in maturing for conventional On-premises software.  I wonder if someone will build a Teradata equivalent in the Cloud, for example?

– Languages:  This is as low in the stack as I’d want to be innovating unless I had a serious niche picked out.  The world seems to be clamoring for new languages at the moment, so maybe there’s a good shot here.  And so far, nobody is very far along at packaging any of the new languages so they’re easy to use for Cloud Computing.  Stay away from the crowded niche of proprietary “non-programmer” languages.  These are the Bungees, Cogheads, and the like.  They’re really more like dBase or Access in the Cloud than they are Languages in the Cloud.  If one of these players can really hijack a major language and get a big enough lead, it will be interesting.  It’s very hard though, with Open Source.  It levels the playing field unless you’re very careful about how you add value.

– Enterprise Tools:  Huge opportunity here.  There is no compelling generic BI offering for SaaS.  Workday just bought arguably the best SOA offering in Cape Clear.  Yet many application domains require these tools and a whole lot more.  If you are a startup looking to be acquired, think about what services your company could add to the Amazon umbrella.  What are the things that would spread like wildfire among the couple hundred thousand developers who have accounts on Amazon?  Build your solution so it scales well and takes advantage of Amazon’s pricing for communication within their cloud and you could go far.  One thing I think is glaringly apparent and needed, for example, is an OpenID service for Amazon.  There are many many others.  Deconstruct the current On-premises IT ecosystem and see what makes sense for SaaS.

–  Applications:  If you want a strategic choke point, you want to own a system of record.  They’re the blue chip properties in the Enterprise Suites.  That’s because everything else gets its data from some system of record or another.  Let’s not be totally focused on the past though.  The Cloud is ideal for collaboration.  How can you combine a system of record domain with serious collaboration to build a new killer category?  Worth thinking about.

I think I’ve provided a decent framework for thinking about the SaaS world in terms of where the action is, what makes sense for M&A, and where the opportunities may be.  If there’s one thing I’m certain of, it’s that we’re early days on Cloud Computing and there is a lot more opportunity out there than I’ve portrayed in this brief article.  There will also be a lot more change, and market segmentation could be viewed along many more dimensions than the one I’ve portrayed here. 

Food for thought.

Related Articles

Just noticed Cote refers to the folks at the bottom of my stack as the “Morlocks”.  Remember the nasty troglodytes from H.G. Wells’ The Time Machine?  I don’t think the Morlocks are all that likely to eat the “Blond People” who are apparently the SaaS applications, but stranger things have happened!

I just watched the Google App Engine announcement.  It places them at the language level, which is a big leap up the stack I’ve drawn in this post.  It really raises the stakes for those playing at the lower levels!  See my post for more.

Posted in amazon, data center, grid, saas, strategy, Web 2.0 | 13 Comments »

IBM Trying to Keep Up With the Cloud Joneses

Posted by Bob Warfield on January 3, 2008

Can you tell that the whole cloud computing thing is ratcheting up a few notches in intensity?  I blame Amazon, who’ve rolled out a ton of initiatives and gotten lots of traction among startups.  But we ain’t seen nothin’ yet, friends.

Already there are signs that others are feeling like the train is leaving the station.  One of the more interesting is that IBM is bringing on CouchDB’s Damien Katz to work on the project full time.  It seems to me that IBM is making this move to ensure that they have an answer for Amazon’s SimpleDB in the form of CouchDB.  Thanks to Patrick Logan for pointing this out in his own blog post.

We’re going to see this pace continue to accelerate, and we’re going to see those who want to be players jockeying to make sure that they have all the elements in their Cloud Platform Suite.  It’s still too early to tell what the exact combination of ingredients for success will be, but so far it looks like Amazon is the head chef when we see others trying to emulate what they’ve done.

Meanwhile this is fantastic news for developers and startups that want to embrace these technologies.  The danger in things like Amazon’s Web Services is that they are so unique that you become utterly dependent on them.  The more others offer the same sort of services, the more competition can work its magic and make the whole scene more vibrant, cheaper, and innovative.

Viva les Cloud Computing!

Posted in amazon, data center, grid, platforms, saas, strategy | 1 Comment »

Amazon Raises the Cloud Platform Bar Again With DevPay

Posted by Bob Warfield on January 1, 2008

Wow, what an exciting time to be watching the Amazon Cloud Platform evolve.  We’re just beginning to think through the recent SimpleDB announcement when Amazon launches DevPay.  LucidEra CEO Ken Rudin says land grabs are all about a race to the top of the mountain to plant your flag there first.  It seems like Amazon has hired a helicopter in the quest to get there first.  Google, Yahoo, and others are barely talking about their cloud platforms and here is Amazon with new developments piling up on each other.  And unlike some of the developments announced by companies like Google, this stuff is ready to go.  They’re not just talking about it.

What’s DevPay all about, anyway?  Simply put, Amazon is providing a service to automate your billing.  If you use their web services to offer a service of your own, it gives you the ability to let Amazon deal with billing for you.  It’s based on the pricing model for the rest of the Amazon Web Services like EC2 and S3, but you can use any combination of one-time charges, recurring monthly charges, and metered Amazon Web Service usage.  You have total flexibility to price your applications either higher or lower than your AWS usage.  In addition, they’re promising to put everything they know about how to do e-commerce (and who knows more than Amazon?) behind making the user experience great for your customers and you.
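Mechanically, the billing model reads like this; here is a sketch of the arithmetic as I understand the announcement, with placeholder rates rather than Amazon's actual prices:

```python
# A DevPay-style bill: one-time charge + recurring monthly charge
# + a markup over metered AWS usage. All rates here are placeholders.
def monthly_bill(first_month, ec2_hours, s3_gb,
                 one_time=49.00, monthly=9.95,
                 ec2_rate=0.10, s3_rate=0.15, markup=1.5):
    aws_usage = ec2_hours * ec2_rate + s3_gb * s3_rate  # what Amazon meters
    bill = monthly + aws_usage * markup                 # what your customer pays
    if first_month:
        bill += one_time
    return bill, aws_usage

customer_pays, aws_cost = monthly_bill(True, ec2_hours=200, s3_gb=50)
print(f"Customer pays ${customer_pays:.2f}; your AWS bill is ${aws_cost:.2f}")
```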

It’s not a tremendously big step forward, but it’s useful.  It’s another brick in the wall.  There are companies out there providing SaaS infrastructure for whom billing is a big piece of their offering, so obviously it is a problem that people care about having solved.  What are the pros and cons of this particular approach?

Let’s start with the pros.  If you are going to use Amazon Web Services anyway, DevPay makes the process dead simple for you to get paid for your service.  It’s ideal for microISV’s as a way to monetize their creations.  The potential is there for interesting revenue that’s tied to usage in the classic SaaS way.

What about the cons?  Here there are many, depending on what sort of business you are in and how you want to be perceived by customers.  I break it down into two major concerns: flexibility and branding.  Let’s start with branding, which I think is the more important concern.  It’s not clear to me from the announcement how you would go about disassociating your offering from Amazon so that it becomes your stand alone brand.  You and your customers are going to have to acknowledge and accept that the offering you provide is part of the Amazon collective.  Resistance is futile.  This is the moral equivalent of not being able to accept a credit card directly, and instead having to refer customers to PayPal.  It works, but it detracts from your “big time” image.  If having a big time stand-alone image is important for you, DevPay is a non-starter at this stage.  It’s not clear to me that Amazon would have to keep it that way for all time, but perhaps they need to protect their own image as well, and would insist on it.

The second major problem is flexibility.  Yes, Amazon says you can “use any combination of one-time charges, recurring monthly charges, and metered Amazon Web Service usage”.  That sounds flexible, but it casts your business in light of what resources it consumes on Amazon.  Suppose you want a completely different metric?  Perhaps you have some other expense, not well correlated with your Amazon usage, that has to be built in, for example.  Perhaps you need to do something completely arbitrary.  It doesn’t look to me like Amazon can facilitate that at present.

Both of these limitations are things Amazon could choose to clean up.  So far, the impression one gets is that Amazon is just putting a pretty face on the considerable internal resources they’ve developed for their primary business and making them available.  What will be interesting is to see what happens when (and if) Amazon is prepared to add value in ways that never mattered to their core business.  Meanwhile, they’re doing a great job stealing a march on potential competition.  As a SaaS business, they should be quite sticky.  Anyone who writes for their platform will have a fair amount of work to back out and try another platform.  DevPay is another example.  It will create network lock-in by tying your customers’ business relationship in terms of billing and payment to Amazon, and in turn tying that to your use of Amazon Web Services.  For example, that same lack of flexibility might prevent you from migrating your S3 or EC2 usage to, say, Google.  There doesn’t look to be a way for you to build the Google costs into your billing in a flexible way.

We’ll see the next 5 to 10 years be a rich period of innovation and transition to Cloud Computing Platforms.  Just as many of the original PC OS platforms disappeared (CP/M anyone?) after an initial flurry of activity, and others have changed radically in importance (it no longer matters whether you run PC or Mac, does it?), so too will there be dramatic changes here.  The beneficiaries will be users as well as the platform vendors, but it’s going to take nimbleness and prescient thinking to place all your bets exactly right.  The good news is the cost of making a mistake is far less than it had been in the era of building your own datacenters!

Related Articles

To Rule the Clouds Takes Software: Why Amazon’s SimpleDB is a Huge Next Step

Coté’s Excellent Description of the Microsoft Web Rift

Posted in amazon, data center, ec2, grid, saas, strategy | 5 Comments »

Eventual Consistency Is Not That Scary

Posted by Bob Warfield on December 22, 2007

Amazon’s new SimpleDB offering, like many other post-modern databases such as CouchDB, offers massive scaling potential if users will accept eventual consistency.  It feels like a weighty decision.  Cast in the worst possible light, eventual consistency means the database will sometimes return the wrong answer in the interests of allowing it to keep scaling.  Gasp!  What good is a database that returns the wrong answer?  Why bother? 

Often waiting for the write answer (sorry, that inadvertent slip makes for a good pun so I’ll leave it in place) returns a different kind of wrong answer.  Specifically, it may not return an answer at all.  The system may simply appear to hang.

How does all this come about?  Largely, it’s a function of how fast changes in the database can be propagated to the point where they’re available to everyone reading from the database.  For small numbers of users (i.e. we’re not scaling at all), this is easy.  There is one copy of the data sitting in a table structure, we lock out the readers so they can’t access it whenever we change that data, and everyone always gets the right answer.  Of course, solving simple problems is always easy.  It’s solving the hard problems that lands us the big bucks.  So how do we scale that out?  When we reach a point where we are delivering that information from that one single place as fast as it can be delivered, we have no choice but to make more places to deliver from.  There are many different mechanisms for replicating the data and making it all look like one big happy (but sometimes inconsistent) database.  Let’s look at them.

Once again, this problem may be simpler when cast in a certain way.  The most common and easiest approach is to keep one single structure as the source of truth for writing, and then replicate out changes to many other databases for reading.  All the common database software supports this.  If your single database could handle 100 users consistently, imagine that each of those 100 users were another database you were replicating to: suddenly you could handle 100 * 100 users, or 10,000 users.  Now we’re scaling.  There are schemes to replicate the replicas and so on and so forth.  Note that in this scenario, all writing must still be done on the one single database.  This is okay, because for many problems, perhaps even the majority, readers far outnumber writers.  In fact, this works so well that we may not even use databases for the replication.  Instead, we might consider a vast in-memory cache.  Software such as memcached does this for us quite nicely, with another order of magnitude performance boost, since reading things in memory is dramatically faster than trying to read from disk.
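To make the read-path idea concrete, here’s a minimal read-through cache sketch.  The db object and its calls are stand-ins, not any particular client library:

    # Read-through cache in the memcached spirit: reads come from memory when
    # possible; all writes still go to the single source of truth.
    cache = {}

    def read(key, db):
        if key in cache:
            return cache[key]      # fast path: no disk, no load on the master
        value = db.query(key)      # miss: fall through to the one true copy
        cache[key] = value         # later readers hit the fast path
        return value

    def write(key, value, db):
        db.update(key, value)      # writing still happens in one place
        cache.pop(key, None)       # invalidate so readers refetch

The window between the write landing and every cache (or replica) catching up is exactly the consistency gap the rest of this post is about.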

Okay, that’s pretty cool, but is it consistent?  That will depend on how fast you can replicate the data.  If you can get every database and cache in the system up to date between consecutive read requests, you are sure to be consistent.  In fact, it just has to get done between read requests for any piece of data that changed, which is a much lower bar to hurdle.  If consistency is critical, the system may be designed to inhibit reading until changes have propagated.  It takes some very clever algorithms to do this well without throwing a spanner into the works and bringing the system to its knees performance-wise.

Still, we can get pretty far.  Suppose your database can service 100 users with reads and writes and keep it all consistent with appropriate performance.  Let’s say we replace those 100 users with 100 copies of your database to get up to 10,000 users.  It’s now going to take twice as long.  During the first half, we’re copying changes from the Mother Server to all of the children.  During the second half, we’re serving the answers to the readers requesting them.  Let’s say we can keep the overall time the same just by halving how many are served.  So the Mother Server talks to 50 children.  Now we can scale to 50 * 50 = 2,500 users.  Not nearly as good, but still much better than not scaling at all.  We can go 3 layers deep and have Mother serve 33 children, each serving 33 grandchildren, to get to 33 * 33 * 33 = 35,937 users.  Not bad, but Google’s founders can still sleep soundly at night.  The reality is we can probably handle a lot more than 100 on our Mother Server.  Perhaps she’s good for 1,000.  Now the 3-layered scheme will get us all the way to 333 * 333 * 333 = about 37 million.  That starts to wake up the sound sleepers, or perhaps makes them restless.  Yet that also means we’re using over 100,000 servers: 1 Mother Server talks to 333 children who each have 333 grandchildren.  It’s a pretty wasteful scheme.
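Here’s that replication-tree arithmetic as a quick sketch, using the same toy assumptions as the paragraph above (this is not a capacity-planning formula):

    # Each layer of the tree eats a share of a server's capacity, so with L
    # layers the per-server fan-out is capacity // L; the leaves serve users.
    def replication_tree(capacity, layers):
        fanout = capacity // layers                        # 1000 // 3 = 333
        users = fanout ** layers                           # leaves each serve 'fanout' users
        servers = sum(fanout ** l for l in range(layers))  # 1 + 333 + 333**2
        return users, servers

    print(replication_tree(100, 2))    # (2500, 51)         -- the 50 * 50 case
    print(replication_tree(1000, 3))   # (36926037, 111223) -- ~37M users, >100K servers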

Well, let’s bring in Eventual Consistency to reduce the waste.  Assume you are a startup CEO.  You are having a great day, because you are reading the wonderful review of your service in Techcrunch.  It seems like the IPO will be just around the corner after all that gushing does its inevitable work and millions suddenly find their way to your site.  Just at the peak of your bliss, the CTO walks in and says she has good news and bad news.  The bad news is the site is crashing and angry emails are pouring in.  The other bad news is that to fix it “right”, so that the data stays consistent, she needs your immediate approval to purchase 999 servers so she can set up a replicated scheme that runs 1 Mother Server (which you already own) and 999 children.  No way, you say.  What’s the good news?  With a sly smile, she tells you that if you’re willing to tolerate a little eventual consistency, your site could get by on a lot fewer servers than 999.

Suppose you are willing to have it take twice as long as normal for data to be up to date.  The readers will read just as fast; it’s just that if they’re reading something that changed, it won’t be correct until the second consecutive read or page refresh.  Our old model had a system able to handle 1,000 users, replicated to 999 servers to handle 1 million users, and it had to go to 3 tiers (333 * 333 * 333) to reach the next level, about 37 million, while serving everything consistently and just as fast.  If we relax the “just as fast”, we can let our Mother Server feed 2,000 children at half the speed to get to 2,000 * 1,000 = 2 million users on two tiers with about 2,000 servers, instead of the 100,000-plus servers it took to reach 37 million.  If we run 4x slower on writes, we can get 4,000 * 1,000 = 4 million users with about 4,000 servers.  Eventually things will bog down and thrash, but you can see how tolerating Eventual Consistency can radically reduce your machine requirements in this simple architecture.  BTW, we all run into Eventual Consistency all the time on the web, whether or not we know it.  I use Google Reader to read blogs and WordPress to write this blog.  Any time a page refresh shows you a different result when you didn’t change anything, you may be looking at Eventual Consistency.  Google Reader still comes along frequently, says an error occurred, and asks me to refresh, even when I haven’t changed anything.  It’s telling me it relied on Eventual Consistency and I have an inconsistent result.

As I mentioned, these approaches can still be wasteful of servers because of all the data copies that are flowing around.  This leads us to wonder, “What’s the next alternative?”  Instead of just using servers to copy data to other servers, which is a prime source of the waste, we could try to employ what’s called a sharded or federated architecture.  In this approach, there is only one copy of each piece of data, but we’re dividing up that data so that each server is only responsible for a small subset of it.  Let’s say we have a database keeping up with the inventory for a big shopping site.  It’s really important for it to be consistent so that when people buy, they know the item was in stock.  Hey, it’s a contrived example and we know we can cheat on it, but go with it.  Let’s further suppose we have 100,000 SKUs, or different kinds of items in our inventory.  We can divide this across 100 servers by letting each server be responsible for 1,000 items.  Then we write some code that acts as the go-between with the servers.  It simply checks the query to see what you are looking for, and sends your query to the correct sub-server.  Voila, you have a sharded architecture that scales very efficiently (see the sketch below).  Our replicated model would blow out 99 copies from the 1 server, and it could be about 50 times faster on reads (or handle 50x the users, using my gross one-half estimate for replication time), but it was no faster at all on writes.  That wouldn’t work for our inventory problem, because writes are so common during the Christmas shopping season.
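Here is a minimal version of that go-between.  The shards list and its .query call are stand-ins, not a real client library:

    # 100,000 SKUs split across 100 servers, 1,000 items per shard.
    NUM_SHARDS, ITEMS_PER_SHARD = 100, 1000

    def shard_for(sku_id):
        return sku_id // ITEMS_PER_SHARD      # SKU 57,342 lives on shard 57

    def get_inventory(sku_id, shards):
        """shards: one connection per shard; each query touches exactly one."""
        return shards[shard_for(sku_id)].query(sku_id)

Notice that this naive id-range split is exactly what makes the load balancing problem below hard: if the popular SKUs cluster in one range, one shard melts while the rest idle.  Real systems put a directory or hash between the id and the shard so data can be rebalanced, and rebalancing is where Eventual Consistency sneaks back in.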

Now what are the pitfalls of sharding?  First, there is some assembly required.  Actually, there is a lot of assembly required.  It’s complicated to build such architectures.  Second, it may be very hard to load balance the shards.  Just dividing up the product inventory across 100 servers is not necessarily helpful.  You would want to use a knowledge of access patterns to divide the products so the load on each server is about the same.  If all the popular products wound up on one server, you’d have a scaling disaster.  These balances can change over time and have to be updated, which brings more complexity.  Some say you never stop fiddling with the tuning of a sharded architecture, but at least we don’t have Eventual Consistency.  Hmmm, or do we?  If you can ever get into a situation where there is more than one copy of the data and the one you are accessing is not up to date, Eventual Consistency could rear up as a design choice made by the DB owners.  In that case, they just give you the wrong answer and move on.

How can this happen in the sharded world?  It’s all about that load balancing.  Suppose our load balancer needs to move some data to a different shard.  Suppose the startup just bought 10 more servers and wants to create 10 additional shards.  While that data is in motion, there are still users on the site.  What do we tell them?  Sometimes companies can shut down the service to keep everything consistent while changes are made.  Certainly that is one answer, but it may annoy your users greatly.  Another answer is to tolerate Eventual Consistency while things are in motion, with a promise of a return to full consistency when the shards are done rebalancing.  Here is a case where the Eventual Consistency didn’t last all that long, so maybe that’s better than the case where it happens a lot.

Note that consistency is often in the eye of the beholder.  If we’re talking Internet users, ask yourself how much harm there would be if a page refresh delivered a different result.  In many applications, the user may even expect or welcome a different result.  An email program that suddenly shows mail after a refresh is not at all unexpected.  That the user didn’t know the mail was already on the server at the time of the first refresh doesn’t really hurt them.  There are cases where absolute consistency is very important.  Go back to the sharded database example.  It is normal to expect every single product in the inventory to have a unique id that lets us find that product.  Those ids have to be unique and consistent across all of the shards.  It is crucially important that any id changes are up to date before anything else is done, or the system can get really corrupted.  So, we may create a mechanism to generate consistent ids across shards.  This adds still more architectural complexity.
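As an illustration of one such mechanism (my sketch, not something any particular system prescribes), ids can be made unique by construction, so generating one never requires talking to another shard:

    # Embed the owning shard in the id itself: no cross-shard coordination is
    # needed at generation time, and the owner is recoverable from the id.
    ROW_SPACE = 10_000_000                 # more rows than any shard will hold

    def make_id(shard_no, local_counter):
        return shard_no * ROW_SPACE + local_counter

    def owner(global_id):
        return global_id // ROW_SPACE      # route by id with no lookup table

    print(make_id(42, 1017), owner(make_id(42, 1017)))   # 420001017 42

The alternative, a central ticket server handing out ids, works too, but it restores a single point of coordination, which is the very thing sharding was trying to remove.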

There are nightmare scenarios where it becomes impossible to shard efficiently.  I will oversimplify to make it easy, and not necessarily correct, but I hope you will get the idea.  Suppose you’re dealing with operations that affect many different objects.  The objects divide into shards naturally when examined individually, but the operations between the objects span many shards.  Perhaps the relationships between shards are incompatible to the extent that there is no way to distribute them across machines such that every single operation doesn’t hit many shards instead of a single shard.  Hitting many shards will invalidate the sharding approach.  In times like this, we will again be tempted to opt for Eventual Consistency.  We’ll get to hitting all the shards in our own sweet time, and any accesses before that update is finished will just live with inconsistent results.  Such scenarios can arise where there is no obvious good sharding algorithm, or where the relationships between the objects (perhaps it’s some sort of real-time collaborative application where people are bouncing around touching objects unpredictably) are changing much too quickly to rebalance the shards.  One really common case of an operation hitting many shards is queries.  You can’t anticipate all queries such that any of them can be processed within a single shard, unless you sharply limit the expressiveness of the query tools and languages.

I hope you come away from this discussion with some new insights:

–  Inconsistency derives from having multiple copies of the data that are not all in sync.

–  We need multiple copies to scale.  This is easiest for reads.  Scaling writes is much harder.

–  We can keep copies consistent at the expense of slowing everything down to wait for consistency.  The savings in relaxing this can be quite large.

–  We can somewhat balance that expense with increasingly complex architecture.  Sharding is more efficient than replication, but gets very complex and can still break down, for example. 

–  It’s still cheaper to allow for Eventual Consistency, and in many applications, the user experience is just as good.

Big web sites realized all this long ago.  That’s why sites like Amazon have systems like SimpleDB and Dynamo that are built from the ground up with Eventual Consistency in mind.  You need to look very carefully at your application to know what’s good or bad, and also understand what the performance envelope is for the Eventual Consistency.  Here are some thoughts from the blogosphere:

Dare Obasanjo

The documentation for the PutAttributes method has the following note

Because Amazon SimpleDB makes multiple copies of your data and uses an eventual consistency update model, an immediate GetAttributes or Query request (read) immediately after a DeleteAttributes or PutAttributes request (write) might not return the updated data.

This may or may not be a problem depending on your application. It may be OK for a del.icio.us style application if it took a few minutes before your tag updates were applied to a bookmark but the same can’t be said for an application like Twitter. What would be useful for developers would be if Amazon gave some more information around the delayed propagation such as average latency during peak and off-peak hours.

Here I think Dare’s example of Twitter suffering from Eventual Consistency is interesting.  In Twitter, we follow micro-blog postings.  What would be the impact of Eventual Consistency?  Of course it depends on the exact nature of the consistency, but let’s look at our replicated reader approach.  Recall that in the Eventual Consistency version, we simply tolerate that reads come in so fast that some of the replicated read servers are not up to date.  However, they are up to date with respect to a certain point in time, just not necessarily the present.  In other words, I could read at 10:00 am and get results on one server that are up to date through 10:00 am, and on another, results only up to date through 9:59 am.  For Twitter, depending on which server my session is connected to, my feeds may update a little behind the times.  Is that the end of the world?  For Twitter users engaged in a real-time conversation, it means the person with the delayed feed may write something that looks out of sequence to the person with the up-to-date feed whenever the two are in a back-and-forth chat.  OTOH, if Twitter degraded to that mode rather than taking longer and longer to accept input or do updates, wouldn’t that be better?

Erik Onnen

Onnen wrote a post called “Socializing Eventual Consistency” that makes two important points.  First, many developers are not used to talking about Eventual Consistency.  The knee-jerk reaction is that it’s bad, not the right thing, or an unnecessary compromise for anyone but a huge player like Amazon.  It’s almost like a macho thing.  Onnen lacked the right examples and vocabulary to engage his peers when it was time to decide about it.  Hopefully all the chatter about Amazon’s SimpleDB and other massively scalable sites will get more familiarity flowing around these concepts.  I hope this article also makes it easier.

His other point is that when push comes to shove, most business users will prefer availability over consistency.  I think that is a key point.  It’s also a big takeaway from the next blog:

Werner Vogels

Amazon’s CTO posted to try to make Eventual Consistency and its tradeoffs clearer for all.  He lays a lot of good theoretical groundwork that boils down to explaining that there are tradeoffs and you can’t have it all.  This is similar to the message I’ve tried to convey above.  Eventually, you have to keep multiple copies of the data to scale.  Once that happens, it becomes harder and harder to maintain consistency and still scale.  Vogels provides a full taxonomy of concepts (i.e. Monotonic Write Consistency et al) with which to think about all this and evaluate the tradeoffs.  He also does a good job pointing out how often even conventional RDBMSs wind up dealing with inconsistency.  Some of the best (and least obvious to many) examples include the idea that your mechanism for backups is often not fully consistent.  The right answer for many systems is to require that writes always work, but that reads are only eventually consistent.

Conclusion

I’ve covered a lot of consistency-related tradeoffs involved in database systems for large web architectures.  Rest assured that unless you are pretty unsuccessful, you will have to deal with this stuff.  Get ahead of the curve and understand what the consistency requirements will be for your application.  Do not start out being unnecessarily consistent.  That’s a premature optimization that can bite you in many ways.  Relaxing consistency as much as possible while still delivering a good user experience can lead to radically better scaling as well as making your life simpler.  Eventual Consistency is nothing to be afraid of.  Rather, it’s a key concept and tactic to be aware of.

Personally, I would seriously look into solutions like Amazon’s SimpleDB while I was at it.

Posted in amazon, data center, enterprise software, grid, platforms, soa, software development | 6 Comments »

What if Twitter Was Built on Amazon’s Cloud?

Posted by Bob Warfield on December 18, 2007

There was recent bellyaching in the blogosphere again about Twitter being down.  Dave Winer grumbles, “What other basic form of communication goes down for 12 hours at a time?”  There are various comments, and in the end, apparently it was about their moving ISPs.  Twitter themselves had this to say:

Twitter is humming along now after a late night. Our team worked earnestly into the night and morning on our largest and most complex maintenance project ever. Everything went pretty much according to plan except for one thing: an incorrect switch.

The switch in question caps traffic at an unacceptable level. In order to correct this, we’ll need to get some hardware installed. Unfortunately, that means we’re not done with our datacenter move just yet. This type of work can be frustrating but it’s all towards Twitter’s highest goal: reliability.

Such moves are never easy; they always include a hitch of some kind, and the Twitter customer base is hopelessly addicted to the medium, so Twitter hears about it whenever they turn the thing off for any period of time.  I look at this and for me it’s just one more reason I wouldn’t want to own a datacenter.

Suppose your service, or maybe even Twitter, was built on Amazon’s Cloud or some other Utility Computing solution.  You don’t own the servers; you are renting them.  If loads go up, you can simply rent more in direct proportion to the loads and on 10 minutes’ notice.  A recent High Scalability article on scaling Twitter shows they don’t really have all that many servers:

  • 1 MySQL Server (one big 8 core box) and 1 slave. Slave is read only for statistics and reporting.
  • 8 Sun X4100s.
10 boxes, in other words.  Now it comes time to upgrade.  Much pain and frustration.  To do it well, and without interruption, they really need 2 complete copies of their infrastructure.  This way, they can prepare the new version and start cutting users over to it while leaving the old one running.  When everyone is over, the old system can be decommissioned.  For many startups, owning twice as much hardware as they use is just out of the question.  The more successful they become, the more expensive it becomes to entertain such a luxury.  Not so on a utility computing service like Amazon’s.  Purchase the use of twice as many servers for just how long it takes for a successful upgrade and then cut them loose afterward.

There are detractors to the Amazon approach out there, but do we really think it would make Twitter much less reliable?  What if it made it much more reliable?

Here’s another thought that runs rampant: how well would Amazon’s new SimpleDB work for a service like Twitter?  It seems tailor-made.  Certainly the notion of a “texty” database with up to 1024 characters per field seems like a fit.  It would be fascinating to see some of the Twitterati put up a Twitter clone on Amazon’s Web Services using SimpleDB just to see how well it works and how quickly it could be put together.  Given the platform and the requirements of the application, it seems like it would not be that hard to do the experiment.  It would certainly make for an interesting test of how well Amazon’s infrastructure really works.

Posted in data center, ec2, grid, platforms, Web 2.0 | 2 Comments »

To Rule the Clouds Takes Software: Why Amazon SimpleDB is a Huge Next Step

Posted by Bob Warfield on December 15, 2007

    One Ring to rule them all, One Ring to find them,
    One Ring to bring them all and in the darkness bind them…

    J. R. R. Tolkien

There is much interesting cloud-related news in the blogosphere.  Various pundits are sharing a back and forth on the potential for cloud centralization to result in just a very few datacenters and what that might mean.  The really big news is Amazon’s fascinating new addition to their cloud platform of SimpleDB.  Let’s talk about what it all means.

Sun’s CTO, Greg Papadopoulos, has been predicting that the earth’s compute resources will resolve into about “five hyperscale, pan-global broadband computing services giants” — with Sun, in its version of this future scenario, the primary supplier of hardware and operations software to those giants.  The last was channeled via Phil Wainewright, who goes on to ask, “What is it about a computing grid that’s inherently ‘more centralized’ in nature?”  He feels that Nick Carr has missed the mark and swallowed Sun’s line hook, line, and sinker.  For his part, Carr’s only crime was to seize on a good story, because at the same dinner, another Sun executive, Subodh Bapat, was telling Carr that sometime soon a major datacenter failure would have “major national effects.”  The irony is positively juicy, with Sun talking out both sides of their proverbial mouths.

The tradeoff that Carr and Wainewright are worried about is one of economies of scale that favor centralization versus flexibility and resiliency that favor decentralization.  Where they differ is that Carr sees economies of scale winning in a world where IT matters less and less, and Wainewright favors the superior architectural possibilities of decentralization.  Is datacenter centralization inexorable?  In a word, yes, but it may not boil down to just 5 data center owners, and it may take quite a while for the forces at work to finish this evolution.  The factors that determine who the eventual winners will be are also quite interesting, and have the potential to change a lot of landscapes that today are relatively isolated.  Let’s consider what the forces of centralization are.

First, there is a huge migration of software underway to the cloud.  In other words, software that is never installed on your machine or in your company’s datacenter.  It resides in the cloud and comes to you via the browser.  Examples include SaaS on the business side and the vast armada of consumer Web 2.0 products such as Facebook.  No category is safe from this trend, not even traditional bastions like office productivity, as should be clear from the growing crop of cloud-based Microsoft Office competitors.

Second, this migration leads to centralization.  The mere act of building around a cloud architecture, even if it is a private cloud in your own company’s datacenter, leads to centralization.  After all, software is moving off your desktop and into that datacenter.  When many companies are aggregated into a single datacenter, into a SaaS multi-tenant architecture, for example, further centralization occurs.  When you offer a ubiquitous service to the masses, as is the case with something like Google, the requirements to deliver that can lead to some of the largest datacenter operations in the land.

Third, there are the aforementioned economies of scale.  Google has grown so large that it now builds its own special-purpose switches and servers to enable it to grow more cheaply.  The big web empires are all built on the notion of scaling out rather than scaling up, and they run on commodity hardware.  Because they have so many servers, automating their care and feeding has been baked into their DNA.  Not so with most corporate datacenters that are just beginning to see the fruits of crude generic technologies like virtualization that seek to be all things to all people.  Virtualization is a great next step for them, but there are bigger steps ahead yet that will further reduce costs.

Fourth, the ultimate irony is that centralization begets centralization through network effects.  This is the story of the big consumer web properties.  Every person that joins a social network adds more value to the network than the prior person did.  The value of the network grows exponentially.  This connectedness is facilitated most easily in today’s world by centralization.  Vendors that start to get traction increase their network effects in various ways: Amazon charges to bring data in and out of their cloud, but not to transfer between services within the cloud.

Lastly, there are green considerations at work.  The biggest costs associated with datacenters these days are around electricity and cooling.  Microsoft is building a data center in Siberia, which is both cold and pretty central to Asia.  Consider this: given the speed of light over a fiber connection, what is the cost of latency in having a data center somewhere far north (and cold) in Canada like Winnipeg versus far south (and hot) like Austin, Texas?  It’s 1349 miles, which, as the photon travels (186,000 miles per second), is about 7.2 milliseconds.  The world’s fastest hard drive, the nifty Mtron solid state disks I’m now coveting thanks to Engadget and Kevin Burton, can only write a paltry 80K or so bytes in that time: not even enough for one photo at decent resolution.  So consider a ring of datacenter clusters built in colder regions.  Centralized computing is up north where the cold that computers like is nearly free for the asking: just open a window many days.  Or come closer.  Put it up on a mountain peak.  Put it near a hydro dam and get the juice cheaper too.  It doesn’t matter.  Laying fiber is pretty cheap compared to paying the energy bills.
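Here is that back-of-the-envelope spelled out (the fiber correction at the end is my addition, not from the paragraph above):

    # Winnipeg to Austin, as the photon flies.
    miles = 1349
    c_miles_per_sec = 186_000               # light in a vacuum
    print(1000 * miles / c_miles_per_sec)   # ~7.25 ms one way; light in fiber
                                            # runs at roughly 2/3 c, so the real
                                            # figure is closer to 11 ms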

The next question is trickier: how do these clouds compete?  Eventually, they will become commoditized, and they will compete on price, but we are a long way from that point.  At least 10 years or more.  Before that can happen, customers have to agree on what the essential feature sets are for this “product”.  I believe this is where software comes into play, and that should be a matter of great concern for the hosting providers of today, whose expertise largely does not revolve around software as a way to add value.  As Eric Schmidt said (via Nick Carr) when he started saying Google would enter this market:

For clouds to reach their potential, they should be nearly as easy to program and navigate as the Web. This, say analysts, should open up growing markets for cloud search and software tools—a natural business for Google and its competitors.

Some will immediately react with, “Hold it a minute, what about the hardware?  What about the network?”  The best of the cloud architectures will commoditize those considerations away.  In fact, commoditization will start down at the bottom of the technology stack and work its way up.  The first stage of that, BTW, is already almost over.  That was the choice of CPU.  MIPS?  PowerPC?  SPARC?  No, Intel/AMD are the winners.  The others still exist (not all of them!), but they’ve peaked and are on their way down at various terminal velocities.  Their owners need to milk them for profit, but it would be a losing battle to invest there.  Even Macs now carry Intel inside, and Sun now carries the ticker symbol “JAVA”, a not-so-subtle hat tip to the importance of software.

Hardware boxes are largely a dead issue too.  There is too little opportunity to differentiate for very long, and the CPUs dictate an awful lot of what must be done.  Dell is an assembler and marketer of the lowest cost components, delivered just in time lest they devalue in inventory.  Sun still pushes package design, and it may have some relevance to centralization, but this will be commoditized because of centralization.

Next up will be the operating system.  Again, we’re pretty far down the path of Linux.  Corporations still carry a lot of other things inside their firewalls, but the clouds will be populated almost exclusively with Linux, and we could already see that has happened if we could get reliable statistics on it.  Linux defines the base minimum of what a cloud offering has to provide: utility computing instances running Linux.  This is exactly what Amazon’s EC2 offers.

What else does the cloud need?  Reliable archival storage.  Again, Amazon offers this with S3.  Cloud consumers are adopting it in droves because it makes sense.  It’s a better deal than a raw disk array because it adds value versus that disk array for archival storage.  The value is in the form of resiliency and backup.  Put the data on S3 and forget about those problems.  This begins the commoditization of storage.  Is it any wonder that EMC bought VMware and that a software offering is now most of their market cap?  Hardware guys, put on your thinking caps; this will get much worse.  What software assets do you bring to the table?

3Tera is a service I’ve talked about before that has a very similar offering available from multiple hosting partners of theirs.  They create a virtual SAN that you can back up and mirror at the click of a mouse.  They let you configure Linux instances to your heart’s content.  Others will follow.  IBM’s Blue Cloud offers much the same.  This collection is today’s blueprint for what the Cloud offers in terms of a platform.

But this platform is a moving target, and it will keep moving up the stack.  Amazon just announced another rung up with SimpleDB.  For most software that goes into the Cloud, once you have an OS and a file system, the next thing you want to see is a database.  Certainly when I attended Amazon Startup Project, the availability of a robust database solution was the number one thing folks wanted to see Amazon bring out.  The GM of EC2 promised me that this was on the way and that there would be several announcements before the end of the year.  First we saw the availability of EC2 instances with more memory, disk, and CPU, so that they’d make better database hosts.  SimpleDB is much more ambitious.  It’s a replacement for the conventional database as embodied in products like MySQL and Oracle, designed from the ground up to live in a cloud computing world.  At one stroke it solves a lot of very interesting problems that used to challenge would-be EC2 users around the database.

Along the lines of my list of factors that drive data center centralization, Phil Windley says the economics are impossible to stop.  Scoble asks whether MySQL, Oracle, and SQL Server are dead:

Since Amazon brought out its S3 storage service, I’ve seen many many startups give up data centers altogether.

Tell me why the same thing won’t happen here.

There is no doubt in my mind that all startups will give up having datacenters altogether before this ends.  However, before we get too caught up in assuming that SimpleDB gives us that opportunity, let’s step back and consider what its limitations are:

– It is similar to a relational database, but there are significant differences.  Code will have to be reworked to run there, even if it doesn’t run afoul of the other issues.

– Latency is a problem when your database is in another datacenter from the rest of your code.  Don MacAskill brings this one up, and all I can say is that this is another network effect that leads to more centralization.  If you like SimpleDB, it’s another reason to bring all of your code inside Amazon’s cloud.

– All fields are strings, and they are limited to 1024 characters.  Savvy developers can use the 1024 characters to point to unlimited-size files on S3 (see the sketch after this list), or combine fields to get around the limit.  Mind you, a lot can be done with that, but it is again a difference from traditional RDBMS systems, and it means more work for developers who must overcome the limitation.

– There are no joins.  If you want them (and many proponents of hugely scalable sites view joins as evil), you have to roll your own.

– Transactions and consistency are also absent.  Reads are not guaranteed to be fully up to date with writes.

– There is no indexing, and a whole host of other trappings that database aficionados have gotten comfortable with are missing.
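As promised in the strings item above, here’s the shape of the S3-pointer workaround.  The sdb and s3 objects are stand-ins for whichever clients you use; this sketches the idea, not Amazon’s actual API:

    # Keep the big payload in S3; store only a short locator in SimpleDB.
    S3_PREFIX = "s3://"

    def put_large(sdb, s3, domain, item, attr, payload):
        key = f"{domain}/{item}/{attr}"
        s3.put(key, payload)                       # the real bytes live in S3
        sdb.put_attributes(domain, item, {attr: S3_PREFIX + key})  # < 1024 chars

    def get_large(sdb, s3, domain, item, attr):
        locator = sdb.get_attributes(domain, item)[attr]
        return s3.get(locator[len(S3_PREFIX):])    # follow the pointer

Note that this inherits eventual consistency twice over: both the SimpleDB attribute and the S3 object can lag a recent write.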

Mind you, serious web software is created within these limitations, including some at Amazon itself.  In exchange for living with them, you get massively scalable database access at good performance and very cheaply.  And, as Techcrunch says, you may be able to get rid of one of the highest-cost IT operations jobs around, database administration, making your costs even lower.  Remember my analysis showing that SaaS vendors need to achieve 16:1 operations cost advantages over conventional software, and you can see this is a big step in that direction already.

There is no doubt that cloud computing will be massively disruptive, and that Amazon are well on their way in the race to plant their flag at the top of the mountain.  The pace of progress for Amazon Web Services has been blistering this year, and much more hype-free than what we’ve gotten from the likes of Google and Facebook when it comes to platform speak.  It’s almost odd that we haven’t heard more from these other players, and especially from the likes of Google.  GigaOm says that SimpleDB completes the Amazon Web Services Trifecta.  They go on to say that Amazon’s announcements have the feel of a well thought out long-term strategy, while Google’s sound like an ad hoc grab bag of tools.  I think that’s true, and perhaps reflective of Google’s culture, which is hugely decentralized, to the point of giving developers 20% free time to work on projects of their choosing.  The problem is that such a culture can more easily give us a grab bag of applications, as Google has, than it can a well-designed platform, as Amazon has.  Or, as Mathew Ingram puts it, while everyone else was talking about it, Amazon went ahead and did it.

I’ve talked to a dozen or so startups that are eagerly working with the Amazon Web Services and having great success, as well as some frustrations.  They require rethinking the old ways.  Integrity issues are particularly different in this brave new world, as are issues of latency.  That matters to how a lot of folks think about their applications.  Because of the learning curve, I don’t plan to go out and short Oracle immediately, but the sand has started running in the hourglass.  There will be more layers added to the cloud, and over time it will become harder and harder to ignore.  There will be economic advantage for those who embrace the new ways, and penalties for those who don’t.  This is a bet-your-business drama that’s unfolding, make no mistake.  At the very least, you need to get yourself educated about what these kinds of services offer and what they mean for application architecture.

Businesses located low in the stack I’ve mentioned will be hit hard if they don’t have a strategy to embrace and win a piece of the cloud computing New Deal.  We’re talking hardware manufacturers like Sun, Dell, IBM, and HP.  Software infrastructure comes next.  Applications that depend on low-cost delivery, aka SaaS, are also very much in the crosshairs, although probably at a slightly later date.

Welcome to the brave new world of utility cloud computing.  The server is dead; long live the server!

Related Articles

Amazon Raises the Cloud Platform Bar Again With DevPay

Coté’s Excellent Description of the Microsoft Web Rift:  Nice post on cloud computing at Microsoft

Posted in amazon, data center, ec2, grid, platforms, saas, Web 2.0 | 10 Comments »

Scalability is a Requirement for Startups

Posted by Bob Warfield on December 6, 2007

Dharmesh Shah wonders whether startups should ignore scalability:

You’re worrying about scalability too early. Don’t blow your limited resources on preparing for success. Instead, spend them on increasing the chances that you’ll actually succeed.

It’s an interesting question: should startups worry about scalability, or does that get in the way of finding a proper product/market fit?  If you’ve read my blog much, you’ll know that I view achieving that product/market fit as the highest priority for a startup, and I’m not alone; Marc Andreessen says it too.  I think this is so important that I have advocated some relatively radical architectural choices to help keep a product flexible so it can evolve toward that ideal even faster.

But where does scalability fit in?  Can you achieve that product/market fit without it?  For most startups, I think it is either difficult to verify a true product/market fit without it, or worse, you may achieve it only to immediately fall to earth a victim of poor user experience.  There are certainly plenty of examples of companies that started out great, and seemed to have that product/market fit, but got into persistent hot water because they couldn’t scale out a good user experience when their site began to take off.  Fred Wilson recently wrote about his love/hate relationship with Technorati, which has been a good example of this.

Here is another question: “How much success do you need to verify product/market fit?”  Signing up a few customers to a beta, or even having a large beta, is not really enough in my opinion.  It’s pretty easy to get a ton of people to try something that sounds sexy and is promoted well.  The question is whether it really takes hold.  Marc Andreessen’s Ning is a good example.  When they launched their original product, it required a fair amount of custom programming to create a custom Social Network.  They had 30,000 social networks created even so, but the service wasn’t taking off.  Michael Arrington was calling it R.I.P.  Then they released a version that eliminated the need for programming, and suddenly the product/market fit was there and it took off like a rocket, crossing 100,000 social networks in record time.  Clearly Ning had to deal with scalability before they could learn much about their product/market fit.

Google is another great example of this.  They had to scale from day one because of the problem they were solving.  Om Malik says their infrastructure and ability to scale is actually their strategic advantage.  Certainly the nature of the problem Google wanted to solve required scalability from day one.  This is how Aloof Schipperke wants to view the question when he says, “Scalability is a requirement, not an optimization.”  It’s a bit of a double entendre.  One could say it is a requirement that all startups deal with it, or one could say startups need to evaluate whether scalability is a requirement in their domain.  I’m in the latter camp.  Figure out what success really looks like.  When do you know you have product/market fit?  Be conservative.  What are the requirements to get there?  Aloof lumps scalability in with other “ilities”.  Can your startup reach product/market fit without security, for example?  The answers may surprise you if you’re really honest about it.

Chances are, you may have to do more to be sure about product/market fit than you are comfortable with in release 1.0.  You’ll need a phased plan for how to get there.  Lest you use this as an excuse to ignore scalability until the last minute, keep in mind that these phased plans should have short milestones: quarterly or six-month iterations at most.  Scaling a really poorly architected application can amount to a painful rewrite.  So do a phasing plan for scaling.  What are the big-ticket items you’ll need to enable early so that scaling later is not too hard?  There are a few well-known touchpoints that can make scalability easier.  I’m not going to go over all of them; you know what I’m talking about: statelessness, RESTful web services, and beware the database.  If you don’t know about these things, get some people on your team who do!  It’s not hard to start out with a plan in mind for your eventual scalability and just make sure that along the way you don’t inadvertently shoot yourself in the foot.  It usually boils down to securing the two ends of the puzzle with good scalability:

– How will the client-side web servers scale?

– How will the database back end scale?

Make a plan for what it will look like when it’s done, and put phased milestones in place to get there over time.
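For the first of those two ends, here’s a minimal sketch of the statelessness touchpoint mentioned above (all names illustrative, not a framework):

    # Per-user state lives in a shared store, not in the web process, so any
    # front-end box behind the load balancer can serve any request.
    class SharedSessionStore:
        """Stand-in for a memcached- or database-backed session store."""
        def __init__(self):
            self._sessions = {}
        def get(self, session_id):
            return self._sessions.setdefault(session_id, {})

    STORE = SharedSessionStore()        # in production, a network service

    def handle_request(session_id, params):
        session = STORE.get(session_id)     # nothing sticky about this server
        session["last_params"] = params     # state changes go back to the store
        return f"{session_id}: {params!r}"

Because no web server is special, the front end scales by adding boxes; the database end is the harder half, which is exactly what the replication and sharding discussion earlier in this archive is about.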

Here’s another key issue.  Dharmesh’s original question assumes scalability and user experience compete for scarce resources.  Ed Sim somewhat follows this path too when he writes that it’s hard to sell scalability.  Aren’t we talking about the tradeoffs between UI/features and infrastructure (web or DB)?  Are the same engineers really doing both things?  It seems to me a lot more common to have a “front end” or application group and a “back end” or infrastructure group, even if “group” is a bit grandiose for a couple of people.  Take the opportunity to map out how the modules produced by these two groups will communicate.  Make that communication architecturally clean so the groups are decoupled.  Make the communication work the way it will when you build out scalability, but then don’t build it out at first.  This will let the infrastructure group’s agenda decouple from the user experience group’s.

BTW, if you’re thinking the real tradeoff is spending all your capital on user experience people and none on infrastructure, that just sounds like a bad idea to me.  It’s hard to deliver a good user experience if your infrastructure is lousy, buggy, and doesn’t perform.  There are ample studies showing that the speed with which your application serves up pages is a big contributor to user experience as well.

I’ve gone down this path before of having essentially two small teams and making sure there was clean communication between their code from the start.  My company PriceRadar was lucky enough to land a partnership with AskJeeves early on.  Part of the deal was we had to pass a load test that showed we could handle 10,000 simultaneous users hammering our application.  At the time, most of my experience and developers were from the Microsoft world, so we were .NET all the way.  I remember meeting with the advisory board for a company called iSharp.  It was an all-star cast of web application CTOs and VPs of Engineering.  We went around the table to hear what everyone was doing.  I was the only Microsoft guy in the room, and the Unix crowd just laughed when I told them we had to pass this big load test.  AskJeeves’ CTO was there, as well as the fellow in charge of AOL Instant Messenger and about 10 others.  They flat out said it was impossible without Unix.  In less than a month we had it all working with a distributed grid architecture.  The front-end guys were never even involved and changed little or no code.  The back-end guys didn’t sleep much, but they emerged triumphant.  And the entire team was about 10 developers, per my small-team mentality.

Yes, Virginia, you should worry about your scalability, but it need not be all-consuming.  You can handle it.

Posted in data center, grid, multicore, platforms, saas, strategy, Web 2.0 | 2 Comments »