SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘ec2’ Category

Single Tenant, Multitenant, Private and Public Clouds: Oh My!

Posted by Bob Warfield on August 27, 2010

My head is starting to hurt with all the back and forth among my Enterprise Irregulars buddies about the relationships between the complex concepts of Multitenancy, Private, and Public Clouds.  A set of disjoint conversations and posts came together like the whirlpool in the bottom of a tub when it drains.  I was busy with other things and didn’t get a chance to really respond until I was well and truly sucked into the vortex.  Apologies for the long post, but so many wonderful cans of worms finally got opened that I just have to try to deal with a few of them.  That’s why I love these Irregulars!

To start, let me rehash some of the many memes that had me preparing to respond:

–  Josh Greenbaum’s assertion that Multitenancy is a Vendor, not a Customer Issue.  This post includes some choice observations like:

While the benefits that multi-tenancy can provide are manifold for the vendor, these rationales don’t hold water on the user side.

That is not to say that customers can’t benefit from multi-tenancy. They can, but the effects of multi-tenancy for users are side-benefits, subordinate to the vendors’ benefits. This means, IMO, that a customer that looks at multi-tenancy as a key criteria for acquiring a new piece of functionality is basing their decision on factors that are not directly relevant to their TCO, all other factors being equal.


Multi-tenancy promises to age gracelessly as this market matures.

Not to mention:

Most of the main benefits of multi-tenancy – every customer is on the same version and is updated simultaneously, in particular – are vendor benefits that don’t intrinsically benefit customers directly.

The implication being that someone somewhere will provide an alternate technology very soon that works just as well as or better than multitenancy.  Wow.  Lots to disagree with there.  My ears are still ringing from the sound of the steel gauntlet that was thrown down.

–  Phil Wainewright took a little of the edge off my ire with his response post to Josh, “Single Tenancy, the DEC Rainbow of SaaS.”  Basically, Phil says that any would-be SaaS vendor trying to create an offering without multitenancy is doomed as the DEC Rainbow was.  They have something that sort of walks and quacks like a SaaS offering but that can’t really deliver the goods.

–  Well of course Josh had to respond with a post that ends with:

I think the pricing and services pressure of the multi-tenant vendors will force single-tenant vendors to make their offerings as compatible as possible. But as long as they are compatible with the promises of multi-tenancy, they don’t need to actually be multi-tenant to compete in the market.

That’s kind of like saying, “I’m right so long as nothing happens to make me wrong.”  Where are the facts that show this counter case is anything beyond imagination?  Who has built a SaaS application that does not include multitenancy but that delivers all the benefits?

Meanwhile back at the ranch (we EI’s need a colorful name for our private community where the feathers really start to fly as we chew the bones of some good debates), still more fascinating points and counterpoints were being made as the topic of public vs private clouds came up (paraphrasing):

–  Is there any value in private clouds?

–  Do public clouds result in less lock-in than private clouds?

–  Are private clouds and single tenant (sic) SaaS apps just Old School vendors’ attempts to hang on while the New Era dawns?  Attempts that will ultimately prove terribly flawed?

–  Can the economics of private clouds ever compete with public?

–  BTW, eBay now uses Amazon for “burst” loads and purchases servers for a few hours at a time on their peak periods.  Cool!

–  Companies like Eucalyptus and Nimbula are trying to make Private Clouds that are completely fungible with Public Clouds.  Believing in private cloud frameworks like these means you have to believe companies are going to be running / owning their own servers for a long time to come even if the public cloud guys take over a number of compute workloads.  The Nimbula guys built EC2 and they’re no dummies, so if they believe in this, there must be something to it.

–  There are two kinds of clouds – real and virtual.  Real clouds are multi-tenant. Virtual clouds are not. Virtualization is an amazing technology but it can’t compete with bottoms up multi-tenant platforms and apps.

Stop!  Let me off this merry-go-round and let’s talk.

What It Is and Why Multitenancy Matters

Sorry Josh, but Multitenancy isn’t marketing like Intel Inside (BTW, do you notice Intel wound up everywhere anyway?  That wasn’t marketing either), and it matters to more than just vendors.  Why?

Push aside all of the partisan definitions of multitenancy (all your customers go in the same table or not).   Let’s look at the fundamental difference between virtualization and multitenancy, since these two seem to be fighting it out.

Virtualization takes multiple copies of your entire software stack and lets them coexist on the same machine.  Whereas before you had one OS, one DB, and one copy of your app, now you may have 10 of each.  Each of the 10 may be a different version entirely.  Each may be a different customer entirely, as they share a machine.  For each of them, life is just like they had their own dedicated server.  Cool.  No wonder VMWare is so successful.  That’s a handy thing to do.

Multitenancy is a little different.  Instead of 10 copies of the OS, 10 copies of the DB, and 10 copies of the app, it has 1 OS, 1 DB, and 1 app on the server.  But, through judicious modifications to the app, it allows those 10 customers to all peacefully coexist within the app just as though they had it entirely to themselves.

Can you see the pros and cons of each?  Let’s start with cost.  Every SaaS vendor that has multitenancy crows about this, because it’s true.  Don’t believe me?  Plug in your VM software, go install Oracle 10 times across 10 different virtual machines.  Now add up how much disk space that uses, how much RAM it uses when all 10 are running, and so on.  This is before you’ve put a single byte of information into Oracle or even started up an app.  Compare that to having installed 1 copy of Oracle on a machine, but not putting any data into it.  Dang!  That VM has used up a heck of a lot of resources before I even get started!

If you don’t think that the overhead of 10 copies of the stack has an impact on TCO, you either have in mind a very interesting application + customer combination (some do exist, and I have written about them), or you just don’t understand.  10x the hardware to handle the “before you put in data” requirements is not cheap.  Whatever overhead is involved in making that more cumbersome to automate is not cheap.  Heck, 10x more Oracle licenses is very not cheap.  I know SaaS companies who complain their single biggest ops cost is their Oracle licenses.
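To make the fixed-overhead argument concrete, here is a back-of-envelope sketch in Python. The per-copy figures (RAM, disk, one database license per install) are invented placeholders, not measurements of Oracle or of any VM product, so plug in your own numbers before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackFootprint:
    """Resources used by idle copies of the stack (OS + DB + app)."""
    ram_gb: float
    disk_gb: float
    db_licenses: int


def fixed_overhead(per_copy: StackFootprint, copies: int) -> StackFootprint:
    """Total idle footprint when `copies` full stacks are installed."""
    return StackFootprint(
        ram_gb=per_copy.ram_gb * copies,
        disk_gb=per_copy.disk_gb * copies,
        db_licenses=per_copy.db_licenses * copies,
    )


# Assumed numbers, for illustration only.
one_stack = StackFootprint(ram_gb=2.0, disk_gb=20.0, db_licenses=1)

virtualized = fixed_overhead(one_stack, copies=10)  # one full stack per customer
multitenant = fixed_overhead(one_stack, copies=1)   # one stack shared by all 10

print(f"virtualized: {virtualized}")
print(f"multitenant: {multitenant}")
```

Whatever the real per-copy figures are, the virtualized total scales with the number of customers before any customer data exists, while the multitenant total does not.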

However, if all works well, that’s a fixed cost to have all those copies, and you can start adding data by customers to each virtual Oracle, and things will be okay from that point on.  But, take my word for it, there is no free lunch.  The VM world will be slower and less nimble to share resources between the different Virtual Machines than a Multitenant App can be.  The reason is that by the time it knows it even needs to share, it is too late.  Shifting things around to take resource from one VM and give it to another takes time.  By contrast, the Multitenant App knows what is going on inside the App because it is the App.  It can even anticipate needs (e.g. that customer is in UK and they’re going to wake up x hours before my customers in the US, so I will put them on the same machine because they mostly use the machine at different times).
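The time zone trick can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's scheduler: the tenant names, the 9-to-5 business day, and the greedy placement rule are all assumptions made for the example.

```python
def busy_hours_utc(utc_offset, start_local=9, end_local=17):
    """UTC hours covered by a tenant's local business day (assumed 9:00-17:00)."""
    return {(hour - utc_offset) % 24 for hour in range(start_local, end_local)}


def place_tenants(tenants, pod_count):
    """Greedy placement: put each tenant where it raises the peak hourly load least."""
    pods = [{"tenants": [], "load": [0] * 24} for _ in range(pod_count)]
    for name, utc_offset in tenants.items():
        hours = busy_hours_utc(utc_offset)

        def peak_if_added(pod):
            # Peak number of simultaneously busy tenants if this one joins the pod.
            return max(pod["load"][h] + (1 if h in hours else 0) for h in range(24))

        best = min(pods, key=peak_if_added)  # ties go to the earliest pod
        best["tenants"].append(name)
        for h in hours:
            best["load"][h] += 1
    return [pod["tenants"] for pod in pods]


# Hypothetical tenants mapped to their UTC offsets.
tenants = {"uk_co": 0, "us_west_co": -8, "us_east_co": -5, "japan_co": 9}
placement = place_tenants(tenants, pod_count=2)
print(placement)
```

With these made-up offsets, the UK tenant ends up on the same pod as the US West Coast tenant because their business hours never collide. The application can make that call ahead of time; a hypervisor only sees the load after it arrives.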

So, no, there is not some magic technology that will make multitenant obsolete.  There may be some new marketing label on some technology that makes multitenancy automatic and implicit, but if it does what I describe, it is multitenant.  It will age gracefully for a long time to come despite the indignities that petty competition and marketing labels will bring to bear on it.

What’s the Relationship of Clouds and Multitenancy?

Must Real Clouds be Multitenant?

Sorry, but Real Clouds are not Multitenant because they’re based on Virtualization not Multitenancy in any sense such as I just defined.  In fact, EC2 doesn’t share a core with multiple virtual machines because it can’t.  If one of the VM’s started sucking up all the cycles, the other would suffer terrible performance and the hypervisors don’t really have a way to deal with that.  Imagine having to shut down one of the virtual machines and move it onto other hardware to load balance.  That’s not a simple or fast operation.  Multi-tasking operating systems expect a context switch to be as fast as possible, and that’s what we’re talking about.  That’s part of what I mean by the VM solution being less nimble.  So instead, cores get allocated to a particular VM.  That doesn’t mean a server can’t have multiple tenants, just that at the granularity of a core, things have to be kept clean and not dynamically moved around. 

Note to rocket scientists and entrepreneurs out there–if you could create a new hardware architecture that was really fast at the Virtual Machine load balancing, you would have a winner.  So far, there is no good hardware architecture to facilitate a tenant swap inside a core at a seamless enough granularity to allow the sharing.  In the Multicore Era, this would be the Killer Architecture for Cloud Computing.  If you get all the right patents, you’ll be rich and Intel will be sad.  OTOH, if Intel and VMWare got their heads together and figured it out, it would be like ole Jack Burton said, “You can go off and rule the universe from beyond the grave.”

But, it isn’t quite so black and white.  While EC2 is not multitenant at the core level, it sort of is at the server level as we discussed.  And, services like S3 are multitenant through and through.  Should we cut them some slack?  In a word, “No.”  Even though an awful lot of the overall stack cost (network, cpu, and storage) is pretty well multitenant, I still wind up installing those 10 copies of Oracle and I still have the same economic disadvantage as the VM scenario.  Multitenancy is an Application characteristic, or at the very least, a deep platform characteristic.  If I build my app on a platform that is itself multitenant, it is automatically multitenant.  If I build it on Amazon Web Services, it is not automatic.

But isn’t there Any Multitenant-like Advantage to the Cloud?  And how do Public and Private Compare?

Yes, there are tons of benefits to the Cloud, and through an understanding and definition of them, we will tease out the relationship of Public and Private Clouds.  Let me explain…

There are two primary advantages to the Cloud:  it is a Software Service and it is Elastic.  If you don’t have those advantages, you don’t have a Cloud.  Let’s drill down.

The Cloud is a Software Service, first and foremost.  I can spin up and control a server entirely through a set of API’s.  I never have to go into a Data Center cage.  I never have to ask someone at the Data Center to go into the Cage (though that would be a Service, just not a Software Service, an important distinction).  This is powerful for basically the same reasons that SaaS is powerful versus doing it yourself with On-prem software.  Think Cloud = SaaS and Data Center = On Prem and extrapolate and you’ll have it. 

Since Cloud is a standardized service, we expect all the same benefits as SaaS:

– They know their service better than I do since it is their whole business.  So I should expect they will run it better and more efficiently.

– Upgrades to that service are transparent and painless (try that on your own data center, buddy!).

– When one customer has a problem, the Service knows and often fixes it before the others even know it exists.  Yes Josh, there is value in SaaS running everyone on the same release.  I surveyed Tech Support managers one time and asked them one simple question:  How many open problems in your trouble ticketing system are fixed in the current release?  The answers were astounding–40 to 80%.  Imagine a world where your customers see 40 to 80% fewer problems.  It’s a good thing!

– That service has economic buying power that you don’t have because it is aggregated across many customers.  They can get better deals on their hardware and order so much of it that the world will build it precisely to their specs.  They can get stuff you can’t, and they can invest in R&D you can’t.  Again, because it is aggregated across many customers.  A Startup running in the Amazon Cloud can have multiple redundant data centers on multiple continents.  Most SaaS companies don’t get to building multiple data centers until they are way past having gone public.

–  Because it is a Software Service, you can invest your Ops time in automation, rather than in crawling around Data Center cages.  You don’t need to hire anyone who knows how to hot swap a disk or take a backup.  You need peeps who know how to write automation scripts.  Those scripts are a leveragable asset that will permanently lower your costs in a dramatic way.  You have reallocated your costs from basic Data Center grubbing around (where does this patch cable go, Bruce?), an expense, to actually building an asset.

The list goes on.

The second benefit is Elasticity.  It’s another form of aggregation benefit.  They have spare capacity because everyone doesn’t use all the hardware all the time.  Whatever % isn’t utilized, it is a large amount of hardware, because it is aggregated.  It’s more than you can afford to have sitting around idle in your own data center.  Because of that, they don’t have to sell it to you in perpetuity.  You can rent it as you need it, just like eBay does for bursting.  There are tons of new operational strategies that are suddenly available to you by taking advantage of Elasticity.

Let me give you just one.  For SaaS companies, it is really easy to do Beta Tests.  You don’t have to buy 2x the hardware in perpetuity.  You just need to rent it for the duration of the Beta Test and every single customer can access their instance with their data to their heart’s content.  Trust me, they will like that.
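A quick worked comparison shows why renting for the length of a Beta Test beats buying. Every number below (server count, hourly rate, purchase price, a two-week beta) is a hypothetical placeholder rather than a real Amazon price; amounts are kept in whole cents to avoid rounding noise.

```python
def rental_cost_cents(servers, cents_per_server_hour, hours):
    """Cost of renting `servers` machines for `hours` hours."""
    return servers * cents_per_server_hour * hours


def purchase_cost_cents(servers, cents_per_server):
    """Up-front cost of buying the same number of machines outright."""
    return servers * cents_per_server


beta_servers = 20        # assumed: a second copy of production
beta_hours = 14 * 24     # assumed: a two-week beta

rent = rental_cost_cents(beta_servers, cents_per_server_hour=10, hours=beta_hours)
buy = purchase_cost_cents(beta_servers, cents_per_server=300_000)

print(f"rent for the beta: ${rent / 100:,.2f}")
print(f"buy outright:      ${buy / 100:,.2f}")
```

The purchase figure also leaves out power, space, and the people who rack the boxes, so the real gap is wider than this arithmetic suggests.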

What about Public Versus Private Clouds?

Hang on, we’re almost there, and it seems like it has been a worthwhile journey.

Start with, “What’s a Private Cloud?”  Let’s take all the technology of a Public Cloud (heck, the Nimbula guys built EC2, so they know how to do this), and create a Private Cloud.  The Private Cloud is one restricted to a single customer.  It’d be kind of like taking a copy of a multitenant SaaS vendor’s software, and installing it at Citibank for their private use.  Multitenant with only one tenant.  Do you hear the sound of one hand clapping yet?  Yep, it hurts my head too, just thinking about it.  But we must.

Pawing through the various advantages we’ve discussed for the Cloud, there are still some that accrue to a Cloud of One Customer:

–  It is still a Software Service that we can control via API’s, so we can invest in Ops Automation.  In a sense, you can spin up a new Virtual Data Center (I like that word better than Private Cloud, because it’s closer to the truth) on 10 minutes notice.  No waiting for servers to be shipped.  No uncrating and testing.  No shoving into racks and connecting cables.  Push a button, get a Data Center.

–  You get the buying power advantages of the Cloud Vendor if they supply your Private Cloud, though not if you buy software and build your Private Cloud.  Hmmm, wonder what terminology is needed to make that distinction?  Forrester says it’s either a Private Cloud (company owns their own Cloud) or a Hosted Virtual Private Cloud.  Cumbersome.

But, and this is a huge one, the granularity is huge, and there is way less Elasticity.  Sure, you can spin up a Data Center, but depending on its size, it’s a much bigger on/off switch.  You likely will have to commit to buy more capacity for a longer time at a bigger price in order for the Cloud Provider to recoup giving you so much more control.  They have to clear other customers away from a larger security zone before you can occupy it, instead of letting your VM’s commingle with other VM’s on the same box.  You may lose the more multitenant-like advantages of the storage cluster and the network infrastructure (remember, only EC2 was stuck being pure virtual). 

What Does it All Mean, and What Should My Company Do?

Did you see Forrester’s conclusion that most companies are not yet ready to embrace the Cloud and won’t be for a long time?

I love the way Big Organizations think about things (not!).  Since their goal is preservation of wealth and status, it’s all about risk mitigation whether that is risk to the org or to the individual career.  A common strategy is to take some revolutionary thing (like SaaS, Multitenancy, or the Cloud), and break it down into costs and benefits.  Further, there needs to be a phased modular approach that over time, captures all the benefits with as little cost as possible.  And each phase has to have a defined completion so we can stop, evaluate whether we succeeded, celebrate the success, punish those who didn’t play the politics well enough, check in with stakeholders, and sing that Big Company Round of Kumbaya.  Yay!

In this case, we have a 5 year plan for CIO’s.  Do you remember anything else, maybe from the Cold War, that used to work on 5 year plans?  Never mind.

It asserts that before you are ready for the Cloud, you have to cross some of those modular hurdles:

A company will need a standardized operating procedure, fully-automated deployment and management (to avoid human error) and self-service access for developers. It will also need each of its business divisions – finance, HR, engineering, etc – to be sharing the same infrastructure.  In fact, there are four evolutionary stages that it takes to get there, starting with an acclimation stage where users are getting used to and comfortable with online apps, working to convince leaders of the various business divisions to be guinea pigs. Beyond that, there’s the rollout itself and then the optimization to fine-tune it.

Holy CYA, Batman!  Do you think eBay spent 5 years figuring out whether it could benefit from bursting to the Cloud before it just did it?

There’s a part of me that says if your IT org is so behind the times it needs 5 years just to understand it all, then you should quit doing anything on-premise and get it all into the hands of SaaS vendors.  They’re already so far beyond you that they must have a huge advantage.  There is another part that says, “Gee guys, you don’t have to be able to build an automobile factory as good as Toyota to be able to drive a car.”

But then sanity and Political Correctness prevail, I come back down to Earth, and I realize we are ready to summarize.  There are 4 levels of Cloud Maturity (Hey, I know the Big Co IT Guys are feeling more comfortable already, they can deal with a Capability and Maturity Model, right?):

Level 1:  Dabbling.  You are using some Virtualization or Cloud technology a little bit at your org in order to learn.  You now know what a Machine Image is, and you have at least seen a server that can run them and swapped a few in and out so that you experience the pleasures of doing amazing things without crawling around the Data Center Cage.

Level 2:  Private Cloud.  You were impressed enough by Level 1 that you want the benefits of Cloud Technology for as much of your operation as you can as fast as you can get it.  But, you are not yet ready to relinquish much of any control.  For Early Level 2, you may very well insist on a Private Cloud you own entirely.  Later stage Level 2 and you will seek a Hosted Virtual Private Cloud.

Level 3:  Public Cloud.  This has been cool, but you are ready to embrace Elasticity.  You tripped into it with a little bit of Bursting like eBay, but you are gradually realizing that the latency between your Data Center and the Cloud is really painful.  To fix that, you went to a Hosted Virtual Private Cloud.  Now that your data is in that Cloud and Bursting works well, you are realizing that the data is already stepping outside your Private Cloud pretty often anyway.  And you’ve had to come to terms with it.  So why not go the rest of the way and pick up some Elasticity?

Level 4:  SaaS Multitenant.  Eventually, you conclude that you’re still micromanaging your software too much and it isn’t adding any value unique to your organization.  Plus, most of the software you can buy and run in your Public Cloud world is pretty darned antiquated anyway.  It hasn’t been rearchitected since the late 80’s and early 90’s.  Not really.  What would an app look like if it was built from the ground up to live in the Cloud, to connect Customers the way the Internet has been going, to be Social, to do all the rest?  Welcome to SaaS Multitenant.  Now you can finally get completely out of Software Operations and start delivering value.

BTW, you don’t have to take the levels one at a time.  It will cost you a lot more and be a lot more painful if you do.  That’s my problem with the Forrester analysis.  Pick the level that is as far along as you can possibly stomach, add one to that, and go.  Ironically, not only is it cheaper to go directly to the end game, but each level is cheaper for you on a wide scale usage basis all by itself.  In other words, it’s cheaper for you to do Public Cloud than Private Cloud.  And it’s WAY cheaper to go Public Cloud than to try Private Cloud for a time and then go Public Cloud.  Switching to a SaaS Multitenant app is cheaper still.

Welcome to the crazy world of learning how to work and play well together when efficiently sharing your computing resources with friends and strangers!


Netflix’s Movie Cloud is Moving into the Amazon Cloud

Posted by Bob Warfield on May 7, 2010

Netflix has always been an extremely progressive company.  I know the founders, Marc Randolph and Reed Hastings well, and many of the employees too.  There is an amazing amount of brainpower behind the scenes there and it shows with their great products and great story.

I read with interest Larry Dignan’s piece about their usage of Amazon Web Services to move key parts of the Netflix infrastructure into the Cloud.  It doesn’t seem that long since I remember being asked to visit Netflix and tell them about my company’s experience moving into the Amazon Cloud.  I expected to meet in Reed Hastings’s office with perhaps a couple of people, but was surprised to find they had assembled a small auditorium of developers to hear the story.  I spent a little over an hour telling them how we’d done it and answering questions and then went away.

As an aside, this is how smart companies go to school–by sharing information broadly rather than hoarding it at the top, and by bringing in outsiders who can add to the collective knowledge pool.  When was the last time your company did something like this?  It’s so easy here in Silicon Valley, which is dense with sharp insights and hard-won experience.  Take advantage of it, it’s the least you can do after paying the high cost of living here!

I admit, I wondered whether they’d carry it off or even get started, or whether they were just curious.  Moving to the Cloud is a big step for a big thriving company.  There are a lot of moving parts that have to be orchestrated for it to be successful.  But as I said, they are an extremely progressive company with a lot of very bright people.  Color me very impressed with the speed at which they were able to move.

Cool beans!

PS  Amazon has a press release / case study with more detail on just what Netflix is doing.


Minimizing the Cost of SaaS Operations

Posted by Bob Warfield on March 29, 2010

SaaS software is much more dependent on being run by the numbers than conventional on-premises software because the expenses are front loaded and the revenue is back loaded.  SAP learned this the hard way with its Business By Design product, for example.  If you run the numbers, there is a high degree of correlation between low cost of delivering the service and high growth rates among public SaaS companies.  It isn’t hard to understand–every dollar spent delivering the service is a dollar that can’t be spent to find new customers or improve the service.

So how do you lower your cost to deliver a SaaS service? 

At my last gig, Helpstream, we got our cost down to 5 cents per seat per year.  I’ve talked to a lot of SaaS folks and nobody I’ve yet met got even close.  In fact, they largely don’t believe me when I tell them what the figures were.  The camp that is willing to believe immediately wants to know how we did it.  That’s the subject of this “Learnings” blog post.  The formula is relatively complex, so I’ll break it down section by section, and I’ll apologize up front for the long post.

Attitude Matters:  Be Obsessed with Lowering Cost of Service

You get what you expect and inspect.  Never a truer thing said than in this case.  It was a deep-seated part of the Helpstream culture and strategy that Cost of Service had to be incredibly low.  So low that we could exist on an advertising model if we had to.  While we never did, a lot was invested in the critical up front time when it mattered to get the job done.  Does your organization have the religion about cutting service cost, or are there 5 or 6 other things that you consider more important?

Go Multi-tenant, and Probably go Coarse-grained Multi-tenant

Are you betting you can do SaaS well enough with a bunch of virtual machines, or did you build a multi-tenant architecture?  I’m skeptical about your chances if you are in the former camp unless your customers are very very big.  Even so, the peculiar requirements of very big customers (they will insist on doing things their way and you will cave) will drive your costs up.

Multi-tenancy lets you amortize a lot of costs so that they’re paid once and benefit a lot of customers.  It helps smooth loads so that as one customer has a peak load others probably don’t.  It clears the way to massive operations automation which is much harder in a virtual machine scenario.

Multi-tenancy comes in a lot of flavors.  For this discussion, let’s consider fine-grained versus coarse-grained.  Fine grain is the Salesforce model.  You put all the customers together in each table and use a field to extract them out again.  Lots of folks love that model, even to a religious degree that decrees only this model is true multi-tenancy.  I don’t agree.  Fine grained is less efficient.  Whoa!  Sacrilege!  But true, because you’re constantly doing the work of separating one tenant’s records from another.  Even if developers are protected from worrying about it by clever layering of code, it can’t help but require more machine resources to constantly sift records.
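Here is a minimal sketch of the fine-grained model, using Python's built-in sqlite3 module as a stand-in for a real database server. The table, column, and tenant names are hypothetical, and this is not a description of how Salesforce actually implements it; it only shows the shape of "one shared table plus a tenant field."

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Every tenant's rows live in the same table, tagged with a tenant_id.
conn.execute(
    "CREATE TABLE tickets (tenant_id TEXT NOT NULL, id INTEGER NOT NULL, subject TEXT)"
)
conn.execute("CREATE INDEX idx_tickets_tenant ON tickets (tenant_id)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("acme", 1, "Login fails"),
        ("globex", 1, "Invoice wrong"),
        ("acme", 2, "Reset password"),
    ],
)
conn.commit()


def tickets_for(conn, tenant_id):
    """Every query must carry the tenant filter; forgetting it leaks data."""
    rows = conn.execute(
        "SELECT id, subject FROM tickets WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return rows.fetchall()


acme_tickets = tickets_for(conn, "acme")
print(acme_tickets)
```

The `WHERE tenant_id = ?` clause is the sifting work described above: it has to run on every read, for every tenant, forever.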

Coarse-grained means every customer gets their own database, but these many databases are all on the same instance of the database server.  This is the model we used at Helpstream.  It turns out that a relatively vanilla MySQL architecture can support thousands of tenants per server.  That’s plenty!  Moreover, it requires less machine resources and it scales better.  A thread associated with a tenant gets access to the one database right up front and can quit worrying about the other customers right then.  A server knows that the demands on a table only come from one customer and it can allocate cpus table by table.  Good stuff, relatively easy to build, and very efficient.
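The coarse-grained model can be sketched the same way. This is not Helpstream's code: it uses Python's sqlite3 module, with one connection playing the part of the database server and one attached in-memory database per tenant playing the part of a per-tenant MySQL database. The `tenant_` naming scheme and the helper functions are assumptions for the example.

```python
import re
import sqlite3

# One "server" (connection), many per-tenant databases attached to it.
# isolation_level=None means autocommit, so ATTACH never runs inside a transaction.
server = sqlite3.connect(":memory:", isolation_level=None)


def schema_name(tenant):
    """Map a tenant to its database name; validate because names can't be bound as parameters."""
    if not re.fullmatch(r"[a-z][a-z0-9_]*", tenant):
        raise ValueError(f"invalid tenant name: {tenant!r}")
    return f"tenant_{tenant}"


def provision(server, tenant):
    """Create a fresh, private database for one tenant."""
    schema = schema_name(tenant)
    server.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    server.execute(
        f"CREATE TABLE {schema}.tickets (id INTEGER PRIMARY KEY, subject TEXT)"
    )


def add_ticket(server, tenant, subject):
    server.execute(
        f"INSERT INTO {schema_name(tenant)}.tickets (subject) VALUES (?)", (subject,)
    )


def tickets_for(server, tenant):
    """No tenant filter needed: the database itself is the boundary."""
    query = f"SELECT id, subject FROM {schema_name(tenant)}.tickets ORDER BY id"
    return server.execute(query).fetchall()


provision(server, "acme")
provision(server, "globex")
add_ticket(server, "acme", "Login fails")
add_ticket(server, "globex", "Invoice wrong")
print(tickets_for(server, "acme"))
```

Once the tenant's database is chosen, the queries contain no per-row tenant check at all, which is the efficiency argument made above. Note that SQLite caps the number of attached databases far below "thousands," so this only illustrates the layout, not the scale.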

The one down side of coarse grain I have discovered is that it’s hard to analyze all the data across customers because it’s all in separate databases.  Perhaps the answer is a data warehouse constructed especially for the purpose of such analysis that’s fed from the individual tenant schemas.
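A warehouse feed of that kind might look roughly like the following sketch. It is speculative, just as the paragraph above is: separate in-memory sqlite3 databases stand in for the per-tenant schemas, and the `all_tickets` table and the full-reload strategy are assumptions chosen to keep the example short.

```python
import sqlite3


def make_tenant_db(subjects):
    """Build a stand-in per-tenant database holding a few tickets."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT)")
    db.executemany("INSERT INTO tickets (subject) VALUES (?)", [(s,) for s in subjects])
    db.commit()
    return db


# Hypothetical tenants and data.
tenant_dbs = {
    "acme": make_tenant_db(["Login fails", "Reset password"]),
    "globex": make_tenant_db(["Invoice wrong"]),
}

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE all_tickets (tenant TEXT, ticket_id INTEGER, subject TEXT)"
)


def load_warehouse(warehouse, tenant_dbs):
    """Full reload: copy every tenant's rows into one table, tagged with the tenant."""
    warehouse.execute("DELETE FROM all_tickets")
    for tenant, db in tenant_dbs.items():
        rows = db.execute("SELECT id, subject FROM tickets").fetchall()
        warehouse.executemany(
            "INSERT INTO all_tickets VALUES (?, ?, ?)",
            [(tenant, ticket_id, subject) for ticket_id, subject in rows],
        )
    warehouse.commit()


load_warehouse(warehouse, tenant_dbs)
counts = warehouse.execute(
    "SELECT tenant, COUNT(*) FROM all_tickets GROUP BY tenant ORDER BY tenant"
).fetchall()
print(counts)
```

The tenant tag only appears in the warehouse copy, so the cross-customer analysis pays the fine-grained cost while the production path stays coarse-grained.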

Go Cloud and Get Out of the Datacenter Business

Helpstream ran in the Amazon Cloud using EC2, EBS, and S3.  We had help from OpSource because you can’t run mail servers in the Amazon Cloud–the IP’s are already largely blacklisted due to spammers using Amazon.  Hey, spammers want a low cost of ops too!

Being able to spin up new servers and storage incrementally, nearly instantly (usually way less than 10 minutes for us to create a new multi-tenant “pod”), and completely from a set of API’s radically cuts costs.  Knowing Amazon is dealing with a lot of the basics like the network infrastructure and replicating storage to multiple physical locations saves costs.  Not having to crawl around cages, unpack servers, or replace things that go bad is priceless.

Don’t mess around.  Unless your application requires some very special hardware configuration that is unavailable from any Cloud, get out of the data center business.  This is especially true for small startups who can’t afford things like redundant data centers in multiple locations.  Unfortunately, it is a hard to impossible transition for large SaaS vendors that are already thoroughly embedded in their Ops infrastructure.  Larry Dignan wrote a great post capturing how Helpstream managed the transition to Amazon.

Build a Metadata-Driven Architecture

I failed to include this one in my first go-round because I took it for granted people build Metadata-driven architectures when they build Multi-tenancy.  But that’s only partially true, and a metadata-driven architecture is a very important thing to do.

Metadata literally means data about data.  For much of the Enterprise Software world, data is controlled by code, not data.  Want some custom fields?  Somebody has to go write some custom code to create and access the fields.  Want to change the look and feel of a page?  Go modify the HTML or AJAX directly.

Having all that custom code is anathema, because it can break, it has to be maintained, it’s brittle and inflexible, and it is expensive to create.  At Helpstream, we were metadata happy, and proud of it.  You could get on the web site and provision a new workspace in less than a minute–it was completely automated.  Upgrades for all customers were automated.  A tremendous amount of customization was available through configuration of our business rules platform.  Metadata gives your operations automation a potent place to tie in as well.
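As a small illustration of "custom fields as data, not code," here is a hedged Python sketch. It is not the Helpstream business rules platform; the `FieldDef` class, the tenant names, and the field definitions are invented for the example. The point is that adding a custom field for a tenant means adding a row of metadata, while the validation code stays the same for everyone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Metadata describing one custom field."""
    name: str
    kind: type
    required: bool = False


# Per-tenant metadata. In a real system this would live in the database.
TICKET_FIELDS = {
    "acme": [FieldDef("severity", int, required=True), FieldDef("region", str)],
    "globex": [FieldDef("po_number", str, required=True)],
}


def validate(tenant, record):
    """Check a record against the tenant's field metadata; return a list of problems."""
    errors = []
    for field in TICKET_FIELDS.get(tenant, []):
        if field.name not in record:
            if field.required:
                errors.append(f"missing required field: {field.name}")
            continue
        if not isinstance(record[field.name], field.kind):
            errors.append(f"{field.name} must be {field.kind.__name__}")
    return errors


print(validate("acme", {"severity": "high"}))
print(validate("globex", {}))
```

Because the definitions are plain data, provisioning and upgrade scripts can read and write them too, which is where the operations automation mentioned above hooks in.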

Open Source Only:  No License Fees!

I know of SaaS businesses that say over half their operating costs are Oracle licenses.  That stuff is expensive.  Not for us.  Helpstream had not a single license fee to pay anywhere.  Java, MySQL, Lucene, and a host of other components were there to do the job.

This mentality extends to using commodity hardware and Linux versus some fancy box and an OS that costs money too.  See for example Salesforce’s switch.

Automate Operations to Death

Whatever your Operations personnel do, let’s hope it is largely automating and not firefighting.  Target key areas of operational flexibility up front.  For us, this was system monitoring, upgrades, new workspace provisioning, and the flexibility to migrate workspaces (our name for a single tenant) to different pods (multi-tenant instances). 

Every time there is a fire to be fought, you have to ask several questions and potentially do more automation:

1.  Did the customer discover the problem and bring it to our attention?  If so, you need more monitoring.  You should always know before your customer does.

2.  Did you know immediately what the problem was, or did you have to do a lot of digging to diagnose?  If you had to do digging, you need to pump up your logging and diagnostics.  BTW, the most common Ops issue is, “Your service is too slow.”  This is painful to diagnose.  It is often an issue with the customer’s own network infrastructure for example.  Make sure to hit this one hard.  You need to know how many milliseconds were needed for each leg of the journey.  We didn’t finish this one, but were actively thinking of implementing capabilities like Google uses to tell with code at the client when a page seems slow.  Our pages all carried a comment that told how long it took at the server side.  By comparing that with a client side measure of time, we would’ve been able to tell whether it was “us” or “them” more easily.

3.  Did you have to perform a manual operation or write code to fix the problem?  If so, you need to automate whatever it was.
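
The "us" versus "them" comparison in point 2 can be sketched in a few lines.  This is only an illustration, not what Helpstream shipped: the names PageTiming and attribute_slowness, and the 2 second threshold, are assumptions made up for the example.  It takes the server-side time a page reports about itself and the total time the client observed, and blames whichever side accounts for more of a slow page.

```python
from dataclasses import dataclass


@dataclass
class PageTiming:
    """One page load measured from both ends (hypothetical structure, values in ms)."""
    client_total_ms: float  # wall-clock time seen at the client, request to render
    server_ms: float        # time the server reported for itself, e.g. in a page comment


def attribute_slowness(timing: PageTiming, slow_threshold_ms: float = 2000.0) -> str:
    """Return "ok", "us" (server side) or "them" (network or client side).

    The threshold is an illustrative assumption, not a recommended value.
    """
    if timing.client_total_ms < slow_threshold_ms:
        return "ok"
    # Whatever the server did not account for was spent on the network or client.
    outside_ms = timing.client_total_ms - timing.server_ms
    return "us" if timing.server_ms >= outside_ms else "them"
```

A page that took 5 seconds at the client while the server reports 300 milliseconds points at the customer's side of the wire.  The same 5 seconds with 4 seconds of server time points back at the service.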

This all argues for the skillset needed by your Ops people, BTW.  It also argues to have Ops be a part of Engineering, because you can see how much impact there is on the product’s architecture.

Hit the Highlights of Efficient Architecture

Without going down the rathole of premature optimization, there is a lot of basic stuff that every architecture should have.  Thread pooling.  Good clean multi-threading that isn’t going to deadlock.  Idempotent operations and good use of transactions with rollback in the face of errors.  Idempotency means if the operation fails you can just do it again and everything will be okay.  Smart use of caching, but not too much caching.  How does your client respond to dropped connections?  How many round trips does the client require to do a high traffic page?
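
Idempotency is easier to see in code than in a definition.  The sketch below is a toy, assuming an in-memory Ledger class and a with_retry helper that are both invented for illustration; a real system would do the same bookkeeping inside a database transaction.  The point is that the retry is only safe because repeating the operation with the same request id changes nothing.

```python
class Ledger:
    """Toy in-memory store (hypothetical); stands in for a transactional database."""

    def __init__(self):
        self.balance = 0
        self.applied = set()  # request ids that have already been applied

    def credit(self, request_id: str, amount: int) -> int:
        """Idempotent: applying the same request_id a second time is a no-op."""
        if request_id not in self.applied:
            self.balance += amount
            self.applied.add(request_id)
        return self.balance


def with_retry(operation, attempts: int = 3):
    """Re-run operation after a ConnectionError; safe only if it is idempotent."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as err:
            last_error = err
    raise last_error
```

If the first attempt applies the credit but the reply is lost on the way back, the second attempt finds the request id already recorded and simply returns the current balance instead of crediting twice.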

We used Java instead of one of the newer slower languages.  Sorry, didn’t mean to be pejorative, and I know this is a religious minefield, but we got value from Java’s innate performance.  PHP or Python are pretty cool, but I’m not sure they are what you want to squeeze every last drop of operations cost out of your system.  The LAMP stack is cheap up front, but SaaS is forever.

Carefully Match Architecture with SLA’s

The Enterprise Software and IT World is obsessed with things like failover.  Can I reach over and unplug this server and automatically failover to another server without the users ever noticing?  That’s the ideal.  But it may be a premature optimization for your particular application.  Donald Knuth says, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

Ask yourself how much is enough?  We settled on 10 minutes with no data loss.  If our system crashed hard and had to be completely restarted, it was good enough if we could do that in less than 10 minutes with no loss of data during that time.  That meant no failover was required, which greatly simplified our architecture.

To implement this, we ran a second MySQL replicated from the main instance and captured EBS backup snapshots from that second server.  This took the load of snapshotting off the main server and gave us a cheaper alternative to a true full failover.  If the main server died, it could be brought back up again in less than 10 minutes with the EBS volume mounted and away we would go.  The Amazon infrastructure makes this type of architecture easy to build and very successful.  Note that with coarse-grained multi-tenancy, one could even share the backup server across multiple multi-tenant instances.

Don’t Overlook the Tuning!

Tuning is probably the first thing you thought of with respect to cutting costs, right?  Developers love tuning.  It’s so satisfying to make a program run faster or scale better.  That’s probably because it is an abstract measure that doesn’t involve a customer growling about something that’s making them unhappy.

Tuning is important, but it is the last thing we did.  It was almost all MySQL tuning too.  Databases are somewhat the root of all evil in this kind of software, followed closely by networks and the Internet.  We owe a great debt of gratitude to the experts at Percona.  It doesn’t matter how smart you are, if the other guys already know the answer through experience, they win.  Percona has a LOT of experience here, folks.


Long-winded, I know.  Sorry about that, but you have to fit a lot of pieces together to really keep the costs down.  The good news is that a lot of these pieces (metadata-driven architecture, cloud computing, and so on) deliver benefits in all sorts of other ways besides lowering the cost to deliver the service.  Probably the thing I am most proud of about Helpstream was just how much software we delivered with very few developers.  We never had more than 5 while I was there.  Part of the reason for that is our architecture really was a “whole greater than the sum of its parts” sort of thing.  Of course a large part was also that these developers were absolute rock stars too!

Posted in cloud, data center, ec2, enterprise software, platforms, saas, software development | 5 Comments »

One Week Later on Amazon Web Services

Posted by Bob Warfield on December 8, 2008

Well it’s official.  My company, Helpstream, has now been running our application entirely on Amazon Web Services for a week and we’re very happy with the result–it’s better, faster, and cheaper.  We’ve gotten a more robust system for our multitenant SaaS application that’s actually cheaper and easier for us.  Customers are reporting that the application even seems faster than it had been.  The effort involved was not too bad, though we did go through a multi-stage process before committing everything to Amazon.  I’ve chronicled that process on our corporate blog if you’re interested in seeing how such transitions are done.

Meanwhile, I can’t imagine why startups are fooling around with their own data centers.  Easy for me to say, we were too just one short week ago!  But seriously folks, given the current economy and the fact that you can deliver a better service more easily and cheaply with Amazon, why wouldn’t you make that a high priority?

I remember sitting in our weekly staff meeting with my Products organization discussing how to phase the transition.  We’ve got quite a lot of business activity on the horizon, as well as over 120 customers using the service at present.  I was arguing for more baby steps and my fear that we might screw something up.  My Director of Operations made the statement that when he looked at Amazon versus the sort of datacenter a startup can run, he couldn’t understand how we could afford to wait any longer than we had to.   What he meant was that the capabilities of AWS were not something we could even begin to approach any time soon.  When we took a careful look at what we were afraid of happening in a move, it turned out there was a strategy to mitigate every single risk.  So, we put together our migration plan and got on with it.  Boy were we happy we did!

Posted in amazon, cloud, ec2, enterprise software, platforms, saas | 6 Comments »

Hurry, The Cloud Computing Platform Opportunity is Perishable!

Posted by Bob Warfield on April 7, 2008

As I write this post, many are predicting that the big announcement from Google tonight will be that it’s opening up BigTable for the world to use.  At least Kevin Burton and Mike Arrington think so.  I hope so, because the world needs a lot more cloud computing choices.  I wonder how many have figured out just how little time remains to introduce new cloud computing platforms?

Ray Ozzie has said, “[the cloud market] really isn’t being taken seriously right now by anybody except Amazon.”  He’s right on the mark.  The distant runner up is Benioff’s.  I say distant because there are a lot of problems with it, not the least of which is an economic model that makes it completely untenable for anyone but big corporate IT to use.  Technically, it is a completely closed and proprietary environment that offers only minimal leverage.  It’s true, they’re very serious about it, so in that sense we should add them to the list, but the way they’re going about it makes it seem less than serious.

Here’s an important tip for various big industry players who’ve made noise about Cloud Computing at various points:  it’s a perishable opportunity!  You don’t have forever to contemplate how to get in and start winning.


Because ultimately it boils down to differentiation and commoditization like any market.  The longer you wait, the more bipolar the market becomes.  Allow Amazon to get too strong and you’ll have two choices:

–  Copy Amazon’s API’s very closely and charge a lot less. 

–  Launch a radically different approach that offers big advantages in some other way.

The middle ground will be untenable.  An API or service that is only slightly better than Amazon’s but is incompatible won’t succeed.  We’ve seen this time and time again in our industry.  It’ll play out the same way here.  For a brief time everyone can be slightly different.  Then the world will discover the differences don’t matter and they’ll gravitate towards one player.  If someone already has huge momentum (e.g. Amazon), you must either be incredibly differentiated or much cheaper.  Both are pretty hard to do.

We could ask whether Amazon has already reached a stage where only those two options can fly.  I don’t think so.  Not quite anyway.  It takes longer than you’d think, although their success has been phenomenal.  My prediction is that the window to introduce a major new cloud computing platform initiative is not quite 2 years.  If you’re not out by end of 2009, you will face a major uphill struggle.  In fact, if you’re not a great big player, the window is much less.

There are significant challenges for the big players to execute quickly enough:

–  Sun never seems to execute on anything quickly enough.  Sorry guys, but the company just doesn’t evolve very fast.  That’s why you’re buying properties like MySQL, right?

–  Google wants to be a precision machine, focused on squeezing margin out of a lucrative model.  What would they do, if like Amazon, they announced this thing and suddenly had more traffic to it than their core properties?  They have a history of absorbing startups and then taking a long time to get the thing to a level they feel is commensurate with their standards.  Cloud computing is in many ways worse.  They lose control and let other people’s software run inside their firewalls on their servers.

–  Microsoft is in the unenviable position the old RISC world was in against Intel.  They have to build everything themselves on their platforms.  There is no synergy with third parties.  It’s ironic really.  The Intel/Microsoft PC Keiretsu could divide and conquer and they were so successful even Apple finally went Intel because the others couldn’t afford to do it themselves.

–  Yahoo?  People used to talk about them in the same breath, but clearly the wheels are coming off that stagecoach.  For a big player, cloud computing is not a little investment.  Particularly now when there is quite a lot of momentum already built.  Yahoo’s bets are laid, and they’re a lame duck besides.  Count them out.

–  IBM?  Could be.  They’ve made announcements but the follow up is weak.  IBM could certainly afford to throw enough services at the problem to get it going until the technology catches up.  They can sure sell such a thing.  The biggest challenge they have is their command and control culture may never let it reach critical mass.

–  Tata et al:  Big Indian or Chinese.  Why not?  These are huge companies overseas.  They have the expertise to do quite a lot.  The Asian markets are hot, hot, hot, and they’re not that well served by Amazon.  These guys would be my bet for the odds on Dark Horse players if they get it and can get their act together.  They’re ideal as low cost providers and like IBM, they can throw service at it until they get it right.  There is surprisingly little technology required at this stage to get started at the level Amazon is at.  You need an EC2 and an S3 clone and a bit of window dressing that does something they don’t.  How about an identity system?  I’ve written about that before.  Wouldn’t you think if a service was announced business would fly to it overseas?

Meanwhile, Amazon is coming to a sort of crossroads as well.  The traffic to Amazon Web Services exceeds the traffic to the rest of their properties combined.  This is no longer a remaindering strategy for unused MIPs as many VC’s I talked to late last year seemed to feel.  Amazon is now experiencing significant growth and scaling pains for the service.  EC2 just went down for about an hour for many customers.

This is both good news and bad news for Amazon.  The good news is that they’re learning how to keep these systems up and the others haven’t even started up that learning curve.  The bad news is it annoys customers mightily.

The other thing I watch Amazon for is signs they’ll offer anything with AWS that they didn’t already have to build for their core business.  The availability of something interesting and new would be a further signal that this is not just a remaindering business.  More importantly, it would be a further barrier to entry and exit around their valuable property.  As it stands, EC2, S3, and SimpleDB are pretty low level.  They do not represent big barriers.  All that is available in one form or another via Open Source to others who want to play.  Amazon’s expertise in billing and payment processing is more differentiated, but not compelling and as currently offered, very Amazon-centric.

Note to Werner Vogels:  it’s time to look for key innovations in AWS to build lock in while you continue to make the service more robust.

Note to others:  Time is running out.  Get in the game or move on.

Note to self:  Look for a dip and buy AMZN stock.

Related Articles

Google responded well to the challenges I set forth above with App Engine.  See my blog post for more details.  By focusing on language support instead of raw virtual machines, they’ve actually raised the bar in the sort of way I keep saying Amazon needs to above and below in the comments.  I stick to my 2 year prognosis.  If you aren’t a Big Player here within 2 years, the window will close.  What Google has done is raise the ante on what you must deliver to be in the poker game.

Posted in amazon, data center, ec2, platforms, saas, Web 2.0 | 13 Comments »

Amazon Raises the Cloud Platform Bar Again With DevPay

Posted by Bob Warfield on January 1, 2008

Wow, what an exciting time to be watching the Amazon Cloud Platform evolve.  We’re just beginning to think through the recent SimpleDB announcement when Amazon launches DevPay.  Lucid Era CEO Ken Rudin says land grabs are all about a race to the top of the mountain to plant your flag there first.  It seems like Amazon has hired a helicopter in the quest to get there first.  Google, Yahoo, and others are barely talking about their cloud platforms and here is Amazon with new developments piling up on each other.  And unlike some of the developments announced by companies like Google, this stuff is ready to go.  They’re not just talking about it.

What’s DevPay all about, anyway?  Simply put, Amazon are providing a service to automate your billing.  If you use their web services to offer a service of your own, it gives you the ability to let Amazon deal with billing for you.  It’s based off the pricing model for the rest of the Amazon Web Services like EC2 and S3, but you can use any combination of one-time charges, recurring monthly charges, and metered Amazon Web Service usage. You have total flexibility to price your applications either higher or lower than your AWS usage.  In addition, they’re promising to put everything they know about how to do e-commerce (and who knows more than Amazon?) behind making the user experience great for your customers and you.
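
To make the three charge types concrete, here is a small sketch of how a monthly bill might be assembled from them.  It is not the DevPay API: the PricePlan class, the monthly_bill_cents function, and every price in the usage note are hypothetical, chosen only to show a one-time charge, a recurring charge, and a marked-up metered charge adding up.

```python
from dataclasses import dataclass


@dataclass
class PricePlan:
    """Hypothetical plan combining the three charge types; all amounts in cents."""
    signup_fee_cents: int      # one-time charge
    monthly_fee_cents: int     # recurring monthly charge
    cents_per_gb_stored: int   # your own price for metered usage, marked up or down


def monthly_bill_cents(plan: PricePlan, gb_stored: int, first_month: bool) -> int:
    """Total for one month: recurring fee plus metered usage, plus signup if new."""
    total = plan.monthly_fee_cents + plan.cents_per_gb_stored * gb_stored
    if first_month:
        total += plan.signup_fee_cents
    return total
```

With an invented plan of a $10 signup fee, $5 a month, and 20 cents per stored gigabyte, a customer holding 10 GB would owe $17 in the first month and $7 in each month after that.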

It’s not a tremendously big step forward, but it’s useful.  It’s another brick in the wall.  There are companies out there providing SaaS infrastructure for whom billing is a big piece of their offering, so obviously it is a problem that people care about having solved.  What are the pros and cons of this particular approach?

Let’s start with the pros.  If you are going to use Amazon Web Services anyway, DevPay makes the process dead simple for you to get paid for your service.  It’s ideal for microISV’s as a way to monetize their creations.  The potential is there for interesting revenue that’s tied to usage in the classic SaaS way.

What about the cons?  Here there are many, depending on what sort of business you are in and how you want to be perceived by customers.  I break it down into two major concerns: flexibility and branding.  Let’s start with branding, which I think is the more important concern.  It’s not clear to me from the announcement how you would go about disassociating your offering from Amazon so that it becomes your stand alone brand.  You and your customers are going to have to acknowledge and accept that the offering you provide is part of the Amazon collective.  Resistance is futile.  This is the moral equivalent of not being able to accept a credit card directly, and instead having to refer customers to PayPal.  It works, but it detracts from your “big time” image.  If having a big time stand-alone image is important for you, DevPay is a non-starter at this stage.  It’s not clear to me that Amazon would have to keep it that way for all time, but perhaps they need to protect their own image as well, and would insist on it.

The second major problem is flexibility.  Yes, Amazon says you can “use any combination of one-time charges, recurring monthly charges, and metered Amazon Web Service usage”.  That sounds flexible, but it casts your business in light of what resources it consumes on Amazon.  Suppose you want a completely different metric?  Perhaps you have some other kind of expense, not well correlated with your Amazon usage, that has to be built in.  Perhaps you need to do something completely arbitrary.  It doesn’t look to me like Amazon can facilitate that at present.

Both of these limitations are things Amazon could choose to clean up.  So far, the impression one gets is that Amazon is just putting a pretty face on the considerable internal resources they’ve developed for their primary business and making them available.  What will be interesting is to see what happens when (and if) Amazon is prepared to add value in ways that never mattered to their core business.  Meanwhile, they’re doing a great job stealing a march on potential competition.  As a SaaS business, they should be quite sticky.  Anyone that writes for their platform will have a fair amount of work to back out and try another platform.  DevPay is another example.  It will create network lock-in by tying your customer’s business relationship in terms of billing and payment to Amazon, and in turn tying that to your use of Amazon Web Services.  For example, that same lack of flexibility might prevent you from migrating your S3 or EC2 usages to, say, Google.  There doesn’t look to be a way for you to build the Google costs into your billing in  a flexible way.

We’ll see the next 5 to 10 years be a rich period of innovation and transition to Cloud Computing Platforms.  Just as many of the original PC OS platforms disappeared (CP/M anyone?) after an initial flurry of activity, and others have changed radically in importance (it no longer matters whether you run PC or Mac does it?), so too will there be dramatic changes here.  The beneficiaries will be users as well as the platform vendors, but it’s going to take nimbleness and prescient thinking to place all your bets exactly right.  The good news is the cost of making a mistake is far less than it had been in the era of building your own datacenters!

Related Articles

To Rule the Clouds Takes Software: Why Amazon’s SimpleDB is a Huge Next Step

Coté’s Excellent Description of the Microsoft Web Rift

Posted in amazon, data center, ec2, grid, saas, strategy | 5 Comments »

What if Twitter Was Built on Amazon’s Cloud?

Posted by Bob Warfield on December 18, 2007

There was recent bellyaching in the blogosphere again about Twitter being down.  Dave Winer grumbles, “What other basic form of communication goes down for 12 hours at a time?”  There are various comments, and in the end, apparently it was about their moving ISP’s.  Twitter themselves had this to say:

Twitter is humming along now after a late night. Our team worked earnestly into the night and morning on our largest and most complex maintenance project ever. Everything went pretty much according to plan except for one thing: an incorrect switch.

The switch in question caps traffic an unacceptable level. In order to correct this, we’ll need to get some hardware installed. Unfortunately, that means we’re not done with our datacenter move just yet. This type of work can be frustrating but it’s all towards Twitter’s highest goal: reliability.

Such moves are never easy, they always include a hitch of some kind, and the Twitter customer base is hopelessly addicted to the medium so Twitter hears about it whenever they turn the thing off for any period of time.  I look at this and for me it’s just one more reason I wouldn’t want to own a datacenter.

Suppose your service, or maybe even Twitter, was built on Amazon’s Cloud or some other Utility Computing solution.  You don’t own the servers, you are renting them.  If loads go up, you can simply rent more in direct proportion to the loads and on 10 minutes notice.  A recent High Scalability article on scaling Twitter shows they don’t really have all that many servers:

  • 1 MySQL Server (one big 8 core box) and 1 slave. Slave is read only for statistics and reporting.
  • 8 Sun X4100s.

10 boxes, in other words.  Now it comes time to upgrade.  Much pain and frustration.  To do it well, and without interruption, they really need 2 complete copies of their infrastructure.  This way, they can prepare the new version and start cutting users over to it while leaving the old one running.  When everyone is over, the old system can be decommissioned.  For many startups, owning twice as much hardware as they use is just out of the question.  The more successful they become, the more expensive it becomes to entertain such a luxury.  Not so on a utility computing service like Amazon’s.  Purchase the use of twice as many servers for just as long as it takes to complete a successful upgrade, then cut them loose afterward.
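
The arithmetic behind renting a second copy of the infrastructure only for the duration of an upgrade is simple enough to sketch.  The function names are made up for illustration, and the hourly rate and purchase price in the usage note are invented figures, not Amazon's prices or Twitter's costs.

```python
def upgrade_rental_cost(servers: int, hourly_rate: float, upgrade_hours: float) -> float:
    """Cost of renting a parallel copy of the fleet only while the cutover runs."""
    return servers * hourly_rate * upgrade_hours


def owned_duplicate_cost(servers: int, price_per_server: float) -> float:
    """Cost of buying a second copy of the fleet outright (hardware only)."""
    return servers * price_per_server
```

Assuming, purely for illustration, 10 servers at $0.40 an hour for a 24 hour cutover, the rented duplicate comes to $96.  Buying 10 spare boxes at an assumed $3,000 each comes to $30,000, before power, space, and the people to look after them.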

    There are detractors to the Amazon approach out there, but do we really think it would make Twitter much less reliable?  What if it made it much more reliable?

    Here’s another thought that runs rampant:  how well would Amazon’s new SimpleDB work for a service like Twitter?  It seems tailor-made.  Certainly the notion of a “texty” database with up to 1024 characters per field seems like a fit.  It would be fascinating to see some of the Twitterati put up a Twitter clone on Amazon’s Web Services using SimpleDB just to see how well it works and how quickly it could be put together.  Given the platform and the requirements of the application, it seems like it would not be that hard to do the experiment.  It would certainly make for an interesting test of how well Amazon’s infrastructure really works.

    Posted in data center, ec2, grid, platforms, Web 2.0 | 2 Comments »

    To Rule the Clouds Takes Software: Why Amazon SimpleDB is a Huge Next Step

    Posted by Bob Warfield on December 15, 2007

    One Ring to rule them all, One Ring to find them,
    One Ring to bring them all and in the darkness bind them…

    J. R. R. Tolkien

    There is much interesting cloud-related news in the blogosphere.  Various pundits are sharing a back and forth on the potential for cloud centralization to result in just a very few datacenters and what that might mean.  The really big news is Amazon’s fascinating new addition to their cloud platform of SimpleDB.  Let’s talk about what it all means.

    Sun’s CTO, Greg Papadopoulos, has been predicting that the earth’s compute resources will resolve into about “five hyperscale, pan-global broadband computing services giants” — with Sun, in its version of this future scenario, the primary supplier of hardware and operations software to those giants. The last was channeled via Phil Wainewright, who goes on to ask, “What is it about a computing grid that’s inherently “more centralized” in nature?”  He feels that Nick Carr has missed the mark and swallowed Sun’s line hook, line, and sinker.  For his part, Carr’s only crime was to seize on a good story, because at the same dinner, another Sun executive, Subodh Bapat, was telling Carr that sometime soon a major datacenter failure would have “major national effects.”  The irony is positively juicy with Sun talking out both sides of their proverbial mouths.

    The tradeoff that Carr and Wainewright are worried about is one of economies of scale that favor centralization versus flexibility and resiliency that favors decentralization.  Where they differ is that Carr sees economies of scale winning in a world where IT matters less and less and Wainewright favors the superior architectural possibilities of decentralization.  Is datacenter centralization inexorable?  In a word, yes, but it may not boil down to just 5 data center owners, and it may take quite a while for the forces at work to finish this evolution.  The factors that determine who the eventual winners will be are also quite interesting, and have the potential to change a lot of landscapes that today are relatively isolated.  Let’s consider what the forces of centralization are.

    First, there is a huge migration of software underway to the cloud.  In other words, software that is never installed on your machine or in your company’s datacenter.  It resides in the cloud and comes to you via the browser.  Examples include SaaS on the business side and the vast armada of consumer Web 2.0 products such as Facebook.  No category is safe from this trend, not even traditional bastions as should be clear from the growing crop of Microsoft Office competitors that reside in the cloud.

    Second, this migration leads to centralization.  The mere act of building around a cloud architecture, even if it is a private cloud in your own company’s datacenter, leads to centralization.  After all, software is moving off your desktop and into that datacenter.  When many companies are aggregated into a single datacenter, into a SaaS multi-tenant architecture, for example, further centralization occurs.  When you offer a ubiquitous service to the masses, as is the case with something like Google, the requirements to deliver that can lead to some of the largest datacenter operations in the land. 

    Third, there are the afore-mentioned economies of scale.  Google has grown so large that it now builds its own special-purpose switches and servers to enable it to grow more cheaply.  The big web empires are all built on the notion of scaling out rather than scaling up, and they run on commodity hardware.  Because they have so many servers, automating their care and feeding has been baked into their DNA.  Not so with most corporate datacenters that are just beginning to see the fruits of crude generic technologies like virtualization that seek to be all things to all people.  Virtualization is a great next step for them, but there are bigger steps ahead yet that will further reduce costs.

    Fourth, the ultimate irony is that centralization begets centralization through network effects.  This is the story of the big consumer web properties.  Every person that joins a social network adds more value to the network than the prior person did.  The value of the network grows exponentially.  This connectedness is facilitated most easily in today’s world by centralization.  Vendors that start to get traction increase their network effects in various ways:  Amazon charges to bring data in and out of their cloud, but not to transfer between services within the cloud.

    Lastly, there are green considerations at work.  The biggest costs associated with datacenters these days are around electricity and cooling.  Microsoft is building a data center in Siberia, which is both cold and pretty central to Asia.  Consider this:  given the speed of light over a fiber connection, what is the cost of latency in having a data center somewhere far north (and cold) in Canada like Winnipeg versus far south (and hot) like Austin, Texas?  It’s 1349 miles, which, as the photon travels (186,000 miles per second) is about 7.2 milliseconds.  The world’s fastest hard drive, the nifty Mtron solid state disks I’m now coveting thanks to Engadget and Kevin Burton, can only write a paltry 80K or so bytes in that time:  not even enough for one photo at decent resolution.  So consider a ring of datacenter clusters built in colder regions.  Centralized computing is up north where the cold that computers like is nearly free for the asking: just open a window many days.  Or come closer.  Put it up on a mountain peak.  Immerse it near a hydro dam and get the juice cheaper too.  It doesn’t matter.  Laying fiber is pretty cheap compared to paying the energy bills.
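
The latency figure above is easy to check.  The sketch below reproduces the 7.2 millisecond number using the vacuum speed of light quoted in the text, and also shows the one-way delay under the assumption that light in glass fiber travels at roughly two thirds of that speed.  The 0.67 factor and the function name are assumptions added for illustration, and real routes add switching delay and are rarely straight lines.

```python
SPEED_OF_LIGHT_MILES_PER_S = 186_000.0  # vacuum figure quoted in the text
FIBER_VELOCITY_FACTOR = 0.67            # assumption: light in glass is about 2/3 c


def one_way_latency_ms(miles: float, velocity_factor: float = 1.0) -> float:
    """Propagation delay only, in milliseconds, for a straight-line path."""
    return miles / (SPEED_OF_LIGHT_MILES_PER_S * velocity_factor) * 1000.0


vacuum_ms = one_way_latency_ms(1349)                        # roughly 7.25 ms
fiber_ms = one_way_latency_ms(1349, FIBER_VELOCITY_FACTOR)  # roughly 10.8 ms
```

Either way the delay is on the order of 10 milliseconds each way, small next to the energy savings the paragraph is arguing for.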

    The next question is trickier: how do these clouds compete?  Eventually, they will become commoditized, and they will compete on price, but we are a long ways from that point.  At least 10 years or more.  Before that can happen, customers have to agree on what the essential feature sets are for this “product”.  I believe this is where software comes into play, and that should be a matter of great concern for the hosting providers of today whose expertise largely does not revolve around software as a way to add value.   As Eric Schmidt said (via Nick Carr) when he started saying Google would enter this market:

    For clouds to reach their potential, they should be nearly as easy to program and navigate as the Web. This, say analysts, should open up growing markets for cloud search and software tools—a natural business for Google and its competitors.

    Some will immediately react with, “Hold it a minute, what about the hardware?  What about the network?”  The best of the cloud architectures will commoditize those considerations away.  In fact, commoditization will start down at the bottom of the technology stack and work its way up.  The first stage of that, BTW, is already almost over.  That was the choice of CPU.  MIPS?  PowerPC?  SPARC?  No, Intel/AMD are the winners.  The others still exist (not all of them!), but they’ve peaked and are on their way down at various terminal velocities.  Their owners need to milk them for profit, but it would be a losing battle to invest there.  Even Macs now carry Intel inside, and Sun now carries the ticker symbol “JAVA”, a not-so-subtle hat tip to the importance of software.

    Hardware boxes are largely a dead issue too.  There is too little opportunity to differentiate for very long and the cpu’s dictate an awful lot of what must be done.  Dell is an assembler and marketer of the lowest cost components delivered just in time lest they devalue in inventory.  Sun still pushes package design, and it may have some relevance to centralization, but this will be commoditized because of centralization.

    Next up will be the operating system.  Again, we’re pretty far down the path of Linux.  Corporations still carry a lot of other things inside their firewalls, but the clouds will be populated almost exclusively with Linux, and we could already see that has happened if we could get reliable statistics on it.  Linux defines the base minimum of what a cloud offering has to provide:  utility computing instances running Linux.  This is exactly what Amazon’s EC2 offers.

    What else does the cloud need?  Reliable archival storage.  Again, Amazon offers this with S3.  Cloud consumers are adopting it in droves because it makes sense.  It’s a better deal than a raw disk array because it adds value versus that disk array for archival storage.  The value is in the form of resiliency and backup.  Put the data on S3 and forget about those problems.  This begins the commoditization of storage.  Is it any wonder that EMC bought VMware and that a software offering is now most of their market cap?  Hardware guys, put on your thinking caps, this will get much worse.  What software assets do you bring to the table?

    3Tera is a service I’ve talked about before that has a very similar offering available from multiple hosting partners of theirs.  They create a virtual SAN that you can backup and mirror at the click of a mouse.  They let you configure Linux instances to your heart’s content.  Others will follow.  IBM’s Blue Cloud offers much the same.  This collection is today’s blueprint for what the Cloud offers in terms of a platform.

    But, this platform is a moving target, and it will keep moving up the stack.  Amazon just announced another rung up with SimpleDB.  For most software that goes into the Cloud, once you have an OS and a file system, the next thing you want to see is a database.  Certainly when I attended the Amazon Startup Project, the availability of a robust database solution was the number one thing folks wanted to see Amazon bring out.  The GM of EC2 promised me that this was on the way and that there would be several announcements before the end of the year.  First we saw the availability of EC2 instances that had more memory, disk, and CPU, so that they’d make better database hosts.  SimpleDB is much more ambitious.  It’s a replacement for the conventional database as embodied in products like MySQL and Oracle, and it was designed from the ground up to live in a cloud computing world.  At one stroke it solves a lot of very interesting problems that used to challenge would-be EC2 users around the database.

    Along the lines of my list of factors that drive data center centralization, Phil Windley says the economics are impossible to stop.  Scoble asks whether MySQL, Oracle, and SQL Server are dead:

    Since Amazon brought out its S3 storage service, I’ve seen many many startups give up data centers altogether.

    Tell me why the same thing won’t happen here.

    There is no doubt in my mind that all startups will give up having datacenters altogether before this ends.  However, before we get too far ahead of ourselves in assuming that SimpleDB gives us that opportunity, let’s drop back and consider what its limitations are:

    – It is similar to a relational database, but there are significant differences.  Code will have to be reworked to run there, even if it doesn’t run afoul of the other issues.

    – Latency is a problem when your database is in another datacenter from the rest of your code.  Don MacAskill brings this one up, and all I can say is that this is another network effect that leads to more centralization.  If you like SimpleDB, it’s another reason to bring all of your code inside Amazon’s cloud.

    – All fields are strings, and they are limited to 1024 characters.  Savvy developers can use the 1024 characters to point to files of unlimited size on S3, or use other methods like combining fields to get around this limit.  Mind you, a lot can be done with that, but it is again a difference from traditional RDBMS systems and it means more work for developers who must overcome the limitation.

    – There are no joins.  If you want them (and many proponents of hugely scalable sites view joins as evil), you have to roll your own.

    – Transactions and consistency are also absent.  Reads are not guaranteed to be fully up to date with writes.

    – There is no indexing to define or tune (SimpleDB indexes attributes for you), and a whole host of other trappings that database aficionados have gotten comfortable with are missing too.
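
    To make a few of these workarounds concrete, here is a minimal sketch in Python.  It does not call SimpleDB or S3 at all: plain dictionaries stand in for both, and every name in it (encode_int, store_value, load_value, hash_join, the "blob:" prefix) is my own invention for illustration.  It shows three things from the list above: padding numbers so that string comparison sorts them correctly, spilling an oversized value to a blob store and keeping only a pointer in the field, and rolling your own join on the client.

```python
# Hypothetical in-memory stand-ins; no real AWS calls are made here.
MAX_VALUE_LEN = 1024  # per-field limit described in the post

blob_store = {}  # stands in for S3: key -> large payload


def encode_int(n, width=10):
    """Zero-pad a non-negative int so string order matches numeric order."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return str(n).zfill(width)


def store_value(key, text):
    """Return the string to keep in the field, spilling long text to the blob store."""
    if len(text) <= MAX_VALUE_LEN:
        return text
    blob_store[key] = text
    return "blob:" + key  # the field now holds a pointer, not the data


def load_value(field):
    """Resolve a field to its full text, following a pointer if present.

    Assumes ordinary values never start with "blob:".
    """
    if field.startswith("blob:"):
        return blob_store[field[len("blob:"):]]
    return field


def hash_join(left, right, attr):
    """Client-side equi-join of two lists of string-valued items on attr."""
    index = {}
    for item in right:
        index.setdefault(item[attr], []).append(item)
    joined = []
    for item in left:
        for match in index.get(item[attr], []):
            joined.append({**match, **item})
    return joined


customers = [{"customer_id": "c1", "name": "Acme"},
             {"customer_id": "c2", "name": "Globex"}]
orders = [{"order_id": "o1", "customer_id": "c1", "total": encode_int(250)},
          {"order_id": "o2", "customer_id": "c1", "total": encode_int(1200)},
          {"order_id": "o3", "customer_id": "c2", "total": encode_int(75)}]

rows = hash_join(orders, customers, "customer_id")
# String comparison works here only because both sides are zero-padded.
big_orders = [r["order_id"] for r in rows if r["total"] >= encode_int(1000)]
notes_field = store_value("order-o2-notes", "x" * 5000)
```

    None of this is hard, but it is all code you would not have written against a conventional database, which is the point of the list.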

    Mind you, serious web software is created within these limitations including some at Amazon itself.  In exchange for living with them, you get massively scalable database access at good performance and very cheaply.  And, as TechCrunch says, you may be able to get rid of one of the highest cost IT operations jobs around, database administration, making your costs even lower.  Remember my analysis that shows SaaS vendors need to achieve 16:1 operations cost advantages over conventional software and you can see this is a big step in that direction already.

    There is no doubt that cloud computing will be massively disruptive, and that Amazon is well on its way in the race to plant its flag at the top of the mountain.  The pace of progress for Amazon Web Services has been blistering this year, and much more hype-free than what we’ve gotten from the likes of Google and Facebook when it comes to platform speak.  It’s almost odd that we haven’t heard more from these other players, and especially from the likes of Google.  GigaOm says that SimpleDB completes the Amazon Web Services Trifecta.  They go on to say that Amazon’s announcements have the feel of a well thought out long term strategy, while Google’s make it sound like an ad hoc grab bag of tools.  I think that’s true, and perhaps reflective of Google’s culture, which is hugely decentralized to the point of giving developers 20% free time to work on projects of their choosing.  The problem is that such a culture can more easily give us a grab bag of applications, as Google has, than it can provide a well-designed platform, as Amazon has.  Or, as Mathew Ingram puts it, while everyone else was talking about it, Amazon went ahead and did it.

    I’ve talked to a dozen or so startups that are eagerly working with the Amazon Web Services and having great success, as well as some frustrations.  They require rethinking the old ways.  Integrity issues are particularly different in this brave new world, as are issues of latency.  That matters to how a lot of folks think about their applications.  Because of the learning curve, I don’t plan to go out and short Oracle immediately, but the sand has started running in the hourglass.  There will be more layers added to the cloud, and over time it will become harder and harder to ignore.  There will be economic advantage to those who embrace the new ways, and penalties for those who don’t.  This is a bet-your-business drama that’s unfolding, make no mistake.  At the very least, you need to get yourself educated about what these kinds of services offer and what they mean for application architecture.

    Business located low in the stack I’ve mentioned will be hit hard if they don’t have a strategy to embrace and win a piece of the cloud computing New Deal.  We’re talking hardware manufacturers like Sun, Dell, IBM, and HP.  Software infrastructure comes next.  Applications that depend on low cost delivery, aka SaaS, are also very much in the crosshairs, although probably at a slightly later date.

    Welcome to the brave new world of utility cloud computing.  Long live the server, the server is dead!

    Related Articles

    Amazon Raises the Cloud Platform Bar Again With DevPay

    Coté’s Excellent Description of the Microsoft Web Rift :  Nice post on cloud computing at Microsoft

    Posted in amazon, data center, ec2, grid, platforms, saas, Web 2.0 | 10 Comments »

    Cloud Computing in Someone Else’s Cloud: The Future

    Posted by Bob Warfield on November 16, 2007

    Ever hear of a fabless chip company?  This is a company that sells Integrated Circuits but owns no manufacturing facilities.  They just write software, in effect, and send it out to someone else’s fab.  Brilliant.  Many kinds of manufacturers do the same.  After all, manufacturing may not be the distinctive competency of a company, or the company may achieve better economies of scale by using centralized manufacturing owned by much larger companies.

    This is starting to happen big time with web software.  IBM just announced they’re going to join Amazon in the cloud computing business with “Blue Cloud”.  Companies will be able to buy capacity in someone else’s cloud which they sell as their own.  No need to own any hardware or even visit a colo center.  Why would you want to own a datacenter if you didn’t have to?  Why would you think you can do it as well as Amazon or IBM?  Many others including Yahoo, Google, and Microsoft will be a part of this future.  Sun is already there with Sun Grid. 

    So far, the formulas are pretty similar.  IBM and Amazon are both Linux-based systems built on virtualization software.  At some point, if enough hardware capacity is locked up in these rental data centers, it will become an important sales channel for all server hardware manufacturers.  Take Dell for example.  They’ve always sold direct.  Shouldn’t they consider this kind of business, especially when other hardware companies are going there?  What about HP?  Look at it as a way for hardware makers to switch from the equivalent of perpetual licensing to the SaaS rental model.

    What about Microsoft?  Can .NET be as successful if they don’t build a Cloud Computing Service that is .NET based?  Seems to me this is a strategic imperative for the OS crowd lest Linux steal the show.  Sun is already there with Solaris on Sun Grid.  This is the system my old alma mater Callidus Software uses to host their SaaS solution and it works well.  IBM is not missing the chance to offer PowerPC as well as x86 servers for Blue Cloud.  IBM is also partnering with Google around Cloud Computing, so there may be all sorts of interesting bedfellows before this new paradigm is done rolling out.

    A great example that’s being written about by Scoble and others is Mogulus.  CEO Max Haot says they don’t own a single server, it’s all being done on Amazon, and yet they’re serving live video channels to 15,000 people with just over $1M in funding.  You’ve got to love it!  A number of other serverless and near serverless companies commented on Scoble’s post if you want to see more.  These big guys are not the only ones in the business.  Certainly companies like OpSource and Rackspace count too. 

    There are many potential advantages, and a few pitfalls.  First the advantages: it’s a whole lot easier and cheaper to build out your infrastructure this way.  Why have anything to do with touching or owning any real hardware?  How does that add value to your business?  The real innovators will make it easy to flex your capacity and add more servers on extremely short notice.  Take a look at your average graph of web activity:

    [Chart: CNN Traffic]

    This is traffic for CNN.  Notice how spiky it is?  Those are some big spikes.  If your web service hits one, you must either have a ton of extra servers on tap, or deal with your site getting painfully slow or going down altogether.  With a utility computing or grid service such as Amazon EC2, you can provision new servers on 10 minutes’ notice, use them until the load goes away, and then quit paying for them.  Payment is in 1 hour increments.

    I know a SaaS vendor whose load doubles predictably one week out of every month because of what his app does.  He owns twice the servers to handle this peak.  He’s growing fast enough at the moment that he doesn’t sweat it much, but at some point, he could really benefit by flexing capacity.
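
    A rough back-of-the-envelope version of that comparison, as a Python sketch.  The 30-day month, the fleet of 10 servers, and the function names are my assumptions for illustration; the doubling for one week a month and the 1 hour billing increments come from the discussion above.  It compares instance-hours only, since an hour on a server you own and an hour on a rented one do not cost the same.

```python
import math

HOURS_PER_MONTH = 720   # assumption: 30-day month
PEAK_HOURS = 7 * 24     # the one busy week


def billed_hours(duration_minutes):
    """Round a usage period up to whole hours, as with 1-hour billing increments."""
    return math.ceil(duration_minutes / 60)


def fixed_instance_hours(baseline_servers, peak_multiplier=2):
    """Instance-hours when enough servers for the peak run all month."""
    return baseline_servers * peak_multiplier * HOURS_PER_MONTH


def flexed_instance_hours(baseline_servers, peak_multiplier=2):
    """Instance-hours when the extra servers run only during the peak week."""
    extra = baseline_servers * (peak_multiplier - 1)
    return baseline_servers * HOURS_PER_MONTH + extra * PEAK_HOURS


baseline = 10  # hypothetical fleet size
fixed = fixed_instance_hours(baseline)      # 10 * 2 * 720 = 14400
flexed = flexed_instance_hours(baseline)    # 10 * 720 + 10 * 168 = 8880
savings = 1 - flexed / fixed                # roughly 0.38

# A short spike: 5 extra servers for 90 minutes are billed 2 hours each.
spike_instance_hours = 5 * billed_hours(90)
```

    Under these assumptions, flexing uses a bit under two thirds of the instance-hours that provisioning for the peak does.  Whether that turns into real savings depends on what the rented hours cost relative to owned ones.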

    Now let’s talk about downsides.  First, most software doesn’t just run unchanged on these utility grids.  Even if it did, most software isn’t written to dynamically vary its use of servers.  Adding servers requires some manual rejiggering.  Amazon has a particularly difficult pitfall: you have to write your software to deal with a server going down without warning and losing all its data.  In fairness, you should have written your software to handle that anyway because it could happen that your whole machine is toast, but most companies don’t start out writing software that way.  There are companies, Elastra is one, that purport to have solutions to these problems.  Elastra has a MySQL solution that uses Amazon’s fabulously bulletproof S3 as its file system.
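
    What does “deal with a server going down and losing all its data” look like in practice?  One common pattern is to checkpoint progress somewhere off the instance and have a replacement pick up from the last checkpoint.  Here is a minimal Python sketch of that idea.  It is not Elastra’s approach or anything Amazon ships: DurableStore is a hypothetical stand-in for off-instance storage (a temporary directory here, where you would use something like S3), and Worker is an invented toy job runner.

```python
import json
import os
import tempfile


class DurableStore:
    """Stand-in for off-instance storage; here it is just a directory."""

    def __init__(self, root):
        self.root = root

    def put(self, key, data):
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        path = os.path.join(self.root, key)
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)

    def get(self, key, default=None):
        path = os.path.join(self.root, key)
        if not os.path.exists(path):
            return default
        with open(path) as f:
            return json.load(f)


class Worker:
    """Sums a list of jobs, checkpointing progress after each one."""

    def __init__(self, store):
        self.store = store
        state = store.get("checkpoint.json", {"next_job": 0, "total": 0})
        self.next_job = state["next_job"]
        self.total = state["total"]

    def run(self, jobs, crash_after=None):
        while self.next_job < len(jobs):
            if crash_after is not None and self.next_job == crash_after:
                raise RuntimeError("simulated instance loss")
            self.total += jobs[self.next_job]
            self.next_job += 1
            self.store.put("checkpoint.json",
                           {"next_job": self.next_job, "total": self.total})
        return self.total


jobs = [5, 10, 15, 20]
store = DurableStore(tempfile.mkdtemp())

try:
    Worker(store).run(jobs, crash_after=2)   # first instance dies mid-run
except RuntimeError:
    pass

result = Worker(store).run(jobs)             # replacement resumes from the checkpoint
```

    The replacement worker never sees the first one’s memory or local disk; everything it needs comes from the store.  Real systems also have to worry about work that was done but not yet checkpointed, which this toy ignores.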

    The second issue isn’t so much a downside really.  We can’t blame these services for it at any rate.  What I’m talking about is automation.  To really take advantage here you need to radically increase your automation levels.  I recently saw a demo of some new 3Tera capabilities that I’ll be writing about that help a lot here.

    The bottom line?  You’re missing out if you’re not exploring utility computing: it can save you a bundle and make life a lot easier.  The subtext is that there are also a lot of new technologies, vendors, and partnerships coming down the pipe to help maximize the benefits.

    Related Articles

    Nick Carr picks up the theme.  One of the commenters raises an excellent point.  Using an IBM or Amazon gives peace of mind to customers of small startups.

    Posted in business, data center, ec2, grid, saas, strategy, Web 2.0 | 3 Comments »

    Amazon Beefs Up EC2 With New Options

    Posted by Bob Warfield on October 16, 2007

    I’ve been a big fan of Amazon’s Web Services for quite a while and attended their Startup Project, which is an afternoon seeing what it can do and hearing from entrepreneurs who’ve built on this utility computing fabric.  Read my writeup on the Startup Project for more.  Amazon has been steadily rolling out improvements, such as the addition of SLAs for the S3 storage service.  Today, there is big news in the Amazon EC2 camp:

    Amazon has just announced two new instance types for their EC2 utility computing service.  The original type will continue to be available as the “small” type.  The “large” type has four times the CPU, RAM, and Disk Storage, while the “extra large” has eight times the CPU, RAM, and Disk.  The large and extra large also sport 64-bit CPUs.  Supersize your EC2!

    Why do this?  Because the original small instance was a tad lightweight for database activity with just 1.7GB of RAM while the extra large at 15GB is about right.  Imagine a cluster of the extra large instances running memcached and you can see how this is going to dramatically improve the possibilities for hosting large sites.

    One of the neat things about this new announcement is pricing.  They’ve basically linearly scaled pricing.  Whereas a small instance costs 10 cents per instance hour, the extra large has 8x the capacity and costs 8×10 cents or 80 cents per hour.
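
    The arithmetic is simple enough to put in a few lines of Python.  This is only a sketch of the linear scaling described above: the 10 cent small price and the 4x and 8x multipliers come from the announcement as I’ve summarized it, while the 40 cent figure it produces for the large instance is what linear scaling implies rather than a number quoted here, and the 720 hour month is my assumption.

```python
SMALL_PRICE_PER_HOUR = 0.10  # dollars per instance-hour

# Capacity multipliers relative to the small instance.
CAPACITY_MULTIPLIER = {"small": 1, "large": 4, "extra large": 8}


def hourly_price(instance_type):
    """Price under strictly linear scaling: multiplier times the small price."""
    return round(CAPACITY_MULTIPLIER[instance_type] * SMALL_PRICE_PER_HOUR, 2)


def monthly_cost(instance_type, count=1, hours=720):
    """Cost of running `count` instances for `hours` hours (720 = 30-day month)."""
    return round(hourly_price(instance_type) * count * hours, 2)


one_extra_large = monthly_cost("extra large")    # 0.80 * 720
eight_small = monthly_cost("small", count=8)     # 8 * 0.10 * 720
```

    With linear pricing, one extra large costs the same per month as eight smalls, so the choice between them comes down to what your software needs (one big memory space for a database, say) rather than price.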

    What’s next?  These new instances open a lot of possibilities, but Amazon still doesn’t have painless persistence for databases like MySQL.  If you are running MySQL on an extra large instance and the server goes down for whatever reason, all the data on it is lost and you have to rebuild a new machine around some form of hot backup or failover.  That exercise has been left to the user.  It’s doable: you have to solve the problem in any data center of what you plan to do if the disk totally crashes and no data can be recovered.  However, folks have been vocally requesting a better solution from Amazon where the data doesn’t go away and the machine can be rebooted intact.  I was told by the EC2 folks at the Startup Project to expect 3 announcements before the end of the year that were related.  I’m guessing this is the first such announcement and two more will follow.

    There’s tremendous excitement right now around these kinds of offerings.  They virtualize the data center to reduce the cost and complexity of setting up the infrastructure to do web software.  They allow you to flex capacity up or down and pay as you go.  Amazon is not the only such option.  I’ll be reporting on some others shortly.  It’s hard to see how it makes sense to build your own data center without the aid of one of these services any more. 

    Posted in amazon, ec2, grid, multicore, platforms, saas, software development, Web 2.0 | 2 Comments »
