SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for May 25th, 2009

10 Things You Don’t Need to Do In the Clouds

Posted by Bob Warfield on May 25, 2009

Sometimes a breakthrough paradigm shift eliminates the need for all kinds of things.  Word processors and laser printers killed off a lot of once-thriving products, including typewriters, Liquid Paper, and Linotype machines.  So it is with the Cloud.  When I chat with my Director of Operations at Helpstream, we’re always chuckling about how much better life is in the Amazon Cloud for our company.

As I read through unread blog posts with Google Reader, I’m going to note 10 things we don’t need to worry about since we’re in the Cloud:

1.  NetApp’s new DataDomain data de-duping product.  NetApp bought a company with a cool technology.  Plug it in place of your tape backups and you can back up to hard disk, because this thing eliminates redundant data.  It is sort of a very backup-savvy compression algorithm.  But if you’re in the Cloud, who cares?  Your Cloud vendor worries about this stuff.  You just buy storage by the gigabyte, as much as you like, and do whatever.  With S3 and especially Elastic Block Store, backup already looks like a hard disk.  This is one whole chunk of cost and complexity you can safely ignore, because it just doesn’t matter to you and you couldn’t install it in your vendor’s Cloud if it did.

2.  Server power consumption.  It’s out of your hands.  Sure, there are really cool new technologies, like Dell’s Fortuna server that is the size of a hard disk and uses 20-30W.  But it doesn’t matter.  You aren’t choosing the servers in your Cloud.  The good news is that any really large scale Cloud vendor like Amazon will be choosing servers with great performance per watt, because it lowers their cost basis.  If they’re selling a commodity, like EC2, they’ll have to pass those savings on to you too.  Best of all, you can feel good about these being greener solutions than you’re likely to have the expertise to create in your own data center.

3.  Worrying about big iron or little iron (or little big iron, where a proprietary CPU sits in a small chassis).  Should I run the best servers Sun (or some other Big Iron vendor) can provide?  Or should I just run lots of little commodity “Lintel” (Linux + Intel/AMD) boxes?  Quit worrying about it, because you can’t affect this.  In all likelihood your Cloud vendor has Lintel.  You have no idea which hardware brand they use, so you can quit caring about that too.  All those specs, which rack form factor, yada, yada, just don’t matter any more.  You have a handful of virtual machines you can choose from.  There are relatively few specifications to focus on for those virtual machines.  Someone else has probably already figured out how to set up memcached or whatever on those machines and how to optimize the software for that footprint.  You should certainly try some experiments because your software may be different, but the search space is sharply limited.  That’s a good thing, isn’t it?  Now you can focus instead of poring over a gillion spec sheets outer joined to a gillion different purchase deals.

4.  Worrying about MIPS in general.  As Om Malik so correctly points out, it’s the megabits (of connectivity), not the MIPS, that count these days.  We haven’t been able to get more MIPS like we used to for a while, because of the multicore crisis.  Sure, we get more cores, but we don’t get faster clock speeds.  Everyone is ooohing and aaaahing that the iPhone will get a 1.5x faster CPU.  Does anyone remember back when you got a PC twice as fast every 18 months?  They never felt twice as fast.  Most of the time you could only tell if you went back to the slower machine, which seemed sooooo slooooow.  People will hardly notice the faster iPhone, unless they go back to an older one.  Meanwhile, those in the clouds can get all the MIPS they want, provided they’re ready to use elastically scaled cores loosely coupled over a LAN.

5.  Wholesale bandwidth costs.  Why worry about it if all your data is in the Cloud?  All you care about is how fast an individual browser can access that Cloud.  Granted, a big office requires a fair bit of bandwidth, but nothing like a data center.  Moreover, your Cloud vendor probably has multiple data centers in multiple geographies as well as CDN capabilities, so you are now geographically distributed in terms of connectivity.

6.  Which load balancing box to buy.  Forget about it.  Your Cloud vendor does this for you, and even if they didn’t, you’d have to use software because you don’t get to install any custom hardware in their Cloud.  Now that Amazon offers load balancing as a service in their Cloud, all you need to think about is how to use it with your application.  Life gets simpler and more focused again.

7.  Hardware monitoring.  Amazon’s new CloudWatch service tracks all the usual low-level monitoring (CPU load, disk I/O, network I/O, and so on) at one-minute intervals.  The data is kept around for two weeks.  This is all stuff you’d have to monitor somehow.  You’d have to find some monitoring software, install it, learn how to use it, yada, yada.  With CloudWatch, you just have to learn to use what’s already there.  Amazon had to get this and a lot of other things to work just to have a Cloud.  You get a handy assist from that.  People who want to compare Amazon on a raw server cost basis never look at these kinds of costs.
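To get a feel for how much data that is, here is a back-of-the-envelope sketch in Python.  The one-minute interval and two-week retention come from the description above; the function name and the five-metrics-per-instance figure are illustrative assumptions of mine, not anything from Amazon’s API.

```python
def datapoints_retained(interval_minutes: int = 1, retention_days: int = 14) -> int:
    """Samples kept per metric for a given sampling interval and retention window."""
    minutes_retained = retention_days * 24 * 60
    return minutes_retained // interval_minutes


# One metric sampled every minute and kept for two weeks.
per_metric = datapoints_retained()
print(per_metric)  # 20160

# Hypothetical: assume five tracked metrics per instance.
metrics_per_instance = 5
print(per_metric * metrics_per_instance)  # 100800
```

That is roughly a hundred thousand samples per server that somebody else collects and stores for you.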

8.  Creating multiple data centers for redundancy and for multiple geographies.  Werner Vogels, Amazon’s CTO, makes it sound so simple:

The Amazon Elastic Compute Cloud (Amazon EC2) embodies much of what makes infrastructure as a service such a powerful technology; it enables our customers to build secure, fault-tolerant applications that can scale up and down with demand, at low cost. Core in achieving these levels of efficiency and fault-tolerance is the ability to acquire and release compute resources in a matter of minutes, and in different Availability Zones.

Elastic availability of compute resources in multiple different Availability Zones (e.g. datacenters) in a matter of minutes?  First, it’s impossible for small companies to afford multiple redundant data centers.  They all have to reach a certain scale before they can even think about that.  The Cloud levels that playing field so anyone and everyone can afford it from day one.  Just the sanity of having your data backed up to S3 with multiple copies in different physical locations is wonderful.  Second, even when you reach the size of being able to afford multiple data centers, it is a hugely expensive and complex undertaking.  Why would you ever want to deal with this if you didn’t have to?
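As a rough illustration of why spreading across Availability Zones matters, here is a minimal Python sketch that places instances round-robin across zones and checks how much capacity survives if one zone goes dark.  The zone names and helper functions are hypothetical, and this is plain arithmetic, not a call into any Amazon API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin so no single zone holds everything."""
    if not zones:
        raise ValueError("need at least one zone")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_capacity(placement, failed_zone):
    """Fraction of instances still running if one zone fails."""
    if not placement:
        return 0.0
    survivors = sum(1 for zone in placement if zone != failed_zone)
    return survivors / len(placement)


zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
placement = spread_across_zones(6, zones)
print(Counter(placement))
print(surviving_capacity(placement, "zone-a"))
```

With six instances over three zones, losing any one zone still leaves two thirds of your capacity, which is the kind of redundancy a small company could never buy with its own data centers.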

9.  Exactly how to configure complex software like MySQL for my particular server instances.  Most of the Clouds have libraries of machine images where somebody else (hopefully even the vendor who made the software) has set it all up, blessed it, snapshotted the image, and made it available.  Launch that image on an EC2 virtual server and away you go with something you know works.  Even if you are not on Amazon and don’t have Amazon Machine Images like that, other clouds have these options too.  3Tera, for example, builds software for Cloud Owners and has what they call their Enterprise App Store.  These are pre-configured and ready-to-run instances.

10.  Worrying that your engineers are spending valuable time on infrastructure and, worse, physically visiting that infrastructure instead of doing something that gives your company a distinct competitive advantage.  Why build a datacenter if everyone else has one?  Let them make that investment while you invest elsewhere.  Werner Vogels gives a great example that is appropriate since the Indy 500 just ran Sunday.  Their site has a unique problem.  It requires a huge amount of resources to deliver a rich user experience:  multiple video streams including views from the cockpits of drivers’ cars with audio feeds and telemetry.  The challenge, as Vogels puts it, was that it isn’t used very frequently:

This is a high load application but it only runs three times a year. They found that they had to move a lot of engineers into data centers to keep their servers up. When they moved to cloud infrastructure they made 75% cost savings, the majority of which was on the people side; now they can manage everything from their armchair at home.
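To see why a site that only runs hot a few times a year is such a good fit, here is a rough Python sketch comparing always-on server hours with paying only during events.  Every number in it (server count, hourly rate, event length) is a made-up assumption for illustration, and it covers only server hours.  It does not model the people costs, which the quote says were the majority of the real savings.

```python
HOURS_PER_YEAR = 365 * 24


def always_on_cost(servers, rate_per_server_hour):
    """Cost of keeping peak-sized capacity running all year."""
    return servers * rate_per_server_hour * HOURS_PER_YEAR


def on_demand_cost(events_per_year, hours_per_event, servers, rate_per_server_hour):
    """Cost of renting the same capacity only while events are running."""
    return events_per_year * hours_per_event * servers * rate_per_server_hour


def savings_fraction(baseline, alternative):
    """Share of the baseline cost that the alternative avoids."""
    return 1 - alternative / baseline


# Hypothetical inputs: 3 events a year, 48 hours each, 100 servers at $0.10 per hour.
baseline = always_on_cost(servers=100, rate_per_server_hour=0.10)
elastic = on_demand_cost(events_per_year=3, hours_per_event=48,
                         servers=100, rate_per_server_hour=0.10)
print(f"always on: ${baseline:,.0f}  on demand: ${elastic:,.0f}")
print(f"server-hour savings: {savings_fraction(baseline, elastic):.0%}")
```

The exact figures don’t matter.  The point is that capacity you need for only a few days a year is capacity you shouldn’t be paying for the rest of the time.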

So there you have it.  10 things you don’t have to deal with if your data center is in the Cloud.  These are 10 things based on the pseudo-random collection of blog posts in my Google Reader RSS feeds.  There are many more out there, and I’m not even going to claim these are the 10 most important things.

Don’t you need fewer things to worry about so you can focus on what actually makes the difference?

Posted in amazon, cloud, data center
