SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘multicore’ Category

How Moore’s Law Put Apple in the Driver’s Seat and Cost Steve Ballmer His Job

Posted by Bob Warfield on January 24, 2014

With the Mac’s 30th anniversary, lots of folks are writing all sorts of articles about it, so I thought it only fitting to bring up my own thoughts on what happened and how Apple wrested control away from Microsoft.  It’s not a theory I have seen anywhere else, but it’s the one that makes the most sense to me.

Recently, I spent the afternoon upgrading my PC.  I added 2 higher capacity SSD disks, a new graphics card, and a new power supply.  I had planned to add a CPU with more cores, but I couldn’t find one and frankly, I didn’t look all that hard because I knew it wasn’t going to matter very much.

Upgrading my PC is something I used to do like clockwork every 2 years.  I looked forward to it and always enjoyed the results–my computer would be at least 2X faster.  While it didn’t always feel 2X faster, the previous machine (when I still had access to it or one just like it) always felt a lot more than 2X slower.  Life was good in the upgrade heyday for the likes of Microsoft and Intel.  Steve Jobs was this idiosyncratic guy who made cool machines that you couldn’t upgrade easily.  Everyone knew Microsoft had stolen a lot of Apple’s ideas but it was okay, because heck, Apple stole a lot of ideas from places like Xerox PARC.  There were Mac users, but they were a tiny minority, so tiny that Jobs was actually fired from his own company at one point.

Fast forward to my recent upgrade experience.  I hadn’t done an upgrade in 5 years, didn’t feel like I had missed much, and didn’t spend nearly as much money on the upgrade as I had in times past.  The upgrade before that one was probably another 3 or 4 years earlier still, and it was largely motivated by a defective hard disk, so I’m not even sure it counts.

Times have sure changed for Intel, Microsoft, and Apple too.  Apple is now the World’s Most Amazing company.  Microsoft is in the dumper, Steve Ballmer has lost his job, and Intel just announced they’re laying off another 5000 people.

What happened?

People will say, “That Steve Jobs was just so brilliant, he invented all these new products around music, telephones, and tablets, that nobody wants PC’s any more.”  In other words, Apple out-innovated and out-Industrial Designed Microsoft.  They even changed the game so it isn’t about PC’s any more–it’s all about Mobile now.  We’re firmly in the Post-PC Era goes the buzz.  VC’s are in a rush to invest in Mobile.  It’s Mobile First, Mobile is Eating the World, mobile, mobile, mobile, yada, yada, yada.

But I don’t know anyone who has quit using their PC’s.  Quit upgrading?  Absolutely!  Putting a lot of time on their mobile devices?  Yup.  But quit using PC’s?  No.  Absolutely not.  There are many, many apps people use almost exclusively on PC’s.  These are the apps that create content, not just consume it.  One could argue they are the ones that add the most value, though they are not necessarily the ones that get the majority of our time.  Some people are totally online with Office-style apps, but they still much prefer them on their PC’s–no decent keyboard on a tablet or phone.  Bigger screens are better for spreadsheets–you can never see enough cells on the darned things.  And most are still using Microsoft Office apps installed on their PC’s.  CADCAM, which is my day job, is totally focused on desktops and maybe laptops.  Graphic Design?  Photoshop on a PC (well, a Mac, and probably a laptop, but they sure don’t want to give up the big gorgeous monitor on the desk much).  Accounting and Bookkeeping?  That’s my wife’s daily work–QuickBooks.  Enterprise Software?  Yeah sure, they’ve got mobile apps, but mostly they’re desktop.  Did people unplug all the desktop clients?  No, not even close.  They simply killed the 2 year upgrade cycle.

People will say Microsoft was just too slow, copied without ever innovating, and missed all the key trends.  There is no doubt that all those things were true as well.  But think about it.  Apple has always been great at Industrial Design and Innovation.  Microsoft has always been slow and missed key trends.  Remember the old adage that it takes Microsoft 3 releases to ship a decent product?  That’s been true their entire history.  Something had to be different for these two companies and their relationship to the market.  Something had to fundamentally change.

What’s wrong with Microsoft and Intel has little to do with people quitting their use of PC’s and switching over to Mobile.  It’s not a case of choose one, it is a case of, “I want all of the above.”  There are essentially three things that have happened to Microsoft and Apple on the desktop:

#1 – People stopped upgrading every two years because there was no longer a good reason to do so.

#2 – People who wanted a gadget fix got a whole raft of cool phones and tablets to play with instead of upgrading their PC’s, and Microsoft botched their entry into the mobile market.

#3 –  People who wouldn’t consider spending so much money on a computer that couldn’t be upgraded when it would be clearly obsolete in 2 years suddenly discovered their computer wasn’t obsolete even after 5 years.  So they decided to invest in something new:  Industrial Design.  I can afford to pay for fruit on my machine, just like I used to pay for polo players on my shirts back in the Yuppie Age (I like cheap T-shirts now).  It’s the age old siren’s call:  I can be somebody cool because of a label.

#1 was an unmitigated disaster for Microsoft, and the carnage continues today.  #2 was a botched opportunity for Microsoft that they may very well be too late to salvage, and it created a huge entrée for Apple.  #3 cemented Apple’s advantage by letting them sell high dollar PC’s largely on the basis of Industrial Design.

That’s the desktop PC market.  The server market has been equally painful for Microsoft, but we’ll keep that one simple since Apple doesn’t really play there.  Suffice to say that Open Source, the Cloud, and Moore’s Law did their job there too.  The short story is that there is still a certain amount of #1 in the server market, because machines no longer get that much faster with each Moore’s Law Cycle.  They do get more cores, but that largely favors Cloud operations, which have the easiest time making use of an endless supply of cores.  Unfortunately, the Cloud is hugely driven by economics and doesn’t want to pay MSFT for OS software licenses if it can install Open Source Unix.  Plus, the Cloud vendors negotiate huge volume discounts.  They are toe to toe and nose to nose with Microsoft.  So to those first 3 problems, we can add #4 for Microsoft’s server market:

#4 –  Open Source and the Cloud have made it hard, if not impossible, for Microsoft to succeed in the server world.

Why did people quit upgrading?

Simply put, Moore’s Law let them down.  In fairness to Gordon Moore, all he really said was that the number of transistors would double every 2 years, and that law continues in force.  But people came to assume that meant computers would be twice as fast every 2 years, and that assumption has come to a bitter end for most kinds of software.

If you want to understand exactly when #1 began and how long it’s been going on, you need look no further than the Multicore Crisis, which I started writing about almost from the inception of this blog.  Here is a graph from way back when of CPU clock speeds, which govern how fast they run:

Notice we peaked in 2006.  What a run we had going all the way back to the 1970’s–30 years doubling performance every 2 years.  That’s the period when dinosaurs, um, I mean Microsoft, ruled the world.

Oh, but surely that must have changed since that graph was created?  Why, that was 7 or 8 years ago–an eternity for the fast-paced computer industry.  In fact, we are still stuck in the Multicore Crisis Tar Pit.  A quick look at Intel’s web site suggests we can buy a 3.9 GHz clock speed but nothing faster.  We’ve had 4 Moore Cycles since 2006, so CPU’s should be 16X faster by the old math.  They’re not even close.  So Moore’s Law continues to churn out more transistors on a CPU, but we’re unable to make them go faster.  Instead, the chips grow more powerful by virtue of other metrics:

–  We can fit more memory on a chip, but it runs no faster.  However, it has gotten cheap enough we can make solid state disks.

–  We can add more cores to our CPU’s, but unless our software can make use of more cores, nobody cares.  It’s mostly Cloud and backend software that can use the cores.  Most of the software you or I might run can’t, so we don’t care about more cores (see the sketch just after this list).

–  We can make graphics cards faster.  Many algorithms process every pixel, and this is ideal for the very specialized multi-core processors that are GPU’s (Graphics Processing Units).  When you have a 4K display, having the ability to process thousands more pixels simultaneously is very helpful.  But, there are issues here too.  Graphics swallows up a lot of processing power while delivering only subtle improvements to the eye.  Yes, we love big monitors, retina displays, and HD TV.  But we sure tolerate a lot on our mobile devices and by the way, did games really get 2X visually better every 2 years?  No, not really.  They’re better, but it’s subtle.  And we play more games where that kind of thing doesn’t matter.  Farmville isn’t exactly photo realistic.
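
To make that point about cores concrete, here is a minimal sketch in Python (the workload and chunk sizes are invented for illustration) of why extra cores are worthless until the software is restructured to use them.  The serial loop runs on one core no matter how many the machine has; only the rewritten version benefits:

```python
# A CPU-bound job run two ways: serially (one core, no matter how many
# you own) and via a process pool (spread across all available cores).
from multiprocessing import Pool
import time

def busy_work(n):
    # Deliberately compute-bound stand-in for "real" work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [2_000_000] * 8

    start = time.time()
    serial = [busy_work(c) for c in chunks]        # always one core
    print("serial:   %.2fs" % (time.time() - start))

    start = time.time()
    with Pool() as pool:                           # one worker per core
        parallel = pool.map(busy_work, chunks)
    print("parallel: %.2fs" % (time.time() - start))

    assert serial == parallel                      # same answer, less wall time
```

On a 4-core box the pooled version finishes roughly 4X sooner, while the serial version runs at exactly the same speed whether you have 4 cores or 40.  That, in one screenful, is the Multicore Crisis.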

Will Things Stay This Way Forever?

Microsoft got shot out of the saddle by a very subtle paradigm shift–Moore’s Law let them down.  Most would say it hasn’t been a bad thing for Microsoft to become less powerful.  But it is a huge dynamic that Microsoft is caught up in.  Do they realize it?  Will the new CEO destined to replace Steve Ballmer realize this is what’s happened?  Or will they just think they had a slip of execution here, another there, but oh by the way aren’t our profits grand and we’ll just work a little harder and make fewer mistakes and it’ll all come back.  So far, they act like it is the latter.

And what of Apple?  They’re not the only ones who can do Industrial Design, but they sure act like that’s all that matters in the world.  And Apple has made it important enough that everyone wants to do it.  Don’t get me wrong, I love Industrial Design.  One of the reasons I like Pinterest is it is filled with great designs you can pin on your board.  Is Apple really the only company that can do competent Industrial Design?  Do they have a monopoly on it to the extent that justifies their current profit margins?  Color me skeptical.  Think that new Mac Pro is more than industrial design?  Is it really that much higher performance?  The Wall Street Journal doesn’t think so.  How about this hacker who made a Mac Pro clone out of a trash can:

[Images: the German hobbyist’s trash can Mac Pro clone]

Is it as slick as the real thing?  Aw heck no.  Absolutely not.  But it was made by a hobbyist and professionals can do a lot better.  Companies like BMW are getting involved in this whole design thing too:

[Image: a BMW-designed computer]

How Can Apple and Microsoft Win?

Apple has the easier job by far–they need to exploit network effects to create barriers to exit for the new mobile ecosystems they’ve built.  They’re not doing too badly, although I do talk to a lot of former iPhone users who tried an Android and believe it is just as good.  For network effect, iTunes is fabulous, but the video ecosystem is currently up for grabs.  Netflix and Amazon seem closer to duking that out than Apple.  Cook should consider buying Netflix–he may be too late to build his own.  Tie it to the right hardware and it rocks.  He should consider buying Facebook too, but it may not be for sale.  Network effects are awesome if you can get them, but they’re not necessarily that easy to get.

Meanwhile, Apple will continue to play on cool.  I’ve been saying to friends for years that Apple is not a computer company, it is a Couturier a la Armani.  It is a coachbuilder a la Pininfarina.  It is an arbiter of fashion and style, but if the world became filled with equally fashionable artifacts, it isn’t clear Apple could succeed as well as it does today.  Those artifacts are out there.  Artists need less help than ever before to sell their art.  Fashion is a cult of personality, packaging, and perception.  We lost the personality with Steve Jobs.  That’s going to be tough, and Apple needs to think carefully about it.  They seem more intent on homogenizing the executive ranks, as if harmony is the key thing.  It isn’t.  Fashion has nothing to do with harmony and everything to do with temperamental artistes.

Another problem Apple has is an over-reliance on China.  They’ve already had some PR problems with it and they are moving some production back to North America.  But it may not be enough.

Most people don’t realize it, but $1 of Chinese GDP produces 5X the carbon footprint of $1 of US GDP.  In a world that is increasingly sensitive to Global Warming, it could be a real downside if people realized that the #1 thing they could personally do to minimize it is to quit buying Chinese-made products.  Apple can fix human rights violations to some extent, but fixing the carbon footprint problem will take a lot longer.  Apple is not alone on this–the Computer and Consumer Electronics sectors are among the worst about offshoring to China.  But if the awareness was there, public opinion could start to swing, and it could create opportunities for alternatives.  And fashion is nothing but public opinion.  Ask the artists who have fallen because the world became aware of some prejudice or some viral quote that didn’t look good for them.  That’s the problem with Fashion–it changes constantly and there’s always a cool new kid on the block.

Microsoft has a much tougher job.  The thing they grew up capitalizing on–upgrade cycles–no longer exists.  They have to learn new skills or figure out a way to bring back the upgrade cycles.  And they need to get it done before the much weaker first-generation network effects of their empire finish expiring.  So far they are not doing well at all.  Learning to succeed at mobile with smart phones and tablets, for example?  They have precious little market share, a long list of missed opportunities, and little indication that will change soon.  Learning to succeed with Industrial Design?  Have you seen the flaps around Windows 8?  Vista?  Those were mostly about Design issues.  Microsoft doesn’t worship Design with a capital “D” as Apple does.  It worships Product Management, which is a different thing entirely, though most PM’s fancy themselves Design Experts.  Microsoft is just too darned Geeky to be Design-Centric.  It’s not going to happen, and it doesn’t matter if they get some amazing Design Maven in as the new CEO.  That person will simply fail at changing so many layers of so many people to be able to see things the Design Way.

Operate it autonomously from the top the way Steve Jobs did Apple?  The only guy on the planet who could do that is Bill Gates, and he doesn’t seem interested.  But Gates and Ballmer will make sure any new guy has to be much more a politician and much less a dictator, so running it autonomously from the top will fail.  Actually, Bill is not the only one who could do it–Jeff Bezos could also do a fine job, and his own company, Amazon, is rapidly building exactly the kinds of network effects Microsoft needs.  The only way that happens is if Microsoft allows Amazon to buy it at fire sale prices.  Call that an end game result if the Board can’t get the Right Guy into the CEO’s seat.

The best acquisition Microsoft could make right now is Adobe.  It still has some residual Old School Network effects given that designers are stuck on Photoshop and their other tools.  Plus Adobe is building a modern Cloud-based Creative Suite business very quickly.  But this is a stopgap measure at best.

Can the upgrade cycle be re-ignited?

There is a risky play that caters to Microsoft’s strengths, and that would restore the upgrade cycle.  Doing so requires them to overcome the Multicore Crisis.  Software would have to once again run twice as fast with each new Moore Cycle.  Pulling that off requires them to create an Operating System and Software Development Tools that can harness the full power of as many cores as you can give them, while allowing today’s programmers to be wildly successful building software for the new architecture.  It’s ambitious, outrageous even, but it plays to Microsoft’s strengths and its roots.  It started out selling the Basic Programming Language and added an Operating System to its core.  Regaining the respect of developers by doing something that audacious and cool will add a lot more to Microsoft than gaining a couple more points of Bing market share.  Personally, I assign a higher likelihood to Microsoft being able to crack the Multicore Crisis than I do to them being able to topple Google’s Search Monopoly.
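
Nobody knows what such tools would look like, least of all me, but here is a hypothetical sketch (in Python, with every name invented for illustration) of the developer experience they would have to deliver: the programmer writes ordinary sequential-looking code, and the platform silently spreads the work across every core in the box:

```python
# Hypothetical sketch: the kind of "free parallelism" primitive a
# MulticoreX-style toolchain would need to make routine. The developer
# writes a plain function; the runtime fans it out across all cores.
from concurrent.futures import ProcessPoolExecutor

def parallel_map(fn, items):
    # Stand-in for what the imagined OS and tools would do transparently.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(fn, items))

def render_frame(scene_id):
    # Ordinary sequential-looking code: no threads or locks in sight.
    return sum((scene_id * i) % 7 for i in range(100_000))

if __name__ == "__main__":
    frames = parallel_map(render_frame, range(64))  # silently uses every core
    print(len(frames), "frames rendered")
```

The hard part, of course, is making that transparency hold for real-world code full of shared state, not just for conveniently independent work items.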

Let’s suspend disbelief and imagine for a minute what it would be like.

Microsoft ships a new version of Windows and a new set of development tools.  Perhaps an entirely new language.  They call that ensemble “MulticoreX”.  They’ve used their influence to make sure all the usual suspects are standing there on the stage with them when they launch.  What they demonstrate on that stage is blinding performance.  Remember performance?  “Well, performance is back and it’s here to stay,” they say.  Here’s the same app on the same kind of machine.  The one on the left uses the latest public version of Windows.  The one on the right uses the new MulticoreX OS and Tools.  It runs 8X faster on the latest chips.  Plus, it will get 2X faster every year due to Moore’s Law (slight marketing exaggeration, every other year).  BTW, we will be selling tablets and phones based on the same technology.  Here is an MS Surface running an amazing video game.  Here is the same thing on iPad.  Here’s that app on our $1,500 MulticoreX reference platform next to a non-MulticoreX version of the same software on a $10,000 Mac Pro.  See?  MulticoreX is running circles around the Mac Pro.  Imagine that!  Oh, and here is a Porsche Design computer running MulticoreX, here’s the Leatherman PC for hard-working handymen to put in their garages, and here is the Ralph Lauren-designed tablet–look, it has design touches just like the Bugattis and Ferraris Mr. Lauren likes to collect!

[Image: Shelby GT500KR]

Performance is back and it’s here to stay!

Can it be done?

As I said, it is a very risky play.  It won’t be easy, but I believe it is possible.  Microsoft already has exactly the kind of people on staff who could try to do it.  We were doing something similar with success at my grad school, Rice University, back in the day.  It will likely take something this audacious to regain their crown if they’re ever going to.  They need a Skunkworks Lockheed SR-71 style project to pull it off.  If they can make it easy for any developer to write software that uses 8 cores to full effect without really trying, it’ll be fine if they have no idea how to do 16 cores and need to figure that out as the story unfolds.  It also creates those wonderful lock-in opportunities.  There’ll be no end of patents, and this sort of thing is genuinely hard to do, so would-be copiers may take a long time to catch up, if ever.

This is not a play that can be executed by a Board that doesn’t understand technology very well or that is more concerned about politics and glad-handing than winning.  Same for the CEO.  It needs a hard-nosed player with vision who won’t accept failure and doesn’t care whose feathers are ruffled along the way.  They can get some measure of political air cover by making it a skunkworks.  Perhaps it should even be moved out of Seattle to some controversial place.  It needs a chief architect who has their fingers directly in the pie and is a serious Uber Geek.  I’d nominate Anders Hejlsberg for the position if it was my magic wand to wave.

It’s these human factors that will most likely prevent it from happening, more so than the technical difficulty (which should not be underestimated).

Posted in apple, business, multicore, platforms, software development, strategy | 2 Comments »

Big Data is a Small Market Compared to Suburban Data

Posted by Bob Warfield on February 2, 2013

Big Data is all the rage, and seems to be one of the prime targets for new entrepreneurial ventures since VC-dom started to move from Consumer Internet to Enterprise recently.  Yet, I remain skeptical about Big Data for a variety of reasons.  As I’ve noted before, it seems to be a premature optimization for most companies.  That post angered the Digerati, who are quite taken with their NoSQL shiny objects, but there have been others since who reach much the same conclusion.  The truth is, Moore’s Law scales faster than most organizations can scale their creation of data.  Yes, there are some few out of millions of companies that are large enough to really need Big Data, and yes, it is so fashionable right now that many who don’t need it will be talking about it and using it just so they can be part of the new new thing.  But they’re risking the problems many have had when they adopt the new new thing for fashion rather than because it solves real problems they have.

This post is not really about Big Data, other than to point out that I think it is a relatively small market in the end.  It’ll go the way of Object Oriented Databases: a launchpad for some helpful new ideas, the best of which get adopted by the entrenched vendors before the upstart companies can reach interesting scale.  So it will be with Hadoop, NoSQL, and the rest of the Big Data Mafia.  For those who want to get a head start on the next wave, a wave that is destined to be much more horizontal, much larger, and of much greater appeal, I offer the notion of Suburban Data.

While I shudder at the thought of any new buzzwords, Suburban Data is what I’ve come up with when thinking about the problem of massively parallel architectures that are so loosely coupled (or perhaps not coupled at all) that they don’t need to deal with many of the hard consistency problems of Big Data.  They don’t care because what they are is architectures optimized to create a Suburb of very loosely coordinated and relatively small collections of data.  Think of Big Data’s problems as being those of the inner city where there is tremendous congestion, real estate is extremely expensive, and it makes sense to build up, not out.  Think Manhattan.  It’s very sexy and a wonderful place to visit, but a lot of us wouldn’t want to live there.  Suburban Data, on the other hand, is all about the suburbs.  Instead of building giant apartment buildings where everyone is in very close proximity, Suburban Data is about maximizing the potential of detached single family dwellings.  It’s decentralized and there is no need for excruciatingly difficult parallel algorithms to ration scarce services and enforce consistency across terabytes.

Let’s consider a few Real World application examples.

WordPress.com is a great place to start.  It consists of many instances of WordPress blogs.  Anyone who likes can get one for free.  I have several, including this Smoothspan Blog.  Most of the functionality offered by wp.com does not have to coordinate between individual blogs.  Rather, it’s all about administering a very large number of blogs that individually make very modest demands on the power of the underlying architecture.  Yes, there are some features that are coordinated, but the vast majority of the functionality, and the functionality I tend to use, is not.  Beyond the WordPress.com example, web site hosting services are another obvious case.  They just want to give out instances as cheaply as possible.  Every blog or website is its own single family home.

There are a lot of examples along these lines in the Internet world.  Any offering where the need to communicate and coordinate between different tenants is minimized is a good candidate.  Another huge area of opportunity for Suburban Data is SaaS companies of all kinds.  Unless a SaaS company is exclusively focused on extremely large customers, the requirements of an average SaaS instance in the multi-tenant architecture are modest.  What customers want is precisely the detached single family dwelling, at least from a User Experience perspective.  Given that SaaS is the new way of the world, and even a solo bootstrapper can create a successful SaaS offering, this is truly a huge market.  The potential here is staggering, because this is the commodity market.

Look at the major paradigm shifts that have come before and most have amounted to a very similar (metaphorically) transition.  We went from huge centralized mainframes to mini-computers.  We went from mini-computers to PC’s.  Many argue we’re in the midst of going from PC’s to Mobile.  Suburban Data is all about how to create architectures that are optimal for creating Suburbs of users.

What might such architectures look like?

First, I think it is safe to say that while existing technologies such as virtualization and the increasing number of server hardware architectures being optimized for data center use (Facebook and Google have proprietary hardware architectures for their servers) are a start, there is a lot more that’s possible and the job has hardly begun.  To be the next Oracle in this space needs a completely clean sheet design from top to bottom.  I’m not going to map the architecture out in great detail because it’s early days and frankly I don’t know all the details.  But let’s Blue Sky a bit.

Imagine an architecture that puts at least 128 x86-compatible cores (we need a commodity instruction set for our Suburbs), along with all the RAM and Flash Disc storage they need, onto the equivalent of a memory stick for today’s desktop PC’s.  Because power and cooling are two of the biggest challenges in modern data centers, the Core Stick will use the most miserly architectures possible–we want a lot of cores with reasonable but not extravagant clock speeds.  Think per-core power consumption suitable for Mobile Devices more than desktops.  For software, let’s imagine these cores run an OS Kernel that’s built around virtualization and the needs of Suburban Data from the ground up.  Further, there is a service layer running on top of the OS that’s also optimized for the Suburban Data world but has the basics all ready to go:  Apache Web Server and MySQL.  In short, you have 128 Amazon EC2 instances potent enough to run 90% of the web sites on the Internet.  Now let’s create backplanes that fit a typical 19″ rack setup with all the right UPS and DC power capabilities the big data centers already know how to do well.  The name of the game will be Core Density.  We get 128 cores on a memory stick, and let’s say 128 sticks in a 1U rack mount, so we can support 16K web instances in one of those rack mounts.
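
For the record, the density arithmetic works out as follows (a trivial check in Python, using the assumed figures from the paragraph above):

```python
# Back-of-the-envelope density math for the hypothetical Core Stick.
cores_per_stick = 128      # x86-compatible cores per stick (assumed)
sticks_per_1u = 128        # sticks per 1U rack mount (assumed)

instances_per_1u = cores_per_stick * sticks_per_1u
print(instances_per_1u)          # 16384 -- the "16K web instances" figure

print(instances_per_1u * 42)     # 688128 instances in a full 42U rack
```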

There will be many valuable problems to solve with such architectures, and hence many opportunities for new players to make money.  Consider what has to be done to reinvent hierarchical storage management for such architectures.  We’ve got a Flash local disc with each core, but it is probably relatively small.  Hence we need access to storage on a hierarchical basis so we can consume as much as we want and it seamlessly works.  Or consider communicating with and managing the cores.  The only connections to the Core Stick should be very high speed Ethernet and power.  Perhaps we’ll want some out-of-band control signals for security’s sake as well.  Want to talk to one of these little gems?  Just fire up the browser and connect to its IP address.  BTW, we probably want full software net fabric capabilities on the stick.

It’ll take quite a while to design, build, and mature such architectures.  That’s fine, it’ll give us several more Moore cycles in which to cement the inevitability of these architectures.

You see what I mean when I say this is a whole new ballgame and a much bigger market than Big Data?  It goes much deeper and will wind up being the fabric of the Internet and Cloud of tomorrow.

Posted in business, cloud, data center, enterprise software, multicore, platforms, saas, service | 2 Comments »

Single Tenant, Multitenant, Private and Public Clouds: Oh My!

Posted by Bob Warfield on August 27, 2010

My head is starting to hurt with all the back and forth among my Enterprise Irregulars buddies about the relationships between the complex concepts of Multitenancy, Private, and Public Clouds.  A set of disjoint conversations and posts came together like the whirlpool in the bottom of a tub when it drains.  I was busy with other things and didn’t get a chance to really respond until I was well and truly sucked into the vortex.  Apologies for the long post, but so many wonderful cans of worms finally got opened that I just have to try to deal with a few of them.  That’s why I love these Irregulars!

To start, let me rehash some of the many memes that had me preparing to respond:

–  Josh Greenbaum’s assertion that Multitenancy is a Vendor, not a Customer Issue.  This post includes some choice observations like:

While the benefits that multi-tenancy can provide are manifold for the vendor, these rationales don’t hold water on the user side.

That is not to say that customers can’t benefit from multi-tenancy. They can, but the effects of multi-tenancy for users are side-benefits, subordinate to the vendors’ benefits. This means, IMO, that a customer that looks at multi-tenancy as a key criteria for acquiring a new piece of functionality is basing their decision on factors that are not directly relevant to their TCO, all other factors being equal.

and:

Multi-tenancy promises to age gracelessly as this market matures.

Not to mention:

Most of the main benefits of multi-tenancy – every customer is on the same version and is updated simultaneously, in particular – are vendor benefits that don’t intrinsically benefit customers directly.

The implication being that someone somewhere will very soon provide an alternate technology that works just as well as or better than multitenancy.  Wow.  Lots to disagree with there.  My ears are still ringing from the sound of the steel gauntlet that was thrown down.

–  Phil Wainewright took a little of the edge off my ire with his response post to Josh, “Single Tenancy, the DEC Rainbow of SaaS.”  Basically, Phil says that any would-be SaaS vendor trying to create an offering without multitenancy is doomed as the DEC Rainbow was.  They have something that sort of walks and quacks like a SaaS offering but that can’t really deliver the goods.

–  Well of course Josh had to respond with a post that ends with:

I think the pricing and services pressure of the multi-tenant vendors will force single-tenant vendors to make their offerings as compatible as possible. But as long as they are compatible with the promises of multi-tenancy, they don’t need to actually be multi-tenant to compete in the market.

That’s kind of like saying, “I’m right so long as nothing happens to make me wrong.”  Where are the facts that show this counter case is anything beyond imagination?  Who has built a SaaS application that does not include multitenancy but that delivers all the benefits?

Meanwhile back at the ranch (we EI’s need a colorful name for our private community where the feathers really start to fly as we chew the bones of some good debates), still more fascinating points and counterpoints were being made as the topic of public vs private clouds came up (paraphrasing):

–  Is there any value in private clouds?

–  Do public clouds result in less lock-in than private clouds?

–  Are private clouds and single tenant (sic) SaaS apps just Old School vendors’ attempts to hang on while the New Era dawns?  Attempts that will ultimately prove terribly flawed?

–  Can the economics of private clouds ever compete with public?

–  BTW, eBay now uses Amazon for “burst” loads and purchases servers for a few hours at a time on their peak periods.  Cool!

–  Companies like Eucalyptus and Nimbula are trying to make Private Clouds that are completely fungible with Public Clouds.  Believing in private cloud frameworks like these means you have to believe companies are going to be running / owning their own servers for a long time to come, even if the public cloud guys take over a number of compute workloads.  The Nimbula guys built EC2 and they’re no dummies, so if they believe in this, there must be something to it.

–  There are two kinds of clouds – real and virtual.  Real clouds are multi-tenant. Virtual clouds are not. Virtualization is an amazing technology but it can’t compete with bottoms up multi-tenant platforms and apps.

Stop!  Let me off this merry-go-round and let’s talk.

What It Is and Why Multitenancy Matters

Sorry Josh, but Multitenancy isn’t marketing like Intel Inside (BTW, do you notice Intel wound up everywhere anyway?  That wasn’t marketing either), and it matters to more than just vendors.  Why?

Push aside all of the partisan definitions of multitenancy (all your customers go in the same table or not).   Let’s look at the fundamental difference between virtualization and multitenancy, since these two seem to be fighting it out.

Virtualization takes multiple copies of your entire software stack and lets them coexist on the same machine.  Whereas before you had one OS, one DB, and one copy of your app, now you may have 10 of each.  Each of the 10 may be a different version entirely.  Each may be a different customer entirely, as they share a machine.  For each of them, life is just like they had their own dedicated server.  Cool.  No wonder VMWare is so successful.  That’s a handy thing to do.

Multitenancy is a little different.  Instead of 10 copies of the OS, 10 copies of the DB, and 10 copies of the app, it has 1 OS, 1 DB, and 1 app on the server.  But, through judicious modifications to the app, it allows those 10 customers to all peacefully coexist within the app just as though they had it entirely to themselves.
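
To make the difference concrete, here is a minimal sketch of those “judicious modifications,” in Python with SQLite.  One schema, ten tenants, every row tagged and every query scoped.  This is the textbook shared-schema approach, not a claim about how any particular vendor implements it:

```python
# One OS, one DB, one app: every table carries a tenant_id and every
# query is scoped to the tenant making the request.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE accounts (
                tenant_id INTEGER NOT NULL,
                name      TEXT    NOT NULL)""")

# Ten customers coexisting in one schema, not ten installed stacks.
for tenant in range(10):
    db.execute("INSERT INTO accounts VALUES (?, ?)",
               (tenant, "Customer %d data" % tenant))

def accounts_for(tenant_id):
    # The scoping WHERE clause is the heart of it: tenant 3 can never
    # see tenant 7's rows, yet they share every machine resource.
    return db.execute("SELECT name FROM accounts WHERE tenant_id = ?",
                      (tenant_id,)).fetchall()

print(accounts_for(3))    # [('Customer 3 data',)]
```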

Can you see the pros and cons of each?  Let’s start with cost.  Every SaaS vendor that has multitenancy crows about this, because it’s true.  Don’t believe me?  Plug in your VM software and go install Oracle 10 times across 10 different virtual machines.  Now add up how much disk space that uses, how much RAM it uses when all 10 are running, and so on.  This is before you’ve put a single byte of information into Oracle or even started up an app.  Compare that to having installed 1 copy of Oracle on a machine, but not putting any data into it.  Dang!  That VM has used up a heck of a lot of resources before I even get started!

If you don’t think that the overhead of 10 copies of the stack has an impact on TCO, you either have in mind a very interesting application + customer combination (some do exist, and I have written about them), or you just don’t understand.  10x the hardware to handle the “before you put in data” requirements is not cheap.  Whatever overhead is involved in making that more cumbersome to automate is not cheap.  Heck, 10x the Oracle licenses is very not cheap.  I know SaaS companies who complain their single biggest ops cost is their Oracle licenses.

However, if all works well, that’s a fixed cost to have all those copies, and you can start adding data by customers to each virtual Oracle, and things will be okay from that point on.  But, take my word for it, there is no free lunch.  The VM world will be slower and less nimble at sharing resources between the different Virtual Machines than a Multitenant App can be.  The reason is that by the time it knows it even needs to share, it is too late.  Shifting things around to take resource from one VM and give it to another takes time.  By contrast, the Multitenant App knows what is going on inside the App because it is the App.  It can even anticipate needs (e.g. that customer is in the UK and they’re going to wake up x hours before my customers in the US, so I will put them on the same machine because they mostly use the machine at different times).

So, no, there is not some magic technology that will make multitenant obsolete.  There may be some new marketing label on some technology that makes multitenancy automatic and implicit, but if it does what I describe, it is multitenant.  It will age gracefully for a long time to come despite the indignities that petty competition and marketing labels will bring to bear on it.

What’s the Relationship of Clouds and Multitenancy?

Must Real Clouds be Multitenant?

Sorry, but Real Clouds are not Multitenant, because they’re based on Virtualization, not Multitenancy in the sense I just defined.  In fact, EC2 doesn’t share a core with multiple virtual machines because it can’t.  If one of the VM’s started sucking up all the cycles, the others would suffer terrible performance, and the hypervisors don’t really have a way to deal with that.  Imagine having to shut down one of the virtual machines and move it onto other hardware to load balance.  That’s not a simple or fast operation.  Multi-tasking operating systems expect a context switch to be as fast as possible, and that’s what we’re talking about.  That’s part of what I mean by the VM solution being less nimble.  So instead, cores get allocated to a particular VM.  That doesn’t mean a server can’t have multiple tenants, just that at the granularity of a core, things have to be kept clean and not dynamically moved around.

Note to rocket scientists and entrepreneurs out there–if you could create a new hardware architecture that was really fast at the Virtual Machine load balancing, you would have a winner.  So far, there is no good hardware architecture to facilitate a tenant swap inside a core at a seamless enough granularity to allow the sharing.  In the Multicore Era, this would be the Killer Architecture for Cloud Computing.  If you get all the right patents, you’ll be rich and Intel will be sad.  OTOH, if Intel and VMWare got their heads together and figured it out, it would be like ole Jack Burton said, “You can go off and rule the universe from beyond the grave.”

But, it isn’t quite so black and white.  While EC2 is not multitenant at the core level, it sort of is at the server level as we discussed.  And, services like S3 are multitenant through and through.  Should we cut them some slack?  In a word, “No.”  Even though an awful lot of the overall stack cost (network, cpu, and storage) is pretty well multitenant, I still wind up installing those 10 copies of Oracle and I still have the same economic disadvantage as the VM scenario.  Multitenancy is an Application characteristic, or at the very least, a deep platform characteristic.  If I build my app on Force.com, it is automatically multitenant.  If I build it on Amazon Web Services, it is not automatic.

But isn’t there Any Multitenant-like Advantage to the Cloud?  And how do Public and Private Compare?

Yes, there are tons of benefits to the Cloud, and through an understanding and definition of them, we will tease out the relationship of Public and Private Clouds.  Let me explain…

There are two primary advantages to the Cloud:  it is a Software Service and it is Elastic.  If you don’t have those advantages, you don’t have a Cloud.  Let’s drill down.

The Cloud is a Software Service, first and foremost.  I can spin up and control a server entirely through a set of API’s.  I never have to go into a Data Center cage.  I never have to ask someone at the Data Center to go into the Cage (though that would be a Service, just not a Software Service, an important distinction).  This is powerful for basically the same reasons that SaaS is powerful versus doing it yourself with On-prem software.  Think Cloud = SaaS and Data Center = On Prem and extrapolate and you’ll have it. 
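
As a present-day illustration of “control a server entirely through a set of API’s,” here is a sketch using boto3, Amazon’s current Python SDK (which post-dates this post).  The AMI id and instance type are placeholders, not recommendations:

```python
# Provision a server without ever entering a Data Center cage.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.run_instances(
    ImageId="ami-12345678",    # placeholder AMI id
    InstanceType="t2.micro",   # placeholder size
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]
print("provisioned", instance_id)

# ...and tear it down just as programmatically when done.
ec2.terminate_instances(InstanceIds=[instance_id])
```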

Since Cloud is a standardized service, we expect all the same benefits as SaaS:

– They know their service better than I do since it is their whole business.  So I should expect they will run it better and more efficiently.

– Upgrades to that service are transparent and painless (try that on your own data center, buddy!).

– When one customer has a problem, the Service knows and often fixes it before the others even know it exists.  Yes Josh, there is value in SaaS running everyone on the same release.  I surveyed Tech Support managers one time and asked them one simple question:  How many open problems in your trouble ticketing system are fixed in the current release?  The answers were astounding–40 to 80%.  Imagine a world where your customers see 40 to 80% fewer problems.  It’s a good thing!

– That service has economic buying power that you don’t have because it is aggregated across many customers.  They can get better deals on their hardware and order so much of it that the world will build it precisely to their specs.  They can get stuff you can’t, and they can invest in R&D you can’t.  Again, because it is aggregated across many customers.  A Startup running in the Amazon Cloud can have multiple redundant data centers on multiple continents.  Most SaaS companies don’t get to building multiple data centers until they are way past having gone public.

–  Because it is a Software Service, you can invest your Ops time in automation, rather than in crawling around Data Center cages.  You don’t need to hire anyone who knows how to hot swap a disk or take a backup.  You need peeps who know how to write automation scripts.  Those scripts are a leveragable asset that will permanently lower your costs in a dramatic way.  You have reallocated your costs from basic Data Center grubbing around (where does this patch cable go, Bruce?), an expense, to actually building an asset.

The list goes on.

The second benefit is Elasticity.  It’s another form of aggregation benefit.  They have spare capacity because everyone doesn’t use all the hardware all the time.  Whatever % isn’t utilized is still a large amount of hardware, because it is aggregated.  It’s more than you can afford to have sitting around idle in your own data center.  Because of that, they don’t have to sell it to you in perpetuity.  You can rent it as you need it, just like eBay does for bursting.  There are tons of new operational strategies that are suddenly available to you by taking advantage of Elasticity.

Let me give you just one.  For SaaS companies, it is really easy to do Beta Tests.  You don’t have to buy 2x the hardware in perpetuity.  You just need to rent it for the duration of the Beta Test and every single customer can access their instance with their data to their heart’s content.  Trust me, they will like that.
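
The economics of that Beta Test are easy to sketch.  All of the numbers below are made up for illustration, but the shape of the comparison is the point:

```python
# Owning Beta capacity in perpetuity vs renting it for a month.
servers = 50                 # extra servers the Beta needs (assumed)
server_purchase = 3000       # $ per server if bought outright (assumed)
hourly_rate = 0.10           # $ per server-hour rented (assumed)
beta_hours = 24 * 30         # a one-month Beta Test

buy = servers * server_purchase
rent = servers * hourly_rate * beta_hours
print("buy:  $%d" % buy)     # $150000, plus racking, power, and disposal
print("rent: $%d" % rent)    # $3600, then the servers simply go away
```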

What about Public Versus Private Clouds?

Hang on, we’re almost there, and it seems like it has been a worthwhile journey.

Start with, “What’s a Private Cloud?”  Let’s take all the technology of a Public Cloud (heck, the Nimbula guys built EC2, so they know how to do this), and create a Private Cloud.  The Private Cloud is one restricted to a single customer.  It’d be kind of like taking a copy of Salesforce.com’s software and installing it at Citibank for their private use.  Multitenant with only one tenant.  Do you hear the sound of one hand clapping yet?  Yep, it hurts my head too, just thinking about it.  But we must.

Pawing through the various advantages we’ve discussed for the Cloud, there are still some that accrue to a Cloud of One Customer:

–  It is still a Software Service that we can control via API’s, so we can invest in Ops Automation.  In a sense, you can spin up a new Virtual Data Center (I like that term better than Private Cloud, because it’s closer to the truth) on 10 minutes’ notice.  No waiting for servers to be shipped.  No uncrating and testing.  No shoving into racks and connecting cables.  Push a button, get a Data Center.

–  You get the buying power advantages of the Cloud Vendor if they supply your Private Cloud, though not if you buy software and build  your Private Cloud.  Hmmm, wonder what terminology is needed to make that distinction?  Forrester says it’s either a Private Cloud (company owns their own Cloud) or a Hosted Virtual Private Cloud.  Cumbersome.

But, and this is a big one, the granularity is much coarser, and there is way less Elasticity.  Sure, you can spin up a Data Center, but depending on its size, it’s a much bigger on/off switch.  You likely will have to commit to buy more capacity for a longer time at a bigger price in order for the Cloud Provider to recoup giving you so much more control.  They have to clear other customers away from a larger security zone before you can occupy it, instead of letting your VM’s commingle with other VM’s on the same box.  You may lose the more multitenant-like advantages of the storage cluster and the network infrastructure (remember, only EC2 was stuck being pure virtual).

What Does it All Mean, and What Should My Company Do?

Did you see Forrester’s conclusion that most companies are not yet ready to embrace the Cloud and won’t be for a long time?

I love the way Big Organizations think about things (not!).  Since their goal is preservation of wealth and status, it’s all about risk mitigation whether that is risk to the org or to the individual career.  A common strategy is to take some revolutionary thing (like SaaS, Multitenancy, or the Cloud), and break it down into costs and benefits.  Further, there needs to be a phased modular approach that over time, captures all the benefits with as little cost as possible.  And each phase has to have a defined completion so we can stop, evaluate whether we succeeded, celebrate the success, punish those who didn’t play the politics well enough, check in with stakeholders, and sing that Big Company Round of Kumbaya.  Yay!

In this case, we have a 5 year plan for CIO’s.  Do you remember anything else, maybe from the Cold War, that used to work on 5 year plans?  Never mind.

It asserts that before you are ready for the Cloud, you have to cross some of those modular hurdles:

A company will need a standardized operating procedure, fully-automated deployment and management (to avoid human error) and self-service access for developers. It will also need each of its business divisions – finance, HR, engineering, etc – to be sharing the same infrastructure.  In fact, there are four evolutionary stages that it takes to get there, starting with an acclimation stage where users are getting used to and comfortable with online apps, working to convince leaders of the various business divisions to be guinea pigs. Beyond that, there’s the rollout itself and then the optimization to fine-tune it.

Holy CYA, Batman!  Do you think eBay spent 5 years figuring out whether it could benefit from bursting to the Cloud before it just did it?

There’s a part of me that says if your IT org is so behind the times it needs 5 years just to understand it all, then you should quit doing anything on-premise and get it all into the hands of SaaS vendors.  They’re already so far beyond you that they must have a huge advantage.  There is another part that says, “Gee guys, you don’t have to be able to build an automobile factory as good as Toyota’s to be able to drive a car.”

But then sanity and Political Correctness prevail, I come back down to Earth, and I realize we are ready to summarize.  There are 4 levels of Cloud Maturity (Hey, I know the Big Co IT Guys are feeling more comfortable already, they can deal with a Capability and Maturity Model, right?):

Level 1:  Dabbling.  You are using some Virtualization or Cloud technology a little bit at your org in order to learn.  You now know what a Machine Image is, and you have at least seen a server that can run them and swapped a few in and out so that you experience the pleasures of doing amazing things without crawling around the Data Center Cage.

Level 2:  Private Cloud.  You were impressed enough by Level 1 that you want the benefits of Cloud Technology for as much of your operation as you can as fast as you can get it.  But, you are not yet ready to relinquish much of any control.  For Early Level 2, you may very well insist on a Private Cloud you own entirely.  Later stage Level 2 and you will seek a Hosted Virtual Private Cloud.

Level 3:  Public Cloud.  This has been cool, but you are ready to embrace Elasticity.  You tripped into it with a little bit of Bursting like eBay, but you are gradually realizing that the latency between your Data Center and the Cloud is really painful.  To fix that, you went to a Hosted Virtual Private Cloud.  Now that your data is in that Cloud and Bursting works well, you are realizing that the data is already stepping outside your Private Cloud pretty often anyway.  And you’ve had to come to terms with it.  So why not go the rest of the way and pick up some Elasticity?

Level 4:  SaaS Multitenant.  Eventually, you conclude that you’re still micromanaging your software too much and it isn’t adding any value unique to your organization.  Plus, most of the software you can buy and run in your Public Cloud world is pretty darned antiquated anyway.  It hasn’t been rearchitected since the late 80’s and early 90’s.  Not really.  What would an app look like if it was built from the ground up to live in the Cloud, to connect Customers the way the Internet has been going, to be Social, to do all the rest?  Welcome to SaaS Multitenant.  Now you can finally get completely out of Software Operations and start delivering value.

BTW, you don’t have to take the levels one at a time.  It will cost you a lot more and be a lot more painful if you do.  That’s my problem with the Forrester analysis.  Pick the level that is as far along as you can possibly stomach, add one to that, and go.  Ironically, not only is it cheaper to go directly to the end game, but each level is cheaper for you on a wide scale usage basis all by itself.  In other words, it’s cheaper for you to do Public Cloud than Private Cloud.  And it’s WAY cheaper to go Public Cloud than to try Private Cloud for a time and then go Public Cloud.  Switching to a SaaS Multitenant app is cheaper still.

Welcome to the crazy world of learning how to work and play well together when efficiently sharing your computing resources with friends and strangers!

Posted in amazon, cloud, data center, ec2, enterprise software, grid, multicore, platforms, saas, service | 15 Comments »

Salesforce Switches to Dell/Linux. What’s Next, MySQL Over Oracle?

Posted by Bob Warfield on July 15, 2008

Salesforce will be unplugging the last of their Sun Solaris servers from their SaaS operations this week, according to TechCrunchIT.  That’s quite a big change for Salesforce, and a bit of a PR blow for Sun.  It reflects some important operational realities that the rest of the industry and corporate IT should be watching carefully.

First, vertical scaling is hard in the multicore crisis era.  When CPU’s no longer get twice as fast with every Moore Cycle, scaling is harder to come by and hardware gets commoditized.  Future scaling has to come from software architecture changes.  Horizontal Scaling, in other words, not Vertical Scaling.  The multicore crisis brings us to an era of many small computers rather than fewer more powerful computers, and it’s up to the software guys to figure that out.
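
To illustrate what “horizontal” means in practice, here is a minimal sketch of hash-based sharding in Python.  The shard count and naming are invented for illustration, and this is the generic technique, not a description of Salesforce’s actual architecture:

```python
# Route each customer to one of many cheap commodity boxes instead of
# waiting for a single big box to get faster.
import hashlib

SHARDS = ["db-host-%02d" % i for i in range(16)]   # 16 commodity servers

def shard_for(customer_id):
    # A stable hash, so a customer always lands on the same box.
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("acme-corp"))     # e.g. db-host-07
print(shard_for("globex-inc"))    # most likely a different box
```

Add more boxes and rebalance, and capacity grows with the number of servers rather than with the clock speed of any one of them.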

Second, for a SaaS company, the cost of service delivery is an absolutely critical factor.  Once you have software that runs well and scales horizontally on cheap commodity hardware, you’ve created a huge cost advantage for yourself.  As we speak, the cost to deliver service for the various public SaaS companies is all over the map, but Salesforce has always had one of the lowest if not the lowest cost on the map.  This allows them to either show greater profitability or reinvest the savings in faster growth.

This brings me to my other point.  How long can it be before they investigate swapping out Oracle for MySQL?  As the TechCrunchIT article mentions, Salesforce started with Oracle but there’s been no mention recently about the current status.  It would be a logical further development in reducing costs if they had chosen to eliminate or were working on eliminating the cost of Oracle licenses.  For many SaaS vendors, this is a huge piece of their Cost of Services. 

Can you build industrial grade software without Oracle?  In a word, yes.  Many highly scalable web sites have done so and lived to tell the tale.  It’s more work, but once you’ve done the work the payoff can mean huge savings.  At a prior employer, we tested several Open Source DB’s and were quite surprised to learn their performance was actually not that far off Oracle’s.  My current employer, Helpstream, has built everything on an Open Source stack and the benefits have been enormous.

How long will it be before we’re hearing that Salesforce has dropped Oracle too?  What’s your company doing to leverage commodity hardware and Open Source databases?

Related Articles

Fellow Enterprise Irregulars Dennis Howlett, Vinnie Mirchandani, and Thomas Foydel (who first raised the point about brand comfort) make some excellent points on this subject, particularly the issue of switching off Oracle.

One key point that I have personally heard before is that SaaS vendors like to offer customers the comfort of knowing the solution runs on Oracle versus Open Source.  It’s a more conservative stance that plays well in the Enterprise.  Brand matters.  Sun is working hard on the MySQL brand, but they certainly haven’t caught Oracle yet.  As Vinnie puts it, the question is, “whether SaaS vendors benefit from at least the perception that Oracle is more “bullet proof” or do SaaS customers just want results (high uptime, performance etc) and don’t really care what the underlying  technology is – especially if the economics are more attractive?”

Dennis adds some other unique thinking.  If Salesforce wants to be acquired by Oracle, then it should stick with the Oracle stack.  The only thing I’d add there is that it’s pretty easy to switch from MySQL back to Oracle and much harder to do the reverse.  I think they’d be fine in an acquisition if they were simply careful not to emphasize a switch to MySQL.  They did work reasonably hard to keep the Dell switch quiet, so they may already be on that path.

The second thought Dennis has is that licensing is very complex on these things.  Here again I have to agree.  Just dealing with the legals and other aspects of a relationship with a large Enterprise vendor/technology partner is expensive for a startup.

In both cases, Vinnie’s post title fits:  Does SaaS need Oracle more than Oracle needs SaaS?

Good insights, guys!

Posted in multicore, platforms, saas | 8 Comments »

Are Custom Chips An Answer to the Multicore Crisis?

Posted by Bob Warfield on March 20, 2008

Stacey Higginbotham wrote an interesting piece that made me wonder.  Apparently there are lots of less-than-cutting-edge chip fabs out there that people want to keep running.  In an age where smaller features on chips translate only to more transistors, but not necessarily faster transistors, is the ability to have more transistors as economically valuable as it used to be?  Particularly if we can’t put them to use?  The problem is that Moore’s Law these days translates to more cores, not faster clock speeds.  Nearly all the software out there can’t make use of the extra cores yet, and there is a lot of discussion about how the world may have to completely retool software to make use of lots of cores.  Meanwhile, Intel sails on with 6 core chips on the horizon, more to come, and not a very good idea what to do with them.

What if what separates the latest “good” fabs from older “obsolete” fabs is no longer that valuable?  Maybe the value is less from ever smaller feature sizes and more from new chip types?  That would shift the economics from favoring giants like Intel, capable of building ever more expensive fabs, to those with the IP to design more new chips on fab processes that are “good enough” for lots of interesting applications.  As big as Intel is, maybe driving faster CPU’s is not the most lucrative pastime at this point in the technology curve.

It used to be that a general purpose CPU that constantly doubled in speed every 18 months (Moore’s Law) was the place to invest.  It was a truly general purpose device capable of running all sorts of software.  There have been special purpose chips built too for maximum performance in various areas:  graphics coprocessors and various network chips are good examples.  Suppose we can make special purpose chips for almost any purpose.  I once talked to a startup that had built a hardware search accelerator for example.  They vanished into the Government spy world never to be heard from again, but it is intriguing. 

If you could create a dirt cheap special purpose chip, what would it do?  What would be the market for it?

Before the dawn of RISC there was much interest in hardware accelerators for specific languages.  Lisp machines were one example.  I remember reading a quote from Alan Kay that modern machines don’t run dynamic languages like Lisp and Smalltalk as much faster than old machines like the Dorado as their newfound clock speeds would imply they should.  He hinted that radically different hardware architectures could greatly benefit such languages.  I couldn’t find more on that than this quote, which says the new generation is only about 50x faster than the old machines, a pretty poor showing indeed.

Today we have a renaissance in the interpreted and scripting languages that are descendants of languages like Lisp and Smalltalk.  Languages like Ruby (of Rails fame), Python, and PHP are very mainstream and might benefit.  One wonders whether even Java might benefit.  Would a chip optimized to run the virtual machines of one of these languages, without regard to compatibility with the old x86 world, be able to run them a lot faster?  Would a chip that runs Java 10x faster than the fastest available cores from Intel be valuable at a time when Java has stopped getting faster via Moore’s Law?

It seems to me it would.
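
To make concrete what such a chip would be optimizing away, here is a minimal sketch of the dispatch loop at the heart of most virtual machines.  This is my toy example, not any real VM; the opcodes and program are made up.  Every operation pays a software tax for fetch, decode, and dispatch before any real work happens, and hardware that did the dispatch itself would collect none of that tax:

    // A toy stack-machine interpreter in Java.  The PUSH/ADD/HALT
    // opcodes and the sample program are purely illustrative.
    public class ToyVM {
        static final int PUSH = 0, ADD = 1, HALT = 2;

        static int run(int[] code) {
            int[] stack = new int[64];
            int sp = 0, pc = 0;
            while (true) {
                int op = code[pc++];                  // fetch
                switch (op) {                         // decode + dispatch
                    case PUSH: stack[sp++] = code[pc++]; break;
                    case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
                    default:   return stack[sp - 1]; // HALT
                }
            }
        }

        public static void main(String[] args) {
            // Computes 2 + 3.  The "real work" is one add; everything
            // else is interpretive overhead a specialized chip could
            // potentially eliminate.
            System.out.println(run(new int[] { PUSH, 2, PUSH, 3, ADD, HALT }));
        }
    }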

Posted in multicore | 1 Comment »

Google Reports iPhone Usage 50x Other Handsets; Amazon S3 Goes Down: Low Friction Has a Cost

Posted by Bob Warfield on February 15, 2008

As I write this post there are two articles that caught my eye.  For most, the iPhone and Amazon’s Web Services have little to do with one another, but I see a bit of a pattern here that’s interesting.

Slash Lane of Apple Insider reports that Google was shocked that it was seeing 50 times more search requests coming from Apple iPhones than from any other mobile handset, a revelation so astonishing that the company originally suspected it had made an error culling its own data.  It’s an amazing statistic, really.  But I can attest to hitting Google quite a lot myself whenever I’m out and about and killing time before the next meeting.  In fact, I am very pleased to have my bookmarks out on a web page rather than in my browser so I can easily access all of my favorite sites from whatever device is at hand.  The iPhone is quite a credible web browser.  I can’t wait for the 3G version and higher speeds.

Following closely on my read of the iPhone piece is Nick Carr’s article about an Amazon S3 outage.  Nothing all that earth-shattering or unexpected, just that S3 was out for several hours this morning, beginning at 7:30am EST.  The gist of the article is that while the outage was to be expected, Amazon did a poor job keeping users informed of what was going on and providing explanations after the fact.  Carr is right, of course, but businesses are always embarrassed when things go wrong, and the first (and wrong) human instinct is to be shy about the details.

Why do these two go together?  I’ll give you a hint:  the tales of Facebook applications reaching millions of users in an incredibly short time also go with the theme I’m thinking of.  That theme has to do with friction.  Friction is my word for all the factors that slow adoption.  The time needed for word of mouth, decision-making, purchase, installation, getting through the learning curve, and finally becoming a first class citizen of whatever community results is governed by the degree of friction.

One of the things the Internet does is reduce friction.  At its most extreme, friction actually reverses and becomes a propelling force.  We call that viral marketing.  Most of the innovations in this second Internet round (post-bubble) have been focused on reducing friction.  Social Networks, for example, dramatically reduce the friction of networking.  Twitter dramatically reduces the friction of blogging, right down to limiting the article length to 140 characters so you don’t have to labor over the wordsmithing.

While it’s harder, the web is also a powerful means of reducing friction for more physical things.  The iPhone and Amazon Web Services are two great examples.  In an extremely short time the iPhone has racked up 50x the Internet usage of competing handsets.  In approximately the same short time, traffic to AWS has grown to exceed the combined traffic of all other Amazon properties.

While the web itself helped to spread the word, I think it is no coincidence that these two have a lot to do with the web and offer a lot of value back to the web.  It’s what some folks call a virtuous circle.  Look for more of these as time goes on.

Now for that cost side.  These growth rates are not predictable.  Nobody would have guessed that either business would get so big so fast.  In fact, many guessed just the opposite.  Even if you did guess it could happen, it would only be a guess that it could, not that it would.  A prudent business would not invest in infrastructure built to the level and assumption that it would happen.  That means there will be painful outages from time to time.  Hopefully, the infrastructure owners will take those outages as signs that it’s time to double down and extend their projections of what might happen much further up and to the right.  Those that succeed in keeping hold of the Tiger by the Tail will survive and prosper.

Posted in amazon, data center, grid, Marketing, multicore, Web 2.0 | 5 Comments »

Software Testing in the Multicore Cloud Computing Era With Replay Solutions

Posted by Bob Warfield on February 11, 2008

I had the opportunity to visit Jonathan Lindo, CEO and co-founder of Replay Solutions last week, and I came away impressed.  This Hummer Winblad and Partech backed startup has some fascinating new technology to help with software testing and debugging.  I like to think of their software as a time machine for complex software.  With it, you can go back and recreate the circumstances that led to a bug, and thereby figure out what happened.  Their software works by turning your J2EE application into a black box and monitoring everything that comes into or goes out of the box.  Using their proprietary algorithms, the data required to do this is actually kept very small.  So small, in fact, that the company got its start helping game companies monitor their software using the same technology.  They’re still doing business in that market, and you can imagine the software has to be pretty unobtrusive if it’s not going to interfere with a game.  And so it is.

It works this magic by tapping into and instrumenting the Java code.  This sounds a lot like what my old alma mater Pure Software did with their memory leak detection.  What’s nice about it is that no access to source code is required.  In the demo, Jonathan fired up an app server (they support Tomcat and JBoss, and soon WebLogic), lit up their instrumentation module, and from that point on just used the software being tested normally.  Of course in the demo, using the software “normally” eventually led to a crash.  It was the classic ugly Java stack dump that tells you very little about what actually happened–just the thing to annoy both the user and the developers.

Replay Solutions to the rescue.  Jonathan likes to think of it as “Tivo for Software.”  Looking at the screen, one sees a screenshot of every HTML page rendered along the way.  This makes it easy to tell where you are in the recorded session and what the user was doing at the time.  The developer can set breakpoints in their code and then use ReplayDIRECTOR (that’s what the software is called) to bring the program up to the point of failure.  This can be done over and over until the programmer has figured out what went wrong.
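
To get a feel for the record/replay idea in general terms, here is a minimal sketch, entirely my own illustration and not Replay’s actual mechanism:  wrap each source of nondeterminism so that a recording run logs every value it produces, and a replay run feeds the log back in the same order:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch of record/replay: the clock is wrapped so a recording run
    // logs each value and a replay run returns the logged values in
    // order, making the replayed execution deterministic.
    public class RecordingClock {
        private final boolean replaying;
        private final Deque<Long> log;

        RecordingClock(boolean replaying, Deque<Long> log) {
            this.replaying = replaying;
            this.log = log;
        }

        long now() {
            if (replaying) {
                return log.removeFirst();    // replay the recorded value
            }
            long t = System.currentTimeMillis();
            log.addLast(t);                  // record for later replay
            return t;
        }

        public static void main(String[] args) {
            Deque<Long> log = new ArrayDeque<>();
            RecordingClock recording = new RecordingClock(false, log);
            long original = recording.now();         // live run, value logged

            RecordingClock replay = new RecordingClock(true, log);
            System.out.println(replay.now() == original);  // prints true
        }
    }

Do that for every input the black box sees, and a later run is no longer at the mercy of chance.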

Sounds cool, but why is this software an essential tool for the Multicore Cloud Computing Era?  Think about it.  In the old days, reproducing bugs was hard enough.  It could take days to find the exact set of steps needed to make a bug reproducible.  And until the bug is reproducible, it’s nearly impossible to fix.  Now fast forward to the Multicore Cloud Computing Era.  You’ve got hundreds or even thousands of simultaneous users running against a hundred or more CPU’s.  There are many many processes running.  Developers recognize this as a nightmare situation, because it becomes impossible to reproduce bugs in such a world.  How would you ever get all of those users to do exactly the same thing twice?  Add to that all the other crazy timing-related issues and it’s darned near impossible to track down many kinds of bugs on such software.
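
Here is the classic tiny example of why such bugs defy reproduction, a sketch of my own rather than anything from Replay’s demo.  Two threads increment a shared counter without synchronization, and the final total depends entirely on how the scheduler happened to interleave them:

    // Two threads increment a shared counter with no synchronization.
    // Lost updates depend on thread scheduling, so the total varies
    // from run to run on a multicore machine.
    public class RaceDemo {
        static int counter = 0;

        public static void main(String[] args) throws InterruptedException {
            Runnable work = () -> {
                for (int i = 0; i < 100_000; i++) counter++;  // not atomic
            };
            Thread a = new Thread(work);
            Thread b = new Thread(work);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(counter);  // rarely the full 200000
        }
    }

If a recording captures the effects of the interleaving, even a bug like this becomes replayable at will, which as I understand it is exactly the pitch.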

I talked over a scenario with Jonathan that I thought was really cool.  Would it be possible to set up ReplayDIRECTOR to continuously monitor a big SaaS or Web 2.0 system?  The answer, surprisingly, is that it is completely possible.  Suddenly, we can make these kinds of bugs reproducible.  But it gets even better.  ReplayDIRECTOR will reproduce the problem on far less hardware than the original system.  That’s another big issue with such systems:  the cost of providing a duplicate environment for testing.  With Replay, the “black box” can be just the J2EE server.  All of the other pieces are simulated.

If I were currently involved with a J2EE-architecture piece of Enterprise Software, I would definitely be trying to get into Replay’s Beta Testing program.

Posted in multicore, saas, Web 2.0 | Leave a Comment »

Apple, MacWorld, User Experience, and the Multicore Crisis

Posted by Bob Warfield on January 16, 2008

Looking over the parachute drops of information from MacWorld, I was struck by some underlying themes.  I won’t bore you with a recitation of the huge amount of surface level activity: there are plenty of better, more firsthand places to get that.  But some of those firsthand sources stirred up some patterns I’m familiar with.

First, the multicore crisis bit.  I’ve written about it before, but let me recap.  What is the multicore crisis?  It is a wave of change that is being unleashed by virtue of the fact that microprocessors have stopped getting faster every 18 months.  Instead of gaining a faster clock speed with free benefits for all at scarcely any effort, we get more cores.  That ain’t bad, but it takes considerable effort at the software end to take advantage of the additional cores.  For the most part, we are far from keeping up with the availability of those cores.  For emphasis, here is a graph of Intel clock speeds that vividly shows just how long the curve has been flattened out:

[Graph: Intel clock speed timeline]

We’ve had another year in 2007 while the curve remained flat.

What does this have to do with Apple and MacWorld?  Well, on a simple vein, it was the multicore crisis checking in that caused Mathew Ingram to write, “Hey, Steve–you broke the Internet.”  He was remarking about how Twitter was virtually unusable for hours.  Twitter has become somewhat of an unwilling canary in the coal mine: if something is hot and getting traffic, Twitter seems bound to go down.  Why?  Because it is a victim of the Multicore Crisis.  The system’s architecture isn’t scaling.  It may be a software problem, i.e. the software is not designed to take advantage of enough CPUs, or an infrastructure problem, i.e. it can only take advantage of the CPUs Twitter has physically bought and installed in their data center.  These can both be overcome.  Software can be made to take advantage of lots more processors.  Services like those Amazon and others offer let you scale up to many more CPUs on short notice without having to buy physical hardware.  Failure to provide for both these contingencies is succumbing to the Multicore Crisis.
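
For the software half of that equation, here is a minimal sketch, assuming nothing about Twitter’s actual code, of the basic restructuring involved.  Work written as one sequential loop uses one core no matter how many you have; the same work split into independent chunks can use them all:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Split a big sum into one independent slice per available core.
    public class ParallelSum {
        public static void main(String[] args) throws Exception {
            final long N = 100_000_000L;
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);

            List<Future<Long>> slices = new ArrayList<>();
            for (int c = 0; c < cores; c++) {
                final long lo = N * c / cores, hi = N * (c + 1) / cores;
                slices.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = lo; i < hi; i++) sum += i;  // this slice only
                    return sum;
                }));
            }

            long total = 0;
            for (Future<Long> f : slices) total += f.get();
            pool.shutdown();
            System.out.println(total);
        }
    }

The hard part, of course, is that real web workloads are not this conveniently independent, which is why the retooling is a crisis and not a chore.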

Twitter was not unique.  Mathew’s blog was very slow to come up when I tried to access the article, having been Techmemed.  He mentions Fake Steve Jobs got creamed and couldn’t make CoveritLive work (Zoli mentions CoveritLive was CoveritDead).  The Apple store was down at one point too.

Scoble tells a similar story:  Engadget was up but very slow, Qik’s macworld channel was up and down, and Mogulus was slow to unreachable.  Live video was hard to come by.  TUAW was fairly unreachable.  There were a couple of sites that passed muster, including TechCrunch (bravo!) and MacRumorsLive.  TechCrunch hammers Twitter for being down.  Again.  If, as its pundits like to think, Twitter will play a significant role in reporting events, it needs to work all the time.  It is, after all, a communication channel.  Moreover, it’s a communication channel under constant scrutiny.

This brings me to a point I want to make about the Multicore Crisis and The Big Switch (what Nick Carr calls the trend to move to Cloud Computing).  These two megatrends are combining to change what the important core competencies are to succeed.  Once upon a time, it was enough just to be able to lash together all the myriad pieces needed to create a web application with a good user design.  You could count on Moore’s Law to make machines faster, and your customer growth was slow enough that scalability could be comfortably pushed out into the future as a high quality problem to deal with if you succeeded.  That’s no longer the case.  The ability for new ideas to catch on has become viral on the web for a variety of reasons, not the least of which is that so many more people are on the web and they’re interconnected in so many more ways than simple e-mail, search, and web browsing.

There is another, more subtle manifestation of all this, and the new MacBook Air personifies it.  In the Multicore era, user experience is the new black for hardware.  Why?  Well, in the old days, everyone wanted to upgrade every two years.  For a while, I bought a new PC every year.  And it was worth it.  The new machines were significantly faster than the old.  In a world where the upgrade cycle is so short, you want to buy cheap hardware.  Result?  Dell wins big.  They’re the best at building their hardware cheap, so you can buy it more often, so you can get that speed.  Dell was driven by the Need for Speed, and the relative ease with which Moore’s Law delivered it.

Times have changed.  In an era when you probably won’t upgrade every two years, let alone every year, it makes sense to look at something other than speed.  I have an idea, how about looking at the User Experience?  Is the machine sexier?  Does it do cool things?  I love the Air’s ability to “borrow” a disk drive via WiFi from a nearby machine as well as its ability to handle iPhone-like gestures on its touch pad.  Combining Apple’s trademark radically uber-cool Industrial Design with genuine usability innovation is a winning formula.  If it gets you to buy a new machine when you otherwise would be happy to stand pat, they win.  The fact that so much of what one does on a computer is via the Internet combined with the rise of very effective virtualization software has radically lowered the barriers to PC/Windows users buying a Mac as well.  The latter is the Big Switch component.

That’s two significant changes brought on by the Multicore Crisis and The Big Switch.  What is your company doing to get ahead of these trends before some competitor uses them to ride right over your business?

Posted in data center, multicore, saas, strategy, user interface, Web 2.0 | 3 Comments »

Scalability is a Requirement for Startups

Posted by Bob Warfield on December 6, 2007

Dharmesh Shah wonders whether startups should ignore scalability:

You’re worrying about scalability too early. Don’t blow your limited resources on preparing for success. Instead, spend them on increasing the chances that you’ll actually succeed.

It’s an interesting question: should startups worry about scalability, or does that get in the way of finding a proper product/market fit?  If you’ve read my blog much you’ll know that I view achieving that product/market fit as the highest priority for a startup, and I’m not alone:  Marc Andreessen says it too.  I think this is so important that I have advocated some relatively radical architectural ramifications to help facilitate the flexibility of a product so it can evolve towards that ideal even faster.

But where does scalability fit in?  Can you achieve that product/market fit without it?  For most startups, I think it is either difficult to verify a true product/market fit without it, or worse, you may achieve it only to immediately fall to earth a victim of poor user experience.  There are certainly plenty of examples of companies that started out great, seemed to have that product/market fit, but got into persistent hot water because they couldn’t scale out a good user experience when their site began to take off.  Fred Wilson writes recently about his love/hate relationship with Technorati, which has been a good example of this.

Here is another question:  “How much success do you need to verify product/market fit?”  Signing up a few customers to a beta, or even having a large beta, is not really enough in my opinion.  It’s pretty easy to get a ton of people to try something that sounds sexy and is promoted well.  The question is whether it really takes hold well enough.  Marc Andreessen’s Ning is a good example.  When they launched their original product, it required a fair amount of custom programming to create a custom Social Network.  They had 30,000 social networks created even so, but the service wasn’t taking off.  Michael Arrington was calling it R.I.P.  Then they released a version that eliminated the need for programming, and suddenly the product/market fit was there and it took off like a rocket, crossing 100,000 social networks in record time.  Clearly Ning had to deal with scalability before they could learn much about their product/market fit.

Google is another great example of this.  They had to scale from day one because of the problem they were solving.  Om Malik says their infrastructure and ability to scale are actually their strategic advantage.  Certainly the nature of the problem Google wanted to solve required scalability from day one.  This is how Aloof Schipperke wants to view the question when he says, “Scalability is a requirement, not an optimization.”  It’s a bit of a double entendre.  One could say it is a requirement that all startups deal with it, or one could say startups need to evaluate whether scalability is a requirement in their domain.  I’m in the latter camp.  Figure out what success really looks like.  When do you know you have product/market fit?  Be conservative.  What are the requirements to get there?  Aloof lumps scalability in with other “ilities”.  Can your startup reach product/market fit without security, for example?  The answers may surprise you if you’re really honest about it.

Chances are, you may have to do more to be sure about product/market fit than you are comfortable with in release 1.0.  You’ll need a phased plan for how to get there.  Lest you use this as an excuse to ignore scalability until the last minute, keep in mind that these phased plans should have short milestones:  quarterly or six month iterations at most.  Scaling a really poorly architected application can amount to a painful rewrite.  So do a phasing plan for scaling.  What are the big ticket items you’ll need to enable early so that scaling later is not too hard?  There are a few well-known touchpoints that can make scalability easier.  I’m not going to go over all of them, you know what I’m talking about:  statelessness, RESTful web services, and beware the database.  If you don’t know about these things, get some people on your team who do!  It’s not hard to start out with a plan in mind about your eventual scalability and just make sure that along the way you don’t inadvertently shoot yourself in the foot.  It usually boils down to securing the two ends of the puzzle with good scalability:

– How will the client-facing web servers scale?

– How will the database back end scale?

Make a plan for what it will look like when it’s done, and put phased milestones in place to get there over time. 
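
Of those touchpoints, statelessness is the one that pays off first at the web tier.  Here is a minimal sketch of the idea, with made-up names and a plain map standing in for whatever shared store you would really use:  the web servers hold no per-user state in process, so any server can answer any request, and adding servers is how you scale:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of a stateless web tier: session state lives in a shared
    // store keyed by a token the client carries, not in server memory.
    public class StatelessSessions {
        // Stand-in for an external store every web server can reach
        // (a database, a distributed cache, etc.).
        static final Map<String, String> sharedStore = new ConcurrentHashMap<>();

        static String login(String user) {
            String token = UUID.randomUUID().toString();
            sharedStore.put(token, user);  // state goes to the shared store
            return token;                  // client carries this, e.g. in a cookie
        }

        static String handleRequest(String token) {
            // No local state consulted, so any server in the pool can do this.
            return "hello, " + sharedStore.get(token);
        }

        public static void main(String[] args) {
            String token = login("alice");
            System.out.println(handleRequest(token));  // hello, alice
        }
    }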

Here’s another key issue.  Dharmesh’s original question assumes scalability and user experience compete for scarce resources.  Ed Sim somewhat follows this path too when he writes that it’s hard to sell Scalability.  Aren’t we talking about the tradeoffs between UI/Features and Infrastructure (web or DB)?  Are the same engineers really doing both things?  It seems to me a lot more common to have a “front end” or application group and a “back end” or infrastructure group, even if “group” is a bit grandiose for a couple of people.  Take the opportunity to map out how the modules produced by these two groups will communicate.  Make that communication architecturally clean so the groups are decoupled.  Make the communication work the way it will when you build out scalability, but then don’t build it out at first.  This will enable the infrastructure group’s agenda to decouple from the user experience group’s.
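
In code, that clean seam can be as simple as a narrow interface the front end codes against.  A minimal sketch, with purely illustrative names:  ship the in-process version first, and let the infrastructure group swap in a distributed one later without the front end changing a line:

    // The seam between the groups: the front end sees only this contract.
    interface CatalogService {
        String lookup(String productId);
    }

    // Day one: a plain in-process implementation, no scaling machinery.
    class LocalCatalogService implements CatalogService {
        public String lookup(String productId) {
            return "local result for " + productId;
        }
    }

    public class FrontEndDemo {
        public static void main(String[] args) {
            // When the back-end group later substitutes a distributed
            // implementation (remote calls, caching, partitioning),
            // this front-end code is untouched.
            CatalogService catalog = new LocalCatalogService();
            System.out.println(catalog.lookup("sku-42"));
        }
    }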

BTW, if you’re thinking the true competition between the two is you want to hire all user experience people with your capital and no infrastructure, that just sounds like a bad idea to me.  It’s hard to deliver good user experience if your infrastructure is lousy, buggy, and doesn’t perform.  There are ample studies that show the speed with which your application serves up pages is a big contributor to user experience as well. 

I’ve gone down this path before of having essentially two small teams and making sure there was clean communication between their code from the start.  My company PriceRadar was lucky enough to land a partnership with AskJeeves early on.  Part of the deal was we had to pass a load test that showed we could handle 10,000 simultaneous users hammering our application.  At the time, most of my experience and developers were from the Microsoft world, so we were .NET all the way.  I remember meeting with the advisory board for a company called iSharp.  It was an all-star cast of web application CTO’s and VP’s of Engineering.  We went around the table to hear what everyone was doing.  I was the only Microsoft guy in the room, and the Unix crowd just laughed when I told them we had to pass this big load test.  AskJeeves’ CTO was there, as well as the fellow in charge of AOL Instant Messenger and about 10 others.  They flat out said it was impossible on anything but Unix.  In less than a month we had it all working with a distributed grid architecture.  The front end guys were never even involved and changed little or no code.  The back end guys didn’t sleep much, but they emerged triumphant.  And the entire team was about 10 developers, per my small team mentality.

Yes, Virginia, you should worry about your scalability, but it need not be all-consuming.  You can handle it.

Posted in data center, grid, multicore, platforms, saas, strategy, Web 2.0 | 2 Comments »

A Pile of Lamps Needs a Brain

Posted by Bob Warfield on October 28, 2007

Continuing the discussion of a Pile of Lamps (a clustered Lamp stack in more prosaic terms), Aloof Schipperke writes about how such a thing might manage its consumption of machines on a utility computing fabric:

Techniques for managing large sets of machines tend to be either highly centralized or highly decentralized. Centralized solutions tend to come from system administration circles as ways to cope with large quantities of machines. Decentralized solutions tend to come from the parallel computing space, where algorithms are designed to take advantage of large quantities of machines.

Neither approach tends to provide much coupling between management actions and application conditions. Neither approach seems well adapted for any form of semi-intelligent dynamic configuration of multi-layer web application. Neither of them seem well suited for non-trivial quantities of loosely coupled LAMP stacks.

Aloof has been contemplating whether a better approach might be to have the machines converse amongst themselves in some way.  He envisions machines getting together when loads become too challenging and deciding to spawn another machine to take some of the load on.

Let’s drop back and consider this more generally.  First, we have a unique capability emerging in hosted utility grids.  These range from systems like Amazon’s Web Services to 3Tera’s ability to create grids at their hosting partners.  It started with the grid computing movement, which sought to use “spare” computers on demand, and has now become a full blown commercially available service.  Applications can order and provision a new server on literally 10 minutes notice, use it for a period of time, and then release the machine back to the pool, paying only for the time they’ve used.  This differs markedly from a story like iLike’s:  they had to drive around in a truck borrowing servers everywhere they could, and then physically connect them up.  Imagine how much easier it could have been to push a button and bring on the extra servers on 10 minutes notice as they were needed.

Second, we have the problem of how to manage such a system.  This is Aloof’s problem.  Just because we can provision a new machine on 10 minutes notice doesn’t mean a lot of other things:

  • It doesn’t mean our application is architected to take advantage of another machine. 
  • It doesn’t mean we can reconfigure our application to take advantage in 10 minutes.
  • It doesn’t mean we have a system in place that knows when it’s time to add a machine, or take one back off.

This requires another generation of thinking beyond what’s typically been implemented.  The new variable cost infrastructure has to trickle down into architectures that were designed for a fixed cost world.  For me, this sort of problem always boils down to finding the right granularity of “object” to think about.  Is the machine the object?  Whether or not it is, our software layers must take account of machines as objects, because that’s how we pay for them.

So to attack this problem, we need to understand a collection of questions:

  1. What is to be our unit of scalability?  A machine?  A process?  A thread?  A component of some kind?  At some level, the unit has to map to a machine so we can properly allocate on a utility grid.
  2. How do we allocate activity to our scalability units?  Examples include load balancing and database partitioning.  Abstractly, we need some hashing function that selects the scalability unit to which work (data, compute crunching, web page serving, etc.) gets allocated.  There is a sketch of one such function just after this list.
  3. What is the mechanism to rebalance?  When a scalability unit reaches saturation by some measure, we must rebalance the system.  We change the hashing function in #2, and we need a mechanism to redistribute without losing anything while the process is happening.  We also must understand how we measure saturation or load for our particular domain.
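
On question #2, the simplest hashing function, something like user id modulo the number of units, has the nasty property that changing the unit count reshuffles nearly every key, which makes question #3 brutal.  Here is a minimal sketch, my own illustration, of consistent hashing, which remaps only a small fraction of keys when a unit joins or leaves:

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Sketch of consistent hashing: units occupy points on a ring, and
    // a key belongs to the first unit at or after its own hash.  Adding
    // or removing a unit only moves the keys adjacent to it on the ring.
    public class ConsistentHash {
        private final SortedMap<Integer, String> ring = new TreeMap<>();

        void addUnit(String unit) {
            // Several virtual nodes per unit smooth out the distribution.
            for (int v = 0; v < 8; v++) {
                ring.put((unit + "#" + v).hashCode() & 0x7fffffff, unit);
            }
        }

        String unitFor(String key) {
            int h = key.hashCode() & 0x7fffffff;
            SortedMap<Integer, String> tail = ring.tailMap(h);
            int slot = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
            return ring.get(slot);
        }

        public static void main(String[] args) {
            ConsistentHash ring = new ConsistentHash();
            ring.addUnit("lamp-1"); ring.addUnit("lamp-2"); ring.addUnit("lamp-3");
            System.out.println(ring.unitFor("user:bob"));  // e.g., lamp-2
        }
    }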

Let’s cast this back to the world of a Pile of Lamps.  A traditional Lamp stack scaling effort is going to view each component of the stack separately.  The web piece is separate from the data piece, so we have different answers for the 3 issues on each of the 2 tiers.  Pile of Lamps changes how we factor the problem.  If I understand the concept correctly, instead of independently scaling the two tiers, we will simply add more Lamp clusters, each of which is a quasi-independent system.

This means we have to add a #4 to the first 3.  It was implicit anyway:

    4.  How do the scaling units communicate when the resources needed to finish some work are not all present within the scaling unit?

Let’s say we’re using a Pile of Lamps to create a service like Twitter.  As long as the folks I’m following are on the same scaling unit as me, life is good.  But eventually, I will follow someone on another scaling unit.  If the Pile of Lamps is clever, it makes this transparent in some way.  If it can do that, the other three issues are at least things we can go about doing behind the scenes without bothering developers to handle it in their code.  If not, we’ll have to build a layer into our application code that makes it transparent for most of the rest of the code.
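
What might that transparency layer look like?  A minimal sketch, with invented names, the remote call stubbed out, and the ring reused from the earlier sketch:  application code asks for a timeline and never learns whether it came from the local unit or a remote one:

    // Sketch of question #4: a routing layer that hides which scaling
    // unit owns a user.  The ring decides ownership; callers never know
    // whether the data came from the local unit or across the wire.
    public class TimelineRouter {
        private final String localUnit;
        private final ConsistentHash ring;  // from the sketch above

        TimelineRouter(String localUnit, ConsistentHash ring) {
            this.localUnit = localUnit;
            this.ring = ring;
        }

        String fetchTimeline(String userId) {
            String owner = ring.unitFor(userId);
            if (owner.equals(localUnit)) {
                return readLocal(userId);       // same LAMP cluster
            }
            return fetchRemote(owner, userId);  // cross-cluster call
        }

        private String readLocal(String userId) {
            return "local timeline for " + userId;  // stub
        }

        private String fetchRemote(String unit, String userId) {
            // In real life: an HTTP or RPC call to the owning cluster.
            return "remote timeline for " + userId + " via " + unit;  // stub
        }
    }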

I think Aloof’s musings about whether #3 can be done as conversations between the machines will be clearer once the Pile of Lamps idea is mapped out more fully in terms of all 4 questions.

Posted in grid, multicore, platforms, strategy, Web 2.0 | 1 Comment »