SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.

Archive for the ‘Open Source’ Category

Bumps in My Internet Journey (Links): August 27, 2007

Posted by Bob Warfield on August 27, 2007

This post introduces a new weekly feature for the SmoothSpan blog where I’ll list noteworthy links I came across during the prior week.  Call them “Bumps in My Internet Journey” because they made me stop and think!  If they made me think enough, eventually you’ll see a blog post about it.

Why Mahalo, TechMeme, and Facebook are going to kick Google’s butt in four years:  Scoble’s got an interesting take.  Everyone hates search engine Spam!

Attention Economy:  All You Need to Know:  Nice overview of Read/Write Web’s Attention Economy concept.  Eventually I’ll blog about this…

Denormalizing Your Way to Speed and Profit:  Because databases are an agent of the Devil when it comes to massive scalability.
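The denormalization trade-off is simple enough to show in a few lines. A minimal sketch (my own, not from the linked post) of trading storage and update cost for read speed:

```python
# Denormalization trades storage for read speed: instead of joining a
# comments table against a users table on every page view, each comment
# carries a copy of the user's display name.

# Normalized: an extra lookup (or a join) per comment rendered.
users = {1: {"name": "alice"}}
comments_normalized = [{"user_id": 1, "text": "Nice post"}]

def render_normalized(comment):
    return f"{users[comment['user_id']]['name']}: {comment['text']}"

# Denormalized: one lookup, at the cost of updating every copy
# if the user ever renames themselves.
comments_denormalized = [{"user_name": "alice", "text": "Nice post"}]

def render_denormalized(comment):
    return f"{comment['user_name']}: {comment['text']}"

print(render_normalized(comments_normalized[0]))      # alice: Nice post
print(render_denormalized(comments_denormalized[0]))  # alice: Nice post
```

At massive scale the join is the part that refuses to shard cheaply, which is the post's point.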

Yahoo Pig and Google Sawzall:  Wherein MapReduce and Hadoop become Languages.  Don’t worry, becoming a Language is the most exalted thing in computing.  Adobe even made printers into languages!
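For the curious, the MapReduce model underlying Pig, Sawzall, and Hadoop can be sketched in a few lines of plain Python (a toy word count, not any of those systems’ actual code):

```python
# MapReduce in miniature: a map step emits (key, value) pairs, a
# shuffle groups them by key, and a reduce step folds each group.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["the web is big", "the web is distributed"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 2
```

The languages in question (Pig Latin, Sawzall) just let you write the map and reduce parts declaratively while the runtime handles distribution.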

Profiting from the Content Delivery Network Wars:  They haven’t seen anything yet.  Amazon and the other big guys will roll up Akamai et al as part of the hosting package you get when you buy their utility computing service.

What Makes an Idea Viral:  Seth Godin is always worth listening to.

Werner Vogels Tells Us About the Amazon Technology Platform:  As well as interesting glimpses into their culture.

By 2014, We’ll Have 1000-core Chips:  The amazing Tile64 has shipped with 64 cores.  Available today lest you thought the Multicore Crisis was far in the future!

Posted in business, grid, Marketing, multicore, Open Source, Partnering, platforms, saas, software development, Web 2.0 | Leave a Comment »

Scoble and I Are On a Similar Wavelength About Social Graph Based Searching

Posted by Bob Warfield on August 26, 2007

I couldn’t believe the serendipity.  Not long after writing my post about Searching Blogs Instead of Google to avoid Spam, I read Robert Scoble’s excellent piece about Social Graph Based Search (or here for the meat).  We’re very much on the same wavelength here!  Scoble’s videos do a great job of explaining why the blog search method works.  In essence, PageRank (Google’s search algorithm) is just too easy for SEOs (Search Engine Optimizers) to cheat on.  Create a ton of links to your page, populate it with the right keywords, and you can trick Google into sending you a zillion people no matter what trash you may have put there.

Scoble uses the term “SEO Resistant Search”.  Ironically, SEO Resistance, or rather, better relevance, is the reason most people use Google.  But this whole approach is ideal for the Open Search Engine Initiative I’ve proposed already.

Good reading here, thanks Scoble: it is indeed the basis for a whole new kind of search engine!

Posted in Marketing, Open Source, Partnering, platforms, saas, Web 2.0 | Leave a Comment »

Why Don’t Search Startups Share Data, Part 2

Posted by Bob Warfield on August 22, 2007

I mentioned in an earlier post that search startups ought to look into a divide and conquer approach when crawling the web.  After all, one of the biggest complaints about a lot of interesting search services is they don’t find as much as Google does.  TechCrunch, for example, complains that Microsoft’s new Tafiti produces search results that are “not as relevant as Google or Yahoo”.  And yet, they also admit Tafiti is beautiful (as an aside, it is very cool and worth a look to see what Microsoft’s Flex killer, Silverlight, can do for a web site).  If the Alt search sites band together to do the basic crawling and crunching using Google’s MapReduce-style algorithms (possibly based on the Open Sourced Hadoop Yahoo is pushing), they could share one of the bigger costs of being in business and ameliorate the huge advantage in reach that the biggest players have over them.
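To make the shared-crawl idea concrete, here is a toy sketch in the Hadoop Streaming style (plain Python; the crawl record layout and the inbound-link count are illustrative assumptions, not Hadoop code):

```python
# A shared crawl would land as raw records any Alt engine could crunch.
# Here, records are "url<TAB>outlink" pairs; the mapper emits each link
# target, and the reducer counts inbound links per page -- the kind of
# raw signal a shared crawling infrastructure would hand to everyone.
from itertools import groupby

def mapper(lines):
    for line in lines:
        url, outlink = line.rstrip("\n").split("\t")
        yield outlink, 1

def reducer(pairs):
    # Hadoop delivers mapper output sorted by key, so consecutive
    # identical keys can be folded with groupby.
    for target, group in groupby(sorted(pairs), key=lambda p: p[0]):
        yield target, sum(count for _, count in group)

records = ["a.com\tb.com", "c.com\tb.com", "a.com\tc.com"]
print(dict(reducer(mapper(records))))  # {'b.com': 2, 'c.com': 1}
```

Each Alt engine would then apply its own secret-sauce ranking on top of the shared counts, which is where they differentiate.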

ZDNet bloggers Dan Farber and Larry Dignan ask whether Open Sourced Hadoop can give Yahoo the leverage it needs to close the gap with Google.  Their first words are that “Open source is always friend to the No. 2 player in a market and always the enemy of the top dog.”  I don’t think Hadoop by itself is enough, but if Yahoo were to create a collaborative search service, maybe it would be.  In fact, what if search were much more like Facebook, only more open (hey, if Scoble can do it with a hotel, I can do it with a search engine!)?  In a manner similar to my “Web Hosting Plan for World Domination”, Yahoo could undertake a plan for “Search Engine World Domination”.  Here’s how it would work:

–  Yahoo builds up the Hadoop Open Source infrastructure for Web Crawling.  Alt Search engines can tie back into that to get the raw data and avoid doing their own crawling.  Even GigaOm says “The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo for that matter) is the high cost of infrastructure.”  Let’s share those costs and further defray them by having a big player like Yahoo help out.

–  Yahoo can also offer up the Hadoop scaffolding to do any massively parallel processing these Alt Search Engines need to compute their indices.  Think of it as being like Amazon’s EC2 and S3, but purpose-built to simplify search engines.  People are already asking Amazon for Search Engine AMI’s, so there is clearly interest.

–  Now here is the Facebook piece of the puzzle:  Yahoo needs to turn this whole infrastructure play into a Social Networking play.  That means they offer Search Widgets to any Social Network that wants them, and they let you personalize your own search experience by collecting the widgets you like.  Most importantly, Yahoo creates basic widgets that reflect their current search offering, but they allow the Alt Search Engines to make widgets that package their search functionality.  Take a look at Tafiti and see how it lets you select different “views”.  Those views are widgets!

–  Yahoo gets a big new channel for its ads, and it graciously shares the revenues with the widget builders because that’s what makes the world go round.  Perhaps they even have virtual dollars that can be used to pay for the infrastructure using ad revenue, although I personally think they should give away as much infrastructure as possible to attract the Alt Search crowd to their platform.
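To make the widget idea concrete, here is a minimal sketch (entirely hypothetical; Yahoo offers no such interface) of the kind of pluggable contract the Alt Search Engines could code against:

```python
# A "search widget" packages one engine's functionality behind a common
# interface; the host page composes whichever widgets the user collects.
class SearchWidget:
    """One engine's packaged search view."""
    def __init__(self, name, search_fn):
        self.name = name
        self.search_fn = search_fn  # the engine's own query function

    def render(self, query):
        results = self.search_fn(query)
        return f"[{self.name}] {query}: {', '.join(results)}"

# Two hypothetical engines plugged into the same slot.
web_widget = SearchWidget("web", lambda q: [f"{q}-page1", f"{q}-page2"])
blog_widget = SearchWidget("blogs", lambda q: [f"{q}-post1"])

page = [w.render("hadoop") for w in (web_widget, blog_widget)]
print(page[0])  # [web] hadoop: hadoop-page1, hadoop-page2
```

Tafiti’s “views” are exactly this shape: same query, different packaged presentation and ranking per view.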

Don Dodge, meanwhile, is wondering what the exit strategy is for the almost 1,000 startups out there trying to peddle alternative search engines.  It sure seems to me that creating this search widget social network world solves a big problem for Yahoo and at the same time creates a lot of new opportunity for the exit strategy of these engines.  Suddenly, they have access to large volumes of data they couldn’t afford and a distribution channel in which to build an audience.

Open Source Swarm Competition in the Search Engine Space is Born!


Posted in amazon, business, ec2, grid, Marketing, Open Source, Partnering, software development, user interface, venture, Web 2.0 | 3 Comments »

How Does Virtualization Impact Hosting Providers? (A Secret Blueprint for Web Hosting World Domination)

Posted by Bob Warfield on August 16, 2007

I’ve written in the past about data centers growing ever larger and more complex in the era of SaaS and Web 2.0.  My friend Chris Cabrera, CEO of SaaS provider Xactly, recently commented along similar lines  when asked about the VMWare IPO. 

Now Isabel Wang who really understands the hosting world has written a great post on the impact of virtualization (in the wake of VMWare’s massive IPO) on the web hosting business.  I took away several interesting messages from Isabel’s post:

–  Virtualization will be essential to the success of Hosters because it lets them offer their service more economically by upping server utilization.  It’s an open question whether those economies are passed to the customer or the bottom line.

–  These technologies help address the performance and scalability issues that keep a lot of folks awake at night.  Amazon’s Bezos and Microsoft’s Ray Ozzie realize this, and that’s why they’re rushing full speed ahead into this market.  They’ve solved the problems for their organizations and see a great opportunity to help others and make money along the way.

–  The market has moved on from crude partitioning techniques to much more sophisticated and flexible approaches.  Virtualization in data centers will be layered, and will involve physical server virtualization, a utility computing fabric comprised of pools of servers across multiple facilities, application frameworks such as Amazon Web Services, and Shared Services such as identity management.  This complexity tells us the virtualization wars are just beginning, and VMWare isn’t even close to locking it all up, BTW.

–  This can all be a little threatening to the established hosting vendors.  Much of their expertise is tied up in building racks of servers, keeping them cool, and hot swapping the things that break.  The new generation requires them to develop sophisticated software infrastructure, which is not something they’ve been asked to do in the past.  It may wind up being something they don’t have the expertise to do, either.  These are definitely the ingredients of paradigm shifts and disruptive technologies!

We’re talking about nothing less than utility computing here, folks.  It’s a radical step-up in the value hosting can offer, and it fits what customers really want to achieve.  Hosting customers want infinite variability, fine granularity of offering, and real-time load tracking without downtime like the big crash in San Fran that recently took out a bunch of Web 2.0 companies.  They want help creating this flexibility in their own applications.  They want billing that is cost effective and not monolithic.  Billing that lets them buy (sorry to use this here) On-demand.  After all, their own businesses are selling On-demand and they want to match expenses to revenue as closely as possible to create the hosting equivalent of just in time inventory. Call it just in time scaling or just in time MIPS.  Most of all, they want to focus their energies on their distinctive competencies and let the hoster solve these hard problems painlessly on their behalf.

When I read what folks like Amazon and the Microsofties have to say about it, I’m reminded of the Intel speeches of yore  that talked about how chip fabs would become so expensive to build that only a very few companies would have the luxury of owning them and Intel would be one of those companies.  Google, for example, spends $600 million on each data center.  Big companies love to use big infrastructure costs to create the walls around their very own gardens!  Why should the hosting world be any different?

The trouble is, the big guys also have a point.  To paraphrase a particular blog title, “Data centers are a pain in the SaaS”.  They are a pain in the Web 2.0 too.  Or, as Amazon.com Chief Technology Officer, Werner Vogels said, “Building data centers requires technologists and engineering staff to spend 70% of their efforts on undifferentiated heavy lifting.”

Does this mean the big guys like Amazon and Microsoft (and don’t forget others like Sun Grid) will use software layers atop their massive data centers to massively centralize and monopolize data centers?  Here’s where it gets interesting, and I see winning strategies for both the largest and smaller players.

First, the big players worry about how to beat each other, not the little guys.  Amazon knows Microsoft will come gunning for them, because they must.  Can Amazon really out innovate Microsoft at software?  Maybe.  The world needs an alternative to Microsoft anyway.  But the answer when competing against players like Microsoft and IBM has historically been to play the “Open System vs. Monolithic Proprietary System” card.  It has worked time and time again, even allowing the open system to beat better products (sorry Sun, the Apollo was better way back when!).

How does Amazon do this to win the massive data center wars?  It’s straightforward:  they place key components of Amazon Web Services into the Open Source community while keeping critical gate keeping functions closed and under their control.  This lets them “franchise” out AWS to other data centers.  If you are a web hoster and you can offer to resell capacity that is accessible with Amazon’s API’s, wouldn’t that be an attractive way to quit worrying so much about it?  Wouldn’t it make the Amazon API dramatically more attractive if you knew there would be other players supporting it? 

Amazon, meanwhile, takes a smaller piece of a bigger pie.  They charge their franchisees for the key pieces they hold onto to make the whole thing work.  Perhaps they keep the piece needed to provision a server and get back an IP and charge a small tax to bring a new server for EC2 or S3 online in another data center.  How about doing the load balancing and failover bits?  Wouldn’t you like it if you could buy capacity accessed through a common API that can fail over to any participating data center in the world?  How about being able to change your SaaS datacenter to take advantage of better pricing simply by reprovisioning any or all of the machines in your private cloud to move?  How about being able to tell your customers your SaaS or Web 2.0 offering is that much safer for them to choose because it is data center agnostic?
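Here is a sketch of how the franchise model might look from the developer’s side (the endpoints and the provision() call are hypothetical illustrations, not Amazon’s actual EC2 interface):

```python
# One API shape, many participating data centers: a client written
# against the common API can be pointed at whichever franchisee
# offers the best price, making the app data center agnostic.
class ComputeClient:
    def __init__(self, endpoint):
        self.endpoint = endpoint  # any API-compatible host

    def provision(self, n_instances):
        # Real code would sign and send an HTTP request here.
        return [f"{self.endpoint}/instance/{i}" for i in range(n_instances)]

def cheapest(providers):
    # Reprovision wherever pricing is best -- the "move your cloud" play.
    return min(providers, key=lambda p: p["price_per_hour"])

providers = [
    {"endpoint": "ec2.amazon.example", "price_per_hour": 0.10},
    {"endpoint": "franchise.hoster.example", "price_per_hour": 0.08},
]
client = ComputeClient(cheapest(providers)["endpoint"])
print(client.provision(2))
```

The gatekeeping pieces Amazon keeps closed (provisioning, failover, the management interface) would sit behind that endpoint and collect the tax.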

BTW, any of the big players could opt to play this trump card.  It just means getting out of the “I want to own the whole thing” game of chicken and taking that smaller piece of a bigger pie.  Would you buy infrastructure from Google or Yahoo if they offered such a deal?  Why not?  Whoever opens their system gains a big advantage over those who keep theirs monolithic.  It answers many of the objections raised in an O’Reilly post about what to do if Amazon decides to get out of the business or has a hiccup.

Second, doesn’t that still mean the smaller players of less than Amazon/Google/Microsoft stature are out in the cold?  Not yet.  Not if they act quickly, before the software layers needed to get to first base become too deep and there are too many who have adopted those layers.  What the smaller players need to do is immediately launch a collaborative Open Source project to develop Amazon-compatible API’s that anyone can deploy.  Open Source trumps Open System which trumps Closed Monoliths.  It leverages a larger community to act in their own enlightened self-interest to solve a problem no single one of these players can probably afford to solve on their own.  Moreover, this is the kind of problem the Uber Geeks love to work on, so you’ll get some volunteers.

Can it be done?  I haven’t looked at it in great detail, but the API’s look simple enough today that I will argue it is within the scope of a relatively near-term Open Source initiative.  This is especially true if a small consortium got together and started pushing.  One comment from that same O’Reilly blog post said, “From an engineering standpoint, there’s not much magic involved in EC2.  Will you suffer for a while without the nifty management interface? Sure. Could you build your own using Ruby or PHP in a few days? Yep.”  I don’t know if it’s that easy, but it sure sounds doable.  By the way, the “nifty management interface” is another gatekeeper Amazon might hold on to and monetize.

But wait, won’t Amazon sue?  Perhaps.  Or perhaps it tips their hand and prompts them to Open Source it themselves.  Legal protection of API’s is hard.  The players could start from a different API and simply build a connector that lets their API also work seamlessly with Amazon, arriving at the same endpoint: developers who write to that API can use Amazon or any other provider that supports the API.

You only need three services to get going:  EC2, S3, and a service Amazon should have provided that I will call the “Elastic Data Cloud”.  It offers mySQL without the pain of losing your data if the EC2 instance goes down.  By the way, this is also something a company bent on dominating virtualization or data center infrastructure could undertake; it is something a hardware vendor could build and sell to favor their hardware; and it’s something some other player could go after.  The mySQL service, for example, would make sense for mySQL themselves to build.  One can envision similar services and their associated machine images becoming a requirement after some point if you want to sell to SaaS and Web companies.  Big Enterprise might undertake to use this set of API’s and infrastructure to remainder unused capacity in their data centers (unlikely, they’re skittish), to help them manage their data centers (yep, they need provisioning solutions), to use outsourcers to get apps distributed and hardened for disaster recovery, and the like.
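Here is a sketch of the Elastic Data Cloud concept (all names hypothetical, with in-memory stand-ins for mySQL and S3): writes go to durable storage first, so a replacement instance can replay them after a crash:

```python
# The "Elastic Data Cloud" idea: database state survives an EC2
# instance dying because every write is also journaled to durable
# storage (S3 stands in here), and a fresh instance replays the
# journal on boot. Concept sketch, not a product.
class ElasticDataCloud:
    def __init__(self, durable_store):
        self.durable_store = durable_store  # stands in for S3
        self.db = {}                        # stands in for mySQL
        self._replay()

    def _replay(self):
        # A new instance rebuilds state from the journal.
        for key, value in self.durable_store:
            self.db[key] = value

    def write(self, key, value):
        self.durable_store.append((key, value))  # durable first
        self.db[key] = value

s3 = []
node_a = ElasticDataCloud(s3)
node_a.write("user:1", "alice")

# node_a's EC2 instance dies; a replacement boots from the journal.
node_b = ElasticDataCloud(s3)
print(node_b.db["user:1"])  # alice
```

A real service would need log compaction and consistency guarantees, which is exactly why it would be worth money as a shared layer.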

So there you have it, hosting providers, virtualizers, and software vendors:  a blueprint for world domination.  I hope you go for it. I’m building stuff I’d like to host on such a platform, and I’m sure others are too!

Note that the game is already afoot with Citrix having bought XenSource.  Why does this put things in play?  Because Amazon EC2 is built around Xen.  Hmmmm…

Some late breaking news: 

There’s been a lot of blogging lately over whether Yahoo’s support of Open Sourced Hadoop will help them close the gap against Google.  As ZDNet points out, “Open source is always friend to the No. 2 player in a market and always the enemy of the top dog.”  That’s basically my point on the Secret Blueprint for Web Hosting World Domination.


Posted in amazon, business, data center, ec2, grid, multicore, Open Source, Partnering, saas, venture, Web 2.0 | 8 Comments »