Why Don’t Search Startups Share Data, Part 2


Posted by Bob Warfield on August 22, 2007

I mentioned in an earlier post that search startups ought to look into a divide-and-conquer approach to crawling the web. After all, one of the biggest complaints about many interesting search services is that they don’t find as much as Google does. TechCrunch, for example, complains that Microsoft’s new Tafiti produces search results that are “not as relevant as Google or Yahoo.” Yet they also admit Tafiti is beautiful (as an aside, it is very cool and worth a look to see what Microsoft’s Flex killer, Silverlight, can do for a web site). If the Alt search sites banded together to do the basic crawling and crunching using Google’s MapReduce-style algorithms (possibly based on the open source Hadoop project Yahoo is pushing), they could share one of the biggest costs of being in business and narrow the huge advantage in reach that the biggest players have over them.
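For readers wondering what a divide-and-conquer crawl could look like in practice, here is a minimal sketch, assuming the partners split the crawl by hostname and each one fetches only the hosts that hash to it. The partner names and the functions `owner_for` and `partition_frontier` are hypothetical; the post doesn’t propose any particular scheme.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical partner names; the post does not name specific participants.
PARTNERS = ["alt-search-a", "alt-search-b", "alt-search-c"]


def owner_for(url: str, partners: list[str] = PARTNERS) -> str:
    """Assign a URL to one partner by hashing its hostname.

    Hashing the host (not the full URL) keeps every page of a site with the
    same crawler, which makes per-site politeness limits easy to enforce.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partners)
    return partners[index]


def partition_frontier(urls, partners=PARTNERS):
    """Split a shared crawl frontier into per-partner work queues."""
    queues = {p: [] for p in partners}
    for url in urls:
        queues[owner_for(url, partners)].append(url)
    return queues


if __name__ == "__main__":
    frontier = [
        "http://example.com/",
        "http://example.com/about",
        "http://example.org/news",
        "http://example.net/blog",
    ]
    for partner, queue in partition_frontier(frontier).items():
        print(partner, queue)
```

Each partner would then publish what it fetched to a shared store so everyone can index the whole crawl. Plain modulo hashing reshuffles most hosts whenever a partner joins or leaves; consistent hashing would be the usual fix if membership changes often.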

ZDNet bloggers Dan Farber and Larry Dignan ask whether open source Hadoop can give Yahoo the leverage it needs to close the gap with Google. Their first words are that “Open source is always friend to the No. 2 player in a market and always the enemy of the top dog.” I don’t think Hadoop by itself is enough, but if Yahoo were to create a collaborative search service, maybe it would be. In fact, what if search were much more like Facebook, only more open? (Hey, if Scoble can do it with a hotel, I can do it with a search engine!) In a manner similar to my “Web Hosting Plan for World Domination,” Yahoo could undertake a plan for “Search Engine World Domination.” Here’s how it would work:

– Yahoo builds up the Hadoop Open Source infrastructure for Web Crawling. Alt Search engines can tie back into that to get the raw data and avoid doing their own crawling. Even GigaOm says “The biggest hindrance to any search start-up taking on Google (or Microsoft, Ask or Yahoo for that matter) is the high cost of infrastructure.” Let’s share those costs and further defray them by having a big player like Yahoo help out.

– Yahoo can also offer up the Hadoop scaffolding to do any massively parallel processing these Alt Search Engines need to compute their indices. Think of it as being like Amazon’s EC2 and S3, but purpose-built to simplify building search engines. People are already asking Amazon for Search Engine AMIs, so there is clearly interest.

– Now here is the Facebook piece of the puzzle: Yahoo needs to turn this whole infrastructure play into a Social Networking play. That means offering search widgets to any social network that wants them and letting you personalize your own search experience by collecting the widgets you like. Most importantly, Yahoo creates basic widgets that reflect its current search offering, but it allows the Alt Search Engines to make widgets that package their own search functionality. Take a look at Tafiti and see how it lets you select different “views.” Those views are widgets!

– Yahoo gets a big new channel for its ads, and it graciously shares the revenues with the widget builders because that’s what makes the world go round. Perhaps there are even virtual dollars, earned from ad revenue, that can be used to pay for the infrastructure, although I personally think Yahoo should give away as much infrastructure as possible to attract the Alt Search crowd to its platform.
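To make the “crunching” step concrete, here is a minimal in-process sketch of the MapReduce pattern building an inverted index, the kind of job the shared Hadoop scaffolding would run. It does not use Hadoop’s actual API: the function names are hypothetical, and a real job would spread the map and reduce tasks across many machines instead of one loop.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce: produce a sorted, de-duplicated posting list for one term."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())


if __name__ == "__main__":
    crawled = {
        "page1": "Open source search infrastructure",
        "page2": "Search widgets for social networks",
    }
    index = build_index(crawled)
    print(index["search"])  # ['page1', 'page2']
```

Because each map call touches only one document and each reduce call touches only one term, both phases parallelize naturally, which is why a shared cluster could serve many small search engines at once.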

Don Dodge, meanwhile, is wondering what the exit strategy is for the almost 1,000 startups out there trying to peddle alternative search engines. It sure seems to me that creating this search widget social network world solves a big problem for Yahoo and at the same time creates a lot of new exit opportunities for these engines. Suddenly, they have access to large volumes of data they couldn’t afford on their own and a distribution channel in which to build an audience.

Open Source Swarm Competition in the Search Engine Space is Born!


This entry was posted on August 22, 2007 at 6:05 pm and is filed under amazon, business, ec2, grid, Marketing, Open Source, Partnering, software development, user interface, venture, Web 2.0.

3 Responses to “Why Don’t Search Startups Share Data, Part 2”

The Software Abstractions Blog said

September 2, 2007 at 2:56 pm
Reactions: The Future of Alternative Search Engines

I recently wrote an article on this blog about the exit strategies for alternative search engines, that highlighted the recent and growing trend of publishers acquiring search engines; I also speculated about Charles Knight’s quest to get these Alts to

Google Patents Search Engine Partnerships (and My Idea, Doh!) « SmoothSpan Blog said

October 17, 2007 at 9:07 pm
[…] on October 17th, 2007 Google has patented the idea of search add-ons, which is eerily similar to an idea I was touting as a way for Yahoo to catch up. My thought was to have Yahoo partner with the 1000 or so […]

Alt Search Engines » Blog Archive » The Future of Alternative Search Engines said

November 29, 2007 at 7:02 pm
[…] In essence, he argues that the Alts could get together to share the costs and burdens of web crawling and the underlying infrastructure. This would help them to reduce the gap with the big players, which have a huge advantage in terms of resources. You can find his post here. […]



SmoothSpan Blog

For Executives, Entrepreneurs, and other Digerati who need to know about SaaS and Web 2.0.
