Leaderboards Are Not Best Seller Lists

Posted by Bob Warfield on October 10, 2007

I read with interest Tim O’Reilly’s comparison of Blog Leaderboards to Best Seller Lists.

While it is tempting to view the Leaderboards as Best Seller Lists, they’re really not.  Perhaps it would be closer to call them “Most Talked About Lists”, which I see more as what you find in the Supermarket Tabloids than the NY Times.  I don’t invoke the tabloids to be nasty, but to reflect the key differences.  In this case, gossip is cheap and the tabloids are cheaper than many hardcover best sellers.

What does that comparison have to do with leaderboards?  It takes a greater investment from readers before a book makes it onto the Best Seller List than it takes for a post to make it onto a leaderboard.  It takes longer to read a book, it costs money to buy a book, you may have to visit a bookstore, you deal with whether the book is displayed prominently there, and all the rest that as a publisher you know only too well.

In other words, there is more friction involved.  This is the promise and the curse of the Internet.  It radically reduces the friction.  But in so doing, it lowers the cost of making a bad choice, which makes it harder to discern bad choices from good. 

These friction forces are at work in creating the punctuated equilibrium I suggest drives interesting memes through the web.  My reaction to punctuated equilibrium is to shun the leaderboards and look more deeply to find the interesting stuff.  I’m adding some friction back into the process in order to cull the processed and repetitive sameness that happens if you only look at the pieces of the Internet Iceberg that are above the surface and readily visible.  The water level on that iceberg represents the Moore Chasm that ideas must cross to become mainstream, and it also demarcates the long tail.

Do you simply want to deal with mainstream ideas, or is the promise in the web that it gives access and leverage below the mainstream?

The Science of SPAM (Hint: It’s an Arms Race | And: A Better Mahalo)

Posted by Bob Warfield on October 10, 2007

I read an interesting article on the Grey Hat SEO blog about some of the techniques SEO spammers use.  Before I go further, let me be absolutely crystal clear:  I. Hate. Spam!

But it is interesting, and sometimes quite useful, to know your foe.  So it is with this Grey Hat SEO article.  It answered a question I’d wondered about:  Why do spammers send email or construct web pages that have seemingly random collections of words on them?  Yes, they may trick a search engine into going there, but it seems that real content would be more likely to cause someone to actually stay there and do something monetizable.  It all seemed like a colossally misguided waste of effort that needlessly annoys people.  But there is a method to this madness, as it turns out.

Those machine-generated spammy messages are the equivalent of Star Wars Imperial Probe Droids.  They are performing reconnaissance prior to sending in heavier forces.  Here is how it works.  They’re taking candidate lists of terms and combining them together in sophisticated ways and then checking how far up the search result page (what they call a “SERP”) that gets them.  When they reach the highest possible level using these techniques, they have identified a chink in the defenses that becomes the starting point for the next level of effort.

That next level involves creating real content around those keyword combinations that will pass muster by humans and presumably be a little stickier for those that land there.  From that base, they then look to create as many linkages as possible to the page to get it even higher in the search results based on the PageRank algorithm.
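
To see why piling on inbound links helps a page, here is a minimal sketch of a PageRank-style calculation.  It is a toy power iteration, not Google’s actual ranking system; the page names, the damping factor of 0.85, and the tiny link graphs are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All names and parameters here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share on each pass.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: "target" links out but nothing links to it.
before = pagerank({"a": ["b"], "b": ["a"], "target": ["a"]})

# Same graph after two hypothetical pages, "x" and "y", link to "target".
after = pagerank({"a": ["b"], "b": ["a"], "target": ["a"],
                  "x": ["target"], "y": ["target"]})

print(before["target"], after["target"])
```

In this toy graph the target page’s score rises once the two extra pages link to it, even though those pages have no inbound links of their own.  That is the lever the link-building step is pulling.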

Amazingly ingenious and methodical, isn’t it?  Who would have thunk. 

Now here is the next piece of the puzzle: it’s an arms race.  If you have a web site and monitor how many Google hits there are on, say, your company name, you will notice there is a tremendous ebb and flow.  “SmoothSpan”, as I write this, fluctuates from about 5,000 hits all the way up to 20,000 hits.  My first reaction on seeing that was to wonder why people were adding and removing references on so many sites so often.  It didn’t make sense.  For a little while, watching the daily behaviour, I wondered whether it reflected a common trait of massively scaled sites like Google that wind up emphasizing availability over consistency.  Perhaps my search was being handled by some nodes that just didn’t have all the info and couldn’t return a full set of links.

Eventually I went out looking at some of these and discovered, lo and behold, tons of machine-generated pages.  I now realize that many were spammer probe droids.  Others were odd artifacts of various web sites.  For example, lots of sites seem to create pages associated with each tag that have links relating to the tag.  These pages change constantly depending on what you write about on your blog.  It’s hard for me to see finding them in Google as a very useful search result, but I presume reaching them as part of another application is viewed as a good thing.  It’s always intriguing to find parts of the machine exposed for general viewing.

The reason the hit counts fluctuate so much is that Google is constantly adding heuristics to try to eliminate these probe droid pages.  It’s literally an arms race.  We read recently about how they’re penalizing sites that sell links, for example.  If you wind up on that list, your search rank will be permanently lowered.  If that’s not open warfare, I don’t know what is!

FWIW, I continue to find that searching the blogosphere using Google Blog Search is a better starting point for most of my searches than searching the whole web.  Try it some time.  It’s an easy habit to pick up and it really works well.  I think of it as the poor man’s Mahalo.  Why would I need Mahalo to pre-process my search with humans when there are so many bloggers doing it already?

Perhaps this is the answer to Scoble’s lament that great content is now a commodity.  He also talks about how hard it is to get a lot of link juice in the blogosphere.  Scoble takes all this and translates it to boredom with blogging and the difficulties of getting ahead in the blog world.  But ask yourself how you would expect things to behave if the content in the blogosphere was of radically higher quality on average than the great unwashed web? 

I think it explains the situation pretty well.  Information friction is low, quality is very high.  I know from my own experience of searching the blogosphere first that I am much more productive.  So what does that say for link juice?  That it will be less a function of quality (which seems to be common in the blogosphere) and more a function of whether you’re talking about what people are interested in at the moment.  That’s why the services being discussed are creating such spikes–they are transient interest points.

Getting back to our original theme, it must be perplexing to the SEO world.  Eventually they will target the blog world.  Anyone who has a blog sees them trying, yet it is hard for them for various reasons.  They haven’t yet nailed the science of spam in this world.  I hope it takes a long time yet before they do!

