Genii Weblog

Lies and double-speak - Google and the trillion pages

Mon 28 Jul 2008, 10:40 AM

by Ben Langhinrichs
It is interesting to see how good companies have gotten at "spin", learning perhaps from the politicians.  Google announced that they had indexed one trillion web pages.  Or did they?  If you believe "Google's Index Reaches a Trillion URLs" or any of a number of similar stories, you'll think they did.  Even on Google's official blog, there is a post titled "We knew the web was big..." which seems to imply this, with the cleverly worded quote:
The first Google index in 1998 already had 26 million pages, and by 2000 the Google index reached the one billion mark. ... Recently, even our search engineers stopped in awe about just how big the web is these days -- when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!
But look at that carefully again.  They are comparing apples and oranges.  The first two statistics are about how many pages are in their index.  The last one is about how many unique URLs are on the web.  Later in the article, they even 'fess up, once they are comfortable that most people won't keep reading:
We don't index every one of those trillion pages -- many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers.
Well, you might ask yourself, what difference does that make?  The clue is found in the CNN story, "Ex-Googlers launch rival search engine", which talks about Cuil (pronounced "cool"), a new search engine that boasts its index "spans 120 billion Web pages".

So, which index is bigger?  The court of public opinion will now say "Google has a trillion pages, while Cuil only has a tenth of that", but we have absolutely no way of knowing.  Cuil may well have more, as its owners imply, but they are constrained by confidentiality agreements from saying what they know, and even they don't know for certain since they left Google a while ago.  But Google has managed to start a meme that will be hard to beat.

OK, maybe they didn't learn from politicians.  Perhaps they learned from the "seat wars" between Microsoft and IBM.  

Copyright 2008 Genii Software Ltd.
