Daniel Brandt argues that Google is dying: its index is failing to keep up with the growth of the web. And he thinks he knows why: Google has hit the limit of 4,294,967,296 distinct values that a 4-byte ID number in C can hold. (Perhaps coincidentally, perhaps not, Google claims to index more than 4 billion web pages.) If this is true, the fix isn't trivial, because it has to be rolled out across a large number of machines that are working in parallel.
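To make the arithmetic concrete, here is a minimal sketch of where the 4,294,967,296 figure comes from. It assumes, as Brandt does, that a docID is stored as an unsigned 4-byte integer; the names `DOCID_BYTES`, `MAX_DOCIDS`, `fits_in_docid`, and `pack_docid` are made up for illustration and say nothing about Google's actual code.

```python
import struct

# Assumption for illustration: a docID is an unsigned 4-byte (32-bit) integer.
DOCID_BYTES = 4
MAX_DOCIDS = 2 ** (8 * DOCID_BYTES)  # number of distinct IDs: 4,294,967,296


def fits_in_docid(doc_id: int) -> bool:
    """Return True if doc_id can be stored in an unsigned 4-byte field."""
    return 0 <= doc_id < MAX_DOCIDS


def pack_docid(doc_id: int) -> bytes:
    """Pack doc_id into 4 bytes; raises struct.error if it does not fit."""
    return struct.pack("<I", doc_id)


if __name__ == "__main__":
    print(MAX_DOCIDS)                     # total number of distinct IDs
    print(fits_in_docid(MAX_DOCIDS - 1))  # the largest ID still fits
    print(fits_in_docid(MAX_DOCIDS))      # one more page does not
```

Widening the field to 8 bytes removes the ceiling on paper, but every stored record and every machine that reads those records would have to change with it.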
On sites with more than a few thousand pages, Google is not indexing anywhere from ten percent to seventy percent of the pages it knows about. These pages show up in Google's main index as a listing of the URL, which means that the Googlebot is aware of the page. But they do not show up as an indexed page. When a page is listed but not indexed, the only way to find it in a search is if your search terms hit on words in the URL itself. Even if they do hit, these listed pages rank so poorly compared to indexed pages that they are almost invisible. This is true even though the listed pages still retain their usual PageRank.
…this became a problem that I first noticed in April 2003. That was the month when Google underwent a massive upheaval, which I describe in my "Google is broken" essay. When that essay was written, two months after the upheaval, it would have been speculative to claim that the listed URL phenomenon was a symptom of the 4-byte docID problem described in the essay. It was too soon. But sixteen months later, the URL listings are beginning to look very widespread and very suspicious. It's a major fault in Google's index, it is getting worse, and it is much more than a mere temporary glitch.
Google is dying. It broke sixteen months ago and hasn't been fixed. It looks to me as if pages that have been noted by the crawler cannot be indexed until some other indexed page gives up its docID number. Now that Google is a public company, stockholders and analysts should require that Google give a full accounting of their indexing problems, and what they are doing to fix the situation.
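Brandt's conjecture, that a crawled page stays "listed" until some indexed page gives up its docID, can be pictured as a fixed-size pool of IDs. The toy model below only illustrates that guess; the `DocIdPool` class and its methods are hypothetical and are not based on anything known about Google's internals.

```python
class DocIdPool:
    """Toy model of an index with a fixed number of docIDs (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # total docIDs that exist
        self.assigned = {}        # url -> docID for fully indexed pages
        self.free_ids = []        # docIDs released by dropped pages
        self.next_id = 0          # next never-used docID
        self.listed = set()       # URLs known to the crawler but not indexed

    def crawl(self, url: str) -> bool:
        """Try to index url; return True if indexed, False if only listed."""
        if url in self.assigned:
            return True
        if self.free_ids:
            doc_id = self.free_ids.pop()      # reuse a released docID
        elif self.next_id < self.capacity:
            doc_id = self.next_id             # hand out a fresh docID
            self.next_id += 1
        else:
            self.listed.add(url)              # pool exhausted: URL-only listing
            return False
        self.assigned[url] = doc_id
        self.listed.discard(url)
        return True

    def drop(self, url: str) -> None:
        """Remove an indexed page and release its docID for reuse."""
        doc_id = self.assigned.pop(url, None)
        if doc_id is not None:
            self.free_ids.append(doc_id)


if __name__ == "__main__":
    pool = DocIdPool(capacity=2)
    print(pool.crawl("a.html"), pool.crawl("b.html"))  # both get docIDs
    print(pool.crawl("c.html"))                        # pool is full
    pool.drop("a.html")
    print(pool.crawl("c.html"))                        # a docID was freed
```

With a capacity of two, the third page is only listed until one of the first two is dropped, which is the behavior Brandt suspects at a scale of four billion.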
If it turns out that Google is missing huge quantities of stuff, there will be a lot of angry IPO buyers. And I will have to change my one-stop-search habits.