Inktomi announced Tuesday that it has indexed one billion unique Web pages. But users shouldn't expect to have that many URLs at their fingertips when conducting searches. Why? Because Inktomi is weeding out most of those pages, saying they are unlikely to be relevant to most searches.
To most users seeking general-interest information, this won't matter much.
They'll still be able to find the Britney Spears fan Web site or flower delivery sites. But consumers seeking more obscure information, the kind that doesn't attract most surfers, may have better luck guessing the URL than relying on Inktomi's search database.
"I always get nervous when somebody else is telling me what's relevant to me," says Greg Notess, whose Search Engine Showdown site offers a quantitative analysis of search engine technologies. "It probably is relevant to 99 percent of the searchers, but there's always that one percent who are looking for information that isn't relevant to everyone else."
Inktomi announced its one-billion milestone Tuesday in an apparent attempt to counter criticism, leveled six months ago, that search sites were indexing only about 15 percent of Web pages. Steve Lawrence, a research scientist at the NEC Research Institute, estimates that there are now at least 1.5 billion Web pages on the Internet, up from about 800 million in July.
Inktomi called upon the institute to help validate the company's latest work.
Lawrence, who was an author of the previous study, points out that Inktomi's large "Web map" gives it ample information about what's available on the Internet. That information, he says, helps the company make more informed decisions about relevancy.
"Inktomi may not be as good in some aspects as some of these other search engines because they're not making as much information available. But they do have advantages because they pulled so many more pages and have a better view of the Web," Lawrence says. In addition, he adds, a smaller index is easier to update.
By limiting its searchable index to about 170 million pages (110 million for North America), Inktomi says it saves users time when they search. To decide how relevant a page is likely to be in future searches, the company analyzes the page's text, the links that point to it, and how previous users acted when that page appeared in their search results, says Dennis McEvoy, senior vice president of Inktomi.
"The biggest catalog (index) does not equal the best search experience," says McEvoy. "Relevance matters much more because people don't want to page through thousands of results."
Apparently, some of the major search sites that pay to use Inktomi's index, including Yahoo, AOL, ExciteAtHome, HotBot and Microsoft Network, don't mind.
"It's not necessarily about the volume indexed; it's about making sure that that volume contains the highest quality of content," says ExciteAtHome spokeswoman Kris Carpenter, who worked in the search division for three years before moving to the commerce unit. ExciteAtHome offers a searchable index of about 250 million Web pages.
Other search sites and industry watchers reject that premise, arguing that search sites can offer relevant results without sacrificing breadth.
Limiting the searchable index means users may miss Web pages that might be relevant to them, says David Burns, president of U.S. operations for Fast Search & Transfer of Oslo, Norway. "It's a subtle form of search engine censorship," he says. "It's like the library. Why should I limit the search to only a portion of the library?"
Fast Search has reason to gripe. It claims to offer the largest (and fastest) search engine on its site, and it doesn't want Inktomi's announcement to steal its thunder. Because Inktomi's announcement doesn't state that the company winnows its searchable index down to about 170 million pages, it could lead people to believe they can search all one billion. Fast Search announced this week that it has crawled a total of 700 million Web pages to create its searchable index of 300 million URLs.
"Up [un]til now Inktomi had a monopoly in the private-label search market," says Burns. "As of yesterday, the game changed. Now there's somebody else out there who is not trying to be a portal."
Burns also complains that Inktomi doesn't refresh its index often enough, which can leave old, outdated links in search results. To keep its index current, Fast Search crawls an average of 40 million Web pages per day. McEvoy couldn't say how many pages Inktomi crawls per day, but he says the company refreshes its searchable index every three to four weeks and its entire Web page map every 90 days.
AltaVista, one of the top three search sites in terms of size (270 million Web pages, according to industry watchers), also argues that size matters. Search sites should not focus only on popular sites when determining relevance, says AltaVista spokeswoman Tracy Roberts. "We believe that we do index 90 percent of the Web sites out there," Roberts adds.