Google and other search engines face far more than just a new rival in Wikia; they face the prospect of hundreds, even thousands, of new competitors.
The entire search engine project Wikia is working on will enter the open source domain, drastically reducing the cost for just about anyone to build a search engine, said Gil Penchina, CEO of Wikia. Instead of paying millions of dollars to index the Web and to create the software for a search page, a filter for empty or spam pages, and an algorithm to calculate and rank results, new search companies will find these pieces free online, thanks to the open source and free software communities.
"In search, it still costs about US$5 million to US$10 million to build a site," said Penchina during an interview in Taipei. "We want to make it possible for anyone to build a search site for US$500. We don't view Google as the competition, we view cost as the competition."
The project, which was started by Wikipedia co-founder Jimmy Wales, consists of four components: indexing the Web, developing a search engine application, creating a ranking algorithm, and using people to help filter sites and rank results.
One of the most expensive components of a search engine is the effort needed to index the Web. Companies have to buy servers and software to crawl the Web, looking at what's on every page in order to build a comprehensive catalogue of its contents.
"Your average search start-up will spend over US$1 million buying servers and collecting data. That's bad for a couple of reasons. One is that everyone spends millions of dollars doing what is essentially the same work, which is like writing an encyclopedia all over again. Well, what if all of that data was available over the GNU Free Documentation License, which is the free content license? So our goal is to make a crawl of the Web publicly available," said Penchina.
The cost of indexing the Web is one of the main hurdles to starting a search engine, and for-profit companies have raised the bar year after year by indexing the Web more and more often. It used to be catalogued once a week, or once a day. Now it's once an hour, or even more often. The high cost of running these crawls has become a competitive weapon.
Wikia believes its crawl of the Web will cost nearly nothing, because it's asking Internet users to help out by downloading Web crawling software from Grub, which will use their computers during idle time to crawl the Web, and send results back to Wikia for the index. So far a thousand people have downloaded the application, and Penchina is hoping for 100,000 or more. The goal is to post the entire index online, as well as regular updates, so anyone can use them.
Asking the Internet community for help this way is reminiscent of SETI@home, a Search for Extraterrestrial Intelligence (SETI) project that asks users to run a free application that downloads and analyzes radio telescope data and sends the results back to computers operated by the SETI@home group.
Another essential search engine element is an application that provides a place to type in searches, a button to say 'go', and the ability to view results. Right now, Wikia is looking at using Lucene, an open source search library. Wikia plans either to invest more in the Lucene project to make sure the software works well, or to build its own software to serve the same purpose.
"That's sort of an ongoing fight right now, either building into Lucene or building our own application. We sort of have two projects going right now, we're sort of doing both," said Penchina.
Another key piece of search technology is an algorithm to determine search rankings, ensuring users find what they're looking for. Such algorithms are the secret recipes of search engine companies, never revealed for fear of abuse by hackers or others. At Wikia, however, the idea is to create an algorithm and post it on the Internet for free, so anyone can see how the search engine determines its results. People who don't like the search results can offer Wikia tips, or they can create their own algorithms, download all of the free search engine material Wikia makes available, and start their own search company, Penchina said.
The collaborative part of the project has users sorting through and filtering Web pages, as well as tweaking search engine results themselves. It adds a human touch to the search process, and Wikia's founders hope it will lead to better Internet searches.
Money may come from the Wikia search engine when it's finally done, but that's not the point of the project.
"We believe you can do good and make money. In search we haven't figured out how yet," said Penchina. "But if you build something that millions of people use and find value in, somebody's going to pay you for something. I mean, maybe we'll get consulting fees, maybe we'll sell advertising, maybe we'll sell some premium software or something. I don't know. I don't care. And I'm not even really thinking about it because I've got to get a million people to use it first."