When you've got a big job - like searching the universe for signs of intelligent life - you need all the help you can get. That was the idea behind the May 1999 launch of SETI@home, an imaginative application of distributed computing that could have far-reaching implications for business.
SETI@home (setiathome.ssl.berkeley.edu), a project supported by the nonprofit SETI Institute in California, and other groups, has harnessed the Internet - and people's imaginations - to organise almost two million volunteer PCs into a virtual massively parallel computer.
The task: analysing radio signals picked up by the Arecibo radio telescope in Puerto Rico - the one featured in the 1997 movie Contact. The goal: detecting the kind of deep-space radio signals that could indicate communication by other intelligence in the universe. The strategy: harnessing as many of the world's computers as possible, working in concert, to accomplish it.
"The Internet lets us do that for the first time in the history of computers," said David Anderson, the SETI team's distributed computing guru. "It lets us, in effect, make them into one big parallel supercomputer."
Moreover, the SETI@home software runs in the background or as a PC screen saver, so it doesn't interfere with users' normal computing tasks.
The search for extraterrestrial intelligence (SETI) may or may not find ET, but it has helped spur a change in thinking about the potential for distributed computing. Proponents say that linking computers through the Internet could enable long-term, computation-intensive tasks in aerodynamics, pharmacology, geophysics, biotechnology and manufacturing to be done in relatively little time.
Using the Internet as a massively parallel computer suddenly makes possible goals that were once shelved as impractical, Anderson said.
"There may be some analysis you want to do, and you see it will take 100,000 years of computer time, so you would throw away that idea," he explained. But in one year, SETI@home has used more computer time than that. "So those ideas can be taken out of the wastebasket and reconsidered," he said.
Potential users include energy companies that need to do seismic or geologic analyses before they start drilling for oil or digging for coal, manufacturers that do structural analysis or study fluid dynamics before transforming a design from a computer model into the real equipment, and engineering firms that stress-test everything from bridges to aircraft.
The basic idea is simple, says Dave McNett: "It's all based on not wasting the resource - running distributed software on your machine and letting it use whatever resources you aren't using."
McNett is president of Distributed.net, a nonprofit research foundation based in Alabama, founded in 1997 to compete in an encryption-breaking contest. The group has grown to 20 developers and has rallied a 190,000-machine network (93 per cent are PCs) to break code and solve mathematical puzzles for fun and prizes.
These kinds of networks can accomplish a great deal, McNett said, because 90 per cent of most computers' processing power goes unused.
"During the day, most PCs spend most of their time flying tiny toasters around," he said. Even when computers are in use, most tasks aren't CPU-intensive. Working in a spreadsheet, for example, is CPU-intensive only when the columns are computed. "CPUs are used only in short bursts," McNett said. "And that's not even mentioning 6pm to 9am and weekends and holidays."
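The idle-cycle harvesting McNett describes is typically achieved by running the client at the lowest scheduling priority, so it soaks up only the cycles that foreground applications leave behind. A minimal sketch of the idea in Python - the work function and niceness value here are illustrative, not any project's actual client:

```python
import os
import time

def crunch_work_unit(chunk):
    # Placeholder for the real computation (e.g. signal analysis).
    return sum(x * x for x in chunk)

def run_as_background_client(work_units):
    """Process work units at minimum scheduling priority so the
    foreground user never notices the extra load."""
    try:
        os.nice(19)  # POSIX: ask the kernel to deprioritise this process
    except (AttributeError, OSError):
        pass  # e.g. on Windows, or if priority is already at minimum
    results = []
    for unit in work_units:
        results.append(crunch_work_unit(unit))
        time.sleep(0)  # yield the CPU between units
    return results
```

Because the operating system's scheduler only hands a nice-19 process the CPU when nothing else wants it, the spreadsheet user in McNett's example never feels the difference.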
Massively parallel computing "does make sense for use in the oil industry, and we have used the technique [internally] for some of our computationally intensive problems," said John Old, director of information management for worldwide exploration and production at Texaco.
But distributed computing isn't for every job. "The SETI project lends itself to breaking the data into small, independent chunks, which makes the parallel computing fairly simple," Old explained. Unfortunately, not all data can be segmented that way, and many projects require complex communication among processors.
McNett acknowledged that there are plenty of things an IBM RS/6000 can do that a distributed network can't. "We can't do anything that's more data-intensive than CPU-intensive," he said. For example, weather prediction is difficult because the data is interrelated. Distributed computing is better at jobs such as animation rendering, in which each of the 30 frames per second that go into a movie like Toy Story is a separate task that can be distributed among thousands of computers.
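The pattern Old and McNett are describing - break the data into small, independent chunks, farm them out, then combine the partial results - can be sketched in a few lines. Here a local process pool stands in for volunteer machines on the Internet; the chunk-analysis function is a hypothetical stand-in for real work such as scanning one slice of telescope data or rendering one frame:

```python
from multiprocessing import Pool

def analyse_chunk(chunk):
    # Stand-in for per-chunk analysis: each chunk is processed with
    # no communication to any other chunk - the property that makes
    # the problem suitable for distributed computing.
    return max(chunk)

def split(data, chunk_size):
    """Break the data set into small, independent work units."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

if __name__ == "__main__":
    data = list(range(100))
    chunks = split(data, 10)
    # The pool plays the role of thousands of volunteer PCs.
    with Pool(4) as pool:
        results = pool.map(analyse_chunk, chunks)
    print(max(results))  # combine the independent partial results
```

Weather prediction resists this treatment precisely because `analyse_chunk` would need to see its neighbours' data; rendering and signal analysis don't.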
With those kinds of jobs in mind, the folks at Distributed.net are considering a commercial spin-off. At present, Distributed.net's machines are equivalent to 42 of IBM's 144-node RS/6000s - the fastest computers on the market - representing about $US120 million of hardware (based on the floating-point speed of the RS/6000 and the Pentium II/266 PC, the average computer on the distributed network). "We're proud of that," McNett said, "but the potential number of machines dwarfs what we have now."
If the SETI project rallied two million computers by word of mouth, imagine what a company that was willing to pay for your PC's time might accomplish. That's exactly what Jim Albea, chief operating officer at ProcessTree Network, was thinking in January when he set up a Web site soliciting computers for the April launch of what he claims is the first commercial venture in the field (www.processtree.com).
Security's the party pooper
But despite the potential, there are problems that have to be solved before massively parallel Internet computing can work commercially, McNett said. The biggest hurdle is security. An oil exploration company considering the mineral rights to some land might gain a lot of efficiency by divvying up the analysis of the geologic data across the Internet. But what's to stop a competitor from setting up machines in the network and gleaning some insights from the data?
And what about would-be saboteurs in the network, bent on ruining a project for competitive or malicious reasons?
"There has to be a security model that is very easy, that doesn't allow a client machine to gain more insight than it should on the nature of a task and that can assure that no one client machine has enough grasp of the project that it can adversely affect the result," McNett said.
Another concern is that if people can modify the software's behaviour, they can affect the project's integrity. SETI@home ran into this problem when some volunteers tweaked the software to improve its speed. Despite the users' good intentions, SETI scientists had to throw out the resulting radio-wave analyses because they couldn't vouch for their accuracy.
Finally, McNett said, massively distributed computing calls for a business model that has yet to gel. "Are you going to send 18-cent cheques to 100,000 people every month?" he asked.
Albea said he thinks ProcessTree has solved most of the technical and business problems. For security, he plans to combine encryption with pieces of data so small that they would yield no useful information even if they were decoded. It may also randomly duplicate jobs and check for identical results. A discrepancy would indicate an error or sabotage.
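Albea's duplicate-and-compare check is simple to sketch: issue the same work unit to more than one randomly chosen machine and accept a result only when the independent copies agree. The function and client names below are illustrative, not ProcessTree's actual protocol:

```python
import random

def verify_by_duplication(work_unit, clients, copies=2):
    """Send the same work unit to several randomly chosen clients and
    accept the result only if every copy agrees; a discrepancy
    indicates an error or sabotage."""
    chosen = random.sample(clients, copies)
    results = [client(work_unit) for client in chosen]
    if all(r == results[0] for r in results):
        return results[0]
    raise ValueError("mismatched results: possible error or sabotage")

# Illustrative clients: an honest machine and a tampered one.
honest = lambda unit: sum(unit)
saboteur = lambda unit: sum(unit) + 1
```

Random assignment matters: because a saboteur cannot predict which other machine will receive the same work unit, it cannot reliably collude its way past the comparison.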
Despite these precautions, Albea said security concerns will probably scare off some potential customers initially. He also noted that computer owners may have concerns of their own, but he pointed to SETI's ability to overcome user misgivings. "It gets down to trusting that we're a viable business with no interest in rifling their files," he said.
Meanwhile, even though ProcessTree hasn't yet set a pricing plan, CEO Steve Porter offers a ballpark figure of about $US1000 for the equivalent of a year's worth of CPU power from a Pentium II/400.
The company may pay in the range of $US10 to $US20 a month per computer - and even more for large-volume volunteers such as businesses. Payment will likely be in credits with an online retailer or service. For example, a participant might get discounts on his Internet service in exchange for running the software.
"They're not going to be able to retire on this," Albea said, "but it's a resource just doing nothing, and instead they can be getting credits."
Since its site launched in January - with virtually no advertising - ProcessTree has lined up more than 35,000 users representing more than 70,000 machines. "We are the largest body of available commercial computing power in the world right now," Porter said. "You can't get anything that can go faster than we can, and we get faster every day."