Microprocessor giants such as Intel and Advanced Micro Devices (AMD) have been performing a risky technological high-wire act ever since their chips - some as fast as 1GHz - began exceeding the system bus speed, according to several industry experts.
The problem involves chip testing. Maximum bus speeds range from 133MHz to 200MHz, and because internal processor speeds now outrun the surrounding system bus by such a huge margin, some experts say it is practically impossible to thoroughly test the logic within a high-speed processor without incurring costs and delays that would put the fully tested chip out of reach.
"We've hit the speed wall. You could crank out a 2GHz chip today but you couldn't tell if it was any good," said Joe Jones, the CEO of Bridgepoint, a semiconductor testing company with clients such as Texas Instruments and Philips Electronics. "We've painted ourselves into a corner."
Jones warns that unless a new chip architecture allowing more thorough internal testing is developed, more recalls of high-performance chips can be expected. He added that a huge increase in network and Internet communication errors will start to occur as even minor miscalculations multiply.
The SIA (Semiconductor Industry Association) agrees, and has identified built-in self-test for processors as one of five objectives the processor industry must achieve if it is to keep following Moore's Law, the industry precept that the number of transistors on a chip doubles roughly every two years.
Achieving the SIA's objective, however, will require a radical and expensive redesign that chip makers are not prepared to shoulder, Jones said.
"It will take a serious amount of resources to see any change," Jones said.
Currently, high-speed chips such as Intel's Pentium III and AMD's Athlon processor are tested by submitting the chips to a series of known algorithms that yield predictable results. Jones said this "black box" approach does not reveal everything that's happening in the chip, but just confirms a narrow set of tests based on prior knowledge.
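The limitation Jones describes can be illustrated with a small software analogy. The sketch below is not how Intel or AMD test their chips; it is a toy model, and every name in it (faulty_multiply, KNOWN_ANSWERS, run_known_answer_tests) is hypothetical. It assumes a "chip" operation with a single hidden defect and a short list of test inputs whose correct answers are known in advance.

```python
# Toy model of known-answer ("black box") testing. All names are hypothetical.

def faulty_multiply(a: int, b: int) -> int:
    """Stand-in for a chip operation that hides one defect."""
    if (a, b) == (1023, 511):  # defect triggered by a single operand pair
        return a * b + 1
    return a * b

# Test vectors: inputs paired with results known ahead of time.
KNOWN_ANSWERS = [
    ((2, 3), 6),
    ((0, 99), 0),
    ((255, 255), 65025),
    ((1024, 1024), 1048576),
]

def run_known_answer_tests(op, vectors):
    """Return every vector whose observed output differs from the expected one."""
    failures = []
    for args, expected in vectors:
        observed = op(*args)
        if observed != expected:
            failures.append((args, expected, observed))
    return failures

failures = run_known_answer_tests(faulty_multiply, KNOWN_ANSWERS)
print("failed vectors:", failures)
print("defect still present:", faulty_multiply(1023, 511) != 1023 * 511)
```

Every vector passes, yet the defect remains, because the test set only confirms behaviour on inputs chosen from prior knowledge. Nothing outside that set is examined.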
"We are at the breaking point of the current algorithms," he said. "If you introduce the least amount of unreliability, you wind up shipping computers that are grossly unreliable."
Dave Ranhoff, COO of Credence Systems, a test equipment maker, concurs. "Suppliers will hit a wall trying to meet next-generation device test pressures."
Officials for Intel and AMD each responded by saying they were confident of their testing procedures.
Yet Nathan Brookwood, an analyst at Insight 64, said that Intel and AMD scientists have indeed pushed the limits of physics.
"The fact that the engineers and testers do as well as they do is really quite awesome," Brookwood said.
But, he added, "the industry tends to make evolutionary changes, not revolutionary changes. And the changes they make to accommodate this extra complexity will not dramatically change things, but incrementally change things."
In fairness, Jones credits the technological savvy of Intel and AMD engineers for being able to crank up the speed of their processors to the rate they have, and compares them with the engineers who developed supersonic jets in the pre-transistor age.
"The market has done the right thing. It has the most performance with the lowest cost, but the time is now for a change," Jones said.
Jones said that at ITC 2000, the International Test Conference held recently in the US, "the talk was all about high-speed testing and how to fix it.
"There was no clear solution - the entire theme of the conference was finding forward solutions for high-speed tests, but the people in the hallways were shaking their heads asking, how the heck are we going to do this?"
Jones believes a software approach to a high-speed processor self-test has the best chance of success. By adding intelligence to high-speed processors at the software level, precise testing may be feasible, he said.
Two specific problems plague the hardware components in high-speed processors, according to Jones. The first he called a "race condition", where data moving at vastly different speeds within the processor fails to accurately synchronise, causing a miscalculation. This condition can cause "addition to get there faster than multiplication, for example, mixing up the logic and yielding a bad result", Jones said.
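A deterministic toy model can show the kind of timing hazard Jones calls a race condition. The sketch below is only an analogy: simulate, mul_latency and add_start_cycle are hypothetical names, and the two-unit "datapath" is an assumption made for illustration, not a description of any real processor. It computes a*b + c, with a multiplier that needs several cycles and an adder that reads the product register at a fixed cycle.

```python
# Toy two-unit datapath computing a*b + c. Hypothetical model for illustration.

def simulate(a: int, b: int, c: int, mul_latency: int, add_start_cycle: int) -> int:
    """Step through clock cycles; the adder uses whatever the register holds."""
    product_reg = 0  # stale contents left over before the multiply finishes
    result = None
    last_cycle = max(mul_latency, add_start_cycle)
    for cycle in range(last_cycle + 1):
        if cycle == mul_latency:
            product_reg = a * b        # multiplier writes its result
        if cycle == add_start_cycle:
            result = product_reg + c   # adder reads the register and adds c
    return result

# Adder waits for the multiplier: correct answer.
in_sync = simulate(3, 4, 5, mul_latency=3, add_start_cycle=3)

# Adder runs early, "getting there faster than multiplication": stale input.
raced = simulate(3, 4, 5, mul_latency=3, add_start_cycle=1)

print("synchronised:", in_sync)
print("raced:", raced)
```

With the adder synchronised to the multiplier the result is 17. When the adder fires two cycles early it adds c to the stale register value and returns 5, a silent miscalculation of the sort the article describes.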
Second, "there are chip arbiters that decide when and where things occur on a branch instruction inside the processor, but when you speed them up, things that occur in a certain order in the slower surrounding architecture can wind up being in reverse order, also bringing a bad result," Jones said.
Using a group of slower processors in parallel could resolve the problem, but if this fix were applied to PCs, Jones said, "you would then be facing supercomputing problems with PCs."
"Servers went multiprocessor a few years ago to avoid the problems PCs are hitting today," Jones said.