Putting two or more processor cores on a single silicon chip has been one of the most important milestones in computing in recent years. It allows users to continue to reap the benefits of Moore's Law while sidestepping the extreme difficulty of manufacturing, powering and cooling single microprocessors beyond 4 GHz. Chip multiprocessors (CMPs) also offer the opportunity to significantly boost the performance of applications that are able to spread their work across the cores.
But the benefits of parallel processing don't come easily. Programmers have to behave differently, as do compilers, languages and operating systems. If application software is to reap the benefits of CMPs, new skills, techniques and tools for designing, coding and debugging will be needed. Fortunately, both hardware and software vendors are developing tools and methods to make the job easier.
"Multicore chips are going to be a challenge for software developers and compiler writers," says Ken Kennedy, a computer science professor at Rice University in Houston who specializes in software for parallel processing. "If you look at chip makers' roadmaps, they are doubling cores every couple of years, sort of on a Moore's Law basis, and I'm worried we are not going to be able to keep up."
Desktop applications that traditionally have been written for one processor will increasingly be written to exploit the concurrency available in CMPs. Meanwhile, server applications that have for years been able to use multiple processors will be able to distribute their workloads more flexibly and efficiently. Virtualization, another important trend in computing today, will be made easier by CMPs as well.
Keeping up with CMPs is the focus of intense activity at a number of companies, including Microsoft Corp. Researchers there who are developing CMP tools are focusing on two broad areas: how to find errors in code written for multiple processors, and how to make it easier to write reliable software in the first place.
"A lot of the techniques we have used with sequential code don't work as well, or at all, with parallel programs," says Jim Larus, manager of programming languages and tools at Microsoft Research. "In testing, you typically run your program with a lot of data, but with parallel programs, you could run your program 1000 times with the same data and get the right answer, but on the 1001st time, an error manifests itself."
This ugly trait results from "race" conditions in parallel code, in which one process is expected to finish in time to supply a result to another process -- and usually does. But because of some anomaly such as an operating system interrupt, occasionally it does not. Such bugs can be extremely hard to find because they are not readily reproducible.
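A race of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_once`, the value 42 and the sleep durations are all invented for the example, and `producer_delay` merely stands in for an unpredictable stall such as an operating system interrupt.

```python
import threading
import time

def run_once(producer_delay: float):
    """Return what the consumer observes; None means it read too early."""
    shared = {}  # shared state with no synchronization at all

    def producer():
        # producer_delay stands in for an unpredictable stall,
        # such as an operating system interrupt (an assumption of this sketch).
        time.sleep(producer_delay)
        shared["result"] = 42

    worker = threading.Thread(target=producer)
    worker.start()

    # The consumer merely *assumes* 0.1 s is long enough for the producer.
    time.sleep(0.1)
    observed = shared.get("result")

    worker.join()
    return observed

if __name__ == "__main__":
    print(run_once(0.0))  # usual case: the producer finishes in time
    print(run_once(0.5))  # rare stall: the consumer reads before the result exists
```

Nothing in the code forces the consumer to wait for the producer, so the program is correct only as long as the timing happens to cooperate. Joining the thread before reading, or signaling completion explicitly, would remove the dependence on timing.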
The tools Larus' group is developing allow more controlled testing so a programmer can, for example, vary the timing of two threads to check for race errors. The tools will eventually be offered commercially as part of Visual Studio, Larus says, "but we have a long way to go."
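The idea of varying the timing of two threads under test control can be approximated by hand. The sketch below is not Microsoft's tooling; it is a hypothetical test harness (`run_with_timing` and its `overlap` flag are invented names) that uses `threading.Event` objects to force one particular interleaving of two unsynchronized increments, so that the race shows up on every run instead of once in a thousand.

```python
import threading

def run_with_timing(overlap: bool) -> int:
    """Run two unsynchronized increments under a chosen interleaving."""
    counter = {"n": 0}
    a_has_read = threading.Event()
    b_done = threading.Event()

    def thread_a():
        local = counter["n"]          # read
        a_has_read.set()
        if overlap:
            b_done.wait()             # hold A between its read and its write
        counter["n"] = local + 1      # write

    def thread_b():
        if overlap:
            a_has_read.wait()         # B runs inside A's read/write window
        local = counter["n"]
        counter["n"] = local + 1
        b_done.set()

    a = threading.Thread(target=thread_a)
    b = threading.Thread(target=thread_b)
    a.start()
    if not overlap:
        a.join()                      # let A finish completely before B starts
    b.start()
    a.join()
    b.join()
    return counter["n"]

if __name__ == "__main__":
    print(run_with_timing(overlap=False))  # increments run one after the other
    print(run_with_timing(overlap=True))   # B's update is overwritten by A
```

With `overlap=False` both increments survive; with `overlap=True` thread A writes back a stale value and one update is lost. A real tool would explore such schedules automatically instead of having them written into the test.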
Microsoft Research is also trying the KISS -- or "keep it strictly sequential" -- model. KISS transforms a concurrent program into a sequential one that simulates the execution of the concurrent program. The sequential program can then be analyzed and partially debugged by a conventional tool that only needs to understand the semantics of sequential execution.
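The underlying idea, simulating concurrent execution inside an ordinary sequential program, can be shown with a toy example. The sketch below is not the KISS transformation itself; it is a simplified illustration in which each "thread" is a Python generator, each `yield` marks a point where another thread could be scheduled, and a single-threaded loop replays every possible schedule. The names `increment`, `replay` and `find_bad_schedules` are invented for the example.

```python
from itertools import permutations

def increment(state):
    """One logical thread: a non-atomic read-modify-write in two steps."""
    local = state["n"]        # step 1: read the shared counter
    yield                     # a context switch may happen here
    state["n"] = local + 1    # step 2: write back
    yield

def replay(schedule):
    """Sequentially execute one interleaving; schedule lists thread ids."""
    state = {"n": 0}
    threads = [increment(state), increment(state)]
    for tid in schedule:
        next(threads[tid])    # advance that thread by exactly one step
    return state["n"]

def find_bad_schedules():
    """Return every interleaving in which an increment is lost."""
    schedules = sorted(set(permutations([0, 0, 1, 1])))
    return [s for s in schedules if replay(s) != 2]

if __name__ == "__main__":
    for schedule in find_bad_schedules():
        print(schedule, "->", replay(schedule))
```

Because everything runs in one thread, each schedule is fully reproducible, and an ordinary debugger or analysis tool can be pointed at a failing one. Exhaustive enumeration works only for tiny examples, which is one reason a practical tool has to bound or sample the schedules it considers.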
Microsoft and others are also working on a new programming model called software transactional memory, or STM. It's a way to correctly synchronize memory operations without locking -- the traditional way to avoid timing errors -- so that problems such as deadlocking are avoided. STM treats a memory access as part of a transaction, and if a timing conflict occurs with some other operation, the transaction is simply rolled back and tried again later, similar to the way today's database systems work.
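The roll-back-and-retry behavior can be illustrated with a deliberately small sketch. It is not Microsoft's STM design, and every name in it (`TVar`, `Transaction`, `atomically`, `demo`) is hypothetical. Each transaction records the version of every variable it reads and buffers its writes; at commit time, if any version has changed, the buffered work is discarded and the transaction runs again. For simplicity the sketch guards the brief commit step with one internal lock, whereas the application code itself takes no locks.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version bumped on each commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # internal to the sketch; user code never sees it

class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> (version seen, value seen)
        self.writes = {}   # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        if tvar not in self.reads:
            version = tvar.version            # version first, then value
            self.reads[tvar] = (version, tvar.value)
        return self.reads[tvar][1]

    def write(self, tvar, value):
        self.writes[tvar] = value             # nothing is visible until commit

def atomically(body):
    """Run body(tx) until it commits without a conflict."""
    while True:
        tx = Transaction()
        result = body(tx)
        with _commit_lock:
            unchanged = all(t.version == seen
                            for t, (seen, _) in tx.reads.items())
            if unchanged:
                for t, value in tx.writes.items():
                    t.value = value
                    t.version += 1
                return result
        # Conflict detected: buffered writes are dropped and the body reruns.

def demo():
    """Move 100 units from a to b using four competing threads."""
    a, b = TVar(100), TVar(0)

    def transfer(tx):
        tx.write(a, tx.read(a) - 1)
        tx.write(b, tx.read(b) + 1)

    def worker():
        for _ in range(25):
            atomically(transfer)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.value, b.value
```

The `transfer` function contains no synchronization of its own; it simply states what should happen as one unit, and `atomically` is responsible for making it appear indivisible. The body may run more than once, so in this style it should avoid side effects that cannot be repeated, such as I/O.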
"The idea is that the programmer, instead of specifying at a very low level how to do this synchronization, basically says, 'All the code between this point in the program and this other point, I want to behave as if it were the only thing accessing data at this time. System, go make that happen,'" says Larus.