The processors in today's computers have grown tremendously in performance, capabilities and complexity over the past decade. Clock speed has skyrocketed, and size has dwindled, even as the number of transistors packed on them has soared. A processor from 1983 made do with 30,000 transistors, while some current CPUs have upwards of 40 million transistors.
Any computer program consists of many instructions for operating on data. A processor executes the program through four operating stages: fetch, decode, execute and retire (or complete).
The fetch stage reads a program's instructions and any needed data into the processor.
The decode stage determines the purpose of the instruction and passes it to the appropriate hardware element.
The execution stage is where that hardware element, now freshly fed with an instruction and data, carries out the instruction. This might be an add, bit-shift, floating-point multiply or vector operation.
The retire stage takes the results of the execution stage and writes them to processor registers or the computer's main memory. For example, the result of an add operation might be stored in memory for later use.
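The four stages can be sketched as a toy interpreter. The mini instruction set below (the opcodes and register names) is purely illustrative and doesn't correspond to any real CPU:

```python
# A toy four-stage interpreter for a made-up mini instruction set.
# Each instruction is (opcode, destination_register, operand_a, operand_b).

def run(program):
    registers = {}                       # the register file
    for pc in range(len(program)):       # FETCH: read the next instruction
        op, dest, a, b = program[pc]     # DECODE: split it into fields
        if op == "add":                  # EXECUTE: route to the right unit
            result = a + b
        elif op == "shl":                # a bit-shift (shift left)
            result = a << b
        else:
            raise ValueError(f"unknown opcode: {op}")
        registers[dest] = result         # RETIRE: write the result back
    return registers

print(run([("add", "r1", 2, 3), ("shl", "r2", 1, 4)]))
# -> {'r1': 5, 'r2': 16}
```

A real processor overlaps these stages in a pipeline, working on several instructions at once; the sketch runs them one at a time for clarity.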
An important part of a microprocessor is its built-in clock, which determines the maximum speed at which other units can operate and helps synchronize related operations. Clock speed is measured in megahertz and, increasingly, gigahertz. Today's fastest commercial processors operate at 2 GHz, or 2 billion clock cycles per second. Some hobbyists run the clock faster than its rated speed (a practice called overclocking) to squeeze out more performance. However, this raises the chip's operating temperature considerably and often causes early failure.
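As a quick sanity check on those numbers, the time each clock cycle takes is simply the reciprocal of the clock frequency:

```python
# Clock period is the reciprocal of clock frequency: a 2-GHz clock ticks
# 2 billion times per second, so each cycle lasts half a nanosecond.

def cycle_time_ns(frequency_hz):
    return 1e9 / frequency_hz

print(cycle_time_ns(2e9))   # -> 0.5 (nanoseconds per cycle at 2 GHz)
```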
Parts Is Parts
Processor circuitry is organized into separate logic elements - perhaps a dozen or more - called execution units. The execution units work in concert to implement the four operating stages, and their capabilities often overlap among the stages. Common execution units include the arithmetic logic unit (ALU), which handles integer math; the floating-point unit (FPU); and the branch prediction unit (BPU), which guesses which way an upcoming branch will go so the processor can fetch instructions ahead of time.
Not all CPU elements execute instructions, however. Considerable effort goes into ensuring that the processor gets its instructions and data as fast as possible. A fetch operation that must go all the way to main memory (i.e., somewhere not on the CPU chip itself) consumes many clock cycles during which the processor does nothing (stalls). Accurate branch prediction hides some of that latency, but the BPU can do only so much, and eventually more instructions or data must be fetched.
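The text doesn't describe the BPU's algorithm; one common scheme, assumed here purely for illustration, is a two-bit saturating counter kept per branch:

```python
# A two-bit saturating counter, one common branch prediction scheme.
# States 0-1 predict "not taken", 2-3 predict "taken"; a single surprising
# outcome nudges the counter but doesn't immediately flip the prediction.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2                   # start weakly predicting "taken"

    def predict(self):
        return self.state >= 2           # True means "predict taken"

    def update(self, taken):             # learn from the actual outcome
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

predictor = TwoBitPredictor()
hits = 0
for taken in [True, True, False, True, True]:   # a loop branch, mostly taken
    if predictor.predict() == taken:
        hits += 1
    predictor.update(taken)
print(hits)   # -> 4 (the single "not taken" is the only misprediction)
```

The two-bit hysteresis is why a loop that almost always repeats keeps being predicted correctly even after its one final, surprising exit.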
Another way to minimize stalls is to store frequently accessed code and data in an on-chip cache. The CPU can access code or data in the cache in one clock cycle. The primary on-chip cache (called Level 1, or L1) is typically only about 32KB and can hold only part of a program or data. The trick to cache design is finding an algorithm that gets key information into L1 cache when it's needed. This is so important to performance that more than half of a processor's transistors may be used for a large on-chip cache.
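One simple way an address can map into a small on-chip cache is a direct-mapped design, sketched below; the parameters (64-byte lines, 512 lines, for roughly 32KB) are illustrative assumptions, not the layout of any real L1:

```python
# A direct-mapped cache sketch (hypothetical parameters, not any real chip):
# every memory address maps to exactly one cache slot, chosen by the
# low-order bits of its line number.

LINE_SIZE = 64               # bytes per cache line
NUM_LINES = 512              # 512 lines * 64 bytes = 32KB, an L1-sized cache

cache = [None] * NUM_LINES   # each slot remembers the tag of the line it holds

def access(address):
    """Return True on a cache hit, False on a miss (which fills the slot)."""
    line_number = address // LINE_SIZE
    index = line_number % NUM_LINES      # which cache slot this address uses
    tag = line_number // NUM_LINES       # identifies which memory line is held
    if cache[index] == tag:
        return True                      # hit: a one-cycle access
    cache[index] = tag                   # miss: fetch from memory, then cache it
    return False

access(0x1000)          # first touch: a miss that fills the line
print(access(0x1000))   # -> True (the same line again is now a hit)
```

Real L1 caches are usually set-associative (each line can live in one of several slots), which reduces the collisions a direct-mapped design suffers when two hot addresses map to the same slot.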
However, multitasking operating systems and a bevy of concurrent applications can overwhelm even a well-designed L1 cache. To address this problem, vendors several years ago added a high-speed dedicated bus interface that the processor could use to access a secondary Level 2 cache (L2) at a very high speed, typically half or one-third of the processor's clock rate. Today's newest processors, the Pentium 4 and PowerPC 7450, go further and place the L2 cache on the CPU chip itself, providing high-speed support for a tertiary Level 3 external cache. In the future, chip vendors may even integrate an on-CPU memory controller to speed things up even more.
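The payoff of a multi-level hierarchy can be estimated with the standard average-memory-access-time formula; the hit rates and latencies below are made-up illustrative figures, not measurements of any particular chip:

```python
# Average memory access time (AMAT) for a two-level cache hierarchy.
# All times are in clock cycles; the numbers are illustrative only.

def amat(l1_hit_time, l1_hit_rate, l2_hit_time, l2_hit_rate, memory_latency):
    # Accesses that miss in L1 fall through to L2; L2 misses go to main memory.
    l2_time = l2_hit_time + (1 - l2_hit_rate) * memory_latency
    return l1_hit_time + (1 - l1_hit_rate) * l2_time

# A 1-cycle L1 hitting 90% of the time, a 5-cycle L2 hitting 95%,
# and a 100-cycle trip to main memory:
print(amat(1, 0.90, 5, 0.95, 100))   # -> 2.0 cycles on average
```

Even with a 100-cycle memory, the average access costs only two cycles here, which is why vendors keep moving caches closer to the CPU and spending so many transistors on them.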