DEFINITION: Cache memory is high-speed random access memory used by a computer processor for temporary storage of information. It boosts performance by keeping the most frequently used data and instructions physically close to the processor, where they can be retrieved more quickly.
Cache memory is all about speed and efficiency. It's a clever technique to help a computer processor work more smoothly.
You can think of memory as being organised a little like your office. Small amounts of frequently used information, such as the departmental phone list, are put on the bulletin board above your desk. Similarly, you keep information on your current projects close at hand. Less frequently used information, say the city phone directory, sits on the bookshelf next to your desk. Rarely used information is in a filing cabinet.
Computers store data using a similar hierarchy. When applications start, data and instructions are moved from the slow hard disk into main memory (dynamic RAM, or DRAM), where the CPU can get them more quickly. DRAM acts as a cache for the disk.
Levels upon levels
Although DRAM is faster than the disk, it's still pokey. So data that's needed more often is moved up to the next-fastest memory, called the Level 2 (L2) cache. This may be located on a separate, high-speed static RAM chip next to the CPU, but new CPUs usually incorporate the L2 cache directly on the processor chip.
At the highest level, the most frequently used information - say, the instructions in a loop which execute repeatedly - is stored directly on a special section of the processor chip, called Level 1 (L1) cache. This is the fastest memory of all.
Intel's Pentium III processor has 32Kbytes of L1 cache on the processor chip and either 256Kbytes of L2 on-chip or 512Kbytes of L2 off-chip. The L2 cache on the CPU chip can be accessed four times faster than if it were on a separate chip.
When the processor needs to execute an instruction, it looks first in its own data registers. If the needed data isn't there, it goes to the L1 cache and then to the L2 cache. If the data isn't in any cache, the CPU calls out to the main RAM. The data might not even be there, in which case the system has to retrieve it from the disk.
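The search order described above can be sketched in a few lines of Python. This is only an illustration: the level names follow the article, but the contents of each level (the letters "a" to "e") are invented, and real hardware does this in circuitry rather than in a software loop.

```python
# Hypothetical contents of each memory level, fastest first.
# The items "a" to "e" are invented purely for illustration.
LEVELS = [
    ("registers", {"a"}),
    ("L1 cache", {"a", "b"}),
    ("L2 cache", {"a", "b", "c"}),
    ("main RAM", {"a", "b", "c", "d"}),
    ("disk", {"a", "b", "c", "d", "e"}),
]


def find_level(item):
    """Return the name of the fastest level that holds `item`."""
    for name, contents in LEVELS:
        if item in contents:
            return name  # stop at the first (fastest) level that has it
    raise KeyError(item)


for item in "abcde":
    print(item, "found in", find_level(item))
```

Each step further down the list stands for a slower trip, which is why the most frequently used data is kept near the top.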
When the CPU finds data in one of its cache locations, it's called a 'hit'; failure to find it is a 'miss'. Every miss introduces a delay, or latency, as the processor tries a slower level. In a well-designed system with software algorithms that prefetch data before it's requested, the hit rate can reach 90 per cent.
For high-end processors, it can take from one to three clock cycles to fetch information from L1, while the CPU waits and does nothing. It takes six to 12 cycles to get data from an L2 on the processor chip, and dozens or even hundreds of cycles for off-CPU L2.
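To see how hit rates and cycle counts combine, here is a rough back-of-the-envelope calculation. It takes mid-range figures from the text (two cycles for L1, nine for on-chip L2, a 90 per cent hit rate). Two things are assumptions made only for this sketch: that the 90 per cent rate applies separately at each cache level, and that main memory costs 100 cycles, a figure the article does not give.

```python
def average_cycles(levels, fallback_cycles):
    """Expected cycles per memory access.

    `levels` is a list of (hit_rate, cycles) pairs, fastest level first.
    Each hit_rate is the fraction of accesses reaching that level that
    are satisfied there; `cycles` is the total cost of a hit at that level.
    `fallback_cycles` is the cost of the last level, which always hits.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, cycles in levels:
        expected += reach * hit_rate * cycles
        reach *= 1.0 - hit_rate
    return expected + reach * fallback_cycles


# Assumed figures: see the paragraph above for which are from the article.
with_caches = average_cycles([(0.90, 2), (0.90, 9)], fallback_cycles=100)
without_caches = average_cycles([], fallback_cycles=100)

print("with caches:    about %.1f cycles per access" % with_caches)
print("without caches: about %.1f cycles per access" % without_caches)
```

Under these assumptions, only one access in 100 goes all the way to main memory, so the average cost falls from 100 cycles to fewer than four.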
Caches are more important in servers than in desktop PCs because servers have so much traffic between processor and memory generated by client transactions. Intel turned a 50MHz, 80486-based PC into a server in 1991 by adding a 50MHz cache to the processor chip. Although the bus connecting processor and memory ran only at 25MHz, this cache let many programs run entirely within the 486 chip at 50MHz.
This hierarchical arrangement of memory helps bridge a widening gap between processor speeds, which are increasing at roughly 50 per cent a year, and DRAM access rates, which are climbing at the rate of only 5 per cent a year.
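The effect of those two growth rates compounds quickly. The sketch below assumes, purely for illustration, that both rates stay constant and that the gap starts at a ratio of 1.0; the function name is invented.

```python
CPU_GROWTH = 0.50   # yearly processor speed increase, from the article
DRAM_GROWTH = 0.05  # yearly DRAM access-rate increase, from the article


def gap_after(years):
    """Processor-to-DRAM speed ratio after `years`, relative to today."""
    return ((1 + CPU_GROWTH) / (1 + DRAM_GROWTH)) ** years


for years in (1, 5, 10):
    print("after %2d years the gap is %.1f times today's" % (years, gap_after(years)))
```

If the rates held, the gap would grow roughly sixfold in five years, which is the pressure behind adding more cache levels.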
As this performance mismatch grows, hardware makers will add a third and possibly fourth level of cache memory, said John Shen, a professor of electrical and computer engineering at Carnegie Mellon University.
Indeed, later this year, Intel will introduce Level 3 (L3) cache in its 64-bit server processors, called Itanium. The 2Mbyte or 4Mbyte cache will connect to the processor over a bus that runs as fast as the processor - 800MHz.
IBM is also developing its own L3 cache for 32- and 64-bit Intel-based Netfinity servers.
At first, it will be placed on the memory controller chip and will be available toward the end of next year, said Tom Bradicich, director of Netfinity architecture and technology.
IBM's L3 will be a system-level cache available to the server's four to 16 processors. Intel's L3 can help only the processor to which it's attached, but IBM says its L3 can improve throughput for the whole system.
Bradicich said IBM's L3 will also aid high-availability computing for e-commerce by enabling main memory swap-outs and upgrades as the system is running.
Bigger isn't necessarily better
The frequency of cache misses can be reduced by making caches bigger. But big caches draw a lot of power, generate a lot of heat and reduce the yield of good chips in manufacturing, Shen said. One way around these difficulties may be to move the cache-management logic from hardware to software.
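The link between cache size and miss frequency can be seen in a toy simulation. The sketch below assumes a cache that discards the least recently used item when it is full, which is one common policy but not one the article specifies. The access pattern is invented: a small "hot" loop over four addresses, interrupted now and then by a colder address.

```python
from collections import OrderedDict


def make_trace():
    """Build an invented access pattern with a hot loop and cold items."""
    trace = []
    for outer in range(200):
        trace.extend([0, 1, 2, 3] * 3)   # hot loop, run three times
        trace.append(4 + outer % 12)     # one of 12 colder addresses
    return trace


def count_misses(trace, capacity):
    """Count misses for a least-recently-used cache of `capacity` items."""
    cache = OrderedDict()
    misses = 0
    for address in trace:
        if address in cache:
            cache.move_to_end(address)     # mark as most recently used
        else:
            misses += 1
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


trace = make_trace()
for capacity in (2, 4, 8, 16):
    misses = count_misses(trace, capacity)
    print("capacity %2d: %4d misses in %d accesses" % (capacity, misses, len(trace)))
```

Once the cache is big enough to hold every address in the pattern, the only misses left are the first touch of each address. The simulation says nothing about the power, heat and yield costs of that extra capacity, which are the drawbacks Shen describes.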
"The compiler could potentially analyse program behaviour and generate instructions to move data up and down the memory hierarchy," Shen said.
Software-managed caches are currently confined to research labs. Potential obstacles include the need to rewrite compilers and re-compile legacy code for every new CPU generation, Shen said.