Multithreading weaves its way into nets

Network systems increasingly need to be application-aware to control access, allocate resources and prioritize traffic. Maintaining stateful packet flow information at gigabit-per-second line speeds requires a rate of random memory access that is beyond the capability of today's traditional processors. And application-specific integrated circuits (ASICs), while fast, can't keep pace with constant changes in network protocols and applications.

A new architectural approach for application-aware networks has demonstrated tangible benefits: massive multithreading (MMT). Understanding this technology is key to evaluating the next wave of network infrastructure.

In the current generation of MMT processors, software threads typically correspond one-to-one to hardware threads, or streams. Threads are often organized into clusters, or tribes, to optimize resource utilization, and multiple tribes can be implemented in the same chip. Each tribe has access to its local external dynamic RAM (DRAM), as well as to a shared internal memory. The term pipeline (or core) refers to the physical circuitry that executes software instructions.

Networking differs fundamentally from desktop computing because processing stateful packet flows requires frequent access to data with low locality. Locality is the likelihood that the required data or instruction is already in the processor's cache. Because packets in a stateful flow arrive at random intervals, networking equipment benefits little from PC-oriented multiprocessors that depend on a high degree of locality for better performance. Low locality results in a high rate of requests for data that is not in cache, which increases latency beyond acceptable limits.
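The effect of locality on cache behavior can be illustrated with a small simulation. The sketch below is only a toy model, not a description of any real processor: the LRU cache, its size, the working-set sizes and the uniform random flow selection are all assumptions chosen for illustration.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, cache_entries):
    """Return the fraction of accesses served by a simple LRU cache."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)


rng = random.Random(0)       # fixed seed so runs are repeatable
CACHE_ENTRIES = 1_000        # assumed cache capacity, in entries
NUM_ACCESSES = 100_000

# High locality (assumed desktop-like workload): a small working set
# that fits in the cache is touched over and over.
desktop_like = [rng.randrange(500) for _ in range(NUM_ACCESSES)]

# Low locality (assumed network-like workload): each packet belongs to
# one of a million flows, picked uniformly at random.
network_like = [rng.randrange(1_000_000) for _ in range(NUM_ACCESSES)]

high = hit_rate(desktop_like, CACHE_ENTRIES)
low = hit_rate(network_like, CACHE_ENTRIES)
print(f"high-locality hit rate: {high:.3f}")
print(f"low-locality hit rate:  {low:.3f}")
```

In this model the small working set is served almost entirely from cache, while the flow-table lookups miss nearly every time, which is the situation the paragraph above describes.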

MMT maximizes memory throughput by allowing a greater number of memory requests to be active simultaneously. Because of this, MMT is able to perform sophisticated protocol processing in software at throughput levels that previously required one or more dedicated ASICs. This optimization of RAM access also enables MMT to overcome the stateful packet throughput limitations of traditional multiprocessors.

Each memory operation introduces processing latency. To maintain low latency and high throughput in the face of demanding memory access requirements, network-oriented multiprocessing architectures need to support a very high number of simultaneous threads and execution pipelines, each with its own dedicated processing resources. By pushing packets in parallel through 100 or more threads, deep packet inspection can be sustained at 10Gbps data rates with a latency of less than 1 millisecond -- an impossible task for two (or even two dozen) threads operating at today's high-end clock rate of 4 GHz. This allows MMT to accommodate VoIP and other delay-sensitive applications on high-speed backbones.
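A back-of-the-envelope calculation shows why the thread count matters more than clock rate. The sketch below applies Little's law (items in flight = arrival rate x time each item spends in the system). Every number in it is an illustrative assumption rather than a figure from any specific product: 84 bytes on the wire per minimum-size Ethernet packet, 50 dependent memory accesses per packet, and 100 nanoseconds per access. It also assumes each thread stalls on one outstanding memory request at a time and ignores compute time.

```python
import math


def threads_needed(line_rate_bps, wire_bytes_per_packet,
                   accesses_per_packet, memory_latency_s):
    """Estimate the threads required to hide memory stalls at line rate.

    Little's law: packets in flight = arrival rate * time per packet.
    Assumes one packet per thread and one outstanding access per thread.
    """
    packets_per_s = line_rate_bps / (wire_bytes_per_packet * 8)
    stall_per_packet_s = accesses_per_packet * memory_latency_s
    return math.ceil(packets_per_s * stall_per_packet_s)


# Hypothetical workload parameters (assumptions, not measurements).
LINE_RATE_BPS = 10e9          # 10Gbps link
WIRE_BYTES = 84               # 64-byte frame plus preamble and gap
ACCESSES_PER_PACKET = 50      # assumed dependent lookups per packet
MEMORY_LATENCY_S = 100e-9     # assumed 100 ns per random DRAM access

needed = threads_needed(LINE_RATE_BPS, WIRE_BYTES,
                        ACCESSES_PER_PACKET, MEMORY_LATENCY_S)
print(f"threads needed to keep up: {needed}")
```

Under these assumptions the estimate is driven entirely by memory latency and arrival rate. A faster clock does not appear in the formula, because a thread waiting on DRAM does no useful work regardless of how fast its pipeline runs.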

As bandwidth continues to increase, an even greater number of simultaneous threads may become necessary. Initially, the demand will be met with higher thread counts implemented in tribes of multiple streams served by separate cores. Such advances are occurring already for network-access control and identity-based network applications in LANs, where 128 threads is the state of the art. Over time, advances in technology will permit higher levels of protocol processing parallelism with a greater number of streams and execution pipelines.

Mario Nemirovsky is chief scientist for ConSentry Networks.
