Friday | 5 December, 2008
Hard cores

STM -- "a really hot research topic these days" -- may someday be implemented in a combination of hardware and software, says Larus. In the meantime, programmers will have to use fine-grained locking -- in which individual rows or elements of a table are locked, rather than the whole table -- to ensure correct synchronization in parallel programs. The more parallel threads there are, the more difficult that becomes.
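As a rough illustration of the difference, the sketch below contrasts a table guarded by a single lock with one that keeps a separate lock per row. The class and function names are hypothetical, and Python is used only for brevity. In CPython the global interpreter lock keeps threads from running bytecode in parallel, so the sketch shows the locking structure rather than any speedup.

```python
import threading


class CoarseTable:
    """Whole-table locking: every update waits on the same lock."""

    def __init__(self, n_rows):
        self.rows = [0] * n_rows
        self.lock = threading.Lock()

    def add(self, row, amount):
        with self.lock:
            self.rows[row] += amount


class FineTable:
    """Fine-grained locking: each row has its own lock."""

    def __init__(self, n_rows):
        self.rows = [0] * n_rows
        self.locks = [threading.Lock() for _ in range(n_rows)]

    def add(self, row, amount):
        # Only threads touching the same row contend with each other.
        with self.locks[row]:
            self.rows[row] += amount


def hammer(table, n_threads=4, updates=1000):
    """Run n_threads workers that each apply `updates` increments."""

    def worker(tid):
        for i in range(updates):
            table.add((tid + i) % len(table.rows), 1)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(table.rows)


if __name__ == "__main__":
    print(hammer(CoarseTable(8)), hammer(FineTable(8)))
```

Both tables end with the same total. The difference is how many threads can make progress at once, and the price of the fine-grained version is more locks to create, acquire and reason about.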

Microsoft products won't require significant changes to scale from two processors (or processor cores) to four or eight processors, other than perhaps some performance tuning, according to Larus. "But when you start getting to bigger-scale machines, the question becomes, 'What are the bottlenecks?'," he says. "If you have more processors, you have to have increasingly fine-grained locking."

Germany-based MainConcept develops software for encoding and decoding signals such as high-definition video, and it is an accomplished practitioner of such fine-grained locking. Video processing is a computational challenge; high-definition movies have to be processed in real time, and each frame takes up 1MB of memory, with each slice of the frame requiring extensive mathematical manipulation.

MainConcept has tuned its software to run on systems with dual-core chips, with the cores working on frame slices in parallel. The company has seen performance improve by a factor of 1.8, says MainConcept CEO Markus Moenig. He says performance has improved by another factor of 1.8 through the use of two dual-core processors. Such near-linear speedups are very close to ideal, with any gain over 1.5 considered good.

The software uses and searches "huge areas of memory", Moenig says. If the software is carefully constructed, it can use on-chip cache memory for much of its work, speeding processing. MainConcept uses performance-tuning tools from Intel to tune the software to the hardware architecture. Intel's VTune Performance Analyzer helps optimize the code, and its Thread Profiler and Thread Checker help balance the work of multiple threads and identify bottlenecks in multithreaded code.

But Moenig worries that he won't be able to boost performance linearly as the number of processor cores increases. "We don't expect this for eight-core, 16-core and beyond," he says. "The faster and the more cores there are, the more the memory access is the bottleneck."

Code writers trying to exploit multiple processors or processor cores face three challenges, says James Reinders, director of business development for Intel's software development products. The first is scalability -- how to keep each additional processor busy. A threefold performance boost on a four-processor system is "darn good," he says; anything more is "exceptional."
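One common way to compare such figures is parallel efficiency: the speedup divided by the number of cores. The short sketch below applies it to the numbers quoted in this article. The function name is illustrative, and the sketch assumes that MainConcept's two factors of 1.8 multiply to give a combined speedup on four cores, which the article does not state outright.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


dual_core = 1.8                # MainConcept, one dual-core chip
two_dual_core = 1.8 * 1.8      # assumption: the two gains compound
reinders_good = 3.0            # Reinders' "darn good" on four processors

if __name__ == "__main__":
    cases = [
        ("One dual-core chip", dual_core, 2),
        ("Two dual-core chips", two_dual_core, 4),
        ("Reinders' benchmark", reinders_good, 4),
    ]
    for label, speedup, cores in cases:
        eff = parallel_efficiency(speedup, cores)
        print(f"{label}: {speedup:.2f}x on {cores} cores, {eff:.0%} efficiency")
```

By this measure, a factor of 1.8 on two cores is 90 percent efficiency, the compounded figure on four cores is about 81 percent, and Reinders' threefold boost on four processors is 75 percent.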

The second challenge is "correctness" -- how to avoid race conditions, deadlocks and other bugs characteristic of multiprocessor applications. Intel's Thread Checker can find threads that share memory but do not synchronize, which, he says, "almost always [indicates] a bug".

The third challenge is "ease of programming", Reinders says. Modern compilers can help by finding and exploiting opportunities for parallel processing in source code. The programmer can help the compiler by including "a few little hints" in the code, he says.

These "hints" are available in a new standard called OpenMP, a set of specifications for compiler directives, library routines and environment variables that can be used to specify parallelism in Fortran, C and C++ programs. "The alternative to using these extensions is to do threading by hand, and that takes some clarity of thought," Reinders says. "So OpenMP can be tremendously helpful."

Kennedy agrees. "My philosophy is the programmer should write the program in the style that's most natural, and the compiler should recognize the properties of the chip that have to be exploited to get reasonable performance," he says.

Tom Halfhill, an analyst for In-Stat's Microprocessor Report in San Jose, says some software developers are "tearing their hair out" over the new CMP systems. "Rewriting the software for multithreading is a lot of work, and it introduces new bugs, new complexities, and the software gets bigger, so there is some resistance to it."
