Sun fixes server problem, but controversy persists

Sun Microsystems Inc. has fully resolved what for months were persistent problems caused by a defective memory component on its UltraSPARC II servers, said users who were affected by the troublesome glitch.

The problem, first reported by users in 1998, was publicly acknowledged by Sun in August last year and created havoc for exasperated users as Sun tried for months to find fixes. Now, postings on the subject on a popular Sun user forum have abated, with the last sporadic messages appearing in July. But Sun's handling of the issue, and CEO Scott McNealy's claim last week that defective IBM memory components were to blame for Sun's woes, are sore points for some.

"In my opinion, Sun is fully to blame," said a user who had been affected by the problem, a manager at a large systems integrator who asked that his name not be used.

"It doesn't really matter who supplied the chips. Shouldn't the company who builds the server take responsibility for issues with it?" he asked in an e-mailed response to questions from Computerworld last week.

The defect was in an external memory cache on Sun's UltraSPARC II microprocessors. Under certain conditions, the problem triggered system failures and frequent reboots at dozens of customer locations worldwide. In an interview with Computerworld last week (see story below), McNealy said the problems stemmed from defective IBM static RAM (SRAM) chips that Sun used in its servers.

"They were the biggest source of the problem for us," McNealy said, stressing that Sun is no longer buying IBM SRAM. "We designed IBM out of that and put ECC [Error Checking and Correcting memory] across the entire cache architecture."

William O'Leary, director of communications at IBM's microelectronics division, refused to respond to McNealy's comments about IBM's culpability. But he denied that Sun no longer uses SRAM from IBM and insisted that Sun continues to be a "major and important" customer of IBM's high-performance SRAM technology.

Unforgiven

In any case, some Sun users are unforgiving.

"Sun is responsible" for the problem, said a project manager at a large European bank that was affected by it. The manager spoke on condition of anonymity. "Their architecture was fundamentally flawed because there was no ECC checking on the cache memory. This is something you get in even the lowliest Intel processor that costs a few dollars," he said in an e-mail message.

The bank's problems were finally resolved after a series of fixes that included moving servers to different environmental conditions, installing kernel patches and swapping out processors. "After two years, our environment is finally stable again," the manager said.

If the problems were caused by the IBM components, "I could see how Sun would have problems finding the error," conceded another user at a large consulting firm, whose clients include several major airlines. The company had to battle for more than a year to resolve its problems.

"I'm still not happy with how Sun handled this particular problem, but Sun has been a good reliable vendor for our company since," he said in an e-mail to Computerworld.
