IETF speeds transfer of huge files

After five years of research and development, the Internet Engineering Task Force has completed a framework to support the distribution of massive files to multitudes of users.

The framework, designed to make one-to-many bulk data communications on the Internet or private IP networks practical, already is supported in products that Siebel Systems Inc. and PeopleSoft Inc. use to distribute gigabytes of software code between far-flung development shops. Other potential enterprise uses of the framework include oil and gas companies distributing seismic maps to remote locations, global construction companies sharing technical documents among subcontractors and IT shops backing up distributed databases for disaster-recovery purposes.

Network vendors that helped develop the framework within the IETF's Reliable Multicast Transport working group include Cisco Systems Inc., Microsoft Corp., Motorola Inc. and startup Digital Fountain, which started shipping its products earlier this year.

"If you want to distribute large files to multiple recipients, you have to think about how you're going to do that and how you're going to spare the Internet and how you're going to deal with heterogeneous receivers," says Allison Mankin, a director of the IETF's Transport Area who oversaw development of this framework. "This approach has a lot of promise."

The IETF's new framework targets a real-world problem for corporations, says Lucinda Borovick, director of data center networks with IDC.

"You have your data center and distributed sites, and you need to find a way to get large files and applications from headquarters to remote locations," Borovick says. "Most IT managers just increase their network bandwidth, and hopefully that's not too cost-prohibitive. But the files are getting so big that increasing bandwidth doesn't work."

Borovick says IT managers can use data compression or bandwidth-management techniques to address this problem. However, she is optimistic about the multicast approach taken in products from Digital Fountain and its reseller, Cisco.

"There's a clear need for this technology," Borovick says.

Digital Fountain is the leader

Digital Fountain, a leader in developing products to support the IETF's Reliable Multicast Transport framework, has several customers, including Siebel and PeopleSoft. These companies use Digital Fountain's specialized servers to ensure timely and predictable delivery of software builds to development teams in Europe, Asia and South America.

PeopleSoft sends between 60 and 70 software builds per week to its 14 offices worldwide, and those files range in size from 50M to 400M bytes, says Graham Begg, manager of environment global operations. These files are sent over the company's private frame relay network, which has connections ranging from 256K to 2M bit/sec.
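To put those figures in perspective, here is a rough back-of-the-envelope sketch of ideal transfer times at the quoted extremes. It assumes that "M" means 10^6 and "K" means 10^3, that a link is fully available to a single transfer, and that protocol overhead and retransmissions are ignored. The article does not say which file sizes travel over which links, so the four pairings and the function name are illustrative only.

```python
def transfer_seconds(size_bytes: int, link_bits_per_sec: int) -> float:
    """Ideal time to push size_bytes through a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_sec


# Extremes quoted in the article; decimal units (M = 10**6, K = 10**3) are an assumption.
FILE_SIZES = {"50M bytes": 50_000_000, "400M bytes": 400_000_000}
LINK_RATES = {"256K bit/sec": 256_000, "2M bit/sec": 2_000_000}

for size_label, size in FILE_SIZES.items():
    for rate_label, rate in LINK_RATES.items():
        minutes = transfer_seconds(size, rate) / 60
        print(f"{size_label} over {rate_label}: {minutes:.1f} minutes")
```

Under these assumptions a single build occupies a link for anywhere from a few minutes to a few hours, which is why dozens of builds per week over such connections call for careful scheduling.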

PeopleSoft wanted an automatic system for sending these files in the most efficient way over low-bandwidth connections. More important than speeding up delivery was guaranteeing that each file arrives within a specified time frame, even if the network goes down or becomes congested.

"It's the reliability and the predictability that we were looking for," Begg says. "We need to know that on a particular day at 9 a.m. local time the file will be available."

So PeopleSoft purchased Digital Fountain's Transporter Fountain, which has operated in production mode for four months. So far, the Transporter Fountain has successfully delivered all of PeopleSoft's files within one hour and 45 minutes as programmed.

Begg says the Transporter Fountain cost about $50,000 and required no more than 20 manhours to set up.

"The return on investment will come when we experience some catastrophic event on the network that would have prevented a file [from] getting where it needs to go," Begg says. "The ROI comes when we don't lose development time."

Begg set up the Transporter Fountain in unicast or point-to-point mode, but plans to migrate to multicast mode soon.

"That will further reduce our hit on the network and improve our WAN performance," Begg says. "That's when we'll get the real benefit."

Luby's encoding technique

In addition to supporting the IETF's Reliable Multicast Transport work, Digital Fountain's servers also boast a patented mathematical encoding technology developed by the company's co-founder and CTO, Michael Luby. This encoding technique breaks up large data files for delivery across a network.

In traditional data delivery, a large file is chopped up into small pieces and sent over a WAN with an acknowledgement of delivery sent back for each piece. All the pieces have to get to their destination in the right order for the data to be delivered. If the transmission is interrupted, the whole process starts over.

Luby's algorithm turns data into a series of equation-like packets called Meta-Content. Each packet has mathematically encoded information about the entire original file. So it doesn't matter which Meta-Content packets arrive or in what order, as long as enough of them are received. What constitutes "enough" varies by application. If the network connection is interrupted, the transmission of the Meta-Content packets resumes where it left off.

"Meta-Content packets are like little soldiers," Luby explains. "It doesn't matter which ones get through as long as enough get through."

Luby's innovation gets high marks from IETF officials who have worked with him on the Reliable Multicast Transport framework.

"Digital Fountain is really the creator of this class of multicast applications," Mankin says. "Luby has created a very successful piece of mathematics that's proprietary to Digital Fountain. . . . But he's also done well with the protocol work."

Luby and Digital Fountain have been involved in the IETF's Reliable Multicast Transport work from the beginning.

The focus of the IETF's Reliable Multicast Transport working group was to prevent congestion problems from the use of multicast communications for large file distribution. Multicast employs the User Datagram Protocol (UDP) running on IP, rather than the more common TCP running on IP. Unlike TCP, UDP does not have built-in congestion control.

"You can't really do multicast with TCP because [TCP requires] two peers talking to each other," Mankin explains. "You have to find a way to do congestion control with something other than TCP. The job of the Reliable Multicast Transport working group was to find acceptable congestion control for these applications."

IETF seeks requests for comments

The IETF on Aug. 27 approved the first two documents from the Reliable Multicast Transport working group as experimental requests for comments - an interim phase before full standardization.

The documents together make up a flexible framework for the distribution of large, static files across the Internet. The Layered Coding Transport (LCT) document describes how the sender can split a large file over multiple multicast channels to speed up delivery and avoid network congestion. The Asynchronous Layered Coding document describes how to use any forward error-correction encoding technique - such as Luby's Meta-Content packets - over the LCT protocol. Luby and engineers from Cisco and Microsoft co-authored these documents.
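To make the forward error-correction idea concrete, the sketch below shows a toy "fountain" code in Python. It is not Digital Fountain's patented Meta-Content encoding, nor the packet format defined in the working group's documents. It is a generic random linear code over XOR, chosen only because it is short. All names here (`split_blocks`, `encode_packet`, `Decoder`) are hypothetical. Each packet is the XOR of a pseudo-random subset of the file's blocks, and the receiver can rebuild the file from any sufficiently large set of packets, in any order.

```python
import itertools
import random


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into equal-size blocks, zero-padding the last one."""
    padded = data + bytes(-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mask_for(seed: int, k: int) -> int:
    """Bitmask of the blocks a packet combines, derived from its seed so that
    sender and receiver agree without transmitting the mask itself."""
    return random.Random(seed).getrandbits(k) or 1


def encode_packet(blocks: list[bytes], seed: int) -> tuple[int, bytes]:
    """Build one packet: the XOR of the blocks selected by the seed's mask."""
    mask = mask_for(seed, len(blocks))
    payload = bytes(len(blocks[0]))
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            payload = xor_bytes(payload, block)
    return seed, payload


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k: int):
        self.k = k
        self.rows = {}  # highest set bit of mask -> (mask, payload)

    def add(self, seed: int, payload: bytes) -> bool:
        """Absorb a packet; return False if it carried no new information."""
        mask = mask_for(seed, self.k)
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in self.rows:
                self.rows[pivot] = (mask, payload)
                return True
            row_mask, row_payload = self.rows[pivot]
            mask ^= row_mask
            payload = xor_bytes(payload, row_payload)
        return False

    def done(self) -> bool:
        return len(self.rows) == self.k

    def blocks(self) -> list[bytes]:
        """Back-substitute from block 0 upward; call only when done()."""
        solved = []
        for pivot in range(self.k):
            mask, payload = self.rows[pivot]
            for j in range(pivot):
                if (mask >> j) & 1:
                    payload = xor_bytes(payload, solved[j])
            solved.append(payload)
        return solved


# Demo: the sender emits packets until the receiver has enough,
# while roughly 30% of them are dropped at random.
data = b"software build " * 20
blocks = split_blocks(data, 32)
decoder = Decoder(len(blocks))
loss = random.Random(42)
for seed in itertools.count():
    packet = encode_packet(blocks, seed)
    if loss.random() < 0.3:
        continue  # simulated packet loss; no retransmission is requested
    decoder.add(*packet)
    if decoder.done():
        break
recovered = b"".join(decoder.blocks())[:len(data)]
```

The receiver never tells the sender which packets went missing; it simply keeps listening until it has collected enough independent packets. That property is what makes this style of encoding a natural fit for one-to-many delivery, where per-receiver acknowledgements would not scale.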

Jim Gemmell, a Microsoft researcher and co-author of the two documents, has tested these protocols to distribute a 600M-byte Windows 2000 software build to Microsoft's developers.

"Every night, we put out the latest software build, and hundreds of developers need it the next morning. It loads up our network," Gemmell explains. "I can [distribute the build] off my notebook computer using a one-half-megabit multicast stream and satisfy everybody quicker than if they all pounded on our servers."

Other applications Gemmell has tried include distribution of PowerPoint presentations and software fixes.

"The first time there was an Internet Explorer upgrade, we wiped out Washington state's Internet infrastructure with the demand," Gemmell says. "We could pretty easily use multicast, and it would scale easier and use fewer network resources."

What's next?

Next, the Reliable Multicast Transport working group plans to develop protocols that will support the distribution of large data files that are created on the fly, such as live video feeds and stock quotes, vs. the static files that the group's first two documents handled.

The initial working group documents might be a shot in the arm for multicast, which has yet to gain popularity despite built-in support in most routers and network hardware. Originally designed for streaming video, multicast has few carriers that support it and there is little use within enterprise networks. Distribution of large files might be the "killer" application for multicast, observers say.

"What's going to push multicast for this application is the size of the content and how fast it's increasing and how pervasive the problem is," Borovick says. "The pain is going to have to be greater than the cost for enterprises to adopt this approach."

Gemmell says the multicast approach to large-file distribution will save companies money on bandwidth and server capacity.

"The beauty of it is that it lets somebody become a source of data to a huge community without having to scale up to a massive server," Gemmell says. "Any group without even a fast link or a big server could support tons of users."

Gemmell says the multicast approach is ideal for companies that use satellite communications for data transfer because it requires no acknowledgements coming back from the receivers as the message is sent. Another good application, he says, would be for large data centers that regularly repurpose servers.

"If you have to blast a disk image out that's 80 gigabytes to multiple machines, if you don't multicast, you're in big trouble," he says.
