At investment management firm Bridgewater Associates, access to real-time data is measured in market ticks. Data feeds containing quote and trade activity are expected to stream in at 124,000 messages per second this year, so even subsecond delays in the arrival of data can affect trading decisions and put the U.S.-based organization at a disadvantage.
Monitoring high volumes of data under very low latency requirements is beyond the capabilities of transactional databases, which must write each transaction to disk, so financial services firms have traditionally built their own custom applications to keep up.
"There is a lot of effort required to build a framework that could perform and deal with lots of data concurrently," says Ed Thieberger, head of trading technology at Bridgewater.
Recently, however, Bridgewater and other financial services firms have found an alternative in stream processing tools. Stream processing software goes by a variety of names, including streaming databases and event stream processing. The technology includes an engine that monitors data as it flows into and out of databases and other applications and can easily tap into external data feeds or internal message queues. All the data the engine gathers is held in memory to speed processing.
With data volumes increasing, organizations are running out of options for real-time processing. Financial services firms have little choice but to pursue stream processing because data quantities are starting to outstrip the capabilities of even custom-developed tools.
"At these volumes, traditional techniques won't scale," says Mike Stonebraker, co-founder and chief technology officer at Massachusetts-based StreamBase Systems. Bridgewater's custom C++ program could handle 18,000 messages per second -- more than the 900 a relational database could support, but far short of the data volumes it faces this year. In contrast, the StreamBase engine handles 140,000 messages per second, Stonebraker says.
Having gained a following in financial services, the emerging technology is beginning to spread to other industries that need to monitor operational data and interpret and respond to events in real time. Businesses are using it in areas as diverse as compliance management, network monitoring and real-time fraud detection in telecommunications, retail and e-commerce.
Stream processing software is also ideally suited to leverage message-based data flows within a service-oriented architecture (SOA). "If your organization already has MQ or other message-oriented middleware, then this is relatively straightforward," says Charles Nichols, CEO of SeeWhy Software. Users set up rules- or time-based queries that tell the stream processing engine what to look for. The software then monitors one or more data streams and triggers the appropriate response when one of those conditions is detected.
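The rule-and-trigger pattern Nichols describes can be sketched generically. This is a minimal illustration, not any vendor's API: the engine, rule predicates, event fields and the spread threshold are all hypothetical.

```python
# A minimal rules engine sketch: each rule pairs a predicate with an
# action, and the engine applies every rule to each event as it flows by.
def make_engine(rules):
    """rules: list of (predicate, action) pairs."""
    def process(event):
        for predicate, action in rules:
            if predicate(event):
                action(event)
    return process

# Hypothetical rule: flag a quote whose bid/ask spread exceeds 50 cents.
alerts = []
rules = [(lambda e: e["ask"] - e["bid"] > 0.50,
          lambda e: alerts.append(("WIDE_SPREAD", e["symbol"])))]

engine = make_engine(rules)
for event in [{"symbol": "ABC", "bid": 10.00, "ask": 10.05},
              {"symbol": "XYZ", "bid": 20.00, "ask": 20.75}]:
    engine(event)

print(alerts)  # only the second quote trips the rule
```

A production engine would compile such rules into an optimized dataflow rather than looping over them per event, but the contract is the same: conditions registered up front, responses triggered as matching events arrive.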
To keep latency low, stream processing systems place data that must be retained in memory and discard everything else. Nothing is stored on disk.
"Streaming databases say, 'Let's not try to store everything. Let's just watch everything as it flies by and keep running totals,'" such as the total number of transactions per second, says Eric Rogge, an analyst at Ventana Research.
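Rogge's "keep running totals" idea can be made concrete with a sketch of an in-memory per-second counter. The window size and class name are illustrative; the point is that only the aggregate survives, never the individual events.

```python
# Sketch of watching data "as it flies by": count transactions per second
# inside a sliding window, holding only per-second totals in memory.
from collections import deque

class PerSecondCounter:
    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.buckets = deque()  # (second, count), oldest first

    def record(self, timestamp):
        second = int(timestamp)
        if self.buckets and self.buckets[-1][0] == second:
            self.buckets[-1] = (second, self.buckets[-1][1] + 1)
        else:
            self.buckets.append((second, 1))
        # Evict buckets older than the window; those events are gone for good.
        while self.buckets and self.buckets[0][0] <= second - self.window:
            self.buckets.popleft()

    def total(self):
        return sum(count for _, count in self.buckets)

c = PerSecondCounter(window_seconds=2)
for t in (100.1, 100.5, 101.2, 103.0):
    c.record(t)
print(c.total())  # only the event at t=103 remains in the 2-second window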
At Bridgewater, Thieberger uses StreamBase's streaming technology to watch for delays in data feeds coming in from providers of market data. If one feed falls behind, StreamBase immediately issues an alert and splices in the missing data from another source. "The tool is very well suited to represent all of the rules we want to implement that lead to decisions about how we are trading," Thieberger says.
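The delay-detection half of that setup reduces to comparing the freshest feed against the rest. This sketch assumes hypothetical feed names and a made-up lag threshold; the splicing step Thieberger describes is vendor-specific and omitted here.

```python
# Sketch of spotting a lagging market-data feed: given the timestamp of
# each feed's most recent message, report any feed trailing the freshest
# one by more than max_lag seconds.
def stale_feeds(last_seen, max_lag=0.5):
    """last_seen: feed name -> timestamp of its latest message."""
    freshest = max(last_seen.values())
    return [feed for feed, ts in last_seen.items() if freshest - ts > max_lag]

print(stale_feeds({"feed_a": 1000.00, "feed_b": 999.20}))  # feed_b lags by 0.8s
```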
He measures the success of stream processing both in reduced development costs and faster time to market. "We haven't had to build a framework that does what StreamBase does," he says. In addition, once StreamBase is pointed at the data streams to be measured, business analysts can construct queries using a drag-and-drop user interface rather than rely on programmers, Thieberger says.
Stream processing also matches up well with another emerging technology: radio frequency identification (RFID). "Streaming is the only technology that can handle large volumes of RFID data that need to be analyzed on the fly," says Diaz Nesamoney, founder and CEO of Celequest, a business intelligence tool vendor.
The challenge with RFID tags is that they broadcast the same data continuously, says Jan Vink, IT director at Boekhandels Group Nederland, a Netherlands-based chain of 42 bookstores. When a pilot bookstore recently began checking in more than 1,200 books per day using RFID tags and a tag reader "tunnel," Vink used Progress Software's Apama tool to filter out the repetitive messages and ensure that each book was received in the system just once. The 45 to 50 boxes a day the store receives now take a total of 125 seconds for incoming processing rather than the 125 minutes required before, says Vink.
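The filtering step Vink describes amounts to de-duplication: a tag rebroadcasts the same ID repeatedly as it passes through the reader tunnel, and only the first sighting should check the book in. A minimal sketch, with made-up tag IDs (this is not the Apama API):

```python
# De-duplicate a stream of RFID readings so each tag is admitted once.
def dedupe(readings):
    seen = set()
    for tag_id in readings:
        if tag_id not in seen:
            seen.add(tag_id)
            yield tag_id

raw = ["BK-001", "BK-001", "BK-002", "BK-001", "BK-002", "BK-003"]
print(list(dedupe(raw)))  # each book checked in exactly once
```

A real deployment would expire entries from the `seen` set after a time window, so that a returned book passing through the tunnel again days later is not silently dropped.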