Jim Shelton looked the data warehouse monster in the eye and blinked. And he says he's glad he did. In late 1998, Shelton, the director of information systems at Global Companies in the US, says he considered building a data warehouse to share data between the Lawson Software financial applications his petroleum-distribution firm was phasing out and the new Oracle financial applications it was embracing.
But he says he knew how expensive and time-consuming it could be to pluck just the data his users needed from the proprietary flat-file database associated with the applications from Lawson and to move that data to a data warehouse in a form that could be accessed by the newer Oracle relational database. In addition, updating all that data fast enough to reflect volatile shifts in petroleum pricing and inventory would be a logistical nightmare.
By using a new type of middleware - call it data cataloguing - to link the two systems, Shelton figures he's spent one-fifth of the time it would have taken to build a data warehouse. What's more, "Within a couple of days, we were producing data and producing reports that did joins between the Oracle data and the mainframe", compared with the months or years it would have taken to build a data warehouse.
Shelton used Metagon Technologies DQbroker, one of a growing number of tools that sidestep the cumbersome, time-consuming and politically charged process of extracting data from live production databases and storing it in warehouses where it can be queried and analysed. Rather than moving the required data to a physical warehouse, software such as DQbroker creates a 'catalogue' of the data and lets users 'order' just the data they need when they need it.
Share and share alike
Sharing data among incompatible databases is important enough when, say, accounts receivable needs a daily update from sales to send out the bills or when accounting needs sales data from four subsidiaries to close the quarterly books. It becomes even more important when companies need to buy and sell instantly over the Web.
While different vendors take different approaches, the aim is to let business users find the data they need in a data catalogue and order it. The catalogue consists of meta data about a database - identifying a data source, describing which tables and columns are available to be queried and even details such as whether customers are identified by number or by a string of text such as a name.
The meta data is used to direct queries to the proper database and translate queries into the specific format the target database can understand. While some data still must be moved from production databases to users and some may be stored along with the meta data catalogue, this approach is often less expensive and time-consuming than building and restocking a full-fledged warehouse.
Peers helping peers
While most middleware tools create a central hub to catalogue information and translate among the various database dialects, Metagon scatters these functions among object brokers, one of which runs on every database platform in the system. Each broker is a peer, meaning it can do data-retrieval work for other nodes or ask other nodes to retrieve data for it.
By dividing up the jobs, DQbroker works more quickly than competing approaches, delivering small amounts of data from some nodes to the user while other nodes are still performing their queries, says David Brower, director of research and development at Metagon. This approach also reduces the amount of data moving over the network, he says, making it easier to scale the system by simply adding more brokers to a single node or creating new databases with an associated broker.
Perhaps Metagon's most important claim, though, is its ability to provide seamless access to any database, from mainframe Cobol applications to popular relational databases such as those of Oracle and Sybase, and even Excel spreadsheet files.
Once installed on the database, the broker begins collecting information about the associated database and maintains that metadata locally. It also translates queries between database formats, says Brower.
How well does it work? Well enough that Shelton was able to link not only his Unisys and Oracle databases, but also a database server from Sybase, and an in-house custom application for less than $US100,000. It took three days of consultant time to get the links up and running. Shelton's needs are fairly modest: a maximum of 15 concurrent users, accessing several hundred predefined reports processed each night. But the process of giving users access to new tables using DQbroker "is almost akin to drag-and-drop", he says. "When we look at places where we need to do integration between systems, we look at DQbroker first."
The same challenge confronted an IT manager at a large distribution company. The IT manager, who declined to be identified, needed to link a legacy Cobol enterprise resource planning system running on a Tandem Computers fault-tolerant computer with a new SAP R/3 implementation running on an IBM AS/400 and DB2.
"When you run an [inventory-based] business, you need to keep your systems in sync" during such a transition, says the manager.
His company also needed timely information so it could do business over the Web, giving its partners browser-accessible updates about such things as order status. As for data warehouses, the manager snorted, "We don't have time to sit down and redesign a data warehouse for every special need" identified by business managers.
After considering several approaches, including IBM's MQSeries messaging middleware and Microsoft's BizTalk server, the IT manager chose Attunity Connect from Attunity, formerly ISG International Software Group, which had sold the product as ISG Navigator. He says he was sold on the combination of Attunity Connect, the "plumbing" to link the actual databases, and Attunity B2B software, which holds the business rules that determine how data is transformed between the Tandem and R/3 systems.
MQSeries lacked the extensive business-rules engine that Attunity B2B offered, and it wasn't as compatible with other platforms as Attunity, says the manager. BizTalk lost out, he says, because "it's Windows-oriented" and would require a lot of extra development to meet his needs.
The manager says he planned to complete prototype in the middle of the year and launch a pilot system in the third quarter. His biggest surprise so far? "That we found a product that can connect and talk to all those different technologies in a really, really easy way," he says.
Don't look for Garlic in any IBM product list - it's middleware technology built into Version 7 of IBM's DB2 database and Version 2 of its DataJoiner database-connectivity software.
In Garlic, the meta data catalogue takes the form of 'wrappers' - software that describes the data in the remote database, translates commands between the DB2 query engine and the target databases and helps the query engine determine the fastest way to execute the query.
Because the wrappers also know how each database processes queries, such as the number of tables that can be accessed per query, DB2's query-processing engine "can decide the best way to process that request", says Jim Kleewein, a member of the senior technical staff at IBM's Silicon Valley Laboratory in San Jose. Like other competing technologies, Garlic can also cache some of the data itself to speed the retrieval process.
Currently, within DB2, IBM provides wrappers only for its own DB2 database and Oracle. With DataJoiner, users can view several other relational databases, says Laura Haas, a research manager at IBM's research centre. In the next major release of DB2, expected sometime next year, many of the data-access capabilities now found in DataJoiner will be incorporated into DB2, she says.
For now, IBM is doing most of the wrapper-writing itself, as with its Discovery Link package of products and services for helping scientists track and categorise human genes. IBM also plans to publish an application programming interface to let outside developers write their own wrappers.
If Garlic is currently limited in its support for outside databases, its strength "is the extent to which the wrapper participates in the query-planning process", says Kleewein, "and the extent to which our wrappers have correctly modelled the behaviour of the underlying [data] sources".
As a result, says Haas, "We do a much better job of query optimisation than any of the competition."
For business users, though, the real value of these new tools lies in that they let business users analyse their data with off-the-shelf query tools, without having to know or care where the data resides. Which is just what Shelton and other IT managers say they want.
* Freelance writer Robert Scheier, formerly US Computerworld's technology editor, can be reached at email@example.comA crop of data catalogue toolsAttunity's Attunity Connect: Application-level middleware aimed at synchronous (real-time) information sharing. Provides meta data catalogue and native links to target databases for greater performance and flexibility.
IBM's Garlic: Database middleware available only in IBM's DB2 Version 7 and DataJoiner Version 2. Uses "wrappers" to describe data types and specific database implementations to the DB2 query engine. IBM claims Garlic does better query optimisation than competitors. Limited support for databases other than DB2 and Oracle.
Metagon Technologies' DQbroker: Peer-to-peer, object-oriented middleware distributes meta data storage, query translation and query processing among 'nodes' on each database. Metagon claims that it can access any data from mainframe Cobol applications to Excel spreadsheets.
Oracle's Transparent Gateways: Based on Attunity Connect, the gateways support access to more than 40 non-Oracle data sources including DB2, Sybase and Microsoft SQL Server. Among other functions, the gateway translates queries among SQL dialects and caches frequently used queries for greater performance. Built into Oracle8i.
Sybase's Enterprise Connect Data Access: Provides a catalogue of data and application objects available within the enterprise. In addition to accessing data, users can launch applications and stored procedures from the catalogue. Because of the object-oriented nature of the catalogue, Sybase claims, developers can work with the objects using a variety of development tools including C, C++ or Java.
* Source: Interviews with company officials