A new open source data query system combining elements of the PostgreSQL relational database with adaptive query technologies has been released by researchers at the University of California (UC) Berkeley campus.
Developed by a core group of 10 Computer Science Division faculty members, students and collaborators, the TelegraphCQ system is touted by the Berkeley team as the next generation of database query engine technology because of its use of adaptive dataflow techniques. The team is headed by UC Berkeley professors Michael Franklin and Joseph M. Hellerstein.
According to Hellerstein, the goal of the Telegraph project is to allow query processing engines to continuously adapt their dataflow to even the most unpredictable environments, something current databases cannot do.
This is because traditional database solutions employ query engines which are built on an “optimise then execute” model, Hellerstein said.
“Traditional databases have two phases for a query: first a query ‘optimiser’ examines the query and picks a good execution plan, and second a query ‘executor’ runs the plan chosen by the optimiser from start to finish,” he said.
The static nature of these processes means there is no ability for the system to react to changes that occur while the query is running, such as another process chewing up memory or unexpected pauses while the related query data is streaming.
To combat this, “adaptive dataflow technology merges the optimiser and executor phases, so that while a query is running it is constantly being observed and re-optimised in an organic fashion,” he said. In other words, an adaptive system constantly re-evaluates its environment while processing a query to ensure it is working effectively – despite any runtime fluctuations.
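The contrast Hellerstein describes can be sketched in a few lines of Python. This is an illustration only, not TelegraphCQ code: the names Filter, run_static and run_adaptive are hypothetical, and the reordering rule (try the filter with the lowest observed pass rate first) is an assumed stand-in for whatever policy the real system uses. The static runner fixes the filter order up front, while the adaptive runner re-decides the order for every row from the statistics it has gathered so far.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class Filter:
    """A query predicate plus the runtime statistics observed for it."""
    name: str
    predicate: Callable[[Row], bool]
    seen: int = 0
    passed: int = 0

    def pass_rate(self) -> float:
        # Smoothed estimate so a filter with no observations starts at 0.5.
        return (self.passed + 1) / (self.seen + 2)

    def apply(self, row: Row) -> bool:
        self.seen += 1
        ok = self.predicate(row)
        if ok:
            self.passed += 1
        return ok


def run_static(rows: List[Row], filters: List[Filter]) -> Tuple[List[Row], int]:
    """Optimise then execute: the filter order never changes."""
    out, evals = [], 0
    for row in rows:
        for f in filters:
            evals += 1
            if not f.predicate(row):
                break
        else:
            out.append(row)
    return out, evals


def run_adaptive(rows: List[Row], filters: List[Filter]) -> Tuple[List[Row], int]:
    """Re-pick the order per row: most selective filter (so far) goes first."""
    out, evals = [], 0
    for row in rows:
        for f in sorted(filters, key=Filter.pass_rate):
            evals += 1
            if not f.apply(row):
                break
        else:
            out.append(row)
    return out, evals


def make_rows() -> List[Row]:
    # The data changes character mid-stream, which a fixed plan cannot anticipate.
    return [{"a": 1, "b": 0}] * 500 + [{"a": 0, "b": 1}] * 500 + [{"a": 1, "b": 1}] * 10


def make_filters() -> List[Filter]:
    return [
        Filter("a_is_1", lambda r: r["a"] == 1),
        Filter("b_is_1", lambda r: r["b"] == 1),
    ]
```

Calling run_static(make_rows(), make_filters()) and run_adaptive(make_rows(), make_filters()) returns the same rows, but the adaptive runner performs fewer predicate evaluations because it changes the filter order when the data changes.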
Hellerstein said TelegraphCQ was originally developed in Java, but about eight months ago his team decided to rewrite the system in C/C++. A paper by Telegraph researchers on their experience with Java states that the rewrite was mainly due to limitations with Java’s programming interface.
Rather than rebuild TelegraphCQ from scratch, researchers decided to leverage the PostgreSQL code base and incorporate a range of its features, such as the front-end components for access to the client-side interface, its main query processing modules and its cataloguing functions.
The team experienced challenges while modifying the PostgreSQL framework to support processes it wasn’t designed for.
These new processes included streaming data and continuous query support, which involve storing continuous queries in the database, rather than the data itself. Another key feature of TelegraphCQ which differs from PostgreSQL is its use of shared processing and adaptive query optimisation based on ‘eddies’, or constantly changing processing modules, instead of a static query optimiser.
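The idea of storing the queries rather than the data can be shown with a minimal Python sketch. The ContinuousQueryEngine class and its register and push methods are hypothetical names chosen for illustration, not TelegraphCQ's interface. Queries stay registered, each arriving reading is checked once against all of them, and the reading itself is not kept unless a query selects it.

```python
from typing import Callable, Dict, List

Reading = Dict[str, float]


class ContinuousQueryEngine:
    """Holds standing queries; data flows past them instead of being stored."""

    def __init__(self) -> None:
        self._queries: Dict[str, Callable[[Reading], bool]] = {}
        self.results: Dict[str, List[Reading]] = {}

    def register(self, name: str, predicate: Callable[[Reading], bool]) -> None:
        # The query is what persists; it sees only readings that arrive later.
        self._queries[name] = predicate
        self.results[name] = []

    def push(self, reading: Reading) -> None:
        # Shared processing: one pass over the reading serves every query.
        for name, predicate in self._queries.items():
            if predicate(reading):
                self.results[name].append(reading)
```

For example, after registering a "hot" query with the predicate lambda r: r["temp"] > 30 and pushing readings of 35 and 20 degrees, only the first reading appears in results["hot"]. A query registered afterwards sees neither, since the stream is not stored.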
The Telegraph project, which began in 2000, is mostly funded by the US National Science Foundation. Hellerstein said the group also receives additional funding from both Microsoft and IBM, matched by funding from the state of California. TelegraphCQ represents the first full system release from the team.
Hellerstein said one of the target applications for TelegraphCQ is giving users SQL access to remote data sources, including the “deep web”. On the Telegraph Web site, the “deep web” is described as information on the Internet that is not available by simply following hyperlinks. Instead, most of this information is “located in free Web-based databases that require a person to fill out a form in order to submit a query”, the Web site states.
TelegraphCQ could also be used as an infrastructure for querying streaming data from sensors, logs and peer-to-peer systems, Hellerstein said.
Although there is no formalised development strategy for the TelegraphCQ system, Hellerstein said the research group is working on improving two related software tools, TinyDB and PIER, with the intention of tying both to TelegraphCQ within the next 12 months. TinyDB is a streaming query engine designed to run on wireless networks. PIER is designed as a query engine for distributed applications, such as distributed network monitoring, filesharing and service composition.
A third system slated for possible future integration with Telegraph is called YFilter. This system takes streams of XML documents and sets of XPath subscription profiles and efficiently publishes personalised documents to the subscribers, Hellerstein said.
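The matching of XML documents against XPath profiles can be illustrated with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The route function and the sample profiles below are hypothetical. The sketch tests every profile against every document, which shows the behaviour but not the efficiency: a real filtering system would be expected to share work across profiles.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List


def route(documents: Iterable[str], profiles: Dict[str, str]) -> Dict[str, List[str]]:
    """Deliver each XML document to every subscriber whose XPath profile matches it."""
    delivered: Dict[str, List[str]] = {subscriber: [] for subscriber in profiles}
    for text in documents:
        root = ET.fromstring(text)
        for subscriber, path in profiles.items():
            # find() returns None when no element matches the path.
            if root.find(path) is not None:
                delivered[subscriber].append(text)
    return delivered


# Assumed sample data, for illustration only.
DOCS = [
    "<feed><quote symbol='IBM'>90</quote></feed>",
    "<feed><news topic='db'/></feed>",
]
PROFILES = {
    "alice": ".//quote[@symbol='IBM']",
    "bob": ".//news",
}
```

With this sample data, route(DOCS, PROFILES) delivers the first document to alice and the second to bob.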
The alpha version of the TelegraphCQ system is now available on the Telegraph Web site: http://telegraph.cs.berkeley.edu.