Degrees of freedom

OpenLink Software Inc.'s newest iteration of the Virtuoso database engine is geared to take the wild innovation of its predecessor to new heights. Adding to Version 2.7's support for XML, Web services, and Internet standards -- a list longer than your arm -- the forthcoming Version 3.0 will also host ASP.Net pages and user-defined functions written in .Net languages on both Windows and Linux, more than proving it worth a look.

OpenLink CEO Kingsley Idehen, Virtuoso's mastermind, abides by two basic principles: No. 1, Data management is the foundation of IT; and No. 2, developers who build on that foundation need choices.

One dimension of choice is data flavor. To that end, Virtuoso's SQL core embraces XML, has a validating parser and XSLT (Extensible Stylesheet Language Transformation) processor embedded in it, can index and search within XML blobs, can express SQL queries as XML (optionally producing an XML Schema), and supports the still-evolving XQuery 1.0 standard.

Another dimension of choice is data access. In terms of data consumption, Virtuoso, which is rooted in OpenLink's suite of data-access drivers, can augment its own SQL tables and stored procedures by linking to foreign ones such as those from Oracle or Microsoft SQL Server. Its PL (procedure language) also includes primitives used to fetch data by way of HTTP Get, raw SOAP calls, or WSDL-described SOAP calls.

As a data provider, Virtuoso supports database-style clients with ODBC, JDBC, and OLE-DB drivers; SOAP clients by means of PL wrappers around local or remote procedures; and WebDAV (Web-based Distributed Authoring and Versioning) clients, including Windows Explorer and Office, by projecting file-system-like views of query results.

Yet another dimension of choice is platform -- a vague term that encompasses OS, application server, and programming environment. Virtuoso has many of the OS bases covered: Microsoft Windows, Linux, Sun Solaris, AIX, Hewlett-Packard HP-UX, and Mac OS X. It runs as a self-contained application server, using PL code embedded in Virtuoso Server Pages to manipulate the SQL, XML, and DAV resources controlled by the engine.

Whereas Version 2.7 is equipped with the option to embed a PHP (Hypertext Preprocessor) run time and a Java run time -- which can in turn host JSP (Java Server Pages) using, for example, Tomcat -- Version 3.0 extends hosting support to .Net, using ASP.Net on Windows, and to Mono -- the Ximian-led open-source .Net project -- on Linux.

But there's more to this story than just JSP or ASP.Net support. User-defined functions written in Java or .Net languages are directly available to SQL statements and PL procedures. Java or .Net objects can be serialized, stored in the database, and invoked from the database. Creating this object/relational/XML hybrid in a single environment is an accomplishment; doing it for Java and multiple flavors of .Net is truly inspiring.

Working Web services

There's so much going on under Virtuoso's hood that you really need a concrete example to see how the pieces fit together. Although it seems that everybody wants to write an RSS (Rich Site Summary) aggregator nowadays, I picked that project anyway because it nicely illustrates Virtuoso's diverse powers. The starting point for an RSS aggregator is a list of news feeds. My feeds live in a Frontier database and are published to the Web as an OPML (Outline Processor Markup Language) file. There are a dozen ways to use Virtuoso to acquire that list of feeds. I could, for example, use Virtuoso's built-in Web client to fetch the OPML file as a database column and then use its built-in XSLT processor to reduce the XML to a list of URLs. Instead, because I had already written a SOAP service that returns the list of feeds, I decided to call it from Virtuoso.

Radio UserLand doesn't produce WSDL wrappers for SOAP services, so I used Virtuoso's low-level SOAP client to retrieve the data as a PL vector -- essentially, a list of lists. Once I had a PL routine that completed this task, the data could in turn be wrapped up as a SOAP service. That enabled me to use Virtuoso's WSDL auto-generator to make the raw Radio service WSDL-consumable -- a neat bootstrapping trick! To pull this off, I wrote an XML Schema fragment that defines an ArrayOfString and fed it to Virtuoso's soap_dt_define function to define the new type. Finally, I used Virtuoso's auto-generated test harness to invoke the service through the WSDL wrapper.

The next step was to define the database schema for the news feeds. Because they are by definition valid XML files, I chose a simple two-column structure:

CREATE TABLE feeds (feedname VARCHAR, feeddata LONG VARCHAR)

For the BLOBs in the feeddata column, I created an XML-aware text index, like so:

CREATE TEXT INDEX XML on feeds (feeddata)

One way to populate this table would be to fetch the news feeds using Virtuoso's native Web client. Virtuoso PL is a capable scripting language, but it's nice to know that other, more familiar languages are an option. This was a great opportunity to try out the new C# support. On a Windows XP machine equipped with Version 1.0 of the .Net Framework, I wrote a C# method to fetch an URL and to ensure that its XML is well-formed. Then I compiled the method into the server's directory, used a PL procedure to import it, and used another PL procedure to invoke it once per feed.

Experiments in XML

On my first try I ran into a glitch in the beta software. Some XML files validated by the C# code were nevertheless rejected by Virtuoso's parser and could not be added to the database. OpenLink has since verified and fixed the problem, but it did serve to illustrate an interesting point. A C# (or Java) method called from a Virtuoso stored procedure is transacted, so the failures caused automatic rollbacks.

While OpenLink was analyzing the bug I found, I switched to the simpler Web-client approach to populate the database and try some XML queries. I've sometimes wondered about the distribution of RSS versions across the 100 news feeds I subscribe to; what's the breakdown among the RSS 0.9x, RSS 1.0, and RSS 2.0 versions? With the XML text index in place, I was able to select the union of two XPath-enabled queries -- one targeting the version attribute of RSS 0.9x and 2.0 feeds, the other isolating the RDF element of RSS 1.0 feeds -- and to group the results by frequency of version.

I could have piped these results through Virtuoso's XSLT processor and displayed them using its built-in Web server. But I decided to chart the data. To do that, I used Virtuoso's SQL-to-XML converter -- essentially the same as SQL Server's FOR XML AUTO data-shaping method -- to render the results as XML. I routed the results of the FOR XML query to Virtuoso's WebDAV repository, which can cache such results or regenerate them on demand. Then I pointed Excel 11 at Virtuoso's WebDAV server, imported the "file" into an XML data range, and charted it. Once it was all working, I repeated the same procedure on Red Hat Linux 8.0.

Think about the pathways involved here. Data moves from a SOAP service in Radio UserLand, through an auto-generated WSDL wrapper, into a database stored procedure, which calls out to the Web through a C# extension and stores results in an indexed XML database. Then an XPath-enabled SQL query gathers results, converts them to XML, and virtualizes them as a WebDAV resource, which Excel finally reads and analyzes. Too many degrees of freedom? Not to my way of thinking. "It's just data," says Apache officer and IBM senior developer Sam Ruby. Of course data takes many forms and flows in many ways. I can't think of another product that supports as many data pathways -- or does so as intelligently -- as Virtuoso.

Join the newsletter!

Error: Please check your email address.

More about ApacheHewlett-Packard AustraliaIBM AustraliaMicrosoftOpenLinkOpenLink SoftwareOracleRed HatXimian

Show Comments

Market Place