Topic maps

Computers have so overloaded us with data that it has become increasingly difficult to find the information we seek. Beginning in the 1990s, powerful search engines like Yahoo, AltaVista and Google made the Web an incomparably valuable information resource, but the growth of available information has rendered even those remarkable tools far less useful. Google currently indexes more than 4 billion pages, and a query often returns tens of thousands of results arranged in no discernible order.

One promising approach, still in its infancy, is called topic mapping.

Consider the traditional nonfiction book. The care with which its index is created can make the difference between it functioning as a reference work or being a nearly useless compilation of facts. A good index shows what topics are covered, where to find them and how they are organized; offers subcategories and cross-references; and provides pointers to related topics.

But even the best such indexes have limitations. Each covers just one work, and books' very nature restricts the types of information an index can reference. If we want to encompass more than the ideas in a single book -- say, a company's accumulated store of documents and its knowledge base -- we need to include more than words on paper. We can find pieces of this knowledge in e-mail messages and headers, individual calendars and schedules, spreadsheets, and structured and unstructured documents in a variety of formats. It can also be found in databases and data warehouses of various types; libraries of images, audio and video; and data and business rules contained inside application programs and data files. And we must always be aware of security and privacy concerns: Who can access what information? Where do we begin?

A topic map takes a different approach. It is a kind of data structure, just as an outline or a set of categories is. Topic maps were standardized by the International Organization for Standardization in 2000 (ISO/IEC 13250), and an XML interchange syntax, XML Topic Maps (XTM), followed in 2001. XTM provides a basic model using XML tags to represent the structure of information resources, concepts and the relationships between them.

How It Works

Let's start with a subject, a real-world entity or an idea that we're representing in our map by a topic. A subject can be almost anything, from an abstract concept to a specific document section, and the terms subject and topic are often used interchangeably.

The topic map model lets us attach three elements (called characteristics) to any given topic: its names, its associations with other topics and its occurrences (also called resources).

Names are mainly useful to people in dealing with topics, and a topic doesn't actually need a name: A typical cross-reference (e.g., "see page 12") points to an unnamed topic. Also, we typically group topics according to some notion of type.

For example, if we're mapping an IT installation, we likely have topics for specific pieces of equipment, homegrown and purchased applications, data storage information and the like. Thus, our map would also include categorical topics such as hardware, software and data structures.

Associations are the conceptual heart of topic maps, indicating how one topic relates to another. For example, Book A (a topic) is written by (association) Author B (another topic).

Occurrences are the actual references -- pointers to relevant information resources. Occurrences could include articles, books, images, audio and video segments, application code routines or even people. Typically, we refer to occurrences with uniform resource identifiers (URIs), an Internet Engineering Task Force standard for addressing and referencing resources. Web address URLs are a type of URI.
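
The three characteristics can be pictured as a small data structure. What follows is a minimal Python sketch, not the XTM model itself: the class names (Topic, Association, TopicMap), the topic identifiers and the example.com URI are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topic:
    """A topic stands in for one subject in the map."""
    topic_id: str
    names: List[str] = field(default_factory=list)        # a topic may have no name
    types: List[str] = field(default_factory=list)        # categories such as "book"
    occurrences: List[str] = field(default_factory=list)  # URIs of relevant resources


@dataclass
class Association:
    """A typed relationship from one topic to another."""
    kind: str    # e.g. "written-by"
    source: str  # topic id
    target: str  # topic id


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    associations: List[Association] = field(default_factory=list)

    def add(self, topic: Topic) -> Topic:
        self.topics[topic.topic_id] = topic
        return topic

    def relate(self, source: str, kind: str, target: str) -> None:
        self.associations.append(Association(kind, source, target))

    def related(self, topic_id: str, kind: str) -> List[Topic]:
        """Topics reached from topic_id through associations of one kind."""
        return [
            self.topics[a.target]
            for a in self.associations
            if a.source == topic_id and a.kind == kind
        ]


# Book A (a topic) is written by (association) Author B (another topic).
tm = TopicMap()
tm.add(Topic("book-a", names=["Book A"], types=["book"],
             occurrences=["http://example.com/books/a.html"]))  # hypothetical URI
tm.add(Topic("author-b", names=["Author B"], types=["person"]))
tm.relate("book-a", "written-by", "author-b")

authors = [t.names[0] for t in tm.related("book-a", "written-by")]
```

In this sketch the association carries the relationship between topics, while an occurrence simply points at a resource by URI.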

These characteristics of topics aren't universal. They exist within a limited context (called scope), where they are regarded as valid.
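
One way to picture scope is as a label attached to a characteristic that says where it applies. The Python sketch below is an illustration under stated assumptions: the ScopedName class, the names_in_context helper, the scope labels and the sample names for a piece of IT equipment are all hypothetical, and a name with an empty scope is treated as valid everywhere.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ScopedName:
    value: str
    scope: FrozenSet[str] = frozenset()  # empty scope: valid in every context


def names_in_context(names: List[ScopedName], context: str) -> List[str]:
    """Return the names regarded as valid in the given context."""
    return [n.value for n in names if not n.scope or context in n.scope]


# Hypothetical names for one topic: a database server in an IT installation.
server_names = [
    ScopedName("db01", frozenset({"operations"})),
    ScopedName("Asset 4471", frozenset({"accounting"})),
    ScopedName("Database server"),
]

ops_view = names_in_context(server_names, "operations")
accounting_view = names_in_context(server_names, "accounting")
```

The same filtering idea could apply to associations and occurrences, which also hold only within a scope.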

The final concept is identity. Ideally, there should be one topic for each subject, and vice versa. In practice, multiple topics can represent a single subject, as when different topic maps are merged. And in a single topic map, we might find "William F. Bonney" and "Billy the Kid" as separate topic names referring to the same subject, a historical person.

But the topic name "Billy the Kid" might also refer to the ballet about the outlaw's life for which Aaron Copland composed the music. To get around these problems, we can unambiguously define the identity of a subject through resources called subject indicators.
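
The role of subject indicators in merging can be sketched as follows. This is a simplified Python illustration, not a standard merging procedure: topics are plain dictionaries, the merge_by_subject function is hypothetical, and the example.org URIs are invented stand-ins for real subject indicators. Topics are combined only when they share an indicator, never because they share a name.

```python
from typing import Dict, List, Set

TopicRecord = Dict[str, Set[str]]


def merge_by_subject(topics: List[TopicRecord]) -> List[TopicRecord]:
    """Combine topics that share at least one subject indicator."""
    merged: List[TopicRecord] = []
    for topic in topics:
        names = set(topic["names"])
        indicators = set(topic["indicators"])
        untouched = []
        for group in merged:
            if group["indicators"] & indicators:
                # Same subject: fold the existing group into this one.
                names |= group["names"]
                indicators |= group["indicators"]
            else:
                untouched.append(group)
        untouched.append({"names": names, "indicators": indicators})
        merged = untouched
    return merged


# Hypothetical subject indicators for the outlaw and for the ballet.
PERSON = "http://example.org/subjects/billy-the-kid-person"
BALLET = "http://example.org/subjects/billy-the-kid-ballet"

topics = [
    {"names": {"William F. Bonney"}, "indicators": {PERSON}},
    {"names": {"Billy the Kid"}, "indicators": {PERSON}},
    {"names": {"Billy the Kid"}, "indicators": {BALLET}},
]

result = merge_by_subject(topics)
```

Here the two topics for the historical person collapse into one with both names, while the ballet stays separate even though it shares the name "Billy the Kid."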

The promise of topic maps is clear. Charles Goldfarb, one of the creators of Generalized Markup Language, the progenitor of XML and all of today's markup languages, has called topic maps "the GPS of the information universe."

Unfortunately, the idea of topic maps is still well ahead of its time. Tools for creating topic maps do exist, along with some implementations in specific subject areas, but these are primarily oriented toward representing and organizing content, and they don't yet adequately address the task of content creation.

The biggest job in building a topic map lies in defining the set of topics and relationships, finding the relevant occurrences and then examining the data for cross-references, aliases and other helpful tools. While some pieces of this job, as with book indexing, can be automated (especially for structured data), the biggest part still requires a human imagination to sort out.

But in a few more years, as Moore's Law continues to expand our computing capabilities, we may well see topic maps come into their own. An application programming interface specification for topic maps (http://xml.coverpages.org/ni2004-04-09-a.html) was released in April, so development in this area is proceeding. For now, topic maps are something to be aware of, even if they're not quite ready for prime time.
