Content categorization tool vendor Semio Corp. this week announced its new Semio Tagger product that includes XML support, allowing customers to integrate their own applications more easily.
Semio's other products, Semio Taxonomy and Semio Map, allow users to semi-automatically classify unstructured content (documents) based on a predefined taxonomy of categories. However, the data is stored in Semio's own format and is difficult to extract for use in other applications. With the addition of XML, users will be able to export the structured data that Tagger generates for use in any XML-capable application.
"XML allows us to componentize what we do best," says Jim Nisbet, vice president of research and development at San Mateo, Calif.-based Semio. "It enables the use of content categorization in ways that we haven't thought of."
Stanford University's HighWire Press Division will be using the XML capability extensively. The group publishes some 120 scientific and medical journals online from a variety of publishers. HighWire uses Semio to create indexes of the documents it publishes to make them easier to search.
"Keyword search is not effective as the database of content grows larger," says John Sack, director of HighWire. "We've got over 1 million documents each about 10 pages long and containing dense scientific text."
HighWire is adding between 1,000 and 5,000 online pages, increasing the need for categorization. Sack says Semio Tagger allows him to build a Yahoo-like directory without human intervention.
Sack says his group will now be able to run content through its Solaris-based Semio Tagger engine, export the content in XML and present it to publishing partners via a Web interface using Java. HighWire is piloting the new system with two of its broader publications in order to build a base taxonomy before digging down into some of the more narrow-subject publications. The group hopes to have the first two XML-based systems up and running by mid-October.
Pricing for Semio Tagger starts at $26,000 for 250 users.