Metcalfe's column: Web father shares vision of the semantic web

There is not just one next-generation Internet, or one nextgen Web. Progress is made rapidly in parallel at many levels of protocol. Even successive generations at the same level can crowd and sometimes overtake one another.

So it may have been confusing, especially to telephone company executives, when Tim Berners-Lee presented his vision of a nextgen Web at the Eighth International World Wide Web Conference in Toronto.

Berners-Lee fathered the Web in 1989 and is now director of the World Wide Web Consortium (W3C).

I hasten to disclose that this year not only is the Web turning 10 years old but also my former company, 3Com, is turning 20. To celebrate, founders and friends have endowed a $US2 million chair at MIT. Berners-Lee is now the 3Com Founders principal research scientist at the MIT Laboratory for Computer Science.

In Toronto, Berners-Lee told us that the old Web shares information among people using documents, whereas the new Semantic Web shares information among computers using data.

For the old Web, he devised HTML to publish information for human consumption, carefully separating content from presentation. For the Semantic Web, Berners-Lee leads work on the Extensible Markup Language (XML), carefully separating content, presentation, and meaning (semantics) for software consumption.

XML 1.0 was approved by W3C in 1998.

There is a lot of pressure today for Web search and commerce engines to grovel through HTML pages intended for human eyeballs. In desperation, they have resorted to a new kind of screen scraping. We've had software scraping 3270 screens, lamely extracting data trapped in legacy mainframe applications. Now we have software scraping HTML screens, lamely extracting data trapped in legacy Web applications.
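To make the fragility of HTML screen scraping concrete, here is a minimal Python sketch. The page markup, the product, and the scrape_price helper are all invented for illustration; nothing here comes from a real site or from the conference talks. The scraper finds the price only because it happens to be set in italics, so a purely cosmetic redesign breaks it.

```python
import re

# Hypothetical product page: the price is recognisable only by its presentation.
PAGE_V1 = "<html><body><b>Widget</b> <i>$19.95</i></body></html>"
# The same data after a cosmetic redesign (italics swapped for a coloured font).
PAGE_V2 = "<html><body><b>Widget</b> <font color='red'>$19.95</font></body></html>"


def scrape_price(html):
    """Guess the price by looking for a dollar amount wrapped in <i> tags."""
    match = re.search(r"<i>\$(\d+\.\d{2})</i>", html)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(scrape_price(PAGE_V1))  # the italic price is found
    print(scrape_price(PAGE_V2))  # the redesign hides it from the scraper
```

The data never changed between the two pages, only the way it was displayed, yet the second call comes back empty.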

XML is a metalanguage -- a language for defining languages. Its purpose is not only to standardise software-to-software data exchanges on the Web but also to establish a platform on which to weave the meaning of data.

Such a platform, Berners-Lee says, will give all our computing power and intelligent software something to climb around on. There's no telling what will happen after this metadata revolution, and he can't wait to find out.

XML does not replace HTML. W3C is now working on the Extensible Style Language (XSL) to address data presentation and XHTML to modularise HTML using XML.

Early among XML languages intended for software consumption is W3C's object-oriented Resource Description Framework (RDF). HTML has tags that international organisations define for standard presentations -- for example, B and I for bold and italic, respectively. But RDF has tags that can be defined by anyone, for example PRICE and INVOICE. RDF tags annotate data for software so it does not have to scrape screens.
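As a rough illustration of what tagged data buys a program, here is a small Python sketch. It uses plain XML with made-up INVOICE, ITEM, NAME and PRICE elements, not RDF's actual syntax, and the document layout and the invoice_total helper are assumptions for the example only. The program asks for PRICE elements by name, so it does not care how, or whether, the data is ever displayed.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical invoice using self-defined tags (illustrative only, not real RDF).
INVOICE_XML = """
<INVOICE>
  <ITEM>
    <NAME>Widget</NAME>
    <PRICE currency="USD">19.95</PRICE>
  </ITEM>
  <ITEM>
    <NAME>Gadget</NAME>
    <PRICE currency="USD">5.05</PRICE>
  </ITEM>
</INVOICE>
"""


def invoice_total(xml_text):
    """Sum every PRICE element in the document, wherever it appears."""
    root = ET.fromstring(xml_text.strip())
    return sum(Decimal(price.text) for price in root.iter("PRICE"))


if __name__ == "__main__":
    print(invoice_total(INVOICE_XML))
```

Decimal is used rather than float only so that the currency amounts add up exactly.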

Data in RDF is assigned tags that can be defined in separate files called Document Type Definitions (DTDs). There is a growing list of industry-specific DTDs.

IBM has an XML-based language, Yoda, for electronic data interchange (EDI). Think of EDI as a weak meta description of electronic-commerce data.

Having attended various XML sessions at WWW8, I'll say that for now we'll have to give Berners-Lee the benefit of the doubt. He is selling XML and RDF hard as long-term solutions to looming short-term problems.

But for languages that are about semantics, they sure have a lot of syntax, with angle brackets (<TAG> ... </TAG>) galore. They are like some bad mathematics I've done -- lots of definitions and no theorems.

Or, as one wag yelled at WWW8, "Take two angle brackets and call me in the morning."

For the latest on The Semantic Web, go to

Internet pundit Bob Metcalfe invented Ethernet in 1973 and founded 3Com in 1979. Send e-mail to or visit
