Let's get organized

I have a theory about why we aren't recovering from the dot-com implosion as quickly as we ought to be. Perhaps it's just myopia, but I blame at least a portion of our economic woes on the disorganization of information on the Web and the fact that the current state of technology doesn't deal well with this chaos.

There's plenty of information on the Web. Once we pass the threshold where information is truly accessible to the masses, the Internet will become the indispensable foundation of our future economy.

Tim Berners-Lee has attempted to help create the kind of information infrastructure that would support such a future. If you want a glimpse of what he has in mind, read the Scientific American article titled "The Semantic Web," by Berners-Lee, James Hendler and Ora Lassila (www.sciam.com/2001/0501issue/0501berners-lee.html).

The article begins with a bit of a science fiction story in which people converse with Internet appliances the way the astronauts talked to the HAL 9000 computer in 2001: A Space Odyssey. Why is it science fiction? We have adequate speech-recognition and synthesis technology. It's just not affordable yet. What we really lack is an intelligent structure for the way we store information on the Web and an intelligent way to interpret and retrieve that information.

If you want to see just how far we are from the goal, try this sort of test on any of a number of Web search engines that accept natural-language queries. I consider the site Ask Jeeves (www.ask.com) to be reasonably good, so I asked it, "What is the Semantic Web?" It came up with an excellent list of links.

The question "What are the long-term side effects of phentermine (a weight-loss medication)?" may have produced some starting points, but it didn't give me a link with a direct answer.

But Ask Jeeves failed miserably when I asked questions like "How can I turn on TCP Syn Cookie support in the Linux kernel?" or "What was the name of the ship in the movie 2001: A Space Odyssey?" Yet these are extremely specific questions containing all the information necessary to find precise answers.

The Semantic Web addresses this very issue. It applies standards like XML and the Resource Description Framework (RDF) to the way we store and categorize information on the Web in order to make it possible to interact intelligently with the Web.

Now don't get me wrong. I'm 100 percent gung-ho behind XML, RDF or any other acronym that might make Web information more accessible. But all one has to do to sprinkle some reality dust on this fantasy is to browse through a few XML files. What you'll find are the limitations of the standards and of the humans who apply them.

For example, the program Evolution by Ximian Inc. uses an XML configuration file that includes this line: "." If you have your secret programmer decoder ring on, you'll know that the string beginning with "2f" is the hexadecimal representation of the ASCII string "/usr/bin/gpg." But if you didn't have a clue, why would you expect a search engine to do any better?
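For readers without the decoder ring, here is a minimal Python sketch of that round trip. It assumes nothing about Evolution's actual file format: the hex string is computed from the path quoted above rather than copied from any configuration file, and the variable names are purely illustrative.

```python
# Illustrative only: the hex form is derived from the path quoted in the
# column, not taken from Evolution's configuration file.
path = "/usr/bin/gpg"

# Encode each ASCII character as two hexadecimal digits ("/" becomes "2f").
encoded = path.encode("ascii").hex()

# Reverse the process: hex digits back to bytes, bytes back to text.
decoded = bytes.fromhex(encoded).decode("ascii")

print(encoded)
print(decoded)
```

The decoding itself is trivial. The hard part, for a person or a search engine, is knowing that the value needs decoding in the first place.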

Perhaps that is a poor example, because a well-designed engine should understand that "0" and "False" are the same Boolean value. And it might even discern the difference between text strings and hexadecimal ASCII. But if there is ambiguity among simple data types, how can we expect XML to make it easier to share complex data?
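To make the ambiguity concrete, here is a rough Python sketch of how a program might list every plausible reading of a raw value. The function name, the word lists and the set of readings are all assumptions made up for illustration; no XML standard or search engine is known to work this way.

```python
import string

# Hypothetical spellings a lenient reader might accept as Booleans.
TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def possible_readings(value):
    """Return every interpretation that fits the raw string."""
    readings = {"text": value}
    lowered = value.strip().lower()

    # Boolean: "0" and "False" land on the same value.
    if lowered in TRUE_WORDS:
        readings["boolean"] = True
    elif lowered in FALSE_WORDS:
        readings["boolean"] = False

    # Integer: plain ASCII digits only.
    if lowered.isascii() and lowered.isdigit():
        readings["integer"] = int(lowered)

    # Hex-encoded ASCII: an even number of hex digits that decode
    # to printable ASCII characters.
    looks_like_hex = (
        lowered != ""
        and len(lowered) % 2 == 0
        and all(c in string.hexdigits for c in lowered)
    )
    if looks_like_hex:
        try:
            decoded = bytes.fromhex(lowered).decode("ascii")
        except UnicodeDecodeError:
            decoded = None
        if decoded is not None and decoded.isprintable():
            readings["hex_ascii"] = decoded

    return readings


for sample in ("0", "False", "4142"):
    print(sample, possible_readings(sample))
```

Even in this toy, "0" is at once text, a Boolean and an integer, and "4142" is at once text, an integer and the hex encoding of "AB". Nothing in the value itself says which reading the author intended.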

The problem is that the Extensible in XML means we get to make up stuff. If we all agreed on what we made up, the metatag keywords in the HTML header on your Web site might actually mean something. But they usually don't. That's mostly due to innocent differences of opinion.

And it can only get worse if some hypothetical monopolistic company exploits the extensibility of XML to make its data more accessible to some software than to others.

So, is there any hope? Enter RDF, another piece of the Semantic Web. We'll examine RDF in my next column to see if it can do what XML alone can't. In the meantime, assuming your particular Linux kernel supports the feature, you can turn on SYN cookies by running "echo 1 > /proc/sys/net/ipv4/tcp_syncookies" as root. And the name of the ship is Discovery. Sorry, but I don't know anything about the long-term side effects of phentermine.

Nicholas Petreley is a computer consultant and author in Hayward, Calif. He can be reached at nicholas@petreley.com.
