This week I've got a great brain-teasing puzzle for you. It's famous on the Internet as the Monty Hall Paradox. I get to be Monty Hall and offer you a choice of three doors, behind which are two goats and a car.
This brain-teaser was passed to me by Dr. Michael Lynch, whose considerable expertise is in adaptive pattern recognition. Lynch is an expert in the application of Bayesian methods, named for the 18th-century reverend Thomas Bayes, who developed them to assess the probabilities of zero or more gods.
Lynch is the founder of Neurodynamics and its 1996 spin-off, Autonomy, both out of Cambridge, England. Lynch drove into London from his farm, where he keeps rare otter hounds. We met over dinner at the Ritz to talk about Autonomy, which is now based in San Francisco's Multimedia Gulch.
Lynch says Autonomy, with $US15 million in venture capital and early profitability, will be the "Oracle" of unstructured information. Somebody please warn Larry Ellison.
Autonomy offers software to help us sort through all that wonderful content now clogging the Internet. See http://www.agentware.com.
So, imagine I'm a game show host -- not so hard. And imagine you're a game show contestant. I show you three doors behind which are two goats and a car. We assume you want to win the car -- my irresistible all-wheel-drive Volvo station wagon.
Now, I ask you to choose -- but not open -- one of the doors, and you do. Then, aiming to improve your chances of winning, I open one of the doors you didn't choose and show you a goat. Then I ask if you want to take what's behind the door you've chosen or switch to take what's behind the other unopened door.
Do you think you should open the door you've chosen or switch to the other unopened door, or do you think it doesn't matter? Pause here to think about this puzzle.
Here's what the puzzle is trying to illustrate. Autonomy's Bayesian-based software reads documents looking for word probabilities. With these it is able to cluster documents on similar subjects. It is able to hyperlink and summarise them. It is able to reject goats.
While you're reading an article, Agentware is able to suggest related reading. All automatically. No human operators tagging contents. And with impressive accuracy.
Agentware does not work just by counting words. It looks for unlikely groupings to deduce what documents are really about. For example, if a sentence read, "down a street comes a car," there's not much going on there. But if instead it read, "down a street comes a goat," that's more likely what the document is about.
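To make the car-versus-goat idea concrete, here is a minimal sketch in Python. It is not Agentware's algorithm, which I haven't seen; it simply scores each word by its surprisal, -log2 of its probability, so that rare words score high. The background counts and the names BACKGROUND, surprisal and most_surprising are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical background counts: how often each word turns up in some
# large reference collection. These numbers are invented for illustration.
BACKGROUND = Counter({
    "down": 900, "a": 5000, "street": 400,
    "comes": 300, "car": 350, "goat": 5,
})


def surprisal(word, counts=BACKGROUND):
    """Return -log2 P(word) in bits, using add-one smoothing."""
    # One extra slot in the denominator leaves room for unseen words.
    total = sum(counts.values()) + len(counts) + 1
    probability = (counts[word] + 1) / total
    return -math.log2(probability)


def most_surprising(sentence):
    """Return the word in the sentence with the highest surprisal."""
    words = sentence.lower().split()
    return max(words, key=surprisal)


print(round(surprisal("car"), 2), round(surprisal("goat"), 2))
print(most_surprising("down a street comes a goat"))  # goat
```

With these made-up counts the goat carries several more bits of information than the car, which is the sense in which it says more about the document.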
Here's another example: Cats have tails. Cats have maybe nine lives. But, a document containing "cat o' nine tails" is probably not about cats, but pirates. This is something Agentware can deduce. Using statistics and conditional probability methods developed by, and since, Reverend Bayes, Agentware captures concepts rather than just raw word frequencies.
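The cat o' nine tails example is Bayes' rule at work, and a toy version fits in a few lines of Python. Again, this is a sketch under my own assumptions and not Autonomy's code: the two topics, the prior and every likelihood below are invented numbers, chosen only to show how a phrase can point somewhere its individual words do not.

```python
# Hypothetical numbers, invented for illustration only.
# Prior belief about what a document is about, before reading it.
PRIOR = {"cats": 0.5, "pirates": 0.5}

# P(feature appears in a document | topic of the document).
LIKELIHOOD = {
    "tails": {"cats": 0.30, "pirates": 0.05},
    "cat o' nine tails": {"cats": 0.001, "pirates": 0.05},
}


def posterior(feature):
    """Apply Bayes' rule: P(topic | feature) for each topic."""
    joint = {topic: PRIOR[topic] * LIKELIHOOD[feature][topic]
             for topic in PRIOR}
    evidence = sum(joint.values())  # P(feature) over all topics
    return {topic: joint[topic] / evidence for topic in joint}


print(posterior("tails"))
print(posterior("cat o' nine tails"))
```

Under these assumed numbers the single word "tails" favours cats, while the whole phrase swings the posterior heavily toward pirates.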
OK, ready for my solution to our game show puzzle?
Your chances of winning the car are much higher if, after I show you the goat, you switch to the other unopened door. You have a two-thirds chance of winning if you switch versus a one-third chance if you don't.
Many people think it doesn't matter if you switch. They think there's no difference between the door you chose and the other unopened door, even after seeing that goat. They are wrong.
Hint: Seeing the goat tells you nothing new about the door you've already chosen but quite a lot about the unopened door you did not choose.
Look at the puzzle in a Bayesian way. Remember that I know where the car is, so the door I open always hides a goat. If you originally chose the car, with probability one-third, would it be good to switch? Obviously not. But if you originally chose a goat, with probability two-thirds, then I had to open the only other goat door, so the remaining unopened door has the car and switching wins. Therefore, because it is twice as likely that you first chose a goat, it is twice as likely that you'll win by switching.
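If you still don't believe it, the argument can be checked by brute force. The Python sketch below does it two ways: it enumerates all nine equally likely combinations of car position and first pick, and it plays the game many times at random. It assumes the host always opens a goat door you did not pick; the function names, the trial count and the random seed are my own choices.

```python
import random
from fractions import Fraction

DOORS = (0, 1, 2)


def exact_win_probability(switch):
    """Enumerate every (car, first pick) pair; all nine are equally likely."""
    wins = 0
    total = 0
    for car in DOORS:
        for pick in DOORS:
            total += 1
            # The host always reveals a goat behind a door you did not pick,
            # so switching wins exactly when your first pick was a goat.
            won = (pick != car) if switch else (pick == car)
            wins += won
    return Fraction(wins, total)


def simulate(switch, trials=100_000, seed=0):
    """Play the game at random and return the fraction of wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.choice(DOORS)
        pick = rng.choice(DOORS)
        # Host opens a door that is neither your pick nor the car.
        opened = rng.choice([d for d in DOORS if d != pick and d != car])
        if switch:
            # Move to the one door that is neither picked nor opened.
            pick = next(d for d in DOORS if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials


print(exact_win_probability(switch=True))   # 2/3
print(exact_win_probability(switch=False))  # 1/3
print(simulate(switch=True), simulate(switch=False))
```

The simulated figures should land close to the exact two-thirds and one-third, give or take the usual sampling noise.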
Do you agree? Let me know. And please think before firing off a nastygram -- thinking is the point.
I have written about collaborative filtering and channel push technologies for personalising infoglut. Lynch is annoyed at all the fruitless attention these goats have gotten. He's asking us to switch to a powerful statistical approach.
Technology pundit Bob Metcalfe invented Ethernet in 1973 and founded 3Com in 1979, and today he specialises in the Internet. Send e-mail to firstname.lastname@example.org.