In a classic example of how the high-tech industry works, on the same day that Microsoft Corp.'s chief architect Bill Gates announced support for one speech recognition technology, known as SALT (Speech Application Language Tags), the W3C (World Wide Web Consortium) announced support for another, VXML.
Microsoft this week at its Professional Developers Conference (PDC) released what it called a ".NET Speech SDK technology preview," a speech specification for its .NET initiative and for more powerful handhelds.
The specification for Web developers uses SALT to allow developers to create speech tags for HTML, XHTML, and XML markup languages. SALT will make it easier for Web developers to incorporate speech, and it will be supported in Internet Explorer, Pocket IE, ASP.NET, and Visual Studio .NET, according to Kai-Fu Lee, vice president of the Natural Interactive Services Division for Microsoft.
While Gates announced support for SALT during his keynote address at PDC and Microsoft released preliminary specs, the W3C announced formal acceptance of VoiceXML (VXML) as the standard for adding speech recognition to make Web-based applications accessible over the telephone network.
The W3C will now take control of VXML specifications created by the VXML Forum. W3C also released the working draft of VXML Version 2.
On the face of it, the two technologies inhabit different spaces. VXML was designed to allow Web developers without IVR (Interactive Voice Response) skills to create a one-time application for both desktop and telephony-based platforms.
SALT, which is also targeted at Web developers, is meant to create a voice-activated user interface as part of a larger "multi-modal" UI for handheld devices. On handhelds, voice is expected to be one of many ways to access information.
Although VXML and SALT are targeted at two different platforms, a turf war appears inevitable and Microsoft is being accused of rubbing SALT into an industry already wounded by high expectations and poor follow-through.
"The danger here with SALT is we get into a situation where we have another approach targeted at the same set of capabilities. We need one single unified approach," said Nigel Beck, director of voice systems at IBM and a founding member of the VXML Forum.
Unfortunately, opinions on whether SALT is needed are both varied and contradictory.
"SALT benefits Microsoft, but I don't see where it adds values to voice applications, and in a way it confuses the market," said Elizabeth Herrel, a speech analyst at Giga Information Systems in Cambridge, Mass.
If SALT is only a small set of lightweight tags, as its proponents claim, then it cannot be used for speech applications, nor does it have an inherent advantage over VXML for multi-modal devices, Herrel said. In her capacity as the speech technology analyst at Giga, Herrel issued a statement advising developers who are incorporating speech into applications to use VXML.
Speaking as a member of the SALT Forum, Glen Shires, director of media servers (telephony) at Intel, said he believes the two languages have different strengths: VXML for telephony and SALT for multi-modal applications. However, when asked if developers would then need to learn two development environments to have a complete voice-enabled application, he said, "It is possible to do everything in SALT."
This opinion is backed by James Mastan, group product planner for Microsoft .NET Speech Technologies, who also said VXML was created for IVR-based services. He said it remains an open question whether VXML could be used for handheld devices.
"Technically it is extremely difficult to go from the voice area [VXML] and extend that to the multi-modal space. It is much easier to take existing HTML markup language and add a few small elements," Mastan said.
Nigel Beck, a member of the VXML Forum, said that the W3C is investigating the creation of multi-modal extensions for VXML.
The initial strategy behind the VXML initiative rested on the simple fact that cell phone use is growing an order of magnitude faster than any other segment of the wireless market. Thus, VXML's goal is to make that lucrative channel available for current Web services. What is unclear is whether SALT proponents will eventually want to target the same market.
In reality, the technology that companies finally deploy is more likely to be based on what's available, according to Bill Meisel, president of TMA Associates in Tarzana, Calif. Meisel says that because the small device market is still evolving, Redmond, Wash.-based Microsoft might easily wait three or four years for SALT to mature, but companies in the telephony business can't.
For example, AT&T currently licenses its voice technology for voice-activated directory assistance from Tellme Networks in Mountain View, Calif. The service handles more than 1 million calls a day and is written in VXML.
Tellme's CTO, John Giannandrea, believes that the technologies solve two different problems, that SALT specifications will eventually be submitted to the W3C as well, and that the cores of both technologies will be merged.
"SALT is about turning Windows devices into a phone [voice over IP], but unless Microsoft wants to be a carrier and not a device-based company, advanced services will live in the networks and you will pay your carrier for them," Giannandrea said.