A preview of a photo management application and speech recognition research were among the surprise highlights at an AT&T Labs open house here this week.
The company best known for its telephony and telecommunications products is clearly venturing onto new turf. The most surprising demonstration from the open house was a nifty photo management application called Shoebox.
The Shoebox software was developed by a U.K.-based AT&T research division that was purchased earlier this year from Oracle Corp.
"We started developing Shoebox when it was clear there would be a burgeoning market for digital photography and no clear solution for managing the tons of photos that consumers would likely accumulate," says Ken Wood, a research manager at AT&T's U.K. division. "This is very sophisticated software based on an object-oriented database, but it's all under the hood to make it easy to use."
Wood dodged questions about when Shoebox might ship or in what form, saying that AT&T has not yet decided the software's future. The application is built in several technical layers and could potentially be split into separate consumer applications or even offered as an Internet service.
"The application can scale to handle hundreds of thousands of photos," Wood says.
Annotate Your Art
Shoebox lets you sort digital photos by name and by keyword labels, which you can preassign. You can also add audio notes to each picture. Once you've appended a note by speaking into a microphone, the software runs it through a speech-to-text engine and automatically adds the resulting text to the searchable database.
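For readers curious how keyword labels and transcribed notes might end up in one searchable database, here is a minimal sketch in Python. It is not Shoebox's code: the `PhotoIndex` class and its method names are hypothetical, the speech-to-text step is assumed to have already produced a plain-text transcript, and a simple in-memory word index stands in for the object-oriented database Wood describes.

```python
from collections import defaultdict


class PhotoIndex:
    """Hypothetical word index mapping each term to the photos that mention it."""

    def __init__(self):
        self._index = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase each word and strip surrounding punctuation.
        words = (w.strip(".,!?;:").lower() for w in text.split())
        return [w for w in words if w]

    def add(self, photo_id, keywords=(), transcript=""):
        # Keyword labels and the transcribed audio note share one index.
        for token in self._tokens(" ".join(keywords) + " " + transcript):
            self._index[token].add(photo_id)

    def search(self, query):
        # Return the photos that contain every word in the query.
        tokens = self._tokens(query)
        if not tokens:
            return set()
        result = set(self._index.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._index.get(token, set())
        return result


index = PhotoIndex()
index.add("photo_1", keywords=["polar bear"], transcript="Taken at the zoo in March.")
index.add("photo_2", keywords=["clouds"], transcript="Sky over the zoo")
matches = index.search("polar zoo")  # only photo_1 has both words
```

A search for "zoo" alone would return both photos, because the word appears in each transcript even though neither photo carries it as a keyword label.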
Shoebox also runs images through a proprietary "auto-segmentation" program based on color, orientation, and shape, and places them in a searchable index to match up photos. To demonstrate, Wood pulled up a photo of a white polar bear and clicked to find the next image that best matched. Voilà, Shoebox produced a different photo of a polar bear. When he clicked again, a photo of white cloudy skies came up -- which might not be particularly useful, unless you trust the results to mean no other polar bear images are in the database.
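Shoebox's matching method is proprietary and was not detailed at the open house, but the polar bear result is easy to reproduce with a much simpler stand-in. The Python sketch below compares photos by color alone, using coarse color histograms and a histogram-intersection score; it ignores orientation and shape entirely. The function names and the tiny synthetic "photos" (lists of RGB pixels) are assumptions made for illustration.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram over bins**3 coarse RGB cells."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        cell = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[cell] += 1
    total = len(pixels) or 1  # avoid dividing by zero on an empty image
    return [count / total for count in hist]


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_matches(query_id, library):
    """Return the other photo ids, best color match first."""
    query = color_histogram(library[query_id])
    scores = [
        (similarity(query, color_histogram(pixels)), photo_id)
        for photo_id, pixels in library.items()
        if photo_id != query_id
    ]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [photo_id for _, photo_id in scores]


# Synthetic stand-ins for photos: each is just a list of RGB pixels.
WHITE, BLUE = (250, 250, 250), (40, 90, 200)
GRAY, GREEN = (130, 130, 130), (30, 160, 60)
library = {
    "bear_1": [WHITE] * 70 + [BLUE] * 30,
    "bear_2": [WHITE] * 65 + [BLUE] * 35,
    "clouds": [WHITE] * 50 + [GRAY] * 50,
    "forest": [GREEN] * 90 + [BLUE] * 10,
}
ranking = rank_matches("bear_1", library)
```

With these made-up images, the second bear ranks first and the mostly white cloud photo ranks second, ahead of the forest. A color-only comparison has no notion of what a bear is, which is one plausible reason a sky full of white clouds can turn up right after the polar bears.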
Shoebox is also being adapted for wireless applications. Wood demonstrated a Sony mininotebook running Windows 98, with a tiny, built-in video camera, a Lucent wireless WaveLAN modem card, and Shoebox software. Within seconds he snapped a photo of me and successfully sent it to my e-mail address from a completely wireless device.
"I like it, the voice annotation is great," says Greg Stikeleather, a software entrepreneur attending the event. "But it's not clear the average consumer needs all these capabilities, and it might be best set up as a Web service where you can pick and choose what you want to use."
Speak to Me
Speech technology isn't new, but AT&T is doing extensive research on how to effectively emulate human speech instead of the common, mechanical "computer voice." One research group has created a 150MB database of recorded speech segments to enable realistic text-to-speech applications. You can try it yourself at the division's Web site. You enter any combination of words or phrases up to 30 characters long, then play it back in your choice of a male, female, or child's voice.
"There are countless different voices, but for demonstration we have the three choices at our Web site," says Juergen Schroeter, an AT&T researcher.
The software may be used to read e-mail, either from a PC or another device such as a cell phone.
"We also see this being used in applications like customer care, intelligent agents and many other ways," Schroeter says.
AT&T clearly expects broader use of speech technology, such as listening to faxes, stock quotes, weather reports, movie reviews, and the like, over the phone, anytime, anywhere. The text-to-speech technology will be used in AT&T's upcoming OneComm service, which provides Web and telephone access to e-mail in combination with several calling services.
Implementing an accurate text-to-speech program is no small matter. For example, more than two million unique surnames are used in the United States, and no one likes to hear their name mispronounced, Schroeter notes.