Australia’s Defence Science and Technology Organisation (DSTO) is developing a networked voice recognition system that produces transcripts on the fly, according to a presentation delivered to the DSTC Evolve conference in Sydney last week.
The system is based on commercial off-the-shelf (COTS) voice recognition technology such as the Dragon Systems range of products. DSTO says it has successfully networked such applications, even deploying them to document meetings between Australia’s military planners and “collaborators” during operations in East Timor.
The efforts form part of a larger research project called "Extending Interactive Intelligence Workspace Architectures with Enterprise Services", which is experimenting with new forms of operating environments, or LiveSpaces, that coordinate and manage interactions between people, multiple display surfaces, personal information devices and workspace applications.
“DSTO is working on an intelligent listening transcription technology that will automatically transcribe meetings into minutes,” DSTO research leader, command and control division, Dr Rudi Vernik told the conference.
Vernik said that while the current development is focused on military use, transferring it to commercial and civilian applications - such as automating note-taking for networked meetings or teleconferences and distributing transcriptions in real time to participants or stakeholders - is conceivable.
DSTO is the Australian Defence Force’s key research and development agency and also has a mandate to commercialise and license the technologies it develops.
In the military context, DSTO's networked voice recognition system has been used to quickly annotate structured meetings where reports are delivered, and to disseminate their content, including decisions, down the line to appropriate parties.
Meeting participants have the voice recognition software customised to their individual vocal delivery, and effectively carry their vocal profile with them as part of their network or user identity, thus allowing the recognition software to map vocal output to a given user.
Vernik says that like most automatic recognition technologies, DSTO’s networked intelligent listening offering works best in regulated and structured environments that are consistent.
“Obviously it doesn’t like when people talk over each other during teleconferences or meetings,” Vernik says, emphasising that it helps when vocabulary is established and consistent.
If successful, DSTO’s new technology has the potential to effectively attribute text to speech in real time, thus further opening the possibilities of so-called babelfish translators, or computers capable of cross-translating a variety of spoken languages.
Researchers at the US Defense Advanced Research Projects Agency are currently working on a project codenamed ‘Babylon’ that aims to build a platform to allow soldiers in the field to translate and understand a range of foreign languages.
In the meantime, monolingual text output from conference speeches, progress meetings and answering machines will have to suffice. Just watch what you say.