The researchers accomplished their aims partly by integrating contextual data directly into the app. For example, to process the voice command “Text Kendrick, ‘On my way,’” the system needs access to the user’s Contacts information, so they imported that data into the app. They also reduced the app’s size by combining the voice command and dictation algorithms into a single language model, ultimately attaining a memory footprint of only 20.3 MB.
According to the researchers, that small size doesn’t compromise the system’s speed or accuracy, and both metrics are impressive in their own right. On a natural speech dictation task, the system has a word error rate of only 13.5 percent, and its median speed “is seven times faster than real-time,” the researchers note.
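For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed. It is an illustration only, not the researchers' evaluation code: the function names `word_error_rate` and `speedup_over_real_time` are hypothetical, and it assumes that "seven times faster than real-time" means the audio duration divided by the processing time equals seven.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """Assumed definition: how many times faster than the audio's own duration."""
    return audio_seconds / processing_seconds


# One wrong word out of five reference words.
wer = word_error_rate("text kendrick on my way", "text kendrick on the way")
print(f"WER: {wer:.1%}")
print(f"Speedup: {speedup_over_real_time(7.0, 1.0):.0f}x real-time")
```

Under this definition, a 13.5 percent word error rate corresponds to roughly one word-level error for every seven or eight reference words.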
It’s a further demonstration of Google’s continuing, and perhaps growing, interest in speech recognition technology: the company has invested heavily in speech recognition R&D and recently introduced voice dictation for its Google Docs app. The technology could find wider applications in the emerging Internet of Things, and Google is likely betting that these investments will pay off considerably down the line.