Tag: Speech Recognition

Historic Breakthrough: Microsoft Reaches Virtual Parity With Human Speech

In a historic breakthrough, Microsoft’s AI team has developed technology that recognizes speech as well as humans do. The research team published a paper showing that its speech recognition system makes errors at the same rate as professional transcriptionists: 5.9% of words.

The IBM Watson research team published a word error rate (WER) of 6.9% earlier this year. They noted that their previous WER of 8%, announced in May 2015, was 36% better than previously reported external results.
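For readers wondering what these percentages measure: WER is conventionally the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. It is a minimal illustration under simple assumptions (whitespace tokenization, exact case-sensitive word matching), not the scoring pipeline Microsoft or IBM used, and `wordErrorRate` and `relativeReduction` are hypothetical helper names.

```typescript
// Word error rate = (substitutions + deletions + insertions) / reference word count,
// computed with a word-level Levenshtein edit distance.
// Assumption: words are split on whitespace and compared exactly (no normalization).
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter((w) => w.length > 0);
  const hyp = hypothesis.trim().split(/\s+/).filter((w) => w.length > 0);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // At the start of outer iteration i, prev[j] holds the edit distance between
  // the first i - 1 reference words and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion: reference word missing from the hypothesis
        curr[j - 1] + 1,    // insertion: extra word in the hypothesis
        prev[j - 1] + cost, // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Relative improvement between two error rates, e.g. "X% better than" claims.
function relativeReduction(oldRate: number, newRate: number): number {
  return (oldRate - newRate) / oldRate;
}

console.log(wordErrorRate("the cat sat on the mat", "the cat sit on mat"));
console.log(relativeReduction(8, 6.9));
```

In the first call, one substitution ("sat" to "sit") and one deletion (the second "the") give 2 errors over 6 reference words, a WER of one third. Note that WER can exceed 100% when a system inserts many extra words. Measured the relative way, IBM's drop from 8% to 6.9% is a reduction of about 13.75%.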

Clearly, artificial intelligence technology is on a pace that will make machine word recognition superior to human word recognition in just a matter of months. Of course, WER is only one measure, and the technology must keep improving before it achieves true comprehension and can prompt human-level responses.

Microsoft, IBM, Apple, Google, Amazon and a host of other companies are on a mission to use AI to integrate speech recognition technology into virtually every device. To make the Internet of Things (IoT) truly meaningful to people, we will need to be able to talk to those devices in our own language. Cloudera projects that more than 30 billion things will be connected to the internet by 2020.

“We’ve reached human parity,” said Xuedong Huang, who leads Microsoft’s Advanced Technology Group and is considered the company’s chief speech scientist. “This is an historic achievement.”

Microsoft says that the milestone will have broad implications for consumer and business products, including devices like Xbox and personal digital assistants such as Cortana.

“This will make Cortana more powerful, making a truly intelligent assistant possible,” notes Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group. “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible.”

“The next frontier is to move from recognition to understanding,” said Geoffrey Zweig, who manages the Speech & Dialog research group.

The holy grail, according to Shum, is “moving away from a world where people must understand computers to a world in which computers must understand us.”

At the rate the technology is advancing, that goal now seems within reach.

October 19, 2016
Google Video Looks At ‘Science Of Talking With Computers’

Google has a new video out, which it calls a “short film” about speech recognition. It starts with the early days and works its way up to the incredible technology that’s available today (including from Google, of course…okay, it’s pretty much all Google).

The description says: “Language. Easy for humans to understand (most of the time), but not so easy for computers. This is a short film about speech recognition, language understanding, neural nets, and using our voices to communicate with the technology around us.”

While you could look at the video as an ad for Google, it does serve as a pretty interesting reminder of how far this stuff has come.


October 17, 2014
Voice Recognition Comes To Chrome (Stable)

Google launched Chrome 25 beta last month, which included support for voice commands via the Web Speech API. Now, voice recognition has come to the stable release.

Developers can use the API to integrate speech recognition capabilities into their web apps, so Chrome users can benefit from the feature.
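As a rough idea of what that integration looks like, here is a minimal dictation sketch. It assumes Chrome's prefixed `webkitSpeechRecognition` constructor, falling back to the unprefixed `SpeechRecognition` name where a browser provides it. The interfaces are hand-written stand-ins covering only the members used here (`continuous`, `interimResults`, `lang`, `onresult`, `onerror`, `start()`, `stop()`), and `getRecognizerCtor`, `finalTranscripts`, and `startDictation` are hypothetical helper names, not part of the API.

```typescript
// Hand-written minimal typings for the members this sketch uses (assumption:
// these mirror the Web Speech API shapes; they are not the official type definitions).
interface RecognitionAlternative { transcript: string; confidence: number; }
interface RecognitionResult {
  readonly isFinal: boolean;
  readonly length: number;
  [index: number]: RecognitionAlternative;
}
interface RecognitionEvent {
  readonly resultIndex: number;
  readonly results: { readonly length: number; [index: number]: RecognitionResult };
}
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionEvent) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
  stop(): void;
}
type RecognizerCtor = new () => Recognizer;

// Returns the recognizer constructor if the host exposes one, otherwise null
// (for example in Node.js or in browsers without the API).
function getRecognizerCtor(): RecognizerCtor | null {
  const g = globalThis as unknown as Record<string, unknown>;
  const ctor = g["SpeechRecognition"] ?? g["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognizerCtor) : null;
}

// Collects the top transcript of every final result, starting at resultIndex
// (the first result that changed in this event).
function finalTranscripts(event: RecognitionEvent): string[] {
  const out: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal && result.length > 0) out.push(result[0].transcript);
  }
  return out;
}

// Starts continuous dictation and passes each finalized phrase to onText.
// Returns the recognizer (so the caller can call stop()), or null if unsupported.
function startDictation(onText: (text: string) => void): Recognizer | null {
  const Ctor = getRecognizerCtor();
  if (Ctor === null) return null;
  const rec = new Ctor();
  rec.continuous = true;      // keep listening across pauses
  rec.interimResults = false; // only report finalized text
  rec.lang = "en-US";
  rec.onresult = (event) => finalTranscripts(event).forEach((text) => onText(text));
  rec.onerror = (event) => console.error("recognition error:", event.error);
  rec.start();
  return rec;
}
```

In a page, something like `const rec = startDictation((text) => { document.body.append(text + " "); });` would append recognized phrases as they are finalized, with `rec?.stop()` ending the session. Returning `null` instead of throwing lets an app fall back to a text box when the browser offers no speech API.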

Google has set up a demo if you want to see how it works.

The release also disables silent extension installs in Chrome for Windows.

“This keeps Chrome fast and safe by ensuring that you consent to every extension that’s installed on your computer,” says Google software engineer Glen Shires.

The new features will arrive through auto-update as the release rolls out.

February 22, 2013
Google Launches Chrome 25 Beta With Speech Recognition

Google released Chrome 25 in beta today, and this version of the browser is noteworthy because it supports voice commands via the Web Speech API. Developers will be able to tap into this to integrate speech recognition into their web apps.

Google speech specialist and software engineer Glen Shires pretty well sums up what this means when he says, “Using your voice to search on your computer or phone is handy, but there’s so much more you can do with voice commands. Imagine if you could dictate documents, have a freestyle rap battle, or control game characters with your browser using only your voice.”

When this hits the stable release, Google should have some cool stuff to show off when it advertises Chrome on TV, as it has been doing lately.

The beta also automatically disables some extensions on Windows that were added by third-party programs, possibly without your knowledge.

“The original intent was to give people an option to add useful extensions when installing applications, but unfortunately this feature has been widely abused by third parties who added extensions without user consent,” says Shires. “A notification will appear with the option to re-enable the affected extensions.”

The beta is available for download now. Once you have it, you can try a demo Google has set up, where you can compose an email by talking.

January 14, 2013