Google has been working on GAUDI (Google Audio Indexing) for some time now. The company incorporated its speech recognition technology into YouTube (transcribing speech to text and then indexing it), and now there is a dedicated Labs page for the project.
From the Google labs page:
“Google Audio Indexing uses speech technology to transform spoken words into text and leverages the Google indexing technology to return the best results to the user.
The returned videos are ranked based — among other things — on the spoken content, the metadata, the freshness.
We periodically crawl the YouTube political channels for new content. As soon as a new video is uploaded to YouTube, it is processed by our system and made available in our index for people to search.”
Audio indexing research has been around at least since the early 1990s. The main problem is obviously accuracy; for example, the system has to recognise different accents. I think that getting it to work on music would be quite a breakthrough, because of all the “noise” around the words.
The acoustic features typically measured in this type of technology include amplitude, zero-crossing rate, bandwidth, energy in frequency sub-bands, and spectrum and periodicity properties.
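As a rough illustration of two of these measures, here is a minimal sketch (not Google's implementation, just a common textbook formulation) of zero-crossing rate and sub-band energy computed over a single audio frame with numpy; the sample rate, frame length, and test tone are arbitrary choices for the example:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def band_energy(frame, sample_rate, low_hz, high_hz):
    """Total spectral energy of the frame within [low_hz, high_hz)."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs < high_hz)
    return power[mask].sum()

# Example frame: a pure 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(1024) / sample_rate
frame = np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(frame)                      # ~2 * 440 / 16000
low = band_energy(frame, sample_rate, 0, 1000)       # energy below 1 kHz
high = band_energy(frame, sample_rate, 1000, 8000)   # energy above 1 kHz
```

For a clean 440 Hz tone, nearly all the energy falls in the low band and the zero-crossing rate is close to twice the tone's frequency divided by the sample rate; speech and music frames mix many such components, which is what makes the recognition problem hard.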
Once GAUDI works well and is fully deployed, I’m sure it’ll be extremely useful for us all. It also means that all those podcasts about your company and its products and services will help users find you in Google.