Research

Compression for Speech Recognition and Music Classification

Overview

One goal of this project is to develop methods for compressing speech signals for distributed speech recognition. Current speech compression techniques aim to minimize perceptual distortion. In this project, we instead investigate compression techniques that achieve low-bit-rate transmission while incurring minimal degradation in automatic speech recognition accuracy (compared with recognition on uncompressed data). The intended applications are cases where speech is acquired by low-power, possibly mobile, devices while the more complex recognition task is performed at a remote server. This framework can be used over the Internet or in wireless networks.

Technology Summary
NSF Report
Poster


Distributed Speech Recognition

To enable low-complexity (mobile) wireless devices to support speech recognition applications, the acquired speech is transmitted across a wireless network to a remote desktop computer hosting the speech recognizer. To minimize the bandwidth requirement and maximize the battery life of the wireless device, the speech is compressed before transmission. By optimizing the compression for the speech recognizer (rather than minimizing perceptual distortion), it is shown that good rate-recognition performance can be achieved, that is, high recognition accuracy at a low bit rate. A phoneme recognizer is implemented on the desktop to provide continuous speech recognition capability.
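The client-side compression step and its effect on bit rate can be illustrated with a small sketch. The page does not say which quantizer or feature set the project uses, so everything here is an assumption made for illustration: a plain uniform scalar quantizer stands in for the actual coder, and the function names, the value range, the 12 coefficients per frame, and the 100 frames per second are all hypothetical.

```python
# Illustrative sketch only: a uniform scalar quantizer applied to one
# feature vector per frame. This is a stand-in, not the project's coder.

def quantize(value, lo, hi, bits):
    """Map a value to an integer index on a uniform grid of 2**bits levels."""
    levels = (1 << bits) - 1          # highest index
    clipped = min(max(value, lo), hi)  # clip out-of-range values
    return round((clipped - lo) / (hi - lo) * levels)

def dequantize(index, lo, hi, bits):
    """Map an index back to the reconstruction value it represents."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)

def encode_frame(frame, lo, hi, bits):
    """Client side: turn one feature vector into indices for transmission."""
    return [quantize(v, lo, hi, bits) for v in frame]

def decode_frame(indices, lo, hi, bits):
    """Server side: rebuild the feature vector fed to the recognizer."""
    return [dequantize(i, lo, hi, bits) for i in indices]

def bit_rate(num_coeffs, bits, frames_per_second):
    """Bits per second when every coefficient uses the same number of bits."""
    return num_coeffs * bits * frames_per_second

if __name__ == "__main__":
    frame = [-0.83, 0.12, 0.57, -0.05]   # hypothetical feature values
    sent = encode_frame(frame, -1.0, 1.0, 4)
    received = decode_frame(sent, -1.0, 1.0, 4)
    print(sent, received)
    print(bit_rate(12, 4, 100), "bits/s")
```

Under these assumed settings, 12 coefficients at 4 bits each and 100 frames per second come to 4800 bits per second. A recognition-oriented design would then choose the number of bits per coefficient by measuring recognizer accuracy on the decoded features rather than by measuring how the reconstructed speech sounds.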


Scalable Algorithms for Distributed Speech Recognition

Overall coding/recognition scheme

The goal of this project is to develop methods for compressing speech signals for distributed speech recognition. Current speech compression techniques aim to minimize perceptual distortion. In this project, we instead investigate compression techniques that achieve low-bit-rate transmission while incurring minimal degradation in automatic speech recognition accuracy (compared with recognition on uncompressed data). The intended applications are cases where speech is acquired by low-power, possibly mobile, devices while the more complex recognition task is performed at a remote server. This framework can be used over the Internet or in wireless networks.

IMSC's 2001 NSF Report
More Info