Hierarchical Speech Recognition

Speech recognition is an essential component of any Human Computer Interaction (HCI) scheme that aspires to be natural, so high-accuracy speech recognition is critical to building natural man-machine interfaces. Most systems today are based on phonemes, which are considered the fundamental units of speech-based communication. For recognition purposes, the phoneme provides a convenient unit in terms of training data requirements and availability. However, the short duration of the phoneme limits the recognizer to correlations and information present at time scales of around 30-40 ms, which places a fundamental limit on the achievable recognition accuracy. The goal of this project is to design training and recognition algorithms for systems that use larger units, such as the syllable or the word, to provide a much larger acoustic context for recognition. In addition, larger units are more robust to pronunciation variations, which are common in a culturally diverse society such as the USA. Based on our current experimental results, we are confident that such hierarchical systems will outperform existing phoneme-based systems.

