Variable-Length Vocal Tract Modeling for Speech Synthesis

Student: Siddharth Mathur

Modeling the human vocal tract is an essential element of many speech synthesis systems. The Kelly-Lochbaum model approximates the vocal tract as a concatenation of fixed-length tubes with different cross-sectional areas. Because the length of each tube is tied to the sampling frequency, the total length of the tract cannot be changed dynamically without changing the sampling frequency. A fractional-delay filter approximates a delay that is not an integer number of samples by performing bandlimited interpolation between samples. In conjunction with the digital waveguide model of the vocal tract, such filters can be used to effectively lengthen individual tube sections while keeping the sampling frequency constant. In this project, various extensions to the Kelly-Lochbaum model were investigated, with the goal of obtaining more realistic speech synthesis.
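The abstract does not say which fractional-delay filter was used. As a minimal sketch, assuming a Lagrange-interpolation FIR filter (one common low-order approximation to ideal bandlimited interpolation, not necessarily the filter used in this project), the code below shows how a tube section whose propagation delay is a non-integer number of samples can still be realized at a fixed sampling rate. The function names `lagrange_fd_coeffs`, `fractional_delay`, and `section_delay_samples` are hypothetical, and the speed of sound c = 350 m/s is an assumed value.

```python
import numpy as np


def lagrange_fd_coeffs(delay: float, order: int = 3) -> np.ndarray:
    """FIR coefficients of a Lagrange fractional-delay filter.

    h[k] = prod_{m != k} (delay - m) / (k - m), for k = 0..order.
    Accuracy is best when `delay` lies near order / 2.
    """
    h = np.ones(order + 1)
    for k in range(order + 1):
        for m in range(order + 1):
            if m != k:
                h[k] *= (delay - m) / (k - m)
    return h


def fractional_delay(x: np.ndarray, delay: float, order: int = 3) -> np.ndarray:
    """Delay signal x by a possibly non-integer number of samples.

    The delay is split into an integer shift plus a remainder handled by
    the Lagrange filter, keeping the filter's own delay in the range
    [(order - 1) / 2, (order + 1) / 2] where it is most accurate.
    The first few output samples are a start-up transient.
    """
    shift = int(np.floor(delay - (order - 1) / 2))
    if shift < 0:
        raise ValueError("delay too short for this filter order")
    frac = delay - shift
    h = lagrange_fd_coeffs(frac, order)
    # Plain integer shift (zeros prepended, output kept at input length).
    shifted = np.concatenate([np.zeros(shift), x])[: len(x)]
    return np.convolve(shifted, h)[: len(x)]


def section_delay_samples(length_m: float, fs: float, c: float = 350.0) -> float:
    """One-way propagation delay, in samples, through a tube of given length.

    Assumes c = 350 m/s by default (hypothetical choice for illustration).
    """
    return length_m * fs / c


if __name__ == "__main__":
    fs = 44100.0
    d = section_delay_samples(0.175, fs)  # about 22.05 samples
    ramp = np.arange(64, dtype=float)
    y = fractional_delay(ramp, d)
    print(d, y[30])  # a delayed linear ramp: y[n] is about n - d
```

Changing the tube length then only changes the non-integer `delay` passed to the filter, so the sampling frequency `fs` stays fixed. A cubic Lagrange filter reproduces polynomials up to degree 3 exactly, which is why a delayed ramp comes out exact once the start-up transient has passed; for audio signals the error grows toward the Nyquist frequency.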

This work was conducted in the Speech Acoustics Laboratory (Director: Prof. Brad H. Story) in the Dept. of Speech and Hearing Science.