| Abstract: |
Voice conversion, or morphing, is the classic academic problem of
transforming speech uttered by one speaker (the "Source Speaker") so that
it sounds as if it were spoken by another (the "Target Speaker"). Work on
voice morphing typically concludes that pitch and formant transformation
alone are not enough; prosody, accent, speaking rate, and other perceptual
and suprasegmental characteristics must also be taken into account.
In this talk, we present recent work in synthesizing a breathy quality in
speech, especially in the context of improving the quality of voice
morphing. We use sinusoidal modeling, along with classic techniques from
analog communication systems (variants of AM and PM), to de-voice regions
of the spectrum and synthesize breathiness in speech. We also discuss in
detail a new Java-based sinusoidal modeling re-synthesis framework
developed over the summer.
This work was done in conjunction with Ellen Eide and Raimo Bakis during
an internship in the Text-to-Speech Synthesis Group at the IBM TJ Watson
Research Center in Yorktown Heights, NY.
|