Speaker: Dr. Juraj Šimko
Title: Analysing prosodic characteristics using WaveNet embeddings
Time & Venue: Stack-B Trinity College - 2:00-3:00pm, 16-Oct-19


A statistical speech recogniser or synthesiser is a complex statistical model of the data used to train it. In addition to its intended use, it can therefore serve as a tool for analysing phenomena manifested in the training corpus, such as the distributions of phonemic or prosodic characteristics of the material.

The talk will present a novel methodology for prosody analysis based on a WaveNet speech synthesis system trained on downsampled speech material that contains only the prosodic characteristics of interest. The phenomena of interest are investigated using embeddings trained as a conditioning part of the synthesiser.

The approach will be illustrated on (1) a corpus of Lombard speech material in which the type and level of ambient noise and the focus structure of the utterances vary, and (2) a corpus of Swedish typological variation.


Page last modified on October 11, 2019