2018-2019 Talks

Talks.Talks1819 History


October 11, 2019 by 134.226.84.68 -
Deleted lines 3-9:

Upcoming Speakers


To Be Provided...


January 23, 2019 by 134.226.84.64 -
Changed line 17 from:
Time & Venue Thomas Davis Lecture Theatre (Room 2043), Arts Block. TCD, Friday 18th January at 11.30am
to:
Time & Venue Thomas Davis Lecture Theatre (Room 2043), Arts Block. TCD, Friday 18th January, 2019 at 11.30am
January 23, 2019 by 134.226.84.64 -
Added lines 19-28:
Abstract: Almost every text-to-speech synthesiser contains three components. A front-end text processor normalises the input text and extracts useful features from it. An acoustic model performs regression from these features to an acoustic representation, such as a spectrogram. A waveform generator then creates the corresponding waveform.

In many commercially-deployed speech synthesisers, the waveform generator still constructs the output signal by concatenating pre-recorded fragments of natural speech. But very soon we expect that to be replaced by a neural vocoder that directly outputs a waveform. Neural approaches are already the dominant choice for acoustic modelling, starting with simple Deep Neural Networks guiding waveform concatenation, and progressing to sequence-to-sequence models driving a vocoder. Completely replacing the traditional front-end pipeline with an entirely neural approach is trickier, although there are some impressive so-called "end-to-end" systems.

In this rush to use end-to-end neural models to directly generate waveforms given raw text input, much of what we know about text and speech signal processing appears to have been cast aside. Maybe this is a good thing: the new methods are a long-overdue breath of fresh air. Or, perhaps there is still some value in the knowledge accumulated from 50+ years of speech processing. If there is, how do we decide what to keep and what to discard - for example, is source-filter modelling still a good idea?
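The three-component pipeline described in the abstract above can be sketched in a few lines. This is a purely illustrative toy, not any real synthesiser: all function names and the "spectrogram" representation are hypothetical placeholders standing in for a front-end text processor, an acoustic model, and a waveform generator.

```python
# Illustrative sketch of the three-component TTS pipeline.
# All names and representations here are hypothetical placeholders.

def normalise_text(text):
    """Front-end: normalise input text and extract simple features."""
    # e.g. expand the abbreviation "dr." to "doctor", then tokenise
    return text.lower().replace("dr.", "doctor").split()

def acoustic_model(features):
    """Acoustic model: regress features to an acoustic representation.
    Here, one dummy 4-bin 'spectrogram' frame per token."""
    return [[len(token)] * 4 for token in features]

def waveform_generator(spectrogram):
    """Waveform generator (vocoder stand-in): flatten acoustic frames
    into a waveform-like list of samples."""
    samples = []
    for frame in spectrogram:
        samples.extend(frame)
    return samples

def synthesise(text):
    """Run the full pipeline: text -> features -> spectrogram -> waveform."""
    features = normalise_text(text)
    spectrogram = acoustic_model(features)
    return waveform_generator(spectrogram)
```

An "end-to-end" system, by contrast, would replace all three stages with a single learned model mapping raw text directly to a waveform, which is the design choice the talk examines.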

January 23, 2019 by 134.226.84.64 -
Changed lines 15-17 from:
Speaker Dr. Ravi Kumar, TCD
Title Control by Example using a quadcopter as a guinea pig
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 17-Oct-18
to:
Speaker Prof. Simon King, University of Edinburgh
Title Does “end-to-end” speech synthesis mean we don’t need text processing or signal processing any more?
Time & Venue Thomas Davis Lecture Theatre (Room 2043), Arts Block. TCD, Friday 18th January at 11.30am
January 23, 2019 by 134.226.84.64 -
Added lines 1-18:

See other talks


Upcoming Speakers


To Be Provided...



Talks


Speaker Dr. Ravi Kumar, TCD
Title Control by Example using a quadcopter as a guinea pig
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 17-Oct-18


Page last modified on October 11, 2019