Sigmedia Talks Page

Upcoming Talks, 2009-2010 Talks, 2011-2012 Talks, 2012-2013 Talks

Past Speakers

Speaker Dan Ring
Title Research at The Foundry
Time & Venue Printing House Hall - 12:00 21-Feb-14

The Foundry is a visual effects (VFX) software company that constantly advances the state of the art in image manipulation, largely by staying active and responsive to the latest trends in research and technology. However, research in industry is very different from research in academia. With personal anecdotes and examples, this talk will highlight those differences from both sides of the fence, and offer ideas on how to bring cutting-edge academic insight to industry. The talk will also give an overview of the kinds of problems encountered in industry, how we solved them, and point to some of the hot research topics we're excited about.

Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - 12:00 14-Feb-14

Digitisation of original document sources for the purpose of conservation, detailed study, and facilitating access for a wider audience has been an increasing trend over recent years, particularly with constantly improving imaging technology available at ever decreasing costs. Many documents suffer from a wide variety of degradations that reduce their legibility and usefulness as sources. With the increase in digitisation has also come an increase in image processing based enhancement and restoration techniques. This thesis presents new approaches to automatic restoration of one particular type of degradation - bleed-through, which occurs when ink from one side of a page seeps through and interferes with the text on the other side, reducing legibility - with the aim being to preserve the document appearance as far as possible.

Speaker Eoin Gillen
Title TCD-TIMIT: A New Database for Audio-Visual Speech Recognition
Time & Venue Printing House Hall - 12:00 7-Feb-14

Automatic audio-visual speech recognition currently lags behind its audio-only counterpart in terms of research. One of the reasons commonly cited by researchers is the scarcity of suitable research corpora. This issue motivated the creation of TCD-TIMIT, a new corpus designed for continuous audio-visual speech recognition research. TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 sentences. Three of the speakers are professionally-trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. This talk will give an overview of TCD-TIMIT's creation and also discuss the baseline results obtained on the database.

Speaker Ailbhe Cullen
Title Charisma in Political Speech
Time & Venue Printing House Hall - 12:00 31-Jan-14

In recent years, there has been much interest in the automatic detection and classification of a range of paralinguistic phenomena. Previous work has shown that it is possible to predict personality traits, inter-speaker dynamics, and even election results from the spectral and prosodic characteristics of the voice. In this talk we turn our attention to political speech, in an attempt to identify what makes a politician appeal to us. We present a new database of Irish political speech, which attempts to exploit the vast amounts of speech data freely available on the internet. The advantages and disadvantages of this method of data collection will be discussed along with the ongoing annotation process. Finally, some early stage results will be presented, demonstrating marked differences in speech from different situations (interviews, press releases, Dáil Éireann etc.).

Speaker Finnian Kelly
Title Automatic Recognition of Ageing Speakers
Time & Venue Printing House Hall - 12:00 24-Jan-14

The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to 'vocal ageing', has received only marginal attention however. This presentation will address the effect of vocal ageing on automatic speaker recognition, from biometric and forensic perspectives, and describe novel methods to counteract its effect.

Speaker Nick Holliman
Title Stereoscopic 3D everywhere: computational solutions for 3D displays
Time & Venue Printing House Hall - 12:00 11-Dec-13

One reason for the lack of market penetration of 3D display systems is the difficulty of producing high-quality content. In this presentation I will summarise three strands of our research that tackle this challenge: first, research into algorithms for producing high-quality 3D images; second, a recent multi-site study of subjective film quality on 3DTV. Finally, looking to the future, I will review some of our most recent results on how cross-modal stimuli that combine visual and auditory depth cues could improve users' experience of 3D displays.

Speaker John Kane
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
Time & Venue Printing House Hall - 12:00 27-Nov-13

Speech processing algorithms are often developed demonstrating improvements over the state of the art, but sometimes at the cost of high complexity. This makes algorithm reimplementation from the literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This talk introduces COVAREP, a new collaborative and freely available repository for speech processing algorithms, which aims to provide fast and easy access to new speech processing algorithms and thus facilitate research in the field. We envisage that COVAREP will allow for more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on and corrected by the community. Presently COVAREP contains contributions from five distinct laboratories, and we encourage contributions from across the speech processing research field. In this talk, I will provide an overview of the current offerings of COVAREP and will also include a demonstration of the algorithms through an emotion classification experiment.

Speaker Joao Cabral
Title Expressive Speech Synthesis for Human-Computer Interaction
Time & Venue Printing House Hall - 12:00 20-Nov-13

Speech is one of the most important forms of communication between humans. Thus, it also plays an important role in human-computer interaction (HCI). In many applications of HCI, such as spoken dialogue systems, e-books, and computer games, the machine often needs to understand spoken utterances and to synthesise speech which is intelligible, sounds sufficiently natural and conveys the appropriate expressiveness or affect.

Also, there has been increasing interest from manufacturers in integrating the latest speech technology into portable electronic devices, such as PDAs and mobile phones. Statistical parametric speech synthesisers are very attractive for these applications because they are fully parametric, have a small memory footprint and can easily transform voice characteristics. However, their synthetic speech does not sound as natural as human speech, mainly due to limitations of the type of speech model typically used by these systems. This talk focuses on improvements to this model for producing high-quality speech while permitting better control over voice characteristics. In particular, these improvements are related to the voice source component, which represents the signal produced at the glottis during human speech production.

Speaker Claudia Arellano
Title L2 Inference for Shape Parameters Estimation
Time & Venue Printing House Hall - 12:00 13-Nov-13

In this thesis, we propose a method to robustly estimate the parameters that control the mapping of one shape (the model shape) onto another (the target shape). The shapes of interest are contours in 2D space, surfaces in 3D space and point clouds (in either 2D or 3D space). We propose to model the shapes using Gaussian Mixture Models (GMMs) and estimate the transformation parameters by minimising a cost function based on the Euclidean (L2) distance between the target and model GMMs. This strategy avoids the computation of one-to-one point correspondences, which state-of-the-art approaches require and which makes them sensitive to both outliers and the choice of starting guess in the optimisation algorithm. Shapes are well represented by GMMs when careful consideration is given to the design of the covariance matrices. Compared to isotropic covariance matrices, we show how shape matching with L2 can be made more robust and accurate by using well-chosen non-isotropic ones. Our framework offers a novel extension to L2-based cost functions by allowing prior information about the parameters to be included. Our approach is therefore fully Bayesian. This Bayesian-L2 framework is tested successfully for estimating the affine transformation between data sets, for fitting morphable models and for fitting ellipses. Finally, we show how to extend this framework to shapes defined in higher-dimensional feature spaces in addition to the spatial domain.
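The L2 distance between two GMMs that underlies this cost function has a closed form, because the integral of a product of two Gaussians is itself a Gaussian evaluation. The following is a minimal sketch of that computation only, not the author's implementation; the representation of a GMM as (weight, mean, covariance) triples is an assumption for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gauss_overlap(m1, S1, m2, S2):
    # Closed form: integral of N(x; m1, S1) * N(x; m2, S2) dx = N(m1; m2, S1 + S2)
    return multivariate_normal.pdf(m1, mean=m2, cov=S1 + S2)

def l2_distance(gmm_a, gmm_b):
    """Closed-form L2 distance between two GMMs, each given as a
    list of (weight, mean, covariance) tuples."""
    def cross(p, q):
        # sum of pairwise weighted Gaussian overlap integrals
        return sum(wi * wj * gauss_overlap(mi, Si, mj, Sj)
                   for wi, mi, Si in p for wj, mj, Sj in q)
    # ||f - g||^2 = <f,f> - 2<f,g> + <g,g>
    return cross(gmm_a, gmm_a) - 2 * cross(gmm_a, gmm_b) + cross(gmm_b, gmm_b)
```

In the thesis framework this distance would be minimised over the transformation parameters applied to the model GMM; the closed form is what makes the cost cheap to evaluate without point correspondences.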

Speaker Ed Lalor
Title The Effects of Attention and Visual Input on the Representation of Natural Speech in EEG
Time & Venue Printing House Hall - 12:00 30-Oct-13

Traditionally, the use of electroencephalography (EEG) to study the neural processing of natural speech in humans has been constrained by the need to repeatedly present discrete stimuli. Progress has been made recently through the realization that cortical population activity tracks the amplitude envelope of speech. This has led to studies using linear regression methods which allow the presentation of continuous speech. In this talk I will present the results of several studies that use such methods to examine how the representation of speech is affected by attention and by visual inputs. Specifically, I will present data showing that it is possible to “reconstruct” a speech stimulus from single-trial EEG and, by doing so, to decode how a subject is deploying attention in a naturalistic cocktail party scenario. I will also present results showing that the cortical representation of the envelope of auditory speech emerges earlier when accompanied by visual speech. Finally, I will discuss some implications that these findings have for the design of future EEG studies into the ongoing dynamics of cognition and for research aimed at identifying biomarkers of clinical disorders.
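The linear-regression ("stimulus reconstruction") approach mentioned above maps time-lagged EEG channels back onto the speech amplitude envelope. A schematic sketch follows, assuming a ridge-regularised least-squares decoder; the lag range, regularisation value and function names are illustrative assumptions, not details from the talk:

```python
import numpy as np

def lagged(X, lags):
    # Stack time-lagged copies of each EEG channel: (T, C) -> (T, C * len(lags))
    T, C = X.shape
    out = np.zeros((T, C * len(lags)))
    for i, L in enumerate(lags):
        shifted = np.roll(X, L, axis=0)
        if L > 0:
            shifted[:L] = 0     # zero the wrapped-around samples
        elif L < 0:
            shifted[L:] = 0
        out[:, i * C:(i + 1) * C] = shifted
    return out

def train_decoder(eeg, envelope, lags, lam=1e2):
    # Ridge regression: w = (X'X + lam*I)^-1 X'y
    X = lagged(eeg, lags)
    I = np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + lam * I, X.T @ envelope)

def reconstruct(eeg, w, lags):
    # Apply the trained decoder to held-out EEG to recover an envelope estimate
    return lagged(eeg, lags) @ w
```

Decoding attention in a cocktail-party setting then amounts to correlating the reconstructed envelope with each candidate speaker's true envelope and picking the best match.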

Speaker Félix Raimbault
Title User-assisted Sparse Stereo-video Segmentation
Time & Venue Printing House Hall - 12:00 23-Oct-13

Motion-based video segmentation has been studied for many years and remains challenging. Ill-posed problems must be solved when seeking a fully automated solution, so it is increasingly popular to keep users in the processing loop by letting them set parameters or draw mattes to guide the segmentation process. When processing multiple-view videos, however, the amount of user interaction should not be proportional to the number of views. In this talk we present a novel sparse segmentation algorithm for two-view stereoscopic videos that maintains temporal coherence and view consistency throughout. We track feature points on both views with a generic tracker and analyse the pairwise affinity of both temporally overlapping and disjoint tracks, whereas existing similar techniques only exploit the information available when tracks overlap. The use of stereo disparity also allows our technique to process feature tracks on both views jointly, exhibiting good view consistency in the segmentation output. To make up for the lack of high-level understanding inherent to segmentation techniques, we allow the user to refine the output with a split-and-merge approach, so as to obtain a desired view-consistent segmentation output over many frames in a few clicks. We present several real video examples to illustrate the versatility of our technique.

Speaker Ken Sooknanan
Title Mosaics for Burrow detection in Underwater Surveillance Videos
Time & Venue Printing House Hall - 12:00 16-Oct-13

Harvesting the commercially significant lobster Nephrops norvegicus is a multimillion-dollar industry in Europe. Stock assessment is essential for maintaining this activity, but it is conducted by manually inspecting hours of underwater surveillance videos. To improve this tedious process, we propose the use of mosaics for the automated detection of burrows on the seabed. We present novel approaches for handling the difficult lighting conditions that cause poor video quality in this kind of material. Mosaics are built using 1-10 minutes of footage, and candidate burrows are selected using image segmentation based on local image contrast. A K-Nearest Neighbour classifier is then used to select burrows from these candidate regions. Our final decision accuracy of 93.6% recall and 86.6% precision represents improvements of 18% and 14.2% respectively over previous work.
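The classification stage described above follows the standard K-Nearest-Neighbour pattern, and the reported figures are the usual precision/recall measures. A minimal sketch of both, under the assumption of generic feature vectors and Euclidean distance (the actual features and metric used in the talk are not specified here):

```python
import numpy as np

def knn_classify(train_feats, train_labels, query, k=5):
    # Vote among the k nearest training samples (Euclidean distance)
    d = np.linalg.norm(train_feats - query, axis=1)
    nearest = train_labels[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

def precision_recall(tp, fp, fn):
    # precision = of regions flagged as burrows, fraction that are real
    # recall    = of real burrows, fraction that were flagged
    return tp / (tp + fp), tp / (tp + fn)
```

Each candidate region from the contrast-based segmentation would be described by a feature vector and passed through `knn_classify`; precision and recall are then computed against manually annotated ground-truth burrows.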

(Organised by the School of Linguistic, Speech and Communication Sciences, in conjunction with the Long Room Hub)

Speaker Professor James J. Mahshie
Title Children with Cochlear Implants: Perception and Production of Speech
Time & Venue Long Room Hub - 13:00 09-Oct-13


Dr. Mahshie is Professor and Chair of the Department of Speech and Hearing Science at George Washington University, Washington DC, as well as Professor Emeritus at Gallaudet University in Washington DC. His talk will focus on his research exploring the production and perception of speech by young children with cochlear implants, possible mechanisms and factors relating perception and production, and preliminary findings on the voice quality characteristics of children with cochlear implants.

Page last modified on October 16, 2014