2017-2018 Talks

Talks.Talks1718 History


October 26, 2018 by 134.226.84.64 -
Deleted line 14:


October 26, 2018 by 134.226.84.64 -
Deleted line 14:
October 26, 2018 by 134.226.84.64 -
Added line 25:

October 26, 2018 by 134.226.84.64 -
Added lines 15-23:


Speaker Dr. Ravi Kumar, TCD
Title Control by Example using a quadcopter as a guinea pig
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 17-Oct-18


Abstract: This project explores the concept of “Control by example”. We use machine learning and control theory to control a drone to hover at a specific height. The idea is for this work to lead eventually to an interesting teaching platform for both ML and Control Engineering. In this talk, we will introduce a “proof of concept” for such a design, where a quadcopter is controlled using PID controllers and machine learning techniques. Two different ML frameworks are explored for replacing the PID controller: neural networks (NNs) and Gaussian Processes (GPs). Results show that generating the control signal is essentially a regression problem, amenable to solution using GPs.
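(Illustrative sketch only, not from the talk: the Python snippet below assumes a toy 1-D altitude model and made-up PID gains. It shows the “control by example” pattern described above — log (state, control) pairs from a hand-tuned PID hover controller, then fit a Gaussian Process regressor to reproduce the control signal from those examples.)

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def simulate(controller, target=1.0, dt=0.02, steps=400):
    """Crude unit-mass 1-D altitude dynamics: thrust up, gravity down."""
    z, v = 0.0, 0.0
    integ, prev_err = 0.0, target
    log = []
    for _ in range(steps):
        err = target - z
        integ += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        u = controller(err, integ, deriv)       # commanded thrust
        v += (u - 9.81) * dt
        z += v * dt
        log.append((err, integ, deriv, u))
    return np.array(log)

pid = lambda e, i, d: 9.81 + 8.0 * e + 1.0 * i + 4.0 * d   # hypothetical gains

data = simulate(pid)                 # "examples" of acceptable control behaviour
X, y = data[:, :3], data[:, 3]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)                         # regress the control signal on the PID examples

# Swap the PID out for the learned GP and re-run the hover task.
gp_ctrl = lambda e, i, d: float(gp.predict(np.array([[e, i, d]]))[0])
print("final altitude error with GP controller:", simulate(gp_ctrl)[-1, 0])
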
August 19, 2018 by 134.226.84.64 -
Added lines 9-11:



August 19, 2018 by 134.226.84.64 -
Changed line 7 from:

TBP

to:

To Be Provided...

August 19, 2018 by 134.226.84.64 -
Added line 7:

TBP

August 19, 2018 by 134.226.84.64 -
Changed lines 4-7 from:
to:

Upcoming Speakers



August 19, 2018 by 134.226.84.64 -
Changed line 8 from:
to:


August 19, 2018 by 134.226.84.64 -
Added line 8:
August 19, 2018 by 134.226.84.64 -
Changed lines 4-9 from:

Upcoming Talks



to:
August 19, 2018 by 134.226.84.64 -
Changed line 7 from:
to:

August 19, 2018 by 134.226.84.64 -
Deleted line 11:


August 19, 2018 by 134.226.84.64 -
Changed lines 10-11 from:
to:

August 19, 2018 by 134.226.84.64 -
Added lines 2-5:
Changed line 7 from:


to:
August 19, 2018 by 134.226.84.64 -
Added line 1:
August 19, 2018 by 134.226.84.64 -
Changed line 90 from:
Abstract: In statistical-based speech enhancement algorithms, the challenge is to find a non-linear estimator using the set of DFT coefficients of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. This talk discusses a different approach to estimate the a priori SNR using the neural responses simulated by a computational model of the auditory-nerve (AN) system. The outputs of the AN model corresponding to a range of characteristic frequencies construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. Support Vector Regression is used to predict the a priori SNR of unseen frames. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.
to:
Abstract: In statistical-based speech enhancement algorithms, the challenge is to find a non-linear estimator using the set of DFT coefficients of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. This talk discusses a new approach to estimate the a priori SNR using the neural responses simulated by a computational model of the auditory-nerve (AN) system. The outputs of the AN model corresponding to a range of characteristic frequencies construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. Support Vector Regression is used to predict the a priori SNR of unseen frames. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.
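(Illustrative sketch only — the patch size, projection angles and SVR settings below are assumptions, and a random array stands in for the AN-model neurogram. It shows the mechanics described above: take a 2-D time-frequency patch per frame, extract Radon-transform features, and use Support Vector Regression to predict the frame's a priori SNR.)

import numpy as np
from skimage.transform import radon
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def radon_features(patch, angles=np.arange(0.0, 180.0, 20.0)):
    """Project a 2-D patch along a few angles; keep simple summary statistics."""
    sinogram = radon(patch, theta=angles, circle=False)   # (proj_len, n_angles)
    return np.concatenate([sinogram.mean(axis=0), sinogram.std(axis=0)])

# Toy stand-in data: one 32x32 "neurogram" patch per frame, each labelled with its
# a priori SNR in dB (random here; real targets come from clean and noise power).
n_frames = 200
patches = rng.random((n_frames, 32, 32))
snr_db = rng.uniform(-5.0, 20.0, size=n_frames)

X = np.array([radon_features(p) for p in patches])
y = snr_db

svr = SVR(kernel="rbf", C=10.0, epsilon=0.5)
svr.fit(X[:150], y[:150])            # train on "seen" frames
print(svr.predict(X[150:155]))       # predict a priori SNR of unseen frames
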
August 19, 2018 by 134.226.84.64 -
Changed lines 90-91 from:
Abstract: In statistical-based speech enhancement algorithms, the challenge is to find a non-linear estimator using the set of DFT coefficients

of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. This talk discusses a different approach to estimate the a priori SNR using the neural responses simulated by a computational model of the auditory-nerve (AN) system. The outputs of the AN model corresponding to a range of characteristic frequencies construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. Support Vector Regression is used to predict the a priori SNR of unseen frames. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.

to:
Abstract: In statistical-based speech enhancement algorithms, the challenge is to find a non-linear estimator using the set of DFT coefficients of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. This talk discusses a different approach to estimate the a priori SNR using the neural responses simulated by a computational model of the auditory-nerve (AN) system. The outputs of the AN model corresponding to a range of characteristic frequencies construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. Support Vector Regression is used to predict the a priori SNR of unseen frames. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.
August 19, 2018 by 134.226.84.64 -
Changed lines 91-94 from:

of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. Motivated by the fact that neural responses exhibit strong robustness to noise, this study proposes a different approach to estimate the a priori SNR using neural responses simulated by a computational model of the auditory-nerve (AN) system. The input to the model is a speech stimulus and the outputs are the time-varying spike counts for AN fibers tuned to different characteristic frequencies (CFs) as a function of time. The outputs of the AN model corresponding to a range of CF values construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. The Radon features are combined with the corresponding speech features of each frame (magnitude of noisy speech, noise power spectral density, and the a posteriori SNR). Support Vector Regression training based on this feature set is used to predict the a priori SNR of unseen frames. The performance of the proposed method was tested using the NOIZEUS database. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.

to:

of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. This talk discusses a different approach to estimate the a priori SNR using the neural responses simulated by a computational model of the auditory-nerve (AN) system. The outputs of the AN model corresponding to a range of characteristic frequencies construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. Support Vector Regression is used to predict the a priori SNR of unseen frames. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.

August 19, 2018 by 134.226.84.64 -
Added lines 83-95:



Speaker Dr. Wissam Jassim, TCD
Title Speech enhancement using neural responses
Time & Venue Stack-B Trinity College - 10:30-11:30am, 27-Jan-17


Abstract: In statistical-based speech enhancement algorithms, the challenge is to find a non-linear estimator using the set of DFT coefficients

of noisy speech to estimate the corresponding unknown clean speech coefficients. Most of these algorithms require the a priori SNR to be estimated. Motivated by the fact that neural responses exhibit strong robustness to noise, this study proposes a different approach to estimate the a priori SNR using neural responses simulated by a computational model of the auditory-nerve (AN) system. The input to the model is a speech stimulus and the outputs are the time-varying spike counts for AN fibers tuned to different characteristic frequencies (CFs) as a function of time. The outputs of the AN model corresponding to a range of CF values construct the neurogram which is a 2D representation (time-frequency). Features are extracted from the resultant 2D neurogram using the Radon transform. The Radon features are combined with the corresponding speech features of each frame (magnitude of noisy speech, noise power spectral density, and the a posteriori SNR). Support Vector Regression training based on this feature set is used to predict the a priori SNR of unseen frames. The performance of the proposed method was tested using the NOIZEUS database. A range of speech quality and intelligibility measures were used for performance evaluation. The results demonstrate that the estimate of the a priori SNR using neural features can improve the output of a range of speech enhancement algorithms, including MMSE, Wiener filtering and perceptually-motivated approaches.

August 19, 2018 by 134.226.84.64 -
Added lines 74-75:
Changed lines 77-82 from:
to:

Speaker Dr. Sebastien Le Maguer, Multimodal Speech Processing group (MMCI), Saarland University
Title About the future directions of MaryTTS
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 04-Oct-17


Abstract: MaryTTS is a modular Text-To-Speech (TTS) synthesis system whose development started around 2003. The system is open-source and has grown significantly thanks to the contributions of the community. Lately, we have completely redesigned the core of the system in order to make it more flexible. During this talk, I am going to present the new architecture, the motivations behind it, and the problems we addressed by adopting it. In doing so, we will also cover some challenges that are currently being addressed in the TTS field and those which should be addressed.
August 19, 2018 by 134.226.84.64 -
Added line 66:
Added lines 68-75:

Speaker Dr. Andrew Hines, UCD
Title Predicting People’s Opinions Sounds Easy
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 18-Oct-17


Abstract: Quality of Experience (QoE), or the degree of delight or annoyance experienced by the user of an application or service, involves a synthesis of content, context, human and service factors. Predicting how a user perceives speech and audio quality has become more important with the transition from traditional fixed telephony to Voice over Internet Protocol (VoIP)-based systems. Similarly, media streaming is now an established method for listening to music and watching movie and TV content. Network bandwidth constraints vary across the diverse range of devices on which content is consumed (e.g., mobile, desktop, home theatre). As a result, content distributors such as YouTube or SoundCloud must support a range of bit rates and codecs, beyond the once-ubiquitous mp3, to optimise consumers’ QoE. This talk will look at models developed to predict QoE for users of systems ranging from hearing aids to VoIP, Netflix and VR 3D spatial sound.


August 19, 2018 by 134.226.84.64 -
Changed lines 31-33 from:
Speaker Dr. Ilaria Torre, TCD
Title Trust in artificial voices: A “congruency effect” of first impressions and behavioural experience
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 27-March-18
to:
Speaker Antonia Crescenci, Imperial College
Title Image Editing With Deep Generative Models
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 13-Feb-18
Changed lines 35-38 from:
Abstract: Societies rely on trustworthy communication in order to function, and the need for trust clearly extends to human-machine communication. Therefore, it is essential to design machines to elicit trust, so as to make interactions with them acceptable and successful. However, while there is a substantial literature on first impressions of trustworthiness based on various characteristics, including voice, not much is known about the trust development process. Are first impressions maintained over time? Or are they influenced by the experience of an agent’s behaviour? We addressed these questions in three experiments using the “iterated investment game”, a methodology derived from game theory that allows implicit measures of trust to be collected over time. Participants played the game with various agents having different voices: in the first experiment, participants played with a computer agent that had either a Standard Southern British English accent or a Liverpool accent; in the second experiment, they played with a computer agent that had either an SSBE or a Birmingham accent; in the third experiment, they played with a robot that had either a natural or a synthetic voice. All these agents behaved either trustworthily or untrustworthily. In all three experiments, participants trusted the agent with one voice more when it was trustworthy, and the agent with the other voice more when it was untrustworthy. This suggests that participants might change their trusting behaviour based on the congruency of the agent’s behaviour with the participant’s first impression. Implications for human-machine interaction design are discussed.

to:
Abstract: “Deep learning” involves modelling and explaining observations using highly complex functions with many learnable parameters. Recently, there has been significant progress in using deep models for synthesising new data, in particular images; these are referred to as “generative models”. Typically, these take the form of latent variable models, whereby new images may be synthesised by drawing latent samples and passing them through a learned generative model. During training, generative models learn to organise the space of latent variables in an interesting (near-linear) way, allowing interesting image operations to be performed in latent space. For example, traversing a line in latent space may correspond to the addition or removal of an attribute in image space. This enables interesting object design and image manipulation applications.

When editing attributes of an image (in this talk human faces), we often want to edit a single attribute (e.g. smiling) of that image. By training a “generative model” with both images (e.g. faces) and additional attribute information (e.g. smiling/not smiling), we are able to weakly control how the latent space is organised, allowing us to then edit attributes of an image by changing a single (latent) variable in the model. Unfortunately, naive implementations may result in undesirable changes to other attributes in the image. We introduce a novel approach to “factor” the latent space such that by changing a single latent variable, the identity of the person is preserved, and only the desired attribute is changed.
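(Illustrative sketch, not the speaker's model: the encoder/decoder below are hypothetical stand-ins and the latents are random. It only shows the basic operation from the first paragraph — estimate an attribute direction as the difference of mean latent codes with and without the attribute, then traverse that line to edit a single image — not the identity-preserving factorisation that is the talk's contribution.)

import numpy as np

rng = np.random.default_rng(0)
latent_dim = 64

def encode(images):
    """Stand-in for a trained encoder: images -> latent codes, shape (n, latent_dim)."""
    return rng.normal(size=(len(images), latent_dim))

def decode(codes):
    """Stand-in for a trained generator: latent codes -> images (echoed back here)."""
    return codes

# Latent codes for two groups of training faces: with / without the attribute.
z_smiling = encode(range(500))
z_neutral = encode(range(500))

# A linear "smiling" direction in latent space.
direction = z_smiling.mean(axis=0) - z_neutral.mean(axis=0)
direction /= np.linalg.norm(direction)

# Edit one image by moving its latent code along that line; alpha = edit strength.
z = encode([None])[0]
for alpha in (0.0, 1.0, 2.0):
    edited = decode((z + alpha * direction)[None, :])
    print("alpha =", alpha, "edited image tensor shape:", edited.shape)
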

Deleted lines 39-40:


Changed lines 41-43 from:
Speaker Antonia Crescenci, Imperial College
Title Image Editing With Deep Generative Models
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 13-Feb-18
to:
Speaker Dr. Joao Cabral, TCD
Title Acoustic analysis of the voice source and its applications to speech synthesis
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 16-Jan-18
Changed lines 45-48 from:
Abstract: “Deep learning” involves modelling and explaining observations using highly complex functions with many learnable parameters. Recently, there has been significant progress in using deep models for synthesising new data, in particular images; these are referred to as “generative models”. Typically, these take the form of latent variable models, whereby new images may be synthesised by drawing latent samples and passing them through a learned generative model. During training, generative models learn to organise the space of latent variables in an interesting (near-linear) way, allowing interesting image operations to be performed in latent space. For example, traversing a line in latent space may correspond to the addition or removal of an attribute in image space. This enables interesting object design and image manipulation applications.

When editing attributes of an image (in this talk human faces), we often want to edit a single attribute (e.g. smiling) of that image. By training a “generative model” with both images (e.g. faces) and additional attribute information (e.g. smiling/not smiling), we are able to weakly control how the latent space is organised, allowing us to then edit attributes of an image by changing a single (latent) variable in the model. Unfortunately, naive implementations may result in undesirable changes to other attributes in the image. We introduce a novel approach to “factor” the latent space such that by changing a single latent variable, the identity of the person is preserved, and only the desired attribute is changed.

to:
Abstract: Accurate estimation of the voice source and vocal tract from the speech signal is a complex and difficult task. Alternatively, robust and simple speech analysis methods that do not perform good source-tract separation are very popular because they are more robust and provide good performance for several applications, including the domains of automatic speech recognition (ASR), Text-To-Speech Synthesis (TTS), speaker identification, affect computing and speech coding. Although glottal source modelling has been successfully applied in these domains, its application to real technology systems is small. A better method to separate the voice source from the vocal tract is needed to open a new road for future improvements.

In this talk, I will give an overview of previous work on acoustic glottal source estimation, focusing on the applications to speech synthesis and voice transformation. I will also present my current research interest on its evaluation. The resulting experimental outcomes are expected to give insights on how to improve the voice source analysis.

Changed lines 50-52 from:
Speaker Dr. Joao Cabral, TCD
Title Acoustic analysis of the voice source and its applications to speech synthesis
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 16-Jan-18
to:
Speaker Dr. Ali Karaali, TCD
Title Edge-Based Defocus Blur Estimation with Adaptive Scale Selection
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 22-Nov-1
Changed lines 54-56 from:
Abstract: Accurate estimation of the voice source and vocal tract from the speech signal is a complex and difficult task. Alternatively, robust and simple speech analysis methods that do not perform good source-tract separation are very popular because they are more robust and provide good performance for several applications, including the domains of automatic speech recognition (ASR), Text-To-Speech Synthesis (TTS), speaker identification, affect computing and speech coding. Although glottal source modelling has been successfully applied in these domains, its application to real technology systems is small. A better method to separate the voice source from the vocal tract is needed to open a new road for future improvements.

In this talk, I will give an overview of previous work on acoustic glottal source estimation, focusing on the applications to speech synthesis and voice transformation. I will also present my current research interest on its evaluation. The resulting experimental outcomes are expected to give insights on how to improve the voice source analysis.

to:
Abstract: Camera systems have limited depth of field (DOF) which may cause defocused blur regions in the captured images. In this talk, a new edge-based method for spatially varying defocus blur estimation will be presented. The proposed approach is based on locally reblurred gradient magnitude ratios. The core of the proposed approach is to estimate a scale-consistent edge map along with a local scale parameter that indicates how isolated each detected edge is. The local scale is used to adaptively select a reblurring parameter accounting for noise, edge mis-localization and interfering edges. After the initial defocus blur estimation at the detected scale-consistent edge points, a novel Connected Edge Filter (CEF) is then introduced to smooth the initial blur estimates based on pixel connectivity within the detected edge contours. Finally, a fast guided filter is used to propagate the sparse blur map through the whole image. Experimental results show that the proposed approach presents a very good compromise between estimation error and running time when compared to state-of-the-art methods. The proposed blur estimation method is also explored in the context of image deblurring, and it is shown that metrics typically used to evaluate blur estimation may not correlate as expected with the visual quality of the deblurred image.
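(Illustrative sketch of the core reblurring idea only — the scale-consistent edge map, adaptive scale selection, Connected Edge Filter and guided-filter propagation are omitted, and all parameters below are assumptions. It estimates a per-edge blur sigma from the ratio of gradient magnitudes before and after reblurring with a known Gaussian.)

import numpy as np
from scipy import ndimage

def sparse_defocus_map(gray, sigma0=1.0, edge_frac=0.2):
    """Blur sigma at edge pixels from the original/reblurred gradient-magnitude ratio."""
    g1 = np.hypot(ndimage.sobel(gray, axis=0), ndimage.sobel(gray, axis=1))
    reblurred = ndimage.gaussian_filter(gray, sigma0)
    g2 = np.hypot(ndimage.sobel(reblurred, axis=0), ndimage.sobel(reblurred, axis=1))

    edges = g1 > edge_frac * g1.max()                 # crude stand-in for an edge map
    ratio = g1 / np.maximum(g2, 1e-6)

    # For a Gaussian-blurred step edge: R = sqrt(sigma^2 + sigma0^2) / sigma,
    # hence sigma = sigma0 / sqrt(R^2 - 1); clamp R so the expression stays valid.
    R = np.clip(ratio, 1.01, None)
    sigma = sigma0 / np.sqrt(R ** 2 - 1.0)
    return np.where(edges, sigma, 0.0)                # defocus estimate at edges only

# Synthetic check: a vertical step edge defocused with sigma = 2 should read back
# roughly 2 at pixels nearest the edge.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
blurred = ndimage.gaussian_filter(img, 2.0)
print(sparse_defocus_map(blurred)[32, 30:35])
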
Changed lines 58-60 from:
Speaker Dr. Ali Karaali, TCD
Title Edge-Based Defocus Blur Estimation with Adaptive Scale Selection
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 22-Nov-18
to:
Speaker Dr. Ilaria Torre, TCD
Title Trust in artificial voices: A “congruency effect” of first impressions and behavioural experience
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 8-Nov-17
Changed lines 62-65 from:
Abstract: Camera systems have limited depth of field (DOF) which may cause defocused blur regions in the captured images. In this talk, a new edge-based method for spatially varying defocus blur estimation will be presented. The proposed approach is based on locally reblurred gradient magnitudes ratios. The core of the proposed approach is to estimate a scale-consistent edge map along with a local scale parameter that indicates how isolated each detected edge is. The local scale is used to adaptively select a reblurring parameter accounting for noise, edge mis-localization and interfering edges. After the initial defocus blur estimation at the detected scale-consistent edge points, a novel Connected Edge Filter (CEF) is then introduced to smooth the initial blur estimates based on pixel connectivity within the detected edge contours. Finally, a fast guided filter is used to propagate the sparse blur map through the whole image. Experimental results show that the proposed approach presents a very good compromise between estimation error and running time when compared to state-of-the-art methods. The proposed blur estimation method is also explored in the context of image deblurring, and it is shown that metrics typically used to evaluate blur estimation may not correlate as expected with the visual quality of the deblurred image.
to:
Abstract: Societies rely on trustworthy communication in order to function, and the need for trust clearly extends to human-machine communication. Therefore, it is essential to design machines to elicit trust, so as to make interactions with them acceptable and successful. However, while there is a substantial literature on first impressions of trustworthiness based on various characteristics, including voice, not much is known about the trust development process. Are first impressions maintained over time? Or are they influenced by the experience of an agent’s behaviour? We addressed these questions in three experiments using the “iterated investment game”, a methodology derived from game theory that allows implicit measures of trust to be collected over time. Participants played the game with various agents having different voices: in the first experiment, participants played with a computer agent that had either a Standard Southern British English accent or a Liverpool accent; in the second experiment, they played with a computer agent that had either an SSBE or a Birmingham accent; in the third experiment, they played with a robot that had either a natural or a synthetic voice. All these agents behaved either trustworthily or untrustworthily. In all three experiments, participants trusted the agent with one voice more when it was trustworthy, and the agent with the other voice more when it was untrustworthy. This suggests that participants might change their trusting behaviour based on the congruency of the agent’s behaviour with the participant’s first impression. Implications for human-machine interaction design are discussed.

Deleted line 66:
August 19, 2018 by 134.226.84.64 -
Added line 59:
Added lines 61-69:

Speaker Dr. Ali Karaali, TCD
Title Edge-Based Defocus Blur Estimation with Adaptive Scale Selection
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 22-Nov-18


Abstract: Camera systems have limited depth of field (DOF) which may cause defocused blur regions in the captured images. In this talk, a new edge-based method for spatially varying defocus blur estimation will be presented. The proposed approach is based on locally reblurred gradient magnitude ratios. The core of the proposed approach is to estimate a scale-consistent edge map along with a local scale parameter that indicates how isolated each detected edge is. The local scale is used to adaptively select a reblurring parameter accounting for noise, edge mis-localization and interfering edges. After the initial defocus blur estimation at the detected scale-consistent edge points, a novel Connected Edge Filter (CEF) is then introduced to smooth the initial blur estimates based on pixel connectivity within the detected edge contours. Finally, a fast guided filter is used to propagate the sparse blur map through the whole image. Experimental results show that the proposed approach presents a very good compromise between estimation error and running time when compared to state-of-the-art methods. The proposed blur estimation method is also explored in the context of image deblurring, and it is shown that metrics typically used to evaluate blur estimation may not correlate as expected with the visual quality of the deblurred image.


August 19, 2018 by 134.226.84.64 -
Changed lines 44-45 from:
Title Image Editing With Deep Generative Models


to:
Title Image Editing With Deep Generative Models
August 19, 2018 by 134.226.84.64 -
Deleted lines 44-45:
 
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 13-Feb-18
Added lines 46-47:
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 13-Feb-18


Deleted lines 51-54:


August 19, 2018 by 134.226.84.64 -
Added lines 40-53:



Speaker Antonia Crescenci, Imperial College
Title Image Editing With Deep Generative Models
 
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 13-Feb-18


Abstract: “Deep learning” involves modelling and explaining observations using highly complex functions with many learnable parameters. Recently, there has been significant progress in using deep models for synthesising new data, in particular images; these are referred to as “generative models”. Typically, these take the form of latent variable models, whereby new images may be synthesised by drawing latent samples and passing them through a learned generative model. During training, generative models learn to organise the space of latent variables in an interesting (near-linear) way, allowing interesting image operations to be performed in latent space. For example, traversing a line in latent space may correspond to the addition or removal of an attribute in image space. This enables interesting object design and image manipulation applications.

When editing attributes of an image (in this talk human faces), we often want to edit a single attribute (e.g. smiling) of that image. By training a “generative model” with both images (e.g. faces) and additional attribute information (e.g. smiling/not smiling), we are able to weakly control how the latent space is organised, allowing us to then edit attributes of an image by changing a single (latent) variable in the model. Unfortunately, naive implementations may result in undesirable changes to other attributes in the image. We introduce a novel approach to “factor” the latent space such that by changing a single latent variable, the identity of the person is preserved, and only the desired attribute is changed.


August 19, 2018 by 134.226.84.64 -
Deleted line 27:


August 19, 2018 by 134.226.84.64 -
Added lines 29-42:



Speaker Dr. Ilaria Torre, TCD
Title Trust in artificial voices: A “congruency effect” of first impressions and behavioural experience
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 27-March-18


Abstract: Societies rely on trustworthy communication in order to function, and the need for trust clearly extends to human-machine communication. Therefore, it is essential to design machines to elicit trust, so as to make interactions with them acceptable and successful. However, while there is a substantial literature on first impressions of trustworthiness based on various characteristics, including voice, not much is known about the trust development process. Are first impressions maintained over time? Or are they influenced by the experience of an agent’s behaviour? We addressed these questions in three experiments using the “iterated investment game”, a methodology derived from game theory that allows implicit measures of trust to be collected over time. Participants played the game with various agents having different voices: in the first experiment, participants played with a computer agent that had either a Standard Southern British English accent or a Liverpool accent; in the second experiment, they played with a computer agent that had either an SSBE or a Birmingham accent; in the third experiment, they played with a robot that had either a natural or a synthetic voice. All these agents behaved either trustworthily or untrustworthily. In all three experiments, participants trusted the agent with one voice more when it was trustworthy, and the agent with the other voice more when it was untrustworthy. This suggests that participants might change their trusting behaviour based on the congruency of the agent’s behaviour with the participant’s first impression. Implications for human-machine interaction design are discussed.

August 19, 2018 by 134.226.84.64 -
Changed line 34 from:
to:


August 19, 2018 by 134.226.84.64 -
Changed lines 15-16 from:
Abstract

Progressive loss of cognitive function in Alzheimer’s disease (AD) includes deficits in episodic memory, executive skills, working memory, attention, speech and language abilities, visuospatial and orientation skills. Many studies have shown that cognitive deficits in AD may be apparent years before diagnosis is made. Detecting these changes at an early stage is therefore critical, particularly for the development and establishment of early interventions and to make effective treatment decisions.

to:
Abstract: Progressive loss of cognitive function in Alzheimer’s disease (AD) includes deficits in episodic memory, executive skills, working memory, attention, speech and language abilities, visuospatial and orientation skills. Many studies have shown that cognitive deficits in AD may be apparent years before diagnosis is made. Detecting these changes at an early stage is therefore critical, particularly for the development and establishment of early interventions and to make effective treatment decisions.
Changed lines 27-28 from:
Abstract

Automatic speaker recognition is increasingly being deployed in real-world scenarios, ranging from forensic investigation to online authentication. This talk will highlight some of the challenges created by unseen and diverse conditions in different applications of speaker recognition, and present some of the ways in which these challenges can be dealt with in practice. This talk will also highlight the use of automatic speaker profiling to estimate speaker characteristics such as age, gender, spoken language, and vocal effort, and how these estimates can be used to inform speaker recognition system behaviour.

to:
Abstract: Automatic speaker recognition is increasingly being deployed in real-world scenarios, ranging from forensic investigation to online authentication. This talk will highlight some of the challenges created by unseen and diverse conditions in different applications of speaker recognition, and present some of the ways in which these challenges can be dealt with in practice. This talk will also highlight the use of automatic speaker profiling to estimate speaker characteristics such as age, gender, spoken language, and vocal effort, and how these estimates can be used to inform speaker recognition system behaviour.
Changed line 33 from:
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 16-Jan-18
to:
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 16-Jan-18
August 19, 2018 by 134.226.84.64 -
Deleted line 27:
August 19, 2018 by 134.226.84.64 -
Changed lines 38-39 from:
Abstract

Accurate estimation of the voice source and vocal tract from the speech signal is a complex and difficult task. Alternatively, robust and simple speech analysis methods that do not perform good source-tract separation are very popular because they are more robust and provide good performance for several applications, including the domains of automatic speech recognition (ASR), Text-To-Speech Synthesis (TTS), speaker identification, affect computing and speech coding. Although glottal source modelling has been successfully applied in these domains, its application to real technology systems is small. A better method to separate the voice source from the vocal tract is needed to open a new road for future improvements.

to:
Abstract: Accurate estimation of the voice source and vocal tract from the speech signal is a complex and difficult task. Alternatively, robust and simple speech analysis methods that do not perform good source-tract separation are very popular because they are more robust and provide good performance for several applications, including the domains of automatic speech recognition (ASR), Text-To-Speech Synthesis (TTS), speaker identification, affect computing and speech coding. Although glottal source modelling has been successfully applied in these domains, its application to real technology systems is small. A better method to separate the voice source from the vocal tract is needed to open a new road for future improvements.
August 19, 2018 by 134.226.84.64 -
Deleted line 36:


August 19, 2018 by 134.226.84.64 -
Changed line 23 from:
to:

Added line 33:

August 19, 2018 by 134.226.84.64 -
Changed line 9 from:

(----)

to:

August 19, 2018 by 134.226.84.64 -
Changed line 9 from:
to:

(----)

August 19, 2018 by 134.226.84.64 -
Changed line 10 from:
Speaker Dr. Celine De Looze
to:
Speaker Dr. Celine De Looze, TCD
Changed line 24 from:
Speaker Dr. Finnian Kelly
to:
Speaker Dr. Finnian Kelly, Oxford Wave Research (OWR)
August 19, 2018 by 134.226.84.64 -
Changed line 32 from:
to:


August 19, 2018 by 134.226.84.64 -
Added lines 33-41:
Speaker Dr. Joao Cabral, TCD
Title Acoustic analysis of the voice source and its applications to speech synthesis
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 16-Jan-18


Abstract

Accurate estimation of the voice source and vocal tract from the speech signal is a complex and difficult task. Alternatively, robust and simple speech analysis methods that do not perform good source-tract separation are very popular because they are more robust and provide good performance for several applications, including the domains of automatic speech recognition (ASR), Text-To-Speech Synthesis (TTS), speaker identification, affect computing and speech coding. Although glottal source modelling has been successfully applied in these domains, its application to real technology systems is small. A better method to separate the voice source from the vocal tract is needed to open a new road for future improvements. In this talk, I will give an overview of previous work on acoustic glottal source estimation, focusing on the applications to speech synthesis and voice transformation. I will also present my current research interest on its evaluation. The resulting experimental outcomes are expected to give insights on how to improve the voice source analysis.

August 19, 2018 by 134.226.84.64 -
Changed line 26 from:
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 29-MAy-18
to:
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 29-May-18
August 19, 2018 by 134.226.84.64 -
Changed lines 22-23 from:
 
to:


August 19, 2018 by 134.226.84.64 -
Deleted line 6:
Deleted line 21:


August 19, 2018 by 134.226.84.64 -
Added lines 24-25:
 
August 19, 2018 by 134.226.84.64 -
Changed lines 23-30 from:
Bio

Dr. Céline De Looze is a phonetician by training, with expertise in speech prosody and cognitive communication. Her scientific approach combines knowledge from clinical linguistics, phonetics, neuropsychology and neurology with signal processing methods from clinical neural engineering and human factors engineering. Her work aims to develop quantitative methods to better understand the impact of cognitive functioning on communication in health care and high-workload environments.

Her research essentially focuses on investigating cognitive communication, that is how cognitive functioning and cognitive load and underlying neural mechanisms may impact an individual’s ability to communicate and how, in turn, resulting speech difficulties may have an effect on that individual’s well-being, ability to engage with others, or to perform a task efficiently. She gained expertise in cognitive communication and in developing new advanced quantitative methods and tools throughout her PhD and postdoctoral experience in Linguistics, Speech Communication and Speech Pathology. These gave her the foundations of her research work within the frame of neurodegenerative diseases (Alzheimer’s, Parkinson’s) , immuno-mediated disorders (Multiple Sclerosis), foundations which also allow her to implement the methods and tools she developed in other contexts: radiology, respiratory diseases and crew resource management.

Céline was awarded a CARDI fellowship in 2015. Her current research specifically focuses on investigating the relationship between structural changes, functional connectivity, cognitive function and speech in Mild Cognitive Impairment and Alzheimer’s Disease and in identifying speech-based markers of cognitive function and underlying neural changes in normal ageing and dementia. Another aspect of her research consists in providing recommendations of effective evidenced-based communication practices that can enhance patients’ and caregivers’ well-being.

to:


Speaker Dr. Finnian Kelly
Title Challenges for real-world automatic speaker recognition
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 29-MAy-18


Abstract

Automatic speaker recognition is increasingly being deployed in real-world scenarios, ranging from forensic investigation to online authentication. This talk will highlight some of the challenges created by unseen and diverse conditions in different applications of speaker recognition, and present some of the ways in which these challenges can be dealt with in practice. This talk will also highlight the use of automatic speaker profiling to estimate speaker characteristics such as age, gender, spoken language, and vocal effort, and how these estimates can be used to inform speaker recognition system behaviour.

August 19, 2018 by 134.226.84.64 -
Changed line 13 from:
Time & Venue Stack-B Trinity College - 2:00-3:00pm 24-Jul-18
to:
Time & Venue Stack-B Trinity College - 2:00-3:00pm, 24-Jul-18
August 19, 2018 by 134.226.84.64 -
Changed line 13 from:
Time & Venue Stack-B Trinity College - 2:00 24-Jul-18
to:
Time & Venue Stack-B Trinity College - 2:00-3:00pm 24-Jul-18
August 19, 2018 by 134.226.84.64 -
Changed lines 12-13 from:
Title Changes in Speech Chunking in Reading Aloud is a Marker of Mild Cognitive Impairment and Mild-to-Moderate Alzheimer’s Disease
to:
Title Changes in Speech Chunking in Reading Aloud is a Marker of Mild Cognitive Impairment and Mild-to-Moderate Alzheimer’s Disease
August 19, 2018 by 134.226.84.64 -
Added line 13:
August 19, 2018 by 134.226.84.64 -
Changed lines 13-14 from:


Time & Venue Printing House Hall - 12:00 10-Dec-14
to:
Time & Venue Stack-B Trinity College - 2:00 24-Jul-18
August 19, 2018 by 134.226.84.64 -
Changed line 13 from:
 
to:


August 19, 2018 by 134.226.84.64 -
Deleted line 0:
Changed lines 25-34 from:
Abstract
to:
Bio

Dr. Céline De Looze is a phonetician by training, with expertise in speech prosody and cognitive communication. Her scientific approach combines knowledge from clinical linguistics, phonetics, neuropsychology and neurology with signal processing methods from clinical neural engineering and human factors engineering. Her work aims to develop quantitative methods to better understand the impact of cognitive functioning on communication in health care and high-workload environments.

Her research essentially focuses on investigating cognitive communication, that is, how cognitive functioning and cognitive load and underlying neural mechanisms may impact an individual’s ability to communicate and how, in turn, resulting speech difficulties may have an effect on that individual’s well-being, ability to engage with others, or to perform a task efficiently. She gained expertise in cognitive communication and in developing new advanced quantitative methods and tools throughout her PhD and postdoctoral experience in Linguistics, Speech Communication and Speech Pathology. These gave her the foundations of her research work within the frame of neurodegenerative diseases (Alzheimer’s, Parkinson’s), immuno-mediated disorders (Multiple Sclerosis), foundations which also allow her to implement the methods and tools she developed in other contexts: radiology, respiratory diseases and crew resource management.

Céline was awarded a CARDI fellowship in 2015. Her current research specifically focuses on investigating the relationship between structural changes, functional connectivity, cognitive function and speech in Mild Cognitive Impairment and Alzheimer’s Disease and in identifying speech-based markers of cognitive function and underlying neural changes in normal ageing and dementia. Another aspect of her research consists in providing recommendations of effective evidence-based communication practices that can enhance patients’ and caregivers’ well-being.


August 19, 2018 by 134.226.84.64 -
Deleted line 1:
Changed lines 12-13 from:
Speaker Colm O’Reilly
Title Birdsong Forensics
to:
Speaker Dr. Celine De Looze
Title Changes in Speech Chunking in Reading Aloud is a Marker of Mild Cognitive Impairment and Mild-to-Moderate Alzheimer’s Disease
 
Added lines 18-26:
Abstract

Progressive loss of cognitive function in Alzheimer’s disease (AD) includes deficits in episodic memory, executive skills, working memory, attention, speech and language abilities, visuospatial and orientation skills. Many studies have shown that cognitive deficits in AD may be apparent years before diagnosis is made. Detecting these changes at an early stage is therefore critical, particularly for the development and establishment of early interventions and to make effective treatment decisions.

Speech and language impairments in Mild Cognitive Impairment (MCI) and AD are generally attributed to lexico-semantic deficits. In this talk, I will discuss how the temporal organisation of speech (reflective of speech production planning) in reading aloud may be indicative of working memory and attention deficits, and underlying neural mechanisms, in MCI and AD. I will also address their discriminative ability for the detection of cognitive impairment. The findings and the methodology used (with regard to the designed speech tasks) will be discussed in relation to earlier work on speech planning in Multiple Sclerosis and Parkinson’s disease.

The development and implementation of connected speech-based technologies in clinical and community settings may provide additional information for the early detection of cognitive impairment in neurological diseases.

Abstract
August 19, 2018 by 134.226.84.64 -
Added lines 2-11:

Upcoming Talks



Talks


August 19, 2018 by 134.226.84.64 -
Added lines 1-7:
Speaker Colm O’Reilly
Title Birdsong Forensics
Time & Venue Printing House Hall - 12:00 10-Dec-14


Page last modified on October 26, 2018