Sigmedia Talks Page

Talks.List History


October 11, 2019 by 134.226.84.68 -
Added lines 2-3:
January 23, 2019 by 134.226.84.64 -
Changed line 3 from:
to:
January 23, 2019 by 134.226.84.64 -
Added lines 3-4:
August 19, 2018 by 134.226.84.64 -
Deleted lines 12-14:

Upcoming Speakers


August 19, 2018 by 134.226.84.64 -
Changed lines 11-12 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed line 11 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed line 3 from:
to:
Changed line 5 from:
to:
Changed line 7 from:
to:
Changed line 9 from:
to:
August 19, 2018 by 134.226.84.64 -
Added line 6:

Added line 8:

Added line 10:

August 19, 2018 by 134.226.84.64 -
Changed line 2 from:
to:

Added line 4:

August 19, 2018 by 134.226.84.64 -
Changed lines 6-8 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed lines 3-5 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed lines 3-5 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed line 3 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed line 3 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed line 3 from:
to:
August 19, 2018 by 134.226.84.64 -
Changed lines 11-36 from:
 Semester 1 (Wednesdays at Noon)
Date  Speaker(s)
22-Oct-14  Yun Feng Wang
29-Oct-14  Ailbhe Cullen
10-Dec-13  Colm O'Reilly


Speaker Ailbhe Cullen
Title Building a Database of Political Speech - Does culture matter in charisma annotations?
Time & Venue Printing House Hall - 12:00 29-Oct-14


For both individual politicians and political parties, the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners and by Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.


Speaker Yun Feng Wang
Title Automated Registration of Low and High Resolution Atomic Force Microscopy Images Using Scale Invariant Features
Time & Venue Printing House Hall - 12:00 22-Oct-14


Our work introduces a method for registering scans acquired by Atomic Force Microscopy (AFM). Due to compromises between scan size, resolution, and scan rate, high resolution data is only attainable in a very limited field of view. Our proposed method uses a sparse set of feature matches between the low and high resolution AFM scans and maps them onto a common coordinate system. This can provide a wider field of view of the sample and give context to the regions where high resolution AFM data has been obtained. The algorithm employs a robust approach, overcoming complications due to temporal sample changes and sample drift of the AFM system, which become significant at higher resolutions. To our knowledge, this is the first approach for automatic high resolution AFM image registration. Experimental results show the correctness and robustness of our approach and show that the estimated transforms can be used to deduce plausible measures of sample drift.

to:
August 19, 2018 by 134.226.84.64 -
Deleted lines 18-25:


Speaker Colm O’Reilly
Title Birdsong Forensics
Time & Venue Printing House Hall - 12:00 10-Dec-14


Distinguishing the calls and songs of different bird populations is important to ornithologists. Together with morphological and genetic information, these vocalisations can yield an increased understanding of population diversity. This talk investigates the application of signal processing and machine learning techniques to analyse bird populations. Dialect distance metrics are explored in an attempt to quantify the similarity of calls across bird populations. Experiments are conducted on island populations of Olive-backed Sunbirds and Black-naped Orioles from Indonesia. This talk is part of a PhD confirmation, so it will also address where this project is heading in the future.

December 05, 2014 by 134.226.86.134 -
Deleted line 13:
26-Nov-13  Andrew Hines
Added line 16:
Changed lines 19-21 from:
Speaker Yun Feng Wang
Title Automated Registration of Low and High Resolution Atomic Force Microscopy Images Using Scale Invariant Features
Time & Venue Printing House Hall - 12:00 22-Oct-14
to:
Speaker Colm O’Reilly
Title Birdsong Forensics
Time & Venue Printing House Hall - 12:00 10-Dec-14
Changed lines 23-24 from:

Our work introduces a method for registering scans acquired by Atomic Force Microscopy (AFM). Due to compromises between scan size, resolution, and scan rate, high resolution data is only attainable in a very limited field of view. Our proposed method uses a sparse set of feature matches between the low and high resolution AFM scans and maps them onto a common coordinate system. This can provide a wider field of view of the sample and give context to the regions where high resolution AFM data has been obtained. The algorithm employs a robust approach, overcoming complications due to temporal sample changes and sample drift of the AFM system, which become significant at higher resolutions. To our knowledge, this is the first approach for automatic high resolution AFM image registration. Experimental results show the correctness and robustness of our approach and show that the estimated transforms can be used to deduce plausible measures of sample drift.

to:

Distinguishing the calls and songs of different bird populations is important to ornithologists. Together with morphological and genetic information, these vocalisations can yield an increased understanding of population diversity. This talk investigates the application of signal processing and machine learning techniques to analyse bird populations. Dialect distance metrics are explored in an attempt to quantify the similarity of calls across bird populations. Experiments are conducted on island populations of Olive-backed Sunbirds and Black-naped Orioles from Indonesia. This talk is part of a PhD confirmation, so it will also address where this project is heading in the future.

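The abstract does not say which dialect distance metrics are used; as one illustrative possibility, the Python sketch below compares two calls by aligning their MFCC sequences with dynamic time warping and taking the normalised alignment cost as a distance. The synthetic chirps and all parameter choices are placeholders, not the study's data or method.

import numpy as np
import librosa

# Two synthetic "calls" (rising tones at slightly different pitches) stand in
# for field recordings from two island populations; these are made up.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
call_a = np.sin(2 * np.pi * (2000 + 500 * t) * t).astype(np.float32)
call_b = np.sin(2 * np.pi * (2300 + 500 * t) * t).astype(np.float32)

# Summarise each call as an MFCC sequence.
mfcc_a = librosa.feature.mfcc(y=call_a, sr=sr, n_mfcc=13)
mfcc_b = librosa.feature.mfcc(y=call_b, sr=sr, n_mfcc=13)

# Dynamic time warping aligns the sequences; the normalised cumulative
# cost serves as one simple "dialect distance" between the two calls.
D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")
distance = D[-1, -1] / len(wp)
print(f"DTW-based call distance: {distance:.2f}")
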
Changed lines 34-42 from:


to:



Speaker Yun Feng Wang
Title Automated Registration of Low and High Resolution Atomic Force Microscopy Images Using Scale Invariant Features
Time & Venue Printing House Hall - 12:00 22-Oct-14


Our work introduces a method for registering scans acquired by Atomic Force Microscopy (AFM). Due to compromises between scan size, resolution, and scan rate, high resolution data is only attainable in a very limited field of view. Our proposed method uses a sparse set of feature matches between the low and high resolution AFM scans and maps them onto a common coordinate system. This can provide a wider field of view of the sample and give context to the regions where high resolution AFM data has been obtained. The algorithm employs a robust approach, overcoming complications due to temporal sample changes and sample drift of the AFM system, which become significant at higher resolutions. To our knowledge, this is the first approach for automatic high resolution AFM image registration. Experimental results show the correctness and robustness of our approach and show that the estimated transforms can be used to deduce plausible measures of sample drift.

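As a rough illustration of sparse feature-based registration of a high resolution scan into a low resolution overview (a generic sketch, not the authors' actual pipeline), the following Python uses ORB features, brute-force matching, and a RANSAC-estimated similarity transform. The image file names are hypothetical placeholders.

import cv2
import numpy as np

# Hypothetical file names; any pair of overlapping grayscale scans would do.
low_res = cv2.imread("afm_low_res.png", cv2.IMREAD_GRAYSCALE)
high_res = cv2.imread("afm_high_res.png", cv2.IMREAD_GRAYSCALE)

# Detect and describe sparse features in both scans.
orb = cv2.ORB_create(nfeatures=2000)
kp_lo, des_lo = orb.detectAndCompute(low_res, None)
kp_hi, des_hi = orb.detectAndCompute(high_res, None)

# Match descriptors and keep the best correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_hi, des_lo), key=lambda m: m.distance)[:200]

src = np.float32([kp_hi[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_lo[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Robustly estimate the transform mapping the high-res scan into the
# low-res coordinate frame; RANSAC discards outlier matches.
M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
print("Estimated similarity transform:", M)
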
November 11, 2014 by 134.226.86.134 -
Changed line 14 from:
12-Nov-13  Andrew Hines
to:
26-Nov-13  Andrew Hines
October 22, 2014 by 134.226.86.126 -
Changed line 15 from:
29-Nov-13  Colm O'Reilly
to:
10-Dec-13  Colm O'Reilly
October 22, 2014 by 134.226.86.126 -
Deleted line 16:
Changed lines 19-21 from:
Speaker Ailbhe Cullen
Title Building a Database of Political Speech - Does culture matter in charisma annotations?
Time & Venue Printing House Hall - 12:00 29-Oct-14
to:
Speaker Yun Feng Wang
Title Automated Registration of Low and High Resolution Atomic Force Microscopy Images Using Scale Invariant Features
Time & Venue Printing House Hall - 12:00 22-Oct-14
Changed lines 23-24 from:

For both individual politicians and political parties, the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners and by Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.

to:

Our work introduces a method for registering scans acquired by Atomic Force Microscopy (AFM). Due to compromises between scan size, resolution, and scan rate, high resolution data is only attainable in a very limited field of view. Our proposed method uses a sparse set of feature matches between the low and high resolution AFM scans and maps them onto a common coordinate system. This can provide a wider field of view of the sample and give context to the regions where high resolution AFM data has been obtained. The algorithm employs a robust approach, overcoming complications due to temporal sample changes and sample drift of the AFM system, which become significant at higher resolutions. To our knowledge, this is the first approach for automatic high resolution AFM image registration. Experimental results show the correctness and robustness of our approach and show that the estimated transforms can be used to deduce plausible measures of sample drift.

Added lines 26-32:
Speaker Ailbhe Cullen
Title Building a Database of Political Speech - Does culture matter in charisma annotations?
Time & Venue Printing House Hall - 12:00 29-Oct-14


For both individual politicians and political parties, the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners and by Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.

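One simple way to check whether two annotation sources agree, in the spirit of the reliability analysis mentioned in the abstract, is to correlate their ratings of the same clips. The sketch below uses made-up charisma ratings purely for illustration; the paper's actual analysis is not specified here.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical charisma ratings (1-5) for the same 10 clips from two
# annotator pools: native listeners and AMT workers. Real data would
# come from the collected database; these numbers are invented.
native = np.array([4, 2, 5, 3, 3, 1, 4, 5, 2, 3])
amt    = np.array([4, 3, 5, 3, 2, 1, 4, 4, 2, 3])

# Rank correlation between the two pools' ratings gives a crude measure
# of how well the crowd-sourced labels track the native-listener labels.
rho, p = spearmanr(native, amt)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
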
October 16, 2014 by 134.226.86.134 -
Changed lines 12-42 from:
9-Oct-13  James J. Mahshie
16-Oct-13  Ken Sooknanan
23-Oct-13  Felix Raimbault
30-Oct-13  Ed Lalor
06-Nov-13  Reading week
13-Nov-13  Claudia Arellano
20-Nov-13  Joao Cabral
27-Nov-13  John Kane
04-Dec-13  No Talk
11-Dec-13  Nick Holliman
 Semester 2 Talks (Fridays at Noon)
Date  Speaker(s)
24-Jan-14  Finnian Kelly
31-Jan-14  Ailbhe Cullen
7-Feb-14  Eoin Gillen
14-Feb-14  Róisín Rowley-Brooke
21-Feb-14  Dan Ring
28-Feb-14  Reading Week
7-Mar-14  No Talk
14-Mar-14  Gary Baugh (Postponed)
21-Mar-14  Francois Pitie / Andrew Hines
28-Mar-14
4-Apr-14
11-Apr-14
18-Apr-14
25-Apr-14
2-May-14  No Talk
9-May-14
to:
22-Oct-14  Yun Feng Wang
29-Oct-14  Ailbhe Cullen
12-Nov-13  Andrew Hines
29-Nov-13  Colm O'Reilly
Changed lines 20-22 from:
Speaker Dan Ring
Title Research at The Foundry
Time & Venue Printing House Hall - 12:00 21-Feb-14
to:
Speaker Ailbhe Cullen
Title Building a Database of Political Speech - Does culture matter in charisma annotations?
Time & Venue Printing House Hall - 12:00 29-Oct-14
Changed line 24 from:

The Foundry is a visual effects (VFX) software company which constantly advances the state of the art in image manipulation. This is largely due to being active and responsive to the latest trends in research and technology. However, research in industry is obviously very different from academia. With personal anecdotes and examples, this talk will highlight those differences from both sides of the fence, and offer ideas on how to bring cutting edge academic insight to industry. The talk will also give an overview of the kinds of problems you hit in industry, how we solved them, and point to some of the hot research topics we're excited about.

to:

For both individual politicians and political parties, the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners and by Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.

Deleted lines 25-149:

Past Talks

Next Scheduled Talks

Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - 12:00 14-Feb-14


Digitisation of original document sources for the purpose of conservation, detailed study, and facilitating access for a wider audience has been an increasing trend over recent years, particularly with constantly improving imaging technology available at ever decreasing costs. Many documents suffer from a wide variety of degradations that reduce their legibility and usefulness as sources. With the increase in digitisation has also come an increase in image processing based enhancement and restoration techniques. This thesis presents new approaches to automatic restoration of one particular type of degradation - bleed-through, which occurs when ink from one side of a page seeps through and interferes with the text on the other side, reducing legibility - with the aim being to preserve the document appearance as far as possible.


Speaker Eoin Gillen
Title TCD-TIMIT: A New Database for Audio-Visual Speech Recognition
Time & Venue Printing House Hall - 12:00 7-Feb-14


Automatic audio-visual speech recognition currently lags behind its audio-only counterpart in terms of research. One of the reasons commonly cited by researchers is the scarcity of suitable research corpora. This issue motivated the creation of TCD-TIMIT, a new corpus designed for continuous audio-visual speech recognition research. TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 sentences. Three of the speakers are professionally-trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. This talk will give an overview of TCD-TIMIT's creation and also discuss the baseline results obtained on the database.

Speaker Ailbhe Cullen
Title Charisma in Political Speech
Time & Venue Printing House Hall - 12:00 31-Jan-14


In recent years, there has been much interest in the automatic detection and classification of a range of paralinguistic phenomena. Previous work has shown that it is possible to predict personality traits, inter-speaker dynamics, and even election results from the spectral and prosodic characteristics of the voice. In this talk we turn our attention to political speech, in an attempt to identify what makes a politician appeal to us. We present a new database of Irish political speech, which attempts to exploit the vast amounts of speech data freely available on the internet. The advantages and disadvantages of this method of data collection will be discussed along with the ongoing annotation process. Finally, some early stage results will be presented, demonstrating marked differences in speech from different situations (interviews, press releases, Dáil Éireann etc.).

Speaker Finnian Kelly
Title Automatic Recognition of Ageing Speakers
Time & Venue Printing House Hall - 12:00 24-Jan-14


The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to 'vocal ageing', has received only marginal attention, however. This presentation will address the effect of vocal ageing on automatic speaker recognition, from biometric and forensic perspectives, and describe novel methods to counteract its effect.


Speaker Nick Holliman
Title Stereoscopic 3D everywhere: computational solutions for 3D displays
Time & Venue Printing House Hall - 12:00 11-Dec-13


One reason for the lack of market penetration of 3D display systems is the difficulty of producing high quality content. In this presentation I will summarise three strands of our research that tackle this challenge: firstly, research into algorithms for producing high quality 3D images; secondly, a recent multi-site study of subjective film quality on 3DTV. Finally, looking to the future, I will review some of our most recent results on how the use of cross-modal stimuli that combine visual and auditory depth cues could improve users' experience of 3D displays.

Speaker John Kane
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
Time & Venue Printing House Hall - 12:00 27-Nov-13


Speech processing algorithms are often developed demonstrating improvements over the state-of-the-art, but sometimes at the cost of high complexity. This makes algorithm reimplementation based on the literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This talk introduces a new collaborative and freely available repository for speech processing algorithms called COVAREP, which aims to provide fast and easy access to new speech processing algorithms and thus to facilitate research in the field. We envisage that COVAREP will allow for more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on and corrected by the community. Presently, COVAREP contains contributions from five distinct laboratories, and we encourage contributions from across the speech processing research field. In this talk, I will provide an overview of the current offerings of COVAREP and I will also include a demonstration of the algorithms through an emotion classification experiment.



Speaker Joao Cabral
Title Expressive Speech Synthesis for Human-Computer Interaction
Time & Venue Printing House Hall - 12:00 20-Nov-13


Speech is one of the most important forms of communication between humans. Thus, it also plays an important role in human-computer interaction (HCI). In many applications of HCI, such as spoken dialogue systems, e-books, and computer games, the machine often needs to understand the spoken utterances and to synthesise speech which is intelligible, sounds sufficiently natural and conveys the appropriate expressiveness or affect.

Also, there has been increasing interest from manufacturers in integrating the latest speech technology into portable electronic devices, such as PDAs and mobile phones. Statistical parametric speech synthesisers are very attractive for these applications because they are fully parametric, have a small memory footprint and can be used to easily transform voice characteristics. However, their synthetic speech does not sound as natural as human speech, mainly due to limitations of the type of speech model typically used by these systems. This talk focuses on improvements to this model for producing high-quality speech while permitting better control over voice characteristics. In particular, these improvements are related to the voice source component, which represents the signal produced at the glottis during human speech production.




Speaker Claudia Arellano
Title L2 Inference for Shape Parameters Estimation
Time & Venue Printing House Hall - 12:00 13-Nov-13


In this thesis, we propose a method to robustly estimate the parameters that control the mapping of a shape (model shape) onto another (target shape). The shapes of interest are contours in 2D space, surfaces in 3D space and point clouds (in either 2D or 3D space). We propose to model the shapes using Gaussian Mixture Models (GMMs) and estimate the transformation parameters by minimising a cost function based on the Euclidean (L2) distance between the target and model GMMs. This strategy allows us to avoid the need to compute the one-to-one point correspondences that are required by state-of-the-art approaches and make them sensitive to both outliers and the choice of the starting guess in the algorithm used for optimisation. Shapes are well represented by GMMs when careful consideration is given to the design of the covariance matrices. Compared to isotropic covariance matrices, we show how shape matching with L2 can be made more robust and accurate by using well-chosen non-isotropic ones. Our framework offers a novel extension to L2-based cost functions by allowing prior information about the parameters to be included. Our approach is therefore fully Bayesian. This Bayesian-L2 framework is tested successfully for estimating the affine transformation between data sets, for fitting morphable models and for fitting ellipses. Finally, we show how to extend this framework to shapes defined in higher dimensional feature spaces in addition to the spatial domain.



Speaker Ed Lalor
Title The Effects of Attention and Visual Input on the Representation of Natural Speech in EEG
Time & Venue Printing House Hall - 12:00 30-Oct-13


Traditionally, the use of electroencephalography (EEG) to study the neural processing of natural speech in humans has been constrained by the need to repeatedly present discrete stimuli. Progress has been made recently by the realization that cortical population activity tracks the amplitude envelope of speech. This has led to studies using linear regression methods which allow the presentation of continuous speech. In this talk I will present the results of several studies that use such methods to examine how the representation of speech is affected by attention and by visual inputs. Specifically, I will present data showing that it is possible to “reconstruct” a speech stimulus from single-trial EEG and, by doing so, to decode how a subject is deploying attention in a naturalistic cocktail party scenario. I will also present results showing that the representation of the envelope of auditory speech in the cortex is earlier when accompanied by visual speech. Finally I will discuss some implications that these findings have for the design of future EEG studies into the ongoing dynamics of cognition and for research aimed at identifying biomarkers of clinical disorders.

Speaker Félix Raimbault
Title User-assisted Sparse Stereo-video Segmentation
Time & Venue Printing House Hall - 12:00 23-Oct-13


Motion-based video segmentation has been studied for many years and remains challenging. Ill-posed problems must be solved when seeking a fully automated solution, so it is increasingly popular to keep users in the processing loop by letting them set parameters or draw mattes to guide the segmentation process. When processing multiple-view videos, however, the amount of user interaction should not be proportional to the number of views. In this talk we present a novel sparse segmentation algorithm for two-view stereoscopic videos that maintains temporal coherence and view consistency throughout. We track feature points on both views with a generic tracker and analyse the pairwise affinity of both temporally overlapping and disjoint tracks, whereas existing similar techniques only exploit the information available when tracks overlap. The use of stereo-disparity also allows our technique to jointly process feature tracks on both views, exhibiting good view consistency in the segmentation output. To make up for the lack of high-level understanding inherent to segmentation techniques, we allow the user to refine the output with a split-and-merge approach so as to obtain the desired view-consistent segmentation output over many frames in a few clicks. We present several real video examples to illustrate the versatility of our technique.



Speaker Ken Sooknanan
Title Mosaics for Burrow detection in Underwater Surveillance Videos
Time & Venue Printing House Hall - 12:00 16-Oct-13


Harvesting the commercially significant lobster, Nephrops norvegicus, is a multimillion dollar industry in Europe. Stock assessment is essential for maintaining this activity but it is conducted by manually inspecting hours of underwater surveillance videos. To improve this tedious process, we propose the use of mosaics for the automated detection of burrows on the seabed. We present novel approaches for handling the difficult lighting conditions that cause poor video quality in this kind of video material. Mosaics are built using 1-10 minutes of footage and candidate burrows are selected using image segmentation based on local image contrast. A K-Nearest Neighbour classifier is then used to select burrows from these candidate regions. Our final decision accuracy at 93.6% recall and 86.6% precision shows a corresponding 18% and 14.2% improvement compared with previous work.

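As a hedged sketch of the candidate-classification step described above, the snippet below trains a K-Nearest Neighbour classifier on a few hand-made region features; the feature set (area, mean contrast, eccentricity) and values are illustrative placeholders, not the paper's actual features or data.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features for candidate burrow regions extracted from a
# mosaic: [area in pixels, mean local contrast, eccentricity].
features = np.array([
    [120, 0.45, 0.80],
    [300, 0.10, 0.30],
    [150, 0.50, 0.75],
    [400, 0.05, 0.20],
    [110, 0.48, 0.82],
    [350, 0.12, 0.25],
])
labels = np.array([1, 0, 1, 0, 1, 0])   # 1 = burrow, 0 = background clutter

# K-Nearest Neighbour classifier used to filter candidate regions.
clf = KNeighborsClassifier(n_neighbors=3).fit(features, labels)
candidate = np.array([[130, 0.42, 0.78]])
print("Predicted class for new candidate region:", clf.predict(candidate)[0])
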
(Organised by the School of Linguistic, Speech and Communication Sciences, in conjunction with the Long Room Hub)

Speaker Professor James J. Mahshie
Title Children with Cochlear Implants: Perception and Production of Speech
Time & Venue Long Room Hub - 13:00 09-Oct-13


Abstract

Dr. Mahshie is Professor and Chair of the Department of Speech and Hearing Science at George Washington University, Washington DC, as well as Professor Emeritus at Gallaudet University in Washington DC. His talk will focus on his research exploring the production and perception of speech by young children with cochlear implants, possible mechanisms and factors relating perception and production, and preliminary findings on the voice quality characteristics of children with cochlear implants.

March 13, 2014 by 86.200.254.199 -
Changed line 33 from:
14-Mar-14  Gary Baugh
to:
14-Mar-14  Gary Baugh (Postponed)
February 17, 2014 by 86.209.230.164 -
Changed lines 45-49 from:

Next Scheduled Talks

Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - 12:00 14-Feb-14
to:
Speaker Dan Ring
Title Research at The Foundry
Time & Venue Printing House Hall - 12:00 21-Feb-14
Changed lines 49-50 from:

Digitisation of original document sources for the purpose of conservation, detailed study, and facilitating access for a wider audience has been an increasing trend over recent years, particularly with constantly improving imaging technology available at ever decreasing costs. Many documents suffer from a wide variety of degradations that reduce their legibility and usefulness as sources. With the increase in digitisation has also come an increase in image processing based enhancement and restoration techniques. This thesis presents new approaches to automatic restoration of one particular type of degradation - bleed-through, which occurs when ink from one side of a page seeps through and interferes with the text on the other side, reducing legibility - with the aim being to preserve the document appearance as far as possible.

to:

The Foundry is a visual effects (VFX) software company which constantly advances the state of the art in image manipulation. This is largely due to being active and responsive to the latest trends in research and technology. However, research in industry is obviously very different from academia. With personal anecdotes and examples, this talk will highlight those differences from both sides of the fence, and offer ideas on how to bring cutting edge academic insight to industry. The talk will also give an overview of the kinds of problems you hit in industry, how we solved them, and point to some of the hot research topics we're excited about.

Changed lines 52-59 from:
Speaker Dan Ring
Title Research at The Foundry
Time & Venue Printing House Hall - 12:00 21-Feb-14


The Foundry is a visual effects (VFX) software company which constantly advances the state of the art in image manipulation. This is largely due to being active and responsive to the latest trends in research and technology. However, research in industry is obviously very different from academia. With personal anecdotes and examples, this talk will highlight those differences from both sides of the fence, and offer ideas on how to bring cutting edge academic insight to industry. The talk will also give an overview of the kinds of problems you hit in industry, how we solved them, and point to some of the hot research topics we're excited about.

to:
Changed lines 55-63 from:
to:

Next Scheduled Talks

Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - 12:00 14-Feb-14


Digitisation of original document sources for the purpose of conservation, detailed study, and facilitating access for a wider audience has been an increasing trend over recent years, particularly with constantly improving imaging technology available at ever decreasing costs. Many documents suffer from a wide variety of degradations that reduce their legibility and usefulness as sources. With the increase in digitisation has also come an increase in image processing based enhancement and restoration techniques. This thesis presents new approaches to automatic restoration of one particular type of degradation - bleed-through, which occurs when ink from one side of a page seeps through and interferes with the text on the other side, reducing legibility - with the aim being to preserve the document appearance as far as possible.


February 11, 2014 by 86.200.134.186 -
Changed lines 45-52 from:

Next Scheduled Talk

Speaker Eoin Gillen
Title TCD-TIMIT: A New Database for Audio-Visual Speech Recognition
Time & Venue Printing House Hall - 12:00 7-Feb-14


Automatic audio-visual speech recognition currently lags behind its audio-only counterpart in terms of research. One of the reasons commonly cited by researchers is the scarcity of suitable research corpora. This issue motivated the creation of TCD-TIMIT, a new corpus designed for continuous audio-visual speech recognition research. TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 sentences. Three of the speakers are professionally-trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. This talk will give an overview of TCD-TIMIT's creation and also discuss the baseline results obtained on the database.

to:

Next Scheduled Talks

Changed lines 65-71 from:

Next Scheduled Talk

to:
Speaker Eoin Gillen
Title TCD-TIMIT: A New Database for Audio-Visual Speech Recognition
Time & Venue Printing House Hall - 12:00 7-Feb-14


Automatic audio-visual speech recognition currently lags behind its audio-only counterpart in terms of research. One of the reasons commonly cited by researchers is the scarcity of suitable research corpora. This issue motivated the creation of TCD-TIMIT, a new corpus designed for continuous audio-visual speech recognition research. TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 sentences. Three of the speakers are professionally-trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. This talk will give an overview of TCD-TIMIT's creation and also discuss the baseline results obtained on the database.

February 01, 2014 by 86.40.19.71 -
Changed line 48 from:
Title
to:
Title TCD-TIMIT: A New Database for Audio-Visual Speech Recognition
Changed lines 51-52 from:


to:

Automatic audio-visual speech recognition currently lags behind its audio-only counterpart in terms of research. One of the reasons commonly cited by researchers is the scarcity of suitable research corpora. This issue motivated the creation of TCD-TIMIT, a new corpus designed for continuous audio-visual speech recognition research. TCD-TIMIT consists of high-quality audio and video footage of 62 speakers reading a total of 6913 sentences. Three of the speakers are professionally-trained lipspeakers, recorded to test the hypothesis that lipspeakers may have an advantage over regular speakers in automatic visual speech recognition systems. This talk will give an overview of TCD-TIMIT's creation and also discuss the baseline results obtained on the database.

January 31, 2014 by 134.226.85.173 -
Changed line 30 from:
21-Feb-14 
to:
21-Feb-14  Dan Ring
Changed lines 33-34 from:
14-Mar-14 
21-Mar-14 
to:
14-Mar-14  Gary Baugh
21-Mar-14  Francois Pitie / Andrew Hines
Changed lines 47-49 from:
Speaker Ailbhe Cullen
Title Charisma in Political Speech
Time & Venue Printing House Hall - 12:00 31-Jan-14
to:
Speaker Eoin Gillen
Title
Time & Venue Printing House Hall - 12:00 7-Feb-14
Changed line 51 from:

In recent years, there has been much interest in the automatic detection and classification of a range of paralinguistic phenomena. Previous work has shown that it is possible to predict personality traits, inter-speaker dynamics, and even election results from the spectral and prosodic characteristics of the voice. In this talk we turn our attention to political speech, in an attempt to identify what makes a politician appeal to us. We present a new database of Irish political speech, which attempts to exploit the vast amounts of speech data freely available on the internet. The advantages and disadvantages of this method of data collection will be discussed along with the ongoing annotation process. Finally, some early stage results will be presented, demonstrating marked differences in speech from different situations (interviews, press releases, Dáil Éireann etc.).

to:
Added line 53:
Added lines 62-69:
Speaker Dan Ring
Title Research at The Foundry
Time & Venue Printing House Hall - 12:00 21-Feb-14


The Foundry is a visual effects (VFX) software company which constantly advances the state of the art in image manipulation. This is largely due to being active and responsive to the latest trends in research and technology. However, research in industry is obviously very different from academia. With personal anecdotes and examples, this talk will highlight those differences from both sides of the fence, and offer ideas on how to bring cutting edge academic insight to industry. The talk will also give an overview of the kinds of problems you hit in industry, how we solved them, and point to some of the hot research topics we're excited about.

Added lines 71-79:

Next Scheduled Talk

Speaker Ailbhe Cullen
Title Charisma in Political Speech
Time & Venue Printing House Hall - 12:00 31-Jan-14


In recent years, there has been much interest in the automatic detection and classification of a range of paralinguistic phenomena. Previous work has shown that it is possible to predict personality traits, inter-speaker dynamics, and even election results from the spectral and prosodic characteristics of the voice. In this talk we turn our attention to political speech, in an attempt to identify what makes a politician appeal to us. We present a new database of Irish political speech, which attempts to exploit the vast amounts of speech data freely available on the internet. The advantages and disadvantages of this method of data collection will be discussed along with the ongoing annotation process. Finally, some early stage results will be presented, demonstrating marked differences in speech from different situations (interviews, press releases, Dáil Éireann etc.).

January 29, 2014 by 134.226.85.173 -
Changed lines 47-49 from:
Speaker Finnian Kelly
Title Automatic Recognition of Ageing Speakers
Time & Venue Printing House Hall - 12:00 24-Jan-14
to:
Speaker Ailbhe Cullen
Title Charisma in Political Speech
Time & Venue Printing House Hall - 12:00 31-Jan-14
Changed lines 51-52 from:

The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to 'vocal ageing', has received only marginal attention, however. This presentation will address the effect of vocal ageing on automatic speaker recognition, from biometric and forensic perspectives, and describe novel methods to counteract its effect.

to:

In recent years, there has been much interest in the automatic detection and classification of a range of paralinguistic phenomena. Previous work has shown that it is possible to predict personality traits, inter-speaker dynamics, and even election results from the spectral and prosodic characteristics of the voice. In this talk we turn our attention to political speech, in an attempt to identify what makes a politician appeal to us. We present a new database of Irish political speech, which attempts to exploit the vast amounts of speech data freely available on the internet. The advantages and disadvantages of this method of data collection will be discussed along with the ongoing annotation process. Finally, some early stage results will be presented, demonstrating marked differences in speech from different situations (interviews, press releases, Dáil Éireann etc.).

Added lines 62-68:
Speaker Finnian Kelly
Title Automatic Recognition of Ageing Speakers
Time & Venue Printing House Hall - 12:00 24-Jan-14


The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to 'vocal ageing', has received only marginal attention, however. This presentation will address the effect of vocal ageing on automatic speaker recognition, from biometric and forensic perspectives, and describe novel methods to counteract its effect.

January 21, 2014 by 134.226.86.191 -
Changed line 56 from:
Time & Venue Printing House Hall - TBD
to:
Time & Venue Printing House Hall - 12:00 14-Feb-14
January 21, 2014 by 134.226.86.191 -
Added lines 47-49:
Speaker Finnian Kelly
Title Automatic Recognition of Ageing Speakers
Time & Venue Printing House Hall - 12:00 24-Jan-14
Changed lines 51-53 from:
Speaker Nick Holliman
Title Stereoscopic 3D everywhere: computational solutions for 3D displays
Time & Venue Printing House Hall - 12:00 11-Dec-13
to:

The process of ageing causes changes to the voice over time. There have been significant research efforts in the automatic speaker recognition community towards improving performance in the presence of everyday variability. The influence of long-term variability, due to 'vocal ageing', has received only marginal attention, however. This presentation will address the effect of vocal ageing on automatic speaker recognition, from biometric and forensic perspectives, and describe novel methods to counteract its effect.

Deleted lines 53-57:

One reason for the lack of market penetration of 3D display systems is the difficulty of producing high quality content. In this presentation I will summarise three strands of our research that tackle this challenge: firstly, research into algorithms for producing high quality 3D images; secondly, a recent multi-site study of subjective film quality on 3DTV. Finally, looking to the future, I will review some of our most recent results on how the use of cross-modal stimuli that combine visual and auditory depth cues could improve users' experience of 3D displays.


Added lines 63-70:


Speaker Nick Holliman
Title Stereoscopic 3D everywhere: computational solutions for 3D displays
Time & Venue Printing House Hall - 12:00 11-Dec-13


One reason for the lack of market penetration of 3D display systems is the difficulty of producing high quality content. In this presentation I will summarise three strands of our research that tackle this challenge: firstly, research into algorithms for producing high quality 3D images; secondly, a recent multi-site study of subjective film quality on 3DTV. Finally, looking to the future, I will review some of our most recent results on how the use of cross-modal stimuli that combine visual and auditory depth cues could improve users' experience of 3D displays.

January 15, 2014 by 134.226.86.54 -
Changed line 9 from:
 Semester 1 
to:
 Semester 1 (Wednesdays at Noon)
Changed line 20 from:
04-Dec-13  Róisín Rowley-Brooke
to:
04-Dec-13  No Talk
Changed line 23 from:
 Semester 2 
to:
 Semester 2 Talks (Fridays at Noon)
Changed lines 26-29 from:
15-Jan-14  Finnian Kelly
22-Jan-14  Ailbhe Cullen
29-Jan-13  Eoin Gillen
to:
24-Jan-14  Finnian Kelly
31-Jan-14  Ailbhe Cullen
7-Feb-14  Eoin Gillen
14-Feb-14  Róisín Rowley-Brooke
21-Feb-14
28-Feb-14  Reading Week
7-Mar-14  No Talk
14-Mar-14
21-Mar-14
28-Mar-14
4-Apr-14
11-Apr-14
18-Apr-14
25-Apr-14
2-May-14  No Talk
9-May-14
December 08, 2013 by 79.97.129.43 -
Deleted line 34:
Changed lines 36-38 from:
Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - 12:00 4-Dec-13
to:
Speaker Nick Holliman
Title Stereoscopic 3D everywhere: computational solutions for 3D displays
Time & Venue Printing House Hall - 12:00 11-Dec-13


One reason for the lack of market penetration of 3D display systems is the difficulty of producing high quality content. In this presentation I will summarise three strands of our research that tackle this challenge: firstly, research into algorithms for producing high quality 3D images; secondly, a recent multi-site study of subjective film quality on 3DTV. Finally, looking to the future, I will review some of our most recent results on how the use of cross-modal stimuli that combine visual and auditory depth cues could improve users' experience of 3D displays.


Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - TBD
December 02, 2013 by 134.226.86.54 -
Changed lines 8-10 from:
Date  Speaker(s)
9-Oct-13  Professor James J. Mahshie
to:
 Semester 1 
Date  Speaker(s)
9-Oct-13  James J. Mahshie
Changed lines 23-25 from:
 Semester 2
to:
 Semester 2 
Date  Speaker(s)
Added lines 29-30:
December 02, 2013 by 134.226.86.54 -
Changed lines 19-20 from:
11-Dec-13  Eoin Gillen
to:
11-Dec-13  Nick Holliman
Added line 24:
29-Jan-13  Eoin Gillen
Added lines 28-29:
Changed lines 31-33 from:
Speaker John Kane
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
Time & Venue Printing House Hall - 12:00 27-Nov-13
to:
Speaker Róisín Rowley-Brooke
Title Bleed-Through Document Image Restoration
Time & Venue Printing House Hall - 12:00 4-Dec-13
Changed lines 35-36 from:

Speech processing algorithms are often developed demonstrating improvements over the state-of-the-art, but sometimes at the cost of high complexity. This makes algorithm reimplementation based on the literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This talk introduces a new collaborative and freely available repository for speech processing algorithms called COVAREP, which aims to provide fast and easy access to new speech processing algorithms and thus to facilitate research in the field. We envisage that COVAREP will allow for more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on and corrected by the community. Presently, COVAREP contains contributions from five distinct laboratories, and we encourage contributions from across the speech processing research field. In this talk, I will provide an overview of the current offerings of COVAREP and I will also include a demonstration of the algorithms through an emotion classification experiment.

to:

Digitisation of original document sources for the purpose of conservation, detailed study, and facilitating access for a wider audience has been an increasing trend over recent years, particularly with constantly improving imaging technology available at ever decreasing costs. Many documents suffer from a wide variety of degradations that reduce their legibility and usefulness as sources. With the increase in digitisation has also come an increase in image processing based enhancement and restoration techniques. This thesis presents new approaches to automatic restoration of one particular type of degradation - bleed-through, which occurs when ink from one side of a page seeps through and interferes with the text on the other side, reducing legibility - with the aim being to preserve the document appearance as far as possible.

Changed lines 40-47 from:
to:


Speaker John Kane
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
Time & Venue Printing House Hall - 12:00 27-Nov-13


Speech processing algorithms are often developed demonstrating improvements over the state-of-the-art, but sometimes at the cost of high complexity. This makes algorithm reimplementation based on the literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This talk introduces a new collaborative and freely available repository for speech processing algorithms called COVAREP, which aims to provide fast and easy access to new speech processing algorithms and thus to facilitate research in the field. We envisage that COVAREP will allow for more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on and corrected by the community. Presently, COVAREP contains contributions from five distinct laboratories, and we encourage contributions from across the speech processing research field. In this talk, I will provide an overview of the current offerings of COVAREP and I will also include a demonstration of the algorithms through an emotion classification experiment.


November 21, 2013 by 134.226.86.54 -
Deleted line 26:
Changed line 29 from:
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
to:
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
November 21, 2013 by 134.226.86.54 -
Deleted line 28:


Added line 31:
Time & Venue Printing House Hall - 12:00 27-Nov-13
Deleted lines 32-33:
Time & Venue Printing House Hall - 12:00 27-Nov-13


Changed line 38 from:


to:
November 21, 2013 by 134.226.86.54 -
Changed lines 30-32 from:
Speaker Joao Cabral
Title Expressive Speech Synthesis for Human-Computer Interaction
Time & Venue Printing House Hall - 12:00 20-Nov-13
to:
Speaker John Kane
Title Introducing COVAREP - A collaborative voice analysis repository for speech technologies
Added lines 33-45:
Time & Venue Printing House Hall - 12:00 27-Nov-13


Speech processing algorithms are often developed demonstrating improvements over the state-of-the-art, but sometimes at the cost of high complexity. This makes algorithm reimplementation based on the literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This talk introduces a new collaborative and freely available repository for speech processing algorithms called COVAREP, which aims to provide fast and easy access to new speech processing algorithms and thus to facilitate research in the field. We envisage that COVAREP will allow for more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on and corrected by the community. Presently, COVAREP contains contributions from five distinct laboratories, and we encourage contributions from across the speech processing research field. In this talk, I will provide an overview of the current offerings of COVAREP and I will also include a demonstration of the algorithms through an emotion classification experiment.


Past Talks



Speaker Joao Cabral
Title Expressive Speech Synthesis for Human-Computer Interaction
Time & Venue Printing House Hall - 12:00 20-Nov-13


Deleted lines 68-69:

Past Talks

November 18, 2013 by 134.226.86.54 -
Changed lines 17-18 from:
27-Nov-13  Finnian Kelly
04-Dec-13  Ailbhe Cullen
to:
27-Nov-13  John Kane
04-Dec-13  Róisín Rowley-Brooke
Changed lines 22-23 from:
15-Jan-14  Róisín Rowley-Brooke
to:
15-Jan-14  Finnian Kelly
22-Jan-14  Ailbhe Cullen
Changed lines 30-32 from:
Speaker Claudia Arellano
Title L2 Inference for Shape Parameters Estimation
Time & Venue Printing House Hall - 12:00 13-Nov-13
to:
Speaker Joao Cabral
Title Expressive Speech Synthesis for Human-Computer Interaction
Time & Venue Printing House Hall - 12:00 20-Nov-13
Changed lines 34-39 from:

In this thesis, we propose a method to robustly estimate the parameters that control the mapping of a shape (model shape) onto another (target shape). The shapes of interest are contours in 2D space, surfaces in 3D space and point clouds (in either 2D or 3D space). We propose to model the shapes using Gaussian Mixture Models (GMMs) and estimate the transformation parameters by minimising a cost function based on the Euclidean (L2) distance between the target and model GMMs. This strategy allows us to avoid the need to compute the one-to-one point correspondences that are required by state-of-the-art approaches and make them sensitive to both outliers and the choice of the starting guess in the algorithm used for optimisation. Shapes are well represented by GMMs when careful consideration is given to the design of the covariance matrices. Compared to isotropic covariance matrices, we show how shape matching with L2 can be made more robust and accurate by using well-chosen non-isotropic ones. Our framework offers a novel extension to L2-based cost functions by allowing prior information about the parameters to be included. Our approach is therefore fully Bayesian. This Bayesian-L2 framework is tested successfully for estimating the affine transformation between data sets, for fitting morphable models and for fitting ellipses. Finally, we show how to extend this framework to shapes defined in higher dimensional feature spaces in addition to the spatial domain.

to:

Speech is one of the most important forms of communication between humans. Thus, it also plays an important role in human-computer interaction (HCI). In many applications of HCI, such as spoken dialogue systems, e-books, and computer games, the machine often needs to understand the spoken utterances and to synthesise speech which is intelligible, sounds sufficiently natural and conveys the appropriate expressiveness or affect.

Also, there has been increasing interest from manufacturers in integrating the latest speech technology into portable electronic devices, such as PDAs and mobile phones. Statistical parametric speech synthesisers are very attractive for these applications because they are fully parametric, have a small memory footprint and can be used to easily transform voice characteristics. However, their synthetic speech does not sound as natural as human speech, mainly due to limitations of the type of speech model typically used by these systems. This talk focuses on improvements to this model for producing high-quality speech while permitting better control over voice characteristics. In particular, these improvements are related to the voice source component, which represents the signal produced at the glottis during human speech production.

Added lines 59-71:



Speaker Claudia Arellano
Title L2 Inference for Shape Parameters Estimation
Time & Venue Printing House Hall - 12:00 13-Nov-13


In this thesis, we propose a method to robustly estimate the parameters that control the mapping of a shape (model shape) onto another (target shape). The shapes of interest are contours in 2D space, surfaces in 3D space and point clouds (in either 2D or 3D space). We propose to model the shapes using Gaussian Mixture Models (GMMs) and estimate the transformation parameters by minimising a cost function based on the Euclidean (L2) distance between the target and model GMMs. This strategy allows us to avoid the need to compute the one-to-one point correspondences that are required by state-of-the-art approaches and make them sensitive to both outliers and the choice of the starting guess in the algorithm used for optimisation. Shapes are well represented by GMMs when careful consideration is given to the design of the covariance matrices. Compared to isotropic covariance matrices, we show how shape matching with L2 can be made more robust and accurate by using well-chosen non-isotropic ones. Our framework offers a novel extension to L2-based cost functions by allowing prior information about the parameters to be included. Our approach is therefore fully Bayesian. This Bayesian-L2 framework is tested successfully for estimating the affine transformation between data sets, for fitting morphable models and for fitting ellipses. Finally, we show how to extend this framework to shapes defined in higher dimensional feature spaces in addition to the spatial domain.

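To make the L2 GMM-matching idea concrete, here is a minimal sketch restricted to recovering a pure translation between two point sets modelled as equal-weight isotropic GMMs. It is not the thesis implementation; the point sets and the component variance are arbitrary choices for illustration.

import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

# Toy 2D point sets standing in for model and target shapes.
model  = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
target = model + np.array([0.5, -0.3])          # ground-truth translation
sigma2 = 0.05                                   # isotropic component variance

def gmm_l2(params):
    """L2 distance between the translated model GMM and the target GMM.

    Uses the closed form: the integral of the product of two Gaussians
    N(mu1, S1) and N(mu2, S2) equals N(mu1; mu2, S1 + S2).
    """
    shifted = model + params                     # apply candidate translation
    def cross(a, b):
        s = 0.0
        for mu_a in a:
            for mu_b in b:
                s += multivariate_normal.pdf(mu_a, mean=mu_b, cov=2 * sigma2 * np.eye(2))
        return s / (len(a) * len(b))
    # ||f - g||^2 = <f,f> - 2<f,g> + <g,g>; the last term is constant.
    return cross(shifted, shifted) - 2 * cross(shifted, target) + cross(target, target)

res = minimize(gmm_l2, x0=np.zeros(2), method="Nelder-Mead")
print("Recovered translation:", res.x)          # should be close to [0.5, -0.3]
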
November 11, 2013 by 134.226.86.191 -
Changed lines 29-31 from:
Speaker Ed Lalor
Title The Effects of Attention and Visual Input on the Representation of Natural Speech in EEG
Time & Venue Printing House Hall - 12:00 30-Oct-13
to:
Speaker Claudia Arellano
Title L2 Inference for Shape Parameters Estimation
Time & Venue Printing House Hall - 12:00 13-Nov-13
Changed lines 33-34 from:

Traditionally, the use of electroencephalography (EEG) to study the neural processing of natural speech in humans has been constrained by the need to repeatedly present discrete stimuli. Progress has been made recently by the realization that cortical population activity tracks the amplitude envelope of speech. This has led to studies using linear regression methods which allow the presentation of continuous speech. In this talk I will present the results of several studies that use such methods to examine how the representation of speech is affected by attention and by visual inputs. Specifically, I will present data showing that it is possible to “reconstruct” a speech stimulus from single-trial EEG and, by doing so, to decode how a subject is deploying attention in a naturalistic cocktail party scenario. I will also present results showing that the representation of the envelope of auditory speech in the cortex is earlier when accompanied by visual speech. Finally I will discuss some implications that these findings have for the design of future EEG studies into the ongoing dynamics of cognition and for research aimed at identifying biomarkers of clinical disorders.

to:

In this thesis, we propose a method to robustly estimate the parameters that control the mapping of a shape (model shape) onto another (target shape). The shapes of interest are contours in 2D space, surfaces in 3D space and point clouds (in either 2D or 3D space). We propose to model the shapes using Gaussian Mixture Models (GMMs) and estimate the transformation parameters by minimising a cost function based on the Euclidean (L2) distance between the target and model GMMs. This strategy allows us to avoid the need to compute the one-to-one point correspondences that are required by state-of-the-art approaches and make them sensitive to both outliers and the choice of the starting guess in the algorithm used for optimisation. Shapes are well represented by GMMs when careful consideration is given to the design of the covariance matrices. Compared to isotropic covariance matrices, we show how shape matching with L2 can be made more robust and accurate by using well-chosen non-isotropic ones. Our framework offers a novel extension to L2-based cost functions by allowing prior information about the parameters to be included. Our approach is therefore fully Bayesian. This Bayesian-L2 framework is tested successfully for estimating the affine transformation between data sets, for fitting morphable models and for fitting ellipses. Finally, we show how to extend this framework to shapes defined in higher dimensional feature spaces in addition to the spatial domain.

Changed lines 42-48 from:
to:



Speaker Ed Lalor
Title The Effects of Attention and Visual Input on the Representation of Natural Speech in EEG
Time & Venue Printing House Hall - 12:00 30-Oct-13


Traditionally, the use of electroencephalography (EEG) to study the neural processing of natural speech in humans has been constrained by the need to repeatedly present discrete stimuli. Progress has been made recently by the realization that cortical population activity tracks the amplitude envelope of speech. This has led to studies using linear regression methods which allow the presentation of continuous speech. In this talk I will present the results of several studies that use such methods to examine how the representation of speech is affected by attention and by visual inputs. Specifically, I will present data showing that it is possible to “reconstruct” a speech stimulus from single-trial EEG and, by doing so, to decode how a subject is deploying attention in a naturalistic cocktail party scenario. I will also present results showing that the representation of the envelope of auditory speech in the cortex is earlier when accompanied by visual speech. Finally I will discuss some implications that these findings have for the design of future EEG studies into the ongoing dynamics of cognition and for research aimed at identifying biomarkers of clinical disorders.
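
For illustration, here is a minimal "backward model" of the kind used in such studies, assuming ridge regression from time-lagged EEG back to the speech envelope; the channel count, lag window and toy data below are invented, not taken from the studies described.

    import numpy as np

    def lagged_design(eeg, lags):
        # Each row stacks the EEG samples that follow the current envelope sample
        # by the given lags (the decoder looks at EEG occurring after the stimulus).
        n, ch = eeg.shape
        X = np.zeros((n, ch * len(lags)))
        for j, lag in enumerate(lags):
            shifted = np.roll(eeg, -lag, axis=0)
            if lag > 0:
                shifted[-lag:] = 0
            X[:, j * ch:(j + 1) * ch] = shifted
        return X

    def fit_decoder(eeg, envelope, lags, lam=1e3):
        # Ridge regression mapping multichannel EEG to the speech envelope.
        X = lagged_design(eeg, lags)
        XtX = X.T @ X + lam * np.eye(X.shape[1])
        return np.linalg.solve(XtX, X.T @ envelope)

    rng = np.random.default_rng(1)
    envelope = np.abs(rng.normal(size=4096))                 # hypothetical envelope at 128 Hz
    eeg = np.outer(envelope, rng.normal(size=32)) + rng.normal(size=(4096, 32))
    lags = range(0, 16)                                      # roughly 0-125 ms of context
    w = fit_decoder(eeg, envelope, lags)
    recon = lagged_design(eeg, lags) @ w
    print("reconstruction correlation:", np.corrcoef(recon, envelope)[0, 1])

In an attention-decoding setting, the same reconstruction would be correlated with the envelopes of the attended and unattended talkers, and the larger correlation taken to indicate the attended stream.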

October 23, 2013 by 134.226.86.54 -
Deleted lines 27-29:
Speaker Félix Raimbault
Title User-assisted Sparse Stereo-video Segmentation
Time & Venue Printing House Hall - 12:00 23-Oct-13
Changed lines 29-30 from:

Motion-based video segmentation has been studied for many years and remains challenging. Ill-posed problems must be solved when seeking a fully automated solution, so it is increasingly popular to keep users in the processing loop by letting them set parameters or draw mattes to guide the segmentation process. When processing multiple-view videos, however, the amount of user interaction should not be proportional to the number of views. In this talk we present a novel sparse segmentation algorithm for two-view stereoscopic videos that maintains temporal coherence and view consistency throughout. We track feature points on both views with a generic tracker and analyse the pairwise affinity of both temporally overlapping and disjoint tracks, whereas existing similar techniques only exploit the information available when tracks overlap. The use of stereo disparity also allows our technique to process feature tracks on both views jointly, exhibiting good view consistency in the segmentation output. To make up for the lack of high-level understanding inherent to segmentation techniques, we allow the user to refine the output with a split-and-merge approach so as to obtain a desired view-consistent segmentation output over many frames in a few clicks. We present several real video examples to illustrate the versatility of our technique.

to:
Speaker Ed Lalor
Title The Effects of Attention and Visual Input on the Representation of Natural Speech in EEG
Time & Venue Printing House Hall - 12:00 30-Oct-13
Changed lines 33-36 from:
to:

Traditionally, the use of electroencephalography (EEG) to study the neural processing of natural speech in humans has been constrained by the need to repeatedly present discrete stimuli. Progress has been made recently by the realization that cortical population activity tracks the amplitude envelope of speech. This has led to studies using linear regression methods which allow the presentation of continuous speech. In this talk I will present the results of several studies that use such methods to examine how the representation of speech is affected by attention and by visual inputs. Specifically, I will present data showing that it is possible to “reconstruct” a speech stimulus from single-trial EEG and, by doing so, to decode how a subject is deploying attention in a naturalistic cocktail party scenario. I will also present results showing that the representation of the envelope of auditory speech in the cortex is earlier when accompanied by visual speech. Finally I will discuss some implications that these findings have for the design of future EEG studies into the ongoing dynamics of cognition and for research aimed at identifying biomarkers of clinical disorders.


Changed lines 40-47 from:
to:


Speaker Félix Raimbault
Title User-assisted Sparse Stereo-video Segmentation
Time & Venue Printing House Hall - 12:00 23-Oct-13


Motion-based video segmentation has been studied for many years and remains challenging. Ill-posed problems must be solved when seeking a fully automated solution, so it is increasingly popular to keep users in the processing loop by letting them set parameters or draw mattes to guide the segmentation process. When processing multiple-view videos, however, the amount of user interaction should not be proportional to the number of views. In this talk we present a novel sparse segmentation algorithm for two-view stereoscopic videos that maintains temporal coherence and view consistency throughout. We track feature points on both views with a generic tracker and analyse the pairwise affinity of both temporally overlapping and disjoint tracks, whereas existing similar techniques only exploit the information available when tracks overlap. The use of stereo disparity also allows our technique to process feature tracks on both views jointly, exhibiting good view consistency in the segmentation output. To make up for the lack of high-level understanding inherent to segmentation techniques, we allow the user to refine the output with a split-and-merge approach so as to obtain a desired view-consistent segmentation output over many frames in a few clicks. We present several real video examples to illustrate the versatility of our technique.


October 18, 2013 by 134.226.86.54 -
Changed line 29 from:
Title User-assisted Sparse Stereo-video Segmentation
to:
Title User-assisted Sparse Stereo-video Segmentation
October 18, 2013 by 134.226.86.54 -
Deleted line 29:


October 18, 2013 by 134.226.86.54 -
Changed lines 28-30 from:
Speaker Ken Sooknanan
Title Mosaics for Burrow detection in Underwater Surveillance Videos
Time & Venue Printing House Hall - 12:00 16-Oct-13
to:
Speaker Félix Raimbault
Title User-assisted Sparse Stereo-video Segmentation
Changed lines 31-33 from:

Harvesting the commercially significant lobster, Nephrops norvegicus, is a multimillion dollar industry in Europe. Stock assessment is essential for maintaining this activity but it is conducted by manually inspecting hours of underwater surveillance videos. To improve this tedious process, we propose the use of mosaics for the automated detection of burrows on the seabed. We present novel approaches for handling the difficult lighting conditions that cause poor video quality in this kind of video material. Mosaics are built using 1-10 minutes of footage and candidate burrows are selected using image segmentation based on local image contrast. A K-Nearest Neighbour classifier is then used to select burrows from these candidate regions. Our final detection accuracy of 93.6% recall and 86.6% precision represents improvements of 18% and 14.2% respectively over previous work.

to:
Time & Venue Printing House Hall - 12:00 23-Oct-13
Changed lines 33-36 from:
to:

Motion-based video segmentation has been studied for many years and remains challenging. Ill-posed problems must be solved when seeking a fully automated solution, so it is increasingly popular to keep users in the processing loop by letting them set parameters or draw mattes to guide the segmentation process. When processing multiple-view videos, however, the amount of user interaction should not be proportional to the number of views. In this talk we present a novel sparse segmentation algorithm for two-view stereoscopic videos that maintains temporal coherence and view consistency throughout. We track feature points on both views with a generic tracker and analyse the pairwise affinity of both temporally overlapping and disjoint tracks, whereas existing similar techniques only exploit the information available when tracks overlap. The use of stereo disparity also allows our technique to process feature tracks on both views jointly, exhibiting good view consistency in the segmentation output. To make up for the lack of high-level understanding inherent to segmentation techniques, we allow the user to refine the output with a split-and-merge approach so as to obtain a desired view-consistent segmentation output over many frames in a few clicks. We present several real video examples to illustrate the versatility of our technique.


Added lines 41-48:


Speaker Ken Sooknanan
Title Mosaics for Burrow detection in Underwater Surveillance Videos
Time & Venue Printing House Hall - 12:00 16-Oct-13


Harvesting the commercially significant lobster, Nephrops norvegicus, is a multimillion dollar industry in Europe. Stock assessment is essential for maintaining this activity but it is conducted by manually inspecting hours of underwater surveillance videos. To improve this tedious process, we propose the use of mosaics for the automated detection of burrows on the seabed. We present novel approaches for handling the difficult lighting conditions that cause poor video quality in this kind of video material. Mosaics are built using 1-10 minutes of footage and candidate burrows are selected using image segmentation based on local image contrast. A K-Nearest Neighbour classifier is then used to select burrows from these candidate regions. Our final detection accuracy of 93.6% recall and 86.6% precision represents improvements of 18% and 14.2% respectively over previous work.
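
A hedged sketch of this kind of pipeline, assuming local-contrast thresholding to propose candidate regions on a mosaic and a K-Nearest Neighbour classifier on simple region features; the synthetic mosaic, the three features and the training labels are placeholders rather than the authors' actual choices.

    import numpy as np
    from scipy import ndimage
    from sklearn.neighbors import KNeighborsClassifier

    def candidate_regions(mosaic, block=32, k=0.9):
        # Pixels noticeably darker than their local mean become burrow candidates
        # (burrows appear as dark openings on the seabed mosaic).
        local_mean = ndimage.uniform_filter(mosaic, size=block)
        labels, n = ndimage.label(mosaic < k * local_mean)
        feats = []
        for i in range(1, n + 1):
            region = labels == i
            feats.append([region.sum(),                                    # area
                          mosaic[region].mean(),                           # mean darkness
                          ndimage.standard_deviation(mosaic, labels, i)])  # texture proxy
        return np.array(feats)

    rng = np.random.default_rng(2)
    mosaic = 0.5 + 0.01 * rng.normal(size=(256, 256))   # stand-in for a stitched mosaic
    mosaic[100:110, 50:65] = 0.1                        # one dark, burrow-like patch
    feats = candidate_regions(mosaic)

    # Hypothetical labelled training examples (burrow = 1, background = 0).
    train_X = rng.uniform(size=(40, 3)) * [200.0, 0.6, 0.1]
    train_y = rng.integers(0, 2, size=40)
    clf = KNeighborsClassifier(n_neighbors=5).fit(train_X, train_y)
    if len(feats):
        print("predicted burrow flags:", clf.predict(feats))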

October 11, 2013 by 134.226.86.54 -
Deleted line 26:

(Organised by the School of Linguistic, Speech and Communication Sciences, in conjunction with the Long Room Hub)

Changed lines 28-30 from:
Speaker Professor James J. Mahshie
Title Children with Cochlear Implants: Perception and Production of Speech
Time & Venue Long Room Hub - 13:00 09-Oct-13
to:
Speaker Ken Sooknanan
Title Mosaics for Burrow detection in Underwater Surveillance Videos
Time & Venue Printing House Hall - 12:00 16-Oct-13
Changed lines 32-34 from:
Abstract

Dr. Mahshie is Professor and Chair of the Department of Speech and Hearing Science at George Washington University, Washington DC, as well as Professor Emeritus at Gallaudet University in Washington DC. His talk will focus on his research exploring the production and perception of speech by young children with cochlear implants, possible mechanisms and factors relating perception and production, and preliminary findings on the voice quality characteristics of children with cochlear implants.

to:

Harvesting the commercially significant lobster, Nephrops norvegicus, is a multimillion dollar industry in Europe. Stock assessment is essential for maintaining this activity but it is conducted by manually inspecting hours of underwater surveillance videos. To improve this tedious process, we propose the use of mosaics for the automated detection of burrows on the seabed. We present novel approaches for handling the difficult lighting conditions that cause poor video quality in this kind of video material. Mosaics are built using 1-10 minutes of footage and candidate burrows are selected using image segmentation based on local image contrast. A K-Nearest Neighbour classifier is then used to select burrows from these candidate regions. Our final detection accuracy of 93.6% recall and 86.6% precision represents improvements of 18% and 14.2% respectively over previous work.

Added lines 37-38:

Past Talks

Changed lines 41-42 from:

Past Talks

to:

(Organised by the School of Linguistic, Speech and Communication Sciences, in conjunction with the Long Room Hub)

Changed lines 43-50 from:
to:
Speaker Professor James J. Mahshie
Title Children with Cochlear Implants: Perception and Production of Speech
Time & Venue Long Room Hub - 13:00 09-Oct-13


Abstract

Dr. Mahshie is Professor and Chair of the Department of Speech and Hearing Science at George Washington University, Washington DC, as well as Professor Emeritus at Gallaudet University in Washington DC. His talk will focus on his research exploring the production and perception of speech by young children with cochlear implants, possible mechanisms and factors relating perception and production, and preliminary findings on the voice quality characteristics of children with cochlear implants.

October 07, 2013 by 134.226.86.54 -
Changed line 16 from:
20-Nov-13Roisin Rowley-Brooke
to:
20-Nov-13Joao Cabral
Changed lines 21-22 from:
to:
 Semester 2
15-Jan-14Róisín Rowley-Brooke
October 04, 2013 by 134.226.86.54 -
Changed line 10 from:
9-Oct-13 
to:
9-Oct-13Professor James J. Mahshie
Added line 26:

(Organised by the School of Linguistic, Speech and Communication Sciences, in conjunction with the Long Room Hub)

Changed lines 28-30 from:
Speaker
Title
Time & Venue Printing House Hall - 12:00 14-Oct-13
to:
Speaker Professor James J. Mahshie
Title Children with Cochlear Implants: Perception and Production of Speech
Time & Venue Long Room Hub - 13:00 09-Oct-13
Changed line 34 from:
to:

Dr. Mahshie is Professor and Chair of the Department of Speech and Hearing Science at George Washington University, Washington DC, as well as Professor Emeritus at Gallaudet University in Washington DC. His talk will focus on his research exploring the production and perception of speech by young children with cochlear implants, possible mechanisms and factors relating perception and production, and preliminary findings on the voice quality characteristics of children with cochlear implants.

October 04, 2013 by 134.226.86.54 -
Changed line 12 from:
23-Oct-13 
to:
23-Oct-13Felix Raimbault
October 04, 2013 by 134.226.86.54 -
Changed line 11 from:
16-Oct-13 
to:
16-Oct-13Ken Sooknanan
Changed line 42 from:


to:


October 03, 2013 by 134.226.86.54 -
Changed line 15 from:
13-Nov-13 
to:
13-Nov-13Claudia Arellano
Deleted line 42:
October 03, 2013 by 134.226.86.54 -
Changed lines 16-19 from:
20-Nov-13 
27-Nov-13 
04-Dec-13 
11-Dec-13 
to:
20-Nov-13Roisin Rowley-Brooke
27-Nov-13Finnian Kelly
04-Dec-13Ailbhe Cullen
11-Dec-13Eoin Gillen
October 02, 2013 by 134.226.86.54 -
Changed lines 10-19 from:
7-Oct-13 
14-Oct-13 
21-Oct-13 
28-Oct-13 
04-Nov-13Reading week
11-Nov-13 
18-Nov-13 
25-Nov-13 
02-Dec-13 
09-Dec-13 
to:
9-Oct-13 
16-Oct-13 
23-Oct-13 
30-Oct-13Ed Lalor
06-Nov-13Reading week
13-Nov-13 
20-Nov-13 
27-Nov-13 
04-Dec-13 
11-Dec-13 
October 02, 2013 by 134.226.86.54 -
Changed lines 10-34 from:
17-Oct-12Ian Kelly
24-Oct-12Andrew Hines, Naomi Harte, Frank Boland
31-Oct-12Francois Pitie
07-Nov-12Reading week
14-Nov-12Roisin Rowley-Brooke
21-Nov-12No talk - Science Gallery Visit
28-Nov-12Finnian Kelly
05-Dec-12Ken Sooknanan
12-Dec-12No Talk Scheduled
23-Jan-13Yun Feng Wang
30-Jan-13David Corrigan
6-Feb-13Marcin Gorzel
16-Apr-13Félix Raimbault
23-Apr-13Kangyu Pan
30-Apr-13No Talk - PHH in use
7-May-13Liam O'Sullivan
14-May-13Ailbhe Cullen
21-May-13Frank Boland
28-May-13Naomi Harte
4-Jun-13Ian Kelly
11-Jun-13No Talk
18-Jun-13No Talk
25-Jun-13Andrew Hines
TBDFinnian Kelly
TBDRoisin Rowley-Brooke
to:
7-Oct-13 
14-Oct-13 
21-Oct-13 
28-Oct-13 
04-Nov-13Reading week
11-Nov-13 
18-Nov-13 
25-Nov-13 
02-Dec-13 
09-Dec-13 
Changed lines 27-29 from:
Speaker Andrew Hines
Title Detailed Analysis of PESQ And VISQOL Behaviour in the Context of Playout Delay Adjustments Introduced by VOIP Jitter Buffer Algorithms
Time & Venue Printing House Hall - 12:00 25-Jun-13
to:
Speaker
Title
Time & Venue Printing House Hall - 12:00 14-Oct-13
Changed line 33 from:

This talk covers a detailed analysis of both PESQ and VISQOL model behaviour when tested against speech samples modified through playout delay adjustments. The adjustments are typical (in extent and magnitude) of those introduced by VoIP jitter buffer algorithms. In particular, the analysis examines the impact of speaker/sentence on MOS scores predicted by both models and seeks to determine whether both models are able to correctly detect and quantify playout delay adjustments and, if so, to also predict the impact on the quality perceived by the user. The results showed speaker voice preference dominating subjective tests more than playout delay duration or location. By design, PESQ and VISQOL do not account for speaker voice differences, which reduces their correlation with the subjective tests. In addition, it was found that PESQ is actually quite good at detecting playout delay adjustments, but the impact of playout delay adjustments on the quality perceived by the user is not well modelled. On the other hand, the VISQOL model is better at predicting the impact of playout delay adjustments on the quality perceived by the user, but there are still some discrepancies in the predicted scores. The reasons for these discrepancies are analysed and discussed.

to:
Deleted line 39:
Changed lines 41-43 from:
Speaker Ian Kelly
Title Detecting arrivals in room impulse responses with dynamic time warping
Time & Venue Printing House Hall - 12:00 4-Jun-13
to:
Deleted lines 43-316:
Abstract

The detection of early reflections in room impulse responses (RIRs) is of importance to many algorithms including room geometry inference, mixing time determination and speech dereverberation. The detection of early reflections can be hampered by increasing pulse width, as the direct sound undergoes reflection, and by overlapping of the reflections as the pulse density grows. We propose the use of Dynamic Time Warping upon a direct sound pulse to better estimate the temporal distribution of arrivals in room impulse responses. Bounded Dynamic Time Warping is performed after an initial correlation of the direct sound with the remaining signal to further refine the arrival’s location and duration and to find arrivals which may otherwise not correlate well with the un-warped direct sound due to a change in the reflection’s shape. Dynamic Time Warping can also be used to help find overlapping reflections which may otherwise go unnoticed. Warping is performed via a set of warp matrices which can be combined together and can also be inverted via a left pseudo-inverse. This pseudo-inverse can be very quickly calculated based upon the properties of the warp matrices and how their transpose can be formed into a non-square orthogonal matrix by the deletion of repeated rows.
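
A small illustration of the underlying idea, using a plain dynamic-programming DTW and peak-normalised windows so that attenuated or mildly time-stretched copies of the direct pulse still match; the bounded warping, warp matrices and pseudo-inverse machinery mentioned in the abstract are not reproduced, and all signal parameters below are invented.

    import numpy as np

    def dtw_distance(a, b):
        # Classic dynamic-programming DTW between two 1-D sequences.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def detect_arrivals(rir, direct, hop=16, rel_thresh=0.3):
        # Slide the direct-sound template over the RIR; windows whose peak-normalised
        # shape is close to the direct pulse in the DTW-warped sense are kept.
        w = len(direct)
        tmpl = direct / np.max(np.abs(direct))
        ref = dtw_distance(tmpl, np.zeros(w))          # mismatch of an empty window
        hits = []
        for start in range(0, len(rir) - w, hop):
            seg = rir[start:start + w]
            peak = np.max(np.abs(seg))
            if peak > 1e-6 and dtw_distance(tmpl, seg / peak) < rel_thresh * ref:
                hits.append(start)
        return hits

    fs = 8000
    t = np.arange(64) / fs
    direct = np.exp(-2000 * t) * np.sin(2 * np.pi * 1500 * t)
    rir = np.zeros(2048)
    rir[100:164] += direct                                                        # direct sound
    rir[700:764] += 0.5 * np.interp(np.arange(64) * 0.9, np.arange(64), direct)   # stretched reflection
    rir[1400:1464] += 0.3 * direct                                                # attenuated reflection
    print("candidate arrival offsets:", detect_arrivals(rir, direct))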



Speaker Dr Naomi Harte
Title DSP – For the Birds!
Time & Venue Printing House Hall - 12:00 28-May-13


Abstract

The songs of birds, like human voices, are important elements of their identity. In ornithology, distinguishing the songs of different populations is as vital as identifying morphological and genetic differences. This talk will explore how DSP and knowledge of speech processing can potentially transform the approach taken by scientists to comparing birdsongs. Using data gathered in Indonesia by the TCD Zoology Dept, the song from different subspecies of Black-naped Orioles and Olive-backed Sunbirds is examined. The song from different island populations is modelled with MFCCs and Gaussian Mixture Models. Analysing the performance of the classifiers on unseen test data can give an indication of song diversity.

These early stage results, which I will present at Interspeech later this Summer, show that a forensic approach to birdsong analysis, inspired by speech processing, may offer invaluable insights into cryptic species diversity as well as song identification at the subspecies level.
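
As a minimal version of the MFCC-plus-GMM recipe described above, assuming the third-party librosa and scikit-learn packages are installed; the toy tone bursts merely stand in for the Indonesian field recordings and the model sizes are arbitrary.

    import numpy as np
    import librosa                                  # assumed available for MFCC extraction
    from sklearn.mixture import GaussianMixture

    def mfcc_features(y, sr=22050):
        # 13 MFCCs per frame, returned as (frames x coefficients).
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    def toy_song(freq, sr=22050, dur=2.0):
        # Stand-in for a field recording: an amplitude-modulated tone per population.
        t = np.arange(int(sr * dur)) / sr
        return np.sin(2 * np.pi * freq * t) * (1.0 + 0.1 * np.sin(2 * np.pi * 3 * t))

    # One GMM per island population, trained on that population's recordings.
    recordings = {"island_A": toy_song(2000.0), "island_B": toy_song(3500.0)}
    models = {name: GaussianMixture(n_components=4, covariance_type="diag",
                                    random_state=0).fit(mfcc_features(y))
              for name, y in recordings.items()}

    # An unseen recording is assigned to the model with the highest average log-likelihood.
    test = toy_song(2050.0)
    scores = {name: m.score(mfcc_features(test)) for name, m in models.items()}
    print(max(scores, key=scores.get), scores)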


Speaker Prof Frank Boland
Title ‘How loud is that?’
Time & Venue Printing House Hall - 12:00 21-May-13


Abstract

There has been a fundamental change to how the loudness of audio is measured. Broadcasters have responded to persistent and growing complaints from consumers about major jumps in audio levels at breaks within and between programmes. In 2010 the European Broadcasting Union introduced recommendations for new audio loudness level measurements to replace outmoded peak measurements. These measurements are now being adopted beyond the EU and in the wider audio-video industries. In this seminar the challenge of defining and measuring audio loudness will be introduced, as will the ‘loudness wars’ of the past decade. The signal processing that forms the new three-part measurement of audio loudness will be explained, and some recent user trials conducted by my audio research group will be presented.
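
For readers who want to try the new measurement in code, an integrated loudness value in the sense of ITU-R BS.1770 / EBU R128 can be computed with the third-party pyloudnorm package (assumed installed); the sine below is only a stand-in for programme audio.

    import numpy as np
    import pyloudnorm as pyln       # assumed third-party BS.1770 / R128 implementation

    rate = 48000
    t = np.arange(rate * 5) / rate
    audio = 0.1 * np.sin(2 * np.pi * 1000.0 * t)    # 5 s test tone, mono

    meter = pyln.Meter(rate)                        # K-weighting and gating handled internally
    print("integrated loudness (LUFS):", meter.integrated_loudness(audio))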



Speaker Ailbhe Cullen
Title Automatic Classification of Affect from Speech
Time & Venue Printing House Hall - 12:00 14-May-13


Abstract

To truly understand speech it is not enough to know the words that were spoken; we must also know how they were spoken. Emotion (or affect) recognition is a multidisciplinary research domain which aims to exploit non-verbal cues to decode this paralinguistic information. This is still a relatively young field, and as such there remains uncertainty about many aspects of the recognition process, from labelling to feature selection to classifier design.

This talk focuses on the acoustic classification of affect from speech using hidden Markov models. A number of feature sets are compared, some of which have never before been used for emotion classification, in an attempt to discern the optimum features and classifier structure for the various affective dimensions. Probabilistic fusion is then used to combine the benefits of each individual classifier.

The effect of a particular type of phonation, known as creaky voice, on affect classification is also explored. Creak is used in the English language to signal certain emotions, and should thus aid classification, but tends to cause problems for automatic feature extraction routines. Finally, some novel features are applied to the classification task in order to better exploit creak.
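
A hedged sketch of the classifier structure described above: one Gaussian-emission HMM per affective class (using the hmmlearn package, assumed installed) and classification by maximum log-likelihood. The 12-dimensional frame features, class names and state count are invented placeholders, not the feature sets compared in the talk.

    import numpy as np
    from hmmlearn import hmm                        # assumed available

    def train_class_hmm(feature_seqs, n_states=3):
        # One HMM per affective class, trained on stacked frame-level feature
        # sequences; `lengths` marks where each utterance starts and ends.
        X = np.vstack(feature_seqs)
        lengths = [len(s) for s in feature_seqs]
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                                n_iter=25, random_state=0)
        return model.fit(X, lengths)

    rng = np.random.default_rng(3)
    high = [rng.normal(1.0, 1.0, size=(60, 12)) for _ in range(5)]    # e.g. high activation
    low = [rng.normal(-1.0, 1.0, size=(60, 12)) for _ in range(5)]    # e.g. low activation
    models = {"high": train_class_hmm(high), "low": train_class_hmm(low)}

    test = rng.normal(1.0, 1.0, size=(60, 12))
    scores = {c: m.score(test) for c, m in models.items()}            # log-likelihood per class
    print(max(scores, key=scores.get), scores)

Probabilistic fusion of several such classifiers then amounts to combining their per-class likelihoods (or posteriors) before taking the maximum.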



Speaker Liam O'Sullivan
Title MorphOSC- A Toolkit for Building Sound Control GUIs with Preset Interpolation in the Processing Development Environment
Time & Venue Printing House Hall - 12:00 7-May-2013


Abstract

MorphOSC is a new toolkit for building graphical user interfaces for the control of sound using morphing between parameter presets. It uses the multidimensional interpolation space paradigm seen in some other systems, but hitherto unavailable as open-source software in the form presented here. The software is delivered as a class library for the Processing Development Environment and is cross-platform for desktop computers and Android mobile devices. This talk positions the new library within the context of similar software, introduces the main features of the initial code release and details future work on the project.
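
One common way to realise such a multidimensional interpolation space is inverse-distance weighting of the presets around a control cursor; the sketch below illustrates only that general idea and is not taken from the MorphOSC code base, and the preset values are hypothetical.

    import numpy as np

    def interpolate_presets(cursor, positions, presets, power=2.0, eps=1e-9):
        # Each preset occupies a point in a 2-D control space; the parameters at
        # the cursor are a blend of all presets, weighted by inverse distance.
        d = np.linalg.norm(positions - cursor, axis=1)
        if np.any(d < eps):                      # cursor sits exactly on a preset
            return presets[np.argmin(d)]
        w = 1.0 / d ** power
        return (w / w.sum()) @ presets

    # Hypothetical presets: rows are (cutoff, resonance, gain) for a synth patch.
    positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
    presets = np.array([[200.0, 0.1, 0.5],
                        [2000.0, 0.7, 0.8],
                        [800.0, 0.4, 0.3]])
    print(interpolate_presets(np.array([0.4, 0.3]), positions, presets))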



Speaker Kangyu Pan
Title Shape Models for Image segmentation in Microscopy
Time & Venue Printing House Hall - 12:00 23-Apr-13


Abstract

This project presents three model-driven segmentation algorithms for extracting different types of microscopic objects based on prior shape information about the objects. The first part presents a novel Gaussian mixture shape modelling algorithm for protein particle detection and analysis in the images delivered from the biological study of memory formation. The new Gaussian mixture model (GMM) approach, with a novel error-based split-and-merge expectation-maximization (eSMEM) algorithm, not only estimates the number of candidate particles in a cluster spot (cluster of particles), but also parametrizes the shape characteristics of the particles for the later co-localization analysis. The second part presents a wavelet-based Bayesian segmentation model to reconstruct the shape of the synapses (where the protein synthesis takes place) from a stack of image slices recorded with a confocal microscope. In order to tackle the problem of irregular luminance of the synapses and the presence of ‘out-of-focus’ synaptic features, the segmentation model incorporates the ‘sharpness’ information of the objects, the global intensity histogram, and the inter-slice intensity behaviour. The last part presents a new active contour model (Cellsnake) for segmenting the overlapped cell/fibre objects in skeletal muscle images. The challenge of the segmentation is the high variation in the shapes of the fibre objects. In order to distinguish the fibres from overlapped objects and segment the candidate cell/fibre from each overlapped object, the outlined algorithm divides each separated region in the image into small ‘cell candidates’. Each ‘cell candidate’ is associated with an active contour (AC) model, and the deformation is constrained by energy terms derived from the shapes of the cells. Finally, the ACs after the deformation are merged when the corresponding ‘cell candidates’ belong to the same fibre, hence segmenting the overlapped fibre/cell objects.


Speaker Félix Raimbault
Title
Time & Venue Printing House Hall - 12:00 16-Apr-13


Abstract




Speaker Marcin Gorzel
Title Optimised real-time rendering of audio in Virtual Auditory Environments
Time & Venue Printing House Hall - 12:00 6-Feb-13


Abstract

This project looks at the problem of the capture or synthesis of acoustic events in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original space, otherwise known as a Virtual Auditory Environment (VAE). Of particular concern is the identification and perceptually correct reconstruction of the important acoustic cues that allow a sound object to be localised anywhere in 3-D space, with a special emphasis on the perception of auditory distance. Such presentations can be realised with the use of both multichannel loudspeaker arrays and headphones. The latter are able to provide a personalised sound field to a single user, minimising the influence of the listening environment and providing a better sense of immersion. However, one of the problems that needs to be addressed is user interaction and how listener movements affect the experience. Such walk-through auralisations present several challenges for production engineers, the most significant of which are the generation of correct room acoustic responses for a given source-listener position and the identification of the most suitable sound reproduction schemes for minimising the computational burden. The current framework considers the parametrisation of real-world sound fields and their subsequent real-time auralisation using a hybrid image source model/measurement-based approach. Two different models are constructed based on existing spaces with significantly different acoustic properties: a middle-sized lecture hall and a large cathedral interior. Various optimisation techniques, including order reduction of Head Related Transfer Functions using approximate factorisation and Room Impulse Response decomposition using directional analysis, are incorporated, and some important aspects of their perceptual impact are investigated extensively by means of subjective listening trials. Lastly, the spatial localisation of sounding objects is affected not only by auditory cues but also by other modalities such as vision. This is particularly true in the context of the perception of distance, where the number of auditory cues is limited in comparison to, e.g., localisation in the horizontal and vertical planes. This work also investigates the influence of vision on the perception of audio. In particular, the effect of incongruent audio-visual cues is explored in the context of the perception of auditory distance in photo-realistic VAEs.
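
As a very reduced sketch of the headphone rendering path, a mono source can be convolved with a left/right HRIR pair and scaled by a coarse 1/r distance gain; the two-tap HRIRs below are hypothetical, and the measurement-based room response and optimisations discussed in the talk are omitted.

    import numpy as np
    from scipy.signal import fftconvolve

    def render_binaural(source, hrir_left, hrir_right, distance=1.0, ref_dist=1.0):
        # Direction comes from the chosen HRIR pair; distance is approximated by a
        # simple 1/r gain (a real VAE would add a room response and air absorption).
        gain = ref_dist / max(distance, ref_dist)
        left = fftconvolve(source, hrir_left) * gain
        right = fftconvolve(source, hrir_right) * gain
        return np.stack([left, right], axis=1)

    rng = np.random.default_rng(4)
    fs = 48000
    source = rng.normal(size=fs)                     # 1 s of noise as a test source
    hrir_l, hrir_r = np.zeros(128), np.zeros(128)    # hypothetical 128-tap HRIRs
    hrir_l[5], hrir_r[25] = 1.0, 0.7                 # interaural delay and level difference
    out = render_binaural(source, hrir_l, hrir_r, distance=2.0)
    print(out.shape)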



Speaker David Corrigan
Title Depth perception of audio sources in stereo 3D environments
Time & Venue Printing House Hall - 12:00 30-Jan-13


Abstract

In this paper we undertook perceptual experiments to determine the range of depth differences between audio and visual stimuli in stereoscopic-3D environments over which the stimuli are still perceived as congruent. We also investigated whether the nature of the environment and stimuli affects the perception of congruence. This was achieved by creating an audio-visual environment consisting of a photorealistic visual environment captured by a camera under orthostereoscopic conditions and a virtual audio environment generated by measuring the acoustic properties of the real environment. The visual environment consisted of a room with a loudspeaker or person forming the visual stimulus and was presented to the viewer using a passive stereoscopic display. Pink noise samples and female speech were used as audio stimuli which were presented over headphones using binaural renderings. The stimuli were generated at different depths from the viewer and the viewer was asked to determine whether the audio stimulus was nearer, further away or at the same depth as the visual stimulus. From our experiments it is shown that there is a significant range of depth differences for which audio and visual stimuli are perceived as congruent. Furthermore, this range increases as the depth of the visual stimulus increases.



Speaker Yun Feng Wang
Title Double-Tip Effect Removal In Atomic Force Microscopy Images
Time & Venue Printing House Hall - 12:00 23-Jan-13


Abstract

The Atomic Force Microscope (AFM) has enabled much progress in nanotechnology by capturing a material surface structure with nanoscale resolution. However, due to the imperfection of its scanning probe, some artefacts are induced during the scanning process. Here, we focus on a new type of artefact in AFM images called the ‘double-tip’ effect. The ‘double-tip’ effect produces a more dramatic form of distortion compared with the traditional blurring artefacts. A novel deblurring framework, based on the Bayesian theorem and a user-interactive method, is developed to remove this effect. The results show that our framework is successful at removing the ‘double-tip’ effect in AFM artefact images and that the details of the sample surface topography are also well preserved.


Speaker Ken Sooknanan
Title Towards Identifying Nephrops Burrows Automatically from Marine Video
Time & Venue Printing House Hall - 12:00 5-Dec-12


Abstract

The Dublin Bay prawn is a commercially significant species of lobster throughout Europe and the UK. To regulate the fishing industry, governing bodies (e.g. The Marine Institute Ireland) carry out yearly underwater surveys to estimate its population. This estimation process mainly involves manually counting individual burrows (or clusters of burrows) of the species from underwater survey videos of the seabed. To improve this currently tedious process we are exploring the possibility of identifying these burrows automatically. In this talk, a brief overview of the segmentation technique (i.e. ICM) that we are using to locate/detect these burrows will be presented.



Speaker Finnian Kelly
Title Eigen-Ageing Compensation for long-term Speaker Verification
Time & Venue Printing House Hall - 12:00 28-Nov-12


Abstract

Vocal ageing causes speaker verification accuracy to worsen as the time lapse between model enrolment and verification increases. In this talk, a new approach to compensate for the ageing effect will be presented. The method is based on learning the dominant changes in speaker models with ageing, and exploiting this at the verification stage. An evaluation of the technique on a recently expanded ageing database of 26 subjects will be presented.
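
A sketch of one plausible reading of this approach, inferred only from the abstract: learn the dominant directions of enrolment-to-later-session drift from development speakers, then project that component out of the test/enrolment mismatch before scoring. The data shapes and the amount of drift below are invented.

    import numpy as np

    def ageing_directions(enrol, aged, k=2):
        # Dominant directions of model drift with ageing, learned from development
        # speakers with both an enrolment and a later-session supervector.
        diffs = aged - enrol                              # (speakers x dims)
        _, _, vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)
        return vt[:k]                                     # top-k "eigen-ageing" directions

    def compensate(test_vec, enrol_vec, directions):
        # Remove the part of the test/enrolment mismatch lying along the ageing directions.
        diff = test_vec - enrol_vec
        return test_vec - directions.T @ (directions @ diff)

    rng = np.random.default_rng(5)
    enrol = rng.normal(size=(26, 100))                    # development enrolment vectors
    ageing_dir = rng.normal(size=100)
    aged = enrol + 0.5 * np.outer(rng.normal(size=26), ageing_dir)
    U = ageing_directions(enrol, aged)

    test = enrol[0] + 0.4 * ageing_dir + 0.01 * rng.normal(size=100)
    print("mismatch before:", np.linalg.norm(test - enrol[0]),
          "after:", np.linalg.norm(compensate(test, enrol[0], U) - enrol[0]))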



Speaker Roisin Rowley-Brooke
Title A non-parametric approach to document bleed-through removal. (aka the ink is greener from the other side..)
Time & Venue Printing House Hall - 12:00 14-Nov-12


Abstract

Ink bleed-through degradation poses one of the most difficult problems in document restoration. It occurs when ink has seeped through from one side of the page and interferes with text on the other side. In this talk I will present recent work on a new framework for bleed-through removal including image preprocessing, region classification based on a segmentation of the 2D recto-verso intensity histogram and connected component analysis, and finally restoration of the degraded regions using exemplar-based image inpainting.



Speaker Francois Pitie and Gary Baugh
Title 2D to 3D Conversion For Animated Movies
Time & Venue AAP 2.0.2 - 12:00 31-Oct-12


Abstract

In this talk we will present our research on developing postproduction tools for converting animated movies to stereoscopic 3D. The key to the stereoscopic 3D conversion is to utilise the depth information, which is generated for free by the animation software, to synthesize novel left and right views of the scene. We will present our results (in 3D) and detail some of the image processing challenges of this approach.
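
A toy illustration of the view-synthesis step, assuming depth-image-based rendering where each pixel is shifted horizontally by its disparity (focal length times baseline over depth); occlusion ordering and hole inpainting, which matter greatly in practice, are left out, and all camera parameters are made up.

    import numpy as np

    def synthesise_view(image, depth, baseline=0.03, focal=800.0, sign=+1):
        # Shift each pixel horizontally by its disparity; nearer pixels move more.
        h, w = depth.shape
        out = np.zeros_like(image)
        disparity = np.round(sign * focal * baseline / depth).astype(int)
        for y in range(h):
            for x in range(w):
                nx = x + disparity[y, x]
                if 0 <= nx < w:
                    out[y, nx] = image[y, x]      # holes stay black; real pipelines inpaint them
        return out

    rng = np.random.default_rng(6)
    image = rng.uniform(size=(120, 160, 3))       # stand-in for a rendered animation frame
    depth = np.full((120, 160), 10.0)             # depth comes free from the animation renderer
    depth[40:80, 60:100] = 2.0                    # a nearer object in the centre
    left = synthesise_view(image, depth, sign=+1)
    right = synthesise_view(image, depth, sign=-1)
    print(left.shape, right.shape)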



Speaker Andrew Hines, Naomi Harte and Frank Boland
Title Sigmedia's European Research Networks: COST actions
Time & Venue Printing House Hall - 12:00 24-Oct-12


Abstract

European Cooperation in Science and Technology (COST) – is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. Sigmedia members are currently representing Ireland and participating in three COST actions. This seminar will give a brief introduction of each of the actions.

- ICT COST Action IC1105 Frank Boland 3D-ConTourNet - 3D Content Creation, Coding and Transmission over Future Media Networks

- ICT COST Action IC1106 Naomi Harte Integrating Biometrics and Forensics for the Digital Age

- ICT COST Action IC1003 Andrew Hines European Network on Quality of Experience in Multimedia Systems and Services (QUALINET)




Speaker Ian Kelly
Title Randomness in Acoustic Impulse Responses and its Effects on Factorization
Time & Venue Printing House Hall - 12:00 17-Oct-12


Abstract

Head-related impulse responses (HRIRs) contain all of the necessary auditory information required for convincing spatial audio reproduction. It was recently proposed that the HRIRs could be factorized via an iterative least squares algorithm to yield a direction independent component and a set of reduced order direction dependent components. However further studies showed this minimization problem to have multiple minima. This short talk will cover my work on determining why multiple solutions occur by exploring the inherent randomness in the responses themselves. Furthermore consideration is given to how this behaviour can be exploited for equalization problems.




October 02, 2013 by 134.226.86.54 -
Changed line 3 from:
to:
June 20, 2013 by 134.226.86.54 -
Changed lines 30-33 from:
11-Jun-13Finnian Kelly
18-Jun-13Roisin Rowley-Brooke
25-Jun-13No Talk
to:
11-Jun-13No Talk
18-Jun-13No Talk
25-Jun-13Andrew Hines
TBDFinnian Kelly
TBDRoisin Rowley-Brooke
Changed lines 40-42 from:
Speaker Dr Naomi Harte
Title DSP – For the Birds!
Time & Venue Printing House Hall - 12:00 28-May-13
to:
Speaker Andrew Hines
Title Detailed Analysis of PESQ And VISQOL Behaviour in the Context of Playout Delay Adjustments Introduced by VOIP Jitter Buffer Algorithms
Time & Venue Printing House Hall - 12:00 25-Jun-13
Changed lines 46-48 from:

The songs of birds, like human voices, are important elements of their identity. In ornithology, distinguishing the songs of different populations is as vital as identifying morphological and genetic differences. This talk will explore how DSP and knowledge of speech processing can potentially transform the approach taken by scientists to comparing birdsongs. Using data gathered in Indonesia by the TCD Zoology Dept, the song from different subspecies of Black-naped Orioles and Olive-backed Sunbirds is examined. The song from different island populations is modelled with MFCCs and Gaussian Mixture Models. Analysing the performance of the classifiers on unseen test data can give an indication of song diversity.

These early stage results, which I will present at Interspeech later this Summer, show that a forensic approach to birdsong analysis, inspired by speech processing, may offer invaluable insights into cryptic species diversity as well as song identification at the subspecies level.

to:

This talk covers a detailed analysis of both PESQ and VISQOL model behaviour when tested against speech samples modified through playout delay adjustments. The adjustments are typical (in extent and magnitude) of those introduced by VoIP jitter buffer algorithms. In particular, the analysis examines the impact of speaker/sentence on MOS scores predicted by both models and seeks to determine whether both models are able to correctly detect and quantify playout delay adjustments and, if so, to also predict the impact on the quality perceived by the user. The results showed speaker voice preference dominating subjective tests more than playout delay duration or location. By design, PESQ and VISQOL do not account for speaker voice differences, which reduces their correlation with the subjective tests. In addition, it was found that PESQ is actually quite good at detecting playout delay adjustments, but the impact of playout delay adjustments on the quality perceived by the user is not well modelled. On the other hand, the VISQOL model is better at predicting the impact of playout delay adjustments on the quality perceived by the user, but there are still some discrepancies in the predicted scores. The reasons for these discrepancies are analysed and discussed.

Added lines 52-75:


Speaker Ian Kelly
Title Detecting arrivals in room impulse responses with dynamic time warping
Time & Venue Printing House Hall - 12:00 4-Jun-13


Abstract

The detection of early reflections in room impulse responses (RIRs) is of importance to many algorithms including room geometry inference, mixing time determination and speech dereverberation. The detection of early reflections can be hampered by increasing pulse width, as the direct sound undergoes reflection, and by overlapping of the reflections as the pulse density grows. We propose the use of Dynamic Time Warping upon a direct sound pulse to better estimate the temporal distribution of arrivals in room impulse responses. Bounded Dynamic Time Warping is performed after an initial correlation of the direct sound with the remaining signal to further refine the arrival’s location and duration and to find arrivals which may otherwise not correlate well with the un-warped direct sound due to a change in the reflection’s shape. Dynamic Time Warping can also be used to help find overlapping reflections which may otherwise go unnoticed. Warping is performed via a set of warp matrices which can be combined together and can also be inverted via a left pseudo-inverse. This pseudo-inverse can be very quickly calculated based upon the properties of the warp matrices and how their transpose can be formed into a non-square orthogonal matrix by the deletion of repeated rows.



Speaker Dr Naomi Harte
Title DSP – For the Birds!
Time & Venue Printing House Hall - 12:00 28-May-13


Abstract

The songs of birds, like human voices, are important elements of their identity. In ornithology, distinguishing the songs of different populations is as vital as identifying morphological and genetic differences. This talk will explore how DSP and knowledge of speech processing can potentially transform the approach taken by scientists to comparing birdsongs. Using data gathered in Indonesia by the TCD Zoology Dept, the song from different subspecies of Black-naped Orioles and Olive-backed Sunbirds is examined. The song from different island populations is modelled with MFCCs and Gaussian Mixture Models. Analysing the performance of the classifiers on unseen test data can give an indication of song diversity.

These early stage results, which I will present at Interspeech later this Summer, show that a forensic approach to birdsong analysis, inspired by speech processing, may offer invaluable insights into cryptic species diversity as well as song identification at the subspecies level.

May 24, 2013 by 134.226.86.54 -
Changed lines 39-41 from:
Speaker Prof Frank Boland
Title ‘How loud is that?’
Time & Venue Printing House Hall - 12:00 21-May-13
to:
Speaker Dr Naomi Harte
Title DSP – For the Birds!
Time & Venue Printing House Hall - 12:00 28-May-13
Changed lines 45-47 from:

There has been a fundamental change to how the loudness of audio is measured. Broadcasters have responded to persistent and growing complaints from consumers about major jumps in audio levels at breaks within and between programmes. In 2010 the European Broadcasting Union introduced recommendations for new audio loudness level measurements to replace outmoded peak measurements. These measurements are now being adopted beyond the EU and in the wider audio-video industries. In this seminar the challenge of defining and measuring audio loudness will be introduced, as will the ‘loudness wars’ of the past decade. The signal processing that forms the new three-part measurement of audio loudness will be explained, and some recent user trials conducted by my audio research group will be presented.

to:

The songs of birds, like human voices, are important elements of their identity. In ornithology, distinguishing the songs of different populations is as vital as identifying morphological and genetic differences. This talk will explore how DSP and knowledge of speech processing can potentially transform the approach taken by scientists to comparing birdsongs. Using data gathered in Indonesia by the TCD Zoology Dept, the song from different subspecies of Black-naped Orioles and Olive-backed Sunbirds is examined. The song from different island populations is modelled with MFCCs and Gaussian Mixture Models. Analysing the performance of the classifiers on unseen test data can give an indication of song diversity.

These early stage results, which I will present at Interspeech later this Summer, show that a forensic approach to birdsong analysis, inspired by speech processing, may offer invaluable insights into cryptic species diversity as well as song identification at the subspecies level.

Added lines 53-64:


Speaker Prof Frank Boland
Title ‘How loud is that?’
Time & Venue Printing House Hall - 12:00 21-May-13


Abstract

There has been a fundamental change to how the loudness of audio is measured. Broadcasters have responded to persistent and growing complaints from consumers about major jumps in audio levels at breaks within and between programmes. In 2010 the European Broadcasting Union introduced recommendations for new audio loudness level measurements to replace outmoded peak measurements. These measurements are now being adopted beyond the EU and in the wider audio-video industries. In this seminar the challenge of defining and measuring audio loudness will be introduced, as will the ‘loudness wars’ of the past decade. The signal processing that forms the new three-part measurement of audio loudness will be explained, and some recent user trials conducted by my audio research group will be presented.


May 16, 2013 by 134.226.86.54 -
Changed lines 39-41 from:
Speaker Ailbhe Cullen
Title Automatic Classification of Affect from Speech
Time & Venue Printing House Hall - 12:00 14-May-13
to:
Speaker Prof Frank Boland
Title ‘How loud is that?’
Time & Venue Printing House Hall - 12:00 21-May-13
Changed lines 45-49 from:

To truly understand speech it is not enough to know the words that were spoken; we must also know how they were spoken. Emotion (or affect) recognition is a multidisciplinary research domain which aims to exploit non-verbal cues to decode this paralinguistic information. This is still a relatively young field, and as such there remains uncertainty about many aspects of the recognition process, from labelling to feature selection to classifier design.

This talk focuses on the acoustic classification of affect from speech using hidden Markov models. A number of feature sets are compared, some of which have never before been used for emotion classification, in an attempt to discern the optimum features and classifier structure for the various affective dimensions. Probabilistic fusion is then used to combine the benefits of each individual classifier.

The effect of a particular type of phonation, known as creaky voice, on affect classification is also explored. Creak is used in the English language to signal certain emotions, and should thus aid classification, but tends to cause problems for automatic feature extraction routines. Finally, some novel features are applied to the classification task in order to better exploit creak.

to:

There has been a fundamental change to how the loudness of audio is measured. Broadcasters have responded to persistent and growing complaints from consumers about major jumps in audio levels at breaks within and between programmes. In 2010 the European Broadcasting Union introduced recommendations for new audio loudness level measurements to replace outmoded peak measurements. These measurements are now being adopted beyond the EU and in the wider audio-video industries. In this seminar the challenge of defining and measuring audio loudness will be introduced, as will the ‘loudness wars’ of the past decade. The signal processing that forms the new three-part measurement of audio loudness will be explained, and some recent user trials conducted by my audio research group will be presented.

Added lines 51-66:


Speaker Ailbhe Cullen
Title Automatic Classification of Affect from Speech
Time & Venue Printing House Hall - 12:00 14-May-13


Abstract

To truly understand speech it is not enough to know the words that were spoken; we must also know how they were spoken. Emotion (or affect) recognition is a multidisciplinary research domain which aims to exploit non-verbal cues to decode this paralinguistic information. This is still a relatively young field, and as such there remains uncertainty about many aspects of the recognition process, from labelling to feature selection to classifier design.

This talk focuses on the acoustic classification of affect from speech using hidden Markov models. A number of feature sets are compared, some of which have never before been used for emotion classification, in an attempt to discern the optimum features and classifier structure for the various affective dimensions. Probabilistic fusion is then used to combine the benefits of each individual classifier.

The effect of a particular type of phonation, known as creaky voice, on affect classification is also explored. Creak is used in the English language to signal certain emotions, and should thus aid classification, but tends to cause problems for automatic feature extraction routines. Finally, some novel features are applied to the classification task in order to better exploit creak.


May 10, 2013 by 134.226.86.54 -
Changed lines 39-41 from:
Speaker Kangyu Pan
Title Shape Models for Image segmentation in Microscopy
Time & Venue Printing House Hall - 12:00 23-Apr-13
to:
Speaker Ailbhe Cullen
Title Automatic Classification of Affect from Speech
Time & Venue Printing House Hall - 12:00 14-May-13
Changed lines 45-56 from:

This project presents three model-driven segmentation algorithms for extracting different types of microscopic objects based on prior shape information about the objects. The first part presents a novel Gaussian mixture shape modelling algorithm for protein particle detection and analysis in the images delivered from the biological study of memory formation. The new Gaussian mixture model (GMM) approach, with a novel error-based split-and-merge expectation-maximization (eSMEM) algorithm, not only estimates the number of candidate particles in a cluster spot (cluster of particles), but also parametrizes the shape characteristics of the particles for the later co-localization analysis. The second part presents a wavelet-based Bayesian segmentation model to reconstruct the shape of the synapses (where the protein synthesis takes place) from a stack of image slices recorded with a confocal microscope. In order to tackle the problem of irregular luminance of the synapses and the presence of ‘out-of-focus’ synaptic features, the segmentation model incorporates the ‘sharpness’ information of the objects, the global intensity histogram, and the inter-slice intensity behaviour. The last part presents a new active contour model (Cellsnake) for segmenting the overlapped cell/fibre objects in skeletal muscle images. The challenge of the segmentation is the high variation in the shapes of the fibre objects. In order to distinguish the fibres from overlapped objects and segment the candidate cell/fibre from each overlapped object, the outlined algorithm divides each separated region in the image into small ‘cell candidates’. Each ‘cell candidate’ is associated with an active contour (AC) model, and the deformation is constrained by energy terms derived from the shapes of the cells. Finally, the ACs after the deformation are merged when the corresponding ‘cell candidates’ belong to the same fibre, hence segmenting the overlapped fibre/cell objects.

to:

To truly understand speech it is not enough to know the words that were spoken; we must also know how they were spoken. Emotion (or affect) recognition is a multidisciplinary research domain which aims to exploit non-verbal cues to decode this paralinguistic information. This is still a relatively young field, and as such there remains uncertainty about many aspects of the recognition process, from labelling to feature selection to classifier design.

This talk focuses on the acoustic classification of affect from speech using hidden Markov models. A number of feature sets are compared, some of which have never before been used for emotion classification, in an attempt to discern the optimum features and classifier structure for the various affective dimensions. Probabilistic fusion is then used to combine the benefits of each individual classifier.

The effect of a particular type of phonation, known as creaky voice, on affect classification is also explored. Creak is used in the English language to signal certain emotions, and should thus aid classification, but tends to cause problems for automatic feature extraction routines. Finally, some novel features are applied to the classification task in order to better exploit creak.

Added lines 55-99:


Speaker Liam O'Sullivan
Title MorphOSC- A Toolkit for Building Sound Control GUIs with Preset Interpolation in the Processing Development Environment
Time & Venue Printing House Hall - 12:00 7-May-2013


Abstract

MorphOSC is a new toolkit for building graphical user interfaces for the control of sound using morphing between parameter presets. It uses the multidimensional interpolation space paradigm seen in some other systems, but hitherto unavailable as open-source software in the form presented here. The software is delivered as a class library for the Processing Development Environment and is cross-platform for desktop computers and Android mobile devices. This talk positions the new library within the context of similar software, introduces the main features of the initial code release and details future work on the project.



Speaker Kangyu Pan
Title Shape Models for Image segmentation in Microscopy
Time & Venue Printing House Hall - 12:00 23-Apr-13


Abstract

This project presents three model-driven segmentation algorithms for extracting different types of microscopic objects based on prior shape information about the objects. The first part presents a novel Gaussian mixture shape modelling algorithm for protein particle detection and analysis in the images delivered from the biological study of memory formation. The new Gaussian mixture model (GMM) approach, with a novel error-based split-and-merge expectation-maximization (eSMEM) algorithm, not only estimates the number of candidate particles in a cluster spot (cluster of particles), but also parametrizes the shape characteristics of the particles for the later co-localization analysis. The second part presents a wavelet-based Bayesian segmentation model to reconstruct the shape of the synapses (where the protein synthesis takes place) from a stack of image slices recorded with a confocal microscope. In order to tackle the problem of irregular luminance of the synapses and the presence of ‘out-of-focus’ synaptic features, the segmentation model incorporates the ‘sharpness’ information of the objects, the global intensity histogram, and the inter-slice intensity behaviour. The last part presents a new active contour model (Cellsnake) for segmenting the overlapped cell/fibre objects in skeletal muscle images. The challenge of the segmentation is the high variation in the shapes of the fibre objects. In order to distinguish the fibres from overlapped objects and segment the candidate cell/fibre from each overlapped object, the outlined algorithm divides each separated region in the image into small ‘cell candidates’. Each ‘cell candidate’ is associated with an active contour (AC) model, and the deformation is constrained by energy terms derived from the shapes of the cells. Finally, the ACs after the deformation are merged when the corresponding ‘cell candidates’ belong to the same fibre, hence segmenting the overlapped fibre/cell objects.

April 19, 2013 by 134.226.86.54 -
Changed lines 39-41 from:
Speaker Félix Raimbault
Title
Time & Venue Printing House Hall - 12:00 9-Apr-13
to:
Speaker Kangyu Pan
Title Shape Models for Image segmentation in Microscopy
Time & Venue Printing House Hall - 12:00 23-Apr-13
Changed lines 45-56 from:
to:

This project presents three model-driven segmentation algorithms for extracting different types of microscopic objects based on prior shape information about the objects. The first part presents a novel Gaussian mixture shape modelling algorithm for protein particle detection and analysis in the images delivered from the biological study of memory formation. The new Gaussian mixture model (GMM) approach, with a novel error-based split-and-merge expectation-maximization (eSMEM) algorithm, not only estimates the number of candidate particles in a cluster spot (cluster of particles), but also parametrizes the shape characteristics of the particles for the later co-localization analysis. The second part presents a wavelet-based Bayesian segmentation model to reconstruct the shape of the synapses (where the protein synthesis takes place) from a stack of image slices recorded with a confocal microscope. In order to tackle the problem of irregular luminance of the synapses and the presence of ‘out-of-focus’ synaptic features, the segmentation model incorporates the ‘sharpness’ information of the objects, the global intensity histogram, and the inter-slice intensity behaviour. The last part presents a new active contour model (Cellsnake) for segmenting the overlapped cell/fibre objects in skeletal muscle images. The challenge of the segmentation is the high variation in the shapes of the fibre objects. In order to distinguish the fibres from overlapped objects and segment the candidate cell/fibre from each overlapped object, the outlined algorithm divides each separated region in the image into small ‘cell candidates’. Each ‘cell candidate’ is associated with an active contour (AC) model, and the deformation is constrained by energy terms derived from the shapes of the cells. Finally, the ACs after the deformation are merged when the corresponding ‘cell candidates’ belong to the same fibre, hence segmenting the overlapped fibre/cell objects.

Added lines 62-74:


Speaker Félix Raimbault
Title
Time & Venue Printing House Hall - 12:00 16-Apr-13


Abstract



April 16, 2013 by 134.226.86.54 -
Deleted line 21:

Talks Move to Tuesdays at 12

Changed lines 24-25 from:
30-Apr-13No Talk
7-May-13Andrew Hines
to:
30-Apr-13No Talk - PHH in use
7-May-13Liam O'Sullivan
April 12, 2013 by 134.226.86.54 -
Changed line 12 from:
31-Oct-12Dr. Francois Pitie
to:
31-Oct-12Francois Pitie
Changed line 20 from:
30-Jan-13Dr. David Corrigan
to:
30-Jan-13David Corrigan
Changed lines 22-33 from:
13-Feb-13 
20-Feb-13 
27-Feb-13No Talk
6-Mar-13No Talk
13-Mar-13 
20-Mar-13 
27-Mar-13No Talk
3-Apr-13 
10-Apr-13 
17-Apr-13 
24-Apr-13 
to:

Talks Move to Tuesdays at 12

16-Apr-13Félix Raimbault
23-Apr-13Kangyu Pan
30-Apr-13No Talk
7-May-13Andrew Hines
14-May-13Ailbhe Cullen
21-May-13Frank Boland
28-May-13Naomi Harte
4-Jun-13Ian Kelly
11-Jun-13Finnian Kelly
18-Jun-13Roisin Rowley-Brooke
25-Jun-13No Talk
Changed lines 40-42 from:
Speaker Marcin Gorzel
Title Optimised real-time rendering of audio in Virtual Auditory Environments
Time & Venue Printing House Hall - 12:00 6-Feb-13
to:
Speaker Félix Raimbault
Title
Time & Venue Printing House Hall - 12:00 9-Apr-13
Added lines 46-59:



Past Talks


Speaker Marcin Gorzel
Title Optimised real-time rendering of audio in Virtual Auditory Environments
Time & Venue Printing House Hall - 12:00 6-Feb-13


Abstract
Deleted lines 112-115:


Past Talks

February 06, 2013 by 134.226.86.54 -
Changed lines 39-41 from:
Speaker David Corrigan
Title Depth perception of audio sources in stereo 3D environments
Time & Venue Printing House Hall - 12:00 30-Jan-13
to:
Speaker Marcin Gorzel
Title Optimised real-time rendering of audio in Virtual Auditory Environments
Time & Venue Printing House Hall - 12:00 6-Feb-13
Changed lines 45-46 from:

In this paper we undertook perceptual experiments to determine the differences in depth between audio and visual stimuli in stereoscopic-3D environments that can be tolerated while the stimuli are still perceived as congruent. We also investigated whether the nature of the environment and stimuli affects the perception of congruence. This was achieved by creating an audio-visual environment consisting of a photorealistic visual environment captured by a camera under orthostereoscopic conditions and a virtual audio environment generated by measuring the acoustic properties of the real environment. The visual environment consisted of a room with a loudspeaker or person forming the visual stimulus and was presented to the viewer using a passive stereoscopic display. Pink noise samples and female speech were used as audio stimuli, which were presented over headphones using binaural renderings. The stimuli were generated at different depths from the viewer and the viewer was asked to determine whether the audio stimulus was nearer, further away or at the same depth as the visual stimulus. Our experiments show that there is a significant range of depth differences for which audio and visual stimuli are perceived as congruent. Furthermore, this range increases as the depth of the visual stimulus increases.

to:

This project looks at the problem of the capture or synthesis of acoustic events in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original space, otherwise known as a Virtual Auditory Environment (VAE). Of particular concern is the identification and perceptually correct reconstruction of the important acoustic cues that allow sound objects to be localised in the whole 3-D space, with a special emphasis on the perception of auditory distance. Such presentations can be realised with both multichannel loudspeaker arrays and headphones. The latter are able to provide a personalised sound field to a single user, minimising the influence of the listening environment and providing a better sense of immersion. However, one of the problems that needs to be addressed is user interaction and how listener movements affect the experience. Such walk-through auralisations present several challenges for production engineers, the most significant of which are the generation of correct room acoustic responses for a given source-listener position and the identification of optimal sound reproduction schemes that can minimise the computational burden. The current framework considers the parametrisation of real-world sound fields and their subsequent real-time auralisation using a hybrid image source model/measurement-based approach. Two different models are constructed based on existing spaces with significantly different acoustic properties: a middle-sized lecture hall and a large cathedral interior. Various optimisation techniques, including order reduction of Head Related Transfer Functions using approximate factorisation and Room Impulse Response decomposition using directional analysis, are incorporated, and some important aspects of their perceptual impact are investigated extensively by means of subjective listening trials. Lastly, the spatial localisation of sounding objects is affected not only by auditory cues but also by other modalities such as vision. This is particularly true in the context of the perception of distance, where the number of auditory cues is limited in comparison to e.g. localisation in the horizontal and vertical planes. This work therefore also investigates the influence of vision on the perception of audio. In particular, the effect of incongruent audio-visual cues is explored in the context of the perception of auditory distance in photo-realistic VAEs.
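The hybrid image-source/measurement pipeline and the HRTF order-reduction techniques are beyond a short note; the basic auralisation step that everything above builds on, convolving a dry source with a room response and an HRIR pair for one source-listener direction, can be sketched as below (all impulse responses here are synthetic placeholders, not measured data).

[@
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(mono, hrir_left, hrir_right, rir=None):
    """Minimal auralisation step: optionally apply a room impulse response,
    then spatialise the (reverberant) signal with an HRIR pair for one
    source-listener direction.  Returns a 2-channel array."""
    sig = fftconvolve(mono, rir)[:len(mono)] if rir is not None else mono
    left = fftconvolve(sig, hrir_left)
    right = fftconvolve(sig, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    return out

# Toy example with synthetic impulse responses (a real system would select
# measured HRIRs/RIRs according to the current listener position).
fs = 48000
mono = np.random.randn(fs)                       # 1 s of noise as the dry source
rir = np.exp(-np.linspace(0, 8, fs // 2)) * np.random.randn(fs // 2)
hrir_l = np.r_[1.0, np.zeros(63)]                # trivial left-ear HRIR
hrir_r = np.r_[np.zeros(8), 0.7, np.zeros(55)]   # delayed, attenuated right ear
stereo = binaural_render(mono, hrir_l, hrir_r, rir)
@]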

Changed lines 103-104 from:

Next Scheduled Talk

to:


Speaker David Corrigan
Title Depth perception of audio sources in stereo 3D environments
Time & Venue Printing House Hall - 12:00 30-Jan-13


Abstract

In this paper we undertook perceptual experiments to determine the differences in depth between audio and visual stimuli in stereoscopic-3D environments that can be tolerated while the stimuli are still perceived as congruent. We also investigated whether the nature of the environment and stimuli affects the perception of congruence. This was achieved by creating an audio-visual environment consisting of a photorealistic visual environment captured by a camera under orthostereoscopic conditions and a virtual audio environment generated by measuring the acoustic properties of the real environment. The visual environment consisted of a room with a loudspeaker or person forming the visual stimulus and was presented to the viewer using a passive stereoscopic display. Pink noise samples and female speech were used as audio stimuli, which were presented over headphones using binaural renderings. The stimuli were generated at different depths from the viewer and the viewer was asked to determine whether the audio stimulus was nearer, further away or at the same depth as the visual stimulus. Our experiments show that there is a significant range of depth differences for which audio and visual stimuli are perceived as congruent. Furthermore, this range increases as the depth of the visual stimulus increases.


January 28, 2013 by 134.226.86.54 -
Changed lines 22-32 from:
to:
13-Feb-13 
20-Feb-13 
27-Feb-13No Talk
6-Mar-13No Talk
13-Mar-13 
20-Mar-13 
27-Mar-13No Talk
3-Apr-13 
10-Apr-13 
17-Apr-13 
24-Apr-13 
January 28, 2013 by 134.226.86.54 -
Changed lines 29-31 from:
Speaker Yun Feng Wang
Title Double-Tip Effect Removal In Atomic Force Microscopy Images
Time & Venue Printing House Hall - 12:00 23-Jan-13
to:
Speaker David Corrigan
Title Depth perception of audio sources in stereo 3D environments
Time & Venue Printing House Hall - 12:00 30-Jan-13
Changed lines 35-36 from:

The Atomic Force Microscope (AFM) has enabled much progress in nanotechnology by capturing a material surface structure with nanoscale resolution. However, due to the imperfection of its scanning probe, some artefacts are induced during the scanning process. Here, we focus on a new type of artefact in AFM images called the ‘double-tip’ effect. The ‘double-tip’ effect produces a more dramatic form of distortion compared with traditional blurring artefacts. A novel deblurring framework, based on Bayes' theorem and an interactive user method, is developed to remove this effect. The results show that our framework is successful at removing the ‘double-tip’ effect in the AFM artefact images, and the details of the sample surface topography are also well preserved.

to:

In this paper we undertook perceptual experiments to determine the differences in depth between audio and visual stimuli in stereoscopic-3D environments that can be tolerated while the stimuli are still perceived as congruent. We also investigated whether the nature of the environment and stimuli affects the perception of congruence. This was achieved by creating an audio-visual environment consisting of a photorealistic visual environment captured by a camera under orthostereoscopic conditions and a virtual audio environment generated by measuring the acoustic properties of the real environment. The visual environment consisted of a room with a loudspeaker or person forming the visual stimulus and was presented to the viewer using a passive stereoscopic display. Pink noise samples and female speech were used as audio stimuli, which were presented over headphones using binaural renderings. The stimuli were generated at different depths from the viewer and the viewer was asked to determine whether the audio stimulus was nearer, further away or at the same depth as the visual stimulus. Our experiments show that there is a significant range of depth differences for which audio and visual stimuli are perceived as congruent. Furthermore, this range increases as the depth of the visual stimulus increases.

Added lines 43-44:

Next Scheduled Talk

Added lines 46-57:
Speaker Yun Feng Wang
Title Double-Tip Effect Removal In Atomic Force Microscopy Images
Time & Venue Printing House Hall - 12:00 23-Jan-13


Abstract

The Atomic Force Microscope (AFM) has enabled much progress in nanotechnology by capturing a material surface structure with nanoscale resolution. However, due to the imperfection of its scanning probe, some artefacts are induced during the scanning process. Here, we focus on a new type of artefact in AFM images called the ‘double-tip’ effect. The ‘double-tip’ effect produces a more dramatic form of distortion compared with traditional blurring artefacts. A novel deblurring framework, based on Bayes' theorem and an interactive user method, is developed to remove this effect. The results show that our framework is successful at removing the ‘double-tip’ effect in the AFM artefact images, and the details of the sample surface topography are also well preserved.


January 17, 2013 by 134.226.86.54 -
Changed lines 19-22 from:
19-Dec-12Yun Feng Wang
16-Jan-13Dr. David Corrigan
to:
23-Jan-13Yun Feng Wang
30-Jan-13Dr. David Corrigan
6-Feb-13Marcin Gorzel
Added lines 28-37:


Speaker Yun Feng Wang
Title Double-Tip Effect Removal In Atomic Force Microscopy Images
Time & Venue Printing House Hall - 12:00 23-Jan-13


Abstract

The Atomic Force Microscope (AFM) has enabled much progress in nanotechnology by capturing a material surface structure with nanoscale resolution. However, due to the imperfection of its scanning probe, some artefacts are induced during the scanning process. Here, we focus on a new type of artefact in AFM images called the ‘double-tip’ effect. The ‘double-tip’ effect produces a more dramatic form of distortion compared with traditional blurring artefacts. A novel deblurring framework, based on Bayes' theorem and an interactive user method, is developed to remove this effect. The results show that our framework is successful at removing the ‘double-tip’ effect in the AFM artefact images, and the details of the sample surface topography are also well preserved.
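The Bayesian, user-interactive framework itself is not described in enough detail here to reproduce. Purely as an illustration of the underlying image model, the double tip can be thought of as a two-peaked point spread function, which an off-the-shelf deconvolution such as Richardson-Lucy can partially undo; the synthetic height map, PSF size and tip offset below are all made up for the sketch.

[@
import numpy as np
from scipy.signal import fftconvolve
from skimage.restoration import richardson_lucy

# Toy AFM-like height map: a few Gaussian bumps on a flat surface.
rng = np.random.default_rng(1)
truth = np.zeros((128, 128))
for y, x in rng.integers(20, 108, size=(5, 2)):
    yy, xx = np.mgrid[0:128, 0:128]
    truth += np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / 30.0)

# 'Double-tip' modelled as a PSF with two peaks: every feature appears twice.
psf = np.zeros((15, 15))
psf[7, 7] = 1.0
psf[7, 12] = 0.6          # second, offset tip apex (assumed offset)
psf /= psf.sum()

observed = fftconvolve(truth, psf, mode='same')
restored = richardson_lucy(observed, psf, num_iter=30, clip=False)
@]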


December 06, 2012 by 134.226.86.54 -
Changed lines 18-19 from:
12-Dec-12Yun Feng Wang
to:
12-Dec-12No Talk Scheduled
19-Dec-12Yun Feng Wang
Added line 27:
Added lines 29-32:

Past Talks


Deleted lines 42-44:


Past Talks

December 03, 2012 by 95.83.204.31 -
Changed line 28 from:
Title ETowards Identifying Nephrops Burrows Automatically from Marine Video
to:
Title Towards Identifying Nephrops Burrows Automatically from Marine Video
December 03, 2012 by 134.226.86.54 -
Changed lines 43-45 from:

Next Scheduled Talk


to:
December 03, 2012 by 134.226.86.54 -
Changed lines 27-29 from:
Speaker Finnian Kelly
Title Eigen-Ageing Compensation for long-term Speaker Verification
Time & Venue Printing House Hall - 12:00 28-Nov-12
to:
Speaker Ken Sooknanan
Title ETowards Identifying Nephrops Burrows Automatically from Marine Video
Time & Venue Printing House Hall - 12:00 5-Dec-12
Changed lines 33-34 from:

Vocal ageing causes speaker verification accuracy to worsen as the time lapse between model enrolment and verification increases. In this talk, a new approach to compensate for the ageing effect will be presented. The method is based on learning the dominant changes in speaker models with ageing, and exploiting this at the verification stage. An evaluation of the technique on a recently expanded ageing database of 26 subjects will be presented.

to:

The Dublin Bay prawn is a commercially significant species of lobster throughout Europe and the UK. To regulate the fishing industry, governing bodies (e.g. the Marine Institute, Ireland) carry out yearly underwater surveys to estimate its population. This estimation process mainly involves manually counting individual burrows (or clusters of burrows) of the species in underwater survey videos of the seabed. To improve on this tedious manual process we are exploring the possibility of identifying these burrows automatically. In this talk, a brief overview of the segmentation technique (i.e. ICM) that we are using to locate/detect these burrows will be presented.
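The talk only names ICM (Iterated Conditional Modes); a minimal two-class ICM over a Potts prior, of the kind that could be applied to a grey-level seabed frame, is sketched below. The class model, weights and sweep count are invented for illustration and are not the authors' settings.

[@
import numpy as np

def icm_binary(img, beta=1.5, n_sweeps=5):
    """Iterated Conditional Modes for two-class segmentation of a grey image
    in [0, 1].  Unary cost = squared distance to the class mean; pairwise
    cost = beta per disagreeing 4-neighbour (a simple Potts prior)."""
    labels = (img > img.mean()).astype(int)          # initial guess
    for _ in range(n_sweeps):
        mu = np.array([img[labels == k].mean() for k in (0, 1)])
        for y in range(img.shape[0]):
            for x in range(img.shape[1]):
                costs = []
                for k in (0, 1):
                    unary = (img[y, x] - mu[k]) ** 2
                    pair = sum(beta
                               for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                               if 0 <= y + dy < img.shape[0]
                               and 0 <= x + dx < img.shape[1]
                               and labels[y + dy, x + dx] != k)
                    costs.append(unary + pair)
                labels[y, x] = int(np.argmin(costs))
    return labels

# Toy usage: a noisy two-level image.
rng = np.random.default_rng(0)
truth = np.zeros((40, 40)); truth[10:30, 10:30] = 1.0
noisy = np.clip(0.3 * truth + 0.35 + 0.1 * rng.standard_normal(truth.shape), 0, 1)
seg = icm_binary(noisy)
@]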

Added lines 40-54:


Next Scheduled Talk


Speaker Finnian Kelly
Title Eigen-Ageing Compensation for long-term Speaker Verification
Time & Venue Printing House Hall - 12:00 28-Nov-12


Abstract

Vocal ageing causes speaker verification accuracy to worsen as the time lapse between model enrolment and verification increases. In this talk, a new approach to compensate for the ageing effect will be presented. The method is based on learning the dominant changes in speaker models with ageing, and exploiting this at the verification stage. An evaluation of the technique on a recently expanded ageing database of 26 subjects will be presented.
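The abstract only says the compensation is based on learning the dominant changes in speaker models with ageing. One plausible reading, and purely an assumption here, is a PCA over differences between enrolment-time and aged speaker-model supervectors, whose leading directions are then projected out before scoring:

[@
import numpy as np
from sklearn.decomposition import PCA

def learn_ageing_subspace(enrol_vecs, aged_vecs, n_dirs=3):
    """PCA over (aged - enrolled) speaker-model supervectors from a
    development set; the leading components approximate the dominant
    ageing directions."""
    pca = PCA(n_components=n_dirs)
    pca.fit(aged_vecs - enrol_vecs)
    return pca.components_                    # shape (n_dirs, dim)

def compensate(test_vec, model_vec, ageing_dirs):
    """Remove the learned ageing subspace from the enrolment/test difference
    before computing a simple distance-based verification score."""
    diff = test_vec - model_vec
    diff = diff - ageing_dirs.T @ (ageing_dirs @ diff)
    return -np.linalg.norm(diff)              # higher = more similar
@]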


November 26, 2012 by 134.226.86.54 -
Changed line 28 from:
Title
to:
Title Eigen-Ageing Compensation for long-term Speaker Verification
Changed line 33 from:
to:

Vocal ageing causes speaker verification accuracy to worsen as the time lapse between model enrolment and verification increases. In this talk, a new approach to compensate for the ageing effect will be presented. The method is based on learning the dominant changes in speaker models with ageing, and exploiting this at the verification stage. An evaluation of the technique on a recently expanded ageing database of 26 subjects will be presented.

November 23, 2012 by 134.226.86.54 -
Changed line 15 from:
21-Nov-12Marcin Gorzel
to:
21-Nov-12No talk - Science Gallery Visit
Changed lines 27-29 from:
Speaker Roisin Rowley-Brooke
Title A non-parametric approach to document bleed-through removal. (aka the ink is greener from the other side..)
Time & Venue Printing House Hall - 12:00 14-Nov-12
to:
Speaker Finnian Kelly
Title
Time & Venue Printing House Hall - 12:00 28-Nov-12
Changed lines 33-34 from:

Ink bleed-through degradation poses one of the most difficult problems in document restoration. It occurs when ink has seeped through from one side of the page and interferes with text on the other side. In this talk I will present recent work on a new framework for bleed-through removal including image preprocessing, region classification based on a segmentation of the 2d recto-verso intensity histogram and connected component analysis, and finally restoration of the degraded regions using exemplar-based image inpainting.

to:
Added lines 40-50:


Speaker Roisin Rowley-Brooke
Title A non-parametric approach to document bleed-through removal. (aka the ink is greener from the other side..)
Time & Venue Printing House Hall - 12:00 14-Nov-12


Abstract

Ink bleed-through degradation poses one of the most difficult problems in document restoration. It occurs when ink has seeped through from one side of the page and interferes with text on the other side. In this talk I will present recent work on a new framework for bleed-through removal including image preprocessing, region classification based on a segmentation of the 2d recto-verso intensity histogram and connected component analysis, and finally restoration of the degraded regions using exemplar-based image inpainting.
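The classification step works on a segmentation of the 2D recto-verso intensity histogram; building that joint histogram from a registered recto/verso pair is straightforward, as sketched below. The per-pixel rule at the end is only a crude placeholder, not the classification described in the talk.

[@
import numpy as np

def recto_verso_histogram(recto, verso, bins=64):
    """Joint intensity histogram of a registered recto/verso page pair.
    Both images are grey-level arrays in [0, 1] of the same shape; the verso
    is assumed already mirrored into recto coordinates."""
    h, xe, ye = np.histogram2d(recto.ravel(), verso.ravel(),
                               bins=bins, range=[[0, 1], [0, 1]])
    return h, xe, ye

def crude_region_map(recto, verso, ink_thresh=0.5):
    """Toy per-pixel classification: 0 = background, 1 = recto text,
    2 = candidate bleed-through (ink on the verso side, none on the recto)."""
    labels = np.zeros(recto.shape, dtype=int)
    labels[recto < ink_thresh] = 1
    labels[(recto >= ink_thresh) & (verso < ink_thresh)] = 2
    return labels
@]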


November 12, 2012 by 134.226.86.54 -
Deleted line 21:
Changed lines 27-29 from:
Speaker Francois Pitie and Gary Baugh
Title 2D to 3D Conversion For Animated Movies
Time & Venue AAP 2.0.2 - 12:00 31-Oct-12
to:
Speaker Roisin Rowley-Brooke
Title A non-parametric approach to document bleed-through removal. (aka the ink is greener from the other side..)
Time & Venue Printing House Hall - 12:00 14-Nov-12
Changed lines 33-35 from:

In this talk we will present our research on developing postproduction tools for converting animated movies to stereoscopic 3D. The key to the stereoscopic 3D conversion is to utilise the depth information, which is generated for free by the animation software, to synthesize novel left and right views of the scene. We will present our results (in 3D) and detail some of the image processing challenges of this approach.

to:

Ink bleed-through degradation poses one of the most difficult problems in document restoration. It occurs when ink has seeped through from one side of the page and interferes with text on the other side. In this talk I will present recent work on a new framework for bleed-through removal including image preprocessing, region classification based on a segmentation of the 2d recto-verso intensity histogram and connected component analysis, and finally restoration of the degraded regions using exemplar-based image inpainting.

Added lines 40-51:


Speaker Francois Pitie and Gary Baugh
Title 2D to 3D Conversion For Animated Movies
Time & Venue AAP 2.0.2 - 12:00 31-Oct-12


Abstract

In this talk we will present our research on developing postproduction tools for converting animated movies to stereoscopic 3D. The key to the stereoscopic 3D conversion is to utilise the depth information, which is generated for free by the animation software, to synthesize novel left and right views of the scene. We will present our results (in 3D) and detail some of the image processing challenges of this approach.
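A bare-bones version of the view-synthesis step, forward-warping pixels horizontally by a disparity derived from the per-pixel depth, is sketched below; real conversion pipelines add hole filling, anti-aliasing and occlusion handling, which is where most of the image processing challenges mentioned above arise. The baseline and focal-length values are arbitrary placeholders.

[@
import numpy as np

def synthesise_view(image, depth, baseline=0.06, focal=1000.0, shift_sign=+1):
    """Forward-warp an RGB frame into a horizontally offset virtual camera.
    disparity = focal * baseline / depth (in pixels); occluded or unfilled
    pixels stay zero and would need inpainting in a real pipeline."""
    h, w = depth.shape
    out = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)
    disparity = focal * baseline / np.maximum(depth, 1e-6)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + shift_sign * disparity[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:   # keep nearest surface
                zbuf[y, xt] = depth[y, x]
                out[y, xt] = image[y, x]
    return out
@]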


October 25, 2012 by 134.226.86.54 -
Changed lines 28-30 from:
Speaker Andrew Hines, Naomi Harte and Frank Boland
Title Sigmedia's European Research Networks: COST actions
Time & Venue Printing House Hall - 12:00 24-Oct-12
to:
Speaker Francois Pitie and Gary Baugh
Title 2D to 3D Conversion For Animated Movies
Time & Venue AAP 2.0.2 - 12:00 31-Oct-12
Changed lines 34-45 from:

European Cooperation in Science and Technology (COST) is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. Sigmedia members are currently representing Ireland and participating in three COST actions. This seminar will give a brief introduction to each of the actions.

- ICT COST Action IC1105 Frank Boland 3D-ConTourNet - 3D Content Creation, Coding and Transmission over Future Media Networks

- ICT COST Action IC1106 Naomi Harte Integrating Biometrics and Forensics for the Digital Age

- ICT COST Action IC1003 Andrew Hines European Network on Quality of Experience in Multimedia Systems and Services (QUALINET)

to:

In this talk we will present our research on developing postproduction tools for converting animated movies to stereoscopic 3D. The key to the stereoscopic 3D conversion is to utilise the depth information, which is generated for free by the animation software, to synthesize novel left and right views of the scene. We will present our results (in 3D) and detail some of the image processing challenges of this approach.

Added lines 42-66:


Speaker Andrew Hines, Naomi Harte and Frank Boland
Title Sigmedia's European Research Networks: COST actions
Time & Venue Printing House Hall - 12:00 24-Oct-12


Abstract

European Cooperation in Science and Technology (COST) is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. Sigmedia members are currently representing Ireland and participating in three COST actions. This seminar will give a brief introduction to each of the actions.

- ICT COST Action IC1105 Frank Boland 3D-ConTourNet - 3D Content Creation, Coding and Transmission over Future Media Networks

- ICT COST Action IC1106 Naomi Harte Integrating Biometrics and Forensics for the Digital Age

- ICT COST Action IC1003 Andrew Hines European Network on Quality of Experience in Multimedia Systems and Services (QUALINET)



October 18, 2012 by 134.226.86.54 -
Changed line 34 from:

COST – European Cooperation in Science and Technology – is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe.

to:

European Cooperation in Science and Technology (COST) is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe.

October 18, 2012 by 134.226.86.54 -
Changed line 37 from:

\paragraph ICT COST Action IC1105 Frank Boland

to:

- ICT COST Action IC1105 Frank Boland

Changed line 40 from:

\paragraph ICT COST Action IC1106 Naomi Harte

to:

- ICT COST Action IC1106 Naomi Harte

Changed line 43 from:

\paragraph ICT COST Action IC1003 Andrew Hines

to:

- ICT COST Action IC1003 Andrew Hines

October 18, 2012 by 134.226.86.54 -
Changed line 11 from:
24-Oct-12TBD
to:
24-Oct-12Andrew Hines, Naomi Harte, Frank Boland
Added lines 26-50:


Speaker Andrew Hines, Naomi Harte and Frank Boland
Title Sigmedia's European Research Networks: COST actions
Time & Venue Printing House Hall - 12:00 24-Oct-12


Abstract

COST – European Cooperation in Science and Technology – is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. Sigmedia members are currently representing Ireland and participating in three COST actions. This seminar will give a brief introduction of each of the actions.

\paragraph ICT COST Action IC1105 Frank Boland 3D-ConTourNet - 3D Content Creation, Coding and Transmission over Future Media Networks

\paragraph ICT COST Action IC1106 Naomi Harte Integrating Biometrics and Forensics for the Digital Age

\paragraph ICT COST Action IC1003 Andrew Hines European Network on Quality of Experience in Multimedia Systems and Services (QUALINET)



Past Talks

October 15, 2012 by 134.226.86.54 -
Deleted lines 22-25:

Next Scheduled Talk

Added lines 24-27:

Next Scheduled Talk


Changed line 39 from:


to:


October 15, 2012 by 134.226.86.54 -
Changed line 29 from:
Title Title
to:
Title Randomness in Acoustic Impulse Responses and its Effects on Factorization
Added lines 34-35:

Head-related impulse responses (HRIRs) contain all of the necessary auditory information required for convincing spatial audio reproduction. It was recently proposed that the HRIRs could be factorized via an iterative least squares algorithm to yield a direction independent component and a set of reduced order direction dependent components. However, further studies showed this minimization problem to have multiple minima. This short talk will cover my work on determining why multiple solutions occur by exploring the inherent randomness in the responses themselves. Furthermore, consideration is given to how this behaviour can be exploited for equalization problems.
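As a hedged illustration of the kind of factorisation being discussed (not the talk's actual algorithm), an alternating least squares split of each HRIR into a shared filter convolved with a short direction-dependent filter can be written as below. Running it from different random seeds typically gives different, similarly good factorisations, which is the multiple-minima behaviour the talk examines; the filter lengths and iteration count are arbitrary.

[@
import numpy as np

def conv_matrix(x, n_cols):
    """Full-convolution matrix C such that C @ y == np.convolve(x, y)
    for any y of length n_cols."""
    C = np.zeros((len(x) + n_cols - 1, n_cols))
    for j in range(n_cols):
        C[j:j + len(x), j] = x
    return C

def factorise_hrirs(H, common_len=64, dir_len=16, n_iter=50, seed=0):
    """Alternating least squares for h_i ~= common * d_i (convolution),
    where H is (n_directions, n_taps)."""
    rng = np.random.default_rng(seed)
    n_dirs, n_taps = H.shape
    assert common_len + dir_len - 1 == n_taps
    c = rng.standard_normal(common_len)              # random init -> different minima
    D = np.zeros((n_dirs, dir_len))
    for _ in range(n_iter):
        Cc = conv_matrix(c, dir_len)
        for i in range(n_dirs):                      # direction-dependent filters
            D[i], *_ = np.linalg.lstsq(Cc, H[i], rcond=None)
        A = np.vstack([conv_matrix(D[i], common_len) for i in range(n_dirs)])
        c, *_ = np.linalg.lstsq(A, H.reshape(-1), rcond=None)
    err = np.linalg.norm(H - np.vstack([np.convolve(c, D[i]) for i in range(n_dirs)]))
    return c, D, err

# Toy usage: random "HRIRs" of 79 taps for 12 directions; two different seeds
# usually converge to different factors with comparable residuals.
H = np.random.default_rng(3).standard_normal((12, 79))
c0, D0, e0 = factorise_hrirs(H, seed=0)
c1, D1, e1 = factorise_hrirs(H, seed=1)
@]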

Deleted line 39:
October 09, 2012 by 89.204.191.141 -
Changed line 10 from:
10-Oct-12Ian Kelly
to:
17-Oct-12Ian Kelly
Changed line 30 from:
Time & Venue Printing House Hall - 12:00 10-Oct-12
to:
Time & Venue Printing House Hall - 12:00 17-Oct-12
October 08, 2012 by 134.226.86.54 -
Changed line 15 from:
21-Nov-12TBD
to:
21-Nov-12Marcin Gorzel
October 08, 2012 by 134.226.86.54 -
Changed lines 3-4 from:

Next Scheduled Talk

to:
Changed lines 7-9 from:
Speaker Dr. Jean-Yves Guillemaut
Title Joint Multi-Layer Segmentation and Reconstruction for 3D-TV Content Production
Time & Venue Printing House Hall - 14:30 31st March 2011
to:
DateSpeaker(s)
10-Oct-12Ian Kelly
24-Oct-12TBD
31-Oct-12Dr. Francois Pitie
07-Nov-12Reading week
14-Nov-12Roisin Rowley-Brooke
21-Nov-12TBD
28-Nov-12Finnian Kelly
05-Dec-12Ken Sooknanan
12-Dec-12Yun Feng Wang
16-Jan-13Dr. David Corrigan

Next Scheduled Talk

Changed lines 28-34 from:
Abstract Current state-of-the-art image-based scene reconstruction techniques are capable of generating high-fidelity 3D models when used under controlled capture conditions. However, they are often inadequate when used in more challenging environments such as outdoor scenes with moving cameras. Algorithms must be able to cope with relatively large calibration and segmentation errors as well as input images separated by a wide-baseline and possibly captured at different resolutions.

In this talk, I will present a technique which, under these challenging conditions, is able to efficiently compute a high-quality scene representation via graph-cut optimisation of an energy function combining multiple image cues. Robustness is achieved by jointly optimising scene segmentation and multiple view reconstruction in a view-dependent manner with respect to each input camera. Joint optimisation prevents propagation of errors from segmentation to reconstruction as is often the case with sequential approaches. View-dependent processing increases tolerance to errors in through-the-lens calibration compared to global approaches.

Experimental results will be presented with a variety of challenging outdoor scenes captured with manually operated broadcast cameras as well as several indoor scenes with natural background. These datasets will be used to evaluate the accuracy of the technique for high quality segmentation and reconstruction and demonstrate its application for 3D-TV content production. Particularly, two main applications will be considered: free-viewpoint video, which gives a user the ability to freely control the viewpoint from which a video is rendered, and 3D video, which augments a conventional 2D video with depth information.

to:
Speaker Ian Kelly
Title Title
Time & Venue Printing House Hall - 12:00 10-Oct-12
Changed line 33 from:
Bio Jean-Yves Guillemaut is a Research Fellow in the Centre for Vision, Speech and Signal Processing, University of Surrey, U.K. His research interests include free-viewpoint video and 3D TV, image/video-based scene reconstruction and rendering, image/video segmentation and matting, camera calibration, and active appearance models for face recognition. Currently, he is working on the i3DLive project, in collaboration with The Foundry and BBC R&D, addressing the use of multiple camera systems for stereo production in film and broadcast. Previously, he worked on the iview project developing computer vision algorithms for 3D reconstruction and free-viewpoint video rendering in sports.
to:
Abstract
Added line 35:
Deleted lines 36-37:
Changed lines 38-166 from:


Previous Talks


23rd March 2011

Speaker Viliam Rapcan
Title Can changes in speech predict cognitive decline?


Abstract The biggest limiting factor to independence in older people is impaired cognitive function. While the population of the world is growing older, the burden on health care providers is increasing. Less expensive and less labour-intensive methods of cognitive function assessment are an active area of research. In this presentation, the use of speech as a biomarker for cognitive function will be presented together with the results of a clinical study of 189 elderly participants, and the results of a pilot study of an automated Interactive Voice Response (IVR) system for remote, fully automated delivery of cognitive function assessment tests.





16th March 2011

Speaker Kangyu Pan
Title CELLSNAKE : A new active contour technique for cell/fibre segmentation


Abstract Active contours are well known for object segmentation and widely adopted in various forms for biological image analysis. Most of the techniques are commonly based on object geometry but overlapping regions cause severe problems to contour propagation. In this paper, we propose a novel active contour technique (“cellsnake”) for solving this problem with an application to cell and fibre segmentation. Given that the transparency of overlapped objects is unavailable, we present a new set of contour forces derived from a-priori knowledge of cell geometry that allows the contour to deform correctly in those regions. We have combined these terms with other existing forces and we show that cellsnake gives appropriate shape estimation of the objects especially in the overlapped area in the observed images.
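The cellsnake force terms derived from cell geometry are specific to this work and are not reproduced here; for context, a plain scikit-image snake run on a synthetic circular "cell", using only the standard elasticity/rigidity and image terms, looks like this (the image, initial circle and parameter values are assumptions):

[@
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour
from skimage.draw import disk

# Synthetic image: one bright, roughly circular "cell".
img = np.zeros((200, 200))
rr, cc = disk((100, 100), 40)
img[rr, cc] = 1.0
img = gaussian(img, sigma=3)

# Initial contour: a circle placed around the candidate region.
theta = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 60 * np.sin(theta), 100 + 60 * np.cos(theta)])

# Standard snake: internal elasticity/rigidity terms (alpha, beta) plus image
# energy; cellsnake would add shape-derived forces to handle overlapped regions.
snake = active_contour(img, init, alpha=0.015, beta=10.0, gamma=0.001)
@]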





2nd March 2011

Speaker Finian Kelly
Title Effects of Ageing on Long-Term Speaker Verification


Abstract The changes that occur in the human voice due to ageing have been well documented. The impact these changes have on speaker verification is unclear however. Given the increasing prevalence of biometric technology, it is important to quantify this impact. This presentation will describe a preliminary investigation into the effect of long-term vocal ageing on a speaker verification system.

On a cohort of 13 adult speakers, using a conventional verification system, longitudinal testing of each speaker is carried out across a 30-40 year range. A progressive degradation in verification score is observed as the time span between the training and test material increases. Above a time span of 5 years, this degradation exceeds the range of normal inter-session variability. The age of the speaker at the time of training is shown to influence the rate at which the verification scores degrade. Our results suggest that the verification score drop-off accelerates for speakers over the age of 60. The implications of these findings for speaker verification will be discussed along with directions of future work.



9th Februaury 2011

Speaker Claire Masterson
Title Binaural Impulse Response Rendering for Immersive Audio


Abstract This talk will cover the main tenets of my PhD work in spatial audio reproduction. This includes a method for the factorisation of datasets of head related impulse responses (HRIRs) using a least squares approach, as well as a number of regularisation strategies to enable more psychoacoustically meaningful, initial-condition-independent results to be obtained for various types of HRIR data. A technique for the spatial interpolation of room impulse responses using dynamic time warping and tail synthesis will also be covered. The incorporation of both techniques into an overall spatial audio system using the virtual loudspeaker approach will be described.



2nd February 2011

Speaker Damien Kelly
Title Voxel-based Viterbi Active Speaker Tracking (V-VAST) with Best View Selection for Video Lecture Post-production


Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the active speaker. This track is determined as that which maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23m. Using this tracking information, the system is applied as a post-production step to segment the most frontal face view of active speakers from the available camera views.
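Only the Viterbi step is sketched here, and in a deliberately generic form: given a per-frame, per-position speech-activity score and a motion constraint, pick the position track that maximises accumulated activity. The activity matrix, position list and step limit are assumed inputs, not the paper's actual voxel analysis.

[@
import numpy as np

def viterbi_track(activity, positions, max_step=0.5):
    """activity: (n_frames, n_positions) speech-activity scores (higher = more
    likely active speaker at that voxel/position); positions: (n_positions, 3)
    coordinates.  Returns the index of the chosen position for each frame."""
    n_frames, n_pos = activity.shape
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    trans = np.where(dist <= max_step, 0.0, -np.inf)   # forbid large jumps
    score = activity[0].copy()
    back = np.zeros((n_frames, n_pos), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + trans                  # (previous, current)
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_pos)] + activity[t]
    track = np.zeros(n_frames, dtype=int)
    track[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        track[t - 1] = back[t, track[t]]
    return track
@]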





26th January 2011

Speaker Luca Cappelletta
Title Improved Visual Features for Audio-visual Speech Recognition


Abstract Automatic Speech Recognition (ASR) is technology that allows a computer to identify the words that a person speaks into an input device (microphone, telephone, etc) by analyzing the audio signal. In the past years the technology has achieved remarkable results, even if state-of-the-art ASR systems lag human speech perception by up to one order of magnitude. A major factor affecting ASR is the signal-to-noise ratio: in a noisy environment, automatic speech recognition suffers a huge loss in performance. However, it has been proved that human speech production is bimodal by its nature. Moreover, hearing-impaired people utilize lipreading in order to improve their speech perception. Thus, it is possible to include visual cues in order to improve ASR. The combination of audio and visual cues forms the so-called Audio-Visual Speech Recognition, or AVSR. The main topic of this research is the video branch of an AVSR system, in particular ‘Region of Interest’ definition and detection, visual feature extraction and, finally, visual-only ASR.
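A common choice for the visual branch, and only one possible reading of "visual feature extraction" here rather than the features actually used in this work, is a 2-D DCT of the grey-level mouth region of interest with the lowest-order coefficients kept as the per-frame feature vector:

[@
import numpy as np
from scipy.fftpack import dct

def mouth_roi_features(roi, n_coeffs=30):
    """2-D DCT of a grey-level mouth ROI; the lowest-frequency coefficients
    (zig-zag order approximated by sorting on the index sum) form the
    per-frame visual feature vector."""
    c = dct(dct(roi, axis=0, norm='ortho'), axis=1, norm='ortho')
    order = np.argsort([i + j for i in range(c.shape[0]) for j in range(c.shape[1])])
    return c.ravel()[order][:n_coeffs]
@]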





19th January 2011

Speaker Felix Raimbault
Title Stereo Video Inpainting


Abstract As the production of stereoscopic content increases, so does the need for post-production tools for that content. Video inpainting has become an important tool for rig removal but there has been little consideration of the problem in stereo. This paper presents an algorithm for stereo video inpainting that builds on existing exemplar-based video completion and also considers the issues of view consistency. Given user-selected regions in the sequence, which may be in the same location in several frames and in both views, the objective is to fill in this area using all the available picture information. Existing algorithms lack temporal consistency, causing flickering and other artefacts. This paper explores the use of long-term picture information across many frames in order to achieve temporal consistency at the same time as exploiting inter-view dependencies within the same framework.





14th December 2010

Speaker Andrew Hines
Title Speech Intelligibility Prediction using a Simulated Performance Intensity Function


Abstract Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. The Performance Intensity Function is a standard listener test that evaluates a test subject’s phoneme discrimination performance over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.





7th December 2010

Speaker Mohamed Ahmed
Title Reflection Detection in Image Sequences


Abstract/Details

Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques, e.g. motion estimation and object recognition, to fail because they assume the presence of only one layer at each examined site. Reflections can arise by mixing any two images and hence detecting them automatically remains a hard problem that has not been addressed before. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. We generate sparse and dense detection maps and our results show a high detection rate while rejecting pathological motion, occlusion, and motion blur.



12th October 2010

Speaker Bruno Nicoletti
Title Developing VFX for Film and Video on GPUs


Abstract/Details

In the visual effects world, London-based award-winning firm The Foundry is renowned for its software. Bruno Nicoletti, founder and CTO of The Foundry, speed-talked through a tour of the company’s tools and software, demonstrating to an audience with a healthy population of VFX artists and developers how GPUs are changing the industry in “Developing GPU-Enabled Visual Effects for Film and Video.”

Foundry technology has been used in a host of blockbusters, such as Avatar, Harry Potter, The Dark Knight and many, many others, and its Nuke compositing software has been used for everything from the fantastic (CGI castles) to the mundane (complexion correction).

As a leader in the industry, Nicoletti has an invaluable perspective on the changes that GPUs are making in VFX. GPUs are reducing rendering times and allowing VFX to be involved more pervasively in all stages of production, in effect blurring the line between post production and production.

The popularity of utilizing the power of GPUs in the visual effects (VFX) industry continues to gain momentum. Major film production studios that historically have been CPU-based for VFX are not only utilizing GPUs, they are starting to replace their CPU-based rendering systems with GPU-based ones.

This transition to GPU in VFX, however, requires some legwork, particularly when it comes to the complex image processing algorithms in VFX software. This (along with The Foundry’s solution) was the subject of the second half of Nicoletti’s talk.

With hundreds of effects and millions of lines of code in its software, The Foundry was faced with having to rewrite everything to exploit GPUs while maintaining separate algorithms for CPUs. Faced with the prospect of writing and debugging two sets of complex algorithms, The Foundry created something they’re calling Blink (although Nicoletti used its internal code name of RIP, or “Righteous Image Processing”).

Blink wraps image processing up into a high level C++ API. It lets programmers run kernels on the CPU for debugging, and then those kernels can be translated to spit out GPU CUDA. Nicoletti showed several coding examples and wrapped by showing examples of a motion estimation function run on an Intel Xeon 5504 versus an NVIDIA Quadro 5000. The speed difference was extraordinary (from 5fps to more than 200fps), which augurs for increased demand for VFX on GPU – and Blink.

Bio

Bruno Nicoletti has worked in visual effects since graduating with a degree in Computer Science and Mathematics from Sydney University in 1987. He has worked at production companies, creating visual effects for broadcast and film, as well as at commercial software companies, developing software to sell into visual effects companies. In his career he has developed 2D image processing software and 3D animation, rendering and modelling tools, often before any equivalent tools were commercially available. In 1996 he started The Foundry to develop visual effects plug-ins and oversaw its initial growth. The Foundry now develops and sells a range of applications and plugins for VFX which are used in many feature films and TV programmes. Now CTO, he acts as senior engineer at the company and is overseeing the effort to move The Foundry's software to a new image processing framework that can exploit CPUs and GPUs to yield dramatic speed improvements.

Upcoming Speakers


DateSpeaker(s)
7th FebruaryClaire Masterson
to:
October 08, 2012 by 134.226.86.54 -
October 08, 2012 by 134.226.86.54 -
May 02, 2012 by 134.226.86.54 -
Deleted lines 1-23:

(:table width=100%:) (:cell valign=middle:)

(:cell valign=middle:)

(:cell valign=middle:)

(:cell valign=middle:)

(:cellnr valign=middle:) Original (:cell valign=middle:) Target Palette (:cell valign=middle:) Results (:cell valign=middle:) Results (:tableend:)

May 02, 2012 by 134.226.86.54 -
Changed line 7 from:
to:
Changed line 9 from:
to:
Changed line 11 from:
to:
Changed line 13 from:
to:
May 02, 2012 by 134.226.86.54 -
Changed line 7 from:
to:
Changed line 9 from:
to:
Changed line 11 from:
to:
May 02, 2012 by 134.226.86.54 -
Changed line 7 from:
to:
Changed line 9 from:
to:
Changed line 11 from:
to:
May 02, 2012 by 134.226.86.54 -
Changed line 7 from:
to:
Changed line 9 from:
to:
Changed line 11 from:
to:
Changed line 13 from:
to:
May 02, 2012 by 134.226.86.54 -
Changed line 5 from:

(:table align=center:)

to:

(:table width=100%:)

May 02, 2012 by 134.226.86.54 -
Changed line 7 from:
to:
Changed line 9 from:
to:
Changed line 11 from:
to:
Changed line 13 from:
to:
May 02, 2012 by 134.226.86.54 -
Added lines 2-24:

(:table align=center:) (:cell valign=middle:) Attach::original_mountain.jpg (:cell valign=middle:) Attach::traget_plain.jpg (:cell valign=middle:) Attach::result_mountain.jpg (:cell valign=middle:) Attach::result_mountain.jpg (:cellnr valign=middle:) Original (:cell valign=middle:) Target Palette (:cell valign=middle:) Results (:cell valign=middle:) Results (:tableend:)

March 28, 2011 by 134.226.86.54 -
Changed line 6 from:
Speaker Jean-Yves Guillemaut
to:
Speaker Dr. Jean-Yves Guillemaut
March 28, 2011 by 134.226.86.54 -
Changed lines 7-8 from:
Title TBC
Time & Venue Printing House Hall - time tbc 31st March 2011
to:
Title Joint Multi-Layer Segmentation and Reconstruction for 3D-TV Content Production
Time & Venue Printing House Hall - 14:30 31st March 2011
Changed lines 11-16 from:
Abstract TBC
to:
Abstract Current state-of-the-art image-based scene reconstruction techniques are capable of generating high-fidelity 3D models when used under controlled capture conditions. However, they are often inadequate when used in more challenging environments such as outdoor scenes with moving cameras. Algorithms must be able to cope with relatively large calibration and segmentation errors as well as input images separated by a wide-baseline and possibly captured at different resolutions.

In this talk, I will present a technique which, under these challenging conditions, is able to efficiently compute a high-quality scene representation via graph-cut optimisation of an energy function combining multiple image cues. Robustness is achieved by jointly optimising scene segmentation and multiple view reconstruction in a view-dependent manner with respect to each input camera. Joint optimisation prevents propagation of errors from segmentation to reconstruction as is often the case with sequential approaches. View-dependent processing increases tolerance to errors in through-the-lens calibration compared to global approaches.

Experimental results will be presented with a variety of challenging outdoor scenes captured with manually operated broadcast cameras as well as several indoor scenes with natural background. These datasets will be used to evaluate the accuracy of the technique for high quality segmentation and reconstruction and demonstrate its application for 3D-TV content production. Particularly, two main applications will be considered: free-viewpoint video, which gives a user the ability to freely control the viewpoint from which a video is rendered, and 3D video, which augments a conventional 2D video with depth information.

Changed line 19 from:
Bio TBC
to:
Bio Jean-Yves Guillemaut is a Research Fellow in the Centre for Vision, Speech and Signal Processing, University of Surrey, U.K. His research interests include free-viewpoint video and 3D TV, image/video-based scene reconstruction and rendering, image/video segmentation and matting, camera calibration, and active appearance models for face recognition. Currently, he is working on the i3DLive project, in collaboration with The Foundry and BBC R&D, addressing the use of multiple camera systems for stereo production in film and broadcast. Previously, he worked on the iview project developing computer vision algorithms for 3D reconstruction and free-viewpoint video rendering in sports.
March 24, 2011 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Kangyu Pan
Title CELLSNAKE : A new active contour technique for cell/fibre segmentation
Time & Venue Printing House Hall - 11:30am 16th March 2011
to:
Speaker Jean-Yves Guillemaut
Title TBC
Time & Venue Printing House Hall - time tbc 31st March 2011
Changed line 11 from:
Abstract Active contours are well known for object segmentation and widely adopted in various forms for biological image analysis. Most of the techniques are commonly based on object geometry but overlapping regions cause severe problems to contour propagation. In this paper, we propose a novel active contour technique (“cellsnake”) for solving this problem with an application to cell and fibre segmentation. Given that the transparency of overlapped objects is unavailable, we present a new set of contour forces derived from a-priori knowledge of cell geometry that allows the contour to deform correctly in those regions. We have combined these terms with other existing forces and we show that cellsnake gives appropriate shape estimation of the objects especially in the overlapped area in the observed images.
to:
Abstract TBC
Added lines 13-14:
Bio TBC
Deleted lines 15-16:
Added lines 17-18:
Added line 20:


Deleted line 21:


March 24, 2011 by 134.226.86.54 -
Added lines 20-39:


23rd March 2011

Speaker Viliam Rapcan
Title Can changes in speech predict cognitive decline?


Abstract The biggest limiting factor to independence in older people is impaired cognitive function. While the population of the world is growing older, the burden on health care providers is increasing. Less expensive and less labour-intensive methods of cognitive function assessment are an active area of research. In this presentation, the use of speech as a biomarker for cognitive function will be presented together with the results of a clinical study of 189 elderly participants, and the results of a pilot study of an automated Interactive Voice Response (IVR) system for remote, fully automated delivery of cognitive function assessment tests.





16th March 2011

Speaker Kangyu Pan
Title CELLSNAKE : A new active contour technique for cell/fibre segmentation


Abstract Active contours are well known for object segmentation and widely adopted in various forms for biological image analysis. Most of the techniques are commonly based on object geometry but overlapping regions cause severe problems to contour propagation. In this paper, we propose a novel active contour technique (“cellsnake”) for solving this problem with an application to cell and fibre segmentation. Given that the transparency of overlapped objects is unavailable, we present a new set of contour forces derived from a-priori knowledge of cell geometry that allows the contour to deform correctly in those regions. We have combined these terms with other existing forces and we show that cellsnake gives appropriate shape estimation of the objects especially in the overlapped area in the observed images.




March 14, 2011 by 134.226.86.54 -
Changed line 6 from:
Speaker Finian Kelly
to:
Speaker Kangyu Pan
March 14, 2011 by 134.226.86.54 -
Changed lines 11-13 from:
Abstract The changes that occur in the human voice due to ageing have been well documented. The impact these changes have on speaker verification is unclear however. Given the increasing prevalence of biometric technology, it is important to quantify this impact. This presentation will describe a preliminary investigation into the effect of long-term vocal ageing on a speaker verification system.

On a cohort of 13 adult speakers, using a conventional verification system, longitudinal testing of each speaker is carried out across a 30-40 year range. A progressive degradation in verification score is observed as the time span between the training and test material increases. Above a time span of 5 years, this degradation exceeds the range of normal inter-session variability. The age of the speaker at the time of training is shown to influence the rate at which the verification scores degrade. Our results suggest that the verification score drop-off accelerates for speakers over the age of 60. The implications of these findings for speaker verification will be discussed along with directions of future work.

to:
Abstract Active contours are well known for object segmentation and widely adopted in various forms for biological image analysis. Most of the techniques are commonly based on object geometry but overlapping regions cause severe problems to contour propagation. In this paper, we propose a novel active contour technique (“cellsnake”) for solving this problem with an application to cell and fibre segmentation. Given that the transparency of overlapped objects is unavailable, we present a new set of contour forces derived from a-priori knowledge of cell geometry that allows the contour to deform correctly in those regions. We have combined these terms with other existing forces and we show that cellsnake gives appropriate shape estimation of the objects especially in the overlapped area in the observed images.
March 14, 2011 by 134.226.86.54 -
Changed lines 7-8 from:
Title Effects of Ageing on Long-Term Speaker Verification
Time & Venue AAP 3.19 - 11:30am 2nd March 2011
to:
Title CELLSNAKE : A new active contour technique for cell/fibre segmentation
Time & Venue Printing House Hall - 11:30am 16th March 2011
Added lines 22-33:


2nd March 2011

Speaker Finian Kelly
Title Effects of Ageing on Long-Term Speaker Verification


Abstract The changes that occur in the human voice due to ageing have been well documented. The impact these changes have on speaker verification is unclear however. Given the increasing prevalence of biometric technology, it is important to quantify this impact. This presentation will describe a preliminary investigation into the effect of long-term vocal ageing on a speaker verification system.

On a cohort of 13 adult speakers, using a conventional verification system, longitudinal testing of each speaker is carried out across a 30-40 year range. A progressive degradation in verification score is observed as the time span between the training and test material increases. Above a time span of 5 years, this degradation exceeds the range of normal inter-session variability. The age of the speaker at the time of training is shown to influence the rate at which the verification scores degrade. Our results suggest that the verification score drop-off accelerates for speakers over the age of 60. The implications of these findings for speaker verification will be discussed along with directions of future work.


March 02, 2011 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Claire Masterson
Title Binaural Impulse Response Rendering for Immersive Audio
Time & Venue Printing House Hall - 11:30am 2nd February 2011
to:
Speaker Finian Kelly
Title Effects of Ageing on Long-Term Speaker Verification
Time & Venue AAP 3.19 - 11:30am 2nd March 2011
Changed lines 11-12 from:
Abstract This talk will cover the main tenets of my PhD work in spatial audio

reproduction. This includes a method for the factorisation of datasets of head related impulse responses (HRIRs) using a least squares approach as well as a number of regularisation strategies to enable for more psychoacoustically meaningful, initial-condition independent results to be obtained for various types of HRIR data. A technique for the spatial interpolation of room impulse responses using dynamic time warping and tail synthesis will also be covered. The incorporation of both techniques into an overall spatial audio system using the virtual loudspeaker approach will be described.

to:
Abstract The changes that occur in the human voice due to ageing have been well documented. The impact these changes have on speaker verification is unclear however. Given the increasing prevalence of biometric technology, it is important to quantify this impact. This presentation will describe a preliminary investigation into the effect of long-term vocal ageing on a speaker verification system.

On a cohort of 13 adult speakers, using a conventional verification system, longitudinal testing of each speaker is carried out across a 30-40 year range. A progressive degradation in verification score is observed as the time span between the training and test material increases. Above a time span of 5 years, this degradation exceeds the range of normal inter-session variability. The age of the speaker at the time of training is shown to influence the rate at which the verification scores degrade. Our results suggest that the verification score drop-off accelerates for speakers over the age of 60. The implications of these findings for speaker verification will be discussed along with directions of future work.

Changed lines 22-24 from:

2nd January 2011

Speaker Damien Kelly
Title Voxel-based Viterbi Active Speaker Tracking (V-VAST) with Best View Selection for Video Lecture Post-production
to:


9th Februaury 2011

Speaker Claire Masterson
Title Binaural Impulse Response Rendering for Immersive Audio


Abstract This talk will cover the main tenets of my PhD work in spatial audio reproduction. This includes a method for the factorisation of datasets of head related impulse responses (HRIRs) using a least squares approach, as well as a number of regularisation strategies to enable more psychoacoustically meaningful, initial-condition-independent results to be obtained for various types of HRIR data. A technique for the spatial interpolation of room impulse responses using dynamic time warping and tail synthesis will also be covered. The incorporation of both techniques into an overall spatial audio system using the virtual loudspeaker approach will be described.



2nd February 2011

February 07, 2011 by 134.226.86.54 -
Changed lines 26-30 from:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the

active speaker. This track is determined as that which maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23m. Using this tracking information, the system is applied as a post-production step to segment the most frontal face view of active speakers from the available camera views.

to:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the active speaker. This track is determined as that which maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23m. Using this tracking information, the system is applied as a post-production step to segment the most frontal face view of active speakers from the available camera views.
February 07, 2011 by 134.226.86.54 -
Changed lines 26-27 from:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech

activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the

to:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the
February 07, 2011 by 134.226.86.54 -
Changed lines 6-7 from:
Speaker Damien Kelly
Title Voxel-based Viterbi Active Speaker Tracking (V-VAST) with Best View Selection for Video Lecture Post-production
to:
Speaker Claire Masterson
Title Binaural Impulse Response Rendering for Immersive Audio
Changed lines 11-17 from:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based

analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the active speaker. This track is determined as that which maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23m. Using this tracking information, the system is applied as a post-production step to segment the most frontal face view of active speakers from the available camera views.

to:
Abstract This talk will cover the main tenets of my PhD work in spatial audio reproduction. This includes a method for the factorisation of datasets of head related impulse responses (HRIRs) using a least squares approach, as well as a number of regularisation strategies to enable more psychoacoustically meaningful, initial-condition-independent results to be obtained for various types of HRIR data. A technique for the spatial interpolation of room impulse responses using dynamic time warping and tail synthesis will also be covered. The incorporation of both techniques into an overall spatial audio system using the virtual loudspeaker approach will be described.

Changed lines 26-27 from:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based

analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech

to:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech
Changed line 108 from:
2nd February - Damien Kelly
to:
7th February - Claire Masterson
February 07, 2011 by 134.226.86.54 -
Changed lines 11-12 from:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video

containing a best view summary of active speakers. The system uses skin color detection and voxel-based

to:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based
Added lines 25-40:


2nd February 2011

Speaker Damien Kelly
Title Voxel-based Viterbi Active Speaker Tracking (V-VAST) with Best View Selection for Video Lecture Post-production


Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video containing a best view summary of active speakers. The system uses skin color detection and voxel-based

analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the active speaker. This track is determined as that which maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23m. Using this tracking information, the system is applied as a post-production step to segment the most frontal face view of active speakers from the available camera views.


January 31, 2011 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Luca Cappelletta
Title Improved Visual Features for Audio-visual Speech Recognition
Time & Venue Printing House Hall - 11:30am 26th January 2011
to:
Speaker Damien Kelly
Title Voxel-based Viterbi Active Speaker Tracking (V-VAST) with Best View Selection for Video Lecture Post-production
Time & Venue Printing House Hall - 11:30am 2nd February 2011
Changed lines 11-18 from:
Abstract Automatic Speech Recognition (ASR) is technology that allows a computer to identify the words that a person speaks into an input device (microphone, telephone, etc.) by analyzing the audio signal. In recent years the technology has achieved remarkable results, even if state-of-the-art ASR systems lag human speech perception by up to one order of magnitude. A major factor affecting ASR is the signal-to-noise ratio: in a noisy environment, automatic speech recognition suffers a huge loss in performance. However, it has been proven that human speech production is bimodal by nature. Moreover, hearing-impaired people utilize lipreading in order to improve their speech perception. Thus, it is possible to include visual cues in order to improve ASR. The combination of audio and visual cues forms the so-called Audio-Visual Speech Recognition, or AVSR. The main topic of this research is the video branch of an AVSR system, particularly 'Region of Interest' definition and detection, visual feature extraction and finally visual-only ASR.
to:
Abstract An automated system is presented for reducing a multi-view lecture recording into a single view video

containing a best view summary of active speakers. The system uses skin color detection and voxel-based analysis in locating likely speaker locations. Using time-delay estimates from multiple microphones, speech activity is analyzed for each speaker position. The Viterbi algorithm is then used to estimate a track of the active speaker. This track is determined as that which maximizes the observed speech activity. This novel approach is termed Voxel-based Viterbi Active Speaker Tracking (V-VAST) and is shown to track speakers with an accuracy of 0.23m. Using this tracking information, the system is applied as a post-production step to segment the most frontal face view of active speakers from the available camera views.

Added lines 26-35:


26th January 2011

Speaker Luca Cappelletta
Title Improved Visual Features for Audio-visual Speech Recognition


Abstract Automatic Speech Recognition (ASR) is technology that allows a computer to identify the words that a person speaks into an input device (microphone, telephone, etc.) by analyzing the audio signal. In recent years the technology has achieved remarkable results, even if state-of-the-art ASR systems lag human speech perception by up to one order of magnitude. A major factor affecting ASR is the signal-to-noise ratio: in a noisy environment, automatic speech recognition suffers a huge loss in performance. However, it has been proven that human speech production is bimodal by nature. Moreover, hearing-impaired people utilize lipreading in order to improve their speech perception. Thus, it is possible to include visual cues in order to improve ASR. The combination of audio and visual cues forms the so-called Audio-Visual Speech Recognition, or AVSR. The main topic of this research is the video branch of an AVSR system, particularly 'Region of Interest' definition and detection, visual feature extraction and finally visual-only ASR.
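
One common choice of visual feature for such a video branch is the low-frequency 2-D DCT of a mouth region of interest; the sketch below illustrates only that generic choice and is not necessarily the feature set used in this work, and the ROI coordinates and frame size are dummy assumptions.

    import numpy as np

    def dct_matrix(n):
        """Orthonormal DCT-II matrix of size n x n."""
        k = np.arange(n)[:, None]
        x = np.arange(n)[None, :]
        C = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
        C[0] /= np.sqrt(2.0)
        return C

    def roi_dct_features(frame, roi, n_coeffs=30):
        """frame: 2-D grayscale image; roi: (top, left, height, width).
        Returns the first DCT coefficients in raster order (a zig-zag scan is common in practice)."""
        t, l, h, w = roi
        patch = frame[t:t + h, l:l + w].astype(float)
        D = dct_matrix(h) @ patch @ dct_matrix(w).T       # separable 2-D DCT of the mouth ROI
        return D.ravel()[:n_coeffs]

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        frame = rng.random((120, 160))                    # dummy grayscale frame
        print(roi_dct_features(frame, (60, 70, 32, 48)).shape)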




January 25, 2011 by 134.226.86.54 -
Changed lines 82-83 from:
7th December - Mohamed
14th December - Andrew Hines
to:
2nd February - Damien Kelly
January 24, 2011 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Felix Raimbault
Title Stereo Video Inpainting
Time & Venue CAD Lab - 2:00pm 19th January 2011
to:
Speaker Luca Cappelletta
Title Improved Visual Features for Audio-visual Speech Recognition
Time & Venue Printing House Hall - 11:30am 26th January 2011
Changed line 11 from:
Abstract As the production of stereoscopic content increases, so does the need for post-production tools for that content. Video inpainting has become an important tool for rig removal but there has been little consideration of the problem in stereo. This paper presents an algorithm for stereo video inpainting that builds on existing exemplar-based video completion and also considers the issues of view consistency. Given user-selected regions in the sequence which may be in the same location in several frames and in both views, the objective is to fill in this area using all the available picture information. Existing algorithms lack temporal consistency, causing flickering and other artefacts. This paper explores the use of long-term picture information across many frames in order to achieve temporal consistency at the same time as exploiting inter-view dependencies within the same framework.
to:
Abstract Automatic Speech Recognition (ASR) is technology that allows a computer to identify the words that a person speaks into an input device (microphone, telephone, etc.) by analyzing the audio signal. In recent years the technology has achieved remarkable results, even if state-of-the-art ASR systems lag human speech perception by up to one order of magnitude. A major factor affecting ASR is the signal-to-noise ratio: in a noisy environment, automatic speech recognition suffers a huge loss in performance. However, it has been proven that human speech production is bimodal by nature. Moreover, hearing-impaired people utilize lipreading in order to improve their speech perception. Thus, it is possible to include visual cues in order to improve ASR. The combination of audio and visual cues forms the so-called Audio-Visual Speech Recognition, or AVSR. The main topic of this research is the video branch of an AVSR system, particularly 'Region of Interest' definition and detection, visual feature extraction and finally visual-only ASR.
Deleted lines 12-14:
Added lines 14-15:
Added line 17:


Changed lines 21-23 from:
Speaker Luca Cappelletta
Title Improved Visual Features for Audio-visual Speech Recognition
Time & Venue Printing House Hall - 11:30am 26th January 2011
to:
Speaker Felix Raimbault
Title Stereo Video Inpainting
Changed lines 25-26 from:
Abstract Automatic Speech Recognition (ASR) is technology that allows a computer to identify the words that a person speaks into an input device (microphone, telephone, etc.) by analyzing the audio signal. In recent years the technology has achieved remarkable results, even if

state-of-the-art ASR systems lag human speech perception by up to one order of magnitude. A major factor affecting ASR is the signal-to-noise ratio: in a noisy environment, automatic speech recognition suffers a huge loss in performance. However, it has been proven that human speech production is bimodal by nature. Moreover, hearing-impaired people utilize lipreading in order to improve their speech perception. Thus, it is possible to include visual cues in order to improve ASR. The combination of audio and visual cues forms the so-called Audio-Visual Speech Recognition, or AVSR. The main topic of this research is the video branch of an AVSR system, particularly 'Region of Interest' definition and detection, visual feature extraction and finally visual-only ASR.

to:
Abstract As the production of stereoscopic content increases, so does the need for post-production tools for that content. Video inpainting has become an important tool for rig removal but there has been little consideration of the problem in stereo. This paper presents an algorithm for stereo video inpainting that builds on existing exemplar-based video completion and also considers the issues of view consistency. Given user-selected regions in the sequence which may be in the same location in several frames and in both views, the objective is to fill in this area using all the available picture information. Existing algorithms lack temporal consistency, causing flickering and other artefacts. This paper explores the use of long-term picture information across many frames in order to achieve temporal consistency at the same time as exploiting inter-view dependencies within the same framework.
January 24, 2011 by 134.226.86.54 -
Added lines 39-40:



January 24, 2011 by 134.226.86.54 -
Deleted line 11:
Added lines 19-30:


19th January 2011

Speaker Luca Cappelletta
Title Improved Visual Features for Audio-visual Speech Recognition
Time & Venue Printing House Hall - 11:30am 26th January 2011


Abstract Automatic Speech Recognition (ASR) is technology that allows a computer to identify the words that a person speaks into an input device (microphone, telephone, etc.) by analyzing the audio signal. In recent years the technology has achieved remarkable results, even if

state-of-the-art ASR systems lag human speech perception by up to one order of magnitude. A major factor affecting ASR is the signal-to-noise ratio: in a noisy environment, automatic speech recognition suffers a huge loss in performance. However, it has been proven that human speech production is bimodal by nature. Moreover, hearing-impaired people utilize lipreading in order to improve their speech perception. Thus, it is possible to include visual cues in order to improve ASR. The combination of audio and visual cues forms the so-called Audio-Visual Speech Recognition, or AVSR. The main topic of this research is the video branch of an AVSR system, particularly 'Region of Interest' definition and detection, visual feature extraction and finally visual-only ASR.


January 17, 2011 by 134.226.86.54 -
Changed lines 11-12 from:
Abstract As the production of stereoscopic content increases, so does the need for post-production tools for that content. Video inpainting has become an important tool for rig removal but there has been little consideration of the problem in stereo. This paper presents an algorithm for stereo video inpainting that builds on existing exemplar-based video completion and also considers the issues of view consistency. Given user-selected regions in the sequence which may be in the same location in several frames and in both views, the objective is to fill in this area using all the available picture information. Existing algorithms lack temporal consistency, causing

flickering and other artefacts. This paper explores the use of long-term picture information across many frames in order to achieve temporal consistency at the same time as exploiting inter-view dependencies within the same framework.

to:
Abstract As the production of stereoscopic content increases, so does the need for post-production tools for that content. Video inpainting has become an important tool for rig removal but there has been little consideration of the problem in stereo. This paper presents an algorithm for stereo video inpainting that builds on existing exemplar-based video completion and also considers the issues of view consistency. Given user-selected regions in the sequence which may be in the same location in several frames and in both views, the objective is to fill in this area using all the available picture information. Existing algorithms lack temporal consistency, causing flickering and other artefacts. This paper explores the use of long-term picture information across many frames in order to achieve temporal consistency at the same time as exploiting inter-view dependencies within the same framework.
January 17, 2011 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Andrew Hines
Title Speech Intelligibility Prediction using a Simulated Performance Intensity Function
Time & Venue Printing House Hall - 12:00pm 14th December
to:
Speaker Felix Raimbault
Title Stereo Video Inpainting
Time & Venue CAD Lab - 2:00pm 19th January 2011
Changed lines 11-13 from:
Abstract Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. The Performance Intensity Function is a standard listener test that evaluates a test subject’s phoneme discrimination performance over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.
to:
Abstract As the production of stereoscopic content increases, so does the need for post-production tools for that content. Video inpainting has become an important tool for rig removal but there has been little consideration of the problem in stereo. This paper presents an algorithm for stereo video inpainting that builds on existing exemplar-based video completion and also considers the issues of view consistency. Given user-selected regions in the sequence which may be in the same location in several frames and in both views, the objective is to fill in this area using all the available picture information. Existing algorithms lack temporal consistency, causing

flickering and other artefacts. This paper explores the use of long-term picture information across many frames in order to achieve temporal consistency at the same time as exploiting inter-view dependencies within the same framework.

January 17, 2011 by 134.226.86.54 -
Added lines 19-26:


14th December 2010

Speaker Andrew Hines
Title Speech Intelligibility Prediction using a Simulated Performance Intensity Function


Abstract Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. The Performance Intensity Function is a standard listener test that evaluates a test subject’s phoneme discrimination performance over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.


January 17, 2011 by 134.226.86.54 -
Changed lines 29-31 from:

Upcoming Speakers

to:


12th October 2010

Speaker Bruno Nicoletti
Title Developing VFX for Film and Video on GPUs


Abstract/Details

In the visual effects world, London-based award-winning firm The Foundry is renowned for its software. Bruno Nicoletti, founder and CTO of The Foundry, speed-talked through a tour of the company’s tools and software, demonstrating to an audience with a healthy population of VFX artists and developers how GPUs are changing the industry in “Developing GPU-Enabled Visual Effects for Film and Video.”

Foundry technology has been used in a host of blockbusters, such as Avatar, Harry Potter, The Dark Knight and many, many others, and its Nuke compositing software has been used for everything from the fantastic (CGI castles) to the mundane (complexion correction).

As a leader in the industry, Nicoletti has an invaluable perspective on the changes that GPUs are making in VFX. GPUs are reducing rendering times and allowing VFX to be involved more pervasively in all stages of production, in effect blurring the line between post production and production.

The popularity of utilizing the power of GPUs in the visual effects (VFX) industry continues to gain momentum. Major film production studios that historically have been CPU-based for VFX are not only utilizing GPUs, they are starting to replace their CPU-based rendering systems with GPU-based ones.

This transition to GPU in VFX, however, requires some legwork, particularly when it comes to the complex image processing algorithms in VFX software. This (along with The Foundry’s solution) was the subject of the second half of Nicoletti’s talk.

With hundreds of effects and millions of lines of code in its software, The Foundry faced having to rewrite everything to exploit GPUs while maintaining separate algorithms for CPUs. Rather than write and debug two sets of complex algorithms, The Foundry created something they're calling Blink (although Nicoletti used its internal code name of RIP, or “Righteous Image Processing”).

Blink wraps image processing up into a high-level C++ API. It lets programmers run kernels on the CPU for debugging, and then those kernels can be translated to emit GPU CUDA code. Nicoletti showed several coding examples and wrapped up by showing a motion estimation function run on an Intel Xeon 5504 versus an NVIDIA Quadro 5000. The speed difference was extraordinary (from 5fps to more than 200fps), which augurs for increased demand for VFX on GPU – and Blink.

Bio

Bruno Nicoletti has worked in visual effects since graduating with a degree in Computer Science and Mathematics from Sydney University in 1987. He has worked at production companies, creating visual effects for broadcast and film, as well as at commercial software companies, developing software to sell into visual effects companies. In his career he has developed 2D image processing software, 3D animation, rendering and modelling tools, often before any equivalent tools were commercially available. In 1996 he started The Foundry to develop visual effects plug-ins and oversaw its initial growth. The Foundry now develops and sells a range of applications and plugins for VFX which are used in many feature films and TV programmes. Now CTO, he acts as senior engineer at the company and is overseeing the effort to move The Foundry's software to a new image processing framework that can exploit CPUs and GPUs to yield dramatic speed improvements.

December 15, 2010 by 134.226.86.54 -
Changed line 7 from:
Title Speech Intelligibility prediction using a Neurogram Quality Index Measure
to:
Title Speech Intelligibility Prediction using a Simulated Performance Intensity Function
December 15, 2010 by 134.226.86.54 -
Changed lines 11-13 from:
Abstract Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. This paper presents a Neurogram Quality Index Measure (NQIM) that automates this inspection process, and translates the response pattern differences into a bounded discrimination metric.

The Performance Intensity Function is a standard listener test that evaluates a test subject’s phoneme discrimination performance over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. The newly developed NQIM was used to evaluate the model outputs in response to CVC word lists and produce phoneme discrimination scores. The simulated results are rigorously compared to those from normal hearing subjects. The accuracy of the tests and the minimum number of word lists necessary for repeatable results is established. The experiments demonstrate that the proposed Simulated Performance Intensity Function (SPIF) produces results with confidence intervals within the human error bounds expected with real listener tests. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.

to:
Abstract Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. The Performance Intensity Function is a standard listener test that evaluates a test subject’s phoneme discrimination performance over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.
December 10, 2010 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Mohamed Ahmed
Title Reflection Detection in Image Sequences
Time & Venue Printing House Hall - 12:00pm 7th December
to:
Speaker Andrew Hines
Title Speech Intelligibility prediction using a Neurogram Quality Index Measure
Time & Venue Printing House Hall - 12:00pm 14th December
Changed lines 11-13 from:
Abstract Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques to fail as they assume the presence of only one layer at each examined site e.g. motion estimation and object recognition. Reflections can arise by mixing any two images and hence detecting them automatically remains a hard problem that was not addressed before. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. We generate sparse and dense detection maps and our results show a high detection rate with rejection of pathological motion, occlusion, and motion blur.
to:
Abstract Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. This paper presents a Neurogram Quality Index Measure (NQIM) that automates this inspection process, and translates the response pattern differences into a bounded discrimination metric.

The Performance Intensity Function is a standard listener test that evaluates a test subject’s phoneme discrimination performance over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. The newly developed NQIM was used to evaluate the model outputs in response to CVC word lists and produce phoneme discrimination scores. The simulated results are rigorously compared to those from normal hearing subjects. The accuracy of the tests and the minimum number of word lists necessary for repeatable results is established. The experiments demonstrate that the proposed Simulated Performance Intensity Function (SPIF) produces results with confidence intervals within the human error bounds expected with real listener tests. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.

Added lines 20-31:

Previous Talks


7th December 2010

Speaker Mohamed Ahmed
Title Reflection Detection in Image Sequences


Abstract/Details

Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques to fail as they assume the presence of only one layer at each examined site e.g. motion estimation and object recognition. Reflections can arise by mixing any two images and hence detecting them automatically remains a hard problem that was not addressed before. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. We generate sparse and dense detection maps and our results show a high detection rate with rejection of pathological motion, occlusion, and motion blur.

December 07, 2010 by 134.226.86.54 -
Changed line 25 from:
14th December - TBA
to:
14th December - Andrew Hines
December 06, 2010 by 134.226.86.54 -
Changed lines 24-25 from:
2nd June - Finnian/Ken
16th June - Luca/Felix
to:
7th December - Mohamed
14th December - TBA
December 06, 2010 by 134.226.86.54 -
Changed lines 11-12 from:
Abstract Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques to fail as they assume the presence of only one layer at each examined site e.g. motion

estimation and object recognition. Reflections can arise by mixing any two images and hence detecting them automatically remains a hard problem that was not addressed before. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. We generate sparse and dense detection maps and our results show a high detection rate with rejection of pathological motion, occlusion, and motion blur.

to:
Abstract Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques to fail as they assume the presence of only one layer at each examined site e.g. motion estimation and object recognition. Reflections can arise by mixing any two images and hence detecting them automatically remains a hard problem that was not addressed before. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. We generate sparse and dense detection maps and our results show a high detection rate with rejection of pathological motion, occlusion, and motion blur.
December 06, 2010 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Ken Sooknanan
Title Content Analysis of Underwater Video Sequences for Automatic Seabed Change Detection
Time & Venue Printing House Hall - 11:30am 2nd June
to:
Speaker Mohamed Ahmed
Title Reflection Detection in Image Sequences
Time & Venue Printing House Hall - 12:00pm 7th December
Changed lines 11-12 from:
Abstract Underwater surveys normally involve the recording of long videos of the ocean bed. These recordings are then tediously analysed by a marine biologist to detect specific items of interest, e.g. changes of the ocean bed. This project aims to simplify the analysis phase of these surveys, by automatically detecting the critical portions of the video when (if any) major changes in the seabed have occurred, and then summarizing these findings in short sequences for the user. In this talk, some of the problems encountered in analysing underwater video sequences are presented, along with some of the possible solutions investigated. Previous work involving content analysis specific to this area of underwater surveillance videos will also be discussed.
to:
Abstract Reflections in image sequences consist of several layers superimposed over each other. This phenomenon causes many image processing techniques to fail as they assume the presence of only one layer at each examined site e.g. motion

estimation and object recognition. Reflections can arise by mixing any two images and hence detecting them automatically remains a hard problem that was not addressed before. This work presents an automated technique for detecting reflections in image sequences by analyzing motion trajectories of feature points. We generate sparse and dense detection maps and our results show a high detection rate with rejection of pathological motion, occlusion, and motion blur.

Changed lines 15-17 from:
Speaker Finnian Kelly
Title Speaker Verification for Biometrics
Time & Venue Printing House Hall - 11:30am 2nd June
to:
Deleted lines 17-289:
Abstract Biometrics involves the use of intrinsic physical or behavioural traits of humans to verify their identity. Speaker verification offers a less invasive alternative to iris and fingerprint authentication.

Typical applications involve banking by phone and access authentication. Key challenges include dealing with mismatched conditions and natural changes in the speaker's voice due to illness or ageing. This talk outlines the implementation of a Gaussian Mixture Model-based speaker verification system. Different training strategies, including a split-and-merge approach, are described. Results are presented on the YOHO Speaker Verification database. Normalization methods to deal with mismatched conditions are outlined. Some 'higher level' voice features and their potential to provide robustness to ageing are introduced.
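
As a minimal sketch of what a GMM-based verification trial looks like (an assumed setup, not the system described above): a background model and a speaker model are fitted to feature vectors and a trial is accepted when the average log-likelihood ratio exceeds a threshold. scikit-learn's GaussianMixture is used for brevity, random arrays stand in for MFCC features, and in practice the speaker model would usually be MAP-adapted from the background model rather than trained independently.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_gmm(features, n_components=16, seed=0):
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=seed)
        return gmm.fit(features)

    def verify(trial_features, speaker_gmm, ubm, threshold=0.0):
        # Average log-likelihood ratio of the trial under the speaker vs background model.
        llr = speaker_gmm.score(trial_features) - ubm.score(trial_features)
        return llr > threshold, llr

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        background = rng.standard_normal((5000, 13))      # pooled features from many speakers
        enrol = rng.standard_normal((800, 13)) + 0.5      # the target speaker's enrolment data
        trial = rng.standard_normal((300, 13)) + 0.5      # test utterance claiming that identity
        ubm = train_gmm(background)
        spk = train_gmm(enrol, n_components=8)
        print(verify(trial, spk, ubm))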

Upcoming Talks

Previous Talks


26th May 2010

Speaker Marcin Gorzel
Title VIRTUAL ACOUSTIC RECORDING: AN INTERACTIVE APPROACH


Abstract/Details

Virtual acoustic recording refers to the capture of real-world acoustic performances in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original performance space, otherwise known as a Virtual Auditory Environment (VAE). An important aspect in the quest for realism in such auditory scene synthesis is user interaction, that is, how the movements of a person listening to the virtual auditory scene directly influence the scene presentation. Such 'walkthrough auralization' presents several challenges for production engineers, the most significant of which is the generation of the correct room acoustic response for a given source-listener position. In particular, the correct direction of arrival of the direct sound and early reflections must be maintained since these signals contain the most vital cues for localization of acoustic sources.


Speaker Róisín Rowley-Brooke
Title The Digital Restoration of Degraded Historical Manuscripts


Abstract/Details

Until fairly recently the only way of improving the readability of degraded manuscripts was through manual, and often destructive, methods. The advancement of image processing has allowed for manuscript restoration techniques that avoid the possibility of doing further damage to the document. The aim of this talk is to outline briefly the main types of degradation that can occur, such as bleed-through, fading and erosion of the writing medium due to poor storage. Methods for manuscript improvement will be discussed: the use of adaptive histogram equalization for enhancement of faded text, two approaches to modelling palimpsest (or bleed-through) documents, and also a method for recto/verso registration - an important pre-processing step in bleed-through removal.
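
As a small illustration of the histogram equalization idea mentioned above (the global version; the adaptive variant discussed in the talk applies the same mapping within local tiles):

    import numpy as np

    def equalise(image):
        """image: 2-D uint8 array. Returns a contrast-equalised uint8 image."""
        hist = np.bincount(image.ravel(), minlength=256)
        cdf = np.cumsum(hist).astype(float)
        cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())     # normalise the CDF to [0, 1]
        lut = np.round(255 * cdf).astype(np.uint8)            # grey-level mapping
        return lut[image]

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        # Dummy "faded" page: low-contrast values clustered around mid grey.
        page = rng.normal(150, 10, size=(200, 300)).clip(0, 255).astype(np.uint8)
        out = equalise(page)
        print(page.std(), out.std())                          # the contrast spread increases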



19th May 2010

Speaker Dr. David Corrigan
Title Non-parametric Texture Synthesis in the Wavelet Domain


Abstract/Details

The goal of texture synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer-generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.
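
A toy pixel-domain version of the underlying non-parametric idea (each new value is sampled from the example location whose causal neighbourhood best matches what has already been synthesised) is sketched below; the talk's methods apply this kind of search to coarse-level wavelet coefficients instead, which the sketch does not attempt, and the example texture and neighbourhood size are dummy assumptions.

    import numpy as np

    def causal_neighbourhood(img, i, j, r):
        # Rows above the pixel (full window width) plus the pixels to its left on the same row.
        top = img[i - r:i, j - r:j + r + 1].ravel()
        left = img[i, j - r:j].ravel()
        return np.concatenate([top, left])

    def synthesise(example, out_shape=(40, 40), r=2, seed=0):
        rng = np.random.default_rng(seed)
        h, w = example.shape
        H, W = out_shape
        out = rng.choice(example.ravel(), size=out_shape)     # random init; borders stay as the seed
        # Pre-compute the candidate neighbourhoods at every valid example position.
        ys, xs, cand = [], [], []
        for y in range(r, h):
            for x in range(r, w - r):
                ys.append(y)
                xs.append(x)
                cand.append(causal_neighbourhood(example, y, x, r))
        ys, xs, cand = np.array(ys), np.array(xs), np.array(cand)
        # Fill the interior in raster order, copying the best-matching example pixel.
        for i in range(r, H):
            for j in range(r, W - r):
                n = causal_neighbourhood(out, i, j, r)
                best = np.argmin(np.sum((cand - n) ** 2, axis=1))
                out[i, j] = example[ys[best], xs[best]]
        return out

    if __name__ == "__main__":
        rng = np.random.default_rng(5)
        example = (rng.random((24, 24)) > 0.5).astype(float)  # stand-in for an example texture
        print(synthesise(example).shape)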

Slides David's Slides





18th May 2010

Speaker Prof. Barak Pearlmutter
Title What Sparsity Means to Me.


Abstract/Details

Our computational toolbox holds many highly efficient methods, like least-squares linear fits. Our theoretical cookbook lists many intractable recipes, like calculating and marginalizing over posteriors using complete detailed models and accurate priors. Sparseness supplies a surprising bridge between these: methods to find "sparse" solutions to under-constrained problems can be surprisingly efficient, and distributions based on a simple sparsity criterion are surprisingly effective at encoding prior structure. In this talk we will explore a variety of ways in which sparseness can be represented mathematically and exploited computationally.
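
One concrete instance of the bridge described above is L1-regularised least squares, where a sparse solution to an under-determined system can be found by simple iterative soft-thresholding (ISTA); the sketch below is a generic implementation on synthetic data, not anything specific to the talk.

    import numpy as np

    def ista(A, y, lam=0.1, n_iter=500):
        """Minimise 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
        L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            z = x - A.T @ (A @ x - y) / L             # gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(6)
        A = rng.standard_normal((50, 200))            # 50 measurements, 200 unknowns
        x_true = np.zeros(200)
        x_true[rng.choice(200, 5, replace=False)] = rng.standard_normal(5)
        y = A @ x_true
        x_hat = ista(A, y, lam=0.05)
        print("coefficients above 0.01:", int(np.sum(np.abs(x_hat) > 1e-2)))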



12th May 2010

Speaker Eric Risser
Title Texture Synthesis


Abstract/Details

The goal of texture synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer-generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.



14th April 2010

Speaker Francois Pitie and Dan Ring
Title Nuke and Plugin Development


Abstract/Details This talk will outline some of the recent work in plugin development for the Nuke platform.




31st March 2010

Speaker Gary Baugh
Title Semi-automatic Motion Based Segmentation using Long Term Motion Trajectories


Abstract Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.




24th March 2010

Speaker Prof. Anil Kokaram
Details Anil gave an account of his recent visit to the west coast of the USA and of how he talks about Sigmedia when on tour.




4th March 2010

Speaker Andrew Hines
Title Measuring Sensorineural Hearing Loss with an Auditory Peripheral Model


Abstract Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory-periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. Model outputs can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated using two types of neurograms. A new systematic way of assessing phonemic degradation is proposed using the outputs of an auditory nerve model for a range of SNHLs.




10th February 2010

Speaker Dr. JiWon Yoon
Title Bayesian Inference for Single Molecule Fluorescence Microscopic image processing


Abstract Using fluorescence microscopy with single-molecule sensitivity, it is now possible to follow the movement of individual fluorophore-tagged molecules such as proteins and lipids in the cell membrane with nanometre precision. Diffusion or directed motion of molecules on the cell can be investigated to elucidate the structure of the cell membrane by tracking the single molecules. There are mainly three steps in processing the data and tracking the molecules from the sequential images: filtering (de-noising), spot detection and tracking. In this talk, we will give a presentation on both the filtering and tracking techniques.

First of all, we have recently developed a robust de-noising algorithm in a Gibbs scheme. This algorithm embeds a Gaussian Markov Random Field (GMRF) prior to describe the properties of the images. Since this algorithm is based on a Bayesian framework, there are few systematic parameters to be tuned. The performance of this algorithm is compared with several conventional approaches including the Gaussian filter, Wiener filter and wavelet filter.

We have also developed several multi-target tracking algorithms in a Bayesian framework. We will briefly review the concepts of single-target and multi-target tracking. Then, marginalized Markov Chain Monte Carlo Data Association (MCMCDA), originally proposed by Oh, is presented. Marginalized MCMCDA is a fully off-line system and it infers most of the systematic parameters which are commonly fixed in the tracking literature.
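
As a generic sketch of the de-noising idea in the first paragraph (not the algorithm from the talk): with a Gaussian noise model and a GMRF smoothness prior, each pixel's conditional distribution given its neighbours and the observation is Gaussian, so a Gibbs sampler can update pixels in closed form. The neighbourhood structure, noise level and smoothness weight below are illustrative assumptions.

    import numpy as np

    def gibbs_denoise(noisy, sigma=0.1, beta=20.0, n_sweeps=30, seed=0):
        """One posterior sample under a Gaussian likelihood and a 4-neighbour GMRF prior
        (averaging several samples would approximate the posterior-mean estimate)."""
        rng = np.random.default_rng(seed)
        x = noisy.copy()
        H, W = x.shape
        prec_like = 1.0 / sigma ** 2
        for _ in range(n_sweeps):
            for i in range(H):
                for j in range(W):
                    nb = []
                    if i > 0: nb.append(x[i - 1, j])
                    if i < H - 1: nb.append(x[i + 1, j])
                    if j > 0: nb.append(x[i, j - 1])
                    if j < W - 1: nb.append(x[i, j + 1])
                    prec = prec_like + beta * len(nb)
                    mean = (prec_like * noisy[i, j] + beta * sum(nb)) / prec
                    x[i, j] = rng.normal(mean, np.sqrt(1.0 / prec))
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(7)
        clean = np.zeros((32, 32))
        clean[8:24, 8:24] = 1.0                           # a bright square on a dark background
        noisy = clean + 0.2 * rng.standard_normal(clean.shape)
        denoised = gibbs_denoise(noisy, sigma=0.2)
        print(np.abs(noisy - clean).mean(), np.abs(denoised - clean).mean())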

Bio He received the B.Sc. degree in information engineering from SungKyunKwan University, Korea. He obtained the M.Sc. degree from the School of Informatics at the University of Edinburgh, UK, in 2004 and the Ph.D. degree from the Signal Processing Group at the University of Cambridge, UK, in 2008. In 2008, he moved to the Department of Engineering Science, University of Oxford, UK, to do postdoctoral research. He is currently a Research Fellow with the Statistics Department, Trinity College Dublin, Ireland. His research interests include Bayesian statistics, machine learning, data mining, network security and biomedical engineering. He has worked on applications in brain signals, cosmology, biophysics and multimedia.




4th February 2010

Speaker Dr. Sid-Ahmed Berrani, Orange Telecom
Title TV Broadcast Analysis and Structuring: Advances and Challenges


Abstract In order to make use of the large number of TV broadcasts through novel services like Catch-up TV or TV-on-Demand, TV streams have to be precisely and automatically segmented, annotated and structured. The exact start time and the exact end time of each of the broadcasted programs have to be determined. Each extracted program has then to be classified, annotated and indexed.

The aim of this talk is to provide an overview of novel TV services and to highlight the need for powerful audio-visual content-based analysis techniques in order to build these services. Technical constraints will be also discussed. Our work toward building a fully automatic system for TV broadcast structuring will be then described. Finally, open issues and challenges will be presented.

Bio Sid-Ahmed Berrani received his Ph.D. in Computer Science in February 2004 from the University of Rennes 1, France. His Ph.D. work was carried out at INRIA, Rennes and was funded by Thomson R&D France. It was dedicated to similarity searches in very large image databases. The Ph.D. thesis of Sid-Ahmed Berrani received the SPECIF Award from the French Society of Education and Research in Computer Science. He then spent 6 months as a Research Fellow in the Sigmedia Group at the University of Dublin, Trinity College, where he worked on video indexing. Since November 2004, Sid-Ahmed Berrani has been a researcher at Orange Labs - France Telecom in Rennes, France. He is currently leading R&D activities on video indexing and analysis for media search services. In particular, he has focused on video analysis techniques for TV broadcast structuring and video fingerprinting.




15th January 2010

Speaker Prof. Nick Kingsbury, Cambridge University
Title Iterative Methods for 3-D Deconvolution with Overcomplete Transforms, such as Dual-Tree Complex Wavelets.


Abstract Overcomplete transforms, such as complex wavelets, can offer more flexible signal representations than critically-sampled transforms. They have been shown to perform well in image denoising benchmarks, and we have therefore been developing iterative wavelet-based regularisation algorithms for more demanding applications such as image and 3D-data deconvolution. In this talk we will briefly describe the characteristics of complex wavelets that make them well-suited to such tasks, and then we will describe an algorithm for wavelet-based 3-dimensional image deconvolution which employs subband-dependent minimization and the dual-tree wavelet transform in an iterative Bayesian framework. This algorithm employs a prior based on an extended Gaussian Scale Mixtures (GSM) model that approximates an L0-norm, instead of the conventional L1-norm, to provide a sparseness constraint in the wavelet domain. Hence it introduces spatially varying inter-scale information into the deconvolution process and thus achieves improved deconvolution results and faster convergence.


Bio Nick Kingsbury is Professor of Signal Processing at the University of Cambridge, Department of Engineering. He has worked in the areas of digital communications, audio analysis and coding, and image processing. He has developed the dual-tree complex wavelet transform and is especially interested in the application of complex wavelets and related multiscale and multiresolution methods to the analysis of images and 3-D datasets.




16th December 2009

Speaker Dan Ring
Title ICCV 2009 Review.


Slides Dan's Slides




10th December 2009

Speaker Prof. Anil Kokaram, Dr. Naomi Harte
Title Scientific Writing Forum.


Slides Naomi's Slides, Anil's Slides




4th December 2009

Speaker Stephen Adams
Title Binaural synthesis for Virtual Auditory Environments


Abstract I'll be discussing the creation of VAEs using binaural techniques, the practical issues and problems encountered with headphone reproduction of binaural audio, and possible solutions to these problems.




26th November 2009

Speaker Mohamed Ahmed
Title ICIP 2009 Review


Slides Mohamed's Slides




5th November 2009

Speaker Dr. Edward Jones, NUIG
Title Aspects of DSP Research at NUI Galway


Abstract In this presentation, Dr. Edward Jones will discuss some DSP-related projects currently underway in Electrical & Electronic Engineering at NUI Galway. The talk will cover speech-related projects in robust speech recognition, and audio quality assessment, as well as current biomedical signal processing research including the use of ultra wide band radar for the early detection of breast cancer.




29th October 2009

Speaker Darren Kavanagh
Title Speech Segmentation with Applications in e-Learning


Abstract His talk will present an approach for determining the temporal word boundaries in speech utterances. This approach uses methods such as Dynamic Time Warping (DTW) and Principal Component Analysis (PCA). Showcase demonstrations of the Recitell application will be given towards the end of the presentation.
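
For reference, a standard DTW distance of the kind such alignment-based boundary methods rely on is sketched below; the feature dimensionality and sequences are dummy assumptions and this is not taken from the speaker's implementation.

    import numpy as np

    def dtw_distance(a, b):
        """a: (n, d) and b: (m, d) feature sequences. Returns the DTW alignment cost."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    if __name__ == "__main__":
        rng = np.random.default_rng(8)
        ref = rng.standard_normal((40, 13))               # e.g. MFCCs of a reference word
        test = np.repeat(ref, 2, axis=0) + 0.05 * rng.standard_normal((80, 13))
        other = rng.standard_normal((80, 13))
        print(dtw_distance(ref, test), dtw_distance(ref, other))   # the matched word scores lower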




22nd October 2009

Speaker Kangyu Pan
Title Spot Analysis in Microscopy.


Abstract The formation of various memories requires the presence of different specific mRNPs in the neuron cells. The development of fluorescent proteins and high-resolution fluorescence imaging allows biologists to locate the mRNPs in a living specimen by the co-localization of differently labeled protein markers. However, the quantitative interpretation of the labeled proteins is still heavily reliant on manual evaluation. In this talk, a novel shape modeling algorithm will be introduced for automating the detection and analysis of the proteins. The algorithm exploits a Gaussian mixture model to characterize the geometric information of the protein particles, and applies a Split-and-Merge Expectation-Maximization algorithm to optimize the parameters of the model.
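
The sketch below illustrates the split move of a split-and-merge EM on a toy 2-D point set, using scikit-learn's GaussianMixture for the EM itself; it shows the idea only (split the broadest component along its principal axis, refit, keep the change if the model improves) and is not the algorithm from the talk.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_with_one_split(points, n_components=2, seed=0):
        gm = GaussianMixture(n_components=n_components, random_state=seed).fit(points)
        # Pick the component with the largest covariance spread and split it in two.
        k = int(np.argmax([np.trace(c) for c in gm.covariances_]))
        vals, vecs = np.linalg.eigh(gm.covariances_[k])
        offset = np.sqrt(vals[-1]) * vecs[:, -1]          # one std. deviation along the principal axis
        means = [m for i, m in enumerate(gm.means_) if i != k]
        means += [gm.means_[k] + offset, gm.means_[k] - offset]
        gm2 = GaussianMixture(n_components=n_components + 1, means_init=np.array(means),
                              random_state=seed).fit(points)
        # Keep the split only if it improves the (BIC-penalised) fit.
        return gm if gm.bic(points) < gm2.bic(points) else gm2

    if __name__ == "__main__":
        rng = np.random.default_rng(9)
        # Three synthetic "spots": clusters of point detections.
        pts = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (3, 0), (1.5, 2.5)]])
        model = fit_with_one_split(pts, n_components=2)
        print("components:", model.n_components, "means:", np.round(model.means_, 2))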




15th October 2009

Speaker Mohamed Ahmed
Title Bayesian Inference for Transparent Blotch Removal.


Abstract Current blotch removal algorithms model the corruption as a binary mixture between the original, clean images and an opaque (dirt) field. This typically causes incomplete blotch removal that manifests as blotch haloes in the reconstruction. The talk will start by introducing a new algorithm for removing blotches. The novelty of this algorithm is in treating blotches as semi-transparent objects. The second part of the talk will discuss a quantitative approach for assessing the restoration quality of blotch removal algorithms. The idea here is to create near ground-truth blotch mattes by exploiting the transparency property of the supplied infrared scans of blotches. Finally, the talk will conclude by providing a brief overview of current techniques for separating mixtures of natural images.



May 31, 2010 by 134.226.86.54 -
Changed lines 7-8 from:
Title Content Analysis of Underwater Video Sequences for Automatic Seabed

Change Detection

to:
Title Content Analysis of Underwater Video Sequences for Automatic Seabed Change Detection
May 31, 2010 by 134.226.86.54 -
Changed line 6 from:
Speaker Ken Sooknanan / Finnian Kelly
to:
Speaker Ken Sooknanan
Changed lines 15-17 from:
Speaker Ken Sooknanan / Finnian Kelly
Title Content Analysis of Underwater Video Sequences for Automatic Seabed

Change Detection

to:
Speaker Finnian Kelly
Title Speaker Verification for Biometrics
Changed lines 20-21 from:
Abstract Underwater surveys normally involve the recording of long videos of the ocean bed. These recordings are then tediously analysed by a marine biologist to detect specific items of interest, e.g. changes of the ocean bed. This project aims to simplify the analysis phase of these surveys, by automatically detecting the critical portions of the video when (if any) major changes in the seabed have occurred, and then summarizing these findings in short sequences for the user. In this talk, some of the problems encountered in analysing underwater video sequences are presented, along with some of the possible solutions investigated. Previous work involving content analysis specific to this area of underwater surveillance videos will also be discussed.
to:
Abstract Biometrics involves the use of intrinsic physical or behavioural traits of humans to verify their identity. Speaker verification offers a less invasive alternative to iris and fingerprint authentication.

Typical applications involve banking by phone and access authentication. Key challenges include dealing with mismatched conditions and natural changes in the speaker's voice due to illness or ageing. This talk outlines the implementation of a Gaussian Mixture Model-based speaker verification system. Different training strategies, including a split-and-merge approach, are described. Results are presented on the YOHO Speaker Verification database. Normalization methods to deal with mismatched conditions are outlined. Some 'higher level' voice features and their potential to provide robustness to ageing are introduced.

May 31, 2010 by 134.226.86.54 -
Added lines 7-8:
Title Content Analysis of Underwater Video Sequences for Automatic Seabed

Change Detection

Added lines 11-23:
Abstract Underwater surveys normally involve the recording of long videos of the ocean bed. These recordings are then tediously analysed by a marine biologist to detect specific items of interest, e.g. changes of the ocean bed. This project aims to simplify the analysis phase of these surveys, by automatically detecting the critical portions of the video when (if any) major changes in the seabed have occurred, and then summarizing these findings in short sequences for the user. In this talk, some of the problems encountered in analysing underwater video sequences are presented, along with some of the possible solutions investigated. Previous work involving content analysis specific to this area of underwater surveillance videos will also be discussed.


Speaker Ken Sooknanan / Finnian Kelly
Title Content Analysis of Underwater Video Sequences for Automatic Seabed

Change Detection

Time & Venue Printing House Hall - 11:30am 2nd June


Abstract Underwater surveys normally involve the recording of long videos of the ocean bed. These recordings are then tediously analysed by a marine biologist to detect specific items of interest, e.g. changes of the ocean bed. This project aims to simplify the analysis phase of these surveys, by automatically detecting the critical portions of the video when (if any) major changes in the seabed have occurred, and then summarizing these findings in short sequences for the user. In this talk, some of the problems encountered in analysing underwater video sequences are presented, along with some of the possible solutions investigated. Previous work involving content analysis specific to this area of underwater surveillance videos will also be discussed.


May 26, 2010 by 134.226.86.54 -
Changed lines 15-16 from:

26th May


to:

26th May 2010

Changed line 37 from:

19th May

to:

19th May 2010

Changed line 61 from:

18th May

to:

18th May 2010

Changed line 73 from:

12th May

to:

12th May 2010

May 26, 2010 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Marcin Gorzel
Title VIRTUAL ACOUSTIC RECORDING: AN INTERACTIVE APPROACH
Time & Venue Printing House Hall - 11:30am 26th May
to:
Speaker Ken Sooknanan / Finnian Kelly
Time & Venue Printing House Hall - 11:30am 2nd June
Deleted lines 8-23:
Abstract/Details

Virtual acoustic recording refers to the capture of real-world acoustic performances in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original performance space, otherwise known as a Virtual Auditory Environment (VAE). An important aspect in the quest for realism in such auditory scene synthesis is user interaction, that is, how the movements of a person listening to the virtual auditory scene directly influence the scene presentation. Such 'walkthrough auralization' presents several challenges for production engineers, the most significant of which is the generation of the correct room acoustic response for a given source-listener position. In particular, the correct direction of arrival of the direct sound and early reflections must be maintained since these signals contain the most vital cues for localization of acoustic sources.


Speaker Róisín Rowley-Brooke
Title The Digital Restoration of Degraded Historical Manuscripts


Abstract/Details

Until fairly recently the only way of improving the readability of degraded manuscripts was through manual, and often destructive, methods. The advancement of image processing has allowed for manuscript restoration techniques that avoid the possibility of doing further damage to the document. The aim of this talk is to outline briefly the main types of degradation that can occur, such as bleed-through, fading and erosion of the writing medium due to poor storage. Methods for manuscript improvement will be discussed: the use of adaptive histogram equalization for enhancement of faded text, two approaches to modelling palimpsest (or bleed-through) documents, and also a method for recto/verso registration - an important pre-processing step in bleed-through removal.

May 26, 2010 by 134.226.86.54 -
Added lines 31-53:


26th May


Speaker Marcin Gorzel
Title VIRTUAL ACOUSTIC RECORDING: AN INTERACTIVE APPROACH


Abstract/Details

Virtual acoustic recording refers to the capture of real-world acoustic performances in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original performance space, otherwise known as a Virtual Auditory Environment (VAE). An important aspect in the quest for realism in such auditory scene synthesis is user interaction, that is, how the movements of a person listening to the virtual auditory scene directly influence the scene presentation. Such 'walkthrough auralization' presents several challenges for production engineers, the most significant of which is the generation of the correct room acoustic response for a given source-listener position. In particular, the correct direction of arrival of the direct sound and early reflections must be maintained since these signals contain the most vital cues for localization of acoustic sources.


Speaker Róisín Rowley-Brooke
Title The Digital Restoration of Degraded Historical Manuscripts


Abstract/Details

Until fairly recently the only way of improving the readability of degraded manuscripts was through manual, and often destructive, methods. The advancement of image processing has allowed for manuscript restoration techniques that avoid the possibility of doing further damage to the document. The aim of this talk is to outline briefly the main types of degradation that can occur, such as bleed-through, fading and erosion of the writing medium due to poor storage. Methods for manuscript improvement will be discussed: the use of adaptive histogram equalization for enhancement of faded text, two approaches to modelling palimpsest (or bleed-through) documents, and also a method for recto/verso registration - an important pre-processing step in bleed-through removal.


May 26, 2010 by 134.226.86.54 -
Deleted line 278:
26th May - Marcin/Róisín
Changed line 280 from:
9th June - Luca/Felix
to:
16th June - Luca/Felix
May 21, 2010 by 134.226.86.54 -
Changed line 18 from:
Title
to:
Title The Digital Restoration of Degraded Historical Manuscripts
May 21, 2010 by 134.226.86.54 -
Changed lines 279-282 from:
12th May - Eric Risser
18th May - Barak Pearlmutter
19th May - David
26th May - Marcin/Róisín
to:
26th May - Marcin/Róisín
May 21, 2010 by 134.226.86.54 -
Changed line 7 from:
Title VIRTUAL ACOUSTIC RECORDING: AN INTERACTIVE APPROACH"
to:
Title VIRTUAL ACOUSTIC RECORDING: AN INTERACTIVE APPROACH
May 21, 2010 by 134.226.86.54 -
Changed line 7 from:
Title
to:
Title VIRTUAL ACOUSTIC RECORDING: AN INTERACTIVE APPROACH"
May 21, 2010 by 134.226.86.54 -
Changed line 50 from:
to:
May 21, 2010 by 134.226.86.54 -
Deleted lines 46-47:

Changed lines 48-50 from:

18th May

Speaker Prof. Barak Pearlmutter
Title What Sparsity Means to Me.
to:
Deleted lines 51-54:
Abstract/Details

Our computational toolbox holds many highly efficient methods, like least-squares linear fits. Our theoretical cookbook lists many intractable recipes, like calculating and marginalizing over posteriors using complete detailed models and accurate priors. Sparseness supplies a surprising bridge between these: methods to find "sparse" solutions to under-constrained problems can be surprisingly efficient, and distributions based on a simple sparsity criterion are surprisingly effective at encoding prior structure. In this talk we will explore a variety of ways in which sparseness can be represented mathematically and exploited computationally.

Changed lines 54-56 from:
to:


18th May

Speaker Prof. Barak Pearlmutter
Title What Sparsity Means to Me.


Abstract/Details

Our computational toolbox holds many highly efficient methods, like least-squares linear fits. Our theoretical cookbook lists many intractable recipes, like calculating and marginalizing over posteriors using complete detailed models and accurate priors. Sparseness supplies a surprising bridge between these: methods to find "sparse" solutions to under-constrained problems can be surprisingly efficient, and distributions based on a simple sparsity criterion are surprisingly effective at encoding prior structure. In this talk we will explore a variety of ways in which sparseness can be represented mathematically and exploited computationally.
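A small numerical illustration of this bridge, assuming NumPy and scikit-learn rather than anything from the talk itself: for an under-constrained linear system, an L1 (Lasso) fit recovers a sparse solution while the minimum-norm least-squares solution spreads energy over all coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features, n_nonzero = 50, 200, 5   # more unknowns than equations
A = rng.standard_normal((n_samples, n_features))
x_true = np.zeros(n_features)
x_true[rng.choice(n_features, n_nonzero, replace=False)] = rng.standard_normal(n_nonzero)
y = A @ x_true

x_l2 = np.linalg.pinv(A) @ y                               # minimum-norm least squares: dense
x_l1 = Lasso(alpha=1e-3, max_iter=50_000).fit(A, y).coef_  # L1 penalty promotes sparsity

print("nonzeros (least squares):", int(np.sum(np.abs(x_l2) > 1e-3)))
print("nonzeros (Lasso):        ", int(np.sum(np.abs(x_l1) > 1e-3)))
```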

May 21, 2010 by 134.226.86.54 -
Added lines 59-60:
May 21, 2010 by 134.226.86.54 -
Changed line 6 from:
Speaker Marcin Gorzel
to:
Speaker Marcin Gorzel
Changed line 22 from:
to:

Until fairly recently the only way of improving the readability of degraded manuscripts was through manual, and often destructive, methods. The advancement of image processing has allowed for manuscript restoration techniques that avoid the possibility of doing further damage to the document. The aim of this talk is to outline briefly the main types of degradation that can occur, such as bleed-through, fading and erosion of the writing medium due to poor storage. Methods for manuscript improvement will be discussed: the use of adaptive histogram equalization for enhancement of faded text, two approaches to modelling palimpsest (or bleed-through) documents, and also a method for recto/verso registration - an important pre-processing step in bleed-through removal.

May 21, 2010 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Prof. Barak Pearlmutter
Title What Sparsity Means to Me.
Time & Venue Printing House Hall - 11:30am 18th May
to:
Speaker Marcin Gorzel Title
Time & Venue Printing House Hall - 11:30am 26th May
Changed lines 12-13 from:

Our computational toolbox holds many highly efficient methods, like least-squares linear fits. Our theoretical cookbook lists many intractable recipes, like calculating and marginalizing over posteriors using complete detailed models and accurate priors. Sparseness supplies a surprising bridge between these: methods to find "sparse" solutions to under-constrained problems can be surprisingly efficient, and distributions based on a simple sparsity criterion are surprisingly effective at encoding prior structure. In this talk we will explore a variety of ways in which sparseness can be represented mathematically and exploited computationally.

to:

Virtual acoustic recording refers to the capture of real-world acoustic performances in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original performance space, otherwise known as a Virtual Auditory Environment (VAE). An important aspect in the quest for realism in such auditory scene synthesis is user interaction, that is, how the movements of a person listening to the virtual auditory scene directly influence the scene presentation. Such 'walkthrough auralization' presents several challenges for production engineers, the most significant of which is the generation of the correct room acoustic response for a given source-listener position. In particular, the correct direction of arrival of the direct sound and early reflections must be maintained since these signals contain the most vital cues for localization of acoustic sources.

Changed lines 17-19 from:
Speaker Dr. David Corrigan
Title Non-parametric Texture Synthesis in the Wavelet Domain
Time & Venue Printing House Hall - 11:30am 19th May
to:
Speaker Róisín Rowley-Brooke
Title
Changed lines 22-29 from:

The goal Texture Synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema-post production as it can be used to generate textures to map onto the surfaces of computer generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much reduced computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.

to:
May 21, 2010 by 134.226.86.54 -
May 21, 2010 by 134.226.86.54 -
May 21, 2010 by 134.226.86.54 -
May 21, 2010 by 134.226.86.54 -
May 21, 2010 by 134.226.86.54 -
May 21, 2010 by 134.226.86.54 -
May 21, 2010 by 134.226.86.54 -
Added line 58:


Added line 70:


Added line 89:


Changed line 100 from:
to:


May 21, 2010 by 134.226.86.54 -
Added lines 42-69:
Speaker Dr. David Corrigan
Title Non-parametric Texture Synthesis in the Wavelet Domain


Abstract/Details

The goal of Texture Synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.


18th May

Speaker Prof. Barak Pearlmutter
Title What Sparsity Means to Me.


Abstract/Details

Our computational toolbox holds many highly efficient methods, like least-squares linear fits. Our theoretical cookbook lists many intractable recipes, like calculating and marginalizing over posteriors using complete detailed models and accurate priors. Sparseness supplies a surprising bridge between these: methods to find "sparse" solutions to under-constrained problems can be surprisingly efficient, and distributions based on a simple sparsity criterion are surprisingly effective at encoding prior structure. In this talk we will explore a variety of ways in which sparseness can be represented mathematically and exploited computationally.


12th May

May 13, 2010 by 134.226.86.54 -
Changed lines 6-8 from:
Speaker Eric Risser
Title Texture Synthesis
Time & Venue Printing House Hall - 11:30am 19th May 2010
to:
Speaker Prof. Barak Pearlmutter
Title What Sparsity Means to Me.
Time & Venue Printing House Hall - 11:30am 18th May
Changed lines 12-19 from:

The goal of Texture Synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.

to:

Our computational toolbox holds many highly efficient methods, like least-squares linear fits. Our theoretical cookbook lists many intractable recipes, like calculating and marginalizing over posteriors using complete detailed models and accurate priors. Sparseness supplies a surprising bridge between these: methods to find "sparse" solutions to under-constrained problems can be surprisingly efficient, and distributions based on a simple sparsity criterion are surprisingly effective at encoding prior structure. In this talk we will explore a variety of ways in which sparseness can be represented mathematically and exploited computationally.

Added lines 16-34:


Speaker Dr. David Corrigan
Title Non-parametric Texture Synthesis in the Wavelet Domain
Time & Venue Printing House Hall - 11:30am 19th May


Abstract/Details

The goal of Texture Synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.
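A rough sketch of the coarse-level search described above, assuming NumPy and PyWavelets (the helper names and the random exemplar are placeholders, not the speaker's code): the exemplar texture is decomposed and the best-matching patch of coarse approximation coefficients is found by exhaustive nearest-neighbour search.

```python
import numpy as np
import pywt

def coarse_band(texture, wavelet="db2", levels=3):
    """Return the coarsest approximation band of a 2-D texture."""
    return pywt.wavedec2(texture, wavelet, level=levels)[0]

def best_match(patch, band, size=5):
    """Exhaustive nearest-neighbour search for a patch in a coarse coefficient band."""
    h, w = band.shape
    best, best_err = None, np.inf
    for i in range(h - size + 1):
        for j in range(w - size + 1):
            err = np.sum((band[i:i + size, j:j + size] - patch) ** 2)
            if err < best_err:
                best, best_err = (i, j), err
    return best

exemplar = np.random.rand(128, 128)   # stand-in for an example texture image
band = coarse_band(exemplar)
query = band[3:8, 3:8]                # a coarse patch to place during synthesis
print("best matching offset:", best_match(query, band))
```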

May 13, 2010 by 134.226.86.54 -
Added line 32:


May 13, 2010 by 134.226.86.54 -
Changed line 8 from:
Time & Venue Printing House Hall - 11:30am 19th May
to:
Time & Venue Printing House Hall - 11:30am 19th May 2010
May 13, 2010 by 134.226.86.54 -
Added lines 28-44:

19th May

Speaker Eric Risser
Title Texture Synthesis
Abstract/Details

The goal of Texture Synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.


May 10, 2010 by 134.226.86.54 -
Changed lines 6-7 from:
Speaker Dr. David Corrigan
Title Non-parametric Texture Synthesis in the Wavelet Domain
to:
Speaker Eric Risser
Title Texture Synthesis
May 10, 2010 by 134.226.86.54 -
Changed line 8 from:
Time & Venue Printing House Hall - 11:30am 12th May
to:
Time & Venue Printing House Hall - 11:30am 19th May
Changed lines 220-221 from:
12th May - David
19th May - Eric Risser
to:
12th May - Eric Risser
18th May - Barak Pearlmutter
19th May - David
April 21, 2010 by 134.226.86.54 -
Changed lines 3-4 from:

Next Week's Talk

to:

Next Scheduled Talk

Changed lines 6-8 from:
Speaker Dr. Patrick Perez and Dr. Lionel Oisel
Title Track2x : video analysis with point tracks
Time & Venue Printing House Hall - 2:30pm 21st April
to:
Speaker Dr. David Corrigan
Title Non-parametric Texture Synthesis in the Wavelet Domain
Time & Venue Printing House Hall - 11:30am 12th May
Changed lines 12-19 from:

Analysis of visual motion is a key step to processing, annotating and understanding videos and image sequences. Such an analysis includes different generic tasks: motion-based detection, motion-based segmentation, visual tracking of objects or object parts, motion classification, action characterization. We propose to cast some of these tasks in terms of analyzing point tracks (or "tracklets"). Tracking "points" (small image patches, really) is a long-standing computer vision tool, whose utilization dates back to the early eighties for 3D scene reconstruction. Point tracks can be easily extracted either with classic techniques, typically the celebrated KLT (Kanade, Lucas, Tomasi) point tracker, or with more recent and sophisticated approaches such as "particle video". In any case, the motion information captured by sets of point tracks is similar to optical flow, but with an extended time horizon (at the price of reduced spatial density and reduced spatial and temporal regularity). We argue that this extended temporal information is useful (1) to gain robustness with respect to classic nuisances to motion analysis, in particular occlusions, and (2) to better analyze motion over time intervals required by higher-level tasks. Building upon these insights, we address the following problems: robust tracking of arbitrary objects, motion-based segmentation of video shots, view-invariant synchronization of videos of the same events and view-invariant synchronization of action classes for recognition.

to:

The goal of Texture Synthesis is to generate new and typically larger textures that are perceptually similar to an example texture. It is an important concept in the domain of cinema post-production as it can be used to generate textures to map onto the surfaces of computer generated objects. This talk will outline the techniques we have developed for texture synthesis that operate wholly or partly in the wavelet domain. The basis of each technique is a non-parametric method which synthesises wavelet coefficients at the coarsest level of the wavelet transform. For textures with small features, it is shown that the coarse resolution search is sufficient for realistic synthesis. Two alternative approaches are proposed for more structured synthesis. The first is a non-parametric refined multiscale synthesis which synthesises coefficients at all levels of the wavelet tree. The second uses the coarse resolution search to facilitate a patch-based synthesis. These approaches are shown to synthesise realistic textures at a much lower computational expense than other non-parametric techniques. The fidelity of the synthesised texture is also more robust to variations in texture scale.

Changed lines 22-27 from:
Bio

Patrick Pérez received the Ph.D. degree from the University of Rennes in 1993. After a one-year post-doctorate at Brown University (USA), he joined INRIA (France) in 1994 as a full-time researcher. From 2000 to 2004, he was with Microsoft Research (Cambridge, UK). He then returned to INRIA as a senior researcher and took, in 2007, the direction of a research team on video analysis. In November 2009, Patrick Pérez joined Technicolor R&I (France) as a Distinguished Scientist in charge of fostering exploratory research in the fields of computer vision and image analysis. He is currently an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence and a member of the Editorial Board of the International Journal of Computer Vision.

Lionel Oisel received his PhD in Computer Science from the University of Rennes I in 1998. He joined Technicolor in 2000 and is now principal scientist in this company. He is also in charge of a project, of approximately 30 people, dealing with multimedia analysis for the improvement of the content production workflow (from creation to consumption). This project includes video restoration, content enrichment and content recommendation technologies. His research interests include multimodal video indexing, object recognition and object retrieval.

to:
April 21, 2010 by 134.226.86.54 -
April 21, 2010 by 134.226.86.54 -
Deleted lines 216-217:
14th April - Dan/Francois
21st April - Patrick Perez
Changed lines 219-220 from:
26th May - Marcin/Finian
2nd June - Róisín/Ken
to:
26th May - Marcin/Róisín
2nd June - Finian/Ken
April 19, 2010 by 134.226.86.54 -
Added line 9:
Added lines 18-19:



April 19, 2010 by 134.226.86.54 -
Added lines 13-16:
Bio

Patrick Pérez received the Ph.D. degree from the University of Rennes in 1993. After a one-year post-doctorate at Brown University (USA), he joined INRIA (France) in 1994 as a full-time researcher. From 2000 to 2004, he was with Microsoft Research (Cambridge, UK). He then returned to INRIA as a senior researcher and took, in 2007, the direction of a research team on video analysis. In November 2009, Patrick Pérez joined Technicolor R&I (France) as a Distinguished Scientist in charge of fostering exploratory research in the fields of computer vision and image analysis. He is currently an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence and a member of the Editorial Board of the International Journal of Computer Vision.

Lionel Oisel received his PhD in Computer Science from the University of Rennes I in 1998. He joined Technicolor in 2000 and is now principal scientist in this company. He is also in charge of a project, of approximately 30 people, dealing with multimedia analysis for the improvement of the content production workflow (from creation to consumption). This project includes video restoration, content enrichment and content recommendation technologies. His research interests include multimodal video indexing, object recognition and object retrieval.

April 19, 2010 by 134.226.86.54 -
April 19, 2010 by 134.226.86.54 -
Changed lines 5-7 from:
Speaker Francois Pitie and Dan Ring
Title Nuke and Plugin Development
Time & Venue Printing House Hall - 11:30am 14th April
to:
Speaker Dr. Patrick Perez and Dr. Lionel Oisel
Title Track2x : video analysis with point tracks
Time & Venue Printing House Hall - 2:30pm 21st April
Changed lines 9-10 from:
Abstract/Details This talk will outline some of the recent work in plugin development for the Nuke platform. More details to follow.
to:
Abstract/Details

Analysis of visual motion is a key step to processing, annotating and understanding videos and image sequences. Such an analysis includes different generic tasks: motion-based detection, motion-based segmentation, visual tracking of objects or object parts, motion classification, action characterization. We propose to cast some of these tasks in terms of analyzing point tracks (or "tracklets"). Tracking "points" (small image patches, really) is a long-standing computer vision tool, whose utilization dates back to the early eighties for 3D scene reconstruction. Point tracks can be easily extracted either with classic techniques, typically the celebrated KLT (Kanade, Lucas, Tomasi) point tracker, or with more recent and sophisticated approaches such as "particle video". In any case, the motion information captured by sets of point tracks is similar to optical flow, but with an extended time horizon (at the price of reduced spatial density and reduced spatial and temporal regularity). We argue that this extended temporal information is useful (1) to gain robustness with respect to classic nuisances to motion analysis, in particular occlusions, and (2) to better analyze motion over time intervals required by higher-level tasks. Building upon these insights, we address the following problems: robust tracking of arbitrary objects, motion-based segmentation of video shots, view-invariant synchronization of videos of the same events and view-invariant synchronization of action classes for recognition.
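For illustration only, a bare-bones tracklet extractor in the spirit of the KLT tracking mentioned above, assuming OpenCV and a placeholder video file; real point-track and "particle video" pipelines add feature re-detection, outlier pruning and denser spatial coverage.

```python
import cv2

cap = cv2.VideoCapture("clip.mp4")    # placeholder input sequence
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01, minDistance=7)

tracks = [[pt.ravel()] for pt in p0]  # one tracklet per detected feature point
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: propagate each point into the next frame.
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                             winSize=(21, 21), maxLevel=3)
    for track, new_pt, good in zip(tracks, p1, status.ravel()):
        if good:
            track.append(new_pt.ravel())   # extend the tracklet through time
    prev_gray, p0 = gray, p1

print("tracklets:", len(tracks), "longest:", max(len(t) for t in tracks))
```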

Added lines 26-27:

April 19, 2010 by 134.226.86.54 -
Added lines 17-25:

14th April 2010

Speaker Francois Pitie and Dan Ring
Title Nuke and Plugin Development


Abstract/Details This talk will outline some of the recent work in plugin development for the Nuke platform.


April 06, 2010 by 134.226.86.54 -
Changed lines 198-204 from:
14th April - Dan/Francois
21st April - Patrick Perez
12th May - David
19th May - Eric Risser
26th May - Marcin/Finian
2nd June - Róisín/Ken
9th June - Luca/Felix
to:
14th April - Dan/Francois
21st April - Patrick Perez
12th May - David
19th May - Eric Risser
26th May - Marcin/Finian
2nd June - Róisín/Ken
9th June - Luca/Felix
April 06, 2010 by 134.226.86.54 -
April 06, 2010 by 134.226.86.54 -
April 06, 2010 by 134.226.86.54 -
Changed lines 198-204 from:
14th April - Dan/Francois
21st April - Patrick Perez
12th May - David
19th May - Eric Risser
26th May - Marcin/Finian
2nd May - Róisín/Ken
9th June - Luca/Felix
to:
14th April - Dan/Francois
21st April - Patrick Perez
12th May - David
19th May - Eric Risser
26th May - Marcin/Finian
2nd June - Róisín/Ken
9th June - Luca/Felix
April 06, 2010 by 134.226.86.54 -
Changed lines 198-199 from:
14th April - Dan/Francois
21st April - Patrick Perez
to:
14th April - Dan/Francois
21st April - Patrick Perez
Changed lines 201-204 from:
19th May - Eric Risser
26th May - Marcin/Finian
2nd May - Róisín/Ken
9th June - Luca/Felix
to:
19th May - Eric Risser
26th May - Marcin/Finian
2nd May - Róisín/Ken
9th June - Luca/Felix
April 06, 2010 by 134.226.86.54 -
Changed lines 24-25 from:
to:


Added line 27:


April 06, 2010 by 134.226.86.54 -
Changed lines 197-201 from:
21st April - TBA
12th May - David/Rozenn
19th May - Marcin/Finian
26th May - Róisín/Ken
2th June - Luca/Felix
to:
21st April - Patrick Perez
12th May - David
19th May - Eric Risser
26th May - Marcin/Finian
2nd May - Róisín/Ken
9th June - Luca/Felix
April 06, 2010 by 134.226.86.54 -
April 06, 2010 by 134.226.86.54 -
Changed lines 5-7 from:
Speaker Gary Baugh
Title Semi-automatic Motion Based Segmentation using Long Term Motion Trajectories
Time & Venue Printing House Hall - 11:30am 31st March
to:
Speaker Francois Pitie and Dan Ring
Title Nuke and Plugin Development
Time & Venue Printing House Hall - 11:30am 14th April
Changed line 9 from:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.
to:
Abstract/Details This talk will outline some of the recent work in plugin development for the Nuke platform. More details to follow.
Changed line 23 from:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.
to:
Abstract Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.
March 31, 2010 by 134.226.86.54 -
Changed line 25 from:

---

to:

March 31, 2010 by 134.226.86.54 -
Changed lines 17-25 from:
to:

31st March 2010

Speaker Gary Baugh
Title Semi-automatic Motion Based Segmentation using Long Term Motion Trajectories


Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.

---

March 31, 2010 by 134.226.86.54 -
Changed lines 188-195 from:
31st March - Gary
7th April - Dan/Francois
14th April - David
21st April - Rozenn
28th April - TBA
12th May - Marcin/Finian
19th May - Róisín/Ken
26th May - Luca/Felix
to:
14th April - Dan/Francois
21st April - TBA
12th May - David/Rozenn
19th May - Marcin/Finian
26th May - Róisín/Ken
2th June - Luca/Felix
March 25, 2010 by 134.226.86.54 -
Changed line 6 from:
Title SEMI-AUTOMATIC MOTION BASED SEGMENTATION USING LONG TERM MOTION TRAJECTORIES
to:
Title Semi-automatic Motion Based Segmentation using Long Term Motion Trajectories
March 25, 2010 by 134.226.86.54 -
Added line 6:
Title SEMI-AUTOMATIC MOTION BASED SEGMENTATION USING LONG TERM MOTION TRAJECTORIES
March 24, 2010 by 134.226.86.54 -
Changed lines 8-9 from:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion

segmentation.

to:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.
March 24, 2010 by 134.226.86.54 -
Changed lines 8-9 from:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to

propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion

to:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion
March 24, 2010 by 134.226.86.54 -
Changed lines 8-9 from:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based

segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to

to:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to
March 24, 2010 by 134.226.86.54 -
March 24, 2010 by 134.226.86.54 -
Changed lines 8-9 from:
Abstract/Details Semi-automated object segmentation is an important step in the cinema

post-production workflow. We propose a dense motion based

to:
Abstract/Details Semi-automated object segmentation is an important step in the cinema post-production workflow. We propose a dense motion based
March 24, 2010 by 134.226.86.54 -
Changed lines 8-12 from:
Abstract/Details Gary will be talking about his recent work on segmentation. More Details to follow.
to:
Abstract/Details Semi-automated object segmentation is an important step in the cinema

post-production workflow. We propose a dense motion based segmentation process that employs sparse feature based trajectories estimated across a long sequence of frames, articulated with a Bayesian framework. The algorithm first classifies the sparse trajectories into sparsely defined objects. Then the sparse object trajectories together with motion model side information are used to generate a dense object segmentation of each video frame. Unlike previous work, we do not use the sparse trajectories only to propose motion models, but instead use their position and motion throughout the sequence as part of the classification of pixels in the second step. Furthermore, we introduce novel colour and motion priors that employ the sparse trajectories to make explicit the spatiotemporal smoothness constraints important for long term motion segmentation.

March 24, 2010 by 134.226.86.54 -
Changed lines 5-6 from:
Speaker Prof. Anil Kokaram
Time & Venue Printing House Hall - 11:30am 24th March
to:
Speaker Gary Baugh
Time & Venue Printing House Hall - 11:30am 31st March
Changed line 8 from:
Abstract/Details Anil will be talking about his recent visit to the west coast of the USA.
to:
Abstract/Details Gary will be talking about his recent work on segmentation. More Details to follow.
March 24, 2010 by 134.226.86.54 -
Added lines 15-23:


24th March 2010

Speaker Prof. Anil Kokaram
Details Anil gave an account of his recent visit to the west coast of the USA and how he talks about Sigmedia when on tour.



March 24, 2010 by 134.226.86.54 -
Deleted line 177:
24th March - Anil
March 24, 2010 by 134.226.86.54 -
Changed line 183 from:
28th April - Gavin/Dan Barry ???
to:
28th April - TBA
March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 176 from:
to:
March 18, 2010 by 134.226.86.54 -
Added line 174:


March 18, 2010 by 134.226.86.54 -
Changed line 176 from:
Date - Speaker(s)
to:
Date - Speaker(s)
March 18, 2010 by 134.226.86.54 -
Changed line 182 from:
28th April - Gavin/Dan Barry
to:
28th April - Gavin/Dan Barry ???
March 18, 2010 by 134.226.86.54 -
Changed line 176 from:
!Date!Speaker(s)
to:
Date - Speaker(s)
March 18, 2010 by 134.226.86.54 -
Changed lines 175-185 from:

Help

to:
!Date!Speaker(s)
24th MarchAnil
31st MarchGary
7th AprilDan/Francois
14th AprilDavid
21st AprilRozenn
28th AprilGavin/Dan Barry
12th MayMarcin/Finian
19th MayRóisín/Ken
26th MayLuca/Felix
March 18, 2010 by 134.226.86.54 -
Added line 175:

Help

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 11 from:

to:
March 18, 2010 by 134.226.86.54 -
Changed line 11 from:
to:

March 18, 2010 by 134.226.86.54 -
Changed line 11 from:

Previous Talks

to:
Added lines 13-15:


Previous Talks


Changed lines 170-174 from:

to:


Upcoming Speakers

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed lines 133-137 from:

5th November 2009

Speaker Dr. Edward Jones, NUIG
Title Aspects of DSP Research at NUI Galway
to:

29th October 2009

Speaker Darren Kavanagh
Title Speech Segmentation with Applications in e-Learning
Changed line 140 from:
Abstract In this presentation, Dr. Edward Jones will discuss some DSP-related projects currently underway in Electrical & Electronic Engineering at NUI Galway. The talk will cover speech-related projects in robust speech recognition, and audio quality assessment, as well as current biomedical signal processing research including the use of ultra wide band radar for the early detection of breast cancer.
to:
Abstract His talk will present an approach for determining the temporal word boundaries in speech utterances. This approach uses methods such as Dynamic Time Warping (DTW) and Principal Component Analysis (PCA). Showcase demonstrations of the Recitell application will be given towards the end of the presentation.
Changed lines 146-149 from:

29th October 2009

Speaker Darren Kavanagh
Title Speech Segmentation with Applications in e-Learning
to:

22nd October 2009

Speaker Kangyu Pan
Title Spot Analysis in Microscopy.
Changed line 152 from:
Abstract His talk will present an approach for determining the temporal word boundaries in speech utterances. This approach uses methods such as Dynamic Time Warping (DTW) and Principal Component Analysis (PCA). Showcase demonstrations of the Recitell application will be given towards the end of the presentation.
to:
Abstract The formation of various memories requires the presence of different specific mRNPs in the neuron cells. The development of fluorescent proteins and high resolution fluorescence imaging allows biologists to locate the mRNPs in a living specimen by the co-localization of the differently labeled protein markers. However, the quantitative interpretation of the labeled proteins is still heavily reliant on manual evaluation. In this talk, a novel shape modeling algorithm will be introduced for automating the detection and analysis of the proteins. The algorithm exploits a Gaussian mixture model to characterize the geometric information of the protein particles, and applies a Split-and-Merge Expectation-Maximization algorithm for optimizing the parameters of the model.
Deleted lines 156-166:

22nd October 2009

Speaker Kangyu Pan
Title Spot Analysis in Microscopy.


Abstract The formation of various memories requires the presence of different specific mRNPs in the neuron cells. The development of fluorescent proteins and high resolution fluorescence imaging allows biologists to locate the mRNPs in a living specimen by the co-localization of the differently labeled protein markers. However, the quantitative interpretation of the labeled proteins is still heavily reliant on manual evaluation. In this talk, a novel shape modeling algorithm will be introduced for automating the detection and analysis of the proteins. The algorithm exploits a Gaussian mixture model to characterize the geometric information of the protein particles, and applies a Split-and-Merge Expectation-Maximization algorithm for optimizing the parameters of the model.



March 18, 2010 by 134.226.86.54 -
Changed lines 158-159 from:

29th October 2009

to:

22nd October 2009

Added lines 165-175:



15th October 2009

Speaker Mohamed Ahmed
Title Bayesian Inference for Transparent Blotch Removal.


Abstract Current blotch removal algorithms model the corruption as a binary mixture between the original, clean images and an opaque (dirt) field. This typically causes incomplete blotch removal that manifests as blotch haloes in reconstruction. The talk will start by introducing a new algorithm for removing blotches. The novelty of this algorithm is in treating blotches as semi-transparent objects. The second part of the talk will discuss a quantitative approach for assessing the restoration quality of blotch removal algorithms. The idea here is to create near ground-truth blotch mattes by exploring the transparency property of the supplied infrared scans of blotches. Finally, the talk will conclude by providing a brief overview of current techniques for separating mixtures of natural images.
March 18, 2010 by 134.226.86.54 -
Changed lines 145-147 from:

29th October2009

to:

29th October 2009

Changed lines 152-164 from:
Abstract his talk will present an approach for determining the temporal word boundaries in speech utterances. This approach uses methods such as Dynamic Time Warping (DTW) and Principal Component Analysis (PCA). Showcase demonstrations of the Recitell application will be given towards the end of the presentation.
to:
Abstract His talk will present an approach for determining the temporal word boundaries in speech utterances. This approach uses methods such as Dynamic Time Warping (DTW) and Principal Component Analysis (PCA). Showcase demonstrations of the Recitell application will be given towards the end of the presentation.




29th October 2009

Speaker Kangyu Pan
Title Spot Analysis in Microscopy.


Abstract The formation of various memories requires the presence of different specific mRNPs in the neuron cells. The development of fluorescent proteins and high resolution fluorescence imaging allows biologists to locate the mRNPs in a living specimen by the co-localization of the differently labeled protein markers. However, the quantitative interpretation of the labeled proteins is still heavily reliant on manual evaluation. In this talk, a novel shape modeling algorithm will be introduced for automating the detection and analysis of the proteins. The algorithm exploits a Gaussian mixture model to characterize the geometric information of the protein particles, and applies a Split-and-Merge Expectation-Maximization algorithm for optimizing the parameters of the model.
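A toy version of the modelling step described above, assuming NumPy, scikit-learn and a placeholder image file: bright pixels are collected and a mixture of 2-D Gaussians is fitted over their coordinates, with a simple BIC sweep standing in for the split-and-merge EM used in the actual work.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

image = np.load("frame.npy")   # placeholder for a 2-D fluorescence microscopy frame

# Keep coordinates of bright pixels and fit a mixture of 2-D Gaussians, so each
# component models one roughly Gaussian-shaped spot.
ys, xs = np.nonzero(image > image.mean() + 3 * image.std())
coords = np.column_stack([xs, ys]).astype(float)

best_gmm, best_bic = None, np.inf
for k in range(1, 8):          # crude BIC sweep standing in for split-and-merge EM
    gmm = GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(coords)
    bic = gmm.bic(coords)
    if bic < best_bic:
        best_gmm, best_bic = gmm, bic

print("estimated spot centres:\n", best_gmm.means_)
```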
March 18, 2010 by 134.226.86.54 -
Added lines 129-152:




5th November 2009

Speaker Dr. Edward Jones, NUIG
Title Aspects of DSP Research at NUI Galway


Abstract In this presentation, Dr. Edward Jones will discuss some DSP-related projects currently underway in Electrical & Electronic Engineering at NUI Galway. The talk will cover speech-related projects in robust speech recognition, and audio quality assessment, as well as current biomedical signal processing research including the use of ultra wide band radar for the early detection of breast cancer.




29th October2009

Speaker Darren Kavanagh
Title Speech Segmentation with Applications in e-Learning


Abstract his talk will present an approach for determining the temporal word boundaries in speech utterances. This approach uses methods such as Dynamic Time Warping (DTW) and Principal Component Analysis (PCA). Showcase demonstrations of the Recitell application will be given towards the end of the presentation.
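A self-contained NumPy sketch of the Dynamic Time Warping component mentioned above; it aligns two toy 1-D sequences, whereas the talk's system aligns speech feature sequences.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW cost between two 1-D sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Each step allows a match, an insertion or a deletion.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two similar contours, one stretched in time, still align cheaply.
a = np.sin(np.linspace(0, 3, 40))
b = np.sin(np.linspace(0, 3, 55))
print(dtw_distance(a, b))
```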
March 18, 2010 by 134.226.86.54 -
Changed lines 122-125 from:

19th November 2009

Speaker Craig Berry
Title Audio Visual Speech Recognition
to:

5th November 2009

Speaker Dr. Edward Jones, NUIG
Title Aspects of DSP Research at NUI Galway
Changed line 128 from:
Abstract Automatic Speech Recognition (ASR) is a highly enabling technology. Its value lies in making human interaction with machines more natural and efficient. The bimodal nature of human speech interpretation, with its use of both audio and visual cues, has prompted research into Audio-Visual Speech Recognition (AVSR) as a means of improving the accuracy and robustness of conventional audio-only speech recognition systems. The extraction of robust visual features is an active research topic. This talk will give a brief overview of an Audio-Visual Speech Recognition system, and will then present some work on colour-based lip segmentation and Active Appearance Models. These techniques are used as the visual front end to an AVSR system.
to:
Abstract In this presentation, Dr. Edward Jones will discuss some DSP-related projects currently underway in Electrical & Electronic Engineering at NUI Galway. The talk will cover speech-related projects in robust speech recognition, and audio quality assessment, as well as current biomedical signal processing research including the use of ultra wide band radar for the early detection of breast cancer.
March 18, 2010 by 134.226.86.54 -
Changed line 104 from:
Abstract I'll be discussing the creation of VAEs using binaural techniques and the practical issues and problems encountered with headphone reproduction of binaural along with possible solutions to these problems
to:
Abstract I'll be discussing the creation of VAEs using binaural techniques and the practical issues and problems encountered with headphone reproduction of binaural along with possible solutions to these problems.
Added lines 120-131:


19th November 2009

Speaker Craig Berry
Title Audio Visual Speech Recognition


Abstract Automatic Speech Recognition (ASR) is a highly enabling technology. Its value lies in making human interaction with machines more natural and efficient. The bimodal nature of human speech interpretation, with its use of both audio and visual cues, has prompted research into Audio-Visual Speech Recognition (AVSR) as a means of improving the accuracy and robustness of conventional audio-only speech recognition systems. The extraction of robust visual features is an active research topic. This talk will give a brief overview of an Audio-Visual Speech Recognition system, and will then present some work on colour-based lip segmentation and Active Appearance Models. These techniques are used as the visual front end to an AVSR system.



March 18, 2010 by 134.226.86.54 -
Changed lines 74-77 from:

10th December 2009

Speaker Prof. Anil Kokaram, Dr. Naomi Harte
Title Scientific Writing Forum.
to:

16th December 2009

Speaker Dan Ring
Title ICCV 2009 Review.
Changed line 80 from:
to:
Slides Dan's Slides
Changed lines 86-89 from:

4th December 2009

Speaker Stephen Adams
Title Binaural synthesis for Virtual Auditory Environments
to:

10th December 2009

Speaker Prof. Anil Kokaram, Dr. Naomi Harte
Title Scientific Writing Forum.
Changed lines 92-116 from:
Abstract I'll be discussing the creation of VAEs using binaural techniques and the practical issues and problems encountered with headphone reproduction of binaural along with possible solutions to these problems
to:
Slides Naomi's Slides, Anil's Slides




4th December 2009

Speaker Stephen Adams
Title Binaural synthesis for Virtual Auditory Environments


Abstract I'll be discussing the creation of VAEs using binaural techniques and the practical issues and problems encountered with headphone reproduction of binaural along with possible solutions to these problems




26th November 2009

Speaker Mohamed Ahmed
Title ICIP 2009 Review


Slides Mohamed's Slides
March 18, 2010 by 134.226.86.54 -
Changed lines 84-95 from:
to:


4th December 2009

Speaker Stephen Adams
Title Binaural synthesis for Virtual Auditory Environments


Abstract I'll be discussing the creation of VAEs using binaural techniques and the practical issues and problems encountered with headphone reproduction of binaural along with possible solutions to these problems



March 18, 2010 by 134.226.86.54 -
Changed lines 19-20 from:
Abstract Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory-periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. Model outputs can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated using two types of neurograms. A new systematic way of assessing phonemic degradation is proposed using the

outputs of an auditory nerve model for a range of SNHLs.

to:
Abstract Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory-periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. Model outputs can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated using two types of neurograms. A new systematic way of assessing phonemic degradation is proposed using the outputs of an auditory nerve model for a range of SNHLs.
March 18, 2010 by 134.226.86.54 -
Changed lines 19-27 from:
Abstract Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The

development of computational models of the auditory-periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. Model outputs can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated using two types of neurograms. A new systematic way of assessing phonemic degradation is proposed using the

to:
Abstract Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory-periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. Model outputs can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated using two types of neurograms. A new systematic way of assessing phonemic degradation is proposed using the
March 18, 2010 by 134.226.86.54 -
Changed lines 19-20 from:
Abstract Hearing loss research has traditionally been based on

perceptual criteria, speech intelligibility and threshold levels. The

to:
Abstract Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The
March 18, 2010 by 134.226.86.54 -
Changed lines 17-18 from:

\\

to:


Changed line 30 from:

\\

to:


March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed lines 83-86 from:

15th January 2010

Speaker Prof. Nick Kingsbury, Cambridge University
Title Iterative Methods for 3-D Deconvolution with Overcomplete Transforms, such as Dual-Tree Complex Wavelets.
to:

10th December 2009

Speaker Prof. Anil Kokaram, Dr. Naomi Harte
Title Scientific Writing Forum.
Changed lines 89-92 from:
Abstract Overcomplete transforms, such as complex wavelets, can offer more flexible signal representations than critically-sampled transforms. They have been shown to perform well in image denoising benchmarks, and we have therefore been developing iterative wavelet-based regularisation algorithms for more demanding applications such as image and 3D-data deconvolution. In this talk we will briefly describe the characteristics of complex wavelets that make them well-suited to such tasks, and then we will describe an algorithm for wavelet-based 3-dimensional image deconvolution which employs subband-dependent minimization and the dual-tree wavelet transform in an iterative Bayesian framework. This algorithm employs a prior based on an extended Gaussian Scale Mixtures (GSM) model that approximates an L0-norm, instead of the conventional L1-norm, to provide a sparseness constraint in the wavelet domain. Hence it introduces spatially varying inter-scale information into the deconvolution process and thus achieves improved deconvolution results and faster convergence.


Bio Nick Kingsbury is Professor of Signal Processing at the University of Cambridge, Department of Engineering. He has worked in the areas of digital communications, audio analysis and coding, and image processing. He has developed the dual-tree complex wavelet transform and is especially interested in the application of complex wavelets and related multiscale and multiresolution methods to the analysis of images and 3-D datasets.
to:
March 18, 2010 by 134.226.86.54 -
Added lines 63-77:




15th January 2010

Speaker Prof. Nick Kingsbury, Cambridge University
Title Iterative Methods for 3-D Deconvolution with Overcomplete Transforms, such as Dual-Tree Complex Wavelets.


Abstract Overcomplete transforms, such as complex wavelets, can offer more flexible signal representations than critically-sampled transforms. They have been shown to perform well in image denoising benchmarks, and we have therefore been developing iterative wavelet-based regularisation algorithms for more demanding applications such as image and 3D-data deconvolution. In this talk we will briefly describe the characteristics of complex wavelets that make them well-suited to such tasks, and then we will describe an algorithm for wavelet-based 3-dimensional image deconvolution which employs subband-dependent minimization and the dual-tree wavelet transform in an iterative Bayesian framework. This algorithm employs a prior based on an extended Gaussian Scale Mixtures (GSM) model that approximates an L0-norm, instead of the conventional L1-norm, to provide a sparseness constraint in the wavelet domain. Hence it introduces spatially varying inter-scale information into the deconvolution process and thus achieves improved deconvolution results and faster convergence.


Bio Nick Kingsbury is Professor of Signal Processing at the University of Cambridge, Department of Engineering. He has worked in the areas of digital communications, audio analysis and coding, and image processing. He has developed the dual-tree complex wavelet transform and is especially interested in the application of complex wavelets and related multiscale and multiresolution methods to the analysis of images and 3-D datasets.
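As a much-simplified illustration of the wavelet-domain sparseness constraint described in the abstract, assuming NumPy and PyWavelets: soft-thresholding the detail coefficients is the basic shrinkage step that iterative schemes of this kind repeat inside each iteration; full subband-dependent, dual-tree complex wavelet deconvolution is well beyond this sketch.

```python
import numpy as np
import pywt

def wavelet_soft_shrink(img, wavelet="db4", levels=3, thresh=0.05):
    """Soft-threshold detail coefficients: a one-shot wavelet-domain sparsity prior."""
    coeffs = pywt.wavedec2(img, wavelet, level=levels)
    shrunk = [coeffs[0]]   # leave the coarse approximation band untouched
    for (cH, cV, cD) in coeffs[1:]:
        shrunk.append(tuple(pywt.threshold(c, thresh, mode="soft") for c in (cH, cV, cD)))
    return pywt.waverec2(shrunk, wavelet)

noisy = np.random.rand(128, 128)      # stand-in for a noisy observed image
estimate = wavelet_soft_shrink(noisy)
print(estimate.shape)
```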
March 18, 2010 by 134.226.86.54 -
Changed line 76 from:


to:
March 18, 2010 by 134.226.86.54 -
Changed lines 68-70 from:

4th February 2010

Speaker Prof. Nick Kingsbury, Cambridge\\
to:

15th January 2010

Speaker Prof. Nick Kingsbury, Cambridge University\\
March 18, 2010 by 134.226.86.54 -
Changed line 71 from:
Title TV Broadcast Analysis and Structuring: Advances and Challenges
to:
Title Iterative Methods for 3-D Deconvolution with Overcomplete Transforms, such as Dual-Tree Complex Wavelets.
Changed lines 74-76 from:
Abstract In order to make use of the large number of TV broadcasts through novel services like Catch-up TV or TV-on-Demand, TV streams have to be precisely and automatically segmented, annotated and structured. The exact start time and the exact end time of each of the broadcasted programs have to be determined. Each extracted program then has to be classified, annotated and indexed.

The aim of this talk is to provide an overview of novel TV services and to highlight the need for powerful audio-visual content-based analysis techniques in order to build these services. Technical constraints will also be discussed. Our work toward building a fully automatic system for TV broadcast structuring will then be described. Finally, open issues and challenges will be presented.

to:
Abstract Overcomplete transforms, such as complex wavelets, can offer more flexible signal representations than critically-sampled transforms. They have been shown to perform well in image denoising benchmarks, and we have therefore been developing iterative wavelet-based regularisation algorithms for more demanding applications such as image and 3D-data deconvolution. In this talk we will briefly describe the characteristics of complex wavelets that make them well-suited to such tasks, and then we will describe an algorithm for wavelet-based 3-dimensional image deconvolution which employs subband-dependent minimization and the dual-tree wavelet transform in an iterative Bayesian framework. This algorithm employs a prior based on an extended Gaussian Scale Mixtures (GSM) model that approximates an L0-norm, instead of the conventional L1-norm, to provide a sparseness constraint in the wavelet domain. Hence it introduces spatially varying inter-scale information into the deconvolution process and thus achieves improved deconvolution results and faster convergence.
Changed line 77 from:
Bio Sid-Ahmed Berrani received his Ph.D. in Computer Science in February 2004 from the University of Rennes 1, France. His Ph.D. work was carried out at INRIA, Rennes and was funded by Thomson R&D France. It was dedicated to similarity searches in very large image databases. The Ph.D. thesis of Sid-Ahmed Berrani received the SPECIF Award from the French Society of Education and Research in Computer Science. He then spent 6 months as a Research Fellow in the Sigmedia Group at the University of Dublin, Trinity College, where he worked on video indexing. Since November 2004, Sid-Ahmed Berrani has been a researcher at Orange Labs - France Telecom in Rennes, France. He is currently leading R&D activities on video indexing and analysis for media search services. In particular, he has focused on video analysis techniques for TV broadcast structuring and video fingerprinting.
to:
Bio Nick Kingsbury is Professor of Signal Processing at the University of Cambridge, Department of Engineering. He has worked in the areas of digital communications, audio analysis and coding, and image processing. He has developed the dual-tree complex wavelet transform and is especially interested in the application of complex wavelets and related multiscale and multiresolution methods to the analysis of images and 3-D datasets.
March 18, 2010 by 134.226.86.54 -
Changed lines 53-70 from:
Speaker Dr. Sed-Ahmed Berrani\\
to:
Speaker Dr. Sid-Ahmed Berrani, Orange Telecom
Title TV Broadcast Analysis and Structuring: Advances and Challenges


Abstract In order to make use of the large number of TV broadcasts through novel services like Catch-up TV or TV-on-Demand, TV streams have to be precisely and automatically segmented, annotated and structured. The exact start time and the exact end time of each broadcast program have to be determined. Each extracted program then has to be classified, annotated and indexed.

The aim of this talk is to provide an overview of novel TV services and to highlight the need for powerful audio-visual content-based analysis techniques in order to build these services. Technical constraints will also be discussed. Our work toward building a fully automatic system for TV broadcast structuring will then be described. Finally, open issues and challenges will be presented.

Bio Sid-Ahmed Berrani received his Ph.D. in Computer Science in February 2004 from the University of Rennes 1, France. His Ph.D. work was carried out at INRIA, Rennes and was funded by Thomson R&D France. It was dedicated to similarity searches in very large image databases. The Ph.D. thesis of Sid-Ahmed Berrani received the SPECIF Award from the French Society of Education and Research in Computer Science. He then spent 6 months as a Research Fellow in the Sigmedia Group at the University of Dublin, Trinity College, where he worked on video indexing. Since November 2004, Sid-Ahmed Berrani has been a researcher at Orange Labs - France Telecom in Rennes, France. He is currently leading R&D activities on video indexing and analysis for media search services. In particular, he has focused on video analysis techniques for TV broadcast structuring and video fingerprinting.




4th February 2010

Speaker Prof. Nick Kingsbury, Cambridge\\
March 18, 2010 by 134.226.86.54 -
Changed line 7 from:
to:


March 18, 2010 by 134.226.86.54 -
Changed line 7 from:


to:
March 18, 2010 by 134.226.86.54 -
Changed line 5 from:
Speaker Prof. Anil Kokaram
to:
Speaker Prof. Anil Kokaram
March 18, 2010 by 134.226.86.54 -
Changed line 4 from:

\\

to:


March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Deleted line 5:


March 18, 2010 by 134.226.86.54 -
Changed line 15 from:


to:
March 18, 2010 by 134.226.86.54 -
Changed line 6 from:

\\

to:


Changed lines 8-9 from:


\\

to:


Changed lines 10-11 from:


\\

to:



Changed line 13 from:

\\

to:


Changed line 15 from:
to:


March 18, 2010 by 134.226.86.54 -
Changed line 39 from:

\\

to:


March 18, 2010 by 134.226.86.54 -
Changed line 49 from:


to:
March 18, 2010 by 134.226.86.54 -
Changed lines 45-46 from:

\\

to:



March 18, 2010 by 134.226.86.54 -
Changed line 64 from:

\\

to:


March 18, 2010 by 134.226.86.54 -
Added line 50:


March 18, 2010 by 134.226.86.54 -
Changed lines 50-51 from:



to:
Changed line 56 from:


to:
March 18, 2010 by 134.226.86.54 -
Changed lines 47-48 from:


to:



Changed lines 50-51 from:

\\

to:



Changed lines 56-57 from:

\\

to:



March 18, 2010 by 134.226.86.54 -
Added line 59:


March 18, 2010 by 134.226.86.54 -
Changed line 59 from:

\\

to:


March 18, 2010 by 134.226.86.54 -
Deleted lines 44-45:
Bio He received the B.Sc. degree in information engineering from SungKyunKwan University, Korea. He obtained the M.Sc. degree from the School of Informatics at the University of Edinburgh, UK, in 2004, and the Ph.D. degree from the Signal Processing Group at the University of Cambridge, UK, in 2008. In 2008, he moved to the Department of Engineering Science, University of Oxford, UK, for postdoctoral research. He is currently a Research Fellow with the Statistics Department, Trinity College Dublin, Ireland. His research interests include Bayesian statistics, machine learning, data mining, network security and biomedical engineering. He has worked on applications in brain signals, cosmology, biophysics and multimedia.
Changed lines 46-47 from:

to:
Bio He received the B.Sc. degree in information engineering from SungKyunKwan University, Korea. He obtained the M.Sc. degree from the School of Informatics at the University of Edinburgh, UK, in 2004, and the Ph.D. degree from the Signal Processing Group at the University of Cambridge, UK, in 2008. In 2008, he moved to the Department of Engineering Science, University of Oxford, UK, for postdoctoral research. He is currently a Research Fellow with the Statistics Department, Trinity College Dublin, Ireland. His research interests include Bayesian statistics, machine learning, data mining, network security and biomedical engineering. He has worked on applications in brain signals, cosmology, biophysics and multimedia.
Added lines 48-50:

\\

Changed line 59 from:
to:

\\

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Deleted lines 30-31:

Added lines 32-34:

\\

Changed lines 51-54 from:

4th March 2010

Speaker Andrew Hines
Title
to:

4th February 2010

Speaker Dr. Sed-Ahmed Berrani
Title TV Broadcast Analysis and Structuring: Advances and Challenges
Changed lines 56-58 from:
Abstract
to:
Abstract In order to make use of the large number of TV broadcasts through novel services like Catch-up TV or TV-on-Demand, TV streams have to be precisely and automatically segmented, annotated and structured. The exact start time and the exact end time of each broadcast program have to be determined. Each extracted program then has to be classified, annotated and indexed.

The aim of this talk is to provide an overview of novel TV services and to highlight the need for powerful audio-visual content-based analysis techniques in order to build these services. Technical constraints will also be discussed. Our work toward building a fully automatic system for TV broadcast structuring will then be described. Finally, open issues and challenges will be presented.

Bio Sid-Ahmed Berrani received his Ph.D. in Computer Science in February 2004 from the University of Rennes 1, France. His Ph.D. work was carried out at INRIA, Rennes and was funded by Thomson R&D France. It was dedicated to similarity searches in very large image databases. The Ph.D. thesis of Sid-Ahmed Berrani received the SPECIF Award from the French Society of Education and Research in Computer Science. He then spent 6 months as a Research Fellow in the Sigmedia Group at the University of Dublin, Trinity College, where he worked on video indexing. Since November 2004, Sid-Ahmed Berrani has been a researcher at Orange Labs - France Telecom in Rennes, France. He is currently leading R&D activities on video indexing and analysis for media search services. In particular, he has focused on video analysis techniques for TV broadcast structuring and video fingerprinting.
Deleted lines 61-111:

4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract

March 18, 2010 by 134.226.86.54 -
Added line 47:
March 18, 2010 by 134.226.86.54 -
Changed line 46 from:
to:

\\

March 18, 2010 by 134.226.86.54 -
Changed line 45 from:
Bio
to:
Bio He received the B.Sc. degree in information engineering from SungKyunKwan University, Korea. He obtained the M.Sc. degree from the School of Informatics at the University of Edinburgh, UK, in 2004, and the Ph.D. degree from the Signal Processing Group at the University of Cambridge, UK, in 2008. In 2008, he moved to the Department of Engineering Science, University of Oxford, UK, for postdoctoral research. He is currently a Research Fellow with the Statistics Department, Trinity College Dublin, Ireland. His research interests include Bayesian statistics, machine learning, data mining, network security and biomedical engineering. He has worked on applications in brain signals, cosmology, biophysics and multimedia.
March 18, 2010 by 134.226.86.54 -
Changed lines 34-37 from:

4th March 2010

Speaker Andrew Hines
Title
to:

10th February 2010

Speaker Dr. JiWon Yoon
Title Bayesian Inference for Single-Molecule Fluorescence Microscopic Image Processing
Changed lines 39-45 from:
Abstract
to:
Abstract Using fluorescence microscopy with single-molecule sensitivity, it is now possible to follow the movement of individual fluorophore-tagged molecules, such as proteins and lipids, in the cell membrane with nanometre precision. Diffusion or directed motion of molecules on the cell can be investigated to elucidate the structure of the cell membrane by tracking the single molecules. There are three main steps in processing the data and tracking the molecules from the sequential images: filtering (de-noising), spot detection and tracking. In this talk, we will present both filtering and tracking techniques.

First, we have recently developed a robust de-noising algorithm in a Gibbs scheme. This algorithm embeds a Gaussian Markov Random Field (GMRF) prior to capture the properties of the images. Since the algorithm is based on a Bayesian framework, only a few systematic parameters need to be tuned. Its performance is compared with several conventional approaches, including the Gaussian, Wiener and wavelet filters.

We have also developed several multi-target tracking algorithms in a Bayesian framework. We will briefly review the concepts of single-target and multi-target tracking, and then present marginalized Markov Chain Monte Carlo Data Association (MCMCDA), originally proposed by Oh. Marginalized MCMCDA is a fully off-line method and infers most of the systematic parameters that are commonly fixed in the tracking literature.
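As a rough illustration of the filtering step only, the sketch below computes the MAP estimate of an image under an i.i.d. Gaussian noise model and a first-order GMRF (quadratic smoothness) prior, using simple Jacobi iterations. This is a deterministic stand-in for the Gibbs-scheme sampler described above, not the speaker's algorithm; the smoothing weight lam, the border handling and the toy image are all arbitrary choices.

```python
import numpy as np

def gmrf_map_denoise(y, lam=2.0, n_iter=200):
    """MAP denoising of image y: minimise ||x - y||^2 + lam * (sum of squared
    4-neighbour differences, i.e. a first-order GMRF prior), via Jacobi iterations."""
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for _ in range(n_iter):
        up    = np.vstack([x[:1, :], x[:-1, :]])   # neighbour above (edge-replicated)
        down  = np.vstack([x[1:, :], x[-1:, :]])   # neighbour below
        left  = np.hstack([x[:, :1], x[:, :-1]])   # neighbour to the left
        right = np.hstack([x[:, 1:], x[:, -1:]])   # neighbour to the right
        neigh = up + down + left + right
        # Coordinate-wise minimiser of the quadratic objective given the neighbours.
        x = (y + lam * neigh) / (1.0 + 4.0 * lam)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.zeros((64, 64))
    clean[24:40, 24:40] = 1.0                      # toy "fluorescent spot"
    noisy = clean + 0.3 * rng.standard_normal(clean.shape)
    denoised = gmrf_map_denoise(noisy, lam=2.0)
    print("noisy MSE   :", np.mean((noisy - clean) ** 2))
    print("denoised MSE:", np.mean((denoised - clean) ** 2))
```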

Bio
March 18, 2010 by 134.226.86.54 -
Changed line 20 from:
AbstractHearing loss research has traditionally been based on
to:
Abstract Hearing loss research has traditionally been based on
March 18, 2010 by 134.226.86.54 -
Changed line 18 from:
Title
to:
Title Measuring Sensorineural Hearing Loss with an Auditory Peripheral Model
Changed lines 20-30 from:
Abstract
to:
AbstractHearing loss research has traditionally been based on

perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. Model outputs can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated using two types of neurograms. A new systematic way of assessing phonemic degradation is proposed using the outputs of an auditory nerve model for a range of SNHLs.
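Purely as a toy illustration of the kind of neurogram comparison mentioned above, the sketch below scores the difference between an unimpaired reference neurogram and degraded versions for a range of simulated hearing losses, using a plain relative mean absolute error per time-frequency bin. The auditory-nerve model, the degradation rule and the metric are all placeholders, not those used in the talk.

```python
import numpy as np

def neurogram_error(reference, degraded):
    """Relative mean absolute error between two neurograms
    (2-D arrays of firing rates over frequency bands x time bins).
    A placeholder metric, not the assessment measure from the talk."""
    return np.mean(np.abs(reference - degraded)) / np.mean(np.abs(reference))

rng = np.random.default_rng(0)
reference = rng.random((30, 200))        # hypothetical unimpaired neurogram: 30 bands x 200 bins
for loss_db in (20, 40, 60):             # a range of simulated SNHLs
    # Crude degradation rule (attenuation plus noise) standing in for
    # re-running an auditory-periphery model with an impaired audiogram.
    degraded = reference * 10 ** (-loss_db / 40.0) + 0.01 * rng.random(reference.shape)
    print(f"{loss_db} dB loss: relative error = {neurogram_error(reference, degraded):.3f}")
```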

March 18, 2010 by 134.226.86.54 -
Changed lines 15-16 from:

3rd March

to:

4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

Speaker Andrew Hines
Title


Abstract



4th March 2010

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 22 from:

to:

March 18, 2010 by 134.226.86.54 -
Added line 8:

\\

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 16 from:
Speaker Andrew Hines
to:
Speaker Andrew Hines\\
March 18, 2010 by 134.226.86.54 -
Deleted line 16:

\\

March 18, 2010 by 134.226.86.54 -
Deleted line 13:
March 18, 2010 by 134.226.86.54 -
Deleted line 4:
March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 7 from:
to:

\\

Deleted lines 8-9:
Abstract/Details Anil will be talking about his recent visit to the west coast of the USA.
Changed lines 10-11 from:

Previous Talks

to:
Abstract/Details Anil will be talking about his recent visit to the west coast of the USA.
Changed lines 12-15 from:
to:

Previous Talks


Changed line 19 from:
to:

\\

Changed line 21 from:
to:

\\

March 18, 2010 by 134.226.86.54 -
Changed lines 6-15 from:
Speaker

Prof. Anil Kokaram

Time & Venue

Printing House Hall - 11:30am 24th March

Abstract/Details

Anil will be talking about his recent visit to the west coast of the USA.

to:
Speaker Prof. Anil Kokaram Time & Venue Printing House Hall - 11:30am 24th March Abstract/Details Anil will be talking about his recent visit to the west coast of the USA.
Changed lines 18-23 from:
Speaker

Andrew Hines

Title

Abstract

to:
Speaker Andrew Hines Title Abstract
March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 3 from:

Next Week Talk

to:

Next Week's Talk

Changed line 6 from:

Speaker

to:
Speaker
Changed lines 9-10 from:

Time & Venue

to:
Time & Venue
Changed lines 13-14 from:

Abstract/Details

to:
Abstract/Details
Changed lines 21-23 from:

Date

Speaker

to:

3rd March

Speaker

Andrew Hines

Title

March 18, 2010 by 134.226.86.54 -
Changed lines 16-17 from:
to:


Added line 19:

\\

March 18, 2010 by 134.226.86.54 -
Added line 4:

\\

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed line 10 from:

Printing House Hall - 11:30am 24^th^ March

to:

Printing House Hall - 11:30am 24th March

March 18, 2010 by 134.226.86.54 -
Changed line 10 from:

Printing House Hall - 11:30am 24^{th} March

to:

Printing House Hall - 11:30am 24^th^ March

March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed lines 18-25 from:

\subsec

to:

Date

Speaker

Abstract


March 18, 2010 by 134.226.86.54 -
March 18, 2010 by 134.226.86.54 -
Changed lines 3-5 from:

Next Weeks Talk

Speaker

Prof. Anil Kokaram
to:

Next Week Talk

Speaker

Prof. Anil Kokaram

Time & Venue

Printing House Hall - 11:30am 24^{th} March

Abstract/Details

Anil will be talking about his recent visit to the west coast of the USA.

Previous Talks

\subsec

March 18, 2010 by 134.226.86.54 -
Changed lines 1-5 from:
to:

Next Weeks Talk

Speaker

Prof. Anil Kokaram
March 18, 2010 by 134.226.86.54 -
Added line 1:
Page last modified on October 11, 2019