2012-2013 Talks


Upcoming Speakers

17-Oct-12   Ian Kelly
24-Oct-12   Andrew Hines, Naomi Harte, Frank Boland
31-Oct-12   Francois Pitie
07-Nov-12   Reading week
14-Nov-12   Roisin Rowley-Brooke
21-Nov-12   No talk - Science Gallery Visit
28-Nov-12   Finnian Kelly
05-Dec-12   Ken Sooknanan
12-Dec-12   No Talk Scheduled
23-Jan-13   Yun Feng Wang
30-Jan-13   David Corrigan
06-Feb-13   Marcin Gorzel
16-Apr-13   Félix Raimbault
23-Apr-13   Kangyu Pan
30-Apr-13   No Talk - PHH in use
07-May-13   Liam O'Sullivan
14-May-13   Ailbhe Cullen
21-May-13   Frank Boland
28-May-13   Naomi Harte
04-Jun-13   Ian Kelly
11-Jun-13   No Talk
18-Jun-13   No Talk
25-Jun-13   Andrew Hines


Speaker Andrew Hines
Title Detailed Analysis of PESQ and VISQOL Behaviour in the Context of Playout Delay Adjustments Introduced by VoIP Jitter Buffer Algorithms
Time & Venue Printing House Hall - 12:00 25-Jun-13


This talk covers a detailed analysis of the behaviour of both the PESQ and VISQOL models when tested against speech samples modified by playout delay adjustments. The adjustments are similar, in extent and magnitude, to those introduced by VoIP jitter buffer algorithms. In particular, the analysis examines the impact of speaker and sentence on the MOS scores predicted by both models, and seeks to determine whether the models can correctly detect and quantify a playout delay adjustment and, if so, predict its impact on the quality perceived by the user. The results showed that speaker voice preference dominated the subjective tests more than playout delay duration or location. By design, PESQ and VISQOL do not quantify speaker voice differences, which reduces their correlation with the subjective tests. In addition, it was found that PESQ is quite good at detecting playout delay adjustments, but the impact of those adjustments on the quality perceived by the user is not well modelled. The VISQOL model, on the other hand, is better at predicting the impact of playout delay adjustments on perceived quality, though some discrepancies remain in the predicted scores. The reasons for these discrepancies are analysed and discussed in detail.

Speaker Ian Kelly
Title Detecting arrivals in room impulse responses with dynamic time warping
Time & Venue Printing House Hall - 12:00 4-Jun-13


The detection of early reflections in room impulse responses (RIRs) is of importance to many algorithms including room geometry inference, mixing time determination and speech dereverberation. The detection of early reflections can be hampered by increasing pulse width, as the direct sound undergoes reflection, and by overlapping of the reflections, as the pulse density grows. We propose the use of Dynamic Time Warping upon a direct sound pulse to better estimate the temporal distribution of arrivals in room impulse responses. Bounded Dynamic Time Warping is performed after an initial correlation of the direct sound with the remaining signal to further refine each arrival’s location and duration, and to find arrivals which may otherwise not correlate well with the un-warped direct sound due to a change in the reflection’s shape. Dynamic Time Warping can also be used to help find overlapping reflections which may otherwise go unnoticed. Warping is performed via a set of warp matrices which can be combined and can also be inverted via a left pseudo-inverse. This pseudo-inverse can be calculated very quickly based upon the properties of the warp matrices and how their transpose can be formed into a non-square orthogonal matrix by the deletion of repeated rows.
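As an illustration of the core technique, here is a minimal, unbounded DTW sketch in pure NumPy. The talk's method additionally bounds the warp and expresses it via invertible warp matrices, which this toy version does not attempt; the signals below are invented for the example.

```python
import numpy as np

def dtw(x, y):
    """Unbounded dynamic time warping between two 1-D signals.
    Returns the cumulative alignment cost and the warp path as a
    list of (i, j) index pairs."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the warp path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# A direct-sound-like pulse and a widened copy align with zero cost,
# which is exactly the pulse-widening situation the abstract describes.
pulse = np.array([0.0, 1.0, 0.5, 0.0])
widened = np.array([0.0, 0.0, 1.0, 1.0, 0.5, 0.0])
cost, path = dtw(pulse, widened)
```

The warp path returned here plays the role of the warp matrices in the talk: each (i, j) pair says which sample of the widened reflection corresponds to which sample of the direct sound.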

Speaker Dr Naomi Harte
Title DSP – For the Birds!
Time & Venue Printing House Hall - 12:00 28-May-13


The songs of birds, like human voices, are important elements of their identity. In ornithology, distinguishing the songs of different populations is as vital as identifying morphological and genetic differences. This talk will explore how DSP and knowledge of speech processing can potentially transform the approach taken by scientists to comparing birdsongs. Using data gathered in Indonesia by the TCD Zoology Dept, the song from different subspecies of Black-naped Orioles and Olive-backed Sunbirds is examined. The song from different island populations is modelled with MFCCs and Gaussian Mixture Models. Analysing the performance of the classifiers on unseen test data can give an indication of song diversity.

These early-stage results, which I will present at Interspeech later this summer, show that a forensic approach to birdsong analysis, inspired by speech processing, may offer invaluable insights into cryptic species diversity as well as song identification at the subspecies level.
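To illustrate the classification stage described above, here is a hedged sketch of GMM-based population scoring in pure NumPy. The MFCC extraction step is omitted (the arrays below stand in for precomputed MFCC frames), and the feature dimension, mixture sizes and data are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_diag_gmm(X, k, iters=50):
    """Fit a diagonal-covariance Gaussian mixture with plain EM."""
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)].copy()
    var = np.ones((k, d))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: per-component log densities -> responsibilities.
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: update weights, means and variances.
        nk = r.sum(0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

def gmm_loglik(X, w, mu, var):
    """Total log-likelihood of the frames X under the mixture."""
    logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
    m = logp.max(1, keepdims=True)
    return (m[:, 0] + np.log(np.exp(logp - m).sum(1))).sum()

# Stand-ins for MFCC frames from two island populations; a test clip is
# assigned to whichever population's GMM scores it higher.
pop_a = rng.normal(0.0, 1.0, (200, 4))
pop_b = rng.normal(3.0, 1.0, (200, 4))
gmm_a = fit_diag_gmm(pop_a, k=2)
gmm_b = fit_diag_gmm(pop_b, k=2)
test_clip = rng.normal(3.0, 1.0, (50, 4))   # really from population B
predicted = "A" if gmm_loglik(test_clip, *gmm_a) > gmm_loglik(test_clip, *gmm_b) else "B"
```

Classifier performance on held-out clips scored this way is the kind of measurement the abstract uses as an indicator of song diversity.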

Speaker Prof Frank Boland
Title ‘How loud is that?’
Time & Venue Printing House Hall - 12:00 21-May-13


There has been a fundamental change to how the loudness of audio is measured. Broadcasters have responded to persistent and growing complaints from consumers about major jumps in audio levels at breaks within and between programmes. In 2010 the European Broadcasting Union introduced recommendations for new audio loudness level measurements to replace outmoded peak measurements. These measurements are now being adopted beyond the EU and in the wider audio-video industries. In this seminar the challenge of defining and measuring audio loudness will be introduced, as will the ‘loudness wars’ of the past decade. The signal processing that forms the new three-part measurement of audio loudness will be explained, and some recent user trials conducted by my audio research group will be presented.
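The EBU recommendation (R 128, building on ITU-R BS.1770) derives loudness from the mean square of a frequency-weighted signal, measured in overlapping 400 ms blocks and gated in two stages. The sketch below implements only the gating logic; the K-weighting pre-filter is omitted for brevity, so the numbers are illustrative rather than standard-compliant.

```python
import numpy as np

def integrated_loudness(x, fs):
    """Gated integrated loudness, simplified from the EBU/ITU method:
    400 ms blocks with 75% overlap, an absolute gate at -70 LUFS,
    then a relative gate 10 LU below the absolutely-gated level.
    No K-weighting is applied, so values are only illustrative."""
    block, hop = int(0.400 * fs), int(0.100 * fs)
    ms = np.array([np.mean(x[i:i + block] ** 2)
                   for i in range(0, len(x) - block + 1, hop)])
    lufs = -0.691 + 10.0 * np.log10(ms + 1e-12)
    gated = lufs[lufs > -70.0]                      # absolute gate
    def mean_lufs(v):
        # Average in the mean-square domain, then back to LUFS.
        return -0.691 + 10.0 * np.log10(np.mean(10.0 ** ((v + 0.691) / 10.0)))
    gated = gated[gated > mean_lufs(gated) - 10.0]  # relative gate
    return mean_lufs(gated)

fs = 48000
# A steady 997 Hz tone at amplitude 0.1 (mean square 0.005).
tone = 0.1 * np.sin(2 * np.pi * 997.0 * np.arange(2 * fs) / fs)
loudness = integrated_loudness(tone, fs)   # about -23.7 for this tone
```

The gating is what distinguishes this measurement from a simple average level: quiet passages below the gates do not drag the integrated loudness down, which matches how listeners judge programme loudness.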

Speaker Ailbhe Cullen
Title Automatic Classification of Affect from Speech
Time & Venue Printing House Hall - 12:00 14-May-13


To truly understand speech it is not enough to know the words that were spoken; we must also know how they were spoken. Emotion (or affect) recognition is a multidisciplinary research domain which aims to exploit non-verbal vocal and visual cues to decode this paralinguistic information. This is still a relatively young field, and as such there remains uncertainty about many aspects of the recognition process, from labelling to feature selection to classifier design.

This talk focuses on the acoustic classification of affect from speech using hidden Markov models. A number of feature sets are compared, some of which have never before been used for emotion classification, in an attempt to discern the optimum features and classifier structure for the various affective dimensions. Probabilistic fusion is then used to combine the benefits of each individual classifier.

The effect of a particular type of phonation, known as creaky voice, on affect classification is also explored. Creak is used in the English language to signal certain emotions, and should thus aid classification, but tends to cause problems for automatic feature extraction routines. Finally, some novel features are applied to the classification task in order to better exploit creak.
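A class-conditional HMM recogniser of the kind described above scores each utterance under one model per affect class and picks the highest-scoring class. The forward algorithm at the heart of that scoring can be sketched in a few lines of NumPy (log domain); the emission models, features and fusion scheme used in the talk are not reproduced here.

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log-sum-exp of a 1-D array."""
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def hmm_forward_loglik(obs_logp, log_A, log_pi):
    """Log-likelihood of an observation sequence under an HMM via the
    forward algorithm in the log domain.  obs_logp[t, s] is the
    emission log-probability of frame t in state s; log_A and log_pi
    are the log transition matrix and log initial distribution."""
    T, S = obs_logp.shape
    alpha = log_pi + obs_logp[0]
    for t in range(1, T):
        alpha = np.array([logsumexp(alpha + log_A[:, s])
                          for s in range(S)]) + obs_logp[t]
    return logsumexp(alpha)

# Sanity check: with a single state the forward algorithm reduces to
# the product of the emission probabilities.
obs = np.log(np.array([[0.9], [0.8], [0.7]]))
ll = hmm_forward_loglik(obs, np.log(np.array([[1.0]])), np.log(np.array([1.0])))
```

In a real system `obs_logp` would come from per-state emission densities (e.g. Gaussian mixtures) evaluated on acoustic feature frames, with one HMM trained per affective class.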

Speaker Liam O'Sullivan
Title MorphOSC - A Toolkit for Building Sound Control GUIs with Preset Interpolation in the Processing Development Environment
Time & Venue Printing House Hall - 12:00 7-May-13


MorphOSC is a new toolkit for building graphical user interfaces for the control of sound using morphing between parameter presets. It uses the multidimensional interpolation space paradigm seen in some other systems, but hitherto unavailable as open-source software in the form presented here. The software is delivered as a class library for the Processing Development Environment and is cross-platform for desktop computers and Android mobile devices. This talk positions the new library within the context of similar software, introduces the main features of the initial code release and details future work on the project.

Speaker Kangyu Pan
Title Shape Models for Image Segmentation in Microscopy
Time & Venue Printing House Hall - 12:00 23-Apr-13


This project presents three model-driven segmentation algorithms for extracting different types of microscopic objects based on prior shape information about the objects. The first part presents a novel Gaussian mixture shape modelling algorithm for protein particle detection and analysis in images delivered from the biological study of memory formation. The new Gaussian mixture model (GMM) approach, with a novel error-based split-and-merge expectation-maximization (eSMEM) algorithm, not only estimates the number of candidate particles in a cluster spot (cluster of particles), but also parametrizes the shape characteristics of the particles for later co-localization analysis. The second part presents a wavelet-based Bayesian segmentation model to reconstruct the shape of the synapses (where protein synthesis takes place) from a stack of image slices recorded with a confocal microscope. In order to tackle the problem of irregular luminance of the synapses and the presence of ‘out-of-focus’ synaptic features, the segmentation model incorporates the ‘sharpness’ information of the objects, the global intensity histogram, and the inter-slice intensity behaviour. The last part presents a new active contour model (Cellsnake) for segmenting overlapped cell/fibre objects in skeletal muscle images. The challenge of the segmentation is the high variation in the shapes of the fibre objects. In order to distinguish the fibres from overlapped objects and segment the candidate cell/fibre from each overlapped object, the outlined algorithm divides each separated region in the image into small ‘cell candidates’. Each ‘cell candidate’ is associated with an active contour (AC) model, and the deformation is constrained by energy terms derived from the shapes of the cells. Finally, the ACs are merged after deformation when the corresponding ‘cell candidates’ belong to the same fibre, thereby segmenting the overlapped fibre/cell objects.

Speaker Félix Raimbault
Time & Venue Printing House Hall - 12:00 16-Apr-13


Speaker Marcin Gorzel
Title Optimised real-time rendering of audio in Virtual Auditory Environments
Time & Venue Printing House Hall - 12:00 6-Feb-13


This project looks at the problem of the capture or synthesis of acoustic events in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original space, otherwise known as a Virtual Auditory Environment (VAE). Of particular concern is the identification and perceptually correct reconstruction of the important acoustic cues that allow sound objects to be localised in full 3-D space, with a special emphasis on the perception of auditory distance. Such presentations can be realised with the use of both multichannel loudspeaker arrays and headphones. The latter can provide a personalised sound field to a single user, minimising the influence of the listening environment and providing a better sense of immersion. However, one of the problems that needs to be addressed is user interaction and how listener movements affect the experience. Such walk-through auralisations present several challenges for production engineers, the most significant of which are the generation of correct room acoustic responses for a given source-listener position and the identification of optimal sound reproduction schemes that can minimise the computational burden. The current framework considers the parametrisation of real-world sound fields and their subsequent real-time auralisation using a hybrid image source model/measurement-based approach. Two different models are constructed based on existing spaces with significantly different acoustic properties: a middle-sized lecture hall and a large cathedral interior. Various optimisation techniques, including order reduction of Head Related Transfer Functions using approximate factorisation and Room Impulse Response decomposition using directional analysis, are incorporated, and some important aspects of their perceptual impact are investigated extensively by means of subjective listening trials.
Lastly, spatial localisation of sounding objects is affected not only by auditory cues but also by other modalities such as vision. This is particularly true in the context of the perception of distance, where the number of auditory cues is limited in comparison to, for example, localisation in the horizontal and vertical planes. This work also investigates the influence of vision on the perception of audio. In particular, the effect of incongruent audio-visual cues is explored in the context of the perception of auditory distance in photo-realistic VAEs.
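The image source half of the hybrid approach can be illustrated with a minimal sketch: for a shoebox room, each wall contributes a mirrored copy of the source, and each mirrored source produces one early arrival at the listener. This toy version handles only first-order reflections and ignores wall absorption; the room, source and listener positions are invented for the example.

```python
import numpy as np

def first_order_images(src, room):
    """Mirror the source across each of the six walls of a shoebox
    room [Lx, Ly, Lz] whose corner sits at the origin, giving the
    first-order image-source positions."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2.0 * wall - src[axis]   # reflect across the wall plane
            images.append(img)
    return np.array(images)

def arrival_delays(images, listener, c=343.0):
    """Propagation delay (seconds) of each image-source arrival."""
    return np.linalg.norm(images - listener, axis=1) / c

room = np.array([6.0, 4.0, 3.0])
src = np.array([2.0, 1.0, 1.5])
listener = np.array([4.0, 2.0, 1.5])
imgs = first_order_images(src, room)
taus = arrival_delays(imgs, listener)
```

A full renderer would recurse on these mirrored sources for higher-order reflections and then place each arrival into the impulse response at its delay, which is where the measured-response and directional-decomposition components of the hybrid method take over.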

Speaker David Corrigan
Title Depth perception of audio sources in stereo 3D environments
Time & Venue Printing House Hall - 12:00 30-Jan-13


In this paper we undertook perceptual experiments to determine the allowed differences in depth between audio and visual stimuli in stereoscopic-3D environments while being perceived as congruent. We also investigated whether the nature of the environment and stimuli affects the perception of congruence. This was achieved by creating an audio-visual environment consisting of a photorealistic visual environment captured by a camera under orthostereoscopic conditions and a virtual audio environment generated by measuring the acoustic properties of the real environment. The visual environment consisted of a room with a loudspeaker or person forming the visual stimulus and was presented to the viewer using a passive stereoscopic display. Pink noise samples and female speech were used as audio stimuli, presented over headphones using binaural renderings. The stimuli were generated at different depths from the viewer, and the viewer was asked to determine whether the audio stimulus was nearer, further away or at the same depth as the visual stimulus. Our experiments show that there is a significant range of depth differences for which audio and visual stimuli are perceived as congruent. Furthermore, this range increases as the depth of the visual stimulus increases.

Speaker Yun Feng Wang
Title Double-Tip Effect Removal In Atomic Force Microscopy Images
Time & Venue Printing House Hall - 12:00 23-Jan-13


The Atomic Force Microscope (AFM) has enabled much progress in nanotechnology by capturing material surface structure with nanoscale resolution. However, due to imperfections in its scanning probe, some artefacts are induced during the scanning process. Here, we focus on a new type of artefact in AFM images called the ‘double-tip’ effect. The ‘double-tip’ effect produces a more dramatic form of distortion compared with traditional blurring artefacts. A novel deblurring framework, based on Bayesian inference and user interaction, is proposed to remove this effect. The results show that our framework is successful at removing the ‘double-tip’ effect in AFM images while the details of the sample surface topography are well preserved.

Speaker Ken Sooknanan
Title Towards Identifying Nephrops Burrows Automatically from Marine Video
Time & Venue Printing House Hall - 12:00 5-Dec-12


The Dublin Bay prawn is a commercially significant species of lobster throughout Europe and the UK. To regulate the fishing industry, governing bodies (e.g. the Marine Institute, Ireland) carry out yearly underwater surveys to estimate its population. This estimation process mainly involves manually counting individual burrows (or clusters of burrows) of the species in underwater survey videos of the seabed. To improve this tedious manual process we are exploring the possibility of identifying these burrows automatically. In this talk, a brief overview of the segmentation technique (i.e. ICM) that we are using to locate and detect these burrows will be presented.
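Assuming ICM here denotes iterated conditional modes (the standard relaxation scheme for Markov random field segmentation), a minimal binary version looks like the sketch below. The data model, features and imagery of the actual burrow detector will differ; the bright square and noise level are invented for the example.

```python
import numpy as np

def icm_segment(img, beta=1.0, iters=5):
    """Binary segmentation by iterated conditional modes on a simple
    Ising-style model: each pixel label minimises a data term
    (squared distance to its class mean) plus beta times the number
    of disagreeing 4-neighbours."""
    labels = (img > img.mean()).astype(int)          # initial guess
    for _ in range(iters):
        mu = np.array([img[labels == k].mean() for k in (0, 1)])
        p = np.pad(labels, 1, mode='edge')
        # Count of 4-neighbours currently labelled 1, per pixel.
        n1 = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        cost0 = (img - mu[0]) ** 2 + beta * n1        # neighbours labelled 1 disagree
        cost1 = (img - mu[1]) ** 2 + beta * (4 - n1)  # neighbours labelled 0 disagree
        labels = (cost1 < cost0).astype(int)
    return labels

rng = np.random.default_rng(1)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0                 # a bright region standing in for a burrow
noisy = clean + rng.normal(0.0, 0.3, clean.shape)
seg = icm_segment(noisy)
```

The neighbourhood term is what lets ICM suppress isolated noisy pixels that a plain intensity threshold would misclassify, which matters in low-contrast seabed footage.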

Speaker Finnian Kelly
Title Eigen-Ageing Compensation for long-term Speaker Verification
Time & Venue Printing House Hall - 12:00 28-Nov-12


Vocal ageing causes speaker verification accuracy to worsen as the time lapse between model enrolment and verification increases. In this talk, a new approach to compensate for the ageing effect will be presented. The method is based on learning the dominant changes in speaker models with ageing, and exploiting this at the verification stage. An evaluation of the technique on a recently expanded ageing database of 26 subjects will be presented.
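A hedged sketch of the underlying idea: treat the differences between enrolment-time and later speaker models as data, take their dominant singular directions as the ageing subspace, and remove that subspace from models at verification time. The supervector dimension and synthetic data below are invented for illustration and do not reflect the talk's actual models.

```python
import numpy as np

def ageing_directions(enrol, aged, n_dirs=1):
    """Top singular directions of the enrolment-to-aged model
    differences: a sketch of learning the dominant ageing changes."""
    D = aged - enrol                    # one row of change per speaker
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:n_dirs]

def compensate(model, dirs):
    """Project the learned ageing subspace out of a model vector."""
    return model - dirs.T @ (dirs @ model)

# Synthetic speaker models: ageing shifts every speaker along one
# common direction plus a little idiosyncratic noise.
rng = np.random.default_rng(2)
age_dir = np.zeros(10)
age_dir[0] = 1.0
enrol = rng.normal(size=(26, 10))       # 26 subjects, as in the database
aged = enrol + 2.0 * age_dir + 0.1 * rng.normal(size=(26, 10))
dirs = ageing_directions(enrol, aged)
compensated = compensate(aged[0], dirs)
```

With the common shift dominating the difference matrix, the top singular direction recovers the ageing direction, and the compensated model has no component left along it.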

Speaker Roisin Rowley-Brooke
Title A non-parametric approach to document bleed-through removal. (aka the ink is greener from the other side..)
Time & Venue Printing House Hall - 12:00 14-Nov-12


Ink bleed-through degradation poses one of the most difficult problems in document restoration. It occurs when ink has seeped through from one side of the page and interferes with the text on the other side. In this talk I will present recent work on a new framework for bleed-through removal, including image preprocessing, region classification based on segmentation of the 2-D recto-verso intensity histogram and connected component analysis, and finally restoration of the degraded regions using exemplar-based image inpainting.

Speaker Francois Pitie and Gary Baugh
Title 2D to 3D Conversion For Animated Movies
Time & Venue AAP 2.0.2 - 12:00 31-Oct-12


In this talk we will present our research on developing post-production tools for converting animated movies to stereoscopic 3D. The key to stereoscopic 3D conversion is to utilise the depth information, which is generated for free by the animation software, to synthesise novel left and right views of the scene. We will present our results (in 3D) and detail some of the image processing challenges of this approach.

Speaker Andrew Hines, Naomi Harte and Frank Boland
Title Sigmedia's European Research Networks: COST actions
Time & Venue Printing House Hall - 12:00 24-Oct-12


European Cooperation in Science and Technology (COST) is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. Sigmedia members are currently representing Ireland in three COST actions. This seminar will give a brief introduction to each of the actions.

- ICT COST Action IC1105 (Frank Boland): 3D-ConTourNet - 3D Content Creation, Coding and Transmission over Future Media Networks

- ICT COST Action IC1106 (Naomi Harte): Integrating Biometrics and Forensics for the Digital Age

- ICT COST Action IC1003 (Andrew Hines): European Network on Quality of Experience in Multimedia Systems and Services (QUALINET)

Speaker Ian Kelly
Title Randomness in Acoustic Impulse Responses and its Effects on Factorization
Time & Venue Printing House Hall - 12:00 17-Oct-12


Head-related impulse responses (HRIRs) contain all of the necessary auditory information required for convincing spatial audio reproduction. It was recently proposed that HRIRs could be factorized via an iterative least squares algorithm to yield a direction-independent component and a set of reduced-order direction-dependent components. However, further studies showed this minimization problem to have multiple minima. This short talk will cover my work on determining why multiple solutions occur by exploring the inherent randomness in the responses themselves. Furthermore, consideration is given to how this behaviour can be exploited for equalization problems.
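The factorization setup can be sketched with alternating least squares: fix the shared (direction-independent) component and solve for the short per-direction components, then vice versa, each step being a linear least squares problem on convolution (Toeplitz) matrices. This is only an illustration of the problem structure, on invented data; the published algorithm and its minima analysis differ in detail.

```python
import numpy as np

def conv_matrix(v, n):
    """Toeplitz matrix T with T @ x == np.convolve(v, x) for len(x) == n."""
    T = np.zeros((len(v) + n - 1, n))
    for j in range(n):
        T[j:j + len(v), j] = v
    return T

def factorize(H, Lg, Lf, iters=200):
    """Alternating least squares: approximate each response H[i] as the
    convolution of a shared component g (length Lg) with a short
    per-direction component f_i (length Lf)."""
    rng = np.random.default_rng(0)
    g = rng.normal(size=Lg)
    for _ in range(iters):
        # Fix g, solve each direction-dependent component in closed form.
        Tg = conv_matrix(g, Lf)
        F = np.stack([np.linalg.lstsq(Tg, h, rcond=None)[0] for h in H])
        # Fix all f_i, solve the shared component with the systems stacked.
        A = np.vstack([conv_matrix(f, Lg) for f in F])
        g = np.linalg.lstsq(A, np.concatenate(H), rcond=None)[0]
    return g, F

# Build exactly factorable test data, then recover a factorization.
rng = np.random.default_rng(1)
g_true = rng.normal(size=16)
F_true = rng.normal(size=(5, 4))
H = np.stack([np.convolve(g_true, f) for f in F_true])
g, F = factorize(H, Lg=16, Lf=4)
recon = np.stack([np.convolve(g, f) for f in F])
err = np.linalg.norm(recon - H) / np.linalg.norm(H)
```

Note the scale ambiguity (g can be scaled by any constant if every f_i is scaled by its inverse), which is one simple reason this objective cannot have a unique minimiser; the talk's analysis of multiple minima goes considerably further.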

Page last modified on August 19, 2018