Audio, Acoustics, and Music Processing Group

Our research seeks to efficiently satisfy the spatial audio requirements of enhanced audio-visual presentations. We are working to significantly simplify the synthesis of virtual audio sources whilst maintaining correct spatial localisation and impression. The work explores the delivery of spatial audio to distributed audiences using loudspeaker arrays and to individual listeners using headphones. Our research into simplifying the synthesis has produced novel interpolation and synthesis algorithms, as well as measurement configurations for multichannel room impulse responses optimised for multiple loudspeaker reproduction configurations. The development of these algorithms builds on the evaluation and refinement of results achieved by the Group over the past five years. In both listener scenarios, our work has explored the problems associated with designing the filter functions required to model acoustic transmission paths, and the design and implementation of methods for evaluating the listener’s perception of synthesised sources. The processed audio gives the listener the perception that the audio source is at a specific relative physical location, with true depth and directionality, as well as the acoustic character and timbre of the environment in which the source is located. The solutions to the problems identified and addressed in this work have many applications in areas such as Games, Home Entertainment, Digital Cinema and eLearning.

THRIVE - Tracking Headphones Realise Immersive Virtual Environments

The system consists of novel signal processing and physical prototypes that use head tracking for real-time rendering of audio over headphones. The prototype gives the listener the experience of an audio source that is spatially associated with a video display, regardless of the listener's orientation. The system has commercial potential in both the e-Learning and video game industries by enabling the creation of personal immersive 3D audio and video presentations with accurate associations in space and time.

Technical background

  • Spatial audio scene analysis and synthesis – microphone and loudspeaker array processing. Ambisonic encoding and decoding
  • Digital signal processing for interpolation and complexity reduction of Room Impulse Responses, Head-Related Impulse Responses, and headphone/loudspeaker equalisers
  • Head tracking technology using Inertial Measurement Units, Infra-Red and Face Detection
  • Binaural audio for headphones, using the knowledge above to engineer the delivery of spatially accurate audio sources that remain stable under head rotation (see the sketch after this list)
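
A minimal sketch of the last point, in Python, under assumed details: the head yaw reported by the tracker is subtracted from the source azimuth so that the rendered source stays fixed in the room as the head turns, after which the nearest measured HRIR pair is selected and convolved with the source signal. The HRIR tables and the 5-degree measurement grid here are hypothetical stand-ins, not the Group's actual measurement data.

    import numpy as np
    from scipy.signal import fftconvolve

    # hypothetical HRIR database: one impulse response per ear,
    # measured every 5 degrees of azimuth (stand-in random data)
    AZ_STEP = 5
    N_DIRS = 360 // AZ_STEP
    hrir_left = np.random.randn(N_DIRS, 256)
    hrir_right = np.random.randn(N_DIRS, 256)

    def render_binaural(mono, source_az_deg, head_yaw_deg):
        # azimuth of the source relative to the rotated head:
        # compensating with the tracked yaw keeps the source
        # stable in the room as the listener turns
        rel_az = (source_az_deg - head_yaw_deg) % 360
        idx = int(round(rel_az / AZ_STEP)) % N_DIRS
        left = fftconvolve(mono, hrir_left[idx])
        right = fftconvolve(mono, hrir_right[idx])
        return np.stack([left, right])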

Optimised Real-time Rendering of Auditory Events in Immersive Virtual Environments

This project looks at the problem of synthesising or capturing real-world acoustic events in reverberant spaces and their subsequent plausible reproduction in a virtual version of the original space, otherwise known as a Virtual Auditory Environment (VAE). Of particular concern is the identification and perceptually correct reconstruction of the acoustic cues that allow humans to localise sound objects in 3-D space, including their distance.

An important aspect in the quest for realism in such auditory scene synthesis is user interaction: that is, how the movements of a person listening to the virtual auditory scene directly influence the presentation of the scene. Such ‘walk-through auralisations’ present several challenges for production engineers, the most significant of which are generating correct room acoustic responses for a given source-listener position, supporting various sound reproduction schemes, and managing the computational cost.

The current framework considers the parameterisation of real-world sound fields and their subsequent real-time auralisation using a hybrid image source model/measurement-based approach. Two models are constructed based on existing spaces with significantly different acoustic properties: a middle-sized lecture hall and a large cathedral interior. Various optimisation techniques, including order reduction of Head-Related Transfer Functions using factorisation and Room Impulse Response decomposition using directional analysis, are incorporated, and their perceptual effect is investigated extensively by means of subjective listening trials.
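
As a rough illustration of the image source half of such a hybrid approach, the following sketch computes the direct sound and the six first-order image sources of a shoebox room, returning the delay and 1/r attenuation of each arrival at the receiver. Wall absorption, higher reflection orders and the measurement-based component are all omitted, and the room dimensions are hypothetical.

    import numpy as np

    def first_order_arrivals(src, rcv, room, c=343.0):
        # direct sound plus the six first-order image sources of a
        # shoebox room: returns (delays in seconds, 1/r attenuations)
        src = np.asarray(src, dtype=float)
        rcv = np.asarray(rcv, dtype=float)
        images = [src]
        for axis, length in enumerate(room):
            for wall in (0.0, length):              # two walls per axis
                img = src.copy()
                img[axis] = 2.0 * wall - src[axis]  # mirror across wall
                images.append(img)
        dists = np.linalg.norm(np.array(images) - rcv, axis=1)
        return dists / c, 1.0 / dists

    # hypothetical middle-sized lecture hall, dimensions in metres
    delays, gains = first_order_arrivals((2.0, 3.0, 1.5),
                                         (6.0, 4.0, 1.2),
                                         (10.0, 8.0, 3.5))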

Lastly, the spatial localisation of sounding objects is affected not only by auditory cues but also by other modalities such as vision. This is particularly true for distance perception, where fewer auditory cues are available than for, say, localisation in the horizontal and vertical planes. This project therefore also investigates the influence of vision on the perception of audio. In particular, the effect of incongruent audio-visual cues is explored in the context of auditory distance perception in photorealistic virtual reality environments.

Figure 1: Virtual version of a musical performance in the middle-sized lecture hall
Figure 2: Virtual acoustic environment - cathedral interior

Efficient synthesis of virtual auditory environments

In personal audio systems headphones are commonly used to deliver the VAE, and studies have shown that spectral coloration and phase distortion caused by the headphone transfer function (HpTF) can degrade the audio reaching the listener’s eardrums. Headphone equalisation can be viewed as a deconvolution problem: the equalisation is performed by extracting the headphone response from the binaural responses using deconvolution or, equivalently, approximate factorisation. This allows the equalisation to be performed without any additional computational cost by taking advantage of similar zero behaviour in the headphone response and the binaural impulse response. Both have coefficient distributions similar to those of random polynomials, whose zeros are, in general, uniformly distributed around the unit circle in the Argand plane.
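
One common way to realise this deconvolution, sketched below under assumed details, is regularised division in the frequency domain: the headphone response is divided out of the binaural response, with a small constant beta preventing blow-up where the HpTF has deep notches. The function name and the Tikhonov-style regularisation are illustrative choices, not necessarily the Group's published method.

    import numpy as np

    def equalise_binaural(binaural_ir, headphone_ir, beta=1e-3):
        # regularised frequency-domain deconvolution: remove the
        # headphone transfer function from a binaural impulse response
        n = len(binaural_ir) + len(headphone_ir) - 1
        B = np.fft.rfft(binaural_ir, n)
        H = np.fft.rfft(headphone_ir, n)
        # the beta term guards against division by near-zero |H|
        E = B * np.conj(H) / (np.abs(H) ** 2 + beta)
        return np.fft.irfft(E, n)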

The mixing time of a room impulse response (RIR) denotes the point of transition from early reflections to a diffuse reverberation tail. As an impulse response progresses, individual reflections gradually become less distinguishable: the reflection density increases, along with the general diffuseness of the decaying response. Under the assumption that early arrivals can be approximately modelled as scaled and warped versions of the direct sound, we have been using Dynamic Time Warping to better estimate the temporal distribution of arrivals in room impulse responses. Furthermore, this allows the mixing time of such signals to be estimated from the correlation between each detected arrival and the direct sound.
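
A simplified, correlation-only variant of this idea is sketched below: each short windowed segment of the RIR is compared with the direct sound by normalised correlation, and the mixing time is taken as the point after which no segment resembles the direct sound. The DTW warping step is omitted here, and the window length and threshold are hypothetical tuning parameters.

    import numpy as np

    def estimate_mixing_time(rir, fs, win_ms=2.0, threshold=0.3):
        # crude mixing-time estimate: the time after which windowed
        # segments of the RIR stop correlating with the direct sound
        w = max(8, int(fs * win_ms / 1000.0))
        onset = int(np.argmax(np.abs(rir)))
        direct = rir[onset:onset + w]
        direct = direct / (np.linalg.norm(direct) + 1e-12)
        last_match = onset
        for start in range(onset + w, len(rir) - w, w):
            seg = rir[start:start + w]
            seg = seg / (np.linalg.norm(seg) + 1e-12)
            corr = np.max(np.abs(np.correlate(seg, direct, mode='full')))
            if corr >= threshold:
                last_match = start
        return (last_match + w) / fs   # seconds from the start of the RIR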
