Trinity College Dublin

Music and Media Technologies

EWD Research Project
Perceptually motivated joint time-frequency analysis

The research focus is the design and application of a perceptually motivated method of joint time-frequency (TF) analysis for acoustic signals and systems. TF analysis, which expresses a signal's energy as a three-dimensional function of time, frequency and intensity, has been widely used for the study of non-stationary signals and systems. In acoustics, TF analysis has been used, for example, to analyse music and speech, and to analyse systems such as loudspeakers and equalisation units.

 

In general, high time-frequency resolution is desirable for investigating the physical properties of signals. However, such resolution does not reflect the processing capabilities of the human auditory system, which exhibits many non-linear effects due to its physiology. These effects have a marked influence on our perception of sound.

 

One of the most important and widely studied features of the auditory system is auditory masking. Masking occurs when a relatively weak signal component is rendered inaudible (masked) by a relatively strong signal component (the masker). This can occur if signal and masker are either temporally or spectrally close. Masking is therefore classified as either temporal or spectral; both are typically modelled in terms of auditory temporal and spectral resolutions. These auditory resolutions depend on several factors, including frequency and signal level.

 

Therefore, if we wish to study the perceptual effects of a signal, masking needs to be incorporated into our method of analysis. Using TF analysis, it is possible to model spectral and temporal masking jointly on a single TF distribution and thus see their combined effects. This is only possible if the time and frequency resolutions of the distribution can be controlled independently, which is not possible with methods based on linear transforms, such as the spectrogram or the wavelet transform. Bilinear TF methods do allow independent control of time and frequency resolution, but they suffer from cross-term interference, which may be suppressed by smoothing the distribution in the TF plane with a smoothing kernel.
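The cross-term behaviour described above can be illustrated with a minimal Python sketch. It is not the EWD: it computes a generic pseudo-Wigner distribution of two complex tones and uses a plain moving average in time as a stand-in smoothing kernel. The function name `pseudo_wigner`, the sample rate, the tone frequencies and the window lengths are all assumptions chosen for illustration.

```python
import numpy as np

def pseudo_wigner(x, half_len):
    """Pseudo-Wigner distribution of a complex (analytic) signal x.

    Returns W[time, bin]; bin k corresponds to k * fs / (4 * half_len) Hz.
    """
    lags = np.arange(-half_len, half_len)
    # Periodic Hann lag window: its length sets the frequency smoothing.
    window = 0.5 * (1.0 + np.cos(np.pi * lags / half_len))
    times = np.arange(half_len, len(x) - half_len)[:, None]
    # Bilinear local autocorrelation x[n + m] * conj(x[n - m]).
    r = x[times + lags] * np.conj(x[times - lags]) * window
    # Move lag 0 to index 0, then transform lag -> frequency.
    return np.fft.fft(np.fft.ifftshift(r, axes=1), axis=1).real

# Illustrative test signal (assumed values): two complex tones.
fs = 8000.0
half_len = 64
n = np.arange(512)
f1, f2 = 1000.0, 2000.0
x = np.exp(2j * np.pi * f1 * n / fs) + np.exp(2j * np.pi * f2 * n / fs)

W = pseudo_wigner(x, half_len)
bin_hz = fs / (4 * half_len)
k1, k2, k_cross = (int(round(f / bin_hz)) for f in (f1, f2, 0.5 * (f1 + f2)))

# Time smoothing: a moving average spanning two periods of the
# cross-term oscillation, which has frequency f2 - f1.
smooth_len = int(round(2 * fs / (f2 - f1)))
box = np.ones(smooth_len) / smooth_len
W_smooth = np.apply_along_axis(np.convolve, 0, W, box, mode="valid")

print("auto-term level at f1:      ", np.abs(W[:, k1]).max())
print("cross-term peak, unsmoothed:", np.abs(W[:, k_cross]).max())
print("cross-term peak, smoothed:  ", np.abs(W_smooth[:, k_cross]).max())
```

In this sketch the lag window length (`half_len`) controls the frequency smoothing and the moving-average length (`smooth_len`) controls the time smoothing, so the two resolutions can be set separately. The cross-term sits midway between the two tones and oscillates in time, which is why averaging over time removes it while leaving the auto-terms in place.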

 

The technique developed here is based on Cohen's class of TF distributions. It combines the Wigner distribution with an ear-response-based smoothing kernel, hence its designation as the EarWig distribution (EWD). The EWD is intended as a tool for acousticians and audio engineers for analysing the perceptual effects of acoustic systems and signals. The EWD smoothing kernel accurately incorporates models of temporal and spectral masking as detailed in the psychoacoustic literature. In practically all cases, cross-term interference is also suppressed over a dynamic range sufficient to encompass that of the auditory system (over 100 dB).
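For reference, the general form of a Cohen's class distribution can be written as a two-dimensional smoothing of the Wigner distribution. The symbols below are generic notation assumed for illustration: x is the signal, W_x its Wigner distribution, and Φ a smoothing kernel. The specific ear-response-based kernel used by the EWD is not given on this page and is not reproduced here.

```latex
% Wigner distribution of the signal x(t)
W_x(t, f) = \int_{-\infty}^{\infty}
    x\!\left(t + \tfrac{\tau}{2}\right)\,
    x^{*}\!\left(t - \tfrac{\tau}{2}\right)\,
    e^{-j 2 \pi f \tau}\, d\tau

% Cohen's class member: Wigner distribution smoothed in time and frequency
% by a kernel \Phi (generic; the EWD kernel itself is not specified here)
C_x(t, f) = \iint W_x(t', f')\, \Phi(t - t',\, f - f')\, dt'\, df'
```

Under this formulation, the extent of Φ along t sets the temporal smoothing and its extent along f sets the spectral smoothing.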

 

[Image: 200 SMask]

The above image shows a simulation of the model on a signal commonly used to illustrate spectral masking: a complex tone. Here, the signal consists of the first 10 harmonics of a 200 Hz fundamental. Each harmonic is represented by a ripple along the time axis. The depth of the ripples decreases with frequency because the ear's critical bandwidth increases with frequency. Note that all inter-harmonic cross-terms are completely masked by spectral masking, even the one between the 200 Hz and 400 Hz harmonics.
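The growth of auditory bandwidth with frequency can be made concrete with a short Python sketch. It uses the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore as one common model of auditory filter bandwidth. This is an assumption for illustration: the page does not state which bandwidth model the EWD kernel uses, and the ERB is not identical to the classical critical bandwidth. The function name `erb_hz` is hypothetical.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore approximation), in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

# The complex tone from the text: first 10 harmonics of a 200 Hz fundamental.
f0 = 200.0
harmonics = f0 * np.arange(1, 11)
bandwidths = erb_hz(harmonics)

for f, bw in zip(harmonics, bandwidths):
    note = "narrower than" if bw < f0 else "wider than"
    print(f"{f:6.0f} Hz: ERB = {bw:6.1f} Hz ({note} the {f0:.0f} Hz harmonic spacing)")
```

Under this model the bandwidth at the upper harmonics exceeds the 200 Hz spacing between them, so neighbouring harmonics are increasingly smeared together, which is consistent with the shallower ripples at higher frequencies.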

 

[Image: 200 Tmask]

The image above displays the same distribution from a different perspective, illustrating temporal masking. The signal is 100 ms long, starting at time t = 0 and ending at t = 100 ms. However, both forward and backward masking extend beyond the signal duration, with forward masking being more pronounced.

 

 

 

Last updated: Dec 09 2009