Speech Quality

Measuring Speech Quality in Real-Time Communications

User experience is a vital element both when viewing content online and in real-time speech communications. Subjective measures of speech quality, most commonly reported as a Mean Opinion Score (MOS), continue to be widely used to assess speech output in multimedia systems. Objective measures developed for the plain old telephone service (POTS) have not translated well to internet-based communications.

Andrew's PhD research demonstrated the usefulness of mathematical models of the auditory nerve in accurately predicting speech intelligibility. Real human listener test results were reproduced for a word repetition task, yielding simulated performance intensity curves with accuracy within the bounds of human results. The method is based on NSIM (Neurogram Similarity Index Measure). This measure is derived from SSIM (Structural Similarity Index Measure). It compares neurograms, which are time-frequency representations of auditory nerve responses, for degraded speech stimuli against the neurogram for the original reference speech. As such, it is a full-reference measure. In essence, NSIM removes the human listener from listener tests and substitutes a model of the auditory nerve together with a systematic method for deriving intelligibility from the model outputs.
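The idea of an SSIM-style comparison between a reference and a degraded neurogram can be illustrated with a minimal sketch. It is not the project's implementation. It assumes neurograms are supplied as small 2D grids of non-negative values, it uses global statistics over the whole grid where SSIM-type measures normally use local windows, it keeps only an intensity term and a structure term as a simplification, and the function name `similarity` and the constants `c1` and `c3` are hypothetical placeholders.

```python
from math import sqrt


def similarity(reference, degraded, c1=1e-4, c3=5e-5):
    """SSIM-style similarity between two equally shaped 2D grids.

    Illustrative only: statistics are taken over the whole grid, and
    c1 and c3 are placeholder stabilising constants, not published values.
    """
    if [len(row) for row in reference] != [len(row) for row in degraded]:
        raise ValueError("grids must have the same shape")

    # Flatten both grids so that corresponding cells line up.
    ref = [v for row in reference for v in row]
    deg = [v for row in degraded for v in row]
    if not ref:
        raise ValueError("grids must not be empty")
    n = len(ref)

    # Means, variances, and covariance over the whole grid.
    mu_r = sum(ref) / n
    mu_d = sum(deg) / n
    var_r = sum((v - mu_r) ** 2 for v in ref) / n
    var_d = sum((v - mu_d) ** 2 for v in deg) / n
    cov = sum((a - mu_r) * (b - mu_d) for a, b in zip(ref, deg)) / n

    # Intensity term: compares the mean levels of the two grids.
    intensity = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    # Structure term: a correlation-like comparison of the two patterns.
    structure = (cov + c3) / (sqrt(var_r) * sqrt(var_d) + c3)
    return intensity * structure


# Toy grids standing in for neurograms (rows: frequency bands, columns: time).
reference_grid = [[0.1, 0.9], [0.8, 0.2]]
degraded_grid = [[0.5, 0.5], [0.4, 0.6]]

score_identical = similarity(reference_grid, reference_grid)
score_degraded = similarity(reference_grid, degraded_grid)
```

Comparing a grid with itself gives a score of approximately 1, and the score falls as the degraded pattern diverges from the reference. This is the sense in which such a measure is full reference: it always needs the original to compare against.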

This project will develop NSIM further by adapting it for use as a full-reference metric of speech quality. The project will assess how accurately NSIM measures quality, with performance benchmarked against MOS and existing objective measures. The second stage of the project will look at reduced-reference measures and the potential to apply the model to real scenarios where no “original” exists, as is typical of online content.

Figure 1: Speech Intelligibility Testing
Figure 2: Neurograms
Page last modified on July 24, 2012