Multimodal interaction for scene boundary detection

S. Krinidis, S. Tsekeridou, I. Pitas

IEEE-Eurasip workshop on Nonlinear Signal and Image Processing, NSIP01, USA.

Abstract: A scene boundary detection method is presented that analyzes both aural and visual information sources and accounts for their interrelations and coincidence to semantically identify video scenes. Audio analysis focuses on segmenting the audio source into three types of semantic primitives: silence, speech, and music. Further processing of speech segments aims at locating speaker change points. Video analysis attempts to segment the video source into shots. Results from single-source segmentation are in some cases suboptimal. Audio-visual interaction either enhances single-source findings or extracts high-level semantic information. The aim of this paper is to identify semantically meaningful video scenes by exploiting the temporal correlations of both sources, based on the observation that semantic changes are characterized by significant changes in both information sources. Experiments were carried out on several TV sequences composed of many scenes with different content and plenty of commercials in between.
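
The observation that semantic changes show up in both sources at about the same time can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give. It assumes that shot cuts and audio change points (for example, transitions between silence, speech and music, or speaker changes) are already available as timestamps in seconds. The function name `coincident_boundaries` and the `tolerance` parameter are hypothetical.

```python
from bisect import bisect_left


def coincident_boundaries(shot_cuts, audio_changes, tolerance=0.5):
    """Return the shot cuts that have an audio change within `tolerance` seconds.

    Illustrative assumption: a scene boundary candidate is a shot cut that
    coincides in time with a change in the audio source.
    """
    audio = sorted(audio_changes)
    matches = []
    for cut in sorted(shot_cuts):
        i = bisect_left(audio, cut)
        # The nearest audio change is either the one just before the cut
        # or the one at/after it, so only these two need to be checked.
        neighbours = audio[max(i - 1, 0):i + 1]
        if any(abs(cut - a) <= tolerance for a in neighbours):
            matches.append(cut)
    return matches
```

For example, with shot cuts at 4.0, 12.5, 30.0 and 47.2 s and audio changes at 12.3, 29.0, 47.6 and 60.0 s, the default tolerance of 0.5 s keeps only the cuts at 12.5 s and 47.2 s. The cut at 4.0 s has no nearby audio change, and the cut at 30.0 s is a full second away from the nearest one.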