IWAENC 2006 -- All papers by session

Speaker location and acoustic event detection given a distributed microphone network (PDF, 6 MB)
Plenary Talk 1

Maurizio Omologo (ITC-irst)
September 12, 2006 at 09:30

Abstract

Over the last two decades, research on Speaker Location (SLOC) has made significant advances, enabling real applications in a range of fields.

A traditional approach adopts a two-step procedure. The first, crucial step is to estimate a Time Difference Of Arrival (TDOA) at each microphone pair; the source position is then derived from these delay estimates. One of the most common techniques for estimating the delay is the well-known Generalized Cross Correlation - Phase Transform (GCC-PHAT) function.
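As a rough illustration of the TDOA step, the following Python sketch estimates the delay between two microphone signals with GCC-PHAT. It assumes NumPy; the function name `gcc_phat`, its parameters, and the small regularization constant are illustrative choices, not taken from the talk.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1, in seconds, via GCC-PHAT.

    Hypothetical helper for illustration: x1 and x2 are 1-D arrays sampled
    at fs Hz; max_tau optionally bounds the search range in seconds.
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12     # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)      # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs, cc
```

A positive return value means that the sound reaches the second microphone later than the first. The resolution of this sketch is limited to one sample; practical systems often interpolate around the peak.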

This approach has been extended to contexts in which microphone pairs are distributed throughout the environment to ensure good coverage of every spatial region. In this case, a single-step procedure based on maximizing a Global Coherence Field (GCF), or Steered Response Power - Phase Transform (SRP-PHAT), is an effective alternative solution to the SLOC problem.
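The single-step idea can be sketched as a grid search: for every candidate position, read each pair's GCC-PHAT value at the delay that position would produce, average over pairs, and take the maximum. The Python sketch below assumes NumPy, a known microphone geometry, a speed of sound of 343 m/s, and nearest-sample lag rounding; all function and parameter names are hypothetical and do not reproduce the speaker's implementation.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def gcc_phat_curve(x1, x2):
    """GCC-PHAT of x2 against x1; index k holds lag k, negative lags wrap."""
    n = len(x1) + len(x2)
    cross = np.fft.rfft(x2, n=n) * np.conj(np.fft.rfft(x1, n=n))
    cross /= np.abs(cross) + 1e-12     # PHAT weighting
    return np.fft.irfft(cross, n=n)

def gcf_map(signals, mics, pairs, grid, fs):
    """Average pairwise coherence at each candidate point.

    signals: (n_mics, n_samples), mics: (n_mics, dim), grid: (n_points, dim),
    pairs: list of (i, j) microphone index tuples.
    """
    score = np.zeros(len(grid))
    for i, j in pairs:
        cc = gcc_phat_curve(signals[i], signals[j])
        # Delay of mic j relative to mic i expected for each candidate point
        tau = (np.linalg.norm(grid - mics[j], axis=1)
               - np.linalg.norm(grid - mics[i], axis=1)) / C
        lags = np.rint(tau * fs).astype(int)
        score += cc[lags % len(cc)]    # modulo maps negative lags to the tail
    return score / len(pairs)

def locate(signals, mics, pairs, grid, fs):
    """Return the grid point with maximum coherence, plus the full map."""
    score = gcf_map(signals, mics, pairs, grid, fs)
    return grid[np.argmax(score)], score
```

Because the map is evaluated directly over space, no individual pair has to yield a reliable TDOA on its own; the cost grows with the number of grid points times the number of pairs.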

Our more recent research has led us to define the so-called Oriented Global Coherence Field (OGCF), which is very effective in estimating both the speaker's position and head orientation. As a result, SLOC system robustness and accuracy increase, although performance can also be influenced by the directivity of the speaker.

In this talk, I will introduce OGCF and show its performance when applied in a real, noisy, and reverberant environment. The experimental results mostly derive from activities conducted during the last two years within the CHIL European Project and the NIST/CLEAR benchmarking tasks.

The second part of the talk will address the generic problem of detecting, localizing, and possibly classifying a given acoustic event, or recognizing a speech sequence. Here too, information extracted from OGCF is a relevant cue for detecting acoustic activity produced either by a speaker or by any other coherent source. Event classification is then performed using Hidden Markov Models (HMMs).

The presentation will include some video clips showing the behaviour of real demonstrators built at the ITC-irst laboratories.