IWAENC 2006 -- Proceedings

Blind speech separation by combining beamformers and a time frequency binary mask

Author(s)

Jan Cermak (NTT Communication Science Laboratories)
Shoko Araki (NTT Communication Science Laboratories)
Hiroshi Sawada (NTT Communication Science Laboratories)
Shoji Makino (NTT Communication Science Laboratories)

Topics

Sound enhancement and sound separation
Microphone arrays and array signal processing

Abstract

This paper describes a new method for blind speech separation (BSS) of convolutive mixtures. Our approach is based on beamforming, a widely used speech enhancement method. We apply this technique to BSS by combining a beamformer and a time-frequency binary mask (TFBM) in one system. We propose two approaches that share the same basis but differ in setup. The first is designed for (over-)determined configurations, i.e., the number of sensors is equal to or greater than the number of sources. The second is designed for underdetermined configurations, i.e., the sources outnumber the sensors. Experimental results show that the proposed approach provides better results than a conventional TFBM or a conventional beamformer used alone.
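
As a rough illustration of the terms used in the abstract, the sketch below shows one generic way a time-frequency binary mask can be built from a set of beamformer outputs and how a configuration can be classified as (over-)determined or underdetermined. The abstract does not say how the mask is computed or how it is combined with the beamformer, so the dominance rule used here (assign each time-frequency bin to the output with the largest power) is an assumption, and the function names `binary_masks`, `apply_masks`, and `configuration` are hypothetical. It should not be read as the authors' method.

```python
import numpy as np


def configuration(n_sensors, n_sources):
    """Classify a setup using the definitions given in the abstract."""
    return "(over-)determined" if n_sensors >= n_sources else "underdetermined"


def binary_masks(beam_outputs):
    """Build one binary mask per beamformer output.

    beam_outputs: complex array of shape (n_outputs, n_freq, n_frames),
    e.g. STFTs of signals produced by beamformers steered at each source.
    Assumption (not from the paper): a bin belongs to the output whose
    power is largest in that bin.
    """
    power = np.abs(beam_outputs) ** 2
    dominant = np.argmax(power, axis=0)            # shape (n_freq, n_frames)
    indices = np.arange(beam_outputs.shape[0])     # shape (n_outputs,)
    # Broadcast to a boolean array of shape (n_outputs, n_freq, n_frames).
    return dominant[None, :, :] == indices[:, None, None]


def apply_masks(beam_outputs, masks):
    """Zero every bin in which an output is not the dominant one."""
    return beam_outputs * masks


# Toy example: 2 outputs, 1 frequency bin, 2 time frames.
beam = np.array([[[3 + 0j, 1 + 0j]],
                 [[1 + 0j, 0 + 2j]]])
masks = binary_masks(beam)
separated = apply_masks(beam, masks)
```

In this toy example the first output dominates the first frame and the second output dominates the second frame, so each bin is kept in exactly one of the masked signals.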