Independent component analysis (ICA) is a statistical algorithm that separates independent audio sources without knowledge of the recording conditions, such as the room size, the reverberation time, and the locations of the sources and microphones. This task is often called "blind source separation (BSS)" or "blind audio source separation (BASS)." ICA assumes that the audio sources are statistically independent, that is, unrelated to each other. ICA calculates a demixing system (matrix W in fig.) for the separation, where the demixing system corresponds to the inverse of the mixing system (matrix A in fig.).
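The relation between the mixing matrix A and the demixing matrix W can be illustrated with a minimal NumPy sketch. The source signals and the entries of A below are assumptions made up for illustration. W is computed directly from A here, which is only possible because A is known in this toy setting; ICA has to estimate W from the observation alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian toy sources (rows), 1000 samples each.
s = rng.laplace(size=(2, 1000))

# Hypothetical 2x2 mixing matrix A (unknown in a real blind scenario).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s              # observed microphone signals (instantaneous mixture)
W = np.linalg.inv(A)   # ideal demixing matrix, i.e., the inverse of A
y = W @ x              # separated signals

print(np.allclose(y, s))
```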
ICA can separate audio sources when they are mixed without reverberation (instantaneous mixture). In practice, however, reverberation exists, and the mixing system becomes a convolutive mixture. Since it is difficult to perform deconvolution using only the observation, the observed signal is transformed into the time-frequency domain via a short-time Fourier transform (STFT), where the convolutive mixture becomes an instantaneous mixture at each frequency, and ICA is applied. This approach to BASS is called frequency-domain ICA (FDICA).
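The difference between the two mixing models can be sketched as follows. The impulse responses here are hypothetical (exponentially decaying noise, not measured room responses), and the array names are assumptions for illustration. The last lines check that the convolutive model collapses to the instantaneous model x = As when every impulse response has length one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_mic, n_samples, ir_len = 2, 2, 1000, 64

s = rng.laplace(size=(n_src, n_samples))   # toy source signals

# Hypothetical impulse responses h[m, n] from source n to microphone m.
decay = np.exp(-np.arange(ir_len) / 10.0)
h = rng.standard_normal((n_mic, n_src, ir_len)) * decay

# Convolutive mixture: x_m(t) = sum_n (h_mn * s_n)(t).
x = np.zeros((n_mic, n_samples + ir_len - 1))
for m in range(n_mic):
    for n in range(n_src):
        x[m] += np.convolve(h[m, n], s[n])

# With length-1 impulse responses, the same model reduces to x = A s.
A = h[:, :, 0]
x_inst = np.stack([
    sum(np.convolve(h[m, n, :1], s[n]) for n in range(n_src))
    for m in range(n_mic)
])
print(np.allclose(x_inst, A @ s))
```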
In FDICA, the observed signal is a three-dimensional tensor with time, frequency, and microphone (channel) axes. ICA is applied to each frequency, and frequency-wise demixing matrices are estimated to separate the sources. Since the signal components are complex-valued, the source distribution that ICA assumes in order to exploit the independence assumption is a complex-valued distribution.
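As a rough illustration of this data layout, the sketch below builds the observed tensor with a minimal hand-written STFT and applies one demixing matrix per frequency bin. The helper stft, the window settings, and the random input are assumptions for illustration, and the identity matrices stand in for the demixing matrices that ICA would estimate.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Minimal STFT of a 1-D signal; returns an array of shape (freq, time)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

rng = np.random.default_rng(0)
n_mics, n_samples = 2, 4096
x_time = rng.standard_normal((n_mics, n_samples))   # placeholder recordings

# Observed tensor X with axes (frequency, time, microphone), complex-valued.
X = np.stack([stft(ch) for ch in x_time], axis=-1)
n_freq, n_frames, _ = X.shape

# One demixing matrix per frequency bin (identity as a placeholder).
W = np.tile(np.eye(n_mics, dtype=complex), (n_freq, 1, 1))

# Y[i, j, :] = W[i] @ X[i, j, :] for every frequency i and time frame j.
Y = np.einsum("inm,ijm->ijn", W, X)
print(X.shape, Y.shape)
```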
ICA inherently cannot determine the order of the estimated (separated) sources. For example, when we apply ICA to a mixture of violin and piano, the first and second estimated sources could be either "violin and piano" or "piano and violin." This ambiguity causes a serious problem in FDICA.
Since FDICA runs ICA independently at each frequency, the order of the estimated sources becomes inconsistent (nonaligned) along the frequency axis. This problem is called "the permutation problem." Even if we apply the inverse STFT to the nonaligned estimated sources, the resulting time-domain signals are again mixtures of the sources. Thus, a permutation solver is required, which aligns the order of the estimated sources across all frequency components.
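The effect of the permutation problem can be sketched with a toy example. The constant-valued "spectrograms" and the swap pattern are assumptions for illustration. Note that the alignment step at the end uses the known swap pattern, whereas a real permutation solver must infer it from the separated signals themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames = 8, 5

# Toy separated spectrograms: source 0 is all ones, source 1 is all twos.
S = np.stack([np.full((n_freq, n_frames), 1.0),
              np.full((n_freq, n_frames), 2.0)], axis=-1)  # (freq, time, source)

# Each frequency bin independently ends up with one of the two orders.
swap = rng.integers(0, 2, size=n_freq).astype(bool)
swap[0], swap[1] = False, True   # force both orders to occur, for illustration
Y = S.copy()
Y[swap] = S[swap][..., ::-1]

# Output 0 now contains frequency bins from both sources.
print(np.unique(Y[..., 0]))

# Aligning the order in every bin restores the original sources.
Y_aligned = Y.copy()
Y_aligned[swap] = Y[swap][..., ::-1]
print(np.array_equal(Y_aligned, S))
```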
To avoid the permutation problem, FDICA requires an additional assumption so that the separated source components and their order are estimated simultaneously. Independent vector analysis (IVA) is a popular method that assumes "co-occurrence of the frequency components of the same source" and thereby avoids the permutation problem.
As a more detailed source assumption, we proposed introducing the "co-occurrence of the time-frequency components of the same source" into FDICA, where the time-frequency co-occurrence has a low-rank structure. This approach is called independent low-rank matrix analysis (ILRMA) and is a unification of FDICA and nonnegative matrix factorization (NMF). NMF models the time-frequency structure of each estimated source, and the frequency-wise demixing matrices are estimated without causing the permutation problem. Since the source model is more precise than that of IVA, the source separation accuracy tends to be better in many cases. In addition, iterative projection (IP) is introduced into ILRMA, where IP is a fast and convergence-guaranteed optimization algorithm for the demixing matrix in ICA and IVA. Thanks to the IP-based parameter optimization, ILRMA achieves satisfactory computational efficiency in practical situations.
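The interplay between the NMF source model and IP can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the reference implementation: it assumes the commonly described formulation in which the variance of each source is modeled as a low-rank product TV, T and V are updated with multiplicative rules for the Itakura-Saito divergence, and each row of the demixing matrix is updated by IP. The function names, array layouts, and random placeholder data are hypothetical, and steps such as scale normalization and projection back are omitted.

```python
import numpy as np

def demix(W, X):
    # Y[i, j, n] = sum_m W[i, n, m] * X[i, j, m]
    return np.einsum("inm,ijm->ijn", W, X)

def low_rank(T, V):
    # R[i, j, n] = sum_k T[i, k, n] * V[k, j, n]  (NMF variance model)
    return np.einsum("ikn,kjn->ijn", T, V)

def cost(X, W, T, V):
    R = low_rank(T, V)
    P = np.abs(demix(W, X)) ** 2
    _, logabsdet = np.linalg.slogdet(W)
    return np.sum(P / R + np.log(R)) - 2 * X.shape[1] * np.sum(logabsdet)

def ilrma_step(X, W, T, V, eps=1e-12):
    n_freq, n_frames, n_ch = X.shape
    P = np.abs(demix(W, X)) ** 2
    # Multiplicative NMF updates of the basis T and the activation V.
    R = low_rank(T, V)
    T = T * np.sqrt(np.einsum("ijn,kjn->ikn", P / R**2, V)
                    / np.einsum("ijn,kjn->ikn", 1 / R, V))
    T = np.maximum(T, eps)
    R = low_rank(T, V)
    V = V * np.sqrt(np.einsum("ikn,ijn->kjn", T, P / R**2)
                    / np.einsum("ikn,ijn->kjn", T, 1 / R))
    V = np.maximum(V, eps)
    R = low_rank(T, V)
    # Iterative projection: update one row of every W[i] at a time.
    W = W.copy()
    for n in range(n_ch):
        # U[i] = (1 / J) * sum_j x_ij x_ij^H / r_ijn
        U = np.einsum("ijm,ijl->iml", X / R[:, :, n:n + 1], X.conj()) / n_frames
        e = np.zeros((n_freq, n_ch, 1), dtype=complex)
        e[:, n, 0] = 1.0
        w = np.linalg.solve(W @ U, e)[..., 0]            # w = (W U)^{-1} e_n
        scale = np.sqrt(np.einsum("im,iml,il->i", w.conj(), U, w).real)
        W[:, n, :] = (w / scale[:, None]).conj()         # row n holds w^H
    return W, T, V

rng = np.random.default_rng(0)
n_freq, n_frames, n_ch, n_bases = 16, 50, 2, 2
# Placeholder complex "observation"; real use would take the STFT of recordings.
X = (rng.standard_normal((n_freq, n_frames, n_ch))
     + 1j * rng.standard_normal((n_freq, n_frames, n_ch)))
W = np.tile(np.eye(n_ch, dtype=complex), (n_freq, 1, 1))
T = rng.uniform(0.1, 1.0, (n_freq, n_bases, n_ch))
V = rng.uniform(0.1, 1.0, (n_bases, n_frames, n_ch))

costs = [cost(X, W, T, V)]
for _ in range(10):
    W, T, V = ilrma_step(X, W, T, V)
    costs.append(cost(X, W, T, V))
print(costs[0], costs[-1])
```

Tracking the cost is a simple sanity check: both the NMF updates and IP are designed not to increase it, so the value after the iterations should be lower than the initial one.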
In the following demonstration, music sources are mixed using room impulse responses that simulate the recording environment, and the separation results of IVA and ILRMA are presented. We also compare their performance with that of multichannel NMF (MNMF), a state-of-the-art algorithm for BASS.
All the mixture signals are produced by convolving the sources with the E2A impulse response (reverberation time: 300 ms) included in the RWCP database. We also show examples of the computational time for each method, where the calculation was performed on an Intel Core i7-4790 (3.6 GHz) with MATLAB 8.3 (64-bit), and the number of iterations for IVA, MNMF, and ILRMA was set to 200. In practical use, 30 and 100 iterations are sufficient for IVA and ILRMA, respectively. For MNMF, 200 iterations are still insufficient.
All the music signals in this demonstration were obtained from SiSEC and are used for academic research purposes only.