Nonnegative matrix factorization (NMF) is a mathematical algorithm that decomposes a nonnegative matrix (matrix X in fig.) into the product of two nonnegative matrices (matrices T and V in fig.). NMF can be interpreted as a low-rank approximation and can be used for extracting meaningful patterns and for unsupervised learning.
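As a minimal sketch of the decomposition X ≈ TV, the following NumPy code uses the standard multiplicative update rules for the squared Euclidean distance. The function name `nmf`, the iteration count, and the random test matrix are illustrative assumptions, not the exact settings used in this work.

```python
import numpy as np

def nmf(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative X (freq x time) as T @ V with T, V >= 0."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    T = rng.random((n_rows, n_bases)) + eps   # bases (columns of T)
    V = rng.random((n_bases, n_cols)) + eps   # activations (rows of V)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        T *= (X @ V.T) / (T @ V @ V.T + eps)
        V *= (T.T @ X) / (T.T @ T @ V + eps)
    return T, V

# Toy data: an exactly rank-3 nonnegative matrix (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.random((64, 3)) @ rng.random((3, 100))
T, V = nmf(X, n_bases=3)
rel_err = np.linalg.norm(X - T @ V) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The number of bases sets the rank of the approximation; a small value forces NMF to summarize the matrix with a few recurring patterns.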
When we apply NMF to an audio signal, we decompose its time-frequency matrix, i.e., a spectrogram (matrix X in fig.). As a result, several bases (timbre parts, matrix T in fig.) and their activations (the volume of each part over time, matrix V in fig.) are obtained. Audio editing or audio source separation can be achieved by modifying these parts and activations. Automatic music transcription (producing music scores from sound) may also be realized by NMF with post-processing.
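To illustrate what the time-frequency matrix X looks like, here is a small NumPy-only sketch that builds a magnitude spectrogram with a short-time Fourier transform. The frame length, hop size, Hann window, and the 440 Hz test tone are assumptions chosen for the example, not parameters taken from this work.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=1024, hop=256):
    """Return a nonnegative (frequency x time) matrix suitable for NMF."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping windowed frames, one per column.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    # The magnitude of the one-sided FFT of each frame is nonnegative.
    return np.abs(np.fft.rfft(frames, axis=0))

sr = 16000                                  # sampling rate in Hz (assumed)
t = np.arange(sr) / sr                      # one second of audio
signal = np.sin(2 * np.pi * 440.0 * t)      # pure 440 Hz tone
X = magnitude_spectrogram(signal)
peak_bin = int(X[:, 0].argmax())
print(X.shape, peak_bin * sr / 1024)        # matrix size and peak frequency in Hz
```

Each column of X is the spectrum of one short time frame, so a basis in T describes a spectral shape and the matching row of V describes when and how strongly it sounds.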
Audio source separation is a technique for separating the individual audio sources in a mixture. In particular, a supervised approach is often used, in which sample sound files of the target source are prepared and used for training. In the context of NMF, "supervised NMF" is the most popular algorithm and provides better separation performance than unsupervised approaches.
In supervised NMF, a sample sound signal is first decomposed by NMF in a training stage, so that the timbre parts (matrix T in fig.) are obtained in advance. Then, in a separation stage, the other matrices (matrices G, H, and U in fig.) are estimated while the pre-trained timbre parts are kept fixed. As a result, the target source components (TG in fig.) and the other source components (HU in fig.) are separated, and their audio signals can be obtained by transforming them back into time-domain waveforms.
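The separation stage described above can be sketched as follows: T is held fixed while G, H, and U are updated so that X ≈ TG + HU. This is a simplified illustration using plain Euclidean multiplicative updates without any additional penalty terms; the function name `supervised_nmf` and the synthetic data (including the random matrix standing in for pre-trained timbre parts) are assumptions made for the example.

```python
import numpy as np

def supervised_nmf(X, T, n_other, n_iter=300, eps=1e-12, seed=0):
    """Fit X ~= T @ G + H @ U with the pre-trained bases T kept fixed."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    G = rng.random((T.shape[1], n_frames)) + eps   # activations of T
    H = rng.random((n_freq, n_other)) + eps        # bases of other sources
    U = rng.random((n_other, n_frames)) + eps      # activations of H
    for _ in range(n_iter):
        # T is never updated; only G, H, and U change.
        G *= (T.T @ X) / (T.T @ (T @ G + H @ U) + eps)
        H *= (X @ U.T) / ((T @ G + H @ U) @ U.T + eps)
        U *= (H.T @ X) / (H.T @ (T @ G + H @ U) + eps)
    return G, H, U

# Synthetic mixture: a target built from T plus an unrelated second source.
rng = np.random.default_rng(1)
T = rng.random((64, 4))                  # stands in for pre-trained timbre parts
target = T @ rng.random((4, 100))
other = rng.random((64, 2)) @ rng.random((2, 100))
X = target + other

G, H, U = supervised_nmf(X, T, n_other=2)
target_est = T @ G                       # estimated target components (TG)
other_est = H @ U                        # estimated other components (HU)
rel_err = np.linalg.norm(X - (target_est + other_est)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a real system the estimated spectrogram TG would be combined with the phase of the mixture and inverted to obtain a time-domain signal; that step is omitted here.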
In supervised NMF, we must prepare a sample sound file of the target source as supervision. For example, if we want to separate a piano sound from music, a sample of exactly the same piano sound (the same player, model, and playing style) is required in the training stage. This is a serious problem because we usually have only the already-mixed music, and the individual sound sources are not available. The sample sounds that users can typically prepare are, for example, sounds they play themselves or artificial sounds synthesized with MIDI/DAW software.
To solve the problem mentioned above, we proposed a new algorithm called "Supervised NMF with basis deformation." This algorithm can extract source components whose timbres are not identical but similar to the pre-trained sample sound. In the proposed method, we introduce a new term (matrix D in fig.) that deforms the pre-trained timbre parts; this term can take both positive and negative values. Since excessive deformation degrades the source separation performance, the value range of the deformation term is constrained.
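The idea of a bounded deformation term can be sketched as a model X ≈ (T + D)G + HU. The code below is only an illustration under stated assumptions: the element-wise bound |D| ≤ mu·T, the value of `mu`, and the projected-gradient update for D are choices made for this example and are not claimed to be the update rules or the constraint of the proposed algorithm.

```python
import numpy as np

def deformed_supervised_nmf(X, T, n_other, mu=0.3, n_iter=300, eps=1e-12, seed=0):
    """Fit X ~= (T + D) @ G + H @ U with a bounded, signed deformation D."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape
    D = np.zeros_like(T)                           # deformation, starts at zero
    G = rng.random((T.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    for _ in range(n_iter):
        B = T + D                                  # deformed bases, >= 0 when mu < 1
        G *= (B.T @ X) / (B.T @ (B @ G + H @ U) + eps)
        H *= (X @ U.T) / ((B @ G + H @ U) @ U.T + eps)
        U *= (H.T @ X) / (H.T @ (B @ G + H @ U) + eps)
        # Gradient step on D (assumed rule), with step size 1 / ||G G^T||_2.
        step = 1.0 / (np.linalg.norm(G @ G.T, 2) + eps)
        D += step * (X - (B @ G + H @ U)) @ G.T
        # Project back onto the allowed range so the deformation stays small.
        D = np.clip(D, -mu * T, mu * T)
    return D, G, H, U

# The mixture uses bases that are similar, but not identical, to T.
rng = np.random.default_rng(1)
T = rng.random((64, 4)) + 0.1
T_true = T * (1.0 + 0.2 * rng.uniform(-1.0, 1.0, T.shape))
X = T_true @ rng.random((4, 100)) + rng.random((64, 2)) @ rng.random((2, 100))

D, G, H, U = deformed_supervised_nmf(X, T, n_other=2)
rel_err = np.linalg.norm(X - ((T + D) @ G + H @ U)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Bounding D relative to T keeps the deformed bases nonnegative and prevents them from drifting far enough to absorb the other sources, which mirrors the reason for constraining the deformation range.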
In this demonstration, we produced the music signals and the sample sound using different MIDI synthesizers, where the sample sound consists of a two-octave scale (24 notes). The proposed algorithm can extract the target source, whose timbre is similar to that of the sample sound.
The copyright of this music is held by Yamaha Corporation. Please note that reproducing or distributing it without permission is a violation of copyright law.
Copyright © 2014 Yamaha Corporation. All rights reserved.