$\begingroup$

I compute the NMF over a spectrogram (magnitude of STFT of a signal) in this way:

W,H = NMF(MyStft,r) # where r=2 is the rank 

W and H respectively contain the spectrum and the temporal information.

My question is why? What's the logic behind that?

$\endgroup$
  • $\begingroup$ Imagine that you are trying to separate some chords from a piano recording. Assuming you set your NMF parameters correctly, you will ideally end up with a decomposition where $W$ contains the spectrum of each chord and $H$ gives the time activations of these chords. Alternatively, you could learn a separate decomposition for each instrument, save the $W$ matrix of each instrument, stack those matrices together, and then decompose a recording that contains those instruments while keeping the stacked $W$ frozen. You can then separate the instruments, provided their time-frequency representations are (approximately) disjoint. $\endgroup$ Commented Aug 31, 2022 at 13:54
  • $\begingroup$ @jojek Following this explanation: since the original matrix X had n columns and I'm making a rank-2 decomposition, you would agree that the chords are no longer the same in the W matrix. So if W is an m x 2 matrix, it has exactly 2 columns. What are those columns? What do they represent with regard to the original signal? $\endgroup$ Commented Aug 31, 2022 at 15:33
  • $\begingroup$ They will represent some optimal combination of frequencies that minimizes the loss function of your decomposition. In theory, if you set the rank equal to the number of chords, you might get one chord per column, i.e. its frequency content. If the rank is very low (like 2 in your case), the factorization will only be approximate. Each column of $W$ is a spectrum; the columns are scaled by their time activations in $H$ and summed to approximate the input spectrogram. $\endgroup$ Commented Aug 31, 2022 at 18:42
  • $\begingroup$ This might be an example. In general, rank is important. $\endgroup$ Commented Aug 31, 2022 at 18:45
  • $\begingroup$ Is it true that NMF tends to find a certain «structure» by factoring into the product of two smaller matrices? That (say) a saxophone playing a tune tends to have a repeated timbre that is «triggered» like a sampler (only with more variability), and NMF is one method that tends to approximate this structure well (for some definition of «well»)? $\endgroup$ Commented Jul 28, 2023 at 10:39

1 Answer

$\begingroup$

Welcome to DSP SE!

A Spectrogram is a time-frequency representation: a representation of the spectrum of frequencies of a signal as it varies with time.
You can think of it as a non-negative (when you take the absolute value as you mentioned) 2D matrix (think columns = time, rows = frequencies, with each entry in the matrix containing the amplitude of a particular frequency component at a particular time instant).
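
As a rough illustration of that layout, here is a minimal sketch that builds a magnitude spectrogram with plain NumPy. The sample rate, the 440 Hz test tone, the frame length and the hop size are all assumptions chosen for the example, not values from the question.

```python
import numpy as np

fs = 8000                      # assumed sample rate in Hz
t = np.arange(fs) / fs         # one second of samples
x = np.sin(2 * np.pi * 440 * t)  # assumed test signal: a 440 Hz sine

frame_len, hop = 256, 128      # assumed STFT frame length and hop size
window = np.hanning(frame_len)

# Slice the signal into overlapping, windowed frames (one frame per row).
n_frames = 1 + (len(x) - frame_len) // hop
frames = np.stack([x[i * hop : i * hop + frame_len] * window
                   for i in range(n_frames)])

# Magnitude of the FFT of each frame, transposed so that
# rows = frequency bins and columns = time frames.
X = np.abs(np.fft.rfft(frames, axis=1)).T

print(X.shape)  # (number of frequency bins, number of time frames)
```

Every column of `X` is the spectrum of one short time frame, and every entry is non-negative, which is what NMF requires.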

NMF is a family of algorithms that aim to express a non-negative matrix $X$ as the (approximate) product of two non-negative matrices $W$ and $H$, i.e. $X \approx WH$. In our case, NMF separates the time-frequency information contained in our spectrogram $X$ into two matrices, one for frequency and one for time:

  • $X$ is the spectrogram magnitude, so it can be factorized into
  • $W$, a set of basis spectral vectors, and
  • $H$, a set of temporal activation weights.
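
To make the roles of $W$ and $H$ concrete, here is a minimal sketch on a toy spectrogram. It assumes the layout above (rows = frequencies, columns = time) and uses the classic multiplicative updates for the Frobenius loss, which is only one of several NMF algorithms. The matrices `W_true` and `H_true`, the helper `nmf`, and the iteration count are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy data: two spectral templates over 6 frequency bins (columns).
W_true = np.array([[1.0, 0.0],
                   [0.5, 0.0],
                   [0.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.8],
                   [0.2, 0.0]])
# Their activations over 8 time frames (rows): source 0 first, source 1 later.
H_true = np.array([[1.0, 1.0, 0.8, 0.5, 0.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.6, 1.0, 1.0, 0.7, 0.0]])
X = W_true @ H_true  # toy "spectrogram", shape (6 frequencies, 8 frames)

def nmf(X, r, n_iter=1000, eps=1e-12, seed=0):
    """Rank-r NMF via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps  # non-negative random initialization
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(X, r=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Column t of the approximation is a weighted sum of the columns of W,
# with the weights taken from column t of H.
t = 3
frame_t = W[:, 0] * H[0, t] + W[:, 1] * H[1, t]
print(W.shape, H.shape, rel_err)
```

Because column `t` of `W @ H` equals `W @ H[:, t]`, each spectrum in `X` is approximated by a non-negative mix of the columns of `W`. The columns of `W` therefore have one entry per frequency bin (they are spectra), and each row of `H` has one entry per time frame (how strongly that spectrum is present over time). If the spectrogram were stored transposed, the two roles would simply swap.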

I know these terms can be quite confusing. May I suggest you study this great tutorial: Bill Connelly - Nonnegative Matrix Factorization for Dummies? If you still have questions (maybe a little more precise than the one you asked), feel free to edit or add to your original question ;)

$\endgroup$
  • $\begingroup$ Yes, I get this part, and it seems this is what all the papers and articles are saying. But my question is: what is W? If W is now some m by nComponents matrix, what is its relationship with the original signal? Are the columns of W also signals? I'm not getting this idea of a basis or of spectral signatures. $\endgroup$ Commented Aug 31, 2022 at 15:35
  • $\begingroup$ Have you gone through the tutorial I linked? $\endgroup$ Commented Aug 31, 2022 at 15:55
  • $\begingroup$ In this context "basis" is a word for the collection of individual parts something is made up of. If NMF is applied to faces, for example, the basis "vectors" (which can be visualized as matrices) may look like this: spsc.tugraz.at/collections/assets/NMFFaces.png. $\endgroup$ Commented Sep 1, 2022 at 21:23
  • $\begingroup$ @Jdip, I think the question is the other way around. You explain why it is reasonable to get such a decomposition, while I think the question is: among so many options, why is this decomposition chosen? How come the matrices in the result indeed match time and frequency? This has more to do with the efficiency of the representation. $\endgroup$ Commented May 28, 2023 at 16:56
