anderstood


I recently discovered this interesting problem, known as the cocktail party problem. My understanding is that spectacular source separations such as these can now be achieved with neural networks or similarly elaborate techniques. Here, however, I'd like to summarize my understanding of a much more basic algorithm based on Independent Component Analysis (ICA), which works well enough for simple examples. Anton Antonov's answer proposes an implementation of this algorithm, presumably much more sophisticated than what follows, but I wanted to try it myself from scratch.

The Problem

There are two independent sound sources. We could record them independently and stack the results in a vector $X_0$. But instead, we recorded them while both sources were playing, resulting in a measurement

$$X=AX_0$$ where $A$ is a $2\times 2$ matrix called the mixing matrix. The objective is to estimate $X_0$ from the measurements $X$ only.
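To make the model concrete, here is a tiny pure-Python sketch with made-up sample values: if $A$ were known, the sources could be recovered exactly by $A^{-1}$; ICA is needed precisely because $A$ is unknown.

```python
# The mixing model X = A.X0, shown on a single (hypothetical) sample.
A = [[0.3, 0.7], [1.6, 1.0]]          # mixing matrix used later in this answer
detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[ A[1][1]/detA, -A[0][1]/detA],
        [-A[1][0]/detA,  A[0][0]/detA]]

x0 = (2.0, -1.0)                      # one sample of the two sources (made up)
x = (A[0][0]*x0[0] + A[0][1]*x0[1],   # the measured (mixed) sample
     A[1][0]*x0[0] + A[1][1]*x0[1])
rec = (Ainv[0][0]*x[0] + Ainv[0][1]*x[1],
       Ainv[1][0]*x[0] + Ainv[1][1]*x[1])
# rec equals x0 up to rounding; with A unknown, this shortcut is unavailable.
```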

Source generation

For the first example, let's generate two sines with different frequencies and amplitudes:

X0 = {1.3*Table[Sin[2345.*t + 0.1], {t, Subdivide[2, 10000]}],
      Table[Sin[2000.*t], {t, Subdivide[2, 10000]}]};
fs = 5000;
Audio[#, SampleRate -> fs] & /@ X0

We can check that the source signals are far from Gaussian, and that will help.

GraphicsRow[Histogram /@ X0] 

[Image: histograms of the two source signals]

We can also check that the covariance is almost diagonal, meaning the source signals are decorrelated:

Covariance[Transpose@X0]
(* {{0.845071, -0.000948025}, {-0.000948025, 0.499964}} *)

As a reminder, the covariance matrix is the matrix whose coefficient $i,j$ is

$$\dfrac{1}{n-1} \sum_{k=1}^n (x_i - \overline{x}_i)(x_j-\overline{x}_j)$$
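As a cross-check of the formula, a minimal pure-Python implementation (the two short test signals are made up for illustration):

```python
# Sample covariance matrix, term by term as in the formula above.
def covariance(signals):
    n = len(signals[0])
    means = [sum(s)/n for s in signals]
    return [[sum((si[k] - mi)*(sj[k] - mj) for k in range(n))/(n - 1)
             for sj, mj in zip(signals, means)]
            for si, mi in zip(signals, means)]

C = covariance([[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]])
# C[0][0] = ((1-2)^2 + 0 + (3-2)^2)/2 = 1.0 (variance of the first signal)
```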

Mixing source signals

A = {{.3, .7}, {1.6, 1}};
X = A.X0;
audios = Audio[#, SampleRate -> fs] & /@ X

Resolution

Whitening (or sphering). While the sources were uncorrelated, we can check that the measurements are not:

Covariance[Transpose@X]
(* {{0.320641, 0.754263}, {0.754263, 2.66031}} *)

So the first step is to uncorrelate the data. This is done by left-multiplying the centered $X$ by the inverse of the square root of the covariance matrix:

Y = MatrixPower[Covariance[Transpose@X], -.5].(X - Mean /@ X);

This operation has a geometrical interpretation if we display $X_0$, $X$, or $Y$ as a list of points:

GraphicsRow[ListPlot[Transpose@#, PlotRange -> Full, AspectRatio -> Automatic, ImageSize -> Small] & /@ {X0, X, Y}] 

[Image: scatter plots of the points of $X_0$, $X$, and $Y$]

The matrix $A$ transformed the rectangle (whose dimensions are the sine amplitudes) into a diamond, and then the whitening transformed the diamond into a (rotated) square.
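The whitening step can also be sketched outside Mathematica in plain Python (standard library only; the $2\times 2$ inverse square root is computed from the analytic eigendecomposition). This is a cross-check of the idea, not the original code:

```python
import math

def cov2(x, y):
    """Sample 2x2 covariance matrix of two equal-length signals."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    sxx = sum((u - mx)**2 for u in x)/(n - 1)
    syy = sum((v - my)**2 for v in y)/(n - 1)
    sxy = sum((u - mx)*(v - my) for u, v in zip(x, y))/(n - 1)
    return [[sxx, sxy], [sxy, syy]]

def inv_sqrt_2x2(C):
    """C^(-1/2) for a symmetric positive-definite 2x2 matrix,
    from its analytic eigendecomposition C = Q diag(l1, l2) Q^T."""
    a, b, c = C[0][0], C[0][1], C[1][1]
    tr, det = a + c, a*c - b*b
    disc = math.sqrt(tr*tr - 4*det)
    l1, l2 = (tr + disc)/2, (tr - disc)/2
    if abs(b) > 1e-12:
        v1, v2 = (b, l1 - a), (b, l2 - a)   # eigenvectors (orthogonal)
    else:                                   # already diagonal
        v1, v2 = (1.0, 0.0), (0.0, 1.0)
    q11, q21 = v1[0]/math.hypot(*v1), v1[1]/math.hypot(*v1)
    q12, q22 = v2[0]/math.hypot(*v2), v2[1]/math.hypot(*v2)
    s1, s2 = 1/math.sqrt(l1), 1/math.sqrt(l2)
    return [[s1*q11*q11 + s2*q12*q12, s1*q11*q21 + s2*q12*q22],
            [s1*q11*q21 + s2*q12*q22, s1*q21*q21 + s2*q22*q22]]

# Mix the two sines of this answer, then whiten the mixture:
ts = [2*i/10000 for i in range(10001)]
x0 = [1.3*math.sin(2345*t + 0.1) for t in ts]
x1 = [math.sin(2000*t) for t in ts]
A = [[0.3, 0.7], [1.6, 1.0]]
mix0 = [A[0][0]*u + A[0][1]*v for u, v in zip(x0, x1)]
mix1 = [A[1][0]*u + A[1][1]*v for u, v in zip(x0, x1)]
W = inv_sqrt_2x2(cov2(mix0, mix1))
m0, m1 = sum(mix0)/len(ts), sum(mix1)/len(ts)
y0 = [W[0][0]*(u - m0) + W[0][1]*(v - m1) for u, v in zip(mix0, mix1)]
y1 = [W[1][0]*(u - m0) + W[1][1]*(v - m1) for u, v in zip(mix0, mix1)]
CY = cov2(y0, y1)   # should be (numerically) the identity matrix
```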

Rotating. Now we need to rotate $Y$ in order to recover $X_0$, or more precisely something of the form diagonal matrix times $X_0$ (indeed, this algorithm is not unique: there is no control over the amplitude of the estimated sources).

In the present case the angle could be found easily, but in a real-life example $Y$ does not correspond to a rectangle. So the idea, and that's the key idea of this method, is to find the angle that corresponds to the "least Gaussian" signals. It relies on the Central Limit Theorem: a linear combination of two independent random variables is more Gaussian than the random variables themselves.

This can be seen below: the linear combinations $X$ and $Y$ are closer to Gaussians than $X_0$:

GraphicsGrid[Histogram /@ # & /@ {X0, X, Y}] 

[Image: histograms of the components of $X_0$, $X$, and $Y$]

So now the question is: how do we measure "Gaussianity"? There are several techniques, some based on negentropy, some based on kurtosis. Kurtosis is the standardized fourth moment, and it is one measure of Gaussianity. A normal distribution has a kurtosis of 3, so we can define a cost function as:

J[theta_] = Kurtosis /@ (RotationMatrix[theta].Y) - 3 // Simplify 
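As a quick sanity check of the kurtosis values involved, in plain Python (the test signal is illustrative):

```python
import math

def kurtosis(x):
    """Sample kurtosis (standardized fourth moment); 3 for a Gaussian."""
    n = len(x)
    m = sum(x)/n
    m2 = sum((v - m)**2 for v in x)/n
    m4 = sum((v - m)**4 for v in x)/n
    return m4/m2**2

# A pure sine, sampled over whole periods, has kurtosis exactly 1.5,
# well below the Gaussian value of 3 -- so |kurtosis - 3| is large for
# the sources and shrinks as they get mixed.
sine = [math.sin(2*math.pi*k/1000) for k in range(1000)]
print(kurtosis(sine))  # 1.5 (up to rounding)
```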

Then we maximize the non-Gaussianity to obtain $Z$, our estimation of $X_0$:

thetaOpt = theta /. Last@FindMaximum[Norm@J[theta], {theta, 1.2}]
Z = RotationMatrix[thetaOpt].Y;
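The rotation step can be sketched in pure Python with a crude grid search in place of FindMaximum (the rotation angle phi = 0.6 and the grid resolution are arbitrary choices). Starting from unit-variance sources mixed by a pure rotation, which mimics already-whitened data, the excess-kurtosis cost peaks where the rotation is undone:

```python
import math

def excess_kurtosis(x):
    n = len(x)
    m = sum(x)/n
    m2 = sum((v - m)**2 for v in x)/n
    m4 = sum((v - m)**4 for v in x)/n
    return m4/m2**2 - 3.0

# Two independent, unit-variance, non-Gaussian sources...
ts = [2*i/5000 for i in range(5001)]
s0 = [math.sqrt(2)*math.sin(2345*t) for t in ts]
s1 = [math.sqrt(2)*math.sin(2000*t) for t in ts]
# ...mixed by a pure rotation of (hypothetical) angle phi:
phi = 0.6
y0 = [math.cos(phi)*a - math.sin(phi)*b for a, b in zip(s0, s1)]
y1 = [math.sin(phi)*a + math.cos(phi)*b for a, b in zip(s0, s1)]

def cost(theta):
    """Norm of the excess-kurtosis vector of the pair rotated by -theta."""
    c, s = math.cos(theta), math.sin(theta)
    z0 = [ c*a + s*b for a, b in zip(y0, y1)]
    z1 = [-s*a + c*b for a, b in zip(y0, y1)]
    return math.hypot(excess_kurtosis(z0), excess_kurtosis(z1))

# Grid search over [0, pi/2): the cost peaks near theta = phi, where the
# rotated pair is "least Gaussian" (each component is again a pure sine).
best = max((k*math.pi/200 for k in range(100)), key=cost)
```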

The estimation of $A$ is:

Atilde = Inverse[RotationMatrix[thetaOpt].MatrixPower[Covariance[Transpose@X], -.5]]

To check the results:

 Audio[#, SampleRate -> fs] & /@ # & /@ {X0, X, Z} 

It's not perfect, but not too bad!

Other examples

You can try mixing instruments, as in Anton Antonov's answer:

tl = 3; fs = 44000;
audio = Audio[#, SampleRate -> fs] & /@ {SoundNote["C", {0, tl}, "Piano", SoundVolume -> 1],
    SoundNote["C", {0, tl}, "SynthVoice", SoundVolume -> 1]};
X0 = (AudioData /@ audio)[[All, 1]];
A = {{1.2, .6}, {.4, 1.1}};
X = A.X0;
Audio[#, SampleRate -> fs] & /@ X

The result is not too bad either; see the histograms (note that the mix, in the second line, is pretty close to a Gaussian!):

GraphicsGrid[Histogram[#, 100] & /@ # & /@ {X0, X, Z}] 

[Image: histograms of the components of $X_0$, $X$, and $Z$ for the instrument example]

You can also try this cost function:

J2[theta_] = With[{y = RotationMatrix[theta].Y}, 1/12*Mean[#^3]^2 + 1/48*Kurtosis[#]^2 & /@ y] // Simplify 

Unfortunately, this cost function yields very poor results on more complex examples such as the one of section 1.
