$\begingroup$

I am trying to visualise the output of an fft. I have taken the microphone output of my Android phone using "audiorecord". The input data is sampled at 32 kHz and I take 128 samples at a time from the record buffer to process.

Each block of 128 samples is passed to org.apache.commons.math3.transform.FastFourierTransformer(DftNormalization.STANDARD), and I obtain the real and imaginary parts like this:

    val complex = transformer.transform(window, TransformType.FORWARD)
    for (index in complex.indices) {
        val rr = complex[index].real
        val ri = complex[index].imaginary
        tempConversion[index] = Math.sqrt((rr * rr) + (ri * ri))
    }

I then take the base-10 logarithm of the data values (as the range is large) and plot them:

fft screen grab

If I play high or low pitched sounds near the microphone, the U shaped plot shape is similar, but the "floor" of the plot rises.

Questions:

  1. Does the plot of the data appear correct?
  2. I am trying to create a "graphic equaliser" type visualisation of the audio captured by the microphone. Can you suggest how I should process the fft data further so I can show the frequency spectrum of the captured audio data?

Many thanks for the replies, I have understood a lot more about how to proceed, and I think I'm making progress.

  1. Firstly I took the microphone sampling out of the equation, and created a dummy data set to analyse.

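         val SAMPLE_RATE = 32000f  // same 32 kHz rate as the microphone capture
         val f1: Float = 10000f
         val f2: Float = 15000f
         val FFT_N = 1024
         val FFT_input = FloatArray(FFT_N)
         for (i in 0 until FFT_N) {
             // Sample i is taken at time i / SAMPLE_RATE seconds
             val k = 100 * sin(2 * Math.PI * f2 * i / SAMPLE_RATE) + 100 * sin(2 * Math.PI * f1 * i / SAMPLE_RATE)
             FFT_input[i] = k.toFloat()
         }
         return FFT_input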
  2. I upped the fft size to 1024, which surprisingly didn't have an adverse effect on the speed the code runs at.

  3. I created a function to discard the fft bins that correspond to the Nyquist frequency and above, so from an fft of 1024 I'm left with the first 511 bins of data.

  4. I decided not to plot the y axis (bin values) on a logarithmic scale, the dummy data plot looks much better, with 2 clearly discernible peaks.

Dummy data plot showing 2 peaks

  5. I added a Hamming window to the data before calculating the fft. I think it is improving the data plot by reducing spectral leakage, but I have not spent much time looking at this.

Now I know the data graphing is working reasonably well, I tried putting real microphone data back into the system.

As the existing comments mention (thanks), there appears to be low frequency noise. The screen shot below shows this clearly. Could anyone suggest a way of filtering out this noise, or at least minimising it? My guess is it's being introduced by the microphone preamp circuitry of the phone? One thing to note is it isn't consistent, as in the "noise" fizzes on the graph. It's generally affecting the first 10 bins, with the highest value in bin zero, being at about 10,000 ish.

low frequency noise

$\endgroup$
  • $\begingroup$ 32kHz sampling and 128 bins will resolve to 250Hz per bin. That's more than the octave between A-220 and A-440. So for starters, you need more bins. The strong signal around 0Hz indicates that you need to remove the mean, and you probably need to window. $\endgroup$ Commented Jun 3 at 14:46
  • $\begingroup$ Thanks Tim. I added windowing and upped the bins to 1024. Just to clarify, to filter the strong signal around 0Hz is it a case of obtaining an average "noise" value from the first few noisy bins and then subtracting it from the bins before plotting? Or is there a cleverer way of doing this? Please see my updated plot added above. And thanks for the feedback. $\endgroup$ Commented Jun 4 at 14:19
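
A minimal sketch of the mean removal and windowing suggested in the comment above, assuming each frame has already been converted to a `DoubleArray`. The names `removeMean`, `hammingWindow` and `prepareFrame` are made up for illustration and do not come from any library:

```kotlin
import kotlin.math.PI
import kotlin.math.cos

// Subtract the frame's mean so a constant (DC) offset does not dominate bin 0.
fun removeMean(frame: DoubleArray): DoubleArray {
    val mean = frame.average()
    return DoubleArray(frame.size) { i -> frame[i] - mean }
}

// Symmetric Hamming window: w[i] = 0.54 - 0.46 * cos(2 * pi * i / (N - 1)).
fun hammingWindow(frame: DoubleArray): DoubleArray {
    val n = frame.size
    require(n > 1) { "frame needs at least two samples" }
    return DoubleArray(n) { i ->
        frame[i] * (0.54 - 0.46 * cos(2.0 * PI * i / (n - 1)))
    }
}

// Hypothetical helper: condition one frame before handing it to the FFT.
fun prepareFrame(frame: DoubleArray): DoubleArray = hammingWindow(removeMean(frame))
```

Subtracting the mean only targets a constant offset in bin 0. Slow drift that spreads into the neighbouring low bins would probably need a high-pass filter as well, which is not shown here.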

3 Answers

$\begingroup$

Spectral analysis is not trivial, and I typically recommend a college-level class to understand the mathematical fundamentals of the four different types of Fourier transform and how they relate to specific signal properties.

Your data is probably "not wrong" but certainly not helpful to look at.

Some pointers

  1. Your frequency resolution is far too coarse: 32 kHz sampling with 128 samples per FFT gives 250 Hz per bin.
  2. For a real input signal the DFT spectrum is conjugate symmetric (and periodic), so you should only plot the first half.
  3. Human perception of pitch is logarithmic, so you should use a logarithmic frequency axis
  4. For many audio spectrum displays it is customary to group the linearly spaced DFT frequency grid into logarithmically spaced frequency bands (octave, 1/3 octave, etc.).
  5. You should consider using a window to mitigate spectral leakage. Whether you need overlap as well depends on the requirements of your application.
  6. The Android audio front end is messy, with a lot of poorly documented processing steps. The microphone itself can also be all over the place (iOS manages this better). If you want even a minimum level of "accuracy" you will somehow have to compensate for this or calibrate the whole setup.
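
As a rough sketch of the band grouping in pointer 4: the function below sums the power of the linearly spaced bins that fall inside each logarithmically spaced band. The name `groupIntoBands`, the 62.5 Hz starting edge and the default of one band per octave are assumptions chosen for illustration, not a standard.

```kotlin
import kotlin.math.pow

// Sums the power of linearly spaced DFT bins into logarithmically spaced bands.
// magnitudes: |X[k]| for k = 0 until fftSize / 2 (first half of the spectrum).
// Returns one power value per band; a band that contains no bin stays at 0.
fun groupIntoBands(
    magnitudes: DoubleArray,
    sampleRate: Double,
    fftSize: Int,
    lowestEdgeHz: Double = 62.5,   // assumed lower edge of the first band
    bandsPerOctave: Int = 1        // 1 = octave bands, 3 = third-octave bands
): DoubleArray {
    val binWidth = sampleRate / fftSize
    val nyquist = sampleRate / 2.0
    val ratio = 2.0.pow(1.0 / bandsPerOctave)
    val bands = ArrayList<Double>()
    var lower = lowestEdgeHz
    while (lower < nyquist) {
        val upper = minOf(lower * ratio, nyquist)
        var power = 0.0
        for (k in magnitudes.indices) {
            val f = k * binWidth
            if (f >= lower && f < upper) power += magnitudes[k] * magnitudes[k]
        }
        bands.add(power)
        lower *= ratio
    }
    return bands.toDoubleArray()
}
```

With a 32 kHz sample rate and a 1024-point FFT, the defaults give eight octave bands from 62.5 Hz up to the 16 kHz Nyquist frequency. Each bar of the display can then show one band, typically on a dB scale.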
$\endgroup$
  • $\begingroup$ Thanks for the comments Hilmar, I really appreciate them. I upped the fft size to 1024 without melting my phone, added windowing and discarded the data points at or above the Nyquist frequency (only plotting the first 511 bins now). The plots are much better (I added an update to my question above). They now show a strong signal at around 0Hz which I assume is being added by the microphone preamp, could this be the cause? And is there a clever way to minimise it messing up the data before I plot it? I will of course bin the data into frequency bands as you suggest. $\endgroup$ Commented Jun 4 at 14:26
$\begingroup$

Your window of 128 samples is far too small. The frequency response of the window will be wider than the spacing between harmonics, causing neighbouring harmonics to overlap and interfere.

You should also zero-pad to "zoom in", i.e. to interpolate the spectrum on a finer frequency grid.
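
A minimal sketch of zero padding, assuming the windowed frame is held in a `DoubleArray`; `zeroPad` is a made-up helper name:

```kotlin
// Appends zeros so the frame reaches paddedSize samples. The FFT of the padded
// frame samples the same underlying spectrum on a finer frequency grid; it does
// not add resolution beyond what the original window length provides.
fun zeroPad(frame: DoubleArray, paddedSize: Int): DoubleArray {
    require(paddedSize >= frame.size) { "paddedSize must not be smaller than the frame" }
    return frame.copyOf(paddedSize)   // copyOf fills the new tail with 0.0
}
```

For example, `zeroPad(windowedFrame, 4096)` on a 1024-sample frame gives four times as many points across the same spectrum. Apply the window before padding, and keep the padded length a power of two, since the Apache Commons Math `FastFourierTransformer` only accepts power-of-two lengths.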

$\endgroup$
$\begingroup$

You only want to plot half the output data. The results are mirrored past the Nyquist frequency and are not useful for what you are trying to do.
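
As a sketch of keeping only the useful half, assuming the real and imaginary parts of the FFT output have been copied into two arrays of equal length (the helper names here are invented for illustration):

```kotlin
import kotlin.math.hypot

// Magnitudes of bins 0 until N/2 from the real and imaginary parts of an N-point DFT.
fun halfSpectrumMagnitudes(re: DoubleArray, im: DoubleArray): DoubleArray {
    require(re.size == im.size) { "real and imaginary parts must have equal length" }
    return DoubleArray(re.size / 2) { k -> hypot(re[k], im[k]) }
}

// Centre frequency in Hz of bin k for an N-point DFT at the given sample rate.
fun binFrequencyHz(k: Int, sampleRate: Double, fftSize: Int): Double =
    k * sampleRate / fftSize
```

With 32 kHz sampling and a 1024-point FFT, bins are 31.25 Hz apart, so `binFrequencyHz(320, 32000.0, 1024)` is 10 kHz.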

Digital signal processing is a large subject and there are a lot of things you could do. You seem to be basically on the right track for what you are trying to do.

I would recommend starting with windowing, then maybe some post-FFT time averaging. Play around with min and max hold and different types of averaging.
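
One possible way to do the time averaging and max hold is sketched below. The class name `SpectrumSmoother` and the smoothing factor `alpha` are assumptions for illustration; exponential averaging is only one of several averaging schemes you could try.

```kotlin
// Keeps a running exponential average and a max-hold trace over successive spectra.
// alpha (an assumed tuning value) sets how quickly the average follows new frames.
class SpectrumSmoother(private val binCount: Int, private val alpha: Double = 0.2) {
    val average = DoubleArray(binCount)
    val maxHold = DoubleArray(binCount)
    private var primed = false

    fun update(magnitudes: DoubleArray) {
        require(magnitudes.size == binCount) { "unexpected spectrum length" }
        for (k in 0 until binCount) {
            // Seed the average with the first frame, then blend in later frames.
            average[k] = if (primed) {
                alpha * magnitudes[k] + (1 - alpha) * average[k]
            } else {
                magnitudes[k]
            }
            // Magnitudes are non-negative, so starting the hold at 0.0 is safe.
            maxHold[k] = maxOf(maxHold[k], magnitudes[k])
        }
        primed = true
    }
}
```

Call `update` once per FFT frame and plot `average` (or `maxHold`) instead of the raw magnitudes. A smaller `alpha` gives a steadier display that reacts more slowly.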

Equalization is a more advanced subject.

Many real-world sounds have a lot of overtones, so you should expect to see different parts of the floor rise. If you use more points you will see this better, at the expense of poorer time resolution (update rate in your case).

Another fun way to visualize the data is to make a spectrogram or a density display: keep track of all the FFTs and display them together over time.

$\endgroup$
  • $\begingroup$ Good ideas, I like the sound of time averaged ffts. I will definitely experiment once I get the hang of the basics. My plots are already looking better, I added a couple of new ones to the question above. It's definitely improving my Android coding skills :-) $\endgroup$ Commented Jun 4 at 14:39
