I've been working on a genetic algorithm to figure out how to play a guitar chord just by listening to it. I have it all working now, but the system I am using to compare how 'similar' a guess is to the target seems less than robust. Here is the basic workflow of the algorithm:
- I have a WAV file of the 'target' chord, along with its spectrogram, which I am trying to have the algorithm recreate
- I create a population of 'chords': each one simulates how a person would place their fingers on a fretboard, and I then convert that fingering into the set of pitches it produces (roughly the first sketch after this list).
- Then, I run each chord through the simulation: I play what that chord would sound like through a MIDI library and, at the same time, record the sound to convert into a spectrogram. This is not ideal, because I have to wait for each chord to play in isolation, but to my knowledge there is no way to just dream up a spectrogram; I have to actually record it.
- Then I assign each chord a 'fitness' value: I trim any silence at the start and end of each WAV file, turn it into a spectrogram, and average across the time dimension, so time is not a factor and, instead of a plot of frequency vs. time, each spectrogram is just a list of frequencies and their corresponding amplitudes. The fitness is then the pixel-wise mean squared (or mean absolute) error between the target's averaged spectrogram and the chord's (roughly the second sketch below).
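
To make the chord representation a bit more concrete, the fingering-to-pitches conversion is roughly along these lines (a simplified sketch, not my actual code; standard tuning and all the names here are just for illustration):

```python
# Rough sketch of the fingering -> pitches step (standard tuning assumed;
# names are made up for illustration).
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]  # MIDI notes E2 A2 D3 G3 B3 E4

def fingering_to_pitches(fingering):
    """fingering: one entry per string, a fret number or None for a muted string."""
    pitches = []
    for open_note, fret in zip(STANDARD_TUNING, fingering):
        if fret is not None:
            pitches.append(open_note + fret)  # each fret raises the pitch one semitone
    return pitches

# Example: open G major shape (3 2 0 0 0 3) -> [43, 47, 50, 55, 59, 67]
print(fingering_to_pitches([3, 2, 0, 0, 0, 3]))
```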
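
And the fitness step boils down to roughly this (again a simplified sketch: it skips the MIDI playback/recording part and just assumes librosa and numpy; file names and parameters are placeholders):

```python
# Rough sketch of the fitness step: trim silence, compute a magnitude spectrogram,
# average it over time, then compare pixel-wise against the target.
import numpy as np
import librosa

def averaged_spectrum(path, sr=22050, n_fft=2048):
    y, _ = librosa.load(path, sr=sr)
    y, _ = librosa.effects.trim(y, top_db=30)   # cut leading/trailing silence
    S = np.abs(librosa.stft(y, n_fft=n_fft))    # magnitude spectrogram (freq x time)
    return S.mean(axis=1)                       # average over time -> 1D spectrum

def fitness(target_spec, candidate_path, use_mse=True):
    cand_spec = averaged_spectrum(candidate_path)
    diff = target_spec - cand_spec
    return float(np.mean(diff ** 2)) if use_mse else float(np.mean(np.abs(diff)))

# target_spec = averaged_spectrum("target_chord.wav")
# score = fitness(target_spec, "candidate_chord.wav")  # lower = more similar
```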
However, I have noticed that this way of computing each organism's fitness doesn't work great. Even when I run the same chord shape through multiple times, the fitness it receives fluctuates a lot from run to run. The recording should be at roughly the same volume each time, so I can't imagine normalization will help, but I could try it (something like the snippet below). Any other ideas about how to make it more reliable? Are spectrograms not the way to go? Or maybe there is a smarter option than pixel-wise error? Thanks!
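
For reference, the normalization I had in mind would just be scaling each averaged spectrum before the comparison, something like this (untested, just to show what I mean):

```python
# Scale each averaged spectrum so overall level differences drop out of the error.
import numpy as np

def normalize_spectrum(spec, eps=1e-8):
    return spec / (np.linalg.norm(spec) + eps)  # unit L2 norm; max or sum would also work

# fitness would then compare normalize_spectrum(target_spec) vs normalize_spectrum(cand_spec)
```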