1
$\begingroup$

I have been trying audio classification on the UrbanSound8k dataset and the MPSSC snore classification dataset. I am using transfer learning: I extract features from AlexNet and VGG19 pre-trained on ImageNet and then feed these features to an SVM. Weirdly, I obtain better performance on both datasets when using the viridis colormap than when giving the same 2D grayscale spectrogram array in each of the 3 channels. What I don't understand is how a colormap can add any information that wasn't present in the original spectrogram.

I went through answers such as "Do I need 3 RGB channels for a spectrogram CNN?", which say that training a CNN gives similar performance across different colormaps. Is the same true for pre-trained networks too?
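To make the two input variants concrete, here is a minimal sketch of both. It assumes NumPy only, and the 3-anchor colormap (`ANCHOR_POS`, `ANCHOR_RGB`) is a hypothetical rough stand-in for viridis, not the real viridis lookup table. The function names are illustrative too.

```python
import numpy as np

# Hypothetical 3-anchor colormap (dark purple -> teal -> yellow).
# A rough stand-in for viridis, NOT the actual viridis lookup table.
ANCHOR_POS = np.array([0.0, 0.5, 1.0])
ANCHOR_RGB = np.array([
    [0.27, 0.00, 0.33],
    [0.13, 0.57, 0.55],
    [0.99, 0.91, 0.14],
])


def normalize(spec):
    """Scale a 2D spectrogram to the range [0, 1]."""
    spec = np.asarray(spec, dtype=np.float64)
    span = spec.max() - spec.min()
    if span == 0:
        return np.zeros_like(spec)
    return (spec - spec.min()) / span


def replicate_gray(spec):
    """Variant 1: copy the same grayscale array into all 3 channels."""
    g = normalize(spec)
    return np.stack([g, g, g], axis=-1)


def pseudo_color(spec):
    """Variant 2: map each intensity to an RGB triple via the colormap."""
    g = normalize(spec)
    channels = [np.interp(g, ANCHOR_POS, ANCHOR_RGB[:, c]) for c in range(3)]
    return np.stack(channels, axis=-1)


# Random values standing in for a real spectrogram.
spec = np.random.default_rng(0).random((4, 5))
gray = replicate_gray(spec)
color = pseudo_color(spec)
```

In both variants every pixel is a fixed function of the same intensity value, so the colormap adds no information. What changes is that the three channels of `color` differ from one another, whereas the three channels of `gray` are identical.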

$\endgroup$

2 Answers

2
$\begingroup$

VGG was trained on ImageNet, which consists primarily of color images, so it's unsurprising that a network that is very good at extracting features from and classifying color images produces better results when you feed it a color image rather than a greyscale one.

$\endgroup$
0
$\begingroup$

This technique is called pseudo-coloring, and it has been explored a little in the literature, including outside the context of pretrained networks.

For example, see "Sound Event Recognition in Unstructured Environments using Spectrogram Image Processing", a PhD thesis by Jonathan William Dennis.

$\endgroup$
