
This is what I am trying to do:

Download MNIST images:

resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];
SeedRandom[42];
rctraindata = RandomChoice[trainingData, 1000];
rctraindatat = 1. - Map[Flatten, Map[ImageData, rctraindata[[All, 1]]]];

Add Gaussian noise:

nrctraindatat = Table[rctraindatat[[i]] + RandomVariate[NormalDistribution[0, 0.2], 784], {i, Length[rctraindatat]}]; 

Calculate the distance between clean and noisy image:

EuclideanDistance[rctraindatat[[1]], nrctraindatat[[1]]] (*5.82*) 
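As a sanity check on that number: for i.i.d. Gaussian noise with standard deviation $\sigma$ added independently to each of the 784 pixels, the Euclidean distance between clean and noisy vectors concentrates around $\sigma\sqrt{784} = 5.6$, close to the 5.82 observed. A quick sketch reusing the definitions above:

```mathematica
(* rough expected distance for additive N[0, 0.2] noise on 784 pixels *)
sigma = 0.2;
npix = 784;
sigma*Sqrt[npix] (* 5.6 *)

(* empirical average over the whole sample *)
Mean[Table[EuclideanDistance[rctraindatat[[i]], nrctraindatat[[i]]],
  {i, Length[rctraindatat]}]]
```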

Scale the noisy image as follows:

scaled = (nrctraindatat[[1]] - Min[Flatten[nrctraindatat]])/
   (Max[Flatten[nrctraindatat]] - Min[Flatten[nrctraindatat]]);

The idea behind scaling is that the clean images lie within $(0,1)$, and I wanted the noisy image to be in the same range before calculating the distance between them. Calculate the distance again:

EuclideanDistance[rctraindatat[[1]], scaled] (*8.78*) 

Which is the correct way to calculate the distance between a corrupted and a clean image and why? PSNR and SSIM are usually used as metrics. Do I need to rescale the noisy image before calculating PSNR or SSIM?
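For reference, PSNR is defined directly from the mean squared error and a stated peak value, so it can be computed without any rescaling. A minimal sketch, where psnr is a hypothetical helper (not a built-in) and the peak is assumed to be 1 for data nominally in $[0,1]$:

```mathematica
(* PSNR in dB against a peak value of 1; out-of-range noisy pixels
   simply contribute to the MSE, no rescaling needed *)
psnr[clean_, noisy_] :=
  10*Log10[1.0/Mean[(Flatten[clean] - Flatten[noisy])^2]]

psnr[rctraindatat[[1]], nrctraindatat[[1]]]
```

Globally rescaling the noisy image first (as above) changes the per-pixel errors and therefore changes the PSNR, which is one reason to be wary of that step.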

  • Rescaling the way you do seems wrong, since a noise outlier at one image point completely changes the rescaled image globally. But the trouble starts before that: if $[0,1]$ is the natural interval for the data, what sense does it make to model noise using naive addition? If an image point has value $1$ and there is $0.3$ of noise, what does the calculation $1+0.3 = 1.3$ mean? It may be a wrong model for various reasons. An amateurish fix would be to model noise using something like addnoise[x_, noise_] := 1/Pi*ArcTan[Tan[Pi/2*(2*x - 1)] + noise] + 1/2, which maps $[0,1]$ to $[0,1]$. Commented Nov 8, 2022 at 15:39
  • A function you might find useful in the future is Rescale, e.g. foo = Rescale[nrctraindatat[[1]], MinMax@Flatten@nrctraindatat]; then foo === scaled (*True*). Commented Nov 8, 2022 at 16:04
  • Related: mathematica.stackexchange.com/q/30091/60568 mathematica.stackexchange.com/q/159735/60568 Commented Nov 8, 2022 at 16:50
  • Possibly a dumb question, but would it make sense to use CosineDistance for this? Commented Dec 8, 2022 at 17:00
  • Also, why aren't you just using Mathematica's built-in ImageDistance? You also have RandomImage and ImageAdd, so you don't need to use ImageData. Commented Dec 3, 2023 at 22:25

1 Answer


I'm not a machine learning expert, but the issue looks related to how you are standardizing your noisy data, which changes the Mean and StandardDeviation of the data. Furthermore, the amount of noise you are adding (standard deviation 0.2) seems high relative to the spread the data already has: StandardDeviation@Flatten@rctraindatat (*0.31*).

To preserve the boundaries of the data (0,1) I would Clip the data after adding noise.

SeedRandom[42];

(* get data *)
resource = ResourceObject["MNIST"];
trainingData = ResourceData[resource, "TrainingData"];
testData = ResourceData[resource, "TestData"];

(* make training data *)
$trainingDataSampleSize = 1000;
rctraindata = RandomChoice[trainingData, $trainingDataSampleSize];
rctraindatat = 1. - Map[Flatten, Map[ImageData, rctraindata[[All, 1]]]];

(* make noisy data sets *)
noise = With[
   {pixelLength = Length[rctraindatat[[1]]],
    sampleSize = $trainingDataSampleSize,
    noiseMean = 0,
    noiseStandardDeviation = (StandardDeviation@Flatten@rctraindatat)/10},
   RandomVariate[NormalDistribution[noiseMean, noiseStandardDeviation],
    {sampleSize, pixelLength}] (* faster than Table *)
   ];
noisyData = rctraindatat + noise;
clippedData = Clip[noisyData, {0, 1}];
scaledData = (noisyData - Min[Flatten[noisyData]])/
   (Max[Flatten[noisyData]] - Min[Flatten[noisyData]]);

Through[{Mean, StandardDeviation, MinMax}[Flatten@rctraindatat]]
(* {0.130405, 0.307826, {0., 1.}} *)

Through[{Mean, StandardDeviation, MinMax}[Flatten@noisyData]]
(* {0.130386, 0.309359, {-0.151885, 1.11268}} *)

Through[{Mean, StandardDeviation, MinMax}[Flatten@clippedData]]
(* {0.139699, 0.302406, {0., 1.}} *)

Through[{Mean, StandardDeviation, MinMax}[Flatten@scaledData]]
(* {0.223216, 0.244637, {0., 1.}} *)
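To see why clipping is the gentler correction, you can compare how each variant changes the distance to the clean data. A sketch reusing the definitions above (exact values depend on the random seed, so none are quoted):

```mathematica
(* mean per-image Euclidean distance to the clean data *)
meanDist[data_] := Mean[MapThread[EuclideanDistance, {rctraindatat, data}]]

meanDist[noisyData]   (* raw additive noise *)
meanDist[clippedData] (* clipping only moves out-of-range pixels *)
meanDist[scaledData]  (* global rescaling shifts every pixel *)
```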
