2

I am attempting to a smooth this data set and produce a single representative curve with error bars. The method to acquire the data points was discretized with a fairly coarse step. I do not have much programming experience but am trying to learn. I read that a Gaussian filter might be a good option. Any help would be appreciated. Optical dilatometer data for shrinkage of a ceramic pellet

Here is an example data set:

Time (min) Non-Normalized Shrinkage Normalized Shrinkage 200 93 1.021978022 202 92 1.010989011 204 92 1.010989011 206 92 1.010989011 208 92 1.010989011 210 92 1.010989011 212 91 1 214 90 0.989010989 216 90 0.989010989 218 90 0.989010989 220 88 0.967032967 222 88 0.967032967 224 87 0.956043956 226 86 0.945054945 228 86 0.945054945 230 86 0.945054945 232 86 0.945054945 234 86 0.945054945 236 85 0.934065934 238 84 0.923076923 240 83 0.912087912 242 83 0.912087912 244 83 0.912087912 246 82 0.901098901 248 83 0.912087912 250 82 0.901098901 252 81 0.89010989 254 81 0.89010989 256 82 0.901098901 258 82 0.901098901 260 79 0.868131868 262 80 0.879120879 264 80 0.879120879 

I found this code snippet online somewhere but I do not know how to implement it or whether it is even what I'm looking for.

def smoothListGaussian(list,degree=5): window=degree*2-1 weight=numpy.array([1.0]*window) weightGauss=[] for i in range(window): i=i-degree+1 frac=i/float(window) gauss=1/(numpy.exp((4*(frac))**2)) weightGauss.append(gauss) weight=numpy.array(weightGauss)*weight smoothed=[0.0]*(len(list)-window) for i in range(len(smoothed)): smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight) return smoothed 
0

2 Answers 2

5

Typically, you'd use a library for this, rather than implementing it yourself.

I'm going to use scipy.ndimage for this instead of scipy.signal. If you've had a signal processing class, you'd probably find the scipy.signal approach more intuitive, but if you haven't it will likely seem confusing. scipy.ndimage provides a straight-forward, one-function-call gaussian_filter, as opposed to having to understand a few more signal processing conventions.

Here's a quick example, using the data you posted in your question. This assumes that your data is regularly sampled (it is: every 2 units in time).

import numpy as np import matplotlib.pyplot as plt import scipy.ndimage time, _, shrinkage = np.loadtxt('discrete_data.txt', skiprows=1).T fig, ax = plt.subplots() ax.plot(time, shrinkage, 'ro') ax.plot(time, scipy.ndimage.gaussian_filter(shrinkage, 3)) plt.show() 

enter image description here

Most of this is fairly straight-forward, but you might notice the "magical" value of 3 that I've specified in scipy.ndimage.gaussian_filter(shrinkage, 3). That's the sigma parameter of the gaussian function in samples. Because your data is sampled every 2 units in time, that's a sigma of 6 units.

The sigma parameter is exactly analogous to the standard deviation in a "bell curve" normal distribution. The larger you make it, the broader the gaussian function will be, and the smoother your curve will be. By trial and error, a value of 3 seems about right for this particular dataset, but you should experiment and see what you think looks best.

One more final note: There are a lot of different ways to approach this problem. A gaussian filter is a reasonable solution, but there are many, many others. If the exact result is very important, you should probably compare several methods and see which works best for your particular dataset.


In your comment, you asked about saving the smoothed data to a file instead of plotting it. Here's a quick example of one way you might go about that:

import numpy as np import scipy.ndimage time, _, shrinkage = np.loadtxt('discrete_data.txt', skiprows=1).T smoothed = scipy.ndimage.gaussian_filter(shrinkage, 3) np.savetxt('smoothed_data.txt', np.c_[time, smoothed]) 
Sign up to request clarification or add additional context in comments.

7 Comments

Thank you very much. I will attempt to implement this. The exact solution doesn't matter, it's more for presentation purposes. I have multiple curves from different samples and I'm going to be plotting them one on top of the other.
Sadly I'm having some difficulties with the matplotlib.pyplot library. ImportError: No module named matplotlib.pyplot I tried changing the preamble to point to where matplotlib is installed but that didn't seem to work. It would actually be more beneficial to just get a plot of the gaussian datapoints out that I can then plot. How would I do that?
@SeattleFreezer - How did you install matplotlib? Is it possible that you have more than one python executable installed, and you installed matplotlib for one but not the other?
I installed matplotlib with easy_install matplolib That is absolutely possible. I have a mac and over the past year I've been messing around with different things in python. I would like a clean installation but from what I've read I could mess things up pretty easily if I don't do it properly.
What's presumably happening is that easy_install is pointing to one python executable and you're calling another one with you run your script. Check the output of which python, and call the full path of the python executable that you installed matplotlib with (it won't be what which python outputs). For example, if you're using Anaconda, and you installed it in your home directory, you might call $HOME/anaconda/bin/python.
|
0

If your data set is finite, I would consider looking into Gaussian Process Regression (GPR) with a radial basis function. This would achieve a similar result to smoothing the function with a gaussian filter, but has two important benefits:

  1. It can automatically select the "magic" standard deviation of the filter, meaning that the output estimate will optimally fit your data.
  2. It will provide you with an estimate of how confident it is that its output fits your data - give you optimal error bars.

Here is a example of GPR being used to estimate a sine wave: enter image description here

If you think this could solve your problem, I'd recommend looking into the GPy library in Python: https://nbviewer.jupyter.org/github/SheffieldML/notebook/blob/master/GPy/index.ipynb

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.