I'm trying to create a distribution based on some data I have, then draw randomly from that distribution. Here's what I have:
from scipy import stats import numpy def getDistribution(data): kernel = stats.gaussian_kde(data) class rv(stats.rv_continuous): def _cdf(self, x): return kernel.integrate_box_1d(-numpy.Inf, x) return rv() if __name__ == "__main__": # pretend this is real data data = numpy.concatenate((numpy.random.normal(2,5,100), numpy.random.normal(25,5,100))) d = getDistribution(data) print d.rvs(size=100) # this usually fails I think this is doing what I want it to, but I frequently get an error (see below) when I try to do d.rvs(), and d.rvs(100) never works. Am I doing something wrong? Is there an easier or better way to do this? If it's a bug in scipy, is there some way to get around it?
Finally, is there more documentation on creating custom distributions somewhere? The best I've found is the scipy.stats.rv_continuous documentation, which is pretty spartan and contains no useful examples.
The traceback:
Traceback (most recent call last): File "testDistributions.py", line 19, in print d.rvs(size=100) File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 696, in rvs vals = self._rvs(*args) File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1193, in _rvs Y = self._ppf(U,*args) File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1212, in _ppf return self.vecfunc(q,*args) File "/usr/local/lib/python2.6/dist-packages/numpy-1.6.1-py2.6-linux-x86_64.egg/numpy/lib/function_base.py", line 1862, in call theout = self.thefunc(*newargs) File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/stats/distributions.py", line 1158, in _ppf_single_call return optimize.brentq(self._ppf_to_solve, self.xa, self.xb, args=(q,)+args, xtol=self.xtol) File "/usr/local/lib/python2.6/dist-packages/scipy-0.10.0-py2.6-linux-x86_64.egg/scipy/optimize/zeros.py", line 366, in brentq r = _zeros._brentq(f,a,b,xtol,maxiter,args,full_output,disp) ValueError: f(a) and f(b) must have different signs
Edit
For those curious, following the advice in the answer below, here's code that works:
from scipy import stats import numpy def getDistribution(data): kernel = stats.gaussian_kde(data) class rv(stats.rv_continuous): def _rvs(self, *x, **y): # don't ask me why it's using self._size # nor why I have to cast to int return kernel.resample(int(self._size)) def _cdf(self, x): return kernel.integrate_box_1d(-numpy.Inf, x) def _pdf(self, x): return kernel.evaluate(x) return rv(name='kdedist', xa=-200, xb=200)
randoms = getDistribution(Mydata)and thenrandoms = randoms.rvs(size=1000)does it perform the threedefinside the class? i.e calculating pdf, integrating it etc?kernelto do that. For example, something like how we specify a PDF function and then use the PDF function to create randoms using Metropolis Hastings.