2
$\begingroup$

I have a discrete random variable with PDF of (just a simplified example)

Piecewise[{{1/3, x == 5 || x == 10 || x == 15}}, 0] 

Actually, in my code this is a MixtureDistribution of a number of discrete distributions.

What I need is to obtain a multiple convolution (for example a 100-fold) of independent identically distributed random variables, distributed according to the provided distribution.

I tried all known to me ways:

1) TransformedDistribution

TransformedDistribution[Plus @@ Flatten[Table[tX[v], {v, 0, vMax}]], Flatten[Table[tX[v] \[Distributed] resF, {v, 0, vMax}]]] 

2) multiple DiscreteConvolve

DiscreteConvolve[PDF[resF, x], PDF[resF, x], x, y] 

where resF is the provided initial distribution.

The first method provides the needed result, but it takes way too long and the time grows exponentially - approx 60 seconds to obtain a ~10-fold convolution, a few minutes for the next.

The second method seem to provide the correct answer for a couple of first convolutions (and it take longer compared to the first method), but the next convolutions give 0.

Just to compare I wrote a simple head-on program on C# to do the task and it takes only ~2-3 second to do the 300-fold convolution. And there is way to optimize it by removing values, higher than the threshold on the go.

Ultimately, I need the probability that the final convolution will be smaller than a provided threshold, i.e. CDF.

Can someone tell me why is it taking so long or if I'm doing this the wrong way.

Thanks!

EDIT

Some code that I test:

ClearAll[resFF, vMax]; AbsoluteTiming[ resF = EmpiricalDistribution[{1/3, 1/3, 1/3} -> {5, 10, 15}]; vMax = 10; PDF[TransformedDistribution[ Plus @@ Flatten[Table[tX[v], {v, 1, vMax}]], Flatten[Table[tX[v] \[Distributed] resF, {v, 1, vMax}]]], x] ] 

Here resF is the distribution I mentioned in the beginning, and vMax is the number of convolutions I want to obtain. So this code gives the PDF of a 10-fold convolution.

On my machine it takes 3 seconds to do the 10-fold convolution, 8 seconds for 11-fold and 25 seconds for 12-fold. As I need more than 100-fold one, this performance is not sufficient.

EDIT 2

As Jim Baldwin said, I missed the fact that I can obtain the result using the generating functions of discrete random variables.

Thus, the code I came up with that gives me the desired value of vMax-fold convolution CDF in point 1000 is:

ClearAll[resF, gen, vMax]; vMax = 150; resF[x_] := Piecewise[{{1/3, x == 5 || x == 10 || x == 15}}, 0]; gen = GeneratingFunction[resF[n], n, x]; Total@Select[CoefficientList[(gen)^vMax, x][[1 ;; 1000]], # > 0 &] 

However, in this case I have to scale the initial data, if the outcomes of RV are not integer.

PS: This is out of my own curiosity, but still - if anyone know why the TransformedDistribution solution works so slow, please, tell me. Thanks!

$\endgroup$
10
  • $\begingroup$ It will make your question more attractive if you explain your abbreviations. Please do not create new tags unless it seems clear that there's a benefit (e.g. there are already many questions which could have had the tag) and the name is self explanatory (rv??) $\endgroup$ Commented Oct 28, 2016 at 14:28
  • $\begingroup$ @Szabolcs , yes, you are right. I should expand them. RV stands for random variable. $\endgroup$ Commented Oct 28, 2016 at 14:29
  • 1
    $\begingroup$ Your code makes no sense as you have not defined tX[v], nor vMax, nor resF. It's just gobbledegook at the moment, leaving the reader to guess. $\endgroup$ Commented Oct 28, 2016 at 15:00
  • 1
    $\begingroup$ You should also give a complete example. For instance, you give the PDF as a formula and not as a Mathematica distribution function. Also, because you state that these are all identical independent random variables, why not use generating functions? The generating function of the sum of $n$ random variables is just the generating function of a single random variable raised to the $n$-th power. You could then use the functions Coefficient or CoefficientList to get the probabilities of interest. $\endgroup$ Commented Oct 28, 2016 at 15:04
  • $\begingroup$ @wolfies Provided the code with explanations. $\endgroup$ Commented Oct 28, 2016 at 15:28

1 Answer 1

4
$\begingroup$

Probability generating functions might be the way to go. Here's the pgf (probability generating function) of a single random variable:

pgf = t^5/3 + t^10/3 + t^15/3 

The probability generating function for the sum of n independent and identically distributed random variables is

pgf^n 

You can get the probability for any value of the resulting sum (as I assume what you mean by the convolution is that you want the distribution of the sum of the n random variables) by using CoefficientList and a bit of clean-up:

pgf = t^5/3 + t^10/3 + t^15/3; cl = CoefficientList[pgf^10, t]; x = Range[0, Length[cl] - 1]; pdf = Transpose[{x, cl}]; Select[pdf, #[[2]] > 0 &]] 
{{50, 1/59049}, {55, 10/59049}, {60, 55/59049}, {65, 70/19683}, {70, 205/19683}, {75, 484/19683}, {80, 950/19683}, {85, 1580/19683}, {90, 2255/19683}, {95, 8350/59049}, {100, 8953/59049}, {105, 8350/ 59049}, {110, 2255/19683}, {115, 1580/19683}, {120, 950/ 19683}, {125, 484/19683}, {130, 205/19683}, {135, 70/19683}, {140, 55/59049}, {145, 10/59049}, {150, 1/59049}} 

As far as timing...Here is an example for n=100:

Timing[pgf = t^5/3 + t^10/3 + t^15/3; cl = CoefficientList[pgf^100, t]; x = Range[0, Length[cl] - 1]; pdf = Transpose[{x, cl}]; Select[pdf, #[[2]] > 0 &];] (* {0.0312002`,Null} *) 
$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.