Multiple discrete RV convolutions performance

Question

I have a discrete random variable with PDF of (just a simplified example)

Piecewise[{{1/3, x == 5 || x == 10 || x == 15}}, 0]

Actually, in my code this is a MixtureDistribution of a number of discrete distributions.

What I need is to obtain a multiple convolution (for example a 100-fold) of independent identically distributed random variables, distributed according to the provided distribution.

I tried all known to me ways:

1) TransformedDistribution

TransformedDistribution[Plus @@ Flatten[Table[tX[v], {v, 0, vMax}]], Flatten[Table[tX[v] \[Distributed] resF, {v, 0, vMax}]]]

2) multiple DiscreteConvolve

DiscreteConvolve[PDF[resF, x], PDF[resF, x], x, y]

where resF is the provided initial distribution.

The first method provides the needed result, but it takes way too long and the time grows exponentially - approx 60 seconds to obtain a ~10-fold convolution, a few minutes for the next.

The second method seem to provide the correct answer for a couple of first convolutions (and it take longer compared to the first method), but the next convolutions give 0.

Just to compare I wrote a simple head-on program on C# to do the task and it takes only ~2-3 second to do the 300-fold convolution. And there is way to optimize it by removing values, higher than the threshold on the go.

Ultimately, I need the probability that the final convolution will be smaller than a provided threshold, i.e. CDF.

Can someone tell me why is it taking so long or if I'm doing this the wrong way.

Thanks!

EDIT

Some code that I test:

ClearAll[resFF, vMax]; AbsoluteTiming[ resF = EmpiricalDistribution[{1/3, 1/3, 1/3} -> {5, 10, 15}]; vMax = 10; PDF[TransformedDistribution[ Plus @@ Flatten[Table[tX[v], {v, 1, vMax}]], Flatten[Table[tX[v] \[Distributed] resF, {v, 1, vMax}]]], x] ]

Here resF is the distribution I mentioned in the beginning, and vMax is the number of convolutions I want to obtain. So this code gives the PDF of a 10-fold convolution.

On my machine it takes 3 seconds to do the 10-fold convolution, 8 seconds for 11-fold and 25 seconds for 12-fold. As I need more than 100-fold one, this performance is not sufficient.

EDIT 2

As Jim Baldwin said, I missed the fact that I can obtain the result using the generating functions of discrete random variables.

Thus, the code I came up with that gives me the desired value of vMax-fold convolution CDF in point 1000 is:

ClearAll[resF, gen, vMax]; vMax = 150; resF[x_] := Piecewise[{{1/3, x == 5 || x == 10 || x == 15}}, 0]; gen = GeneratingFunction[resF[n], n, x]; Total@Select[CoefficientList[(gen)^vMax, x][[1 ;; 1000]], # > 0 &]

However, in this case I have to scale the initial data, if the outcomes of RV are not integer.

PS: This is out of my own curiosity, but still - if anyone know why the TransformedDistribution solution works so slow, please, tell me. Thanks!

It will make your question more attractive if you explain your abbreviations. Please do not create new tags unless it seems clear that there's a benefit (e.g. there are already many questions which could have had the tag) and the name is self explanatory (rv??) — Szabolcs
– Szabolcs, Commented Oct 28, 2016 at 14:28
@Szabolcs , yes, you are right. I should expand them. RV stands for random variable. — Andrew S.
– Andrew S., Commented Oct 28, 2016 at 14:29
Your code makes no sense as you have not defined tX[v], nor vMax, nor resF. It's just gobbledegook at the moment, leaving the reader to guess. — wolfies
– wolfies, Commented Oct 28, 2016 at 15:00
You should also give a complete example. For instance, you give the PDF as a formula and not as a Mathematica distribution function. Also, because you state that these are all identical independent random variables, why not use generating functions? The generating function of the sum of $n$ random variables is just the generating function of a single random variable raised to the $n$-th power. You could then use the functions Coefficient or CoefficientList to get the probabilities of interest. — JimB
– JimB, Commented Oct 28, 2016 at 15:04

JimB · Accepted Answer · 2016-10-28 15:42:00Z

Probability generating functions might be the way to go. Here's the pgf (probability generating function) of a single random variable:

pgf = t^5/3 + t^10/3 + t^15/3

The probability generating function for the sum of n independent and identically distributed random variables is

pgf^n

You can get the probability for any value of the resulting sum (as I assume what you mean by the convolution is that you want the distribution of the sum of the n random variables) by using CoefficientList and a bit of clean-up:

pgf = t^5/3 + t^10/3 + t^15/3; cl = CoefficientList[pgf^10, t]; x = Range[0, Length[cl] - 1]; pdf = Transpose[{x, cl}]; Select[pdf, #[[2]] > 0 &]]

{{50, 1/59049}, {55, 10/59049}, {60, 55/59049}, {65, 70/19683}, {70, 205/19683}, {75, 484/19683}, {80, 950/19683}, {85, 1580/19683}, {90, 2255/19683}, {95, 8350/59049}, {100, 8953/59049}, {105, 8350/ 59049}, {110, 2255/19683}, {115, 1580/19683}, {120, 950/ 19683}, {125, 484/19683}, {130, 205/19683}, {135, 70/19683}, {140, 55/59049}, {145, 10/59049}, {150, 1/59049}}

As far as timing...Here is an example for n=100:

Timing[pgf = t^5/3 + t^10/3 + t^15/3; cl = CoefficientList[pgf^100, t]; x = Range[0, Length[cl] - 1]; pdf = Transpose[{x, cl}]; Select[pdf, #[[2]] > 0 &];] (* {0.0312002`,Null} *)

Stack Exchange Network

Multiple discrete RV convolutions performance

1 Answer 1

Hot Network Questions

Multiple discrete RV convolutions performance

1 Answer 1

Related

Hot Network Questions