$\begingroup$

I want to forecast what next semester's finances may look like, regarding my campus job. I get paid bi-weekly and have eight past data points: 358.75, 476.50, 482.50, 479.50, 253.50, 484.00, 475.00, 391.50. I removed the outlier 253.50 because it was due to a week-long school break, which won't happen this semester, leaving me with $n=7$. The least and most I could make back then were \$242 and \$484. This time, I predict those bounds will be \$270 and \$540, because I have better jobs and know how much I can work at each one.

I have written a Python script that takes a paycheck estimator (my main concern), repeatedly adds up the 8 paychecks I know are coming, millions of times over, histograms the totals, and highlights the 10th percentile (so as to say, "I have a 90% probability I will make more than this," for a safe budgeting strategy). I believe what I've made is called a Monte Carlo simulation, but please correct me if that's not quite it.
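For reference, here is a minimal sketch of that kind of Monte Carlo simulation. Since the choice of estimator is exactly what is in question, this sketch just resamples the seven past paychecks with replacement as a placeholder; any of the estimators below could be swapped into `draw_paychecks`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Past per-paycheck amounts with the break-week outlier removed (n = 7).
paychecks = np.array([358.75, 476.50, 482.50, 479.50, 484.00, 475.00, 391.50])

# Placeholder estimator: resample past paychecks with replacement.
# A KDE, Beta fit, or histogram sampler could be substituted here.
def draw_paychecks(size):
    return rng.choice(paychecks, size=size)

n_trials = 1_000_000
totals = draw_paychecks((n_trials, 8)).sum(axis=1)  # 8 paychecks per semester

# "With 90% probability I will make more than this."
safe_budget = np.percentile(totals, 10)
```

Every simulated total necessarily lies between 8 times the smallest and 8 times the largest paycheck, so the 10th percentile does too.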

I have used three different estimators so far:

  1. A histogram with 3 bins, made in Excel, after scaling the old data by $\frac {\max_{new}} {\max_{old}}$ so the maxima would be where I expect.
  2. Truncated kernel density estimation (KDE) with Scott's bandwidth rule.
  3. Beta distribution fit by the method of moments: normalizing the sample to $[0, 1]$, finding $\mu$ and $\sigma^2$, solving for $\alpha$ and $\beta$ from these, then scaling to $[270, 540]$.
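For concreteness, here is one way estimator 3 might look in code. The question leaves the normalization step slightly ambiguous, so this sketch assumes the sample is first rescaled by $\frac{\max_{new}}{\max_{old}}$ (as in estimator 1) and then mapped onto $[270, 540]$:

```python
import numpy as np

# Past paychecks with the outlier removed, rescaled so the old maximum
# lands at the new expected maximum (assumption: same rescaling as estimator 1).
x = np.array([358.75, 476.50, 482.50, 479.50, 484.00, 475.00, 391.50])
x_scaled = x * 540.0 / 484.0

# Map onto [0, 1] over the predicted range [270, 540].
u = (x_scaled - 270.0) / (540.0 - 270.0)

# Method of moments for the Beta distribution:
#   alpha = mu * (mu * (1 - mu) / var - 1)
#   beta  = (1 - mu) * (mu * (1 - mu) / var - 1)
mu, var = u.mean(), u.var(ddof=1)
common = mu * (1.0 - mu) / var - 1.0
alpha, beta = mu * common, (1.0 - mu) * common
```

Under these assumptions $\beta$ comes out well below 1, and a Beta density with $\beta < 1$ diverges at its upper endpoint, which would explain the "infinite spike at \$540" described further down.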

Below are the histograms and 10th percentiles produced by each of these, with 1 million trials, again each involving adding 8 paychecks together:

  1. Histogram PDF. 10th percentile: \$3344
  2. Truncated KDE. 10th percentile: \$3559
  3. Beta method of moments. 10th percentile: \$3855

I will say, the Beta looks ridiculously confident in very high values, and its plotted density looks like an infinite spike at \$540, even though its mean and variance match those of the past samples.

The histogram was my own idea; it seemed intuitive to me, and it is apparently a legitimate estimator. KDE and Beta, however, are both new to me. Could I get some help understanding what to do in this situation, and which approach may have the least bias and MSE against the truth in my case? My estimators roughly agree, but they differ, and I want to know the truth. Thank you.

$\endgroup$
  • $\begingroup$ Welcome to Cross Validated! Why do your paychecks differ at all? Looking at my pay history over the past few months, the amount varies by a cent or two, not by a hundred bucks. Do you, for instance, earn tips as a bartender or have erratic hours? Can you forecast your tips or your hours? $//$ Also worth considering is whether you want to forecast take-home pay or the amount before deductions. I can see arguments for both. $\endgroup$ Commented Aug 15 at 14:34
  • $\begingroup$ Erratic hours, that's right. I have tutoring and grading jobs whose hours vary by how busy my employing professor's students are. I had two different jobs, each of which I could work different hours per week, maximum 20 overall. No deductions by the way, it's very low pay and fully exempt by the university payroll. This time it will be the same style of work, in more advanced classes, paying me more, but I still have two different jobs whose hours may vary every week. $\endgroup$ Commented Aug 15 at 14:40
  • $\begingroup$ Goodwin's law applies: "If the name of the method contains more words than the number of observations that were used to test it, then it's wise to put any plans to adopt the method on hold." You only have eight data points, and one of them is atypical. So you can't even do any useful kind of holdout testing, and especially not for a quantile forecast, which is what you are doing. So just go with the simplest possible approach and use the empirical 10% quantile: 358.75. ... $\endgroup$ Commented Aug 15 at 20:12
  • $\begingroup$ ... Seriously, you don't know the truth, and nobody does; perhaps you win the lottery or fall off a ladder. There are better and worse ways of forecasting, and with the little data you have, the empirical quantile is very probably the best you can do, even if it does not look very sophisticated. You could take a look at our resources for forecasting here: stats.stackexchange.com/q/559908/1352 $\endgroup$ Commented Aug 15 at 20:14
  • $\begingroup$ @StephanKolassa Thank you Dr. Kolassa, that's exactly the kind of opinion I was searching for. At what number of data points do you think it becomes genuinely interesting to compare techniques? Also, NumPy's `percentile` function gives me \$378.40 for the 10th percentile on the data set, which differs from what you said. Which is appropriate? I assume 378.40 is an interpolation of some sort on NumPy's part. $\endgroup$ Commented Aug 15 at 20:21

1 Answer

$\begingroup$

Goodwin's (2011, Foresight) law applies:

If the name of the method contains more words than the number of observations that were used to test it, then it's wise to put any plans to adopt the method on hold.

You only have eight data points, and one of them is atypical. So you can't even do any useful kind of holdout testing, and especially not for a quantile forecast, which is what you are doing. So just go with the simplest possible approach and use the empirical 10% quantile: 358.75.

Note that this is the smallest one of your seven valid data points. I don't even know whether this number corresponds to any of the "standard" ways of calculating empirical quantiles (Hyndman & Fan, 1996, which Dave is referring to), which will make a difference especially for extremal quantiles and low numbers of data points. But my point is that it very probably doesn't matter how exactly we determine a 10% quantile out of just seven data points, because of the exact reason given above, and:
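As a quick check of how much the choice of empirical-quantile definition matters here, NumPy's `percentile` exposes several of the Hyndman & Fan types via its `method` argument (called `interpolation` before NumPy 1.22):

```python
import numpy as np

# The seven valid data points, sorted.
data = np.array([358.75, 391.50, 475.00, 476.50, 479.50, 482.50, 484.00])

# Default: linear interpolation (Hyndman & Fan type 7).
# Quantile position is (n - 1) * 0.10 = 0.6, interpolated between the
# two smallest points: 358.75 + 0.6 * (391.50 - 358.75) = 378.40.
linear = np.percentile(data, 10)

# Nearest data point at or below that position: 358.75,
# the smallest of the seven observations.
lower = np.percentile(data, 10, method="lower")
```

With only seven points, the two definitions already differ by almost \$20, which is why both \$378.40 and \$358.75 are defensible answers to "what is the 10% quantile here".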

Seriously, you don't know the truth, and nobody does; perhaps you win the lottery or fall off a ladder. There are better and worse ways of forecasting, and with the little data you have, the empirical quantile is very probably the best you can do, even if it does not look very sophisticated.

In the comments, you ask:

At what number of data points do you think it becomes genuinely interesting to compare techniques?

On the one hand, it's a question of having enough data points to do proper holdout testing. You would need (IMO) at least twenty data points to more-or-less reliably use three or five of them to evaluate expectation forecasts... but you are looking at 10% quantile forecasts, and for those you really need lots more data. I won't let myself be nailed down to an exact number, but unless you can expect a similar number of three to five observations below your 10% quantile forecast (so you would need 30-50 holdout data points), I would not trust any comparison very much. If 10% quantile forecast A exceeds 3 out of 30 holdout data points, but forecast B only exceeds 2, I would not be comfortable saying that A is better than B...
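To put a rough number on that last point: the count of holdout observations falling below a *correct* 10% quantile forecast is Binomial(n, 0.1), so with n = 30 holdout points the noise in that count already swamps a difference of one exceedance (a sketch using only the standard library):

```python
import math

n, p = 30, 0.10  # 30 holdout points, 10% quantile forecast

# Mean and standard deviation of the exceedance count when the
# forecast is exactly right.
mean_count = n * p                      # 3.0
sd_count = math.sqrt(n * p * (1 - p))  # ~1.64

# Probability of observing exactly k points below a correct forecast.
def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

p2, p3 = binom_pmf(2), binom_pmf(3)
```

Both probabilities come out around 0.23, so observing 2 exceedances for forecast B versus 3 for forecast A is entirely consistent with both forecasts being equally good.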

And of course, on the other hand it very much depends on whether you have any features. Quantiles can vary enormously between regimes. So you would really need to have a certain minimum number of holdout observations within each regime that you want to forecast. Similar points apply if you have numerical features, especially at extremal values of those features.

You could take a look at our resources for forecasting here: Resources/books for project on forecasting models.

$\endgroup$
  • $\begingroup$ Can you explain what you mean by “using three of five of them”? What would you use the rest for? Why wouldn’t I use every sample I’ve got? I haven’t heard of the term holdout testing before. Thanks $\endgroup$ Commented Aug 17 at 13:00
  • $\begingroup$ Sorry, it should have been "three OR five", corrected now... The idea is that if you want to choose between forecasting methods, or between different parameterizations, or initializations, or anything else, you should never look at in-sample fits, because it is extremely easy to overfit in-sample. Instead, you "hold out" the last couple of data points, fit your models to the initial (say) 80% of your data, then forecast into your holdout sample, and assess the error here. Much less chance of overfitting. Holdout testing is one of the few things that professional forecasters agree on. $\endgroup$ Commented Aug 20 at 7:08
  • $\begingroup$ If you then want to use the best model (as assessed on that holdout sample) for "real" forecasting, then you would refit it on the entire sample, of course. $\endgroup$ Commented Aug 20 at 7:09
  • $\begingroup$ What you are describing in your post essentially is evaluating the quality of three different quantile forecasting algorithms in-sample. Trivially, simply taking an empirical quantile will "work" in-sample, as in having the correct coverage. You are using multiple more complex methods. The problem here really is that (a) you have too little data to do a real out-of-sample assessment, and (b) you have so little data that it starts to matter exactly how you determine empirical quantiles (per Hyndman & Fan). $\endgroup$ Commented Aug 20 at 16:01
  • $\begingroup$ I see. Thank you again for your time Dr. Kolassa, I will read more about forecasting and holdouts. This is very interesting. $\endgroup$ Commented Aug 20 at 16:05
