Timeline for How to determine if my data split is appropriate for my data size?
Current License: CC BY-SA 4.0
12 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Dec 26, 2021 at 3:50 | comment | added | Khosraw Azizi | All code for the model is here: github.com/Khosraw/… The function for the split is commented there for you to see. @Navin | |
| Dec 23, 2021 at 18:42 | comment | added | Navin | 50MB is not large, it’s tiny. And there are two different “splits”: The first is your train/validation split (which might as well be 99%/1% + 100-fold cross validation since your dataset is so tiny). The other split is how many points you’re predicting on each eval (e.g. at inference the model sees 80 points and predicts the next 20 in the time series) which is pretty arbitrary and totally depends on your application. I think you’re confusing the two ratios. Post the code you used to split the data set and each time series in it. | |
| Dec 11, 2021 at 4:36 | vote | accept | Khosraw Azizi | ||
| Dec 8, 2021 at 0:31 | comment | added | G__ | Don’t think the data type matters. You’ve already reported a substantial (28%) swing by changing the split. The suggestion above is meant to help you understand this. | |
| Dec 6, 2021 at 13:51 | history | edited | Khosraw Azizi | CC BY-SA 4.0 | added 35 characters in body |
| Dec 6, 2021 at 13:49 | comment | added | Khosraw Azizi | It's a time-series problem so not a lot will change. | |
| Dec 5, 2021 at 17:43 | comment | added | G__ | Try repeating a few times with different 80/20 splits. Do you see much variance? Maybe you got an unlucky split the first time… | |
| Dec 5, 2021 at 13:01 | answer | added | spectre | timeline score: 4 | |
| Dec 5, 2021 at 12:37 | history | became hot network question | |||
| Dec 5, 2021 at 8:28 | answer | added | eliangius | timeline score: 5 | |
| Dec 5, 2021 at 4:04 | review | First questions | | | |
| Dec 5, 2021 at 5:14 | | | | | |
| Dec 5, 2021 at 4:04 | history | asked | Khosraw Azizi | CC BY-SA 4.0 | |