Timeline for How to determine if my data split is appropriate for my data size?

Current License: CC BY-SA 4.0

12 events

when	what		by	license	comment
Dec 26, 2021 at 3:50	comment	added	Khosraw Azizi		All code for the model is here: github.com/Khosraw/… The function for the split is commented there for you to see. @Navin
Dec 23, 2021 at 18:42	comment	added	Navin		50MB is not large, it’s tiny. And there are two different “splits”: The first is your train/validation split (which might as well be 99%/1% + 100-fold cross validation since your dataset is so tiny). The other split is how many points you’re predicting on each eval (e.g. at inference the model sees 80 points and predicts the next 20 in the time series) which is pretty arbitrary and totally depends on your application. I think you’re confusing the two ratios. Post the code you used to split the data set and each time series in it.
Dec 11, 2021 at 4:36	vote	accept	Khosraw Azizi
Dec 8, 2021 at 0:31	comment	added	G__		I don’t think the data type matters. You’ve already reported a substantial (28%) swing by changing the split. Above is a suggestion to help understand this.
Dec 6, 2021 at 13:51	history	edited	Khosraw Azizi	CC BY-SA 4.0	added 35 characters in body
Dec 6, 2021 at 13:49	comment	added	Khosraw Azizi		It's a time-series problem so not a lot will change.
Dec 5, 2021 at 17:43	comment	added	G__		Try repeating a few times with different 80/20 splits. Do you see much variance? Maybe you got an unlucky split the first time…
Dec 5, 2021 at 13:01	answer	added	spectre		timeline score: 4
Dec 5, 2021 at 12:37	history	became hot network question
Dec 5, 2021 at 8:28	answer	added	eliangius		timeline score: 5
S Dec 5, 2021 at 4:04	review	First questions
Dec 5, 2021 at 5:14
S Dec 5, 2021 at 4:04	history	asked	Khosraw Azizi	CC BY-SA 4.0
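
The suggestion in the comments to repeat the experiment with several different 80/20 splits and look at the variance can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the model is a plain least-squares line, and the helper names `fit_line` and `split_scores` are hypothetical. None of it comes from the asker's repository.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def split_scores(xs, ys, train_frac=0.8, n_repeats=10):
    """Validation MAE for several different random train/validation splits."""
    scores = []
    n_train = int(len(xs) * train_frac)
    for seed in range(n_repeats):
        # A different seed gives a different shuffled 80/20 partition.
        idx = list(range(len(xs)))
        random.Random(seed).shuffle(idx)
        train, val = idx[:n_train], idx[n_train:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mae = statistics.fmean(abs(ys[i] - (a * xs[i] + b)) for i in val)
        scores.append(mae)
    return scores

# Synthetic stand-in data (an assumption, not the asker's dataset).
rng = random.Random(0)
xs = [i / 10 for i in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

scores = split_scores(xs, ys)
print(f"mean MAE = {statistics.fmean(scores):.3f}, "
      f"stdev across splits = {statistics.stdev(scores):.3f}")
```

If the spread across splits is comparable to the swing seen when changing the split ratio, the difference may simply be split-to-split noise. Note that shuffling indices ignores time order; for a time-series problem, contiguous or rolling-origin validation blocks are usually the safer choice, since a shuffled split lets the model train on points that come after the ones it is validated on.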