Questions tagged [train-test-split]
The train-test split is a method for estimating the performance of machine learning algorithms on prediction tasks: the data is divided into a set used to fit the model and a held-out set used to evaluate it.
128 questions
3 votes
2 answers
67 views
Should the minimum and maximum of each feature be contained in the train set for machine learning?
When using machine learning algorithms for regression, I know that the prediction of the final model will be best when the features are within the ranges used for training, to avoid extrapolation. ...
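For context, a minimal sketch (with toy data of my own) of checking whether a random split leaves every feature's test-time range inside the training range:

```python
# Minimal sketch: check whether the training split covers the full range of each
# feature, so the model never has to extrapolate on the test set.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])  # toy data
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# A feature is "covered" if its test values stay inside the training min/max.
covered = (X_test.min() >= X_train.min()) & (X_test.max() <= X_train.max())
print(covered)  # features marked False would require extrapolation at test time
```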
1 vote
1 answer
91 views
Why is the Keras MNIST dataset split into training and test samples of lengths 60k and 10k respectively?
The MNIST dataset can be obtained directly using Keras by running the following lines of Python code. ...
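The elided code is presumably the standard loader; a minimal sketch, assuming tf.keras:

```python
# Minimal sketch of the standard loader: MNIST ships with a fixed 60k/10k split.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```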
2 votes
1 answer
94 views
LDA perplexity with train-test split leads to absurd results (best model = 1 topic)
I'm working with LDA on a Portuguese news corpus (~800k documents with an average of 28 words each after cleaning the data), and I’m trying to evaluate topic quality using perplexity. When I compute ...
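The question doesn't say which library is used; a minimal sketch of held-out perplexity with scikit-learn's LatentDirichletAllocation and a toy corpus illustrates the setup:

```python
# Sketch of held-out perplexity for LDA: perplexity is evaluated on documents
# that were not used to fit the model.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

docs = [
    "stock market shares rise today", "bank cuts interest rates again",
    "football team wins the final", "striker scores late winning goal",
    "new phone camera impresses reviewers", "laptop battery life disappoints testers",
] * 50  # toy corpus; the real one is ~800k news documents

train_docs, test_docs = train_test_split(docs, test_size=0.2, random_state=0)

vec = CountVectorizer(stop_words="english")
X_train = vec.fit_transform(train_docs)
X_test = vec.transform(test_docs)

for k in (1, 2, 4, 8):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X_train)
    print(k, lda.perplexity(X_test))  # lower is "better", though often misleading
```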
0 votes
0 answers
38 views
Should I Include Post-Event Data During Training for Time-Series Prediction Models?
I’m working on a time-series prediction problem where my goal is to predict the occurrence of a complication for patients based on sequential data. 🔍 Current Approach: I have sequential data for each ...
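A minimal sketch of one common option, with hypothetical column names, that censors each patient's rows after the event time so training never sees post-event data:

```python
# Sketch (hypothetical column names): keep only rows up to each patient's event
# time, so the model is trained without post-event observations.
import pandas as pd

df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 2],
    "t":          [0, 1, 2, 0, 1, 2],
    "event_t":    [1, 1, 1, 2, 2, 2],   # time the complication occurred
    "feature":    [0.3, 0.5, 0.9, 0.2, 0.4, 0.6],
})

train_df = df[df["t"] <= df["event_t"]]  # censor post-event rows
print(train_df)
```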
1 vote
0 answers
42 views
Calibrated Classifier on Training Data [closed]
If I am using GridSearchCV to find hyperparameters on a training set, and I then run a CalibratedClassifierCV to tune my probabilities, would it suffice to fit the CalibratedClassifierCV with ...
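A minimal sketch of that workflow, assuming scikit-learn's GridSearchCV and CalibratedClassifierCV, with a LinearSVC chosen purely for illustration:

```python
# Sketch: tune hyperparameters on the training set, then calibrate with internal CV
# on the same training set; the held-out test set stays untouched until the end.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid = GridSearchCV(LinearSVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_train, y_train)

# Calibrate a fresh copy of the best estimator with 5-fold CV on the training data.
calibrated = CalibratedClassifierCV(grid.best_estimator_, cv=5).fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)
```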
0 votes
0 answers
132 views
How to properly split train/val sets for time series LSTM prediction with multiple unique items?
I am working on a time series prediction problem using an LSTM model. My dataset consists of 27 different items, each with unique IDs, and roughly the same number of samples per item. There are around ...
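A minimal sketch (hypothetical column names) of a per-item chronological split, where the last 20% of each item's history becomes the validation set:

```python
# Sketch: split each item's history chronologically, so validation is the most
# recent 20% of every item's series rather than a random cut across items.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "item_id":   np.repeat(np.arange(27), 100),
    "timestamp": np.tile(np.arange(100), 27),
    "value":     np.random.default_rng(0).normal(size=27 * 100),
}).sort_values(["item_id", "timestamp"])

def split_item(g, val_frac=0.2):
    cut = int(len(g) * (1 - val_frac))
    return g.iloc[:cut], g.iloc[cut:]

parts = [split_item(g) for _, g in df.groupby("item_id")]
train_df = pd.concat(p[0] for p in parts)
val_df = pd.concat(p[1] for p in parts)
print(len(train_df), len(val_df))
```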
1 vote
0 answers
60 views
When to perform node/edge graph feature extraction in graph learning pipeline (PyTorch Geometric)?
I have a CSV file which can be converted into a PyG graph data object for an edge classification task. Before doing that, I thought of adding some features using the NetworkX library. However, since after ...
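A minimal sketch, using only NetworkX and a toy edge list, of computing structural features from the training edges alone so held-out edges don't leak into them:

```python
# Sketch (hypothetical column names): build the graph from *training* edges only,
# then compute structural features, so held-out edges don't leak into the features.
import networkx as nx
import pandas as pd
from sklearn.model_selection import train_test_split

edges = pd.DataFrame({
    "src":   [0, 0, 1, 2, 2, 3, 3, 4],
    "dst":   [1, 2, 2, 3, 4, 4, 0, 1],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})
train_edges, test_edges = train_test_split(edges, test_size=0.25, random_state=0)

G = nx.from_pandas_edgelist(train_edges, source="src", target="dst")
deg = nx.degree_centrality(G)  # node-level feature from training structure only

def edge_features(row):
    return [deg.get(row["src"], 0.0), deg.get(row["dst"], 0.0)]

X_train = train_edges.apply(edge_features, axis=1, result_type="expand")
X_test = test_edges.apply(edge_features, axis=1, result_type="expand")
```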
0 votes
0 answers
50 views
Identify predictors for clustering output?
I have a dataset with variables collected years ago, and many variables collected this year as outcome variables. I want to combine all the variables collected this year to get one outcome, e.g. ...
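A minimal sketch of one way to frame this with scikit-learn: cluster this year's variables into a single label, then test how well the older variables predict it on a held-out split:

```python
# Sketch: derive one outcome by clustering this year's variables, then see how well
# the older variables predict that cluster label on a held-out split.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_old = rng.normal(size=(300, 10))   # variables collected years ago (predictors)
X_new = rng.normal(size=(300, 5))    # variables collected this year (outcomes)

outcome = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_new)

X_train, X_test, y_train, y_test = train_test_split(X_old, outcome, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # how predictable the derived outcome is
```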
2 votes
1 answer
108 views
Model Stacking Train Test Split Methods
I am trying to validate my process for model stacking for binary classification. Say I have two base models, models A and B, both with different classifiers ...
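A minimal sketch of the usual out-of-fold stacking recipe with scikit-learn, using logistic regression and a random forest purely as stand-ins for models A and B:

```python
# Sketch: generate out-of-fold predictions from the base models on the training set,
# train the meta-model on those, and only touch the test set at the very end.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model_a = LogisticRegression(max_iter=1000)
model_b = RandomForestClassifier(random_state=0)

# Out-of-fold probabilities: each training row is predicted by a model that never saw it.
oof_a = cross_val_predict(model_a, X_train, y_train, cv=5, method="predict_proba")[:, 1]
oof_b = cross_val_predict(model_b, X_train, y_train, cv=5, method="predict_proba")[:, 1]
meta = LogisticRegression().fit(np.column_stack([oof_a, oof_b]), y_train)

# Refit the base models on the full training set before predicting the test set.
model_a.fit(X_train, y_train)
model_b.fit(X_train, y_train)
test_meta_X = np.column_stack([
    model_a.predict_proba(X_test)[:, 1],
    model_b.predict_proba(X_test)[:, 1],
])
print(meta.score(test_meta_X, y_test))
```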
2 votes
2 answers
360 views
Purpose of test set in cross-validation
What purpose does the test set serve in k-fold cross-validation? The most common argument in favor of a test set that I can find is to avoid any data leakage between training and testing. But you don'...
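For reference, a minimal sketch of nested cross-validation, the usual alternative when no separate test set is held out:

```python
# Sketch: nested CV; the outer loop scores a model whose hyperparameters were
# tuned only on the inner folds, so no separate test set is needed for the estimate.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```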
8 votes
1 answer
551 views
Should out-of-sample validation also be out-of-time for time-series?
Introduction: When training a model, a "sample" usually refers to the data used to fit the model, so... Sample: data used for training the model; Out-of-sample: data not used for training the model; Out-...
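A minimal sketch of the distinction on a toy time-indexed frame: a random split is out-of-sample only, while a cutoff-date split is also out-of-time:

```python
# Sketch: the same data split two ways — randomly (out-of-sample only) versus by a
# cutoff date (out-of-sample *and* out-of-time).
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"date": pd.date_range("2020-01-01", periods=365), "y": range(365)})

random_train, random_test = train_test_split(df, test_size=0.2, random_state=0)

cutoff = df["date"].iloc[int(len(df) * 0.8)]
oot_train, oot_test = df[df["date"] <= cutoff], df[df["date"] > cutoff]
```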
0 votes
0 answers
45 views
Splitting training and test set on a time series problem
I have an OHLCV* dataset that starts on 01-01-2000 and ends on 31-12-2003 and I want to evaluate a model, say an SVM regressor. In other words, given some daily features describing the dynamics of the ...
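A minimal sketch with scikit-learn's TimeSeriesSplit and synthetic features standing in for the OHLCV-derived ones:

```python
# Sketch: walk-forward evaluation with TimeSeriesSplit, so every validation fold
# comes strictly after the data used to fit the SVM regressor.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))          # daily features (e.g., derived from OHLCV)
y = X[:, 0] * 0.5 + rng.normal(size=1000)

for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=5).split(X)):
    model = SVR().fit(X[train_idx], y[train_idx])
    print(fold, model.score(X[test_idx], y[test_idx]))
```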
1 vote
0 answers
75 views
What are the appropriate data splitting techniques for time-dependent sequential datasets, such as breakdown records over time?
I am working with a time-dependent sequential dataset, specifically a record of machine breakdowns over a period of time. My dataset includes data from the sensors of several machines until they fail ...
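A minimal sketch (hypothetical column names) of a group-wise split with GroupShuffleSplit, keeping each machine's whole history on one side of the split:

```python
# Sketch: split by machine ID with GroupShuffleSplit, so a machine's entire
# failure history ends up entirely in train or entirely in test.
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.DataFrame({
    "machine_id": np.repeat(np.arange(10), 50),
    "sensor":     np.random.default_rng(0).normal(size=500),
    "failed":     np.tile([0] * 49 + [1], 10),
})

gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(gss.split(df, groups=df["machine_id"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
print(train_df["machine_id"].nunique(), test_df["machine_id"].nunique())
```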
3 votes
2 answers
838 views
Test & train split for very small data
I have just 25 observations, and I'm not sure whether it is possible to split them into test and train sets, for example 15 observations for training and 10 for testing. 15 observations is so small for ...
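A minimal sketch of the usual alternative for tiny samples, leave-one-out cross-validation with scikit-learn (toy data in place of the real 25 observations):

```python
# Sketch: with only 25 observations, leave-one-out (or repeated k-fold) CV uses every
# observation for both training and testing instead of sacrificing 10 rows to a test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=25)

scores = cross_val_score(LinearRegression(), X, y, cv=LeaveOneOut(),
                         scoring="neg_mean_absolute_error")
print(-scores.mean())  # average absolute error over 25 single-observation test sets
```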
2 votes
1 answer
100 views
What is the performance of a "meta" learner that performs CV internally for model selection?
I am trying to understand the proof that reporting CV performance during model selection as performance estimate is optimistically biased. The steps in the proof are the following: Let $p_i, \pi_i$ ...