
Questions tagged [train-test-split]

The train-test split is a method for estimating the performance of a machine learning algorithm on prediction tasks: the data is partitioned into a training set used to fit the model and a held-out test set used to evaluate it.

3 votes
2 answers
67 views

When using machine learning algorithms for regression, I know that the final model's predictions will be most reliable when the features lie within the ranges used for training, to avoid extrapolation. ...
asked by n6r5
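A minimal sketch of one way to detect such extrapolation: a per-feature range check of the test set against the training set. The data and sizes below are hypothetical.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Hypothetical regression data; names and sizes are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Flag test rows with any feature outside the training range, i.e.
    # rows where a prediction would require extrapolation.
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    outside = ((X_test < lo) | (X_test > hi)).any(axis=1)
    print(f"{outside.sum()} of {len(X_test)} test rows need extrapolation")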
1 vote
1 answer
91 views

The MNIST dataset can be obtained directly using Keras by running the following lines of Python code. ...
asked by user3728501
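For reference, the loader the truncated snippet presumably refers to is Keras's built-in one:

    from tensorflow import keras

    # Downloads MNIST on first call and caches it under ~/.keras/datasets.
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    print(x_train.shape, x_test.shape)  # (60000, 28, 28) (10000, 28, 28)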
2 votes
1 answer
94 views

I'm working with LDA on a Portuguese news corpus (~800k documents with an average of 28 words each after cleaning the data), and I’m trying to evaluate topic quality using perplexity. When I compute ...
asked by O Basile
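The asker's pipeline isn't shown; as a sketch, scikit-learn's LatentDirichletAllocation exposes a perplexity method that should be evaluated on held-out documents. The tiny corpus below is a stand-in for the real ~800k documents.

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split

    docs = [  # stand-in for the ~800k-document corpus
        "governo anuncia nova medida economica",
        "governo anuncia corte de gastos",
        "banco central sobe taxa de juros",
        "banco central mantem taxa de juros",
        "economia cresce no primeiro trimestre",
        "economia desacelera no segundo trimestre",
    ]
    train_docs, test_docs = train_test_split(docs, test_size=2, random_state=0)

    vec = CountVectorizer()
    X_train = vec.fit_transform(train_docs)
    X_test = vec.transform(test_docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)
    # Perplexity should be computed on held-out documents; measured on the
    # training set it is optimistically biased.
    print(lda.perplexity(X_test))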
0 votes
0 answers
38 views

I’m working on a time-series prediction problem where my goal is to predict the occurrence of a complication for patients based on sequential data. 🔍 Current Approach: I have sequential data for each ...
asked by Farzad X
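One common way to split such data is by patient rather than by time step, so that no patient contributes to both sets; a sketch with hypothetical arrays:

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    # Hypothetical arrays: one row per time step, with a patient ID per row.
    rng = np.random.default_rng(0)
    X = rng.random((1000, 8))
    y = rng.integers(0, 2, size=1000)
    patient_id = rng.integers(0, 50, size=1000)

    # Split by patient so no patient's sequence appears in both sets;
    # splitting individual time steps would leak within-patient information.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    train_idx, test_idx = next(gss.split(X, y, groups=patient_id))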
1 vote
0 answers
42 views

If I am using GridSearchCV to find hyperparameters on a training set, and I were then to run a CalibratedClassifierCV to tune my probabilities, would it suffice to fit the CalibratedClassifierCV with ...
asked by user54565
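A sketch of the usual pattern, with made-up data: search on one slice, calibrate the frozen best model on a second slice, and keep a third for testing. (In scikit-learn < 1.6 this is cv="prefit"; newer versions wrap the fitted model in FrozenEstimator instead.)

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)
    # Three disjoint sets: hyperparameter search, calibration, final test.
    X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_calib, y_train, y_calib = train_test_split(X_fit, y_fit, test_size=0.25, random_state=0)

    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)

    # Calibrate the already-fitted best model on data it has never seen.
    calibrated = CalibratedClassifierCV(search.best_estimator_, cv="prefit")
    calibrated.fit(X_calib, y_calib)
    print(calibrated.predict_proba(X_test[:5]))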
0 votes
0 answers
132 views

I am working on a time series prediction problem using an LSTM model. My dataset consists of 27 different items, each with unique IDs, and roughly the same number of samples per item. There are around ...
asked by Rai
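One plausible split for multi-item series is chronological within each item, so every item appears in training but evaluation is always on later, unseen time steps; a sketch with synthetic data:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "item_id": np.repeat(np.arange(27), 100),
        "t": np.tile(np.arange(100), 27),
        "value": rng.normal(size=27 * 100),
    })

    # Chronological split within each item: the last 20% of every item's
    # series goes to the test set.
    cutoff = df.groupby("item_id")["t"].transform(lambda s: s.quantile(0.8))
    train_df = df[df["t"] <= cutoff]
    test_df = df[df["t"] > cutoff]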
1 vote
0 answers
60 views

I have a CSV file which can be converted into a PyG graph data object for an edge classification task. Before doing that, I thought of adding some features using the NetworkX library. However, since after ...
asked by SuperFluo
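A sketch of the conversion path in question, using a toy graph in place of the CSV-derived one; note that structural features computed on the full graph can leak information about edges that the split later holds out.

    import networkx as nx
    from torch_geometric.transforms import RandomLinkSplit
    from torch_geometric.utils import from_networkx

    # Toy graph standing in for the CSV-derived one.
    G = nx.karate_club_graph()
    # Caution: a feature computed on the FULL graph (degree here) already
    # reflects edges that RandomLinkSplit will hide from training.
    nx.set_node_attributes(G, dict(G.degree()), "degree")

    data = from_networkx(G, group_node_attrs=["degree"])
    train_data, val_data, test_data = RandomLinkSplit(
        num_val=0.1, num_test=0.2, is_undirected=True)(data)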
0 votes
0 answers
50 views

I have a dataset with variables collected years ago, and many variables collected this year as outcome variables. I want to combine all the variables collected this year to get one outcome, e.g. ...
asked by NPpsy
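One simple way to combine several this-year variables into a single outcome is to standardize each and average them (a first principal component is a common alternative); a sketch with hypothetical columns:

    import pandas as pd

    # Hypothetical outcome columns measured this year.
    df = pd.DataFrame({
        "outcome_a": [3.1, 2.7, 4.0, 3.5],
        "outcome_b": [10, 12, 9, 11],
        "outcome_c": [0.2, 0.5, 0.1, 0.4],
    })

    # Z-score each outcome so scales are comparable, then average.
    z = (df - df.mean()) / df.std()
    df["composite_outcome"] = z.mean(axis=1)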
2 votes
1 answer
108 views

I am trying to validate my process for model stacking for binary classification. Say I have two base models, A and B, each with a different classifier ...
asked by user54565
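For comparison, scikit-learn's StackingClassifier implements one standard protocol: the meta-learner is trained on out-of-fold predictions of the base models. A minimal sketch with made-up data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # cv=5 trains the meta-learner on out-of-fold base-model predictions,
    # so it never sees predictions made on the base models' own training data.
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("svm", SVC(probability=True))],
        final_estimator=LogisticRegression(),
        cv=5,
    )
    stack.fit(X_train, y_train)
    print(stack.score(X_test, y_test))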
2 votes
2 answers
360 views

What purpose does the test set serve in k-fold cross-validation? The most common argument in favor of a test set I can find is avoiding data leakage between training and testing. But you don'...
asked by Linrael
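A sketch of the usual division of labor: cross-validation on the training portion drives model selection, while the untouched test set gives one final estimate. The dataset and grid are illustrative.

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    # The test set stays out of the entire selection procedure.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    search = GridSearchCV(LogisticRegression(max_iter=5000),
                          {"C": [0.01, 0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)
    print("CV score of best model:", search.best_score_)  # used for selection, biased upward
    print("Held-out test score:", search.score(X_test, y_test))  # final estimate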
8 votes
1 answer
551 views

Introduction: when training a model, a "sample" usually refers to the data used to fit it, so...
Sample: data used for training the model
Out-of-sample: data not used for training the model
Out-...
asked by Esben Eickhardt
0 votes
0 answers
45 views

I have an OHLCV* dataset that starts on 01-01-2000 and ends on 31-12-2003 and I want to evaluate a model, say an SVM regressor. In other words, given some daily features describing the dynamics of the ...
asked by tir
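For such data a chronological, walk-forward evaluation is the usual choice; a sketch with synthetic business-day features standing in for the OHLCV-derived ones:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import TimeSeriesSplit
    from sklearn.svm import SVR

    # Hypothetical daily features and a next-day target over 2000-2003.
    idx = pd.date_range("2000-01-01", "2003-12-31", freq="B")
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(len(idx), 5)), index=idx)
    y = pd.Series(rng.normal(size=len(idx)), index=idx)

    # Walk-forward evaluation: each fold trains only on dates before the
    # dates it is tested on, which a shuffled split would violate.
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        model = SVR().fit(X.iloc[train_idx], y.iloc[train_idx])
        print(model.score(X.iloc[test_idx], y.iloc[test_idx]))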
1 vote
0 answers
75 views

I am working with a time-dependent sequential dataset, specifically a record of machine breakdowns over a period of time. My dataset includes data from the sensors of several machines until they fail ...
asked by user386164
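A common split here is leave-machines-out, so every test fold contains only machines absent from training; a sketch with hypothetical sensor windows:

    import numpy as np
    from sklearn.model_selection import GroupKFold

    # Hypothetical sensor windows, labels, and the machine each came from.
    rng = np.random.default_rng(0)
    X = rng.random((600, 10))
    y = rng.integers(0, 2, size=600)
    machine_id = rng.integers(0, 12, size=600)

    # Every fold tests on machines never seen in training, so the estimate
    # reflects generalization to new machines rather than memorization of
    # a machine's own sensor history.
    for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=machine_id):
        print(np.intersect1d(machine_id[train_idx], machine_id[test_idx]))  # always empty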
3 votes
2 answers
838 views

I have just 25 observations, and I'm not sure whether it is possible to split the data into train and test sets, for example 15 observations for training and 10 for testing. 15 observations is very small for ...
asked by Leila ali
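With only 25 observations, leave-one-out cross-validation is often preferred over a single 15/10 split, since each of the 25 fits still trains on 24 points; a sketch with synthetic data:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    # 25 observations standing in for the asker's data.
    X, y = make_regression(n_samples=25, n_features=3, noise=5.0, random_state=0)
    scores = cross_val_score(LinearRegression(), X, y,
                             cv=LeaveOneOut(), scoring="neg_mean_squared_error")
    print(scores.mean())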
2 votes
1 answer
100 views

I am trying to understand the proof that reporting the CV performance obtained during model selection as a performance estimate is optimistically biased. The steps in the proof are the following: Let $p_i, \pi_i$ ...
asked by Antonios Sarikas
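The core step of that argument, stated in the excerpt's notation (assuming $p_i$ is the CV estimate for model $i$, $\pi_i = E[p_i]$ its true error, and $i^* = \arg\min_i p_i$ the selected model):

    E[\min_i p_i] \le \min_i E[p_i] = \min_i \pi_i \le E[\pi_{i^*}]

So the reported score $\min_i p_i$ underestimates the selected model's true error on average, strictly so whenever the $p_i$ are noisy.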
