Questions tagged [data-leakage]

5 votes
1 answer
69 views

I'm using early stopping for XGBClassifier. The fitting looks like this (simplified): ...
Jakub Małecki
3 votes
2 answers
144 views

I'm training a classifier on the DAIGT dataset. The objective is to distinguish human-written text from AI-generated text, so this is a binary classification problem. As a baseline before I move onto an LLM classifier, ...
saladmobster
1 vote
1 answer
163 views

If I am using XGBoost with GridSearchCV, how should I choose my evaluation set? Note that I am referring to eval_set within the model params. My current implementation uses GridSearchCV in order to ...
user54565
6 votes
1 answer
1k views

Currently my classification model is doing too well on all of the train, validation, and test datasets. I assume there is data leakage in the features, and therefore I've computed the ...
haneulkim • 487
0 votes
1 answer
313 views

Many websites say that a time series split may cause data leakage. The idea behind time series splits is to divide the training set into two folds at each iteration, on the condition that the ...
Ellen • 1
0 votes
1 answer
79 views

Sorry if this is the wrong SE, but in my mind it made the most sense to ask this here. My question relates specifically to collecting information about a target demographic, not individuals which ...
Justin T • 101
1 vote
1 answer
87 views

I have a dataset with ~40k records and 16 columns (including the target) and I want to understand the correct process behind the whole data science process. This is what I did: Performed an EDA which ...
pustelnikk
0 votes
1 answer
100 views

I'm working on a project that aims to classify JIRA issues into their relevant owner group. An issue has the following features: Summary, Description, and Comments, all of which are text-based. During ...
Ben • 209