I'm using early stopping with XGBClassifier. The fitting looks like this (simplified):
```python
from xgboost import XGBClassifier

# X_train, y_train, X_test, y_test - data split
model = XGBClassifier(early_stopping_rounds=10, eval_metric="logloss")
model.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)]
)
```

As you can see, the test dataset is used for early stopping. Can this be interpreted as data leakage? In my opinion it's not, since there's no direct information transfer from outside the training set to the fitting procedure, and evaluation on the test set can only cause the fitting to stop too early, too late, or just in time. But I'm not an expert in XGBoost training and I'm not sure if I'm correct.
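For comparison, the alternative I'm aware of would carve a validation set out of the training data and keep the test set for a single final evaluation only. A minimal sketch of that variant (the 20% validation fraction and `random_state` here are arbitrary choices, not from my actual setup):

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Split a validation set off the training data; the test set is
# never shown to the fitting procedure.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)

model = XGBClassifier(early_stopping_rounds=10, eval_metric="logloss")
model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_val, y_val)])

# The test set is used exactly once, for the final evaluation.
print(model.score(X_test, y_test))
```

My question is whether this extra split is actually necessary, or whether using the test set directly in `eval_set` (as in the first snippet) is acceptable.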
I read the related topic LightGBM eval_set - what to do when I fit the final model (there's no test data left), but it doesn't exactly answer my question.