The Great

Why does my boosting model overfit even with just 5 features out of 61?

I am working on a binary classification problem using a balanced-bagging random forest, neural networks, and boosting techniques. My dataset has 977 records and the class proportion is 77:23.

I originally had 61 features in my dataset. After extensive feature-selection work (RFECV, BorutaPy, etc., each using a random forest estimator), I narrowed them down to 5 features. With only 5 features, I expected my XGBoost model not to overfit and to give better performance on the test set, but it still overfits and produces poor test results. Random forest, on the other hand, performs similarly on train and test. Can someone help me understand why this happens?
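For context, the selection step described above looks roughly like the following sketch. This is a minimal, hedged reconstruction on synthetic data, assuming scikit-learn's `RFECV` with a `RandomForestClassifier` estimator; the real dataset, feature names, CV settings, and `step` size are unknown and invented here for illustration.

```python
# Sketch of RFECV feature selection with a random forest estimator,
# on a synthetic stand-in for the real data (977 rows, 61 features,
# ~77:23 class balance). All hyperparameters here are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(
    n_samples=977, n_features=61, n_informative=5,
    weights=[0.77, 0.23], random_state=42,
)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    min_features_to_select=5,
    step=8,                      # drop 8 features per iteration (speed)
    cv=StratifiedKFold(5),
    scoring="roc_auc",
)
selector.fit(X, y)
print("selected features:", selector.n_features_)
```

One caveat worth noting: because the selector itself uses a random forest, the surviving features are, by construction, the ones a random forest finds useful, which does not guarantee they are the best subset for a boosting model.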

Performance on train and test is shown below.

Random Forest - train data: [metrics screenshot]

Random Forest - test data: [metrics screenshot]

ROC AUC for random forest: 0.81

XGBoost - train data: [metrics screenshot]

XGBoost - test data: [metrics screenshot]

ROC AUC for XGBoost: 0.81
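The train/test gap I am describing can be reproduced in miniature like this. This is only a sketch on synthetic data, using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so it is self-contained; the real models, data, and hyperparameters differ.

```python
# Compare train vs. test ROC AUC for a bagged model (random forest)
# and a boosted model, on synthetic data shaped like the problem
# above (977 rows, 5 features, ~77:23 classes). Hyperparameters are
# illustrative assumptions, not the ones actually used.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=977, n_features=5, n_informative=3,
    weights=[0.77, 0.23], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [
    ("random forest", RandomForestClassifier(random_state=0)),
    ("boosting", GradientBoostingClassifier(n_estimators=500, random_state=0)),
]:
    model.fit(X_tr, y_tr)
    auc_tr = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
    auc_te = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: train AUC {auc_tr:.2f}, test AUC {auc_te:.2f}")
```

With many boosting rounds and no regularization, the boosted model typically drives its train AUC far above its test AUC, which is the pattern I am seeing with XGBoost.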