I've been fitting a binary logistic extreme gradient boosting (XGBoost) model, using different random samples of the data as the training set and computing the Gini index (coefficient) each time. However, as I increase the proportion of data used for training, the Gini index increases, and vice versa. I've tried different random seeds and the result is consistent, with only small variations. Some examples of the Gini index and training proportion (TP) used (a simplified code sketch of my setup follows the results):
TP 10%: Gini 0.004
TP 60%: Gini 0.243
TP 80%: Gini 0.288
TP 90%: Gini 0.309
TP 100%: Gini 0.320

(Note that for binary classification, a Gini of 0.5 indicates the worst predictive performance possible.)
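For reference, here is a minimal sketch of my setup. It is simplified: the file name, column names, and hyperparameters are placeholders, and the Gini shown is the mean binary Gini impurity 2p(1−p) of the held-out predicted probabilities, which follows the same 0-to-0.5 convention noted above (0.5 means every prediction sits at p = 0.5).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Placeholder data: "data.csv" with a binary "target" column
df = pd.read_csv("data.csv")
X, y = df.drop(columns="target"), df["target"]

def mean_gini_impurity(p):
    # Mean binary Gini impurity of predicted probabilities;
    # 0.5 (all predictions at p = 0.5) is the worst possible value
    return float(np.mean(2 * p * (1 - p)))

for tp in [0.1, 0.6, 0.8, 0.9]:  # training proportions (TP 100% has no holdout)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=tp, stratify=y, random_state=42
    )
    model = XGBClassifier(objective="binary:logistic", n_estimators=200)
    model.fit(X_tr, y_tr)
    p_hat = model.predict_proba(X_te)[:, 1]  # held-out predicted probabilities
    print(f"TP {tp:.0%}: Gini {mean_gini_impurity(p_hat):.3f}")
```

(The numbers in the table above come from my real data, not from this sketch.)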
This seems counterintuitive to me: more training data should, in general, lead to better predictions. For what it's worth, when I put the question into Google, the AI Overview says "The Gini index, a measure of inequality, generally decreases as the proportion of training data increases, not the other way around. This is because a larger training set provides a more accurate representation of the underlying distribution, leading to a more robust and stable estimate of inequality..."
The Gini index otherwise behaves as expected (e.g. it decreases when more predictors are included in the model). What else might explain this odd behaviour?