I understand that using 100% of the dataset with k-fold cross-validation, instead of train_test_split, would eliminate the randomness the latter method has in splitting, and thus potentially help avoid overfitting.
But I have seen that this is not considered best practice for k-fold CV. What I have seen instead is that we should first split the dataset (say 80% training, 20% testing), and then perform k-fold CV only on the 80% training portion (image below).
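In code, my understanding of that recommended workflow looks roughly like this (a minimal sketch with scikit-learn; the dataset and estimator here are just placeholders, not from any specific source):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Placeholder data and model; any estimator/dataset would do.
X, y = load_iris(return_X_y=True)

# Step 1: hold out 20% as a final test set (this split is random).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)

# Step 2: k-fold CV on the 80% training portion only,
# e.g. for model selection / hyperparameter tuning.
scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV scores on training portion:", scores)

# Step 3: refit on the full training portion and evaluate
# exactly once on the held-out test set.
model.fit(X_train, y_train)
print("Final test accuracy:", model.score(X_test, y_test))
```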
What confuses me is that with this method we are again potentially introducing a random split by using train_test_split before CV. Why is this method then generally considered best practice? What am I missing?
