
I have been practicing with the following dataset: http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength

for building a prediction model based on an MLP, but I have some doubts about whether my approach is correct. I wanted to tune the activation function over the following options: identity, logistic, tanh and relu. So what I did is the following:

First I divided my dataset into training, validation and test sets (80/20/20), and as far as I know, hyperparameter tuning is done on the validation set. So my pseudocode for the validation part is like the following:


```python
# Roughly what I did, using scikit-learn's MLPRegressor
for activation in ["identity", "logistic", "tanh", "relu"]:
    model = MLPRegressor(activation=activation, solver="adam", max_iter=1000)
    model.fit(X_train, y_train)
    plot(model.loss_curve_)   # loss curve on the training data
    model.fit(X_val, y_val)   # re-fit on the validation data
    plot(model.loss_curve_)   # loss curve on the validation data
```

with this loop I found that the "best" activation function was "relu". I am putting two graphs as an example:

[loss curve plot: first example]

[loss curve plot: second example]
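The 80/20/20 split mentioned above can be built from two chained calls to scikit-learn's `train_test_split`. A minimal sketch with random stand-in data (the variable names are illustrative, and the fractions here produce a 60/20/20 split, since the three parts have to sum to the whole dataset):

```python
from sklearn.model_selection import train_test_split
import numpy as np

# Random stand-in for the concrete dataset (8 features, one numeric target)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 8)), rng.normal(size=100)

# First carve off the test set, then split the remainder into train/validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% gives 20% validation, so the split is 60/20/20
```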

After that I settled on "adam" and "relu" as hyperparameters, and then I tried them with the training and test sets, so roughly I did this:

```python
model = MLPRegressor(activation="relu", solver="adam", max_iter=1000)
model.fit(X_train, y_train)
plot(model.loss_curve_)    # loss curve on the training data
model.fit(X_test, y_test)  # re-fit on the test data
plot(model.loss_curve_)    # loss curve on the test data
```

and the curve I got was the following:

[loss curve plot: train/test run]

What I wanted to know is whether my approach is correct. I ask this because it is not so easy to find examples of loss curves using scikit-learn. I think that is because in many Internet tutorials the hyperparameter tuning is done with GridSearch or CV, and the ones that do use loss curves are implemented in Keras or TensorFlow.
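For comparison, the GridSearch-style tuning those tutorials use would look roughly like this for the activation function. A sketch with random stand-in data (the data and parameter grid are illustrative):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
import numpy as np

# Random stand-in data; replace with the real training split
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 8)), rng.normal(size=80)

# Cross-validated search over the four candidate activations
grid = GridSearchCV(
    MLPRegressor(solver="adam", max_iter=200),
    param_grid={"activation": ["identity", "logistic", "tanh", "relu"]},
    cv=3,
    scoring="neg_mean_squared_error",
)
grid.fit(X_train, y_train)
print(grid.best_params_)  # the activation with the best CV score
```

This tunes on cross-validation folds of the training data instead of a fixed validation set, which is why those tutorials never need to plot a validation loss curve.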

I wanted to force my model to obtain a curve like this:

[example plot of an overfitted model: training loss keeps falling while validation loss rises]

which shows an overfitted model, just for the sake of learning. So I was wondering: do all models overfit, or what is happening in my tests? Maybe I did something wrong.
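If the goal is deliberately producing an overfitted curve, the usual recipe is a tiny noisy dataset, a large network, and no regularization. A sketch with synthetic data (all sizes and hyperparameters here are illustrative):

```python
from sklearn.neural_network import MLPRegressor
import numpy as np

# Tiny training set with pure-noise targets: nothing generalizes
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(20, 8)), rng.normal(size=20)
X_val, y_val = rng.normal(size=(50, 8)), rng.normal(size=50)

# Large network, alpha=0 disables L2 regularization, many iterations
model = MLPRegressor(hidden_layer_sizes=(200, 200), alpha=0.0,
                     solver="adam", max_iter=5000)
model.fit(X_train, y_train)
print(model.score(X_train, y_train))  # training R^2: typically high
print(model.score(X_val, y_val))      # validation R^2: much worse
```

The large gap between the training and validation scores is the overfitting signature; with enough capacity the network memorizes the noise in the 20 training points.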

Any help would be greatly appreciated.

Thanks


1 Answer


It appears that your problem is relatively easy for a deep learning model, which can fit it almost perfectly. That is why your loss curves for both training and validation drop to zero: there is nothing more to learn, because the problem is essentially solved.
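You can see the same effect on any target the network can represent exactly. A sketch with a synthetic noise-free linear target (the data is illustrative): the training loss keeps shrinking toward zero rather than plateauing the way an overfitting curve would.

```python
from sklearn.neural_network import MLPRegressor
import numpy as np

# Noise-free linear target: trivially easy for an MLP to fit
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

model = MLPRegressor(activation="relu", solver="adam", max_iter=2000)
model.fit(X, y)
print(model.loss_curve_[0], model.loss_curve_[-1])  # loss falls steadily
```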

