
Questions tagged [training-error]

0 votes
0 answers
47 views

An interesting question I stumbled upon today is this: Suppose I train models $m_1, m_2, m_3, \ldots, m_N$, where each $m_i$, $i = 1, \ldots, N$, is associated with a hyperparameter $i$. All models are ...
Your neighbor Todorovich
1 vote
1 answer
135 views

This question is inspired by a blog post, https://www.argmin.net/p/in-defense-of-typing-monkeys, and several rumors I've heard from other people who work in machine learning. The gist of it is that ...
Your neighbor Todorovich
0 votes
0 answers
80 views

I have a dataset that has been split into two parts, a training set and a test set. After training a model on the training set to classify between class 0 and class 1, I used sklearn's roc_curve to calculate the ...
Eric Wang
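For context on this workflow, a minimal sketch of the usual roc_curve call on a held-out test set; the classifier and data below are placeholders, not the asker's:

```python
# Minimal sketch: computing an ROC curve with scikit-learn on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
# roc_curve expects scores/probabilities for the positive class, not hard labels.
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", auc(fpr, tpr))
```

The key point is that roc_curve wants continuous scores for the positive class rather than hard 0/1 predictions.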
0 votes
0 answers
47 views

I’m training a machine learning model and optimizing the regularization hyperparameters to ensure the model generalizes well. During training, I include regularization terms in the loss function to ...
eshaan • 1
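A minimal sketch of the setup described, assuming a simple L2 penalty (ridge) whose weight is chosen by validation error rather than training error; the data and alpha grid are illustrative:

```python
# Minimal sketch: tuning a regularization hyperparameter on a validation set.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = X @ rng.normal(size=50) + rng.normal(scale=0.5, size=200)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# The penalty weight is chosen by validation error, not training error:
# training error only goes down as alpha shrinks.
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    print(alpha,
          mean_squared_error(y_tr, model.predict(X_tr)),
          mean_squared_error(y_val, model.predict(X_val)))
```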
4 votes
1 answer
140 views

I'm currently teaching an undergraduate introduction to statistics course but, as required by the program director, I need to add some machine learning materials to it. I'm wondering what is the appropriate ...
ExcitedSnail • 3,090
0 votes
0 answers
50 views

I'm trying to create a multi-class classification model using RNNs. The input data has a sequence length of 90 and consists of 5 features, normalized to the [0,1] range. Here's the network ...
Mangi222
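A minimal sketch of a many-to-one recurrent classifier matching the stated shapes (sequence length 90, 5 features, inputs in [0,1]); the GRU, hidden size, and class count are assumptions:

```python
# Minimal sketch of a many-to-one RNN classifier for the shapes described.
import torch
import torch.nn as nn

class SeqClassifier(nn.Module):
    def __init__(self, n_features=5, hidden=64, n_classes=3):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, 90, 5)
        _, h = self.rnn(x)           # h: (1, batch, hidden)
        return self.head(h[-1])      # logits: (batch, n_classes)

model = SeqClassifier()
logits = model(torch.rand(32, 90, 5))        # inputs already in [0, 1]
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (32,)))
```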
2 votes
1 answer
109 views

I get the following loss behavior when training a multilayer perceptron with mean squared error loss on some synthetic data, using default Adam with the default learning rate. (I am working with 1-dimensional data.) I ...
Rahim Brahimi
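A minimal sketch of the setup described: an MLP trained with MSE and default Adam on 1-dimensional synthetic data, with the loss history recorded so the curve can be inspected; the target function here is an assumption:

```python
# Minimal sketch: MLP + MSE + default Adam on 1-D synthetic data, logging the loss curve.
import torch
import torch.nn as nn

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)    # assumed synthetic target

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters())          # default lr = 1e-3
losses = []
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())                      # plot this to inspect spikes
```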
1 vote
0 answers
34 views

Looking for references on conditions under which the training error goes to 0. As an example, suppose that we have a linear model $y = X \beta_0 + \epsilon$, where $\beta_0$ is bounded and each row of $X$ is generated ...
Alan Chung
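One concrete instance of the phenomenon the asker wants references for: with Gaussian rows, ordinary least squares interpolates (training error 0) once the number of features reaches the sample size. A minimal numerical sketch:

```python
# Minimal sketch: once p >= n, least squares drives the training error to 0.
import numpy as np

rng = np.random.default_rng(0)
n = 50
for p in [10, 50, 200]:
    X = rng.normal(size=(n, p))
    beta0 = rng.normal(size=p) / np.sqrt(p)           # bounded true coefficients
    y = X @ beta0 + rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # min-norm solution if p > n
    print(p, np.mean((y - X @ beta_hat) ** 2))        # ~0 once p >= n
```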
1 vote
1 answer
113 views

To my knowledge there are two types of learning curves: those that show the progression of performance as the number of epochs increases, and those that show the performance progression as the ...
Valo • 51
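A minimal sketch of the second kind of curve, performance versus training-set size, using scikit-learn's learning_curve; the estimator and data are placeholders (the epoch-based kind is just the logged loss history):

```python
# Minimal sketch: learning curve over training-set size with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
print(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))
```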
1 vote
0 answers
178 views

I am trying to understand the difference between the training RSS and the test RSS for smoothing splines, e.g. $$\hat{g}=\arg \min_g \left(\sum_{i=1}^{n}(y_{i} - g(x_i))^2 + \lambda \int [g^{(m)}(x)]^{2}\,dx\right)$$ ...
therickster
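A minimal numerical sketch of how the training RSS of a smoothing spline falls as the penalty is relaxed, using scipy's UnivariateSpline; its smoothing budget s stands in loosely for the role of $\lambda$ here:

```python
# Minimal sketch: smoothing spline training RSS shrinks as the penalty is relaxed.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=100)

for s in [10.0, 1.0, 0.01]:              # smaller s -> rougher, closer fit
    g = UnivariateSpline(x, y, s=s)
    print(s, np.sum((y - g(x)) ** 2))    # training RSS heads toward 0
```

The test RSS, by contrast, is typically U-shaped in the penalty: it rises again once the fit starts chasing noise.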
1 vote
0 answers
89 views

During training I get unusual behavior from my model. Peaks show up in both the validation and training errors, which I cannot understand. I use MSE as the loss, and similar behavior appears in other ...
JeanAR • 11
3 votes
0 answers
504 views

I'm trying to build a simple linear regression model $y=ax+b$ using pytorch, where $y$ is the increase in the number of cells on Day $n$ and $x$ is the number of cells ...
Jack • 71
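A minimal sketch of fitting $y = ax + b$ with PyTorch; the day indices and cell counts below are placeholders. A common pitfall in this setup is leaving $x$ unscaled when the raw values are large:

```python
# Minimal sketch: one-variable linear regression y = a*x + b in PyTorch.
import torch
import torch.nn as nn

x = torch.arange(1, 11, dtype=torch.float32).unsqueeze(1)   # Day n (placeholder)
y = 3.0 * x + 2.0 + torch.randn_like(x)                     # assumed cell counts

model = nn.Linear(1, 1)                                     # learns a and b
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
print(model.weight.item(), model.bias.item())
```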
4 votes
1 answer
3k views

Usually it is called over-fitting when the test error is higher than the training error. Does that imply that it is called under-fitting when the training error is higher than the test error? Also ...
Just a stat student
0 votes
0 answers
102 views

How is the PRESS statistic calculated in a k-fold cross-validation? I know how it is done in the leave-one-out scenario. Is it still summed over all training samples, just that there are now k-many ...
dinaue • 1
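A minimal sketch of one natural k-fold analogue of PRESS, under the assumption that it is still summed over all samples but each residual comes from the fold in which that sample was held out:

```python
# Minimal sketch: a k-fold analogue of the PRESS statistic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(size=100)

press = 0.0
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    press += np.sum((y[test_idx] - model.predict(X[test_idx])) ** 2)
print(press)   # with n_splits = n this reduces to classical leave-one-out PRESS
```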
1 vote
0 answers
183 views

For a rigorous empirical analysis, I am training a model with three different seeds: 0, 1, and 2. In each case, I found that the model obtained through early stopping (lowest validation loss) had an ...
Dhruv Mullick
