What is a Learning Curve in machine learning? [closed]

Question

Closed. This question is not about programming or software development. It is not currently accepting answers.

This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.

Closed 12 months ago.

The community reviewed whether to reopen this question 12 months ago and left it closed:

Opinion-based Because this question may lead to opinionated discussion, debate, and answers, it has been closed. You may edit the question if you feel you can improve it so that it requires answers that include facts and citations or a detailed explanation of the proposed solution. If edited, the question will be reviewed and might be reopened.


I want to know what a learning curve in machine learning is. What is the standard way of plotting it? I mean what should be the x and y axis of my plot?

Never heard of a learning curve. Do you mean a ROC curve? en.wikipedia.org/wiki/Receiver_operating_characteristic
– Stompchicken, Commented Jan 6, 2011 at 17:07
No, learning curve and ROC curve are not synonymous, as I attempt to describe below.
– MattBagg, Commented Dec 5, 2012 at 2:16
@MattBagg: you are absolutely right, I rolled back to before the edit.
– Amro, Commented Jan 20, 2013 at 1:18
See Analysis and Optimization of Convolutional Neural Network Architectures
– Martin Thoma, Commented Aug 1, 2017 at 5:42

Amro · Accepted Answer · 2019-06-25 13:19:44Z

59

It usually refers to a plot of the prediction accuracy/error vs. the training set size (i.e., how much better the model gets at predicting the target as you increase the number of instances used to train it).

learning-curve

Usually both the training and test/validation performance are plotted together so we can diagnose the bias-variance tradeoff (i.e., determine whether we would benefit from adding more training data, and assess the model complexity by controlling regularization or the number of features).

bias-variance

edited Jun 25, 2019 at 13:19

answered Jan 7, 2011 at 2:45

Amro


5 Comments

The Disco Spider Over a year ago

There's also a more current article: scikit-learn.org/stable/modules/learning_curve.html

Jason Over a year ago

The Wikipedia entry mentioned an alternative type of learning curve, which is the performance vs. number of iterations. I get a bit confused about these 2 definitions. For the performance-size definition, for each training size x, is the y-axis value obtained from a model that has been trained as much as possible (e.g. by feeding in the x samples multiple times till convergence), or from one trained using only one pass over the x samples?

Jason Over a year ago

... continuing from previous: For the performance-iteration definition, it must be quite computationally heavy for stochastic training, mustn't it? Because for each input sample, one has to predict all training samples and get the average score, so the cost would scale with n^2.

Amro Over a year ago

performance-iterations: you train your model over the entire training set and you plot the loss function on each iteration of the current model measured on the full train/validation set. The model optimization is iterative so the longer you let the algorithm run the more likely it is to improve, and we use such a plot to decide when to stop learning as the model converges or becomes too sensitive to training data losing generalization over the validation set.
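
To make the performance-iterations idea concrete, here is a minimal sketch, not a canonical implementation: a hand-written logistic regression trained by full-batch gradient descent, with the loss on the full training and validation sets recorded after every iteration. The synthetic data, the learning rate `lr`, the 50-iteration budget, and the "lowest validation loss" stopping rule are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (an assumption of this sketch).
n, d = 1000, 10
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Hold out the last 25% of the rows as a validation set.
split = int(0.75 * n)
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]


def log_loss(w, X, y):
    """Mean binary cross-entropy of a logistic model with weights w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


w = np.zeros(d)
lr = 0.1  # hypothetical learning rate
train_curve, val_curve = [], []

for iteration in range(50):
    # One full-batch gradient step on the training set.
    p = 1.0 / (1.0 + np.exp(-(X_train @ w)))
    grad = X_train.T @ (p - y_train) / len(y_train)
    w -= lr * grad
    # Loss of the current model on the full train and validation sets.
    train_curve.append(log_loss(w, X_train, y_train))
    val_curve.append(log_loss(w, X_val, y_val))

# Simple stopping rule: the iteration with the lowest validation loss.
best_iter = int(np.argmin(val_curve)) + 1
print(f"lowest validation loss at iteration {best_iter}")
```

Plotting `train_curve` and `val_curve` against the iteration number gives the performance-iterations learning curve: iterations on the x-axis, loss on the y-axis.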

Amro Over a year ago

performance-samples: you train your model over an increasing subset size of the training data and you plot the loss function of the current model measured on the full train/validation set. You'd normally train the model until convergence each time (using the same fixed criteria to determine convergence). It can be used to find out if the model is underfitting (we could use more data) or overfitting (we need to tweak regularization to improve generalization and be less sensitive to noisy training data).
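
The performance-samples procedure can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (`make_classification`), the model is scikit-learn's `LogisticRegression`, the list of subset sizes is arbitrary, and the training loss is measured on the subset that was actually used for fitting, which is one common convention.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (an assumption of this sketch).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

train_sizes = [50, 100, 200, 400, 800, 1500]  # hypothetical subset sizes
train_loss, val_loss = [], []

for n in train_sizes:
    # Train until convergence on the first n training examples only,
    # using the same settings (and so the same stopping criteria) each time.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])
    # Training loss on the subset used; validation loss on the full validation set.
    train_loss.append(log_loss(y_train[:n], model.predict_proba(X_train[:n])))
    val_loss.append(log_loss(y_val, model.predict_proba(X_val)))

for n, tr, va in zip(train_sizes, train_loss, val_loss):
    print(f"n={n:5d}  train loss={tr:.3f}  validation loss={va:.3f}")
```

Plotting `train_loss` and `val_loss` against `train_sizes` gives the curve: training set size on the x-axis, loss on the y-axis.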

desertnaut · Accepted Answer · 2024-10-04 11:29:27Z

Notice that learning curve and ROC curve are not synonymous.

As indicated in the other answers to this question, a learning curve conventionally depicts improvement in performance on the vertical axis when there are changes in another parameter (on the horizontal axis), such as training set size (in machine learning) or iteration/time (in both machine and biological learning). One salient point is that many parameters of the model are changing at different points on the plot. Other answers here have done a great job of illustrating learning curves.

(There is also another meaning of learning curve in industrial manufacturing, originating in an observation in the 1930s that the number of labor hours needed to produce an individual unit decreases at a uniform rate as the quantity of units manufactured doubles. It isn't really relevant but is worth noting for completeness and to avoid confusion in web searches.)

In contrast, the Receiver Operating Characteristic curve, or ROC curve, does not show learning; it shows performance. An ROC curve is a graphical depiction of classifier performance that shows the trade-off between increasing true positive rates (on the vertical axis) and increasing false positive rates (on the horizontal axis) as the discrimination threshold of the classifier is varied. Thus, only a single parameter (the decision / discrimination threshold) associated with the model is changing at different points on the plot. This ROC curve (from Wikipedia) shows performance of three different classifiers.

ROC curve, see previous link for CC licensing

There is no learning being depicted here, but rather performance with respect to two different classes of success/error as the classifier's decision threshold is made more lenient/strict. By looking at the area under the curve, we can see an overall indication of the ability of the classifier to distinguish the classes. This area-under-the-curve metric is insensitive to the number of members in the two classes, so it may not reflect actual performance if class membership is unbalanced. The ROC curve has many subtleties, and interested readers might check out:

Fawcett, Tom. "ROC graphs: Notes and practical considerations for researchers." Machine Learning 31 (2004): 1-38.

Swets, John A., Robyn M. Dawes, and John Monahan. "Better decisions through Science." Scientific American (2000): 83.
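
As a rough illustration of the threshold sweep described above, the following sketch computes an ROC curve and its area with scikit-learn's `roc_curve` and `roc_auc_score`. The scores are made up (two overlapping Gaussians standing in for a classifier's outputs), so the numbers carry no meaning beyond the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Hypothetical classifier scores: positives tend to score higher than negatives.
y_true = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
scores = np.concatenate(
    [rng.normal(0.0, 1.0, 500), rng.normal(1.5, 1.0, 500)]
)

# Each (fpr, tpr) pair corresponds to one decision threshold on the scores.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(f"AUC = {auc:.3f}")

# Walking down the thresholds (more lenient) raises both FPR and TPR.
for i in np.linspace(0, len(thresholds) - 1, 5).astype(int):
    print(f"threshold={thresholds[i]:6.2f}  FPR={fpr[i]:.2f}  TPR={tpr[i]:.2f}")
```

Note that nothing is retrained here: the only thing that changes from point to point is the threshold, which is the contrast with a learning curve.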

Steve Tjoa · Accepted Answer · 2011-01-07 00:21:40Z

Some people use "learning curve" to refer to the error of an iterative procedure as a function of the iteration number, i.e., it illustrates convergence of some utility function. In the example below, I plot mean-square error (MSE) of the least-mean-square (LMS) algorithm as a function of the iteration number. That illustrates how quickly LMS "learns", in this case, the channel impulse response.
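
A minimal sketch of such an LMS learning curve, using NumPy only, might look like the following. The channel taps `h`, the step size `mu`, the noise level, and the number of runs are hypothetical values picked for illustration, and the MSE is estimated by averaging the squared error over independent runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical channel impulse response to be identified (an assumption).
h = np.array([0.8, -0.4, 0.2, 0.1])
n_taps = len(h)
n_iter = 500      # iterations (input samples) per run
n_runs = 100      # independent runs, averaged to estimate the MSE
mu = 0.05         # LMS step size
noise_std = 0.05  # standard deviation of the measurement noise

sq_err = np.zeros((n_runs, n_iter))
for run in range(n_runs):
    x = rng.normal(size=n_iter + n_taps - 1)  # white input signal
    w = np.zeros(n_taps)                      # adaptive filter weights
    for n in range(n_iter):
        # Most recent n_taps input samples, newest first.
        x_n = x[n:n + n_taps][::-1]
        d = h @ x_n + noise_std * rng.normal()  # noisy channel output
        e = d - w @ x_n                         # error before the update
        w += mu * e * x_n                       # LMS weight update
        sq_err[run, n] = e ** 2

# Learning curve: squared error per iteration, averaged over the runs.
mse = sq_err.mean(axis=0)
print("MSE over first 10 iterations:", mse[:10].mean())
print("MSE over last 50 iterations: ", mse[-50:].mean())
print("estimated channel (last run):", np.round(w, 3))
```

Plotting `mse` against the iteration number, usually on a logarithmic y-axis, gives the learning curve in this sense of the term.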

Scott Gottreu · Accepted Answer · 2011-05-29 22:43:47Z

Basically, a learning curve lets you find the point at which the algorithm starts to learn. If you take the tangent to the curve (its derivative), the point where the slope starts to level off toward a constant is where the algorithm starts to build its learning ability.

Depending on how your x and y axes are mapped, the values on one axis will start to approach a constant while the values on the other axis keep increasing. This is when you start seeing some learning. The whole curve essentially lets you measure the rate at which your algorithm is able to learn. The maximum point is usually where the slope starts to recede. You can take a number of derivative measures at the maximum/minimum point.

So from the above examples you can see that the curve gradually tends towards a constant value. The algorithm initially builds up its learning from the training examples, and the slope widens at the maximum/minimum point, where the curve approaches closer and closer to the constant state. At this point it is able to pick up new examples from test data and find new and unique results from the data. You would have such x/y axis measures for epochs vs. error.

Emma Zhang · Accepted Answer · 2018-06-22 18:30:16Z

In Andrew Ng's machine learning class, a learning curve is the plot of the training/cross-validation error versus the sample size. The learning curve can be used to detect whether the model has high bias or high variance. If the model suffers from a high bias problem, then as the sample size increases, the training error will increase and the cross-validation error will decrease, and in the end they will be very close to each other but still at a high error rate for both training and cross-validation error. Increasing the sample size will not help much with a high bias problem.

If the model suffers from high variance, then as you keep increasing the sample size, the training error will keep increasing and the cross-validation error will keep decreasing, and they will end up at a low training and cross-validation error rate. So more samples will help to improve the model's prediction performance if the model suffers from high variance.
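
One way to see the two cases side by side is a sketch like the following, which uses scikit-learn's `learning_curve` helper. The setup is an assumption for illustration only: noisy sine data, a straight-line model standing in for the high bias case, and an unpruned decision tree standing in for the high variance case.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic nonlinear regression problem (an assumption of this sketch).
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.3, size=600)

models = {
    "high bias (straight line fit to a sine)": LinearRegression(),
    "high variance (unpruned decision tree)": DecisionTreeRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="neg_mean_squared_error",
    )
    # Scores are negated MSE, so flip the sign and average over the CV folds.
    train_err = -train_scores.mean(axis=1)
    val_err = -val_scores.mean(axis=1)
    results[name] = (sizes, train_err, val_err)
    print(name)
    for n, tr, va in zip(sizes, train_err, val_err):
        print(f"  n={n:4d}  train MSE={tr:.3f}  CV MSE={va:.3f}  gap={va - tr:.3f}")
```

In a setup like this, the straight-line model would be expected to show training and cross-validation errors that sit close together at a high value, while the tree would be expected to show a near-zero training error with a clear gap to the cross-validation error.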

erotavlas · Accepted Answer · 2020-06-23 12:02:47Z

How can you determine for a given model whether more training points will be helpful? A useful diagnostic for this is the learning curve.

• Plot of the prediction accuracy/error vs. the training set size (i.e., how much better the model gets at predicting the target as you increase the number of instances used to train it)

• Learning curve conventionally depicts improvement in performance on the vertical axis when there are changes in another parameter (on the horizontal axis), such as training set size (in machine learning) or iteration/time

• A learning curve is often useful to plot for algorithmic sanity checking or improving performance

• Learning curve plotting can help diagnose the problems your algorithm will be suffering from

Personally, the two links below helped me understand this concept better:

Learning Curve

Sklearn Learning Curve

Elie Sokhon · Accepted Answer · 2019-08-25 08:54:54Z

Use this code to plot:

    import matplotlib.pyplot as plt

    # Newer Keras versions store accuracy under 'accuracy'/'val_accuracy'
    # instead of 'acc'/'val_acc'; adjust the keys below if needed.

    # Loss curves
    plt.figure(figsize=[8, 6])
    plt.plot(history.history['loss'], 'r', linewidth=3.0)
    plt.plot(history.history['val_loss'], 'b', linewidth=3.0)
    plt.legend(['Training Loss', 'Validation Loss'], fontsize=18)
    plt.xlabel('Epochs', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.title('Loss Curves', fontsize=16)

    # Accuracy curves
    plt.figure(figsize=[8, 6])
    plt.plot(history.history['acc'], 'r', linewidth=3.0)
    plt.plot(history.history['val_acc'], 'b', linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'], fontsize=18)
    plt.xlabel('Epochs', fontsize=16)
    plt.ylabel('Accuracy', fontsize=16)
    plt.title('Accuracy Curves', fontsize=16)

    plt.show()

note that history = model.fit(...)

Tidyquant · Accepted Answer · 2020-01-14 11:12:33Z

It is a graph that compares the performance of a model on training and testing data over a varying number of training instances. Learning curves are widely used as a diagnostic tool in machine learning for algorithms that learn from a training dataset incrementally. It allows us to verify when a model has learned as much as it can about the data.

There are three kinds of learning curve you can expect to see:

Bad Learning Curve: High Bias
Bad Learning Curve: High Variance
Ideal Learning Curve

user13320096 · Accepted Answer · 2020-04-16 10:13:28Z

In simple terms, the learning curve is a plot of a metric such as loss or accuracy against the number of instances. The plot shows how learning progresses as experience is gained, hence the name learning curve. Learning curves are widely used in machine learning for algorithms that learn (optimize their internal parameters) incrementally over time, such as deep learning neural networks.

Paritosh Yadav · Accepted Answer · 2018-06-14 05:29:26Z

Example: X = level, y = salary

X    Y
0    2000
2    4000
4    6000
6    8000

Linear regression gives 75% accuracy because it is a straight line; polynomial regression gives 85% accuracy because of the curve.
