DA 5230 – Statistical & Machine Learning Lecture 5 – Logistic Regression Maninda Edirisooriya manindaw@uom.lk
Classification
• When the Y variable of a Supervised Learning problem takes one of several discrete classes (e.g.: Color, Age group) the problem is known as a Classification problem
• A Classification problem has to predict/select a certain Category (or Class) as the dependent variable
• When there are only 2 classes to be classified, it is known as a Binary Classification problem
E.g.: Predicting a person's gender (either male or female) from testosterone concentration in blood, height and bone density
Binary Classification
• Output classes of a binary classification can be represented by either
  • Boolean values, True or False (or Positive or Negative)
  • Numbers 1 or 0
• True or 1 is used for the Positive Class, which is generally the class we want to analyze
• False or 0 is used for the Negative Class, i.e. the other class
• E.g.: For classifying a tumor as malignant (a cancer) or benign (not a cancer) by the tumor size, malignant can be taken as the Positive class and benign as the Negative class
Binary Classification - Example
[Figure: scatter plot of tumor size (X) against the class label Y, where Y = 0 (Benign) or 1 (Malignant)]
Binary Classification – with Linear Regression
[Figure: the same data fitted with a Linear Regression classifier; points with predicted value ≥ 0.5 are labelled Malignant, points below 0.5 are labelled Benign]
Binary Classification – Problem with LR
[Figure: with the 0.5 threshold, the Linear Regression classifier misclassifies some points]
Binary Classification – Requirement
[Figure: the required regression classifier is a variant of the Unit Step Function rather than the Linear Regression line]
Binary Classification – Requirement
[Figure: the Unit Step Function variant is not differentiable at the step, so Gradient Descent cannot be applied there]
Binary Classification – Requirement
[Figure: a continuous regression classifier that approximates the step while remaining differentiable]
Logistic/Sigmoid Function
• Sigmoid function: f(z) = 1 / (1 + e^(−z))
• z = 0 ⇒ f(z) = 0.5, and 0 < f(z) < 1 for all z
• A Non-linear function
• This is a continuous alternative for the Unit Step Function
[Figure: plot of f(z) against z showing the S-shaped sigmoid curve]
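A minimal NumPy sketch (not from the slides) illustrating the sigmoid function and the properties listed above:

```python
import numpy as np

def sigmoid(z):
    """Logistic/sigmoid function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))  # ~[0.0067, 0.5, 0.9933]; sigmoid(0) = 0.5 as stated
```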
Logistic Regression
Like Linear Regression, say Z = β0 + β1*X1 + β2*X2 + ... + βn*Xn
Applying the Logistic Function, f(Z) = 1 / (1 + e^(−Z)):
f(X) = 1 / (1 + e^(−(β0 + β1*X1 + β2*X2 + ... + βn*Xn)))
In vector form, f(X) = 1 / (1 + e^(−βᵀX)), where the intercept term is written as β0*X0 taking X0 = 1
This is the function of Logistic Regression.
Logistic Regression - Prediction
Let's take predictions as:
  1 (or Positive) if f(X) ≥ 0.5
  0 (or Negative) if f(X) < 0.5
Equivalently:
  Positive ⇒ f(X) ≥ 0.5 ⇒ 1 / (1 + e^(−βᵀX)) ≥ 0.5 ⇒ βᵀX ≥ 0
  Negative ⇒ f(X) < 0.5 ⇒ 1 / (1 + e^(−βᵀX)) < 0.5 ⇒ βᵀX < 0
Here, βᵀX = β0 + β1*X1 + β2*X2 + ... + βn*Xn
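A short sketch of this prediction rule; the parameter values and feature vectors below are illustrative, not from the slides:

```python
import numpy as np

def predict(beta, X, threshold=0.5):
    """Logistic regression prediction: 1 (Positive) if f(X) >= threshold, else 0."""
    z = X @ beta                      # beta^T X, with X0 = 1 prepended for the intercept
    prob = 1.0 / (1.0 + np.exp(-z))   # f(X), the predicted probability of the Positive class
    return (prob >= threshold).astype(int), prob

beta = np.array([-3.0, 0.8])                 # illustrative parameters [beta0, beta1]
X = np.array([[1.0, 2.0], [1.0, 5.0]])       # two data points with X0 = 1 prepended
labels, probs = predict(beta, X)
print(labels, probs)  # second point has beta^T X = 1 > 0, so it is classified Positive
```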
Prediction Example
Take a classification problem with 2 independent variables where
f(X) = 1 / (1 + e^(−(β0 + β1*X1 + β2*X2)))
Decision boundary: Z = β0 + β1*X1 + β2*X2 = 0
  Z > 0 ⇒ Positive, Z < 0 ⇒ Negative
[Figure: X1–X2 plane split by the linear decision boundary into Positive and Negative regions]
Non-linear Classification
Taking polynomials of the X values (as discussed in Polynomial Regression) can classify non-linear data points with non-linear decision boundaries
E.g.: f(X) = 1 / (1 + e^(−(β0 + β1*X1² + β2*X2²)))
Decision boundary: Z = β0 + β1*X1² + β2*X2² = 0
  Z > 0 ⇒ Positive, Z < 0 ⇒ Negative
[Figure: X1–X2 plane split by a curved decision boundary into Positive and Negative regions]
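A brief sketch (with illustrative parameter values, not from the slides) showing how squared features give a curved decision boundary: with β = [−4, 1, 1], Z = 0 is the circle X1² + X2² = 4.

```python
import numpy as np

def predict_nonlinear(beta, X1, X2):
    """Classify using squared features: Z = beta0 + beta1*X1^2 + beta2*X2^2."""
    z = beta[0] + beta[1] * X1**2 + beta[2] * X2**2
    return (z > 0).astype(int)  # 1 = Positive (outside the circle), 0 = Negative

beta = np.array([-4.0, 1.0, 1.0])       # illustrative: boundary is X1^2 + X2^2 = 4
X1 = np.array([0.0, 1.0, 3.0])
X2 = np.array([0.0, 1.0, 0.0])
print(predict_nonlinear(beta, X1, X2))  # [0 0 1]: only the point (3, 0) lies outside
```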
Binary Logistic Regression – Cost Function
Cost for a single data point is known as the Loss
Take the Loss Function of Logistic Regression as L(f(X), Y):
  L(f(X), Y) = −log(f(X))       if Y = 1
  L(f(X), Y) = −log(1 − f(X))   if Y = 0
Combined: L(f(X), Y) = −Y log(f(X)) − (1 − Y) log(1 − f(X))
Cost function: J(β) = (1/n) Σ_{i=1}^{n} L(f(xᵢ), Yᵢ)
  J(β) = (1/n) Σ_{i=1}^{n} [−Yᵢ log(f(xᵢ)) − (1 − Yᵢ) log(1 − f(xᵢ))]
This Cost Function is Convex (has a Global Minimum)
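A minimal sketch (illustrative data, not from the slides) of the binary cross-entropy cost J(β) defined above:

```python
import numpy as np

def cost(beta, X, Y):
    """Binary cross-entropy cost J(beta), averaged over n data points."""
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))           # f(x_i) for each data point
    loss = -Y * np.log(prob) - (1 - Y) * np.log(1 - prob)
    return loss.mean()

# Illustrative data: X has X0 = 1 prepended, Y holds the 0/1 labels
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0]])
Y = np.array([0, 0, 1])
print(cost(np.array([-3.0, 1.0]), X, Y))  # ~0.25; lower cost means a better fit to these labels
```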
Multiclass Logistic Regression
• Up to now we have looked at Binary Classification problems where there can be only two outcomes/categories/classes as the Y variable
• When there are more than 2 classes available (only one of them is positive for any given data point) the problem becomes a Multiclass Classification problem
• One way to handle Multiclass Classification is with Binary Classifiers, known as One-vs-All (OvA), also called One-vs-Rest (OvR); a sketch follows below
• It trains one binary classifier per class, each predicting the confidence (probability) of that class against the rest, and the class with the highest confidence is selected
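A minimal One-vs-All sketch (illustrative, not the lecture's code): one binary logistic classifier is fitted per class with plain gradient descent, and prediction picks the class with the highest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ova(X, Y, num_classes, lr=0.1, steps=2000):
    """One-vs-All: fit one binary logistic classifier per class with gradient descent."""
    n, d = X.shape
    betas = np.zeros((num_classes, d))
    for k in range(num_classes):
        yk = (Y == k).astype(float)                 # class k against the rest
        for _ in range(steps):
            grad = X.T @ (sigmoid(X @ betas[k]) - yk) / n
            betas[k] -= lr * grad
    return betas

def predict_ova(betas, X):
    """Pick the class whose binary classifier gives the highest probability."""
    probs = sigmoid(X @ betas.T)                    # shape (n, K)
    return probs.argmax(axis=1)

# Illustrative data: X0 = 1 prepended, three well-separated classes
X = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 1.0],
              [1.0, 4.0, 0.0], [1.0, 4.0, 1.0],
              [1.0, 2.0, 4.0], [1.0, 2.0, 5.0]])
Y = np.array([0, 0, 1, 1, 2, 2])
betas = train_ova(X, Y, num_classes=3)
print(predict_ova(betas, X))                        # expected to recover [0 0 1 1 2 2]
```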
Multiclass Logistic Regression
• OvA can be used
  • When you want to use different binary classifiers (e.g., SVMs or logistic regression) for each class
  • When available memory is limited or when training needs to be highly parallelized
• There is another technique for Multiclass Logistic Regression: simply generalizing the binary Logistic Regression classifier
• This general form of classifier is known as the Softmax Classifier
• There, the Softmax Function is used instead of the Sigmoid function when there are multiple classes
Softmax Function
• The name Softmax is used as it is a continuous function approximation to the Maximum Function, where only one class (the maximum) is considered Positive
• The Softmax function is used instead of the Maximum Function to make the function differentiable
• Softmax Function: S(x)ᵢ = e^(xᵢ) / Σ_{j=1}^{n} e^(xⱼ)
  where i is the index of a component of the input vector x and j runs over all n components
Softmax Function
• The Softmax function exponentially highlights the dimension where the value is maximum, while suppressing all other dimensions
• The output values of a Softmax function sum to 1
• E.g.: [Figure: an input vector passed through the Softmax Function to produce an output vector]
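A quick numeric illustration (values computed here, not taken from the slide's figure):

```python
import numpy as np

def softmax(x):
    """Softmax: exponentiate each component and normalize so the outputs sum to 1."""
    e = np.exp(x - x.max())    # subtract max for numerical stability; result unchanged
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
print(softmax(x))              # ~[0.090, 0.245, 0.665]: the largest input dominates
print(softmax(x).sum())        # 1.0
```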
Softmax Regression
• Like Z = βᵀX is used for binary classification, Zₖ = βₖᵀX is used for Multiclass classification, where k is the index of the class
• Note that there are K β vectors as model parameters
• Like Y is used for binary classification where there is only a single dependent variable, Multiclass classification has K dependent variables, each denoted by Yₖ with estimator Ŷₖ
• Ŷₖ = e^(Zₖ) / Σ_{j=1}^{K} e^(Zⱼ)
Softmax Regression
Loss function: L(f(X), Y) = −log(Ŷₖ) = −log(e^(Zₖ) / Σ_{j=1}^{K} e^(Zⱼ)) = −log(e^(βₖᵀX) / Σ_{j=1}^{K} e^(βⱼᵀX)), where k is the true class of the data point
Cost function (Cross Entropy Loss): J(β) = −Σ_{i=1}^{N} Σ_{k=1}^{K} I[Yᵢ = k] log(e^(βₖᵀXᵢ) / Σ_{j=1}^{K} e^(βⱼᵀXᵢ))
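A compact sketch (illustrative data, not the lecture's code) of Ŷ and the cross-entropy cost J(β) above:

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=1, keepdims=True))   # row-wise softmax over the K classes
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_cost(betas, X, Y, num_classes):
    """J(beta) = -sum_i sum_k I[Y_i = k] * log(Yhat_ik)."""
    Z = X @ betas.T                                 # Z_ik = beta_k^T x_i, shape (N, K)
    Yhat = softmax(Z)                               # estimated class probabilities
    onehot = np.eye(num_classes)[Y]                 # I[Y_i = k] as a one-hot matrix
    return -(onehot * np.log(Yhat)).sum()

# Illustrative data: X0 = 1 prepended, three classes, three parameter vectors
X = np.array([[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]])
Y = np.array([0, 1, 2])
betas = np.zeros((3, 2))                            # with all-zero betas, Yhat = 1/3 everywhere
print(cross_entropy_cost(betas, X, Y, 3))           # = 3 * -log(1/3) ≈ 3.296
```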
One Hour Homework
• Officially we have one more hour to spend after the end of the lectures
• Therefore, for this week's extra hour you have a homework
• Logistic Regression is the basic building block of Deep Neural Networks (DNNs); Softmax classifiers are used as-is in DNNs as the final classification layer
• Go through the slides and get a clear understanding of Logistic and Softmax Regression
• Refer to external sources to clarify any remaining ambiguities
• Good Luck!
Questions?
