
I wrote a vectorized gradient descent implementation of a linear regression model. The dataset looks something like this:

[image: preview of the dataset]

It's not working properly: I am getting a negative R squared error and I don't understand why. Should I decrease alpha or the number of iterations, or is there a problem in my implementation? What should I do?

My regression plot looks like the one below, and I don't know why I am getting such a line.

[image: regression line plotted over the data scatter]

My cost function error plot with respect to the number of gradient descent iterations looks like this:

[image: cost vs. iterations curve]

R Squared Error is: -3.744682246118262
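(For reference, my Error_Metric function computes the usual coefficient of determination,

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

so a negative value means the fit is worse than simply predicting the mean of $y$.)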

My Code Snippet:

    import numpy as np
    from sklearn.model_selection import train_test_split
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    def CostFunction(Theta, DataMatrix):
        # Halved mean squared error, looping over the rows of the data matrix
        Size = DataMatrix.shape[0]
        Error = 0
        for i in range(0, Size):
            Feature = np.vstack(([1], np.array(DataMatrix[i][:-1]).reshape(-1, 1)))
            Error += (np.transpose(Theta).dot(Feature) - DataMatrix[i][-1]) ** 2
        return (1 / (2 * Size)) * Error

    def GradientDescent(Theta, Alpha, DataMatrix, Iterations):
        Progress = []
        Iterate = 0
        Size = DataMatrix.shape[0]
        Error = np.zeros((DataMatrix.shape[1], 1))
        while Iterations:
            for i in range(0, Size):
                # Last entry of each row is the label, that's why it is sliced off
                Feature = np.vstack(([1], np.array(DataMatrix[i][:-1]).reshape(-1, 1)))
                Error += (np.transpose(Theta).dot(Feature) - DataMatrix[i][-1]) * Feature
            Theta -= Alpha * (1 / Size) * Error
            if Iterations % 10 == 0:
                Progress.append([Iterate, CostFunction(Theta, DataMatrix)])
                Iterate += 10
            Iterations -= 1
        return [Theta, Progress]

    def ProgressCurve(Progress):
        Progress = [[i[0], i[1].ravel()[0]] for i in Progress]
        sns.lineplot(x=np.array(Progress)[:, 0], y=np.array(Progress)[:, 1], marker='*')
        plt.show()

    def Prediction(Theta, Test):
        Predicted = []
        for i in range(0, Test.size):
            Feature = np.vstack(([1], np.array(Test[i]).reshape(-1, 1)))
            Predicted.append(np.transpose(Theta).dot(Feature))
        return Predicted

    def Error_Metric(Actual, Predicted):
        # R squared: 1 - SS_res / SS_tot
        Actual = np.array(Actual, dtype='float64').reshape(-1, 1)
        Predicted = np.array(Predicted, dtype='float64').reshape(-1, 1)
        Error = (Actual - Predicted) ** 2
        Variance = (Actual - np.mean(Actual) * np.ones((Actual.shape[0], 1))) ** 2
        return 1 - np.sum(Error) / np.sum(Variance)

    def RegressionLine(X, Y, Orig_X, Orig_Y):
        Y = [i[0].ravel()[0] for i in Y]
        sns.scatterplot(x=Orig_X, y=Orig_Y, color="blue")
        sns.lineplot(x=X, y=Y, color="red")
        plt.show()

    X = 2 * np.random.rand(1000)
    Y = 4 + 3 * X + np.random.randn(1000)
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=0)
    DataFrame = pd.DataFrame()
    DataFrame['X'] = X_Train
    DataFrame['Y'] = Y_Train
    DataMatrix = DataFrame.to_numpy()  # .as_matrix() is removed in newer pandas
    ThetaParams = np.random.randn(2, 1)
    Theta, Progress = GradientDescent(ThetaParams, 0.001, DataMatrix, 50)
    Prediction_Out = Prediction(Theta, np.array(X_Test))
    Error = Error_Metric(Y_Test, Prediction_Out)
    ProgressCurve(Progress)
    RegressionLine(X_Test, Prediction_Out, X, Y)
    print(Error)

1 Answer


It seems the number of iterations is very small compared to the learning rate (50 iterations vs. a learning rate of 0.001). Judging from the fitted line plot, the optimizer hasn't converged.
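For example, keeping everything else in your code the same, you could try a much larger iteration count (the 5000 here is just an illustrative value, not a tuned one):

    Theta, Progress = GradientDescent(ThetaParams, 0.001, DataMatrix, 5000)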

Try increasing the number of iterations; it should help. You can also check this beautiful work and another answer of mine about this problem.
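For comparison, here is a minimal sketch of a fully vectorized batch gradient descent on the same kind of synthetic data. The learning rate of 0.05 and the 5000 iterations are values I picked for illustration, not values from your code; with enough iterations it recovers parameters close to the true (4, 3):

    import numpy as np

    # Same synthetic data as in the question
    rng = np.random.default_rng(0)
    X = 2 * rng.random(1000)
    Y = 4 + 3 * X + rng.standard_normal(1000)

    A = np.c_[np.ones_like(X), X]    # design matrix with a bias column, shape (n, 2)
    theta = np.zeros(2)              # [intercept, slope]
    alpha, iterations = 0.05, 5000   # assumed values, chosen for illustration

    for _ in range(iterations):
        residual = A @ theta - Y         # prediction error, shape (n,)
        grad = A.T @ residual / len(Y)   # average gradient, recomputed from scratch each step
        theta -= alpha * grad

    print(theta)  # should end up near [4, 3]

With your settings (alpha = 0.001, 50 iterations) the same loop barely moves from its starting point, which is consistent with the fitted line in your plot.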

