
I wrote a vectorized gradient descent implementation of a linear regression model. The dataset looks something like this:

[image: preview of the dataset]

It's not working properly: I am getting a negative R squared error and I don't understand why. Should I decrease alpha or the number of iterations, or is there a problem in my implementation? What should I do?

My regression plot looks like the one below, and I don't know why I am getting such a line.

[image: regression line plotted over the data scatter]

My cost function error plot with respect to the number of gradient descent iterations looks like this:

[image: cost vs. iterations curve]

R Squared Error is: -3.744682246118262
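(For reference, my Error_Metric function computes the usual coefficient of determination,

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

so a negative value means the fit is worse than simply predicting the mean of $y$.)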

My Code Snippet:

    import numpy as np
    from sklearn.model_selection import train_test_split
    import seaborn as sns
    import matplotlib.pyplot as plt
    import pandas as pd

    def CostFunction(Theta, DataMatrix):
        # Halved mean squared error, looping over the rows of the data matrix
        Size = DataMatrix.shape[0]
        Error = 0
        for i in range(0, Size):
            Feature = np.vstack(([1], np.array(DataMatrix[i][:-1]).reshape(-1, 1)))
            Error += (np.transpose(Theta).dot(Feature) - DataMatrix[i][-1]) ** 2
        return (1 / (2 * Size)) * Error

    def GradientDescent(Theta, Alpha, DataMatrix, Iterations):
        Progress = []
        Iterate = 0
        Size = DataMatrix.shape[0]
        Error = np.zeros((DataMatrix.shape[1], 1))
        while Iterations:
            for i in range(0, Size):
                # Last entry of each row is the label, that's why it is sliced off
                Feature = np.vstack(([1], np.array(DataMatrix[i][:-1]).reshape(-1, 1)))
                Error += (np.transpose(Theta).dot(Feature) - DataMatrix[i][-1]) * Feature
            Theta -= Alpha * (1 / Size) * Error
            if Iterations % 10 == 0:
                Progress.append([Iterate, CostFunction(Theta, DataMatrix)])
                Iterate += 10
            Iterations -= 1
        return [Theta, Progress]

    def ProgressCurve(Progress):
        Progress = [[i[0], i[1].ravel()[0]] for i in Progress]
        sns.lineplot(x=np.array(Progress)[:, 0], y=np.array(Progress)[:, 1], marker='*')
        plt.show()

    def Prediction(Theta, Test):
        Predicted = []
        for i in range(0, Test.size):
            Feature = np.vstack(([1], np.array(Test[i]).reshape(-1, 1)))
            Predicted.append(np.transpose(Theta).dot(Feature))
        return Predicted

    def Error_Metric(Actual, Predicted):
        # R squared: 1 - SS_res / SS_tot
        Actual = np.array(Actual, dtype='float64').reshape(-1, 1)
        Predicted = np.array(Predicted, dtype='float64').reshape(-1, 1)
        Error = (Actual - Predicted) ** 2
        Variance = (Actual - np.mean(Actual) * np.ones((Actual.shape[0], 1))) ** 2
        return 1 - np.sum(Error) / np.sum(Variance)

    def RegressionLine(X, Y, Orig_X, Orig_Y):
        Y = [i[0].ravel()[0] for i in Y]
        sns.scatterplot(x=Orig_X, y=Orig_Y, color="blue")
        sns.lineplot(x=X, y=Y, color="red")
        plt.show()

    X = 2 * np.random.rand(1000)
    Y = 4 + 3 * X + np.random.randn(1000)
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=0)
    DataFrame = pd.DataFrame()
    DataFrame['X'] = X_Train
    DataFrame['Y'] = Y_Train
    DataMatrix = DataFrame.to_numpy()  # .as_matrix() is removed in newer pandas
    ThetaParams = np.random.randn(2, 1)
    Theta, Progress = GradientDescent(ThetaParams, 0.001, DataMatrix, 50)
    Prediction_Out = Prediction(Theta, np.array(X_Test))
    Error = Error_Metric(Y_Test, Prediction_Out)
    ProgressCurve(Progress)
    RegressionLine(X_Test, Prediction_Out, X, Y)
    print(Error)

1 Answer


It seems the number of iterations is very small compared to the learning rate (50 iterations vs. a learning rate of 0.001). Judging from the fitted line plot, the optimizer hasn't converged.
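For example, keeping everything else in your code the same, you could try a much larger iteration count (the 5000 here is just an illustrative value, not a tuned one):

    Theta, Progress = GradientDescent(ThetaParams, 0.001, DataMatrix, 5000)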

Try increasing the number of iterations; it should help. You can also check this beautiful work and another answer of mine about this problem.
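For comparison, here is a minimal sketch of a fully vectorized batch gradient descent on the same kind of synthetic data. The learning rate of 0.05 and the 5000 iterations are values I picked for illustration, not values from your code; with enough iterations it recovers parameters close to the true (4, 3):

    import numpy as np

    # Same synthetic data as in the question
    rng = np.random.default_rng(0)
    X = 2 * rng.random(1000)
    Y = 4 + 3 * X + rng.standard_normal(1000)

    A = np.c_[np.ones_like(X), X]    # design matrix with a bias column, shape (n, 2)
    theta = np.zeros(2)              # [intercept, slope]
    alpha, iterations = 0.05, 5000   # assumed values, chosen for illustration

    for _ in range(iterations):
        residual = A @ theta - Y         # prediction error, shape (n,)
        grad = A.T @ residual / len(Y)   # average gradient, recomputed from scratch each step
        theta -= alpha * grad

    print(theta)  # should end up near [4, 3]

With your settings (alpha = 0.001, 50 iterations) the same loop barely moves from its starting point, which is consistent with the fitted line in your plot.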

