Here I am going to fit a linear regression between the variables var and res, where both are simulated data. I first do the regression with the lm function, R's convenient built-in linear regression routine, and then compare it with a regression implemented from scratch using the LMS (least mean squares) learning algorithm.
library(ggplot2)
Here, I define the functions that I need.
# least-squares cost function
cost <- function(X, y, theta) {
  # computes the cost of using theta as the parameters for linear regression
  # to fit the data points in X and y
  sum((X %*% theta - y)^2) / (2 * length(y))
}

# gradient of the cost function with respect to theta
delta <- function(x, y, theta) {
  error <- (x %*% theta - y)
  delta <- t(x) %*% error / length(y)
  return(delta)
}

# single gradient descent update step
gradescent <- function(x, y, theta, alpha) {
  # moves theta one step of size alpha against the gradient of the cost
  theta <- theta - alpha * delta(x, y, theta)
  return(theta)
}
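Before using these helpers, a quick sanity check is reassuring. The snippet below is my own sketch on made-up toy data (not part of the analysis): for a noiseless line, the cost and the gradient evaluated at the true parameters should both be zero.
# sanity check on toy data: for an exact line y = 2 + 3x,
# cost and gradient at the true theta are zero
x_toy <- cbind(1, 1:5)                 # tiny design matrix with intercept column
y_toy <- 2 + 3 * (1:5)                 # noiseless linear response
theta_toy <- matrix(c(2, 3), nrow = 2) # the true parameters
cost(x_toy, y_toy, theta_toy)          # should be 0
delta(x_toy, y_toy, theta_toy)         # should be a column of zeros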
I start with the lm function.
data_size <- 1000 # number of samples
var <- runif(data_size, 1, 10) # var is a random variable drawn from a uniform distribution
res <- var + rnorm(data_size) + 5 # res is a linear function of var plus Gaussian noise
data <- data.frame(var, res)
fitted_model <- lm( res ~ var ) # y=theta0+theta1*x+epsilon
theta_lm <- coef(fitted_model) # least-squares parameter estimates
print(theta_lm) #print the parameters
## (Intercept) var
## 4.944 1.016
summary( fitted_model )$r.squared # R^2 coefficient of regression
## [1] 0.8789
mean(residuals( fitted_model )^2) # RSS (sum of squared residuals) divided by the sample size, i.e. the mean squared residual
## [1] 0.9774
# scatter plot of the response variable vs var, overlaid with the estimated regression line
print(ggplot(data, aes(var, res)) + geom_point() +
  geom_abline(intercept = theta_lm[1], slope = theta_lm[2], size = 2, color = "red"))
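As a cross-check (my own addition, not part of the original write-up), the same coefficients can be obtained in closed form from the normal equations theta = (X'X)^-1 X'y. lm itself uses a QR decomposition, which is numerically more stable, but both solve the same least-squares problem, so the estimates should agree.
X <- cbind(1, var)                              # design matrix with an intercept column
theta_normal <- solve(t(X) %*% X, t(X) %*% res) # closed-form least-squares estimate
print(theta_normal)                             # should match coef(fitted_model)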
Now I want to implement linear regression from scratch using the LMS (least mean squares) learning algorithm. In this algorithm, a search procedure (gradient descent) is used to find the optimal parameters of the regression, i.e. the parameters that minimize the cost function. At each iteration the parameters are moved a small step against the gradient of the cost, theta <- theta - alpha * delta(X, y, theta), exactly as implemented in the gradescent function above.
num_iters <- 1200 # number of gradient descent iterations
Var <- cbind(1, matrix(var)) # a column of 1s is added for the intercept coefficient
cost_history <- double(num_iters) # to keep the cost at each iteration
theta_history <- vector("list", num_iters) # to keep the parameters at each iteration
theta_grad <- matrix(c(0, 0), nrow = 2) # initialize the parameters to zero
alpha <- 0.05 # set the learning rate
for (i in 1:num_iters) {
  theta_grad <- gradescent(Var, res, theta_grad, alpha) # one gradient descent step
  cost_history[i] <- cost(Var, res, theta_grad)         # record the cost
  theta_history[[i]] <- theta_grad                      # record the parameters
}
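Two quick diagnostics can be run on this history (my own sketch; the tolerance below is an arbitrary choice): the recorded cost should essentially never increase, and we can look for the first iteration at which it stops changing appreciably.
all(diff(cost_history) <= 1e-12)        # cost should not increase (tiny tolerance for floating-point noise)
tol <- 1e-6                             # assumed convergence tolerance
which(abs(diff(cost_history)) < tol)[1] # first iteration where the cost changes by less than tol (NA if never)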
plot(res ~ var) # scatter plot of the data (var and res)
# overlay the fitted line at selected iterations to show how the fit converges
for (i in c(1, 3, 6, 10, 14, seq(20, num_iters, by = 10))) {
  abline(coef = theta_history[[i]], col = rgb(0.8, 0, 0, 0.3))
}
print(theta_grad)
## [,1]
## [1,] 4.944
## [2,] 1.016
plot(cost_history[1:100], type = "l", col = "blue", lwd = 2, main = "Cost function",
     ylab = "cost", xlab = "Iterations")
As shown above, after updating the parameters with gradient descent for 1200 iterations, the model converges to the one that lm in R had produced.
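How quickly this convergence happens is controlled by the learning rate alpha. As a small illustration (my own sketch; the rates below are arbitrary choices), rerunning the same loop with different rates shows that smaller rates need many more iterations to reach the same cost, while a rate that is too large can make the cost blow up instead of shrinking.
run_lms <- function(alpha, iters = num_iters) {
  theta <- matrix(c(0, 0), nrow = 2) # start from the same zero initialization
  for (i in 1:iters) theta <- gradescent(Var, res, theta, alpha)
  cost(Var, res, theta) # final cost after iters updates
}
sapply(c(0.001, 0.01, 0.05), run_lms) # final cost for a few learning rates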