Here I am going to fit a linear regression between the variables var and res, where both are simulated data. I first do the regression with the lm function, R's convenient built-in linear regression routine, and then compare it with a regression implemented from scratch using the LMS (least mean squares) learning algorithm.
library(ggplot2)
Here, I define the functions that I need.
# least-squares cost function
cost <- function(X, y, theta) {
  # computes the cost of using theta as the parameters for linear regression
  # to fit the data points in X and y
  sum((X %*% theta - y)^2) / (2 * length(y))
}

# gradient of the cost function with respect to theta
delta <- function(x, y, theta) {
  error <- (x %*% theta - y)
  delta <- t(x) %*% error / length(y)
  return(delta)
}

# single gradient descent update step
gradescent <- function(x, y, theta, alpha) {
  # moves theta one step of size alpha against the gradient of the cost
  theta <- theta - alpha * delta(x, y, theta)
  return(theta)
}
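Before using these helpers, a quick sanity check is reassuring. The snippet below is my own sketch on made-up toy data (not part of the analysis): for a noiseless line, the cost and the gradient evaluated at the true parameters should both be zero.
# sanity check on toy data: for an exact line y = 2 + 3x,
# cost and gradient at the true theta are zero
x_toy <- cbind(1, 1:5)                 # tiny design matrix with intercept column
y_toy <- 2 + 3 * (1:5)                 # noiseless linear response
theta_toy <- matrix(c(2, 3), nrow = 2) # the true parameters
cost(x_toy, y_toy, theta_toy)          # should be 0
delta(x_toy, y_toy, theta_toy)         # should be a column of zeros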
I start with the lm function.
data_size <- 1000 # number of samples
var <- runif(data_size, 1, 10) # var is a random variable drawn from a uniform distribution
res <- var + rnorm(data_size) + 5 # res is a linear function of var plus Gaussian noise
data <- data.frame(var, res)
fitted_model <- lm( res ~ var ) # y=theta0+theta1*x+epsilon
theta_lm <- coef(fitted_model) # least-squares parameter estimates
print(theta_lm) #print the parameters
## (Intercept) var
## 4.944 1.016
summary( fitted_model )$r.squared # R^2 coefficient of regression
## [1] 0.8789
mean(residuals( fitted_model )^2) # RSS (sum of squared residuals) divided by the sample size, i.e. the mean squared residual
## [1] 0.9774
# scatter plot of the response variable vs var, overlaid with the estimated regression line
print(ggplot(data, aes(var, res)) + geom_point() +
  geom_abline(intercept = theta_lm[1], slope = theta_lm[2], size = 2, color = "red"))
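As a cross-check (my own addition, not part of the original write-up), the same coefficients can be obtained in closed form from the normal equations theta = (X'X)^-1 X'y. lm itself uses a QR decomposition, which is numerically more stable, but both solve the same least-squares problem, so the estimates should agree.
X <- cbind(1, var)                              # design matrix with an intercept column
theta_normal <- solve(t(X) %*% X, t(X) %*% res) # closed-form least-squares estimate
print(theta_normal)                             # should match coef(fitted_model)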
Now I want to implement linear regression from scratch using the LMS (least mean squares) learning algorithm. In this algorithm, a search procedure (gradient descent) is used to find the optimal parameters of the regression, i.e. the parameters that minimize the cost function. At each iteration the parameters are moved a small step against the gradient of the cost, theta <- theta - alpha * delta(X, y, theta), exactly as implemented in the gradescent function above.
num_iters <- 1200 # number of gradient descent iterations
Var <- cbind(1, matrix(var)) # a column of 1s is added for the intercept coefficient
cost_history <- double(num_iters) # to keep the cost at each iteration
theta_history <- vector("list", num_iters) # to keep the parameters at each iteration
theta_grad <- matrix(c(0, 0), nrow = 2) # initialize the parameters to zero
alpha <- 0.05 # set the learning rate
for (i in 1:num_iters) {
  theta_grad <- gradescent(Var, res, theta_grad, alpha) # one gradient descent step
  cost_history[i] <- cost(Var, res, theta_grad)         # record the cost
  theta_history[[i]] <- theta_grad                      # record the parameters
}
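Two quick diagnostics can be run on this history (my own sketch; the tolerance below is an arbitrary choice): the recorded cost should essentially never increase, and we can look for the first iteration at which it stops changing appreciably.
all(diff(cost_history) <= 1e-12)        # cost should not increase (tiny tolerance for floating-point noise)
tol <- 1e-6                             # assumed convergence tolerance
which(abs(diff(cost_history)) < tol)[1] # first iteration where the cost changes by less than tol (NA if never)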
plot(res ~ var) # scatter plot of the data (var and res)
# overlay the fitted line at selected iterations to show how the fit converges
for (i in c(1, 3, 6, 10, 14, seq(20, num_iters, by = 10))) {
  abline(coef = theta_history[[i]], col = rgb(0.8, 0, 0, 0.3))
}
print(theta_grad)
## [,1]
## [1,] 4.944
## [2,] 1.016
plot(cost_history[1:100], type = "l", col = "blue", lwd = 2, main = "Cost function",
     ylab = "cost", xlab = "Iterations")
As shown above, after updating the parameters with gradient descent for 1200 iterations, the model converges to the one that lm in R had produced.
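How quickly this convergence happens is controlled by the learning rate alpha. As a small illustration (my own sketch; the rates below are arbitrary choices), rerunning the same loop with different rates shows that smaller rates need many more iterations to reach the same cost, while a rate that is too large can make the cost blow up instead of shrinking.
run_lms <- function(alpha, iters = num_iters) {
  theta <- matrix(c(0, 0), nrow = 2) # start from the same zero initialization
  for (i in 1:iters) theta <- gradescent(Var, res, theta, alpha)
  cost(Var, res, theta) # final cost after iters updates
}
sapply(c(0.001, 0.01, 0.05), run_lms) # final cost for a few learning rates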