Multilinear regression implemented in R. The code comes from the Machine Learning course by Andrew Ng on Coursera. Initially (as part of the weekly exercises) it was written in Matlab/Octave; this is a transposition into R.
Note: before using these functions, a column of 1s should be added to X in order to deal with Theta0.
Note: Theta should be a vector, X a matrix, and y a vector.
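A minimal sketch of that preparation, using mtcars as a stand-in dataset with the target in its first column:
df <- mtcars                      # placeholder dataset, target in column 1
y <- df[, 1]                      # target as a numeric vector
X <- as.matrix(df[, -1])          # features as a matrix
X <- cbind(1, X)                  # prepend a column of 1s for Theta0
Theta <- rep(0, ncol(X))          # initial parameter vector, one entry per column of X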
Cost_function <- function(X, y, Theta, Lambda = 0){
  m <- dim(X)[1]                        # number of training examples
  assign("m", m, .GlobalEnv)            # also stored in the global env for convenience
  Hypothesis <- X %*% Theta             # predictions h(x) = X * Theta
  assign("Hypothesis", Hypothesis, .GlobalEnv)
  J <- (1 / (2 * m)) * t(Hypothesis - y) %*% (Hypothesis - y)
  # regularization term: leave Lambda at 0 if not needed; the intercept Theta[1]
  # is not regularized
  Reg <- (Lambda / (2 * m)) * sum(Theta[2:length(Theta)]^2)
  J <- J + Reg
  return(J)
}
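A quick sanity check (a sketch, not part of the course exercise): with Theta set to zero the hypothesis is zero everywhere, so J reduces to the mean of the squared targets divided by two.
X_toy <- cbind(1, 1:4)                 # intercept column plus one feature
y_toy <- c(2, 4, 6, 8)
Cost_function(X_toy, y_toy, c(0, 0))   # expected: mean(y_toy^2) / 2 = 15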
Methods: if the features are on the same scale, gradient descent converges more quickly, so this normalization is meant to be used together with gradient descent. Feature scaling and mean normalization are both implemented below in a single function.
Implementation note: when normalizing the features, it is important to store the values used for normalization, i.e. the mean of each feature and the scaling factor (here the range, max minus min) used for the computation.
After learning the parameters of the model, we often want to predict the prices of houses we have not seen before. Given a new x value (living room area and number of bedrooms in the course exercise), we must first normalize x using the mean and scaling factor previously computed from the training set; see the usage sketch after the function below.
FeatScal_MeanNorm <- function(X){
  X_norm <- X
  mu <- vector(mode = "numeric", length = dim(X)[2])
  sigma <- vector(mode = "numeric", length = dim(X)[2])
  for(i in 1:dim(X)[2]){
    avg <- mean(X[, i])
    S <- max(X[, i]) - min(X[, i])       # scaling factor: the range of the feature
    X_norm[, i] <- (X[, i] - avg) / S    # mean normalization + feature scaling
    # store the mean and the scaling factor for later use on new examples
    mu[i] <- avg
    sigma[i] <- S
  }
  assign("mu", mu, .GlobalEnv)
  assign("sigma", sigma, .GlobalEnv)
  assign("X_norm", X_norm, .GlobalEnv)
  return(X_norm)
}
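A usage sketch, assuming the mtcars features used later in this post: normalize the training features, then reuse the stored mu and sigma on a new, unseen example before predicting.
X_raw <- as.matrix(mtcars[, 2:7])
X_norm <- FeatScal_MeanNorm(X_raw)     # also stores mu and sigma in the global env
x_new <- X_raw[1, ]                    # stand-in for an unseen example
x_new_norm <- (x_new - mu) / sigma     # same mu and sigma as the training set
# a prediction would then be c(1, x_new_norm) %*% Theta for a learned Theta,
# the leading 1 matching the column of 1s added to X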
If J increases from one iteration to the next, Alpha needs to be lowered (for example to 0.0001); see the small sketch after the function below for trying several values.
Gradient_descent <- function(X, y, Theta, Alpha, nb_iteration, Lambda = 0){
  m <- dim(X)[1]                       # number of training examples
  Theta_grad <- Theta
  J <- Cost_function(X, y, Theta_grad, Lambda)
  df_learning <- data.frame(Iteration = integer(), J = numeric())  # J per iteration, for the learning-rate plot
  for (i in 1:nb_iteration){
    # Theta_grad <- Theta_grad - (Alpha / m) * (t(X) %*% ((X %*% Theta_grad) - y))  # vectorized update without regularization
    # Gradient term, fully vectorized: the summation inside the gradient is
    # computed as the product of the transpose of X and the prediction errors;
    # row i of t(X) holds feature i for every example, so the matrix product
    # carries out the sum over the examples.
    # The intercept is not regularized, so the first element of a copy of
    # Theta_grad is set to 0 in the regularization term.
    pred <- X %*% Theta_grad
    grad <- ((t(X) %*% (pred - y)) / m) +
      ((Lambda / m) * c(0, Theta_grad[2:length(Theta_grad)]))
    Theta_grad <- Theta_grad - Alpha * grad
    J <- Cost_function(X, y, Theta_grad, Lambda)
    df_learning[i, ] <- c(i, J)
  }
  print(paste("local min J", J))
  assign("df_learning", df_learning, .GlobalEnv)  # saved for plotting the learning curve
  return(Theta_grad)
}
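A small sketch for picking Alpha (assuming X, with its column of 1s, and y are already defined): run a few candidate values and keep the final cost of each run; an Alpha for which J blows up is too large.
alphas <- c(0.1, 0.01, 0.001, 0.0001)
final_J <- sapply(alphas, function(a){
  Gradient_descent(X, y, rep(0, ncol(X)), a, 400)   # one run per candidate Alpha
  tail(df_learning$J, 1)                            # df_learning is written by Gradient_descent
})
data.frame(Alpha = alphas, Final_J = final_J)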
The objective of regularization is to reduce overfitting.
The normal equation does not need feature scaling!
Norm_equ <- function(X, y, Lambda = 0){
  library(MASS)                        # for ginv(), the Moore-Penrose pseudo-inverse
  # regularization matrix: identity with a 0 in the first position so that the
  # intercept term is not regularized
  n <- dim(X)[2]
  Mat <- diag(n)
  Mat[1, 1] <- 0
  if (Lambda == 0) {
    Theta_grad <- ginv(t(X) %*% X) %*% t(X) %*% y
  } else {
    Theta_grad <- ginv(t(X) %*% X + Lambda * Mat) %*% t(X) %*% y
  }
  assign("Theta_grad", Theta_grad, .GlobalEnv)
  return(Theta_grad)
}
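As a sanity check (a sketch, not from the course material): when X has full column rank, the unregularized normal equation coincides with the ordinary least-squares solution, so it should match R's built-in lm() fitted without an intercept (the example below uses X without a column of 1s).
X <- as.matrix(mtcars[, 2:7])
Y <- mtcars[, 1]
cbind(Norm_equ(X, Y, 0), coef(lm(Y ~ X - 1)))   # the two columns should agree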
We will use the mtcars dataset for this example, with mpg as our dependent variable.
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
X <- as.matrix(mtcars[, 2:7])
Y <- mtcars[, 1]
Gradient_descent(X, Y, c(0,0,0,0,0,0), 0.00001, 800)
## [1] "local min J 29.1175717142582"
## [,1]
## cyl 0.08612188
## disp -0.02429728
## hp 0.06951919
## drat 0.13716588
## wt 0.04651859
## qsec 0.64248892
plot(df_learning$Iteration, df_learning$J,
     main = "Learning rate",
     xlab = "Nb iterations",
     ylab = "J cost")
Gradient_descent(X, Y, c(0,0,0,0,0,0), 0.00001, 800, 50)
## [1] "local min J 29.6709653789403"
## [,1]
## cyl 0.08625318
## disp -0.02418372
## hp 0.06969790
## drat 0.13640546
## wt 0.04626597
## qsec 0.63893032
plot(df_learning$Iteration, df_learning$J,
     main = "Learning rate",
     xlab = "Nb iterations",
     ylab = "J cost")
Norm_equ(X, Y, 0) # without regularization
## [,1]
## [1,] 0.14804477
## [2,] 0.01271651
## [3,] -0.01063254
## [4,] 3.26954359
## [5,] -4.75444330
## [6,] 1.19404896
Cost_function(X, Y, Theta_grad, 0) # cost J of the unregularized solution
## [,1]
## [1,] 2.884699
Norm_equ(X, Y, 5) # with regularization
## [,1]
## [1,] 0.252642608
## [2,] -0.009190186
## [3,] -0.005295806
## [4,] 2.382375503
## [5,] -2.479987066
## [6,] 1.163583137
Cost_function(X, Y, Theta_grad, 0) # cost J of the regularized solution
## [,1]
## [1,] 3.493251