Regularization

library(glmnet)
set.seed(42)

Preparing the Data for Model Prediction

n <- 1000
p <- 5000

real_p <- 15

# Create a matrix of random values with 1000 rows (observations) and 5000 columns (parameters).
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

## Create the response as the sum of the first 15 columns plus random noise.

y <- apply(x[, 1:real_p], 1, sum) + rnorm(n)

## y is the value to be predicted using the 5000 feature parameters.
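
As a quick sanity check, we can confirm the dimensions of x and verify that y closely tracks the sum of its first 15 columns:

dim(x)                            ## should be 1000 x 5000
cor(y, rowSums(x[, 1:real_p]))    ## close to 1, since y is this sum plus noise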

Creating Training and Test Sets

train_rows <- sample(1:n, .66*n)
x.train <- x[train_rows, ]
x.test <- x[-train_rows, ]

y.train <- y[train_rows]
y.test <- y[-train_rows]
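
A quick look at the dimensions confirms the roughly two-thirds / one-third split:

dim(x.train)   ## 660 rows for training
dim(x.test)    ## 340 rows for testing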

Ridge Regression

alpha0.fit <- cv.glmnet(x.train, y.train, type.measure = "mse", alpha = 0, family = "gaussian")


alpha0.predicted <- predict(alpha0.fit, s = alpha0.fit$lambda.1se, newx = x.test)


ss1 <- mean((y.test - alpha0.predicted)^2)   ## mean squared error on the test set
coef0 <- sum(coef(alpha0.fit) != 0)          ## number of non-zero coefficients (including the intercept)

The mean squared error for Ridge regression is 14.884588. Ridge regression does not eliminate any parameters: all 5001 coefficients (the intercept plus the 5000 features) remain non-zero.
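
One way to see this is to inspect the fitted ridge coefficients directly (a small check on the model above): they are shrunk towards zero but never set exactly to zero.

ridge.coefs <- as.matrix(coef(alpha0.fit, s = "lambda.1se"))
summary(abs(ridge.coefs[-1, 1]))   ## magnitudes of the 5000 feature coefficients: small, but non-zero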

Lasso Regression

alpha1.fit <- cv.glmnet(x.train, y.train,
                        type.measure = "mse", alpha = 1, family = "gaussian")


alpha1.predicted <- predict(alpha1.fit, s = alpha1.fit$lambda.1se, newx = x.test)


ss2 <- mean((y.test - alpha1.predicted)^2)
coef1 <- sum(coef(alpha1.fit) != 0)

The mean squared error for Lasso regression is 1.1847005, and the number of non-zero parameters is reduced to 31.
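
We can also check which parameters the Lasso kept; since y was built from the first 15 columns of x, the surviving coefficients should mostly correspond to those columns (plus the intercept):

lasso.coefs <- as.matrix(coef(alpha1.fit, s = "lambda.1se"))
rownames(lasso.coefs)[lasso.coefs[, 1] != 0]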

Elastic-Net Regression

alpha0.5.fit <- cv.glmnet(x.train, y.train,
                          type.measure = "mse", alpha = 0.5, family = "gaussian")


alpha0.5.predicted <- predict(alpha0.5.fit, s = alpha0.5.fit$lambda.1se, newx = x.test)


ss3 <- mean((y.test - alpha0.5.predicted)^2)
coef3 <- sum(coef(alpha0.5.fit) != 0)

The mean squared error for Elastic-net regression is 1.2379696, and the number of non-zero parameters is reduced to 55.
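
Putting the three fits side by side, using the MSE values and parameter counts stored above:

data.frame(model = c("Ridge (alpha = 0)", "Lasso (alpha = 1)", "Elastic-net (alpha = 0.5)"),
           mse = c(ss1, ss2, ss3),
           parameters = c(coef0, coef1, coef3))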

A Little Experiment with Different Values of alpha

list.of.fits <- list()

## Fit a cv.glmnet model for each value of alpha from 0 to 1 in steps of 0.1.
for (i in 0:10) {
  fit.name <- paste0("alpha", i/10)
  
  list.of.fits[[fit.name]] <- cv.glmnet(x.train, y.train, alpha = i/10, family = "gaussian")
}
results <- data.frame()

for (i in 0:10){
  fit.name <- paste0("alpha", i/10)
  
  ## Use each model to predict 'y' given the Testing dataset
  predicted <- 
    predict(list.of.fits[[fit.name]], 
      s=list.of.fits[[fit.name]]$lambda.1se, newx=x.test)
  
  ## Calculate the Mean Squared Error...
  mse <- mean((y.test - predicted)^2)
  
  parameters <- sum(coef(list.of.fits[[fit.name]]) != 0)
  
  ## Store the results
  temp <- data.frame(alpha=i/10, mse=mse, fit.name=fit.name, parameters = parameters)
  results <- rbind(results, temp)
}
results
##    alpha       mse fit.name parameters
## 1    0.0 14.918840   alpha0       5001
## 2    0.1  2.256924 alpha0.1        255
## 3    0.2  1.472927 alpha0.2        163
## 4    0.3  1.362394 alpha0.3         81
## 5    0.4  1.259794 alpha0.4         82
## 6    0.5  1.252103 alpha0.5         45
## 7    0.6  1.253330 alpha0.6         31
## 8    0.7  1.212927 alpha0.7         33
## 9    0.8  1.184028 alpha0.8         37
## 10   0.9  1.182919 alpha0.9         33
## 11   1.0  1.184701   alpha1         31

The mean squared error is lowest for alpha values close to 1 (alpha = 0.9 and alpha = 1 are nearly identical), which suggests that Lasso regression is the best model for this data.

Notice also that the number of non-zero parameters generally decreases as alpha increases.
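
As a simple visual summary, we can plot both quantities from the results data frame against alpha using base R graphics:

plot(results$alpha, results$mse, type = "b",
     xlab = "alpha", ylab = "Test MSE")
plot(results$alpha, results$parameters, type = "b",
     xlab = "alpha", ylab = "Number of non-zero parameters")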