MAT 4376 Project 3 Presentation
Problem Statement
In the Lasso problem in Lagrangian form, we aim to:
\[\text{minimize}_{\beta \in \mathbf{R}^{p}}\{ \frac{1}{2N}||y-X \beta ||^{2}_{2} +\lambda ||\beta||_{1}\}\]
\(\lambda \ge 0\) is a tuning parameter; we want to find the best possible value of \(\lambda\) for our model
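As a brief illustration of the role of \(\lambda\) (a sketch on simulated data, not the project data), glmnet fits the Lasso over a whole grid of \(\lambda\) values, and larger values of \(\lambda\) shrink more coefficients to exactly zero:
library(glmnet)

set.seed(1)
X_sim <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)   # 100 observations, 10 predictors
y_sim <- X_sim[, 1] - 2 * X_sim[, 2] + rnorm(100)         # only 2 truly active predictors

fit <- glmnet(X_sim, y_sim, family = "gaussian", alpha = 1)  # alpha = 1 gives the Lasso
coef(fit, s = 0.05)   # smaller penalty: fewer coefficients shrunk to zero
coef(fit, s = 0.5)    # larger penalty: more coefficients shrunk to exactly zero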
Introduction to the Dataset Selected
The dataset was downloaded from Kaggle (the link can be found on the Reference page)
The dataset has 507 observations and 25 variables: 24 scalar variables and 1 dummy variable
The dataset tracks the body dimensions and characteristics of the respondents, such as sex, age, weight, height, knee diameter in centimetres, and so on
The dummy variable is sex
Introduction to the Dataset Selected
This dataset is split into two samples: one consists of female respondents only, the other of male respondents only
Why?
The female sample is used for development and validation; the male sample is used for testing
If my function works for the first sample, then it should also work for the second sample. Because cv.glmnet involves randomness, it is crucial to validate the results with more confidence, so I also use a Monte Carlo method to compare my outputs with the outputs produced by cv.glmnet
The dependent variable is the respondent's weight in kilograms: \(Y_{\text{Female}}\) is a \(260 \times 1\) column vector and \(Y_{\text{Male}}\) is a \(247 \times 1\) column vector
Denote the design matrix for the female sample as \(X_{\text{Female}}\), a \(260 \times 23\) matrix, and the design matrix for the male sample as \(X_{\text{Male}}\), a \(247 \times 23\) matrix (a sketch of how the two samples can be formed follows)
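A minimal sketch of how the two samples might be formed, assuming the raw data sit in a data frame called body with a weight column (in kg) and a sex dummy coded 0 = female, 1 = male; the actual column names and coding in the Kaggle file may differ:
library(dplyr)

# Hypothetical column names and coding; adjust to the actual Kaggle file.
# body <- read.csv("body_measurements.csv")
female <- body %>% filter(sex == 0)
male   <- body %>% filter(sex == 1)

Y1 <- female$weight                                            # Y_Female: 260 x 1
X1 <- female %>% dplyr::select(-weight, -sex) %>% as.matrix()  # X_Female: 260 x 23
Y2 <- male$weight                                              # Y_Male: 247 x 1
X2 <- male %>% dplyr::select(-weight, -sex) %>% as.matrix()    # X_Male: 247 x 23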
Introduction to the Dataset Selected
Table 1
Cross Validation
1. Divide the dataset \(D\) of size \(n\) into \(m\) disjoint sets \(D_{1}, \cdots, D_{m}\) of size \(n/m\) each.
2. For each \(\lambda \in \Lambda\), evaluate \(\hat{\beta}^{-(h)}_{LASSO}(\lambda)\), the LASSO estimator based on the dataset \(D \setminus D_{h}\), for \(h = 1, \cdots, m\).
Each \(D_{h}\) is treated as a test dataset, while \(D \setminus D_{h}\) is treated as a training dataset.
3. Thus, for each \(\lambda \in \Lambda\) we get \(m\) LASSO estimators, making \(m \times q\) LASSO estimators in total, where \(q = |\Lambda|\).
4. Define the loss function
\[CV(\lambda)=\sum^{m}_{h=1} \sum_{i:(X_{i},Y_{i}) \in D_{h}} \left(Y_{i}-X^{T}_{i} \hat{\beta}^{-(h)}_{LASSO}(\lambda)\right)^{2}\]
5. Choose the \(\lambda \in \Lambda\) that minimizes the loss function.
(In the implementation below, the inner sum is replaced by the mean squared error within each fold and the fold errors are then averaged; since the folds have nearly equal size, this rescaling does not change which \(\lambda\) is selected.)
Implementation
library(glmnet)
library(dplyr)   # needed for mutate(), filter() and the pipe %>%

My_cv_Lasso <- function(X, Y, nfolds) {
  ## Step 1: randomly assign each observation to one of the nfolds folds
  n <- nrow(X)
  store <- data.frame(Y, X) %>%
    mutate(fold = sample(rep(1:nfolds, length.out = n)))

  ## Step 2: take the candidate lambda sequence from a full glmnet fit
  obj_glmnet <- glmnet(X, Y, family = "gaussian", alpha = 1)
  lambda_candidate <- obj_glmnet$lambda
  cv_lasso_error <- matrix(0, nrow = length(lambda_candidate), ncol = nfolds)

  for (i in seq_along(lambda_candidate)) {
    lam1 <- lambda_candidate[i]
    for (j in 1:nfolds) {
      training_set <- store %>% filter(fold != j)
      test_set     <- store %>% filter(fold == j)
      ## use dplyr::select explicitly so it is not masked by MASS::select
      X_train <- training_set %>% dplyr::select(-Y, -fold) %>% as.matrix()
      Y_train <- training_set$Y
      X_test  <- test_set %>% dplyr::select(-Y, -fold) %>% as.matrix()
      Y_test  <- test_set$Y

      ## Step 3: fit the LASSO on the training folds at this lambda
      lasso_mod <- glmnet(X_train, Y_train, family = "gaussian",
                          alpha = 1, lambda = lam1)

      ## Step 4: record the mean squared prediction error on the held-out fold
      pred1 <- predict(lasso_mod, newx = X_test, s = lam1, type = "response")
      cv_lasso_error[i, j] <- mean((Y_test - pred1)^2)
    }
  }

  ## Step 5: average over folds and pick the lambda with the smallest error
  mean_cv <- rowMeans(cv_lasso_error)
  smallest_error <- which(mean_cv == min(mean_cv))   # may return several indices if tied
  best_lam <- lambda_candidate[smallest_error]

  return(list(candidate_for_lambda = lambda_candidate,
              mean_loss = mean_cv,
              best_lambda_picked = best_lam))
}
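A minimal usage sketch, assuming X1 and Y1 are the female design matrix and response defined earlier; because the folds are assigned at random, the selected lambda changes from run to run:
set.seed(886)
res1 <- My_cv_Lasso(X1, Y1, nfolds = 10)
res1$best_lambda_picked                     # lambda with the smallest mean CV error
# For comparison, glmnet's built-in cross-validation:
cv_ref <- cv.glmnet(X1, Y1, alpha = 1, family = "gaussian", nfolds = 10)
cv_ref$lambda.min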
Implementation
set.seed(886)

Variant_My_cv_Lasso <- function(X, Y, nfolds, mu) {
  ## Step 1: randomly assign each observation to one of the nfolds folds
  n <- nrow(X)
  store <- data.frame(Y, X) %>%
    mutate(fold = sample(rep(1:nfolds, length.out = n)))

  ## Step 2: draw 100 candidate lambdas from an Exponential(rate = mu) distribution
  lambda_candidate <- rexp(100, mu)
  cv_lasso_error <- matrix(0, nrow = length(lambda_candidate), ncol = nfolds)

  for (i in seq_along(lambda_candidate)) {
    lam1 <- lambda_candidate[i]
    for (j in 1:nfolds) {
      training_set <- store %>% filter(fold != j)
      test_set     <- store %>% filter(fold == j)
      ## use dplyr::select explicitly so it is not masked by MASS::select
      X_train <- training_set %>% dplyr::select(-Y, -fold) %>% as.matrix()
      Y_train <- training_set$Y
      X_test  <- test_set %>% dplyr::select(-Y, -fold) %>% as.matrix()
      Y_test  <- test_set$Y

      ## Step 3: fit the LASSO on the training folds at this lambda
      lasso_mod <- glmnet(X_train, Y_train, family = "gaussian",
                          alpha = 1, lambda = lam1)

      ## Step 4: record the mean squared prediction error on the held-out fold
      pred1 <- predict(lasso_mod, newx = X_test, s = lam1, type = "response")
      cv_lasso_error[i, j] <- mean((Y_test - pred1)^2)
    }
  }

  ## Step 5: average over folds and pick the lambda with the smallest error
  mean_cv <- rowMeans(cv_lasso_error)
  smallest_error <- which(mean_cv == min(mean_cv))
  best_lam <- lambda_candidate[smallest_error]

  return(list(candidate_for_lambda = lambda_candidate,
              mean_loss = mean_cv,
              best_lambda_picked = best_lam))
}
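The only difference from My_cv_Lasso is the candidate grid: instead of glmnet's data-driven lambda sequence, 100 values are drawn from an Exponential distribution with rate mu (so the candidates have mean 1/mu). A quick way to compare the two grids, as a sketch that reuses X1 and Y1 from the female sample and mu = 5 as in the sample run later on:
set.seed(886)
grid_glmnet <- glmnet(X1, Y1, family = "gaussian", alpha = 1)$lambda  # data-driven sequence
grid_exp    <- rexp(100, rate = 5)                                    # random Exponential grid, mu = 5
range(grid_glmnet)   # smallest and largest lambda in glmnet's sequence
range(grid_exp)      # smallest and largest randomly drawn lambda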
Result
Range of \(\Lambda_{1}\) and \(\Lambda_{2}\) in the Female Set
Result
Chosen \(\lambda\) Depending on \(m\)
Result
Monte Carlo Simulation for the Female Set with \(m=5,10,15\), using cv.glmnet
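A sketch of the kind of Monte Carlo experiment behind these Result slides (illustrative variable names; assuming X1, Y1 as before and 100 repetitions, a number not restated in this extract): each repetition re-runs cv.glmnet with a fresh random fold split and records the selected lambda.min, and the recorded values are then summarized.
n_rep <- 100                              # assumed number of Monte Carlo repetitions
best_lam_cvglmnet <- numeric(n_rep)
for (r in 1:n_rep) {
  cv_r <- cv.glmnet(X1, Y1, alpha = 1, family = "gaussian", nfolds = 5)   # m = 5 folds
  best_lam_cvglmnet[r] <- cv_r$lambda.min
}
mean(best_lam_cvglmnet)   # average selected lambda over the repetitions
hist(best_lam_cvglmnet, main = "Selected lambda over Monte Carlo repetitions")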
Result
Monte Carlo Simulation for the Female Set with \(m=5,10,15\), using My_cv_Lasso
Result
Monte Carlo Simulation for the Female Set with \(m=5,10,15\), using Variant_My_cv_Lasso
Result
Range of \(\Lambda_{3}\) and \(\Lambda_{4}\) in the Testing Set
Result
Chosen \(\lambda\) Depending on \(m\)
Result
Monte Carlo Simulation for the Male Set with \(m=5,10,15\), using cv.glmnet
Result
Monte Carlo Simulation for the Male Set with \(m=5,10,15\), using My_cv_Lasso
Result
Monte Carlo Simulation for the Male Set with \(m=5,10,15\), using Variant_My_cv_Lasso
Result
> #mean(best_lam_glment_mc)
> #mean(best_lam_glment_mc_10)
> #mean(best_lam_glment_mc_15)
> #mean(best_lam_mc_my_cv_lasso_m5)
> #mean(best_lam_mc_my_cv_lasso_m10)
> #mean(best_lam_mc_my_cv_lasso_m15)
> #mean(best_lam_varitant_mc_my_cv_lasso_m5)
> #mean(best_lam_varitant_mc_my_cv_lasso_m10)
> #mean(best_lam_varitant_mc_my_cv_lasso_m15)
> S1 <- c(0.0760686, 0.08039263, 0.07978809, 0.0765162,
+         0.08057883, 0.08102547, 0.07735165, 0.08033728, 0.08015581)
> mod_lasso_female <- glmnet(X1, Y1, alpha = 1, family = "gaussian", lambda = S1)
> pred_female <- predict.glmnet(mod_lasso_female, newx = X1, s = S1, type = "response")
> mse_female <- apply(pred_female, 2, function(pred_female_col) mean((pred_female_col - Y1)^2))
> mse_female
s1 s2 s3 s4 s5 s6 s7 s8
3.114068 3.117036 3.116399 3.114357 3.117437 3.118108 3.114881 3.116824
s9
3.116636
> min(mse_female)
[1] 3.114068
> pred_female_s <- predict.glmnet(mod_lasso_female, newx = X1, s = S1, type = "nonzero")
> pred_female_s
$s1
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s2
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s3
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s4
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s5
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s6
 [1]  2  4  7  8 10 11 12 14 15 17 18 19 22 23
$s7
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s8
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
$s9
 [1]  2  4  6  7  8 10 11 12 14 15 17 18 19 22 23
Result
> #mean(best_lam_glment_mc_male_m5)
> #mean(best_lam_glment_mc_male_m10)
> #mean(best_lam_glment_mc_male_m15)
> #mean(lam_mc_my_cv_lasso_male_m5)
> #mean(lam_mc_my_cv_lasso_male_m10)
> #mean(lam_mc_my_cv_lasso_male_m15)
> #mean(lammc_v_my_cv_lasso_male_m5)
> #mean(lammc_v_my_cv_lasso_male_m10)
> #mean(lammc_v_my_cv_lasso_male_m15)
> S2 <- c(0.06802671, 0.06917023, 0.07191867, 0.06902735,
+         0.0701646, 0.07374311, 0.06998219, 0.07005154, 0.07265663)
> mod_lasso_male <- glmnet(X2, Y2, alpha = 1, family = "gaussian", lambda = S2)
> pred_male <- predict.glmnet(mod_lasso_male, newx = X2, s = S2, type = "response")
> mse_male <- apply(pred_male, 2, function(pred_male_col) mean((pred_male_col - Y2)^2))
> mse_male
s1 s2 s3 s4 s5 s6 s7 s8
4.540997 4.541764 4.543278 4.541612 4.542379 4.543822 4.542234 4.542280
s9
4.543511
> min(mse_male)
[1] 4.540997
> pred_male_s <- predict.glmnet(mod_lasso_male, newx = X2, s = S2, type = "nonzero")
> pred_male_s
s1 s2 s3 s4 s5 s6 s7 s8 s9
1 2 2 2 2 2 2 2 2 2
2 4 4 4 4 4 4 4 4 4
3 5 5 5 5 5 5 5 5 5
4 6 6 6 6 6 6 6 6 6
5 7 7 7 7 7 7 7 7 7
6 8 8 8 8 8 8 8 8 8
7 9 9 9 9 9 9 9 9 9
8 10 10 10 10 10 10 10 10 10
9 11 11 11 11 11 11 11 11 11
10 12 12 12 12 12 12 12 12 12
11 13 13 13 13 13 13 13 13 13
12 14 14 14 14 14 14 14 14 14
13 15 15 15 15 15 15 15 15 15
14 16 16 16 16 16 16 16 16 16
15 17 17 17 17 17 17 17 17 17
16 18 18 18 18 18 18 18 18 18
17 19 19 19 19 19 19 19 19 19
18 22 22 22 22 22 22 22 22 22
19 23 23 23 23 23 23 23 23 23
Sample Run
> set.seed(886)
> res2 <- Variant_My_cv_Lasso(X1, Y1, 5, 5)
> res2
$candidate_for_lambda
[1 ] 0.012487123 0.352385823 0.232502957 0.009957949 0.079516527 0.194752352
[7 ] 0.340769092 0.266880039 0.532727541 0.026055493 0.320029560 0.279370330
[13 ] 0.013437844 0.046183559 0.127120746 0.021967883 0.071861608 0.643606116
[19 ] 0.191531646 0.065656968 0.065922472 0.602106430 0.092080867 0.091706455
[25 ] 0.053757905 0.119769080 0.316442357 0.228761920 0.520901318 0.308747648
[31 ] 0.270080524 0.575462043 0.334027557 0.046376753 0.211811092 0.264450000
[37 ] 0.053003845 0.256836649 0.074206675 0.033404985 0.133465785 0.177114280
[43 ] 0.167761748 0.025464315 0.882541958 0.005598880 0.862175975 0.329636912
[49 ] 0.080211182 0.156022957 0.035986217 0.050723594 0.149527735 0.013479271
[55 ] 0.238313868 0.135953414 0.509925953 0.915590206 0.008865030 0.055820811
[61 ] 0.002430579 0.276009405 0.280531697 0.117340148 0.153570089 0.181194891
[67 ] 0.082919934 0.356803690 0.019943266 0.157916053 0.596500337 0.310484051
[73 ] 0.358506566 0.039563498 0.293180220 0.091518236 0.120260399 0.568856753
[79 ] 0.555350788 0.361090035 0.581617962 0.162488700 0.030624983 0.100611279
[85 ] 0.034662556 0.090623805 0.110446260 0.476125166 0.066091100 0.227380392
[91 ] 0.313598560 0.208455373 0.091880510 0.564742577 0.114848789 0.019729175
[97 ] 0.018121741 0.075361771 0.095357318 0.053773372
$mean_loss
[1 ] 4.326719 4.363514 4.243704 4.351243 4.068727 4.188051 4.349598 4.276576
[9 ] 4.655676 4.213217 4.328811 4.289910 4.318009 4.118218 4.090955 4.242635
[17 ] 4.075828 4.916212 4.182904 4.075151 4.075051 4.811073 4.069328 4.069280
[25 ] 4.082273 4.085336 4.325474 4.234394 4.629235 4.318122 4.279497 4.748098
[33 ] 4.342493 4.115495 4.215000 4.274092 4.082477 4.266118 4.075015 4.166005
[41 ] 4.098110 4.158170 4.144667 4.217370 5.673885 4.393299 5.598849 4.338047
[49 ] 4.068708 4.128075 4.156527 4.095877 4.119152 4.317631 4.249079 4.101141
[57 ] 4.607759 5.800193 4.361797 4.079682 4.427682 4.286694 4.291051 4.083548
[65 ] 4.124766 4.165121 4.068681 4.368172 4.258867 4.130544 4.797565 4.319682
[73 ] 4.369993 4.141330 4.303859 4.069257 4.085837 4.733147 4.702570 4.372781
[81 ] 4.761939 4.136805 4.182365 4.075355 4.160067 4.069152 4.078239 4.545854
[89 ] 4.074988 4.233277 4.322496 4.210204 4.069302 4.723837 4.082141 4.259009
[97 ] 4.270222 4.074724 4.069788 4.082248
$best_lambda_picked
[1] 0.08291993
> plot(log(res2$candidate_for_lambda), res2$mean_loss, xlab = "Log Lambda",
+      ylab = "Mean Squared Error")
> abline(v = log(res2$best_lambda_picked), col = "red")
Sample Run
(Plot: mean squared error versus log lambda, with a red vertical line at the selected lambda; figures not reproduced.)
Elapsed Time
Average elapsed time of the three functions over 10 iterations (m = 10):
Set       cv.glmnet    My_cv_Lasso    Variant_My_cv_Lasso
Female    0.0318 s     3.6987 s       4.954 s
Male      0.0327 s     3.715 s        5.091 s
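A sketch of how such timings can be obtained (the project's actual timing code is not shown here): each function is run 10 times with system.time() and the elapsed times are averaged.
n_iter <- 10
t_cv  <- replicate(n_iter, system.time(
  cv.glmnet(X1, Y1, alpha = 1, family = "gaussian", nfolds = 10))["elapsed"])
t_my  <- replicate(n_iter, system.time(
  My_cv_Lasso(X1, Y1, nfolds = 10))["elapsed"])
t_var <- replicate(n_iter, system.time(
  Variant_My_cv_Lasso(X1, Y1, nfolds = 10, mu = 5))["elapsed"])
c(cv.glmnet = mean(t_cv), My_cv_Lasso = mean(t_my), Variant_My_cv_Lasso = mean(t_var))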
Conclusion
There is no single "optimal" or "best" \(\lambda\); the selected value varies with the random fold assignment and with \(m\)
The Monte Carlo simulations show that My_cv_Lasso and Variant_My_cv_Lasso can perform at least as well as cv.glmnet from the glmnet package
However, My_cv_Lasso and Variant_My_cv_Lasso take significantly longer to run than cv.glmnet
Reference
The dataset can be found at: https://www.kaggle.com/datasets/mexwell/body-measurements/data