Nicole Janny Salomons

Homework 5

Get the libraries and import the data:

library(glmnet)
train <- read.csv("digits_train.csv",as.is=TRUE)
valid <- read.csv("digits_valid.csv", as.is=TRUE)

train_x = train[,-1]
train_y = train[257]
valid_x = valid[,-1]
valid_y = valid[257]

Part 1

** Apply the lasso penalty to train a multinomial logistic regression classifier with type.multinomial=“grouped”. **

ty<- data.frame(train_y)
ty = factor(ty$digit)

m.lasso.grouped <- cv.glmnet(x = as.matrix(train_x) ,y = ty, family = c("multinomial"), type.multinomial = "grouped", alpha = 1)

Show a plot that displays the misclassification error on the y-axis for values of log(lambda) on the x-axis.

plot(m.lasso.grouped)

** 2) Report min and 1se**

lmin_lasso_grouped <- m.lasso.grouped$lambda.min
lmin_lasso_grouped
[1] 0.004828283
lse_lasso_grouped <- m.lasso.grouped$lambda.1se
lse_lasso_grouped
[1] 0.007005011

The minimum lambda was 0.004828283. And with one standard deviation its 0.007005011.

** Next, use both min and 1se to generate predictions on the validation set. **

pmin <- predict(m.lasso.grouped,newx = as.matrix(valid_x), s=lmin_lasso_grouped, type="class")
psd <- predict(m.lasso.grouped, newx = as.matrix(valid_x), s=lse_lasso_grouped, type="class")

** Report the resulting misclassification error rates.**

mean(pmin != valid_y)
[1] 0.03773585
mean(psd != valid_y)
[1] 0.03773585

The error rates for the minimum lamda were 3.77%, and with 1 standard deviation it was 3.77%.

** (3) For 1 standard deviation lambda, how many predictors are selected by glmnet()? How many non-zero coefficients are estimated in total for this value of lambda? **

#m.lasso.grouped$lambda.min
m.lasso.grouped$nzero[which(m.lasso.grouped$lambda == m.lasso.grouped$lambda.1se)]

The number of non-zero coeficients for lambda with 1 standard deviation is 128. And the number of predictors is also 128.

** 4) Repeat with ungrouped **

m.lasso.ungrouped <- cv.glmnet(x = as.matrix(train_x) ,y = ty, family = c("multinomial"), type.multinomial = "ungrouped", alpha = 1)
plot(m.lasso.ungrouped)
m.lasso.ungrouped <- cv.glmnet(x = as.matrix(train_x) ,y = ty, family = c("multinomial"), type.multinomial = "ungrouped", alpha = 1)
plot(m.lasso.ungrouped)

lmin_lasso_ungrouped <- m.lasso.ungrouped$lambda.min
lmin_lasso_ungrouped
[1] 0.0002767712
lse_lasso_ungrouped <- m.lasso.ungrouped$lambda.1se
lse_lasso_ungrouped
[1] 0.001779105
pmin_ung <- predict(m.lasso.ungrouped,newx = as.matrix(valid_x), s=lmin_lasso_ungrouped, type="class")
psd_ung <- predict(m.lasso.ungrouped, newx = as.matrix(valid_x), s=lse_lasso_ungrouped, type="class")
mean(pmin_ung != valid_y)
[1] 0.05974843
mean(psd_ung != valid_y)
[1] 0.07232704
m.lasso.ungrouped$nzero[which(m.lasso.ungrouped$lambda == m.lasso.ungrouped$lambda.1se)]
s50 
 36 

For ungrouped the minimum lambda was: 0.0002767712. And lambda with 1 standard deviation is: 0.001779105.

The misclassification for minimum lambda is: 5.97%.

The misclassification for 1 standard deviation is: 7.23%.

With one standard deviation the number of non-zero coefficients is 36.

With one standard deviation the number of predictors is also 36.

Part 2

** 1)Apply the ridge penalty to train a multinomial logistic regression classifier. **

m.ridge <- cv.glmnet(x = as.matrix(train_fold) ,y = train_fold_y, family = c("multinomial"), alpha = 0)

** Show a plot that displays the misclassification error on the y-axis for values of log(lambda) on the x-axis. **

plot(m.ridge)

** (2) Report min and 1se and their associated misclassification errors for the validation set. **

lmin_ridge <- m.ridge$lambda.min
lmin_ridge
[1] 0.01937868
lse_ridge <- m.ridge$lambda.1se
lse_ridge
[1] 0.02811515
pmin_ridge <- predict(m.ridge,newx = as.matrix(valid_x), s=lmin_ridge, type="class")
psd_ridge <- predict(m.ridge, newx = as.matrix(valid_x), s=lse_ridge, type="class")
mean(pmin_ridge != valid_y)
[1] 0.08176101
mean(psd_ridge != valid_y)
[1] 0.08805031

For ungrouped the minimum lambda was: 0.01937868. And lambda with 1 standard deviation is: 0.02811515.

The misclassification for minimum lambda is: 8.18%.

The misclassification for 1 standard deviation is: 8.81%.

Part 3

** In your own words, how do these 3 approaches differ? **

The grouped lasso approach will create the most simple model in my opinion, using only a subset of the predictors, and using the same predictors for all the classification. On the other hand, the ungrouped lasso approach will use different subsets of predictors creating a more detailed model. Both lasso approaches are similar that they do not use all of the predictors. And Ridge uses all of the predictors, but has some of the weights be very close to 0.

** Compare the performance of ridge, lasso, and your best approach from Homework 3 in prediction accuracy on the validation set. **

I will be using the best performance in prediction accuracy for each:

The best performance seems to come from lasso grouped.

** If you had to pick a favorite final model, what would you choose? Justify your answer. **

My favourite final model in this case is lasso grouped. It has the highest accuracy, and although it drops some of the predictors making it harder to interpret, at least it uses the same predictors accross groups.

DQotLS0NCnRpdGxlOiAiSG9tZXdvcmsgNSINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tIA0KDQojTmljb2xlIEphbm55IFNhbG9tb25zDQojIyBIb21ld29yayA1IA0KDQpHZXQgdGhlIGxpYnJhcmllcyBhbmQgaW1wb3J0IHRoZSBkYXRhOg0KDQpgYGB7cn0NCmxpYnJhcnkoZ2xtbmV0KQ0KdHJhaW4gPC0gcmVhZC5jc3YoImRpZ2l0c190cmFpbi5jc3YiLGFzLmlzPVRSVUUpDQp2YWxpZCA8LSByZWFkLmNzdigiZGlnaXRzX3ZhbGlkLmNzdiIsIGFzLmlzPVRSVUUpDQoNCnRyYWluX3ggPSB0cmFpblssLTFdDQp0cmFpbl95ID0gdHJhaW5bMjU3XQ0KdmFsaWRfeCA9IHZhbGlkWywtMV0NCnZhbGlkX3kgPSB2YWxpZFsyNTddDQpgYGANCg0KI1BhcnQgMQ0KDQoqKiBBcHBseSB0aGUgbGFzc28gcGVuYWx0eSB0byB0cmFpbiBhIG11bHRpbm9taWFsIGxvZ2lzdGljIHJlZ3Jlc3Npb24gY2xhc3NpZmllciB3aXRoIHR5cGUubXVsdGlub21pYWw9Imdyb3VwZWQiLiAqKg0KDQpgYGB7cn0NCnR5PC0gZGF0YS5mcmFtZSh0cmFpbl95KQ0KdHkgPSBmYWN0b3IodHkkZGlnaXQpDQoNCm0ubGFzc28uZ3JvdXBlZCA8LSBjdi5nbG1uZXQoeCA9IGFzLm1hdHJpeCh0cmFpbl94KSAseSA9IHR5LCBmYW1pbHkgPSBjKCJtdWx0aW5vbWlhbCIpLCB0eXBlLm11bHRpbm9taWFsID0gImdyb3VwZWQiLCBhbHBoYSA9IDEpDQoNCg0KYGBgDQoNCioqU2hvdyBhIHBsb3QgdGhhdCBkaXNwbGF5cyB0aGUgbWlzY2xhc3NpZmljYXRpb24gZXJyb3Igb24gdGhlIHktYXhpcyBmb3IgdmFsdWVzIG9mIGxvZyhsYW1iZGEpIG9uIHRoZSB4LWF4aXMuICoqDQoNCmBgYHtyfQ0KcGxvdChtLmxhc3NvLmdyb3VwZWQpDQpgYGANCg0KKiogMikgUmVwb3J0IG1pbiBhbmQgMXNlKioNCmBgYHtyfQ0KbG1pbl9sYXNzb19ncm91cGVkIDwtIG0ubGFzc28uZ3JvdXBlZCRsYW1iZGEubWluDQpsbWluX2xhc3NvX2dyb3VwZWQNCmxzZV9sYXNzb19ncm91cGVkIDwtIG0ubGFzc28uZ3JvdXBlZCRsYW1iZGEuMXNlDQpsc2VfbGFzc29fZ3JvdXBlZA0KYGBgDQoNClRoZSBtaW5pbXVtIGxhbWJkYSB3YXMgMC4wMDQ4MjgyODMuIEFuZCB3aXRoIG9uZSBzdGFuZGFyZCBkZXZpYXRpb24gaXRzIDAuMDA3MDA1MDExLg0KDQoqKiBOZXh0LCB1c2UgYm90aCBtaW4gYW5kIDFzZSB0byBnZW5lcmF0ZSBwcmVkaWN0aW9ucyBvbiB0aGUgdmFsaWRhdGlvbiBzZXQuICoqDQpgYGB7cn0NCnBtaW4gPC0gcHJlZGljdChtLmxhc3NvLmdyb3VwZWQsbmV3eCA9IGFzLm1hdHJpeCh2YWxpZF94KSwgcz1sbWluX2xhc3NvX2dyb3VwZWQsIHR5cGU9ImNsYXNzIikNCnBzZCA8LSBwcmVkaWN0KG0ubGFzc28uZ3JvdXBlZCwgbmV3eCA9IGFzLm1hdHJpeCh2YWxpZF94KSwgcz1sc2VfbGFzc29fZ3JvdXBlZCwgdHlwZT0iY2xhc3MiKQ0KYGBgDQoNCg0KKiogUmVwb3J0IHRoZSByZXN1bHRpbmcgbWlzY2xhc3NpZmljYXRpb24gZXJyb3IgcmF0ZXMuKioNCg0KYGBge3J9DQptZWFuKHBtaW4gIT0gdmFsaWRfeSkNCm1lYW4ocHNkICE9IHZhbGlkX3kpDQpgYGANCg0KVGhlIGVycm9yIHJhdGVzIGZvciB0aGUgbWluaW11bSBsYW1kYSB3ZXJlIDMuNzclLCBhbmQgd2l0aCAxIHN0YW5kYXJkIGRldmlhdGlvbiBpdCB3YXMgMy43NyUuDQoNCioqICgzKSBGb3IgMSBzdGFuZGFyZCBkZXZpYXRpb24gbGFtYmRhLCBob3cgbWFueSBwcmVkaWN0b3JzIGFyZSBzZWxlY3RlZCBieSBnbG1uZXQoKT8gSG93IG1hbnkgbm9uLXplcm8gY29lZmZpY2llbnRzIGFyZSBlc3RpbWF0ZWQgaW4gdG90YWwgZm9yIHRoaXMgdmFsdWUgb2YgbGFtYmRhPyAqKg0KDQpgYGB7cn0NCiNtLmxhc3NvLmdyb3VwZWQkbGFtYmRhLm1pbg0KbS5sYXNzby5ncm91cGVkJG56ZXJvW3doaWNoKG0ubGFzc28uZ3JvdXBlZCRsYW1iZGEgPT0gbS5sYXNzby5ncm91cGVkJGxhbWJkYS4xc2UpXQ0KYGBgDQoNClRoZSBudW1iZXIgb2Ygbm9uLXplcm8gY29lZmljaWVudHMgZm9yIGxhbWJkYSB3aXRoIDEgc3RhbmRhcmQgZGV2aWF0aW9uIGlzIDEyOC4gQW5kIHRoZSBudW1iZXIgb2YgcHJlZGljdG9ycyBpcyBhbHNvIDEyOC4NCg0KKiogNCkgUmVwZWF0IHdpdGggdW5ncm91cGVkICoqDQoNCmBgYHtyfQ0KbS5sYXNzby51bmdyb3VwZWQgPC0gY3YuZ2xtbmV0KHggPSBhcy5tYXRyaXgodHJhaW5feCkgLHkgPSB0eSwgZmFtaWx5ID0gYygibXVsdGlub21pYWwiKSwgdHlwZS5tdWx0aW5vbWlhbCA9ICJ1bmdyb3VwZWQiLCBhbHBoYSA9IDEpDQpgYGANCmBgYHtyfQ0KcGxvdChtLmxhc3NvLnVuZ3JvdXBlZCkNCmBgYA0KDQpgYGB7cn0NCg0KbG1pbl9sYXNzb191bmdyb3VwZWQgPC0gbS5sYXNzby51bmdyb3VwZWQkbGFtYmRhLm1pbg0KbG1pbl9sYXNzb191bmdyb3VwZWQNCmxzZV9sYXNzb191bmdyb3VwZWQgPC0gbS5sYXNzby51bmdyb3VwZWQkbGFtYmRhLjFzZQ0KbHNlX2xhc3NvX3VuZ3JvdXBlZA0KDQpwbWluX3VuZyA8LSBwcmVkaWN0KG0ubGFzc28udW5ncm91cGVkLG5ld3ggPSBhcy5tYXRyaXgodmFsaWRfeCksIHM9bG1pbl9sYXNzb191bmdyb3VwZWQsIHR5cGU9ImNsYXNzIikNCnBzZF91bmcgPC0gcHJlZGljdChtLmxhc3NvLnVuZ3JvdXBlZCwgbmV3eCA9IGFzLm1hdHJpeCh2YWxpZF94KSwgcz1sc2VfbGFzc29fdW5ncm91cGVkLCB0eXBlPSJjbGFzcyIpDQoNCm1lYW4ocG1pbl91bmcgIT0gdmFsaWRfeSkNCm1lYW4ocHNkX3VuZyAhPSB2YWxpZF95KQ0KDQptLmxhc3NvLnVuZ3JvdXBlZCRuemVyb1t3aGljaChtLmxhc3NvLnVuZ3JvdXBlZCRsYW1iZGEgPT0gbS5sYXNzby51bmdyb3VwZWQkbGFtYmRhLjFzZSldDQpgYGANCg0KRm9yIHVuZ3JvdXBlZCB0aGUgbWluaW11bSBsYW1iZGEgd2FzOiAwLjAwMDI3Njc3MTIuIEFuZCBsYW1iZGEgd2l0aCAxIHN0YW5kYXJkIGRldmlhdGlvbiBpczogMC4wMDE3NzkxMDUuDQoNClRoZSBtaXNjbGFzc2lmaWNhdGlvbiBmb3IgbWluaW11bSBsYW1iZGEgaXM6IDUuOTclLg0KDQpUaGUgbWlzY2xhc3NpZmljYXRpb24gZm9yIDEgc3RhbmRhcmQgZGV2aWF0aW9uIGlzOiA3LjIzJS4NCg0KV2l0aCBvbmUgc3RhbmRhcmQgZGV2aWF0aW9uIHRoZSBudW1iZXIgb2Ygbm9uLXplcm8gY29lZmZpY2llbnRzIGlzIDM2LiANCg0KV2l0aCBvbmUgc3RhbmRhcmQgZGV2aWF0aW9uIHRoZSBudW1iZXIgb2YgcHJlZGljdG9ycyBpcyBhbHNvIDM2LiANCg0KDQojUGFydCAyDQoNCioqIDEpQXBwbHkgdGhlIHJpZGdlIHBlbmFsdHkgdG8gdHJhaW4gYSBtdWx0aW5vbWlhbCBsb2dpc3RpYyByZWdyZXNzaW9uIGNsYXNzaWZpZXIuICoqDQoNCmBgYHtyfQ0KbS5yaWRnZSA8LSBjdi5nbG1uZXQoeCA9IGFzLm1hdHJpeCh0cmFpbl9mb2xkKSAseSA9IHRyYWluX2ZvbGRfeSwgZmFtaWx5ID0gYygibXVsdGlub21pYWwiKSwgYWxwaGEgPSAwKQ0KYGBgDQoNCioqIFNob3cgYSBwbG90IHRoYXQgZGlzcGxheXMgdGhlIG1pc2NsYXNzaWZpY2F0aW9uIGVycm9yIG9uIHRoZSB5LWF4aXMgZm9yIHZhbHVlcyBvZiBsb2cobGFtYmRhKSBvbiB0aGUgeC1heGlzLiAqKg0KDQpgYGB7cn0NCnBsb3QobS5yaWRnZSkNCmBgYA0KDQoNCioqICgyKSBSZXBvcnQgbWluIGFuZCAxc2UgYW5kIHRoZWlyIGFzc29jaWF0ZWQgbWlzY2xhc3NpZmljYXRpb24gZXJyb3JzIGZvciB0aGUgdmFsaWRhdGlvbiBzZXQuICoqDQoNCmBgYHtyfQ0KbG1pbl9yaWRnZSA8LSBtLnJpZGdlJGxhbWJkYS5taW4NCmxtaW5fcmlkZ2UNCmxzZV9yaWRnZSA8LSBtLnJpZGdlJGxhbWJkYS4xc2UNCmxzZV9yaWRnZQ0KDQpwbWluX3JpZGdlIDwtIHByZWRpY3QobS5yaWRnZSxuZXd4ID0gYXMubWF0cml4KHZhbGlkX3gpLCBzPWxtaW5fcmlkZ2UsIHR5cGU9ImNsYXNzIikNCnBzZF9yaWRnZSA8LSBwcmVkaWN0KG0ucmlkZ2UsIG5ld3ggPSBhcy5tYXRyaXgodmFsaWRfeCksIHM9bHNlX3JpZGdlLCB0eXBlPSJjbGFzcyIpDQoNCm1lYW4ocG1pbl9yaWRnZSAhPSB2YWxpZF95KQ0KbWVhbihwc2RfcmlkZ2UgIT0gdmFsaWRfeSkNCmBgYA0KDQpGb3IgdW5ncm91cGVkIHRoZSBtaW5pbXVtIGxhbWJkYSB3YXM6IDAuMDE5Mzc4NjguIEFuZCBsYW1iZGEgd2l0aCAxIHN0YW5kYXJkIGRldmlhdGlvbiBpczogMC4wMjgxMTUxNS4NCg0KVGhlIG1pc2NsYXNzaWZpY2F0aW9uIGZvciBtaW5pbXVtIGxhbWJkYSBpczogOC4xOCUuDQoNClRoZSBtaXNjbGFzc2lmaWNhdGlvbiBmb3IgMSBzdGFuZGFyZCBkZXZpYXRpb24gaXM6IDguODElLg0KDQojUGFydCAzDQoNCioqIEluIHlvdXIgb3duIHdvcmRzLCBob3cgZG8gdGhlc2UgMyBhcHByb2FjaGVzIGRpZmZlcj8gKioNCg0KVGhlIGdyb3VwZWQgbGFzc28gYXBwcm9hY2ggd2lsbCBjcmVhdGUgdGhlIG1vc3Qgc2ltcGxlIG1vZGVsIGluIG15IG9waW5pb24sIHVzaW5nIG9ubHkgYSBzdWJzZXQgb2YgdGhlIHByZWRpY3RvcnMsIGFuZCB1c2luZyB0aGUgc2FtZSBwcmVkaWN0b3JzIGZvciBhbGwgdGhlIGNsYXNzaWZpY2F0aW9uLiBPbiB0aGUgb3RoZXIgaGFuZCwgdGhlIHVuZ3JvdXBlZCBsYXNzbyBhcHByb2FjaCB3aWxsIHVzZSBkaWZmZXJlbnQgc3Vic2V0cyBvZiBwcmVkaWN0b3JzIGNyZWF0aW5nIGEgbW9yZSBkZXRhaWxlZCBtb2RlbC4gQm90aCBsYXNzbyBhcHByb2FjaGVzIGFyZSBzaW1pbGFyIHRoYXQgdGhleSBkbyBub3QgdXNlIGFsbCBvZiB0aGUgcHJlZGljdG9ycy4gQW5kIFJpZGdlIHVzZXMgYWxsIG9mIHRoZSBwcmVkaWN0b3JzLCBidXQgaGFzIHNvbWUgb2YgdGhlIHdlaWdodHMgYmUgdmVyeSBjbG9zZSB0byAwLg0KDQoqKiBDb21wYXJlIHRoZSBwZXJmb3JtYW5jZSBvZiByaWRnZSwgbGFzc28sIGFuZCB5b3VyIGJlc3QgYXBwcm9hY2ggZnJvbSBIb21ld29yayAzIGluIHByZWRpY3Rpb24gYWNjdXJhY3kgb24gdGhlIHZhbGlkYXRpb24gc2V0LiAqKg0KDQpJIHdpbGwgYmUgdXNpbmcgdGhlIGJlc3QgcGVyZm9ybWFuY2UgaW4gcHJlZGljdGlvbiBhY2N1cmFjeSBmb3IgZWFjaDoNCg0KKiBQZXJmb3JtYW5jZSBmb3IgbGFzc28gKHVuZ3JvdXBlZCk6IDk0LjAzJQ0KDQoqIFBlcmZvcm1hbmNlIGZvciBsYXNzbyAoZ3JvdXBlZCk6IDk2LjIzJQ0KDQoqIFBlcmZvcm1hbmNlIGZvciByaWRnZTogOTEuODIlDQoNCiogVGhlIGJlc3QgYXBwcm9hY2ggZnJvbSBob21ld29yayAoS05OKTogODcuNDIlDQoNClRoZSBiZXN0IHBlcmZvcm1hbmNlIHNlZW1zIHRvIGNvbWUgZnJvbSBsYXNzbyBncm91cGVkLg0KDQoqKiBJZiB5b3UgaGFkIHRvIHBpY2sgYSBmYXZvcml0ZSBmaW5hbCBtb2RlbCwgd2hhdCB3b3VsZCB5b3UgY2hvb3NlPyBKdXN0aWZ5IHlvdXIgYW5zd2VyLiAqKg0KDQpNeSBmYXZvdXJpdGUgZmluYWwgbW9kZWwgaW4gdGhpcyBjYXNlIGlzIGxhc3NvIGdyb3VwZWQuIEl0IGhhcyB0aGUgaGlnaGVzdCBhY2N1cmFjeSwgYW5kIGFsdGhvdWdoIGl0IGRyb3BzIHNvbWUgb2YgdGhlIHByZWRpY3RvcnMgbWFraW5nIGl0IGhhcmRlciB0byBpbnRlcnByZXQsIGF0IGxlYXN0IGl0IHVzZXMgdGhlIHNhbWUgcHJlZGljdG9ycyBhY2Nyb3NzIGdyb3Vwcy4NCg0KDQoNCg0KDQoNCg0KDQo=