Regularization is a set of methods that manage the bias-variance trade-off in linear models by penalizing the size of the coefficient estimates.

Whereas OLS estimates the linear model coefficients by minimizing the loss function

\[L = \sum_{i = 1}^n \left(y_i - x_i^{'} \hat\beta \right)^2\]

ridge regression minimizes the loss function

\[L = \sum_{i = 1}^n \left(y_i - x_i^{'} \hat\beta \right)^2 + \lambda \sum_{j=1}^k \hat{\beta}_j^2,\]

lasso regression minimizes the loss function

\[L = \sum_{i = 1}^n \left(y_i - x_i^{'} \hat\beta \right)^2 + \lambda \sum_{j=1}^k \left| \hat{\beta}_j \right|,\]

and elastic net minimizes the loss function

\[L = \frac{\sum_{i = 1}^n \left(y_i - x_i^{'} \hat\beta \right)^2}{2n} + \lambda \left( \frac{1 - \alpha}{2}\sum_{j=1}^k \hat{\beta}_j^2 + \alpha \sum_{j=1}^k \left| \hat{\beta}_j \right| \right).\]
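Here the response is binary, so glmnet applies these same penalties to the binomial negative log-likelihood rather than to the squared-error loss. To make the mixing behavior concrete, here is a minimal sketch of the penalty term alone (elnet_penalty is a hypothetical helper, not part of the analysis): alpha = 0 recovers the ridge penalty and alpha = 1 the lasso penalty.

# Hypothetical helper: the elastic net penalty for a coefficient vector
# (glmnet does not penalize the intercept)
elnet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}
elnet_penalty(c(0.5, -1.2, 0.3), lambda = 0.1, alpha = 0)  # pure ridge
## [1] 0.089
elnet_penalty(c(0.5, -1.2, 0.3), lambda = 0.1, alpha = 1)  # pure lasso
## [1] 0.2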

The best model was the elastic net. Refit to the full training data set, it had a 10-fold cross-validated accuracy of 0.8305.

After fitting to the full training data set, the accuracy on the Kaggle test set was 0.78468, about half a percentage point worse than the straight logistic regression’s 0.78947.

Setup

library(tidyverse)  # data manipulation and plotting
library(caret)      # model training and resampling
library(recipes)    # preprocessing recipes
library(plotROC)    # ROC curve plotting
library(precrec)    # ROC and precision-recall evaluation

The initial data management created the data set full, with training rows indexed by train_index. In the exploratory analysis I added more predictor variables, split the data into training and testing, and then split training 80:20 into training_80 for model training and training_20 for comparing models.

load("./titanic_02.RData")

I’ll use 10-fold CV.

train_control <- trainControl(
  method = "cv", number = 10,
  savePredictions = "final",
  classProbs = TRUE
)

I’ll try three models: ridge, lasso, and elastic net.

Model

My model data set variables are PassengerId, Survived, and 13 predictors.

mdl_vars <- c("PassengerId", mdl_vars)
mdl_vars
##  [1] "PassengerId"   "Survived"      "Pclass"        "Sex"          
##  [5] "Age"           "SibSp"         "Parch"         "Embarked"     
##  [9] "TicketN"       "FarePerPass"   "Employee"      "Deck"         
## [13] "AgeCohort"     "TicketNCohort" "NetSurv"

I’ll use the recipe method to train. From the exploratory analysis section I’ve decided to create interactions Pclass*Sex, Embarked*Sex, TicketN:TicketNCohort, and Age*AgeCohort*Sex. My recipe does not center and scale the predictors because glmnet() standardizes them internally by default (standardize = TRUE).

rcpe <- recipe(Survived ~ ., data = training_80[, mdl_vars]) %>%
  update_role(PassengerId, new_role = "id variable") %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  # model will center/scale - don't do it here
  # step_center(Age, SibSp, Parch, TicketN, FarePerPass, NetSurv) %>%
  # step_scale(Age, SibSp, Parch, TicketN, FarePerPass, NetSurv) %>%
  step_interact(terms = ~
                  starts_with("Sex_"):starts_with("Pclass_") +
                  starts_with("Sex_"):starts_with("Embarked_") +
                  TicketN:starts_with("TicketNCohort_") +
                  Age:starts_with("AgeCohort_"):starts_with("Sex_")) 

prep(rcpe, training = training_80)
## Data Recipe
## 
## Inputs:
## 
##         role #variables
##  id variable          1
##      outcome          1
##    predictor         13
## 
## Training data contained 714 data points and no missing data.
## 
## Operations:
## 
## Dummy variables from Pclass, Sex, Embarked, Employee, Deck, ... [trained]
## Interactions with Sex_female:(Pclass_X2 + Pclass_X3) + Sex_female:(Embarked_Q + Embarked_S) + TicketN:TicketNCohort_gt4 + Age:AgeCohort_gt10:Sex_female [trained]
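To inspect the design matrix the recipe produces (the dummy columns and interaction terms glmnet will actually see), bake the prepped recipe. A quick sketch; output not shown.

rcpe_prepped <- prep(rcpe, training = training_80)
# bake() applies the trained steps and returns the processed data
rcpe_baked <- bake(rcpe_prepped, new_data = training_80)
dim(rcpe_baked)  # rows x columns after dummy coding and interactions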

Ridge

The ridge model had a holdout set accuracy of 0.8249, sensitivity of 0.6324, specificity of 0.9450, and AUC of 0.8768.

Fit the ridge model by fixing the tuning parameter alpha = 0 (the ridge/lasso mixing percentage, with 0 meaning 0% lasso, i.e., pure ridge). I set up the tuning grid for lambda with a little trial and error. Here is the fit summary.

set.seed(1970)
mdl_ridge <- train(
  rcpe,
  training_80[, mdl_vars],
  method = "glmnet",
  family = "binomial",
  tuneGrid = expand.grid(
    .alpha = 0,  # optimize a ridge regression
    .lambda = seq(0, 2, length.out = 101)
  ),
  trControl = train_control,
  metric = "Accuracy"
)
mdl_ridge$bestTune
##   alpha lambda
## 3     0   0.04

The best tuning parameter is lambda = 0.04, so almost no penalty on the size of the coefficient estimates. You can see the relationship between accuracy and lambda in the plot.

ggplot(mdl_ridge) +
  labs(title = "Ridge Regression Parameter Tuning", x = "lambda")

varImp() ranks the predictors by the absolute value of the coefficients in the tuned model. The most important variable here was AgeCohort. In the straight logistic regression model, it was NetSurv, followed by Pclass.

varImp(mdl_ridge)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 25)
## 
##                         Overall
## AgeCohort_gt10          100.000
## Sex_female               97.300
## Sex_female_x_Pclass_X2   92.823
## Sex_female_x_Embarked_Q  89.908
## NetSurv                  68.207
## Pclass_X3                57.600
## TicketNCohort_gt4        54.158
## Employee_X1              47.141
## Sex_female_x_Pclass_X3   39.001
## Pclass_X2                35.388
## Embarked_S               34.401
## Embarked_Q               27.500
## Sex_female_x_Embarked_S  21.611
## Deck_C                   17.252
## Deck_D                   16.094
## SibSp                    13.750
## Deck_B                   11.116
## TicketN                   9.646
## Deck_E                    4.226
## Deck_F                    4.147
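As a check on varImp(), the same ordering (before caret’s 0-100 rescaling) can be reproduced by hand from the glmnet coefficients at the tuned lambda. A quick sketch:

# Extract coefficients at the tuned lambda and rank by absolute value
ridge_coefs <- as.matrix(coef(mdl_ridge$finalModel, s = mdl_ridge$bestTune$lambda))
head(sort(abs(ridge_coefs[-1, 1]), decreasing = TRUE))  # drop the intercept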

Resampling Performance

The accuracy from the confusion matrix is 0.8389.

confusionMatrix(mdl_ridge)
## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction   No  Yes
##        No  57.3 11.8
##        Yes  4.3 26.6
##                             
##  Accuracy (average) : 0.8389

Holdout Performance

Here is the model performance on the holdout data set.

predicted_classes <- predict(mdl_ridge, newdata = training_20) 
predicted_probs <- predict(mdl_ridge, newdata = training_20, type = "prob")
confusionMatrix(predicted_classes, training_20$Survived, positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  103  25
##        Yes   6  43
##                                           
##                Accuracy : 0.8249          
##                  95% CI : (0.7607, 0.8778)
##     No Information Rate : 0.6158          
##     P-Value [Acc > NIR] : 1.304e-09       
##                                           
##                   Kappa : 0.6093          
##                                           
##  Mcnemar's Test P-Value : 0.001225        
##                                           
##             Sensitivity : 0.6324          
##             Specificity : 0.9450          
##          Pos Pred Value : 0.8776          
##          Neg Pred Value : 0.8047          
##              Prevalence : 0.3842          
##          Detection Rate : 0.2429          
##    Detection Prevalence : 0.2768          
##       Balanced Accuracy : 0.7887          
##                                           
##        'Positive' Class : Yes             
## 

The sensitivity is 0.6324 and the specificity is 0.9450, so the model is prone to under-predicting survivors. The accuracy from the confusion matrix is 0.8249. precrec::evalmod() calculates the ROC and precision-recall curves from the model predictions on the holdout data set. The AUC on the holdout set is 0.8768.

mdl_ridge_preds <- predict(mdl_ridge, newdata = training_20, type = "prob")
(mdl_ridge_eval <- evalmod(
  scores = mdl_ridge_preds$Yes,
  labels = training_20$Survived
))
## 
##     === AUCs ===
## 
##      Model name Dataset ID Curve type       AUC
##    1         m1          1        ROC 0.8768214
##    2         m1          1        PRC 0.8490097
## 
## 
##     === Input data ===
## 
##      Model name Dataset ID # of negatives # of positives
##    1         m1          1            109             68
options(yardstick.event_first = FALSE)  # set the second level as success
data.frame(
  pred = mdl_ridge_preds$Yes, 
  obs = training_20$Survived
) %>%
  yardstick::roc_curve(obs, pred) %>%
  autoplot() +
  labs(
    title = "Ridge Model ROC Curve, Test Data",
    subtitle = "AUC = 0.8783"
  )

The gain curve plots the cumulative summed true outcome versus the fraction of items seen when sorted by the predicted value. The “wizard” curve is the gain curve when the data is sorted by the true outcome. If the model’s gain curve is close to the wizard curve, then the model predicted the response variable well. The gray area is the “gain” over a random prediction.

68 of the 177 passengers in the holdout set survived.

  • The gain curve encountered 34 survivors (50%) within the first 37 observations (21%).

  • It encountered the 68th survivor on the 163rd observation (92%).

  • The bottom of the gray area is the outcome of a random model: only half the survivors would be observed within 50% of the observations. The top of the gray area is the outcome of the perfect model, the “wizard curve”: all 68 survivors would be observed within the first 68/177 = 38% of the observations, and half within 34/177 = 19%, as the quick check below confirms.
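The wizard-curve arithmetic is simple enough to verify directly:

n_pos <- 68     # survivors in the holdout set
n_total <- 177  # holdout set size
c(wizard_all = n_pos / n_total, wizard_half = (n_pos / 2) / n_total)
## wizard_all wizard_half 
##  0.3841808   0.1920904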

data.frame(
  pred = mdl_ridge_preds$Yes, 
  obs = training_20$Survived
) %>%
  yardstick::gain_curve(obs, pred) %>%
  autoplot() +
  labs(
    title = "Ridge Model Gain Curve on Holdout Set"
  )

Lasso

The lasso model had a holdout set accuracy of 0.8305 (a little better than ridge’s 0.8249), sensitivity of 0.6618, specificity of 0.9358, and AUC of 0.8651.

Fit the lasso model by fixing the tuning parameter alpha = 1 (the ridge/lasso mixing percentage, with 1 meaning 100% lasso). I set up the tuning grid for lambda with a little trial and error. Here is the fit summary.

set.seed(1970)
mdl_lasso <- train(
  rcpe,
  training_80[, mdl_vars],
  method = "glmnet",
  family = "binomial",
  tuneGrid = expand.grid(
    .alpha = 1,  # optimize a lasso regression
    .lambda = seq(0, 1, length.out = 101)
  ),
  trControl = train_control,
  metric = "Accuracy"
)
mdl_lasso$bestTune
##   alpha lambda
## 3     1   0.02

The best tuning parameter is lambda = 0.02, so again almost no penalty on the size of the coefficient estimates. You can see the relationship between accuracy and lambda in the plot.

ggplot(mdl_lasso) +
  labs(title = "Lasso Regression Parameter Tuning", x = "lambda")

The most important variable was Sex, and relative importance drops quickly after it; second place goes to NetSurv. In ridge, AgeCohort was first, followed by Sex. Lasso looks more like the straight logistic regression model.

varImp(mdl_lasso)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 25)
## 
##                                    Overall
## Sex_female                        100.0000
## NetSurv                            61.5683
## AgeCohort_gt10                     52.8553
## Sex_female_x_Pclass_X2             38.7697
## Pclass_X3                          38.1470
## TicketNCohort_gt4                  16.3332
## Embarked_S                         14.2456
## Sex_female_x_Pclass_X3             13.5522
## Sex_female_x_Embarked_Q            11.2780
## Sex_female_x_Age_x_AgeCohort_gt10   2.0565
## SibSp                               1.1662
## FarePerPass                         0.9601
## Age                                 0.8307
## Deck_E                              0.0000
## Deck_C                              0.0000
## Pclass_X2                           0.0000
## Sex_female_x_Embarked_S             0.0000
## Employee_X1                         0.0000
## Deck_B                              0.0000
## Deck_F                              0.0000
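Several coefficients were shrunk exactly to zero, which is how lasso performs variable selection. They can be listed directly from the glmnet fit (a quick sketch):

# Predictors whose coefficients lasso set exactly to zero
lasso_coefs <- as.matrix(coef(mdl_lasso$finalModel, s = mdl_lasso$bestTune$lambda))
rownames(lasso_coefs)[lasso_coefs[, 1] == 0]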

Resampling Performance

The accuracy from the confusion matrix is 0.8375 - a little worse than ridge’s 0.8389.

confusionMatrix(mdl_lasso)
## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction   No  Yes
##        No  56.2 10.8
##        Yes  5.5 27.6
##                             
##  Accuracy (average) : 0.8375

Holdout Performance

Here is the model performance on the holdout data set.

predicted_classes <- predict(mdl_lasso, newdata = training_20) 
predicted_probs <- predict(mdl_lasso, newdata = training_20, type = "prob")
confusionMatrix(predicted_classes, training_20$Survived, positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  102  23
##        Yes   7  45
##                                          
##                Accuracy : 0.8305         
##                  95% CI : (0.767, 0.8826)
##     No Information Rate : 0.6158         
##     P-Value [Acc > NIR] : 4.325e-10      
##                                          
##                   Kappa : 0.6252         
##                                          
##  Mcnemar's Test P-Value : 0.00617        
##                                          
##             Sensitivity : 0.6618         
##             Specificity : 0.9358         
##          Pos Pred Value : 0.8654         
##          Neg Pred Value : 0.8160         
##              Prevalence : 0.3842         
##          Detection Rate : 0.2542         
##    Detection Prevalence : 0.2938         
##       Balanced Accuracy : 0.7988         
##                                          
##        'Positive' Class : Yes            
## 

The sensitivity is 0.6618 and the specificity is 0.9358. The accuracy is 0.8305 - a little better than ridge’s 0.8249. The AUC on the holdout set is 0.8651.

mdl_lasso_preds <- predict(mdl_lasso, newdata = training_20, type = "prob")
(mdl_lasso_eval <- evalmod(
  scores = mdl_lasso_preds$Yes,
  labels = training_20$Survived
))
## 
##     === AUCs ===
## 
##      Model name Dataset ID Curve type       AUC
##    1         m1          1        ROC 0.8651511
##    2         m1          1        PRC 0.8394165
## 
## 
##     === Input data ===
## 
##      Model name Dataset ID # of negatives # of positives
##    1         m1          1            109             68
options(yardstick.event_first = FALSE)  # set the second level as success
data.frame(
  pred = mdl_lasso_preds$Yes, 
  obs = training_20$Survived
) %>%
  yardstick::roc_curve(obs, pred) %>%
  autoplot() +
  labs(
    title = "Lasso Model ROC Curve, Test Data",
    subtitle = "AUC = 0.8522"
  )

Here is the gain curve.

68 of the 177 passengers in the holdout set survived.

  • The gain curve encountered 34 survivors (50%) within the first 37 observations (21%).

  • It encountered the 68th survivor on the 159th observation (90%).

data.frame(
  pred = mdl_lasso_preds$Yes, 
  obs = training_20$Survived
) %>%
  yardstick::gain_curve(obs, pred) %>%
  autoplot() +
  labs(
    title = "Lasso Model Gain Curve on Holdout Set"
  )

Elastic Net

The elastic net model had a holdout set accuracy of 0.8249 (the same as ridge), sensitivity of 0.6324, specificity of 0.9450, and AUC of 0.8783.

Fit the elastic net model by tuning both lambda and alpha. I set up the tuning grid by trial and error. Here is the fit summary.

set.seed(1970)
mdl_elnet <- train(
  rcpe,
  training_80[, mdl_vars],
  method = "glmnet",
  family = "binomial",
  tuneGrid = expand.grid(
    .alpha = seq(0, 1, length.out = 11),  # optimize an elastic net regression
    .lambda = seq(0, 1, length.out = 101)
  ),
  trControl = train_control,
  metric = "Accuracy"
)
mdl_elnet$bestTune
##   alpha lambda
## 4     0   0.03

The best tuning parameters are alpha = 0 and lambda = 0.03, so no mixing at all (100% ridge) and almost no penalty on the size of the coefficient estimates.

ggplot(mdl_elnet) +
  labs(title = "Elastic Net Regression Parameter Tuning", x = "lambda")
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 11. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 505 rows containing missing values (geom_point).

This might be a better way of looking at it. Accuracy is maximized with very little regularization. As regularization increases, the ridge model holds up relatively well compared to lasso.

glmnPlot <- plot(mdl_elnet,
                 plotType = "level",
                 cuts = 15,
                 scales = list(x = list(rot = 90, cex = .65)))
update(glmnPlot,
       ylab = "Mixing Percentage\nRidge <---------> Lasso",
       sub = "",
       main = "Area Under the ROC Curve",
       xlab = "Amount of Regularization")

The most important variable was AgeCohort again.

varImp(mdl_elnet)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 25)
## 
##                         Overall
## AgeCohort_gt10          100.000
## Sex_female               94.977
## Sex_female_x_Embarked_Q  88.167
## Sex_female_x_Pclass_X2   87.773
## NetSurv                  65.668
## Pclass_X3                56.825
## TicketNCohort_gt4        56.814
## Employee_X1              45.839
## Sex_female_x_Pclass_X3   43.680
## Pclass_X2                40.108
## Embarked_S               31.662
## Embarked_Q               29.810
## Deck_C                   18.212
## Sex_female_x_Embarked_S  16.876
## Deck_D                   15.659
## SibSp                    14.677
## TicketN                  10.504
## Deck_B                    9.586
## Deck_E                    4.770
## Deck_F                    3.770

Resampling Performance

The accuracy from the confusion matrix is 0.8403 - a little better than ridge.

confusionMatrix(mdl_elnet)
## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction   No  Yes
##        No  57.3 11.6
##        Yes  4.3 26.8
##                             
##  Accuracy (average) : 0.8403

Holdout Performance

Here is the model performance on the holdout data set.

predicted_classes <- predict(mdl_elnet, newdata = training_20) 
predicted_probs <- predict(mdl_elnet, newdata = training_20, type = "prob")
confusionMatrix(predicted_classes, training_20$Survived, positive = "Yes")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  103  25
##        Yes   6  43
##                                           
##                Accuracy : 0.8249          
##                  95% CI : (0.7607, 0.8778)
##     No Information Rate : 0.6158          
##     P-Value [Acc > NIR] : 1.304e-09       
##                                           
##                   Kappa : 0.6093          
##                                           
##  Mcnemar's Test P-Value : 0.001225        
##                                           
##             Sensitivity : 0.6324          
##             Specificity : 0.9450          
##          Pos Pred Value : 0.8776          
##          Neg Pred Value : 0.8047          
##              Prevalence : 0.3842          
##          Detection Rate : 0.2429          
##    Detection Prevalence : 0.2768          
##       Balanced Accuracy : 0.7887          
##                                           
##        'Positive' Class : Yes             
## 

The sensitivity is 0.6324 and the specificity is 0.9450. The accuracy is 0.8249 - same as ridge. The AUC on the holdout set is 0.8783.

mdl_elnet_preds <- predict(mdl_elnet, newdata = training_20, type = "prob")
(mdl_elnet_eval <- evalmod(
  scores = mdl_elnet_preds$Yes,
  labels = training_20$Survived
))
## 
##     === AUCs ===
## 
##      Model name Dataset ID Curve type       AUC
##    1         m1          1        ROC 0.8783055
##    2         m1          1        PRC 0.8510103
## 
## 
##     === Input data ===
## 
##      Model name Dataset ID # of negatives # of positives
##    1         m1          1            109             68
options(yardstick.event_first = FALSE)  # set the second level as success
data.frame(
  pred = mdl_elnet_preds$Yes, 
  obs = training_20$Survived
) %>%
  yardstick::roc_curve(obs, pred) %>%
  autoplot() +
  labs(
    title = "Elastic Net Model ROC Curve, Test Data",
    subtitle = "AUC = 0.8733"
  )

Here is the gain curve.

68 of the 177 passengers in the holdout set survived.

  • The gain curve encountered 34 survivors (50%) within the first 37 observations (21%).

  • It encountered the 68th survivor on the 163rd observation (92%).

data.frame(
  pred = mdl_elnet_preds$Yes, 
  obs = training_20$Survived
) %>%
  yardstick::gain_curve(obs, pred) %>%
  autoplot() +
  labs(
    title = "Elastic Net Model Gain Curve on Holdout Set"
  )

Conclusions

Compare the models with evalmod().

scores_list <- join_scores(
  predict(mdl_ridge, newdata = training_20, type = "prob")$Yes,
  predict(mdl_lasso, newdata = training_20, type = "prob")$Yes,
  predict(mdl_elnet, newdata = training_20, type = "prob")$Yes
)
labels_list <- join_labels(
  training_20$Survived,
  training_20$Survived,
  training_20$Survived
)

pe <- evalmod(
  scores = scores_list, 
  labels = labels_list,
  modnames = c("Ridge", "Lasso", "Elastic Net"),
  posclass = "Yes")

autoplot(pe, "ROC")

pe
## 
##     === AUCs ===
## 
##       Model name Dataset ID Curve type       AUC
##    1       Ridge          1        ROC 0.8768214
##    2       Ridge          1        PRC 0.8490097
##    3       Lasso          1        ROC 0.8651511
##    4       Lasso          1        PRC 0.8394165
##    5 Elastic Net          1        ROC 0.8783055
##    6 Elastic Net          1        PRC 0.8510103
## 
## 
##     === Input data ===
## 
##       Model name Dataset ID # of negatives # of positives
##    1       Ridge          1            109             68
##    2       Lasso          1            109             68
##    3 Elastic Net          1            109             68

The highest AUC was with elastic net. Elastic net also had the best median accuracy.

resamps <- resamples(list('Ridge' = mdl_ridge, 
                          'Lasso' = mdl_lasso,
                          'Elastic Net' = mdl_elnet))
summary(resamps)
## 
## Call:
## summary.resamples(object = resamps)
## 
## Models: Ridge, Lasso, Elastic Net 
## Number of resamples: 10 
## 
## Accuracy 
##                  Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## Ridge       0.8028169 0.8175372 0.8321596 0.8390063 0.8561718 0.9014085    0
## Lasso       0.7746479 0.8175372 0.8380282 0.8375587 0.8561718 0.9154930    0
## Elastic Net 0.8028169 0.8175372 0.8391041 0.8403756 0.8591549 0.9014085    0
## 
## Kappa 
##                 Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## Ridge       0.556250 0.5937978 0.6360042 0.6454318 0.6917002 0.7830642    0
## Lasso       0.507799 0.5937978 0.6437060 0.6460703 0.6941293 0.8181042    0
## Elastic Net 0.556250 0.5937978 0.6385230 0.6482558 0.7000939 0.7830642    0
bwplot(resamps, layout = c(3, 1))
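caret can also test whether these accuracy differences are meaningful: diff() on the resamples object computes pairwise differences with paired t-tests. A quick sketch; with differences this small, expect large p-values.

# Pairwise differences in resampled accuracy across the three models
summary(diff(resamps, metric = "Accuracy"))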

Refit Final Model

I’ll refit the elastic net model to the entire training set and use it to predict on testing.

Here is the fit summary.

set.seed(1970)
mdl_final <- train(
  rcpe,
  training[, mdl_vars],
  method = "glmnet",
  family = "binomial",
  tuneGrid = expand.grid(
    .alpha = 0.0,
    .lambda = 0.03
  ),
  trControl = train_control,
  metric = "Accuracy"
)
mdl_final$bestTune
##   alpha lambda
## 1     0   0.03

AgeCohort continues to be the most important predictor.

varImp(mdl_final)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 25)
## 
##                         Overall
## AgeCohort_gt10          100.000
## Sex_female               97.260
## Sex_female_x_Embarked_Q  96.029
## Sex_female_x_Pclass_X2   87.733
## NetSurv                  67.090
## Pclass_X3                65.482
## TicketNCohort_gt4        59.663
## Employee_X1              52.858
## Pclass_X2                48.294
## Sex_female_x_Pclass_X3   40.090
## Embarked_Q               35.460
## Deck_C                   25.340
## Embarked_S               24.992
## Sex_female_x_Embarked_S  15.785
## SibSp                     9.609
## Deck_E                    7.486
## Deck_B                    7.096
## TicketN                   6.968
## Parch                     4.726
## Deck_D                    3.875

Resampling Performance

The accuracy from the confusion matrix below is 0.8305.

confusionMatrix(mdl_final)
## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction   No  Yes
##        No  56.1 11.4
##        Yes  5.5 26.9
##                             
##  Accuracy (average) : 0.8305

Create Submission File

preds <- predict(mdl_final, newdata = testing) %>% {ifelse(. == "Yes", 1, 0)}
sub_file <- data.frame(PassengerId = testing$PassengerId, Survived = preds)
write.csv(sub_file, file = "./titanic_04_glmnet.csv", row.names = FALSE)
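Before uploading, a quick sanity check that the submission has the expected shape: one row per test passenger, a PassengerId column, and 0/1 Survived values.

# Sanity check on the submission file contents
str(sub_file)
table(sub_file$Survived)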