Part 1: Definition (20 Points)

  1. The Trade-Off Between Prediction Accuracy and Model Interpretability Whenever building a model, generally more complexity allows for a more accurate and comprehensive analysis whether it is classification or regression. However, this typically means the model is generally less interpretable. Typically, we want a model that is more accurate than interpretable, but it is important to make sure you can explain your model, especially in the instance of medical works

  2. Supervised Versus Unsupervised Learning
    Supervised learning is when we have a clear input and output with our variables. The model learns off of previous instances of data, and this method is better for prediction rather than classification. Learning takes place on the computer, and is typically faster and less complex. On the other hand, Unsupervised learning means we only have a clear input, and as such, the model learns off of natural patterns exhibited in the data. This can help find patterns in the data, and as such, works really well for classification.

  3. The Bias-Variance Trade-Off Bias and variance have an inverse correlation, meaning as one increases the other increases. This is important to consider in your models, as more flexible models result in less variance but more bias, and less flexible models result in less bias but more variance. This is why we assess our models by RMSE, as we can generally make good judgement on which model may be more proper for our situation off of it.

  4. Linear Regression versus K-Nearest Neighbors Linear regression is a form of supervised learning. Linear regression works very well with linear data, and is very easy to implement in comparison to other methods. Linear regression however, is typically not very good with non-linear data, and thus is not completely applicable to every situation. KNN is also supervised, but offers an extreme amount of flexibility. The model can be computationally expensive if k is high enough, and is generally prone to overfitting.

  5. Logistic Regression versus LDA versus QDA Logistic regression typically works really for binary classification i.e spam/not spam and has a linear boundary, as does LDA, but they differ in the way that the assumptions for LDA are more strict, and thus, typically result in higher accuracy for LDA. QDA has a quadratic boundary. Because its decision boudnary is not linear, it is typically more flexible than either LDA or LR. It is essentially a compromise between LDA and KNN

library('tidyverse')
library('caret')
library('modelr')
library('readr')
set.seed(10003)

Part 2: Regression (40 Points)

The table below displays catalog-spending data for the first few of 200 randomly selected individuals from a very large (over 20,000 households) data base.1 The variable of particular interest is catalog spending as measured by the Spending Ratio (SpendRat). All of the catalog variables are represented by indicator variables; either the consumer bought and the variable is coded as 1 or the consumer didn’t buy and the variable is coded as 0. The other variables can be viewed as indexes for measuring assets, liquidity, and spending.

We recode all of our binary variables into factors so that we may use it as ordinal data.

catalog<-read_csv('C:\\Users\\diego\\Desktop\\Data Mining\\assignments\\midterm\\catalog.csv')
Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
catalog$CollGifts<-as_factor(catalog$CollGifts)
catalog$BricMortar<-as_factor(catalog$BricMortar)
catalog$MarthaHome<-as_factor(catalog$MarthaHome)
catalog$SunAds<-as_factor(catalog$SunAds)
catalog$ThemeColl<-as_factor(catalog$ThemeColl)
catalog$CustDec<-as_factor(catalog$CustDec)
catalog$RetailKids<-as_factor(catalog$RetailKids)
catalog$TeenWr<-as_factor(catalog$TeenWr)
catalog$Carlovers<-as_factor(catalog$Carlovers)
catalog$CountryColl<-as_factor(catalog$CountryColl)
str(catalog)
Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    200 obs. of  21 variables:
 $ SpendRat   : num  11.8 16.8 11.4 31.3 1.9 ...
 $ Age        : num  0 35 46 41 46 46 46 56 48 54 ...
 $ LenRes     : num  2 3 9 2 7 15 16 31 8 8 ...
 $ Income     : num  3 5 5 2 9 5 4 6 5 5 ...
 $ TotAsset   : num  122 195 123 117 493 138 162 117 119 50 ...
 $ SecAssets  : num  27 36 24 25 105 27 25 27 23 10 ...
 $ ShortLiq   : num  225 220 200 222 310 340 230 300 250 200 ...
 $ LongLiq    : num  422 420 420 419 500 450 430 440 430 420 ...
 $ WlthIdx    : num  286 430 290 279 520 440 360 400 360 230 ...
 $ SpendVol   : num  503 690 600 543 680 440 690 500 610 660 ...
 $ SpenVel    : num  285 570 280 308 100 50 180 10 0 0 ...
 $ CollGifts  : Factor w/ 2 levels "0","1": 2 1 2 2 1 1 2 2 2 1 ...
 $ BricMortar : Factor w/ 2 levels "0","1": 1 2 1 1 2 2 1 2 1 2 ...
 $ MarthaHome : Factor w/ 2 levels "0","1": 1 2 1 1 2 2 1 2 2 1 ...
 $ SunAds     : Factor w/ 2 levels "0","1": 2 1 2 2 1 1 2 1 1 1 ...
 $ ThemeColl  : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 2 2 1 ...
 $ CustDec    : Factor w/ 2 levels "0","1": 2 2 2 1 2 2 1 2 2 1 ...
 $ RetailKids : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 2 1 1 ...
 $ TeenWr     : Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 2 1 2 ...
 $ Carlovers  : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 1 ...
 $ CountryColl: Factor w/ 2 levels "0","1": 2 1 2 2 1 1 2 1 2 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   SpendRat = col_double(),
  ..   Age = col_double(),
  ..   LenRes = col_double(),
  ..   Income = col_double(),
  ..   TotAsset = col_double(),
  ..   SecAssets = col_double(),
  ..   ShortLiq = col_double(),
  ..   LongLiq = col_double(),
  ..   WlthIdx = col_double(),
  ..   SpendVol = col_double(),
  ..   SpenVel = col_double(),
  ..   CollGifts = col_double(),
  ..   BricMortar = col_double(),
  ..   MarthaHome = col_double(),
  ..   SunAds = col_double(),
  ..   ThemeColl = col_double(),
  ..   CustDec = col_double(),
  ..   RetailKids = col_double(),
  ..   TeenWr = col_double(),
  ..   Carlovers = col_double(),
  ..   CountryColl = col_double()
  .. )

Data Cleaning

The goal of this section is to explore the data set and get it ready for analysis. There are no missing values in the data set, but there are some incorrect entries that must be identified and removed before completing the analysis. Income is coded as an ordinal value, ranging from 1 to 12. Age can be regarded as quantitative, and any value less than 18 is invalid. Length of residence (LenRes) is a value ranging from zero to someone’s age. LenRes should not be higher than Age. You should create a simple 1-2 paragraph summary of this section. Be sure to fully explain the reasoning behind transforming any columns and removing any rows. Campbell told me to is not sufficient. Justify why it makes sense not to include any rows whose age is less than 18 or why we shouldn’t use rows in which length of residence is larger than age.

Here, we filter our dataset to have rows in which data follow the following specifications: ages are greater than 18, and LenRes is less than age. age has to be greater than 18 as it is pretty impossible to own a house under 18 under normal circumstances, and LenRes has the specification because you can’t own a house longer than you’ve existed for unless you’re a trust fund baby. This could be accounted for in the dataset but it has been decided it’s not pertinent.

cat.clean<-filter(catalog,Age>=18,LenRes<=Age)

Basic Summary

Provide a basic summary of the cleaned data set. Include a table of univariate statistics to summarize each variable. Choose meaningful summary statistics for each type of variable. You should also include a basic summary of the catalog spending (SpendRat) including an appropriate graphical display.

dim(cat.clean)
[1] 184  21
summary(cat.clean)
    SpendRat            Age            LenRes          Income         TotAsset     
 Min.   :  0.080   Min.   :21.00   Min.   : 0.00   Min.   :1.000   Min.   :  5.00  
 1st Qu.:  6.077   1st Qu.:44.75   1st Qu.: 8.00   1st Qu.:4.000   1st Qu.: 94.75  
 Median : 18.805   Median :53.00   Median :11.00   Median :5.000   Median :150.00  
 Mean   : 43.792   Mean   :54.71   Mean   :14.58   Mean   :4.473   Mean   :184.67  
 3rd Qu.: 50.273   3rd Qu.:63.00   3rd Qu.:19.00   3rd Qu.:5.000   3rd Qu.:222.50  
 Max.   :401.420   Max.   :89.00   Max.   :46.00   Max.   :9.000   Max.   :999.00  
   SecAssets        ShortLiq        LongLiq         WlthIdx         SpendVol    
 Min.   :  0.0   Min.   :160.0   Min.   :400.0   Min.   : 90.0   Min.   :  0.0  
 1st Qu.: 19.0   1st Qu.:210.0   1st Qu.:420.0   1st Qu.:300.0   1st Qu.:532.0  
 Median : 28.0   Median :230.0   Median :430.0   Median :360.0   Median :610.0  
 Mean   : 40.9   Mean   :240.6   Mean   :439.5   Mean   :367.1   Mean   :568.4  
 3rd Qu.: 42.0   3rd Qu.:260.0   3rd Qu.:440.0   3rd Qu.:430.0   3rd Qu.:670.0  
 Max.   :999.0   Max.   :999.0   Max.   :999.0   Max.   :880.0   Max.   :780.0  
    SpenVel      CollGifts BricMortar MarthaHome SunAds  ThemeColl CustDec RetailKids
 Min.   :  0.0   0:94      0:131      0:117      0:105   0:111     0:120   0:119     
 1st Qu.: 40.0   1:90      1: 53      1: 67      1: 79   1: 73     1: 64   1: 65     
 Median :160.0                                                                       
 Mean   :219.5                                                                       
 3rd Qu.:310.0                                                                       
 Max.   :999.0                                                                       
 TeenWr Carlovers CountryColl
 0:89   0:133     0:107      
 1:95   1: 51     1: 77      
                             
                             
                             
                             
cat.clean %>%
  ggplot(aes(x=log(SpendRat))) + geom_histogram()

Modeling

We are interested in developing a model to predict spending ratio. Find a multiple regression model for predicting the amount of money that consumers will spend on catalog shopping, as measured by spending ratio. Your goal is to identify the best model you can. In your write-up be sure to justify your choice of model, discuss any transformation you make to the variables, discuss your model fit, and discuss the effect of the significant predictors using both hypothesis tests and confidence intervals. Remember to check the conditions for inference as you evaluate your models. The data set is much too small to split into training and test data sets, so use cross validation in all your models

## Set up Repeated k-fold Cross Validation
train_control <- trainControl(method="repeatedcv", number=10, repeats=3)

a. Fit a linear model using least squares on the training set, and report the CV error obtained.

lm.fit<-train(log(SpendRat)~., data=cat.clean, trControl=train_control,method='lm')
print(lm.fit)
Linear Regression 

184 samples
 20 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 167, 167, 165, 165, 165, 166, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  1.353788  0.3089562  1.047582

Tuning parameter 'intercept' was held constant at a value of TRUE

b. Fit a ridge regression model on the training set, with \(\lambda\) chosen by cross-validation. Report the CV error obtained.

y_train=log(cat.clean$SpendRat)
X_train=model_matrix(cat.clean,log(SpendRat)~Age+LenRes+Income+TotAsset+SecAssets+ShortLiq+LongLiq+WlthIdx+SpendVol+SpenVel+CollGifts+BricMortar+MarthaHome+SunAds+ThemeColl+CustDec+RetailKids+TeenWr+Carlovers+CountryColl)
parameters <- c(seq(0.1, 2, by =0.1) ,  seq(2, 5, 0.5) , seq(5, 25, 1))
ridge.fit<-train(y=y_train,x=X_train,method='glmnet',trControl=train_control,tuneGrid=expand.grid(alpha=0,lambda = parameters))
Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.
print(ridge.fit)
glmnet 

184 samples
 21 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 166, 166, 165, 165, 167, 165, ... 
Resampling results across tuning parameters:

  lambda  RMSE      Rsquared   MAE      
   0.1    1.267051  0.3366916  1.0119377
   0.2    1.252265  0.3452095  0.9998015
   0.3    1.246772  0.3490908  0.9954463
   0.4    1.244588  0.3512467  0.9943238
   0.5    1.244173  0.3525806  0.9944682
   0.6    1.244855  0.3534525  0.9953885
   0.7    1.246272  0.3540359  0.9971209
   0.8    1.248201  0.3544200  0.9994983
   0.9    1.250498  0.3546429  1.0020119
   1.0    1.253046  0.3547497  1.0045020
   1.1    1.255770  0.3547620  1.0070238
   1.2    1.258615  0.3546985  1.0096574
   1.3    1.261542  0.3545660  1.0123627
   1.4    1.264510  0.3543813  1.0150804
   1.5    1.267494  0.3541502  1.0177404
   1.6    1.270489  0.3538833  1.0203233
   1.7    1.273477  0.3535828  1.0227834
   1.8    1.276424  0.3532558  1.0251233
   1.9    1.279358  0.3529077  1.0273822
   2.0    1.282242  0.3525407  1.0295746
   2.5    1.296000  0.3505258  1.0399743
   3.0    1.308477  0.3484035  1.0494215
   3.5    1.319693  0.3463194  1.0577313
   4.0    1.329801  0.3443176  1.0654698
   4.5    1.338892  0.3424487  1.0731297
   5.0    1.347120  0.3407101  1.0800273
   6.0    1.361405  0.3376191  1.0924368
   7.0    1.373374  0.3349786  1.1028297
   8.0    1.383498  0.3327413  1.1115573
   9.0    1.392224  0.3307994  1.1190655
  10.0    1.399794  0.3291209  1.1254942
  11.0    1.406425  0.3276583  1.1311264
  12.0    1.412285  0.3263728  1.1360674
  13.0    1.417495  0.3252379  1.1403968
  14.0    1.422162  0.3242282  1.1442105
  15.0    1.426381  0.3233129  1.1476060
  16.0    1.430212  0.3224853  1.1506694
  17.0    1.433677  0.3217527  1.1534134
  18.0    1.436858  0.3210735  1.1559043
  19.0    1.439782  0.3204541  1.1582330
  20.0    1.442458  0.3198946  1.1603602
  21.0    1.444955  0.3193679  1.1623622
  22.0    1.447243  0.3188948  1.1641816
  23.0    1.449396  0.3184437  1.1658883
  24.0    1.451380  0.3180365  1.1674491
  25.0    1.453249  0.3176497  1.1689215

Tuning parameter 'alpha' was held constant at a value of 0
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 0 and lambda = 0.5.

c. Fit a lasso model on the training set, with \(\lambda\) chosen by cross-validation. Report the CV error obtained, along with the number of non-zero coefficient estimates.

y_train=log(cat.clean$SpendRat)
X_train=model_matrix(cat.clean,log(SpendRat)~Age+LenRes+Income+TotAsset+SecAssets+ShortLiq+LongLiq+WlthIdx+SpendVol+SpenVel+CollGifts+BricMortar+MarthaHome+SunAds+ThemeColl+CustDec+RetailKids+TeenWr+Carlovers+CountryColl)
parameters <- c(seq(0.1, 2, by =0.1) ,  seq(2, 5, 0.5) , seq(5, 25, 1))
lasso.fit<-train(y=y_train,x=X_train,method='glmnet',trControl=train_control,tuneGrid=expand.grid(alpha=1,lambda = parameters))
Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.There were missing values in resampled performance measures.Setting row names on a tibble is deprecated.
print(lasso.fit)
glmnet 

184 samples
 21 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 166, 166, 167, 166, 167, 165, ... 
Resampling results across tuning parameters:

  lambda  RMSE      Rsquared    MAE      
   0.1    1.252745  0.34116209  0.9993008
   0.2    1.301890  0.32135334  1.0412411
   0.3    1.351439  0.31181029  1.0803331
   0.4    1.415902  0.27410827  1.1336850
   0.5    1.474425  0.22194382  1.1817590
   0.6    1.511403  0.04943936  1.2115716
   0.7    1.513084         NaN  1.2133441
   0.8    1.513084         NaN  1.2133441
   0.9    1.513084         NaN  1.2133441
   1.0    1.513084         NaN  1.2133441
   1.1    1.513084         NaN  1.2133441
   1.2    1.513084         NaN  1.2133441
   1.3    1.513084         NaN  1.2133441
   1.4    1.513084         NaN  1.2133441
   1.5    1.513084         NaN  1.2133441
   1.6    1.513084         NaN  1.2133441
   1.7    1.513084         NaN  1.2133441
   1.8    1.513084         NaN  1.2133441
   1.9    1.513084         NaN  1.2133441
   2.0    1.513084         NaN  1.2133441
   2.5    1.513084         NaN  1.2133441
   3.0    1.513084         NaN  1.2133441
   3.5    1.513084         NaN  1.2133441
   4.0    1.513084         NaN  1.2133441
   4.5    1.513084         NaN  1.2133441
   5.0    1.513084         NaN  1.2133441
   6.0    1.513084         NaN  1.2133441
   7.0    1.513084         NaN  1.2133441
   8.0    1.513084         NaN  1.2133441
   9.0    1.513084         NaN  1.2133441
  10.0    1.513084         NaN  1.2133441
  11.0    1.513084         NaN  1.2133441
  12.0    1.513084         NaN  1.2133441
  13.0    1.513084         NaN  1.2133441
  14.0    1.513084         NaN  1.2133441
  15.0    1.513084         NaN  1.2133441
  16.0    1.513084         NaN  1.2133441
  17.0    1.513084         NaN  1.2133441
  18.0    1.513084         NaN  1.2133441
  19.0    1.513084         NaN  1.2133441
  20.0    1.513084         NaN  1.2133441
  21.0    1.513084         NaN  1.2133441
  22.0    1.513084         NaN  1.2133441
  23.0    1.513084         NaN  1.2133441
  24.0    1.513084         NaN  1.2133441
  25.0    1.513084         NaN  1.2133441

Tuning parameter 'alpha' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 1 and lambda = 0.1.

d. Fit a PCR model on the training set, with M chosen by cross-validation. Report the CV error obtained, along with the value of M selected by cross-validation.

pcr.fit<-train(log(SpendRat)~., data=cat.clean, trControl=train_control,tuneLength=ncol(cat.clean),method='pcr')
1 package is needed for this model and is not installed. (pls). Would you like to try to install it now?
1: yes
2: no
1
Installing package into 㤼㸱C:/Users/diego/Documents/R/win-library/3.6㤼㸲
(as 㤼㸱lib㤼㸲 is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/pls_2.7-2.zip'
Content type 'application/zip' length 1230527 bytes (1.2 MB)
downloaded 1.2 MB
package ‘pls’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\diego\AppData\Local\Temp\Rtmp2NTT8O\downloaded_packages
plot(pcr.fit)

pcr.fit$bestTune
print(pcr.fit)
Principal Component Analysis 

184 samples
 20 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 166, 166, 166, 165, 166, 165, ... 
Resampling results across tuning parameters:

  ncomp  RMSE      Rsquared    MAE     
   1     1.503001  0.05124398  1.223822
   2     1.500916  0.05030193  1.225134
   3     1.510551  0.06080604  1.240493
   4     1.507454  0.04496214  1.229418
   5     1.508424  0.04632181  1.229763
   6     1.547384  0.04228428  1.255670
   7     1.553237  0.03493156  1.261259
   8     1.602087  0.03715311  1.285698
   9     1.679544  0.04308677  1.309618
  10     1.713405  0.03979775  1.315119
  11     1.607684  0.10911150  1.246436
  12     1.489018  0.23842172  1.137157
  13     1.435726  0.30231337  1.066561
  14     1.436901  0.30390130  1.065393
  15     1.361983  0.32455326  1.035026
  16     1.360091  0.32414467  1.038810
  17     1.370272  0.32137060  1.047106
  18     1.394330  0.32325615  1.056278
  19     1.398918  0.31781916  1.059788

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 16.

e. Fit a PLS model on the training set, with M chosen by cross-validation. Report the CV error obtained, along with the value of M selected by cross-validation.

pls.fit<-train(log(SpendRat)~., data=cat.clean, trControl=train_control,tuneLength=ncol(cat.clean),method='pls')
plot(pls.fit)

pls.fit$bestTune
print(pls.fit)
Partial Least Squares 

184 samples
 20 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times) 
Summary of sample sizes: 166, 165, 164, 166, 165, 166, ... 
Resampling results across tuning parameters:

  ncomp  RMSE      Rsquared    MAE     
   1     1.514124  0.05180940  1.234679
   2     1.519536  0.04003535  1.236803
   3     1.521734  0.03980321  1.238881
   4     1.517557  0.03787143  1.227074
   5     1.589556  0.02953898  1.263265
   6     1.615449  0.03619017  1.282169
   7     1.596234  0.04111200  1.278422
   8     1.777183  0.04538481  1.323395
   9     2.020306  0.06903793  1.377884
  10     1.485052  0.21801695  1.129988
  11     1.359647  0.29279979  1.039547
  12     1.366636  0.30431906  1.036349
  13     1.370391  0.29791756  1.047712
  14     1.396472  0.29764361  1.055757
  15     1.394893  0.29916053  1.053718
  16     1.391851  0.29975506  1.052253
  17     1.391739  0.29968962  1.052281
  18     1.391706  0.29971672  1.052274
  19     1.391703  0.29971739  1.052272

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 11.

f. Comment on the results obtained. How accurately can we predict the Spending Ratio? Is there much difference among the CV errors resulting from these five approaches?

Model RMSE
Lasso Regression 1.2527448
Ridge Regression 1.2441734
Principle Components Regression 1.3600913
Partial Least Squares Regression 1.3596472
Linear Regression 1.3537877

Best model was Ridge Regression with \(\lambda\) = 0.1 with a RMSE of 1.2441734. The worst performing model was the Principal components model with a RMSE of 1.5030006, 1.5009159, 1.5105507, 1.507454, 1.5084236, 1.5473836, 1.5532372, 1.6020867, 1.6795437, 1.7134054, 1.6076836, 1.4890181, 1.4357262, 1.4369009, 1.3619829, 1.3600913, 1.3702716, 1.3943295, 1.3989184. There’s not a substantial amount of difference of CV RMSE across the five models.

Part 3: Classification (40 Points)

In this problem, you will develop a model to predict whether income exceeds $50K/yr based on census data.

a. Use the code in Blackboard to create the adult data set.

adult<-read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", col_names=FALSE, na='?')
Parsed with column specification:
cols(
  X1 = col_double(),
  X2 = col_character(),
  X3 = col_double(),
  X4 = col_character(),
  X5 = col_double(),
  X6 = col_character(),
  X7 = col_character(),
  X8 = col_character(),
  X9 = col_character(),
  X10 = col_character(),
  X11 = col_double(),
  X12 = col_double(),
  X13 = col_double(),
  X14 = col_character(),
  X15 = col_character()
)
names(adult)<-c("age","workclass","fnlwgt","education","education_num","marital_status","occupation","relationship","race","sex","capital_gain","capital_loss","hours_per_week","native_country","income")
set.seed(1303)
adult$fnlwgt<-NULL
adult$workclass<-as_factor(adult$workclass)
adult$education<-as_factor(adult$education)
adult$marital_status<-as_factor(adult$marital_status)
adult$occupation<-as_factor(adult$occupation)
adult$relationship<-as_factor(adult$relationship)
adult$race<-as_factor(adult$race)
adult$sex<-as_factor(adult$sex)
adult$native_country<-as_factor(adult$native_country)
adult$income<-as_factor(adult$income)
adult<-na.omit(adult)
y=adult$income
X=model_matrix(adult,income~.)
X$`(Intercept)`<-NULL
adult2<-as_tibble(cbind(adult$income,X),.name_repair = "unique")
names(adult2)[1]<-"income"
attach(adult2)
The following objects are masked from adult2 (pos = 4):

    age, capital_gain, capital_loss, education_num, education10th,
    education11th, education12th, education1st-4th, education5th-6th,
    education7th-8th, education9th, educationAssoc-acdm, educationAssoc-voc,
    educationDoctorate, educationHS-grad, educationMasters, educationPreschool,
    educationProf-school, educationSome-college, hours_per_week, income,
    marital_statusDivorced, marital_statusMarried-AF-spouse,
    marital_statusMarried-civ-spouse, marital_statusMarried-spouse-absent,
    marital_statusSeparated, marital_statusWidowed, native_countryCambodia,
    native_countryCanada, native_countryChina, native_countryColumbia,
    native_countryCuba, native_countryDominican-Republic, native_countryEcuador,
    native_countryEl-Salvador, native_countryEngland, native_countryFrance,
    native_countryGermany, native_countryGreece, native_countryGuatemala,
    native_countryHaiti, native_countryHoland-Netherlands,
    native_countryHonduras, native_countryHong, native_countryHungary,
    native_countryIndia, native_countryIran, native_countryIreland,
    native_countryItaly, native_countryJamaica, native_countryJapan,
    native_countryLaos, native_countryMexico, native_countryNicaragua,
    native_countryOutlying-US(Guam-USVI-etc), native_countryPeru,
    native_countryPhilippines, native_countryPoland, native_countryPortugal,
    native_countryPuerto-Rico, native_countryScotland, native_countrySouth,
    native_countryTaiwan, native_countryThailand, native_countryTrinadad&Tobago,
    native_countryVietnam, native_countryYugoslavia, occupationArmed-Forces,
    occupationCraft-repair, occupationExec-managerial,
    occupationFarming-fishing, occupationHandlers-cleaners,
    occupationMachine-op-inspct, occupationOther-service,
    occupationPriv-house-serv, occupationProf-specialty,
    occupationProtective-serv, occupationSales, occupationTech-support,
    occupationTransport-moving, raceAmer-Indian-Eskimo, raceAsian-Pac-Islander,
    raceBlack, raceOther, relationshipHusband, relationshipOther-relative,
    relationshipOwn-child, relationshipUnmarried, relationshipWife, sexFemale,
    workclassFederal-gov, workclassLocal-gov, workclassNever-worked,
    workclassPrivate, workclassSelf-emp-inc, workclassSelf-emp-not-inc,
    workclassWithout-pay
summary(adult2)
   income           age        workclassSelf-emp-not-inc workclassPrivate
 <=50K:22654   Min.   :17.00   Min.   :0.00000           Min.   :0.0000  
 >50K : 7508   1st Qu.:28.00   1st Qu.:0.00000           1st Qu.:0.0000  
               Median :37.00   Median :0.00000           Median :1.0000  
               Mean   :38.44   Mean   :0.08285           Mean   :0.7389  
               3rd Qu.:47.00   3rd Qu.:0.00000           3rd Qu.:1.0000  
               Max.   :90.00   Max.   :1.00000           Max.   :1.0000  
 workclassFederal-gov workclassLocal-gov workclassSelf-emp-inc workclassWithout-pay
 Min.   :0.00000      Min.   :0.00000    Min.   :0.00000       Min.   :0.0000000   
 1st Qu.:0.00000      1st Qu.:0.00000    1st Qu.:0.00000       1st Qu.:0.0000000   
 Median :0.00000      Median :0.00000    Median :0.00000       Median :0.0000000   
 Mean   :0.03126      Mean   :0.06853    Mean   :0.03561       Mean   :0.0004642   
 3rd Qu.:0.00000      3rd Qu.:0.00000    3rd Qu.:0.00000       3rd Qu.:0.0000000   
 Max.   :1.00000      Max.   :1.00000    Max.   :1.00000       Max.   :1.0000000   
 workclassNever-worked educationHS-grad education11th     educationMasters 
 Min.   :0             Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0             1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :0             Median :0.0000   Median :0.00000   Median :0.00000  
 Mean   :0             Mean   :0.3262   Mean   :0.03475   Mean   :0.05394  
 3rd Qu.:0             3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :0             Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
  education9th     educationSome-college educationAssoc-acdm educationAssoc-voc
 Min.   :0.00000   Min.   :0.0000        Min.   :0.00000     Min.   :0.00000   
 1st Qu.:0.00000   1st Qu.:0.0000        1st Qu.:0.00000     1st Qu.:0.00000   
 Median :0.00000   Median :0.0000        Median :0.00000     Median :0.00000   
 Mean   :0.01509   Mean   :0.2214        Mean   :0.03342     Mean   :0.04333   
 3rd Qu.:0.00000   3rd Qu.:0.0000        3rd Qu.:0.00000     3rd Qu.:0.00000   
 Max.   :1.00000   Max.   :1.0000        Max.   :1.00000     Max.   :1.00000   
 education7th-8th  educationDoctorate educationProf-school education5th-6th  
 Min.   :0.00000   Min.   :0.00000    Min.   :0.00000      Min.   :0.000000  
 1st Qu.:0.00000   1st Qu.:0.00000    1st Qu.:0.00000      1st Qu.:0.000000  
 Median :0.00000   Median :0.00000    Median :0.00000      Median :0.000000  
 Mean   :0.01847   Mean   :0.01243    Mean   :0.01797      Mean   :0.009548  
 3rd Qu.:0.00000   3rd Qu.:0.00000    3rd Qu.:0.00000      3rd Qu.:0.000000  
 Max.   :1.00000   Max.   :1.00000    Max.   :1.00000      Max.   :1.000000  
 education10th     education1st-4th   educationPreschool education12th    education_num  
 Min.   :0.00000   Min.   :0.000000   Min.   :0.000000   Min.   :0.0000   Min.   : 1.00  
 1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.: 9.00  
 Median :0.00000   Median :0.000000   Median :0.000000   Median :0.0000   Median :10.00  
 Mean   :0.02719   Mean   :0.005006   Mean   :0.001492   Mean   :0.0125   Mean   :10.12  
 3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.0000   3rd Qu.:13.00  
 Max.   :1.00000   Max.   :1.000000   Max.   :1.000000   Max.   :1.0000   Max.   :16.00  
 marital_statusMarried-civ-spouse marital_statusDivorced
 Min.   :0.0000                   Min.   :0.0000        
 1st Qu.:0.0000                   1st Qu.:0.0000        
 Median :0.0000                   Median :0.0000        
 Mean   :0.4663                   Mean   :0.1397        
 3rd Qu.:1.0000                   3rd Qu.:0.0000        
 Max.   :1.0000                   Max.   :1.0000        
 marital_statusMarried-spouse-absent marital_statusSeparated
 Min.   :0.00000                     Min.   :0.00000        
 1st Qu.:0.00000                     1st Qu.:0.00000        
 Median :0.00000                     Median :0.00000        
 Mean   :0.01227                     Mean   :0.03113        
 3rd Qu.:0.00000                     3rd Qu.:0.00000        
 Max.   :1.00000                     Max.   :1.00000        
 marital_statusMarried-AF-spouse marital_statusWidowed occupationExec-managerial
 Min.   :0.0000000               Min.   :0.00000       Min.   :0.0000           
 1st Qu.:0.0000000               1st Qu.:0.00000       1st Qu.:0.0000           
 Median :0.0000000               Median :0.00000       Median :0.0000           
 Mean   :0.0006962               Mean   :0.02742       Mean   :0.1324           
 3rd Qu.:0.0000000               3rd Qu.:0.00000       3rd Qu.:0.0000           
 Max.   :1.0000000               Max.   :1.00000       Max.   :1.0000           
 occupationHandlers-cleaners occupationProf-specialty occupationOther-service
 Min.   :0.00000             Min.   :0.0000           Min.   :0.0000         
 1st Qu.:0.00000             1st Qu.:0.0000           1st Qu.:0.0000         
 Median :0.00000             Median :0.0000           Median :0.0000         
 Mean   :0.04476             Mean   :0.1339           Mean   :0.1065         
 3rd Qu.:0.00000             3rd Qu.:0.0000           3rd Qu.:0.0000         
 Max.   :1.00000             Max.   :1.0000           Max.   :1.0000         
 occupationSales  occupationCraft-repair occupationTransport-moving
 Min.   :0.0000   Min.   :0.0000         Min.   :0.00000           
 1st Qu.:0.0000   1st Qu.:0.0000         1st Qu.:0.00000           
 Median :0.0000   Median :0.0000         Median :0.00000           
 Mean   :0.1188   Mean   :0.1336         Mean   :0.05212           
 3rd Qu.:0.0000   3rd Qu.:0.0000         3rd Qu.:0.00000           
 Max.   :1.0000   Max.   :1.0000         Max.   :1.00000           
 occupationFarming-fishing occupationMachine-op-inspct occupationTech-support
 Min.   :0.00000           Min.   :0.00000             Min.   :0.00000       
 1st Qu.:0.00000           1st Qu.:0.00000             1st Qu.:0.00000       
 Median :0.00000           Median :0.00000             Median :0.00000       
 Mean   :0.03279           Mean   :0.06518             Mean   :0.03024       
 3rd Qu.:0.00000           3rd Qu.:0.00000             3rd Qu.:0.00000       
 Max.   :1.00000           Max.   :1.00000             Max.   :1.00000       
 occupationProtective-serv occupationArmed-Forces occupationPriv-house-serv
 Min.   :0.00000           Min.   :0.0000000      Min.   :0.000000         
 1st Qu.:0.00000           1st Qu.:0.0000000      1st Qu.:0.000000         
 Median :0.00000           Median :0.0000000      Median :0.000000         
 Mean   :0.02135           Mean   :0.0002984      Mean   :0.004741         
 3rd Qu.:0.00000           3rd Qu.:0.0000000      3rd Qu.:0.000000         
 Max.   :1.00000           Max.   :1.0000000      Max.   :1.000000         
 relationshipHusband relationshipWife  relationshipOwn-child relationshipUnmarried
 Min.   :0.0000      Min.   :0.00000   Min.   :0.0000        Min.   :0.0000       
 1st Qu.:0.0000      1st Qu.:0.00000   1st Qu.:0.0000        1st Qu.:0.0000       
 Median :0.0000      Median :0.00000   Median :0.0000        Median :0.0000       
 Mean   :0.4132      Mean   :0.04661   Mean   :0.1481        Mean   :0.1065       
 3rd Qu.:1.0000      3rd Qu.:0.00000   3rd Qu.:0.0000        3rd Qu.:0.0000       
 Max.   :1.0000      Max.   :1.00000   Max.   :1.0000        Max.   :1.0000       
 relationshipOther-relative   raceBlack      raceAsian-Pac-Islander
 Min.   :0.00000            Min.   :0.0000   Min.   :0.00000       
 1st Qu.:0.00000            1st Qu.:0.0000   1st Qu.:0.00000       
 Median :0.00000            Median :0.0000   Median :0.00000       
 Mean   :0.02947            Mean   :0.0934   Mean   :0.02967       
 3rd Qu.:0.00000            3rd Qu.:0.0000   3rd Qu.:0.00000       
 Max.   :1.00000            Max.   :1.0000   Max.   :1.00000       
 raceAmer-Indian-Eskimo   raceOther          sexFemale       capital_gain  
 Min.   :0.000000       Min.   :0.000000   Min.   :0.0000   Min.   :    0  
 1st Qu.:0.000000       1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.:    0  
 Median :0.000000       Median :0.000000   Median :0.0000   Median :    0  
 Mean   :0.009482       Mean   :0.007659   Mean   :0.3243   Mean   : 1092  
 3rd Qu.:0.000000       3rd Qu.:0.000000   3rd Qu.:1.0000   3rd Qu.:    0  
 Max.   :1.000000       Max.   :1.000000   Max.   :1.0000   Max.   :99999  
  capital_loss     hours_per_week  native_countryCuba native_countryJamaica
 Min.   :   0.00   Min.   : 1.00   Min.   :0.00000    Min.   :0.000000     
 1st Qu.:   0.00   1st Qu.:40.00   1st Qu.:0.00000    1st Qu.:0.000000     
 Median :   0.00   Median :40.00   Median :0.00000    Median :0.000000     
 Mean   :  88.37   Mean   :40.93   Mean   :0.00305    Mean   :0.002652     
 3rd Qu.:   0.00   3rd Qu.:45.00   3rd Qu.:0.00000    3rd Qu.:0.000000     
 Max.   :4356.00   Max.   :99.00   Max.   :1.00000    Max.   :1.000000     
 native_countryIndia native_countryMexico native_countrySouth native_countryPuerto-Rico
 Min.   :0.000000    Min.   :0.00000      Min.   :0.000000    Min.   :0.000000         
 1st Qu.:0.000000    1st Qu.:0.00000      1st Qu.:0.000000    1st Qu.:0.000000         
 Median :0.000000    Median :0.00000      Median :0.000000    Median :0.000000         
 Mean   :0.003315    Mean   :0.02022      Mean   :0.002354    Mean   :0.003614         
 3rd Qu.:0.000000    3rd Qu.:0.00000      3rd Qu.:0.000000    3rd Qu.:0.000000         
 Max.   :1.000000    Max.   :1.00000      Max.   :1.000000    Max.   :1.000000         
 native_countryHonduras native_countryEngland native_countryCanada native_countryGermany
 Min.   :0.0000000      Min.   :0.000000      Min.   :0.000000     Min.   :0.000000     
 1st Qu.:0.0000000      1st Qu.:0.000000      1st Qu.:0.000000     1st Qu.:0.000000     
 Median :0.0000000      Median :0.000000      Median :0.000000     Median :0.000000     
 Mean   :0.0003979      Mean   :0.002851      Mean   :0.003547     Mean   :0.004244     
 3rd Qu.:0.0000000      3rd Qu.:0.000000      3rd Qu.:0.000000     3rd Qu.:0.000000     
 Max.   :1.0000000      Max.   :1.000000      Max.   :1.000000     Max.   :1.000000     
 native_countryIran native_countryPhilippines native_countryItaly native_countryPoland
 Min.   :0.000000   Min.   :0.000000          Min.   :0.000000    Min.   :0.000000    
 1st Qu.:0.000000   1st Qu.:0.000000          1st Qu.:0.000000    1st Qu.:0.000000    
 Median :0.000000   Median :0.000000          Median :0.000000    Median :0.000000    
 Mean   :0.001393   Mean   :0.006233          Mean   :0.002254    Mean   :0.001857    
 3rd Qu.:0.000000   3rd Qu.:0.000000          3rd Qu.:0.000000    3rd Qu.:0.000000    
 Max.   :1.000000   Max.   :1.000000          Max.   :1.000000    Max.   :1.000000    
 native_countryColumbia native_countryCambodia native_countryThailand
 Min.   :0.000000       Min.   :0.0000000      Min.   :0.0000000     
 1st Qu.:0.000000       1st Qu.:0.0000000      1st Qu.:0.0000000     
 Median :0.000000       Median :0.0000000      Median :0.0000000     
 Mean   :0.001857       Mean   :0.0005968      Mean   :0.0005636     
 3rd Qu.:0.000000       3rd Qu.:0.0000000      3rd Qu.:0.0000000     
 Max.   :1.000000       Max.   :1.0000000      Max.   :1.0000000     
 native_countryEcuador native_countryLaos  native_countryTaiwan native_countryHaiti
 Min.   :0.0000000     Min.   :0.0000000   Min.   :0.000000     Min.   :0.000000   
 1st Qu.:0.0000000     1st Qu.:0.0000000   1st Qu.:0.000000     1st Qu.:0.000000   
 Median :0.0000000     Median :0.0000000   Median :0.000000     Median :0.000000   
 Mean   :0.0008952     Mean   :0.0005636   Mean   :0.001393     Mean   :0.001393   
 3rd Qu.:0.0000000     3rd Qu.:0.0000000   3rd Qu.:0.000000     3rd Qu.:0.000000   
 Max.   :1.0000000     Max.   :1.0000000   Max.   :1.000000     Max.   :1.000000   
 native_countryPortugal native_countryDominican-Republic native_countryEl-Salvador
 Min.   :0.000000       Min.   :0.000000                 Min.   :0.000000         
 1st Qu.:0.000000       1st Qu.:0.000000                 1st Qu.:0.000000         
 Median :0.000000       Median :0.000000                 Median :0.000000         
 Mean   :0.001127       Mean   :0.002221                 Mean   :0.003315         
 3rd Qu.:0.000000       3rd Qu.:0.000000                 3rd Qu.:0.000000         
 Max.   :1.000000       Max.   :1.000000                 Max.   :1.000000         
 native_countryFrance native_countryGuatemala native_countryChina native_countryJapan
 Min.   :0.0000000    Min.   :0.000000        Min.   :0.000000    Min.   :0.000000   
 1st Qu.:0.0000000    1st Qu.:0.000000        1st Qu.:0.000000    1st Qu.:0.000000   
 Median :0.0000000    Median :0.000000        Median :0.000000    Median :0.000000   
 Mean   :0.0008952    Mean   :0.002089        Mean   :0.002254    Mean   :0.001956   
 3rd Qu.:0.0000000    3rd Qu.:0.000000        3rd Qu.:0.000000    3rd Qu.:0.000000   
 Max.   :1.0000000    Max.   :1.000000        Max.   :1.000000    Max.   :1.000000   
 native_countryYugoslavia native_countryPeru  native_countryOutlying-US(Guam-USVI-etc)
 Min.   :0.0000000        Min.   :0.0000000   Min.   :0.0000000                       
 1st Qu.:0.0000000        1st Qu.:0.0000000   1st Qu.:0.0000000                       
 Median :0.0000000        Median :0.0000000   Median :0.0000000                       
 Mean   :0.0005305        Mean   :0.0009946   Mean   :0.0004642                       
 3rd Qu.:0.0000000        3rd Qu.:0.0000000   3rd Qu.:0.0000000                       
 Max.   :1.0000000        Max.   :1.0000000   Max.   :1.0000000                       
 native_countryScotland native_countryTrinadad&Tobago native_countryGreece
 Min.   :0.0000000      Min.   :0.0000000             Min.   :0.0000000   
 1st Qu.:0.0000000      1st Qu.:0.0000000             1st Qu.:0.0000000   
 Median :0.0000000      Median :0.0000000             Median :0.0000000   
 Mean   :0.0003647      Mean   :0.0005968             Mean   :0.0009615   
 3rd Qu.:0.0000000      3rd Qu.:0.0000000             3rd Qu.:0.0000000   
 Max.   :1.0000000      Max.   :1.0000000             Max.   :1.0000000   
 native_countryNicaragua native_countryVietnam native_countryHong  native_countryIreland
 Min.   :0.000000        Min.   :0.000000      Min.   :0.0000000   Min.   :0.0000000    
 1st Qu.:0.000000        1st Qu.:0.000000      1st Qu.:0.0000000   1st Qu.:0.0000000    
 Median :0.000000        Median :0.000000      Median :0.0000000   Median :0.0000000    
 Mean   :0.001094        Mean   :0.002122      Mean   :0.0006299   Mean   :0.0007957    
 3rd Qu.:0.000000        3rd Qu.:0.000000      3rd Qu.:0.0000000   3rd Qu.:0.0000000    
 Max.   :1.000000        Max.   :1.000000      Max.   :1.0000000   Max.   :1.0000000    
 native_countryHungary native_countryHoland-Netherlands
 Min.   :0.000000      Min.   :0.00e+00                
 1st Qu.:0.000000      1st Qu.:0.00e+00                
 Median :0.000000      Median :0.00e+00                
 Mean   :0.000431      Mean   :3.32e-05                
 3rd Qu.:0.000000      3rd Qu.:0.00e+00                
 Max.   :1.000000      Max.   :1.00e+00                

b. Explore the data graphically in order to investigate the association between income and the other features. Which of the other features seem most likely to be useful in predicting income? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings.

Using random forest method for feature selection.

##Generate a small sample of the data set to investigate which variables are closely associated with income
k=sample(nrow(adult2),nrow(adult2)*0.08)
library('e1071')
##Use random forest model to calculate variable importance
rf=train(income~.,data=adult2[k,])

It appears my computer is not powerful enough to run this. c. Split the data into an 80% training set and a 20% test set. Set the seed at 1303.

##Create training indicator vector
set.seed(1303)
inTrain <- createDataPartition(adult2$income, p=0.8, list=FALSE)
##Tabulate training and test data sets
train=adult2[inTrain,]
test=adult2[-inTrain,]
dim(adult2)
[1] 30162    97
dim(train)
[1] 24131    97
dim(test)
[1] 6031   97

d. Perform LDA on the training data in order to predict income using the variables that seemed most associated with income in (b). What is the test error of the model obtained?

##Train Model
lda.fit=train(income~age+`marital_statusMarried-civ-spouse`+capital_gain+education_num+hours_per_week+relationshipHusband+capital_loss+`occupationExec-managerial`+workclassPrivate,data=train,method='lda',trControl = trainControl(method = "cv"))
lda.fit
Linear Discriminant Analysis 

24131 samples
    9 predictor
    2 classes: '<=50K', '>50K' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 21719, 21719, 21718, 21718, 21718, 21717, ... 
Resampling results:

  Accuracy   Kappa    
  0.8317514  0.5150774
##Calculate Predictions
pred.lda<-predict(lda.fit,test)
##Estimate Accuracy
confusionMatrix(pred.lda,test$income)
Confusion Matrix and Statistics

          Reference
Prediction <=50K >50K
     <=50K  4181  686
     >50K    349  815
                                          
               Accuracy : 0.8284          
                 95% CI : (0.8186, 0.8378)
    No Information Rate : 0.7511          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5037          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.9230          
            Specificity : 0.5430          
         Pos Pred Value : 0.8591          
         Neg Pred Value : 0.7002          
             Prevalence : 0.7511          
         Detection Rate : 0.6933          
   Detection Prevalence : 0.8070          
      Balanced Accuracy : 0.7330          
                                          
       'Positive' Class : <=50K           
                                          

e. Perform QDA on the training data in order to predict income using the variables that seemed most associated with income in (b). What is the test error of the model obtained?

##Train Model
qda.fit=train(income~age+`marital_statusMarried-civ-spouse`+capital_gain+education_num+hours_per_week+relationshipHusband+capital_loss+`occupationExec-managerial`+workclassPrivate,data=train,method='qda',trControl = trainControl(method = "cv"))
qda.fit
Quadratic Discriminant Analysis 

24131 samples
    9 predictor
    2 classes: '<=50K', '>50K' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 21717, 21718, 21719, 21718, 21717, 21718, ... 
Resampling results:

  Accuracy   Kappa    
  0.7985565  0.3967202
##Calculate Predictions
pred.qda<-predict(qda.fit,test)
##Estimate Accuracy
confusionMatrix(pred.qda,test$income)
Confusion Matrix and Statistics

          Reference
Prediction <=50K >50K
     <=50K  4142  858
     >50K    388  643
                                         
               Accuracy : 0.7934         
                 95% CI : (0.783, 0.8036)
    No Information Rate : 0.7511         
    P-Value [Acc > NIR] : 5.204e-15      
                                         
                  Kappa : 0.3828         
                                         
 Mcnemar's Test P-Value : < 2.2e-16      
                                         
            Sensitivity : 0.9143         
            Specificity : 0.4284         
         Pos Pred Value : 0.8284         
         Neg Pred Value : 0.6237         
             Prevalence : 0.7511         
         Detection Rate : 0.6868         
   Detection Prevalence : 0.8290         
      Balanced Accuracy : 0.6714         
                                         
       'Positive' Class : <=50K          
                                         

f. Perform logistic regression on the training data in order to predict income using the variables that seemed most associated with income in (b). What is the test error of the model obtained?

##Train Model
glm.fit=train(income~age+`marital_statusMarried-civ-spouse`+capital_gain+education_num+hours_per_week+relationshipHusband+capital_loss+`occupationExec-managerial`+workclassPrivate,data=train,method='glm',trControl = trainControl(method = "cv"))
glm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurred
glm.fit
Generalized Linear Model 

24131 samples
    9 predictor
    2 classes: '<=50K', '>50K' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 21719, 21719, 21718, 21718, 21717, 21717, ... 
Resampling results:

  Accuracy   Kappa    
  0.8431492  0.5491027
##Calculate Predictions
pred.glm<-predict(glm.fit,test)
##Estimate Accuracy
confusionMatrix(pred.glm,test$income)
Confusion Matrix and Statistics

          Reference
Prediction <=50K >50K
     <=50K  4201  667
     >50K    329  834
                                          
               Accuracy : 0.8349          
                 95% CI : (0.8252, 0.8441)
    No Information Rate : 0.7511          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5223          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.9274          
            Specificity : 0.5556          
         Pos Pred Value : 0.8630          
         Neg Pred Value : 0.7171          
             Prevalence : 0.7511          
         Detection Rate : 0.6966          
   Detection Prevalence : 0.8072          
      Balanced Accuracy : 0.7415          
                                          
       'Positive' Class : <=50K           
                                          

g. Perform KNN on the training data, with several values of K, in order to predict income. Use only the variables that seemed most associated with income in (b). What test errors do you obtain? Which value of K seems to perform the best on this data set?

##Train Model, Let CV choose value for K
knn.fit<-train(income~age+`marital_statusMarried-civ-spouse`+capital_gain+education_num+hours_per_week+relationshipHusband+capital_loss+`occupationExec-managerial`+workclassPrivate,data=train,method='knn',trControl = trainControl(method = "cv"), tuneLength=20)
knn.fit
k-Nearest Neighbors 

24131 samples
    9 predictor
    2 classes: '<=50K', '>50K' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 21717, 21718, 21718, 21718, 21717, 21718, ... 
Resampling results across tuning parameters:

  k   Accuracy   Kappa    
   5  0.8449293  0.5537778
   7  0.8478300  0.5586043
   9  0.8456334  0.5502511
  11  0.8462139  0.5497724
  13  0.8451778  0.5459386
  15  0.8450536  0.5436919
  17  0.8448462  0.5428098
  19  0.8438933  0.5382269
  21  0.8437273  0.5380229
  23  0.8446805  0.5398063
  25  0.8431473  0.5349175
  27  0.8428985  0.5338392
  29  0.8431886  0.5335989
  31  0.8421525  0.5296729
  33  0.8404119  0.5233287
  35  0.8403705  0.5220293
  37  0.8390445  0.5177549
  39  0.8392101  0.5175455
  41  0.8387545  0.5163369
  43  0.8385058  0.5154016

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 7.
##Calculate Predictions
pred.knn<-predict(knn.fit,test)
##Estimate Accuracy
confusionMatrix(pred.knn,test$income)
Confusion Matrix and Statistics

          Reference
Prediction <=50K >50K
     <=50K  4270  644
     >50K    260  857
                                         
               Accuracy : 0.8501         
                 95% CI : (0.8408, 0.859)
    No Information Rate : 0.7511         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.5616         
                                         
 Mcnemar's Test P-Value : < 2.2e-16      
                                         
            Sensitivity : 0.9426         
            Specificity : 0.5710         
         Pos Pred Value : 0.8689         
         Neg Pred Value : 0.7672         
             Prevalence : 0.7511         
         Detection Rate : 0.7080         
   Detection Prevalence : 0.8148         
      Balanced Accuracy : 0.7568         
                                         
       'Positive' Class : <=50K          
                                         

h. Choose which model predicts income the best and justify your choice.

Model Accuracy
Linear Discriminant 0.7985565
Quadratic Discriminant 0.8317514
K Nearest Neighbors (K=7) 0.8385058
Logistic Regression 0.8431492

It appears KNN is the optimal way to run this classification problem, as it is the most accurate method given the results, with QDA giving the lowest amount of accuracy. While this is sensitive to other data, so are the other models at a similar percentage, making KNN the best choice.

LS0tDQp0aXRsZTogIm1pZHRlcm0iDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIHRvYzogdHJ1ZQ0KICAgIHRvY19mbG9hdDogdHJ1ZQ0KLS0tDQoNCiMjIyBQYXJ0IDE6IERlZmluaXRpb24gKDIwIFBvaW50cykNCmEuIFRoZSBUcmFkZS1PZmYgQmV0d2VlbiBQcmVkaWN0aW9uIEFjY3VyYWN5IGFuZCBNb2RlbCBJbnRlcnByZXRhYmlsaXR5DQpXaGVuZXZlciBidWlsZGluZyBhIG1vZGVsLCBnZW5lcmFsbHkgbW9yZSBjb21wbGV4aXR5IGFsbG93cyBmb3IgYSBtb3JlIGFjY3VyYXRlIGFuZCBjb21wcmVoZW5zaXZlIGFuYWx5c2lzIHdoZXRoZXIgaXQgaXMgY2xhc3NpZmljYXRpb24gb3IgcmVncmVzc2lvbi4gSG93ZXZlciwgdGhpcyB0eXBpY2FsbHkgbWVhbnMgdGhlIG1vZGVsIGlzIGdlbmVyYWxseSBsZXNzIGludGVycHJldGFibGUuIFR5cGljYWxseSwgd2Ugd2FudCBhIG1vZGVsIHRoYXQgaXMgbW9yZSBhY2N1cmF0ZSB0aGFuIGludGVycHJldGFibGUsIGJ1dCBpdCBpcyBpbXBvcnRhbnQgdG8gbWFrZSBzdXJlIHlvdSBjYW4gZXhwbGFpbiB5b3VyIG1vZGVsLCBlc3BlY2lhbGx5IGluIHRoZSBpbnN0YW5jZSBvZiBtZWRpY2FsIHdvcmtzDQoNCmIuIFN1cGVydmlzZWQgVmVyc3VzIFVuc3VwZXJ2aXNlZCBMZWFybmluZ1wNClN1cGVydmlzZWQgbGVhcm5pbmcgaXMgd2hlbiB3ZSBoYXZlIGEgY2xlYXIgaW5wdXQgYW5kIG91dHB1dCB3aXRoIG91ciB2YXJpYWJsZXMuIFRoZSBtb2RlbCBsZWFybnMgb2ZmIG9mIHByZXZpb3VzIGluc3RhbmNlcyBvZiBkYXRhLCBhbmQgdGhpcyBtZXRob2QgaXMgYmV0dGVyIGZvciBwcmVkaWN0aW9uIHJhdGhlciB0aGFuIGNsYXNzaWZpY2F0aW9uLiBMZWFybmluZyB0YWtlcyBwbGFjZSBvbiB0aGUgY29tcHV0ZXIsIGFuZCBpcyB0eXBpY2FsbHkgZmFzdGVyIGFuZCBsZXNzIGNvbXBsZXguIE9uIHRoZSBvdGhlciBoYW5kLCBVbnN1cGVydmlzZWQgbGVhcm5pbmcgbWVhbnMgd2Ugb25seSBoYXZlIGEgY2xlYXIgaW5wdXQsIGFuZCBhcyBzdWNoLCB0aGUgbW9kZWwgbGVhcm5zIG9mZiBvZiBuYXR1cmFsIHBhdHRlcm5zIGV4aGliaXRlZCBpbiB0aGUgZGF0YS4gVGhpcyBjYW4gaGVscCBmaW5kIHBhdHRlcm5zIGluIHRoZSBkYXRhLCBhbmQgYXMgc3VjaCwgd29ya3MgcmVhbGx5IHdlbGwgZm9yIGNsYXNzaWZpY2F0aW9uLg0KDQpjLiBUaGUgQmlhcy1WYXJpYW5jZSBUcmFkZS1PZmYNCkJpYXMgYW5kIHZhcmlhbmNlIGhhdmUgYW4gaW52ZXJzZSBjb3JyZWxhdGlvbiwgbWVhbmluZyBhcyBvbmUgaW5jcmVhc2VzIHRoZSBvdGhlciBpbmNyZWFzZXMuIFRoaXMgaXMgaW1wb3J0YW50IHRvIGNvbnNpZGVyIGluIHlvdXIgbW9kZWxzLCBhcyBtb3JlIGZsZXhpYmxlIG1vZGVscyByZXN1bHQgaW4gbGVzcyB2YXJpYW5jZSBidXQgbW9yZSBiaWFzLCBhbmQgbGVzcyBmbGV4aWJsZSBtb2RlbHMgcmVzdWx0IGluIGxlc3MgYmlhcyBidXQgbW9yZSB2YXJpYW5jZS4gVGhpcyBpcyB3aHkgd2UgYXNzZXNzIG91ciBtb2RlbHMgYnkgUk1TRSwgYXMgd2UgY2FuIGdlbmVyYWxseSBtYWtlIGdvb2QganVkZ2VtZW50IG9uIHdoaWNoIG1vZGVsIG1heSBiZSBtb3JlIHByb3BlciBmb3Igb3VyIHNpdHVhdGlvbiBvZmYgb2YgaXQuDQoNCmQuIExpbmVhciBSZWdyZXNzaW9uIHZlcnN1cyBLLU5lYXJlc3QgTmVpZ2hib3JzDQpMaW5lYXIgcmVncmVzc2lvbiBpcyBhIGZvcm0gb2Ygc3VwZXJ2aXNlZCBsZWFybmluZy4gTGluZWFyIHJlZ3Jlc3Npb24gd29ya3MgdmVyeSB3ZWxsIHdpdGggbGluZWFyIGRhdGEsIGFuZCBpcyB2ZXJ5IGVhc3kgdG8gaW1wbGVtZW50IGluIGNvbXBhcmlzb24gdG8gb3RoZXIgbWV0aG9kcy4gTGluZWFyIHJlZ3Jlc3Npb24gaG93ZXZlciwgaXMgdHlwaWNhbGx5IG5vdCB2ZXJ5IGdvb2Qgd2l0aCBub24tbGluZWFyIGRhdGEsIGFuZCB0aHVzIGlzIG5vdCBjb21wbGV0ZWx5IGFwcGxpY2FibGUgdG8gZXZlcnkgc2l0dWF0aW9uLiBLTk4gaXMgYWxzbyBzdXBlcnZpc2VkLCBidXQgb2ZmZXJzIGFuIGV4dHJlbWUgYW1vdW50IG9mIGZsZXhpYmlsaXR5LiBUaGUgbW9kZWwgY2FuIGJlIGNvbXB1dGF0aW9uYWxseSBleHBlbnNpdmUgaWYgayBpcyBoaWdoIGVub3VnaCwgYW5kIGlzIGdlbmVyYWxseSBwcm9uZSB0byBvdmVyZml0dGluZy4NCg0KZS4gTG9naXN0aWMgUmVncmVzc2lvbiB2ZXJzdXMgTERBIHZlcnN1cyBRREEgDQpMb2dpc3RpYyByZWdyZXNzaW9uIHR5cGljYWxseSB3b3JrcyByZWFsbHkgZm9yIGJpbmFyeSBjbGFzc2lmaWNhdGlvbiBpLmUgc3BhbS9ub3Qgc3BhbSBhbmQgaGFzIGEgbGluZWFyIGJvdW5kYXJ5LCBhcyBkb2VzIExEQSwgYnV0IHRoZXkgZGlmZmVyIGluIHRoZSB3YXkgdGhhdCB0aGUgYXNzdW1wdGlvbnMgZm9yIExEQSBhcmUgbW9yZSBzdHJpY3QsIGFuZCB0aHVzLCB0eXBpY2FsbHkgcmVzdWx0IGluIGhpZ2hlciBhY2N1cmFjeSBmb3IgTERBLiBRREEgaGFzIGEgcXVhZHJhdGljIGJvdW5kYXJ5LiBCZWNhdXNlIGl0cyBkZWNpc2lvbiBib3VkbmFyeSBpcyBub3QgbGluZWFyLCBpdCBpcyB0eXBpY2FsbHkgbW9yZSBmbGV4aWJsZSB0aGFuIGVpdGhlciBMREEgb3IgTFIuIEl0IGlzIGVzc2VudGlhbGx5IGEgY29tcHJvbWlzZSBiZXR3ZWVuIExEQSBhbmQgS05ODQoNCmBgYHtyfQ0KbGlicmFyeSgndGlkeXZlcnNlJykNCmxpYnJhcnkoJ2NhcmV0JykNCmxpYnJhcnkoJ21vZGVscicpDQpsaWJyYXJ5KCdyZWFkcicpDQpzZXQuc2VlZCgxMDAwMykNCmBgYA0KIyMjIFBhcnQgMjogUmVncmVzc2lvbiAoNDAgUG9pbnRzKQ0KVGhlIHRhYmxlIGJlbG93IGRpc3BsYXlzIGNhdGFsb2ctc3BlbmRpbmcgZGF0YSBmb3IgdGhlIGZpcnN0IGZldyBvZiAyMDAgcmFuZG9tbHkgc2VsZWN0ZWQgaW5kaXZpZHVhbHMgZnJvbSBhIHZlcnkgbGFyZ2UgKG92ZXIgMjAsMDAwIGhvdXNlaG9sZHMpIGRhdGEgYmFzZS4xIFRoZSB2YXJpYWJsZSBvZiBwYXJ0aWN1bGFyIGludGVyZXN0IGlzIGNhdGFsb2cgc3BlbmRpbmcgYXMgbWVhc3VyZWQgYnkgdGhlIFNwZW5kaW5nIFJhdGlvIChTcGVuZFJhdCkuIEFsbCBvZiB0aGUgY2F0YWxvZyB2YXJpYWJsZXMgYXJlIHJlcHJlc2VudGVkIGJ5IGluZGljYXRvciB2YXJpYWJsZXM7IGVpdGhlciB0aGUgY29uc3VtZXIgYm91Z2h0IGFuZCB0aGUgdmFyaWFibGUgaXMgY29kZWQgYXMgMSBvciB0aGUgY29uc3VtZXIgZGlkbuKAmXQgYnV5IGFuZCB0aGUgdmFyaWFibGUgaXMgY29kZWQgYXMgMC4gVGhlIG90aGVyIHZhcmlhYmxlcyBjYW4gYmUgdmlld2VkIGFzIGluZGV4ZXMgZm9yIG1lYXN1cmluZyBhc3NldHMsIGxpcXVpZGl0eSwgYW5kIHNwZW5kaW5nLg0KDQpXZSByZWNvZGUgYWxsIG9mIG91ciBiaW5hcnkgdmFyaWFibGVzIGludG8gZmFjdG9ycyBzbyB0aGF0IHdlIG1heSB1c2UgaXQgYXMgb3JkaW5hbCBkYXRhLg0KYGBge3J9DQpjYXRhbG9nPC1yZWFkX2NzdignQzpcXFVzZXJzXFxkaWVnb1xcRGVza3RvcFxcRGF0YSBNaW5pbmdcXGFzc2lnbm1lbnRzXFxtaWR0ZXJtXFxjYXRhbG9nLmNzdicpDQpjYXRhbG9nJENvbGxHaWZ0czwtYXNfZmFjdG9yKGNhdGFsb2ckQ29sbEdpZnRzKQ0KY2F0YWxvZyRCcmljTW9ydGFyPC1hc19mYWN0b3IoY2F0YWxvZyRCcmljTW9ydGFyKQ0KY2F0YWxvZyRNYXJ0aGFIb21lPC1hc19mYWN0b3IoY2F0YWxvZyRNYXJ0aGFIb21lKQ0KY2F0YWxvZyRTdW5BZHM8LWFzX2ZhY3RvcihjYXRhbG9nJFN1bkFkcykNCmNhdGFsb2ckVGhlbWVDb2xsPC1hc19mYWN0b3IoY2F0YWxvZyRUaGVtZUNvbGwpDQpjYXRhbG9nJEN1c3REZWM8LWFzX2ZhY3RvcihjYXRhbG9nJEN1c3REZWMpDQpjYXRhbG9nJFJldGFpbEtpZHM8LWFzX2ZhY3RvcihjYXRhbG9nJFJldGFpbEtpZHMpDQpjYXRhbG9nJFRlZW5XcjwtYXNfZmFjdG9yKGNhdGFsb2ckVGVlbldyKQ0KY2F0YWxvZyRDYXJsb3ZlcnM8LWFzX2ZhY3RvcihjYXRhbG9nJENhcmxvdmVycykNCmNhdGFsb2ckQ291bnRyeUNvbGw8LWFzX2ZhY3RvcihjYXRhbG9nJENvdW50cnlDb2xsKQ0Kc3RyKGNhdGFsb2cpDQpgYGANCg0KIyMjIyBEYXRhIENsZWFuaW5nDQpUaGUgZ29hbCBvZiB0aGlzIHNlY3Rpb24gaXMgdG8gZXhwbG9yZSB0aGUgZGF0YSBzZXQgYW5kIGdldCBpdCByZWFkeSBmb3IgYW5hbHlzaXMuIFRoZXJlIGFyZSBubyBtaXNzaW5nIHZhbHVlcyBpbiB0aGUgZGF0YSBzZXQsIGJ1dCB0aGVyZSBhcmUgc29tZSBpbmNvcnJlY3QgZW50cmllcyB0aGF0IG11c3QgYmUgaWRlbnRpZmllZCBhbmQgcmVtb3ZlZCBiZWZvcmUgY29tcGxldGluZyB0aGUgYW5hbHlzaXMuIEluY29tZSBpcyBjb2RlZCBhcyBhbiBvcmRpbmFsIHZhbHVlLCByYW5naW5nIGZyb20gMSB0byAxMi4gQWdlIGNhbiBiZSByZWdhcmRlZCBhcyBxdWFudGl0YXRpdmUsIGFuZCBhbnkgdmFsdWUgbGVzcyB0aGFuIDE4IGlzIGludmFsaWQuIExlbmd0aCBvZiByZXNpZGVuY2UgKExlblJlcykgaXMgYSB2YWx1ZSByYW5naW5nIGZyb20gemVybyB0byBzb21lb25l4oCZcyBhZ2UuIExlblJlcyBzaG91bGQgbm90IGJlIGhpZ2hlciB0aGFuIEFnZS4gWW91IHNob3VsZCBjcmVhdGUgYSBzaW1wbGUgMS0yIHBhcmFncmFwaCBzdW1tYXJ5IG9mIHRoaXMgc2VjdGlvbi4gQmUgc3VyZSB0byBmdWxseSBleHBsYWluIHRoZSByZWFzb25pbmcgYmVoaW5kIHRyYW5zZm9ybWluZyBhbnkgY29sdW1ucyBhbmQgcmVtb3ZpbmcgYW55IHJvd3MuIENhbXBiZWxsIHRvbGQgbWUgdG8gaXMgbm90IHN1ZmZpY2llbnQuIEp1c3RpZnkgd2h5IGl0IG1ha2VzIHNlbnNlIG5vdCB0byBpbmNsdWRlIGFueSByb3dzIHdob3NlIGFnZSBpcyBsZXNzIHRoYW4gMTggb3Igd2h5IHdlIHNob3VsZG7igJl0IHVzZSByb3dzIGluIHdoaWNoIGxlbmd0aCBvZiByZXNpZGVuY2UgaXMgbGFyZ2VyIHRoYW4gYWdlLg0KDQpIZXJlLCB3ZSBmaWx0ZXIgb3VyIGRhdGFzZXQgdG8gaGF2ZSByb3dzIGluIHdoaWNoIGRhdGEgZm9sbG93IHRoZSBmb2xsb3dpbmcgc3BlY2lmaWNhdGlvbnM6IGFnZXMgYXJlIGdyZWF0ZXIgdGhhbiAxOCwgYW5kIExlblJlcyBpcyBsZXNzIHRoYW4gYWdlLiBhZ2UgaGFzIHRvIGJlIGdyZWF0ZXIgdGhhbiAxOCBhcyBpdCBpcyBwcmV0dHkgaW1wb3NzaWJsZSB0byBvd24gYSBob3VzZSB1bmRlciAxOCB1bmRlciBub3JtYWwgY2lyY3Vtc3RhbmNlcywgYW5kIExlblJlcyBoYXMgdGhlIHNwZWNpZmljYXRpb24gYmVjYXVzZSB5b3UgY2FuJ3Qgb3duIGEgaG91c2UgbG9uZ2VyIHRoYW4geW91J3ZlIGV4aXN0ZWQgZm9yIHVubGVzcyB5b3UncmUgYSB0cnVzdCBmdW5kIGJhYnkuIFRoaXMgY291bGQgYmUgYWNjb3VudGVkIGZvciBpbiB0aGUgZGF0YXNldCBidXQgaXQgaGFzIGJlZW4gZGVjaWRlZCBpdCdzIG5vdCBwZXJ0aW5lbnQuDQpgYGB7cn0NCmNhdC5jbGVhbjwtZmlsdGVyKGNhdGFsb2csQWdlPj0xOCxMZW5SZXM8PUFnZSkNCmBgYA0KDQojIyMjIEJhc2ljIFN1bW1hcnkNClByb3ZpZGUgYSBiYXNpYyBzdW1tYXJ5IG9mIHRoZSBjbGVhbmVkIGRhdGEgc2V0LiBJbmNsdWRlIGEgdGFibGUgb2YgdW5pdmFyaWF0ZSBzdGF0aXN0aWNzIHRvIHN1bW1hcml6ZSBlYWNoIHZhcmlhYmxlLiBDaG9vc2UgbWVhbmluZ2Z1bCBzdW1tYXJ5IHN0YXRpc3RpY3MgZm9yIGVhY2ggdHlwZSBvZiB2YXJpYWJsZS4gWW91IHNob3VsZCBhbHNvIGluY2x1ZGUgYSBiYXNpYyBzdW1tYXJ5IG9mIHRoZSBjYXRhbG9nIHNwZW5kaW5nIChTcGVuZFJhdCkgaW5jbHVkaW5nIGFuIGFwcHJvcHJpYXRlIGdyYXBoaWNhbCBkaXNwbGF5Lg0KDQpgYGB7cn0NCmRpbShjYXQuY2xlYW4pDQpzdW1tYXJ5KGNhdC5jbGVhbikNCmNhdC5jbGVhbiAlPiUNCiAgZ2dwbG90KGFlcyh4PWxvZyhTcGVuZFJhdCkpKSArIGdlb21faGlzdG9ncmFtKCkNCmBgYA0KDQojIyMjIE1vZGVsaW5nDQpXZSBhcmUgaW50ZXJlc3RlZCBpbiBkZXZlbG9waW5nIGEgbW9kZWwgdG8gcHJlZGljdCBzcGVuZGluZyByYXRpby4gRmluZCBhIG11bHRpcGxlIHJlZ3Jlc3Npb24gbW9kZWwgZm9yIHByZWRpY3RpbmcgdGhlIGFtb3VudCBvZiBtb25leSB0aGF0IGNvbnN1bWVycyB3aWxsIHNwZW5kIG9uIGNhdGFsb2cgc2hvcHBpbmcsIGFzIG1lYXN1cmVkIGJ5IHNwZW5kaW5nIHJhdGlvLiBZb3VyIGdvYWwgaXMgdG8gaWRlbnRpZnkgdGhlIGJlc3QgbW9kZWwgeW91IGNhbi4gSW4geW91ciB3cml0ZS11cCBiZSBzdXJlIHRvIGp1c3RpZnkgeW91ciBjaG9pY2Ugb2YgbW9kZWwsIGRpc2N1c3MgYW55IHRyYW5zZm9ybWF0aW9uIHlvdSBtYWtlIHRvIHRoZSB2YXJpYWJsZXMsIGRpc2N1c3MgeW91ciBtb2RlbCBmaXQsIGFuZCBkaXNjdXNzIHRoZSBlZmZlY3Qgb2YgdGhlIHNpZ25pZmljYW50IHByZWRpY3RvcnMgdXNpbmcgYm90aCBoeXBvdGhlc2lzIHRlc3RzIGFuZCBjb25maWRlbmNlIGludGVydmFscy4gUmVtZW1iZXIgdG8gY2hlY2sgdGhlIGNvbmRpdGlvbnMgZm9yIGluZmVyZW5jZSBhcyB5b3UgZXZhbHVhdGUgeW91ciBtb2RlbHMuIFRoZSBkYXRhIHNldCBpcyBtdWNoIHRvbyBzbWFsbCB0byBzcGxpdCBpbnRvIHRyYWluaW5nIGFuZCB0ZXN0IGRhdGEgc2V0cywgc28gdXNlIGNyb3NzIHZhbGlkYXRpb24gaW4gYWxsIHlvdXIgbW9kZWxzDQoNCmBgYHtyfQ0KIyMgU2V0IHVwIFJlcGVhdGVkIGstZm9sZCBDcm9zcyBWYWxpZGF0aW9uDQp0cmFpbl9jb250cm9sIDwtIHRyYWluQ29udHJvbChtZXRob2Q9InJlcGVhdGVkY3YiLCBudW1iZXI9MTAsIHJlcGVhdHM9MykNCmBgYA0KDQoqKmEuIEZpdCBhIGxpbmVhciBtb2RlbCB1c2luZyBsZWFzdCBzcXVhcmVzIG9uIHRoZSB0cmFpbmluZyBzZXQsIGFuZCByZXBvcnQgdGhlIENWIGVycm9yIG9idGFpbmVkLioqDQoNCmBgYHtyfQ0KbG0uZml0PC10cmFpbihsb2coU3BlbmRSYXQpfi4sIGRhdGE9Y2F0LmNsZWFuLCB0ckNvbnRyb2w9dHJhaW5fY29udHJvbCxtZXRob2Q9J2xtJykNCnByaW50KGxtLmZpdCkNCmBgYA0KDQoqKmIuIEZpdCBhIHJpZGdlIHJlZ3Jlc3Npb24gbW9kZWwgb24gdGhlIHRyYWluaW5nIHNldCwgd2l0aCAkXGxhbWJkYSQgY2hvc2VuIGJ5IGNyb3NzLXZhbGlkYXRpb24uIFJlcG9ydCB0aGUgQ1YgZXJyb3Igb2J0YWluZWQuKioNCmBgYHtyfQ0KeV90cmFpbj1sb2coY2F0LmNsZWFuJFNwZW5kUmF0KQ0KWF90cmFpbj1tb2RlbF9tYXRyaXgoY2F0LmNsZWFuLGxvZyhTcGVuZFJhdCl+QWdlK0xlblJlcytJbmNvbWUrVG90QXNzZXQrU2VjQXNzZXRzK1Nob3J0TGlxK0xvbmdMaXErV2x0aElkeCtTcGVuZFZvbCtTcGVuVmVsK0NvbGxHaWZ0cytCcmljTW9ydGFyK01hcnRoYUhvbWUrU3VuQWRzK1RoZW1lQ29sbCtDdXN0RGVjK1JldGFpbEtpZHMrVGVlbldyK0NhcmxvdmVycytDb3VudHJ5Q29sbCkNCnBhcmFtZXRlcnMgPC0gYyhzZXEoMC4xLCAyLCBieSA9MC4xKSAsICBzZXEoMiwgNSwgMC41KSAsIHNlcSg1LCAyNSwgMSkpDQpyaWRnZS5maXQ8LXRyYWluKHk9eV90cmFpbix4PVhfdHJhaW4sbWV0aG9kPSdnbG1uZXQnLHRyQ29udHJvbD10cmFpbl9jb250cm9sLHR1bmVHcmlkPWV4cGFuZC5ncmlkKGFscGhhPTAsbGFtYmRhID0gcGFyYW1ldGVycykpDQpwcmludChyaWRnZS5maXQpDQpgYGANCioqYy4gRml0IGEgbGFzc28gbW9kZWwgb24gdGhlIHRyYWluaW5nIHNldCwgd2l0aCAkXGxhbWJkYSQgY2hvc2VuIGJ5IGNyb3NzLXZhbGlkYXRpb24uIFJlcG9ydCB0aGUgQ1YgZXJyb3Igb2J0YWluZWQsIGFsb25nIHdpdGggdGhlIG51bWJlciBvZiBub24temVybyBjb2VmZmljaWVudCBlc3RpbWF0ZXMuKioNCg0KYGBge3J9DQp5X3RyYWluPWxvZyhjYXQuY2xlYW4kU3BlbmRSYXQpDQpYX3RyYWluPW1vZGVsX21hdHJpeChjYXQuY2xlYW4sbG9nKFNwZW5kUmF0KX5BZ2UrTGVuUmVzK0luY29tZStUb3RBc3NldCtTZWNBc3NldHMrU2hvcnRMaXErTG9uZ0xpcStXbHRoSWR4K1NwZW5kVm9sK1NwZW5WZWwrQ29sbEdpZnRzK0JyaWNNb3J0YXIrTWFydGhhSG9tZStTdW5BZHMrVGhlbWVDb2xsK0N1c3REZWMrUmV0YWlsS2lkcytUZWVuV3IrQ2FybG92ZXJzK0NvdW50cnlDb2xsKQ0KcGFyYW1ldGVycyA8LSBjKHNlcSgwLjEsIDIsIGJ5ID0wLjEpICwgIHNlcSgyLCA1LCAwLjUpICwgc2VxKDUsIDI1LCAxKSkNCmxhc3NvLmZpdDwtdHJhaW4oeT15X3RyYWluLHg9WF90cmFpbixtZXRob2Q9J2dsbW5ldCcsdHJDb250cm9sPXRyYWluX2NvbnRyb2wsdHVuZUdyaWQ9ZXhwYW5kLmdyaWQoYWxwaGE9MSxsYW1iZGEgPSBwYXJhbWV0ZXJzKSkNCnByaW50KGxhc3NvLmZpdCkNCmBgYA0KDQoqKmQuIEZpdCBhIFBDUiBtb2RlbCBvbiB0aGUgdHJhaW5pbmcgc2V0LCB3aXRoIE0gY2hvc2VuIGJ5IGNyb3NzLXZhbGlkYXRpb24uIFJlcG9ydCB0aGUgQ1YgZXJyb3Igb2J0YWluZWQsIGFsb25nIHdpdGggdGhlIHZhbHVlIG9mIE0gc2VsZWN0ZWQgYnkgY3Jvc3MtdmFsaWRhdGlvbi4qKg0KDQpgYGB7cn0NCnBjci5maXQ8LXRyYWluKGxvZyhTcGVuZFJhdCl+LiwgZGF0YT1jYXQuY2xlYW4sIHRyQ29udHJvbD10cmFpbl9jb250cm9sLHR1bmVMZW5ndGg9bmNvbChjYXQuY2xlYW4pLG1ldGhvZD0ncGNyJykNCnBsb3QocGNyLmZpdCkNCnBjci5maXQkYmVzdFR1bmUNCnByaW50KHBjci5maXQpDQpgYGANCg0KKiplLiBGaXQgYSBQTFMgbW9kZWwgb24gdGhlIHRyYWluaW5nIHNldCwgd2l0aCBNIGNob3NlbiBieSBjcm9zcy12YWxpZGF0aW9uLiBSZXBvcnQgdGhlIENWIGVycm9yIG9idGFpbmVkLCBhbG9uZyB3aXRoIHRoZSB2YWx1ZSBvZiBNIHNlbGVjdGVkIGJ5IGNyb3NzLXZhbGlkYXRpb24uKioNCg0KYGBge3J9DQpwbHMuZml0PC10cmFpbihsb2coU3BlbmRSYXQpfi4sIGRhdGE9Y2F0LmNsZWFuLCB0ckNvbnRyb2w9dHJhaW5fY29udHJvbCx0dW5lTGVuZ3RoPW5jb2woY2F0LmNsZWFuKSxtZXRob2Q9J3BscycpDQpwbG90KHBscy5maXQpDQpwbHMuZml0JGJlc3RUdW5lDQpwcmludChwbHMuZml0KQ0KYGBgDQoNCioqZi4gQ29tbWVudCBvbiB0aGUgcmVzdWx0cyBvYnRhaW5lZC4gSG93IGFjY3VyYXRlbHkgY2FuIHdlIHByZWRpY3QgdGhlIFNwZW5kaW5nIFJhdGlvPyBJcyB0aGVyZSBtdWNoIGRpZmZlcmVuY2UgYW1vbmcgdGhlIENWIGVycm9ycyByZXN1bHRpbmcgZnJvbSB0aGVzZSBmaXZlIGFwcHJvYWNoZXM/KioNCg0KfCBNb2RlbCAgICAgICAgICAgICAgICAgICAgICAgICAgICB8IFJNU0UgICAgIHwNCnwtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tfC0tLS0tLS0tLS18DQp8IExhc3NvIFJlZ3Jlc3Npb24gICAgICAgICAgICAgICAgIHwgYHIgbWluKGxhc3NvLmZpdCRyZXN1bHRzJFJNU0UpYCB8DQp8IFJpZGdlIFJlZ3Jlc3Npb24gICAgICAgICAgICAgICAgIHwgYHIgbWluKHJpZGdlLmZpdCRyZXN1bHRzJFJNU0UpYCB8DQp8IFByaW5jaXBsZSBDb21wb25lbnRzIFJlZ3Jlc3Npb24gIHwgYHIgbWluKHBjci5maXQkcmVzdWx0cyRSTVNFKWAgfA0KfCBQYXJ0aWFsIExlYXN0IFNxdWFyZXMgUmVncmVzc2lvbiB8IGByIG1pbihwbHMuZml0JHJlc3VsdHMkUk1TRSlgIHwNCnwgTGluZWFyIFJlZ3Jlc3Npb24gICAgICAgICAgICAgICAgfCBgciBsbS5maXQkcmVzdWx0cyRSTVNFYCB8DQoNCg0KQmVzdCBtb2RlbCB3YXMgUmlkZ2UgUmVncmVzc2lvbiB3aXRoICRcbGFtYmRhJCA9IDAuMSB3aXRoIGEgUk1TRSBvZiBgciBtaW4ocmlkZ2UuZml0JHJlc3VsdHMkUk1TRSlgLiBUaGUgd29yc3QgcGVyZm9ybWluZyBtb2RlbCB3YXMgdGhlIFByaW5jaXBhbCBjb21wb25lbnRzIG1vZGVsIHdpdGggYSBSTVNFIG9mIGByIHBjci5maXQkcmVzdWx0cyRSTVNFYC4gVGhlcmUncyBub3QgYSBzdWJzdGFudGlhbCBhbW91bnQgb2YgZGlmZmVyZW5jZSBvZiBDViBSTVNFIGFjcm9zcyB0aGUgZml2ZSBtb2RlbHMuIA0KDQojIyMgUGFydCAzOiBDbGFzc2lmaWNhdGlvbiAoNDAgUG9pbnRzKQ0KSW4gdGhpcyBwcm9ibGVtLCB5b3Ugd2lsbCBkZXZlbG9wIGEgbW9kZWwgdG8gcHJlZGljdCB3aGV0aGVyIGluY29tZSBleGNlZWRzICQ1MEsveXIgYmFzZWQgb24gY2Vuc3VzIGRhdGEuDQoNCioqYS4gVXNlIHRoZSBjb2RlIGluIEJsYWNrYm9hcmQgdG8gY3JlYXRlIHRoZSBhZHVsdCBkYXRhIHNldC4qKg0KDQpgYGB7cn0NCmFkdWx0PC1yZWFkX2NzdigiaHR0cHM6Ly9hcmNoaXZlLmljcy51Y2kuZWR1L21sL21hY2hpbmUtbGVhcm5pbmctZGF0YWJhc2VzL2FkdWx0L2FkdWx0LmRhdGEiLCBjb2xfbmFtZXM9RkFMU0UsIG5hPSc/JykNCm5hbWVzKGFkdWx0KTwtYygiYWdlIiwid29ya2NsYXNzIiwiZm5sd2d0IiwiZWR1Y2F0aW9uIiwiZWR1Y2F0aW9uX251bSIsIm1hcml0YWxfc3RhdHVzIiwib2NjdXBhdGlvbiIsInJlbGF0aW9uc2hpcCIsInJhY2UiLCJzZXgiLCJjYXBpdGFsX2dhaW4iLCJjYXBpdGFsX2xvc3MiLCJob3Vyc19wZXJfd2VlayIsIm5hdGl2ZV9jb3VudHJ5IiwiaW5jb21lIikNCnNldC5zZWVkKDEzMDMpDQphZHVsdCRmbmx3Z3Q8LU5VTEwNCmFkdWx0JHdvcmtjbGFzczwtYXNfZmFjdG9yKGFkdWx0JHdvcmtjbGFzcykNCmFkdWx0JGVkdWNhdGlvbjwtYXNfZmFjdG9yKGFkdWx0JGVkdWNhdGlvbikNCmFkdWx0JG1hcml0YWxfc3RhdHVzPC1hc19mYWN0b3IoYWR1bHQkbWFyaXRhbF9zdGF0dXMpDQphZHVsdCRvY2N1cGF0aW9uPC1hc19mYWN0b3IoYWR1bHQkb2NjdXBhdGlvbikNCmFkdWx0JHJlbGF0aW9uc2hpcDwtYXNfZmFjdG9yKGFkdWx0JHJlbGF0aW9uc2hpcCkNCmFkdWx0JHJhY2U8LWFzX2ZhY3RvcihhZHVsdCRyYWNlKQ0KYWR1bHQkc2V4PC1hc19mYWN0b3IoYWR1bHQkc2V4KQ0KYWR1bHQkbmF0aXZlX2NvdW50cnk8LWFzX2ZhY3RvcihhZHVsdCRuYXRpdmVfY291bnRyeSkNCmFkdWx0JGluY29tZTwtYXNfZmFjdG9yKGFkdWx0JGluY29tZSkNCmFkdWx0PC1uYS5vbWl0KGFkdWx0KQ0KeT1hZHVsdCRpbmNvbWUNClg9bW9kZWxfbWF0cml4KGFkdWx0LGluY29tZX4uKQ0KWCRgKEludGVyY2VwdClgPC1OVUxMDQphZHVsdDI8LWFzX3RpYmJsZShjYmluZChhZHVsdCRpbmNvbWUsWCksLm5hbWVfcmVwYWlyID0gInVuaXF1ZSIpDQpuYW1lcyhhZHVsdDIpWzFdPC0iaW5jb21lIg0KYXR0YWNoKGFkdWx0MikNCnN1bW1hcnkoYWR1bHQyKQ0KYGBgDQoNCioqYi4gRXhwbG9yZSB0aGUgZGF0YSBncmFwaGljYWxseSBpbiBvcmRlciB0byBpbnZlc3RpZ2F0ZSB0aGUgYXNzb2NpYXRpb24gYmV0d2VlbiBpbmNvbWUgYW5kIHRoZSBvdGhlciBmZWF0dXJlcy4gV2hpY2ggb2YgdGhlIG90aGVyIGZlYXR1cmVzIHNlZW0gbW9zdCBsaWtlbHkgdG8gYmUgdXNlZnVsIGluIHByZWRpY3RpbmcgaW5jb21lPyBTY2F0dGVycGxvdHMgYW5kIGJveHBsb3RzIG1heSBiZSB1c2VmdWwgdG9vbHMgdG8gYW5zd2VyIHRoaXMgcXVlc3Rpb24uIERlc2NyaWJlIHlvdXIgZmluZGluZ3MuKioNCg0KVXNpbmcgW3JhbmRvbSBmb3Jlc3QgbWV0aG9kXShodHRwOi8vci1zdGF0aXN0aWNzLmNvL1ZhcmlhYmxlLVNlbGVjdGlvbi1hbmQtSW1wb3J0YW5jZS1XaXRoLVIuaHRtbCkgZm9yIGZlYXR1cmUgc2VsZWN0aW9uLg0KDQpgYGB7ciwgY2FjaGU9VFJVRX0NCiMjR2VuZXJhdGUgYSBzbWFsbCBzYW1wbGUgb2YgdGhlIGRhdGEgc2V0IHRvIGludmVzdGlnYXRlIHdoaWNoIHZhcmlhYmxlcyBhcmUgY2xvc2VseSBhc3NvY2lhdGVkIHdpdGggaW5jb21lDQprPXNhbXBsZShucm93KGFkdWx0MiksbnJvdyhhZHVsdDIpKjAuMDgpDQpsaWJyYXJ5KCdlMTA3MScpDQojI1VzZSByYW5kb20gZm9yZXN0IG1vZGVsIHRvIGNhbGN1bGF0ZSB2YXJpYWJsZSBpbXBvcnRhbmNlDQpyZj10cmFpbihpbmNvbWV+LixkYXRhPWFkdWx0MltrLF0pDQpyZkltcDwtdmFySW1wKHJmKQ0KcGxvdChyZkltcCx0b3A9MTApDQpgYGANCkl0IGFwcGVhcnMgbXkgY29tcHV0ZXIgaXMgbm90IHBvd2VyZnVsIGVub3VnaCB0byBydW4gdGhpcy4NCioqYy4gU3BsaXQgdGhlIGRhdGEgaW50byBhbiA4MCUgdHJhaW5pbmcgc2V0IGFuZCBhIDIwJSB0ZXN0IHNldC4gU2V0IHRoZSBzZWVkIGF0IDEzMDMuKioNCg0KYGBge3J9DQojI0NyZWF0ZSB0cmFpbmluZyBpbmRpY2F0b3IgdmVjdG9yDQppblRyYWluIDwtIGNyZWF0ZURhdGFQYXJ0aXRpb24oYWR1bHQyJGluY29tZSwgcD0wLjgsIGxpc3Q9RkFMU0UpDQojI1RhYnVsYXRlIHRyYWluaW5nIGFuZCB0ZXN0IGRhdGEgc2V0cw0KdHJhaW49YWR1bHQyW2luVHJhaW4sXQ0KdGVzdD1hZHVsdDJbLWluVHJhaW4sXQ0KZGltKGFkdWx0MikNCmRpbSh0cmFpbikNCmRpbSh0ZXN0KQ0KYGBgDQoNCioqZC4gUGVyZm9ybSBMREEgb24gdGhlIHRyYWluaW5nIGRhdGEgaW4gb3JkZXIgdG8gcHJlZGljdCBpbmNvbWUgdXNpbmcgdGhlIHZhcmlhYmxlcyB0aGF0IHNlZW1lZCBtb3N0IGFzc29jaWF0ZWQgd2l0aCBpbmNvbWUgaW4gKGIpLiBXaGF0IGlzIHRoZSB0ZXN0IGVycm9yIG9mIHRoZSBtb2RlbCBvYnRhaW5lZD8qKg0KDQpgYGB7cn0NCiMjVHJhaW4gTW9kZWwNCmxkYS5maXQ9dHJhaW4oaW5jb21lfmFnZStgbWFyaXRhbF9zdGF0dXNNYXJyaWVkLWNpdi1zcG91c2VgK2NhcGl0YWxfZ2FpbitlZHVjYXRpb25fbnVtK2hvdXJzX3Blcl93ZWVrK3JlbGF0aW9uc2hpcEh1c2JhbmQrY2FwaXRhbF9sb3NzK2BvY2N1cGF0aW9uRXhlYy1tYW5hZ2VyaWFsYCt3b3JrY2xhc3NQcml2YXRlLGRhdGE9dHJhaW4sbWV0aG9kPSdsZGEnLHRyQ29udHJvbCA9IHRyYWluQ29udHJvbChtZXRob2QgPSAiY3YiKSkNCmxkYS5maXQNCiMjQ2FsY3VsYXRlIFByZWRpY3Rpb25zDQpwcmVkLmxkYTwtcHJlZGljdChsZGEuZml0LHRlc3QpDQojI0VzdGltYXRlIEFjY3VyYWN5DQpjb25mdXNpb25NYXRyaXgocHJlZC5sZGEsdGVzdCRpbmNvbWUpDQpgYGANCg0KKiplLiBQZXJmb3JtIFFEQSBvbiB0aGUgdHJhaW5pbmcgZGF0YSBpbiBvcmRlciB0byBwcmVkaWN0IGluY29tZSB1c2luZyB0aGUgdmFyaWFibGVzIHRoYXQgc2VlbWVkIG1vc3QgYXNzb2NpYXRlZCB3aXRoIGluY29tZSBpbiAoYikuIFdoYXQgaXMgdGhlIHRlc3QgZXJyb3Igb2YgdGhlIG1vZGVsIG9idGFpbmVkPyoqDQpgYGB7cn0NCiMjVHJhaW4gTW9kZWwNCnFkYS5maXQ9dHJhaW4oaW5jb21lfmFnZStgbWFyaXRhbF9zdGF0dXNNYXJyaWVkLWNpdi1zcG91c2VgK2NhcGl0YWxfZ2FpbitlZHVjYXRpb25fbnVtK2hvdXJzX3Blcl93ZWVrK3JlbGF0aW9uc2hpcEh1c2JhbmQrY2FwaXRhbF9sb3NzK2BvY2N1cGF0aW9uRXhlYy1tYW5hZ2VyaWFsYCt3b3JrY2xhc3NQcml2YXRlLGRhdGE9dHJhaW4sbWV0aG9kPSdxZGEnLHRyQ29udHJvbCA9IHRyYWluQ29udHJvbChtZXRob2QgPSAiY3YiKSkNCnFkYS5maXQNCiMjQ2FsY3VsYXRlIFByZWRpY3Rpb25zDQpwcmVkLnFkYTwtcHJlZGljdChxZGEuZml0LHRlc3QpDQojI0VzdGltYXRlIEFjY3VyYWN5DQpjb25mdXNpb25NYXRyaXgocHJlZC5xZGEsdGVzdCRpbmNvbWUpDQpgYGANCioqZi4gUGVyZm9ybSBsb2dpc3RpYyByZWdyZXNzaW9uIG9uIHRoZSB0cmFpbmluZyBkYXRhIGluIG9yZGVyIHRvIHByZWRpY3QgaW5jb21lIHVzaW5nIHRoZSB2YXJpYWJsZXMgdGhhdCBzZWVtZWQgbW9zdCBhc3NvY2lhdGVkIHdpdGggaW5jb21lIGluIChiKS4gV2hhdCBpcyB0aGUgdGVzdCBlcnJvciBvZiB0aGUgbW9kZWwgb2J0YWluZWQ/KioNCg0KYGBge3J9DQojI1RyYWluIE1vZGVsDQpnbG0uZml0PXRyYWluKGluY29tZX5hZ2UrYG1hcml0YWxfc3RhdHVzTWFycmllZC1jaXYtc3BvdXNlYCtjYXBpdGFsX2dhaW4rZWR1Y2F0aW9uX251bStob3Vyc19wZXJfd2VlaytyZWxhdGlvbnNoaXBIdXNiYW5kK2NhcGl0YWxfbG9zcytgb2NjdXBhdGlvbkV4ZWMtbWFuYWdlcmlhbGArd29ya2NsYXNzUHJpdmF0ZSxkYXRhPXRyYWluLG1ldGhvZD0nZ2xtJyx0ckNvbnRyb2wgPSB0cmFpbkNvbnRyb2wobWV0aG9kID0gImN2IikpDQpnbG0uZml0DQojI0NhbGN1bGF0ZSBQcmVkaWN0aW9ucw0KcHJlZC5nbG08LXByZWRpY3QoZ2xtLmZpdCx0ZXN0KQ0KIyNFc3RpbWF0ZSBBY2N1cmFjeQ0KY29uZnVzaW9uTWF0cml4KHByZWQuZ2xtLHRlc3QkaW5jb21lKQ0KYGBgDQoNCioqZy4gUGVyZm9ybSBLTk4gb24gdGhlIHRyYWluaW5nIGRhdGEsIHdpdGggc2V2ZXJhbCB2YWx1ZXMgb2YgSywgaW4gb3JkZXIgdG8gcHJlZGljdCBpbmNvbWUuIFVzZSBvbmx5IHRoZSB2YXJpYWJsZXMgdGhhdCBzZWVtZWQgbW9zdCBhc3NvY2lhdGVkIHdpdGggaW5jb21lIGluIChiKS4gV2hhdCB0ZXN0IGVycm9ycyBkbyB5b3Ugb2J0YWluPyBXaGljaCB2YWx1ZSBvZiBLIHNlZW1zIHRvIHBlcmZvcm0gdGhlIGJlc3Qgb24gdGhpcyBkYXRhIHNldD8qKg0KDQpgYGB7cixjYWNoZT1UUlVFfQ0KIyNUcmFpbiBNb2RlbCwgTGV0IENWIGNob29zZSB2YWx1ZSBmb3IgSw0Ka25uLmZpdDwtdHJhaW4oaW5jb21lfmFnZStgbWFyaXRhbF9zdGF0dXNNYXJyaWVkLWNpdi1zcG91c2VgK2NhcGl0YWxfZ2FpbitlZHVjYXRpb25fbnVtK2hvdXJzX3Blcl93ZWVrK3JlbGF0aW9uc2hpcEh1c2JhbmQrY2FwaXRhbF9sb3NzK2BvY2N1cGF0aW9uRXhlYy1tYW5hZ2VyaWFsYCt3b3JrY2xhc3NQcml2YXRlLGRhdGE9dHJhaW4sbWV0aG9kPSdrbm4nLHRyQ29udHJvbCA9IHRyYWluQ29udHJvbChtZXRob2QgPSAiY3YiKSwgdHVuZUxlbmd0aD0yMCkNCmtubi5maXQNCiMjQ2FsY3VsYXRlIFByZWRpY3Rpb25zDQpwcmVkLmtubjwtcHJlZGljdChrbm4uZml0LHRlc3QpDQojI0VzdGltYXRlIEFjY3VyYWN5DQpjb25mdXNpb25NYXRyaXgocHJlZC5rbm4sdGVzdCRpbmNvbWUpDQpgYGANCioqaC4gQ2hvb3NlIHdoaWNoIG1vZGVsIHByZWRpY3RzIGluY29tZSB0aGUgYmVzdCBhbmQganVzdGlmeSB5b3VyIGNob2ljZS4qKg0KDQp8IE1vZGVsICAgICAgICAgICAgICAgICAgICAgICAgICAgIHwgQWNjdXJhY3kgfA0KfC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS18LS0tLS0tLS0tLXwNCnwgTGluZWFyIERpc2NyaW1pbmFudCAgICAgICAgICAgICAgICAgfCBgciBxZGEuZml0JHJlc3VsdHNbMl1bWzFdXWAgfA0KfCBRdWFkcmF0aWMgRGlzY3JpbWluYW50ICAgICAgICAgICAgICAgICB8IGByIGxkYS5maXQkcmVzdWx0c1syXVtbMV1dYCB8DQp8IEsgTmVhcmVzdCBOZWlnaGJvcnMgKEs9NykgIHwgYHIgbWluKGtubi5maXQkcmVzdWx0c1syXVtbMV1dKWB8DQp8IExvZ2lzdGljIFJlZ3Jlc3Npb24gfCBgciBnbG0uZml0JHJlc3VsdHNbMl1bWzFdXWAgfA0KDQpJdCBhcHBlYXJzIEtOTiBpcyB0aGUgb3B0aW1hbCB3YXkgdG8gcnVuIHRoaXMgY2xhc3NpZmljYXRpb24gcHJvYmxlbSwgYXMgaXQgaXMgdGhlIG1vc3QgYWNjdXJhdGUgbWV0aG9kIGdpdmVuIHRoZSByZXN1bHRzLCB3aXRoIFFEQSBnaXZpbmcgdGhlIGxvd2VzdCBhbW91bnQgb2YgYWNjdXJhY3kuIFdoaWxlIHRoaXMgaXMgc2Vuc2l0aXZlIHRvIG90aGVyIGRhdGEsIHNvIGFyZSB0aGUgb3RoZXIgbW9kZWxzIGF0IGEgc2ltaWxhciBwZXJjZW50YWdlLCBtYWtpbmcgS05OIHRoZSBiZXN0IGNob2ljZS4NCg==