Question 14.1 The breast cancer data set breast-cancer-wisconsin.data.txt from http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/ (description at http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29 ) has missing values.

  1. Use the mean/mode imputation method to impute values for the missing data.

  2. Use regression to impute values for the missing data.

  3. Use regression with perturbation to impute values for the missing data.

  4. (Optional) Compare the results and quality of classification models (e.g., SVM, KNN) build using

  1. the data sets from questions 1,2,3;

  2. the data that remains after data points with missing values are removed; and (3) the data set when a binary variable is introduced to indicate missing values.

Data = read.csv("breast-cancer-wisconsin.data.txt",header = FALSE,na.strings = "?")
str(Data)
'data.frame':   699 obs. of  11 variables:
 $ V1 : int  1000025 1002945 1015425 1016277 1017023 1017122 1018099 1018561 1033078 1033078 ...
 $ V2 : int  5 5 3 6 4 8 1 2 2 4 ...
 $ V3 : int  1 4 1 8 1 10 1 1 1 2 ...
 $ V4 : int  1 4 1 8 1 10 1 2 1 1 ...
 $ V5 : int  1 5 1 1 3 8 1 1 1 1 ...
 $ V6 : int  2 7 2 3 2 7 2 2 2 2 ...
 $ V7 : int  1 10 2 4 1 10 10 1 1 1 ...
 $ V8 : int  3 3 3 3 3 9 3 3 1 2 ...
 $ V9 : int  1 2 1 7 1 7 1 1 1 1 ...
 $ V10: int  1 1 1 1 1 1 1 1 5 1 ...
 $ V11: int  2 2 2 2 2 4 2 2 2 2 ...
print(summary(Data))
       V1                 V2               V3               V4        
 Min.   :   61634   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
 1st Qu.:  870688   1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000  
 Median : 1171710   Median : 4.000   Median : 1.000   Median : 1.000  
 Mean   : 1071704   Mean   : 4.418   Mean   : 3.134   Mean   : 3.207  
 3rd Qu.: 1238298   3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 5.000  
 Max.   :13454352   Max.   :10.000   Max.   :10.000   Max.   :10.000  
                                                                      
       V5               V6               V7               V8        
 Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 2.000  
 Median : 1.000   Median : 2.000   Median : 1.000   Median : 3.000  
 Mean   : 2.807   Mean   : 3.216   Mean   : 3.545   Mean   : 3.438  
 3rd Qu.: 4.000   3rd Qu.: 4.000   3rd Qu.: 6.000   3rd Qu.: 5.000  
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
                                   NA's   :16                       
       V9              V10              V11      
 Min.   : 1.000   Min.   : 1.000   Min.   :2.00  
 1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.:2.00  
 Median : 1.000   Median : 1.000   Median :2.00  
 Mean   : 2.867   Mean   : 1.589   Mean   :2.69  
 3rd Qu.: 4.000   3rd Qu.: 1.000   3rd Qu.:4.00  
 Max.   :10.000   Max.   :10.000   Max.   :4.00  
                                                 
# 1. Sample code number: id number
# 2. Clump Thickness: 1 - 10
# 3. Uniformity of Cell Size: 1 - 10
# 4. Uniformity of Cell Shape: 1 - 10
# 5. Marginal Adhesion: 1 - 10
# 6. Single Epithelial Cell Size: 1 - 10
# 7. Bare Nuclei: 1 - 10
# 8. Bland Chromatin: 1 - 10
# 9. Normal Nucleoli: 1 - 10
# 10. Mitoses: 1 - 10
# 11. Class: (2 for benign, 4 for malignant)

colnames(Data) <- c("ID", "Clump_Thickness", "Cell_Size", "Cell_Shape",
                 "Marginal_Adhesion", "Single_Epith_Cell_Size", "Bare_Nuclei", 
                 "Bland_Chromatin","Normal_Nucleoli", "Mitoses", "Class")

require(dplyr)
Data <- Data %>%mutate(Class = ifelse(Class == 4,1,0))

drops <- c("ID")
Data <- Data[ , !(names(Data) %in% drops)]
summary(Data)
 Clump_Thickness    Cell_Size        Cell_Shape     Marginal_Adhesion
 Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   
 1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   
 Median : 4.000   Median : 1.000   Median : 1.000   Median : 1.000   
 Mean   : 4.418   Mean   : 3.134   Mean   : 3.207   Mean   : 2.807   
 3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 5.000   3rd Qu.: 4.000   
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   
                                                                     
 Single_Epith_Cell_Size  Bare_Nuclei     Bland_Chromatin  Normal_Nucleoli 
 Min.   : 1.000         Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 2.000         1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.: 1.000  
 Median : 2.000         Median : 1.000   Median : 3.000   Median : 1.000  
 Mean   : 3.216         Mean   : 3.545   Mean   : 3.438   Mean   : 2.867  
 3rd Qu.: 4.000         3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 4.000  
 Max.   :10.000         Max.   :10.000   Max.   :10.000   Max.   :10.000  
                        NA's   :16                                        
    Mitoses           Class       
 Min.   : 1.000   Min.   :0.0000  
 1st Qu.: 1.000   1st Qu.:0.0000  
 Median : 1.000   Median :0.0000  
 Mean   : 1.589   Mean   :0.3448  
 3rd Qu.: 1.000   3rd Qu.:1.0000  
 Max.   :10.000   Max.   :1.0000  
                                  
print(summary(Data))
 Clump_Thickness    Cell_Size        Cell_Shape     Marginal_Adhesion
 Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   
 1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   
 Median : 4.000   Median : 1.000   Median : 1.000   Median : 1.000   
 Mean   : 4.418   Mean   : 3.134   Mean   : 3.207   Mean   : 2.807   
 3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 5.000   3rd Qu.: 4.000   
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   
                                                                     
 Single_Epith_Cell_Size  Bare_Nuclei     Bland_Chromatin  Normal_Nucleoli 
 Min.   : 1.000         Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 2.000         1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.: 1.000  
 Median : 2.000         Median : 1.000   Median : 3.000   Median : 1.000  
 Mean   : 3.216         Mean   : 3.545   Mean   : 3.438   Mean   : 2.867  
 3rd Qu.: 4.000         3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 4.000  
 Max.   :10.000         Max.   :10.000   Max.   :10.000   Max.   :10.000  
                        NA's   :16                                        
    Mitoses           Class       
 Min.   : 1.000   Min.   :0.0000  
 1st Qu.: 1.000   1st Qu.:0.0000  
 Median : 1.000   Median :0.0000  
 Mean   : 1.589   Mean   :0.3448  
 3rd Qu.: 1.000   3rd Qu.:1.0000  
 Max.   :10.000   Max.   :1.0000  
                                  
#Bare_Nuclei  has 16 missing values
library(VIM)
mice_plot <- aggr(Data, col=c('navyblue','yellow'),numbers=TRUE, sortVars=TRUE,
labels=names(Data), cex.axis=.7,gap=3, ylab=c("Missing data","Pattern"))

 Variables sorted by number of missings: 

# From the chart I see:
# There are 67% values in the data set with no missing value. There are 97.8% with
# no missing values.
# This chart can be very useful if multiple columns have missing values.

Data_imput_w_mean <- Data
Data_imput_w_mean$Bare_Nuclei[is.na(Data_imput_w_mean$Bare_Nuclei)] <- mean(Data_imput_w_mean$Bare_Nuclei, na.rm=TRUE)
Data_imput_w_mode <- Data
Data_imput_w_mode$Bare_Nuclei[is.na(Data_imput_w_mean$Bare_Nuclei)] <- mode(Data_imput_w_mean$Bare_Nuclei)
# Use regression to impute missing values
library(mice)
Data_imput_w_regression <- Data
imp <- mice(Data, method="norm.predict",m=1)

 iter imp variable
  1   1  Bare_Nuclei
  2   1  Bare_Nuclei
  3   1  Bare_Nuclei
  4   1  Bare_Nuclei
  5   1  Bare_Nuclei
#method="norm.predict"
#Linear regression, predicted values (numeric)
imp
Class: mids
Number of multiple imputations:  1 
Imputation methods:
       Clump_Thickness              Cell_Size             Cell_Shape 
                    ""                     ""                     "" 
     Marginal_Adhesion Single_Epith_Cell_Size            Bare_Nuclei 
                    ""                     ""         "norm.predict" 
       Bland_Chromatin        Normal_Nucleoli                Mitoses 
                    ""                     ""                     "" 
                 Class 
                    "" 
PredictorMatrix:
                       Clump_Thickness Cell_Size Cell_Shape Marginal_Adhesion
Clump_Thickness                      0         1          1                 1
Cell_Size                            1         0          1                 1
Cell_Shape                           1         1          0                 1
Marginal_Adhesion                    1         1          1                 0
Single_Epith_Cell_Size               1         1          1                 1
Bare_Nuclei                          1         1          1                 1
                       Single_Epith_Cell_Size Bare_Nuclei Bland_Chromatin
Clump_Thickness                             1           1               1
Cell_Size                                   1           1               1
Cell_Shape                                  1           1               1
Marginal_Adhesion                           1           1               1
Single_Epith_Cell_Size                      0           1               1
Bare_Nuclei                                 1           0               1
                       Normal_Nucleoli Mitoses Class
Clump_Thickness                      1       1     1
Cell_Size                            1       1     1
Cell_Shape                           1       1     1
Marginal_Adhesion                    1       1     1
Single_Epith_Cell_Size               1       1     1
Bare_Nuclei                          1       1     1
# list the actual imputations for Bare_Nuclei

# Class: mids
# Number of multiple imputations:  1 
# Imputation methods:
#        Clump_Thickness              Cell_Size             Cell_Shape 
#                     ""                     ""                     "" 
#      Marginal_Adhesion Single_Epith_Cell_Size            Bare_Nuclei 
#                     ""                     ""         "norm.predict" 
#        Bland_Chromatin        Normal_Nucleoli                Mitoses 
#                     ""                     ""                     "" 
#                  Class 
#                     "" 
# PredictorMatrix:
#                        Clump_Thickness Cell_Size Cell_Shape Marginal_Adhesion
# Clump_Thickness                      0         1          1                 1
# Cell_Size                            1         0          1                 1
# Cell_Shape                           1         1          0                 1
# Marginal_Adhesion                    1         1          1                 0
# Single_Epith_Cell_Size               1         1          1                 1
# Bare_Nuclei                          1         1          1                 1
#                        Single_Epith_Cell_Size Bare_Nuclei Bland_Chromatin
# Clump_Thickness                             1           1               1
# Cell_Size                                   1           1               1
# Cell_Shape                                  1           1               1
# Marginal_Adhesion                           1           1               1
# Single_Epith_Cell_Size                      0           1               1
# Bare_Nuclei                                 1           0               1
#                        Normal_Nucleoli Mitoses Class
# Clump_Thickness                      1       1     1
# Cell_Size                            1       1     1
# Cell_Shape                           1       1     1
# Marginal_Adhesion                    1       1     1
# Single_Epith_Cell_Size               1       1     1
# Bare_Nuclei                          1       1     1
Data_imput_w_regression <- complete(imp)


#Use regression with perturbation impute values
imp_perturbation <- mice(Data, method="norm.nob",m=1)

 iter imp variable
  1   1  Bare_Nuclei
  2   1  Bare_Nuclei
  3   1  Bare_Nuclei
  4   1  Bare_Nuclei
  5   1  Bare_Nuclei
#method="norm.nob"
#Linear regression, predicted values (numeric)
imp_perturbation
Class: mids
Number of multiple imputations:  1 
Imputation methods:
       Clump_Thickness              Cell_Size             Cell_Shape 
                    ""                     ""                     "" 
     Marginal_Adhesion Single_Epith_Cell_Size            Bare_Nuclei 
                    ""                     ""             "norm.nob" 
       Bland_Chromatin        Normal_Nucleoli                Mitoses 
                    ""                     ""                     "" 
                 Class 
                    "" 
PredictorMatrix:
                       Clump_Thickness Cell_Size Cell_Shape Marginal_Adhesion
Clump_Thickness                      0         1          1                 1
Cell_Size                            1         0          1                 1
Cell_Shape                           1         1          0                 1
Marginal_Adhesion                    1         1          1                 0
Single_Epith_Cell_Size               1         1          1                 1
Bare_Nuclei                          1         1          1                 1
                       Single_Epith_Cell_Size Bare_Nuclei Bland_Chromatin
Clump_Thickness                             1           1               1
Cell_Size                                   1           1               1
Cell_Shape                                  1           1               1
Marginal_Adhesion                           1           1               1
Single_Epith_Cell_Size                      0           1               1
Bare_Nuclei                                 1           0               1
                       Normal_Nucleoli Mitoses Class
Clump_Thickness                      1       1     1
Cell_Size                            1       1     1
Cell_Shape                           1       1     1
Marginal_Adhesion                    1       1     1
Single_Epith_Cell_Size               1       1     1
Bare_Nuclei                          1       1     1
# list the actual imputations for Bare_Nuclei
Data_imput_w_regression_pert <- complete(imp_perturbation)
  1. (Optional) Compare the results and quality of classification models (e.g., SVM, KNN) build using
  1. the data sets from questions 1,2,3;

  2. the data that remains after data points with missing values are removed; and (3) the data set when a binary variable is introduced to indicate missing values.


split_test_train <- function(df) {
  set.seed(1)
  train = sample(1:nrow(df), 0.75*nrow(df))
  train_df <- df[train,]
  test_df <- df[-train,]
  return (list("train" = train_df, "test" = test_df))
}

result_w_mean <- split_test_train(Data_imput_w_mean)

summary(result_w_mean$train)
 Clump_Thickness    Cell_Size        Cell_Shape     Marginal_Adhesion
 Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   
 1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   
 Median : 4.000   Median : 1.000   Median : 2.000   Median : 1.000   
 Mean   : 4.393   Mean   : 3.145   Mean   : 3.179   Mean   : 2.847   
 3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 5.000   3rd Qu.: 4.000   
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   
 Single_Epith_Cell_Size  Bare_Nuclei     Bland_Chromatin  Normal_Nucleoli 
 Min.   : 1.000         Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 2.000         1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.: 1.000  
 Median : 2.000         Median : 1.000   Median : 3.000   Median : 1.000  
 Mean   : 3.174         Mean   : 3.498   Mean   : 3.454   Mean   : 2.855  
 3rd Qu.: 4.000         3rd Qu.: 5.000   3rd Qu.: 5.000   3rd Qu.: 3.250  
 Max.   :10.000         Max.   :10.000   Max.   :10.000   Max.   :10.000  
    Mitoses           Class       
 Min.   : 1.000   Min.   :0.0000  
 1st Qu.: 1.000   1st Qu.:0.0000  
 Median : 1.000   Median :0.0000  
 Mean   : 1.532   Mean   :0.3359  
 3rd Qu.: 1.000   3rd Qu.:1.0000  
 Max.   :10.000   Max.   :1.0000  
dim(result_w_mean$train)
[1] 524  10
dim(result_w_mean$test)
[1] 175  10
data_train_target_category <- result_w_mean$train[,"Class"]
data_test_target_category <- result_w_mean$test[,"Class"]

#run KNN models
library(class)

pr <- knn(result_w_mean$train,result_w_mean$test, cl=data_train_target_category)
pr
  [1] 0 1 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 0 1 1 0
 [38] 0 0 1 0 1 0 1 0 0 0 1 1 0 1 1 0 1 0 0 1 1 0 1 1 0 1 0 0 0 0 0 1 0 0 1 1 0
 [75] 1 1 1 0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
[112] 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[149] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Levels: 0 1
##create confusion matrix
cf <- table(pr,data_test_target_category)
cf
   data_test_target_category
pr    0   1
  0 110   6
  1   0  59
#   data_test_target_category
#pr   0  1
#  0 79 38
#  1 31 27

 ##this function divides the correct predictions by total number of predictions that tell us how accurate teh model is.
 
accuracy <- function(x){sum(diag(x)/(sum(rowSums(x)))) * 100}
accuracy(cf)
[1] 96.57143
#60.5% accuracy of the data imputed with mean


#Data_imput_w_regression

result_w_regression <- split_test_train(Data_imput_w_regression)

summary(result_w_regression$train)
 Clump_Thickness    Cell_Size        Cell_Shape     Marginal_Adhesion
 Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   
 1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   
 Median : 4.000   Median : 1.000   Median : 2.000   Median : 1.000   
 Mean   : 4.393   Mean   : 3.145   Mean   : 3.179   Mean   : 2.847   
 3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 5.000   3rd Qu.: 4.000   
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   
 Single_Epith_Cell_Size  Bare_Nuclei     Bland_Chromatin  Normal_Nucleoli 
 Min.   : 1.000         Min.   : 1.000   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 2.000         1st Qu.: 1.000   1st Qu.: 2.000   1st Qu.: 1.000  
 Median : 2.000         Median : 1.000   Median : 3.000   Median : 1.000  
 Mean   : 3.174         Mean   : 3.462   Mean   : 3.454   Mean   : 2.855  
 3rd Qu.: 4.000         3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 3.250  
 Max.   :10.000         Max.   :10.000   Max.   :10.000   Max.   :10.000  
    Mitoses           Class       
 Min.   : 1.000   Min.   :0.0000  
 1st Qu.: 1.000   1st Qu.:0.0000  
 Median : 1.000   Median :0.0000  
 Mean   : 1.532   Mean   :0.3359  
 3rd Qu.: 1.000   3rd Qu.:1.0000  
 Max.   :10.000   Max.   :1.0000  
dim(result_w_regression$train)
[1] 524  10
dim(result_w_regression$test)
[1] 175  10
data_train_target_category <- result_w_regression$train[,"Class"]
data_test_target_category <- result_w_regression$test[,"Class"]


pr <- knn(result_w_regression$train,result_w_regression$test, cl=data_train_target_category)

##create confusion matrix
cf <- table(pr,data_test_target_category)
cf
   data_test_target_category
pr    0   1
  0 110   6
  1   0  59
#    data_test_target_category
# pr    0   1
#   0 110   6
#   1   0  59

accuracy(cf)
[1] 96.57143
#96% accuracy of the data imputed with regression


#Data_imput_w_regression_w_perturbation

result_w_regression_pert <- split_test_train(Data_imput_w_regression_pert)

summary(result_w_regression_pert$train)
 Clump_Thickness    Cell_Size        Cell_Shape     Marginal_Adhesion
 Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000   
 1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   
 Median : 4.000   Median : 1.000   Median : 2.000   Median : 1.000   
 Mean   : 4.393   Mean   : 3.145   Mean   : 3.179   Mean   : 2.847   
 3rd Qu.: 6.000   3rd Qu.: 5.000   3rd Qu.: 5.000   3rd Qu.: 4.000   
 Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000   
 Single_Epith_Cell_Size  Bare_Nuclei      Bland_Chromatin  Normal_Nucleoli 
 Min.   : 1.000         Min.   :-0.5784   Min.   : 1.000   Min.   : 1.000  
 1st Qu.: 2.000         1st Qu.: 1.0000   1st Qu.: 2.000   1st Qu.: 1.000  
 Median : 2.000         Median : 1.0000   Median : 3.000   Median : 1.000  
 Mean   : 3.174         Mean   : 3.4775   Mean   : 3.454   Mean   : 2.855  
 3rd Qu.: 4.000         3rd Qu.: 6.0000   3rd Qu.: 5.000   3rd Qu.: 3.250  
 Max.   :10.000         Max.   :10.0000   Max.   :10.000   Max.   :10.000  
    Mitoses           Class       
 Min.   : 1.000   Min.   :0.0000  
 1st Qu.: 1.000   1st Qu.:0.0000  
 Median : 1.000   Median :0.0000  
 Mean   : 1.532   Mean   :0.3359  
 3rd Qu.: 1.000   3rd Qu.:1.0000  
 Max.   :10.000   Max.   :1.0000  
data_train_target_category <- result_w_regression_pert$train[,"Class"]
data_test_target_category <- result_w_regression_pert$test[,"Class"]


pr <- knn(result_w_regression_pert$train,result_w_regression_pert$test, cl=data_train_target_category)
pr
  [1] 0 1 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 0 1 1 0
 [38] 0 0 1 0 1 0 1 0 0 0 1 1 0 1 1 0 1 0 0 1 1 0 1 1 0 1 0 0 0 0 0 1 0 0 1 1 0
 [75] 1 1 1 0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0
[112] 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[149] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Levels: 0 1
##create confusion matrix
cf <- table(pr,data_test_target_category)
cf
   data_test_target_category
pr    0   1
  0 110   6
  1   0  59
#    data_test_target_category
# pr    0   1
#   0 110   6
#   1   0  59

accuracy(cf)
[1] 96.57143
#96% accuracy of the data imputed with regression

#Data with rows w/ missing values removed

data_w_no_na = Data[complete.cases(Data), ]
dim(data_w_no_na)
[1] 683  10
result_w_no_na <- split_test_train(data_w_no_na)
data_train_target_category <- result_w_no_na$train[,"Class"]
data_test_target_category <- result_w_no_na$test[,"Class"]


pr <- knn(result_w_no_na$train,result_w_no_na$test, cl=data_train_target_category)
pr
  [1] 1 0 1 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0
 [38] 0 1 1 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0 0 1 1 0 1 1 1 1 0 0 1 0 0 1 1
 [75] 1 0 1 0 1 1 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 1 1 1 1 1 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 1 0 1 0 0 1 0 1 0 0
[149] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Levels: 0 1
##create confusion matrix
cf <- table(pr,data_test_target_category)
cf
   data_test_target_category
pr    0   1
  0 113   3
  1   3  52
#   data_test_target_category
# pr    0   1
#   0 113   3
#   1   3  52

accuracy(cf)
[1] 96.49123
#96% accuracy of the data without missing values 

# Using KNN models with 
#(1) data without missing values rows
#(2) data imputed with mean
#(3) data imputed with regression
#(4) data imputed with regression with perturbation
#(1) performs the worst, while the other three perform similarly well 
#with accuracy of 96% 
#This seems suspiciously high. This might due to overfitting from imputation
#based on multiple independent variables. 

Question 15.1 Describe a situation or problem from your job, everyday life, current events, etc., for which optimization would be appropriate. What data would you need?

Optimization can be used for airline to decide the number of first-class tickets, premium tickets, and coach tickets they should sell to maximize their profits for their flights. The company might need to consider certain constraints, for example, only max. number of first class and coach seats are available on the flight. The number of staffs to serve first class might be limited, etc.

LS0tCnRpdGxlOiAiWXVjaGVuUGVuZ0hXNiIKb3V0cHV0OgogIGh0bWxfbm90ZWJvb2s6IGRlZmF1bHQKICBodG1sX2RvY3VtZW50OiBkZWZhdWx0Ci0tLQoKUXVlc3Rpb24gMTQuMQpUaGUgYnJlYXN0IGNhbmNlciBkYXRhIHNldCBicmVhc3QtY2FuY2VyLXdpc2NvbnNpbi5kYXRhLnR4dCBmcm9tIGh0dHA6Ly9hcmNoaXZlLmljcy51Y2kuZWR1L21sL21hY2hpbmUtbGVhcm5pbmctZGF0YWJhc2VzL2JyZWFzdC1jYW5jZXItd2lzY29uc2luLyAoZGVzY3JpcHRpb24gYXQgaHR0cDovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvZGF0YXNldHMvQnJlYXN0K0NhbmNlcitXaXNjb25zaW4rJTI4T3JpZ2luYWwlMjkgKSBoYXMgbWlzc2luZyB2YWx1ZXMuCgoxLiBVc2UgdGhlIG1lYW4vbW9kZSBpbXB1dGF0aW9uIG1ldGhvZCB0byBpbXB1dGUgdmFsdWVzIGZvciB0aGUgbWlzc2luZyBkYXRhLiAKCjIuIFVzZSByZWdyZXNzaW9uIHRvIGltcHV0ZSB2YWx1ZXMgZm9yIHRoZSBtaXNzaW5nIGRhdGEuCgozLiBVc2UgcmVncmVzc2lvbiB3aXRoIHBlcnR1cmJhdGlvbiB0byBpbXB1dGUgdmFsdWVzIGZvciB0aGUgbWlzc2luZyBkYXRhLgoKNC4gKE9wdGlvbmFsKSBDb21wYXJlIHRoZSByZXN1bHRzIGFuZCBxdWFsaXR5IG9mIGNsYXNzaWZpY2F0aW9uIG1vZGVscyAoZS5nLiwgU1ZNLCBLTk4pIGJ1aWxkIHVzaW5nCgooMSkgdGhlIGRhdGEgc2V0cyBmcm9tIHF1ZXN0aW9ucyAxLDIsMzsKCigyKSB0aGUgZGF0YSB0aGF0IHJlbWFpbnMgYWZ0ZXIgZGF0YSBwb2ludHMgd2l0aCBtaXNzaW5nIHZhbHVlcyBhcmUgcmVtb3ZlZDsgYW5kICgzKSB0aGUgZGF0YSBzZXQgd2hlbiBhIGJpbmFyeSB2YXJpYWJsZSBpcyBpbnRyb2R1Y2VkIHRvIGluZGljYXRlIG1pc3NpbmcgdmFsdWVzLgoKCmBgYHtyfQpEYXRhID0gcmVhZC5jc3YoImJyZWFzdC1jYW5jZXItd2lzY29uc2luLmRhdGEudHh0IixoZWFkZXIgPSBGQUxTRSxuYS5zdHJpbmdzID0gIj8iKQpzdHIoRGF0YSkKcHJpbnQoc3VtbWFyeShEYXRhKSkKIyAxLiBTYW1wbGUgY29kZSBudW1iZXI6IGlkIG51bWJlcgojIDIuIENsdW1wIFRoaWNrbmVzczogMSAtIDEwCiMgMy4gVW5pZm9ybWl0eSBvZiBDZWxsIFNpemU6IDEgLSAxMAojIDQuIFVuaWZvcm1pdHkgb2YgQ2VsbCBTaGFwZTogMSAtIDEwCiMgNS4gTWFyZ2luYWwgQWRoZXNpb246IDEgLSAxMAojIDYuIFNpbmdsZSBFcGl0aGVsaWFsIENlbGwgU2l6ZTogMSAtIDEwCiMgNy4gQmFyZSBOdWNsZWk6IDEgLSAxMAojIDguIEJsYW5kIENocm9tYXRpbjogMSAtIDEwCiMgOS4gTm9ybWFsIE51Y2xlb2xpOiAxIC0gMTAKIyAxMC4gTWl0b3NlczogMSAtIDEwCiMgMTEuIENsYXNzOiAoMiBmb3IgYmVuaWduLCA0IGZvciBtYWxpZ25hbnQpCgpjb2xuYW1lcyhEYXRhKSA8LSBjKCJJRCIsICJDbHVtcF9UaGlja25lc3MiLCAiQ2VsbF9TaXplIiwgIkNlbGxfU2hhcGUiLAogICAgICAgICAgICAgICAgICJNYXJnaW5hbF9BZGhlc2lvbiIsICJTaW5nbGVfRXBpdGhfQ2VsbF9TaXplIiwgIkJhcmVfTnVjbGVpIiwgCiAgICAgICAgICAgICAgICAgIkJsYW5kX0Nocm9tYXRpbiIsIk5vcm1hbF9OdWNsZW9saSIsICJNaXRvc2VzIiwgIkNsYXNzIikKCnJlcXVpcmUoZHBseXIpCkRhdGEgPC0gRGF0YSAlPiVtdXRhdGUoQ2xhc3MgPSBpZmVsc2UoQ2xhc3MgPT0gNCwxLDApKQoKZHJvcHMgPC0gYygiSUQiKQpEYXRhIDwtIERhdGFbICwgIShuYW1lcyhEYXRhKSAlaW4lIGRyb3BzKV0Kc3VtbWFyeShEYXRhKQpgYGAKCgpgYGB7cn0KcHJpbnQoc3VtbWFyeShEYXRhKSkKI0JhcmVfTnVjbGVpICBoYXMgMTYgbWlzc2luZyB2YWx1ZXMKbGlicmFyeShWSU0pCm1pY2VfcGxvdCA8LSBhZ2dyKERhdGEsIGNvbD1jKCduYXZ5Ymx1ZScsJ3llbGxvdycpLG51bWJlcnM9VFJVRSwgc29ydFZhcnM9VFJVRSwKbGFiZWxzPW5hbWVzKERhdGEpLCBjZXguYXhpcz0uNyxnYXA9MywgeWxhYj1jKCJNaXNzaW5nIGRhdGEiLCJQYXR0ZXJuIikpCgojIEZyb20gdGhlIGNoYXJ0IEkgc2VlOgojIFRoZXJlIGFyZSA2NyUgdmFsdWVzIGluIHRoZSBkYXRhIHNldCB3aXRoIG5vIG1pc3NpbmcgdmFsdWUuIFRoZXJlIGFyZSA5Ny44JSB3aXRoCiMgbm8gbWlzc2luZyB2YWx1ZXMuCiMgVGhpcyBjaGFydCBjYW4gYmUgdmVyeSB1c2VmdWwgaWYgbXVsdGlwbGUgY29sdW1ucyBoYXZlIG1pc3NpbmcgdmFsdWVzLgoKRGF0YV9pbXB1dF93X21lYW4gPC0gRGF0YQpEYXRhX2ltcHV0X3dfbWVhbiRCYXJlX051Y2xlaVtpcy5uYShEYXRhX2ltcHV0X3dfbWVhbiRCYXJlX051Y2xlaSldIDwtIG1lYW4oRGF0YV9pbXB1dF93X21lYW4kQmFyZV9OdWNsZWksIG5hLnJtPVRSVUUpCgpgYGAKCgpgYGB7cn0KRGF0YV9pbXB1dF93X21vZGUgPC0gRGF0YQpEYXRhX2ltcHV0X3dfbW9kZSRCYXJlX051Y2xlaVtpcy5uYShEYXRhX2ltcHV0X3dfbWVhbiRCYXJlX051Y2xlaSldIDwtIG1vZGUoRGF0YV9pbXB1dF93X21lYW4kQmFyZV9OdWNsZWkpCgpgYGAKYGBge3J9CiMgVXNlIHJlZ3Jlc3Npb24gdG8gaW1wdXRlIG1pc3NpbmcgdmFsdWVzCmxpYnJhcnkobWljZSkKRGF0YV9pbXB1dF93X3JlZ3Jlc3Npb24gPC0gRGF0YQppbXAgPC0gbWljZShEYXRhLCBtZXRob2Q9Im5vcm0ucHJlZGljdCIsbT0xKQojbWV0aG9kPSJub3JtLnByZWRpY3QiCiNMaW5lYXIgcmVncmVzc2lvbiwgcHJlZGljdGVkIHZhbHVlcyAobnVtZXJpYykKaW1wCiMgbGlzdCB0aGUgYWN0dWFsIGltcHV0YXRpb25zIGZvciBCYXJlX051Y2xlaQoKIyBDbGFzczogbWlkcwojIE51bWJlciBvZiBtdWx0aXBsZSBpbXB1dGF0aW9uczogIDEgCiMgSW1wdXRhdGlvbiBtZXRob2RzOgojICAgICAgICBDbHVtcF9UaGlja25lc3MgICAgICAgICAgICAgIENlbGxfU2l6ZSAgICAgICAgICAgICBDZWxsX1NoYXBlIAojICAgICAgICAgICAgICAgICAgICAgIiIgICAgICAgICAgICAgICAgICAgICAiIiAgICAgICAgICAgICAgICAgICAgICIiIAojICAgICAgTWFyZ2luYWxfQWRoZXNpb24gU2luZ2xlX0VwaXRoX0NlbGxfU2l6ZSAgICAgICAgICAgIEJhcmVfTnVjbGVpIAojICAgICAgICAgICAgICAgICAgICAgIiIgICAgICAgICAgICAgICAgICAgICAiIiAgICAgICAgICJub3JtLnByZWRpY3QiIAojICAgICAgICBCbGFuZF9DaHJvbWF0aW4gICAgICAgIE5vcm1hbF9OdWNsZW9saSAgICAgICAgICAgICAgICBNaXRvc2VzIAojICAgICAgICAgICAgICAgICAgICAgIiIgICAgICAgICAgICAgICAgICAgICAiIiAgICAgICAgICAgICAgICAgICAgICIiIAojICAgICAgICAgICAgICAgICAgQ2xhc3MgCiMgICAgICAgICAgICAgICAgICAgICAiIiAKIyBQcmVkaWN0b3JNYXRyaXg6CiMgICAgICAgICAgICAgICAgICAgICAgICBDbHVtcF9UaGlja25lc3MgQ2VsbF9TaXplIENlbGxfU2hhcGUgTWFyZ2luYWxfQWRoZXNpb24KIyBDbHVtcF9UaGlja25lc3MgICAgICAgICAgICAgICAgICAgICAgMCAgICAgICAgIDEgICAgICAgICAgMSAgICAgICAgICAgICAgICAgMQojIENlbGxfU2l6ZSAgICAgICAgICAgICAgICAgICAgICAgICAgICAxICAgICAgICAgMCAgICAgICAgICAxICAgICAgICAgICAgICAgICAxCiMgQ2VsbF9TaGFwZSAgICAgICAgICAgICAgICAgICAgICAgICAgIDEgICAgICAgICAxICAgICAgICAgIDAgICAgICAgICAgICAgICAgIDEKIyBNYXJnaW5hbF9BZGhlc2lvbiAgICAgICAgICAgICAgICAgICAgMSAgICAgICAgIDEgICAgICAgICAgMSAgICAgICAgICAgICAgICAgMAojIFNpbmdsZV9FcGl0aF9DZWxsX1NpemUgICAgICAgICAgICAgICAxICAgICAgICAgMSAgICAgICAgICAxICAgICAgICAgICAgICAgICAxCiMgQmFyZV9OdWNsZWkgICAgICAgICAgICAgICAgICAgICAgICAgIDEgICAgICAgICAxICAgICAgICAgIDEgICAgICAgICAgICAgICAgIDEKIyAgICAgICAgICAgICAgICAgICAgICAgIFNpbmdsZV9FcGl0aF9DZWxsX1NpemUgQmFyZV9OdWNsZWkgQmxhbmRfQ2hyb21hdGluCiMgQ2x1bXBfVGhpY2tuZXNzICAgICAgICAgICAgICAgICAgICAgICAgICAgICAxICAgICAgICAgICAxICAgICAgICAgICAgICAgMQojIENlbGxfU2l6ZSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgMSAgICAgICAgICAgMSAgICAgICAgICAgICAgIDEKIyBDZWxsX1NoYXBlICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDEgICAgICAgICAgIDEgICAgICAgICAgICAgICAxCiMgTWFyZ2luYWxfQWRoZXNpb24gICAgICAgICAgICAgICAgICAgICAgICAgICAxICAgICAgICAgICAxICAgICAgICAgICAgICAgMQojIFNpbmdsZV9FcGl0aF9DZWxsX1NpemUgICAgICAgICAgICAgICAgICAgICAgMCAgICAgICAgICAgMSAgICAgICAgICAgICAgIDEKIyBCYXJlX051Y2xlaSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDEgICAgICAgICAgIDAgICAgICAgICAgICAgICAxCiMgICAgICAgICAgICAgICAgICAgICAgICBOb3JtYWxfTnVjbGVvbGkgTWl0b3NlcyBDbGFzcwojIENsdW1wX1RoaWNrbmVzcyAgICAgICAgICAgICAgICAgICAgICAxICAgICAgIDEgICAgIDEKIyBDZWxsX1NpemUgICAgICAgICAgICAgICAgICAgICAgICAgICAgMSAgICAgICAxICAgICAxCiMgQ2VsbF9TaGFwZSAgICAgICAgICAgICAgICAgICAgICAgICAgIDEgICAgICAgMSAgICAgMQojIE1hcmdpbmFsX0FkaGVzaW9uICAgICAgICAgICAgICAgICAgICAxICAgICAgIDEgICAgIDEKIyBTaW5nbGVfRXBpdGhfQ2VsbF9TaXplICAgICAgICAgICAgICAgMSAgICAgICAxICAgICAxCiMgQmFyZV9OdWNsZWkgICAgICAgICAgICAgICAgICAgICAgICAgIDEgICAgICAgMSAgICAgMQpEYXRhX2ltcHV0X3dfcmVncmVzc2lvbiA8LSBjb21wbGV0ZShpbXApCgoKI1VzZSByZWdyZXNzaW9uIHdpdGggcGVydHVyYmF0aW9uIGltcHV0ZSB2YWx1ZXMKaW1wX3BlcnR1cmJhdGlvbiA8LSBtaWNlKERhdGEsIG1ldGhvZD0ibm9ybS5ub2IiLG09MSkKI21ldGhvZD0ibm9ybS5ub2IiCiNMaW5lYXIgcmVncmVzc2lvbiwgcHJlZGljdGVkIHZhbHVlcyAobnVtZXJpYykKaW1wX3BlcnR1cmJhdGlvbgojIGxpc3QgdGhlIGFjdHVhbCBpbXB1dGF0aW9ucyBmb3IgQmFyZV9OdWNsZWkKRGF0YV9pbXB1dF93X3JlZ3Jlc3Npb25fcGVydCA8LSBjb21wbGV0ZShpbXBfcGVydHVyYmF0aW9uKQoKYGBgCgo0LiAoT3B0aW9uYWwpIENvbXBhcmUgdGhlIHJlc3VsdHMgYW5kIHF1YWxpdHkgb2YgY2xhc3NpZmljYXRpb24gbW9kZWxzIChlLmcuLCBTVk0sIEtOTikgYnVpbGQgdXNpbmcKCigxKSB0aGUgZGF0YSBzZXRzIGZyb20gcXVlc3Rpb25zIDEsMiwzOwoKKDIpIHRoZSBkYXRhIHRoYXQgcmVtYWlucyBhZnRlciBkYXRhIHBvaW50cyB3aXRoIG1pc3NpbmcgdmFsdWVzIGFyZSByZW1vdmVkOyBhbmQgKDMpIHRoZSBkYXRhIHNldCB3aGVuIGEgYmluYXJ5IHZhcmlhYmxlIGlzIGludHJvZHVjZWQgdG8gaW5kaWNhdGUgbWlzc2luZyB2YWx1ZXMuCmBgYHtyfQoKc3BsaXRfdGVzdF90cmFpbiA8LSBmdW5jdGlvbihkZikgewogIHNldC5zZWVkKDEpCiAgdHJhaW4gPSBzYW1wbGUoMTpucm93KGRmKSwgMC43NSpucm93KGRmKSkKICB0cmFpbl9kZiA8LSBkZlt0cmFpbixdCiAgdGVzdF9kZiA8LSBkZlstdHJhaW4sXQogIHJldHVybiAobGlzdCgidHJhaW4iID0gdHJhaW5fZGYsICJ0ZXN0IiA9IHRlc3RfZGYpKQp9CgpyZXN1bHRfd19tZWFuIDwtIHNwbGl0X3Rlc3RfdHJhaW4oRGF0YV9pbXB1dF93X21lYW4pCgpzdW1tYXJ5KHJlc3VsdF93X21lYW4kdHJhaW4pCmRpbShyZXN1bHRfd19tZWFuJHRyYWluKQpkaW0ocmVzdWx0X3dfbWVhbiR0ZXN0KQpkYXRhX3RyYWluX3RhcmdldF9jYXRlZ29yeSA8LSByZXN1bHRfd19tZWFuJHRyYWluWywiQ2xhc3MiXQpkYXRhX3Rlc3RfdGFyZ2V0X2NhdGVnb3J5IDwtIHJlc3VsdF93X21lYW4kdGVzdFssIkNsYXNzIl0KCiNydW4gS05OIG1vZGVscwpsaWJyYXJ5KGNsYXNzKQoKcHIgPC0ga25uKHJlc3VsdF93X21lYW4kdHJhaW4scmVzdWx0X3dfbWVhbiR0ZXN0LCBjbD1kYXRhX3RyYWluX3RhcmdldF9jYXRlZ29yeSkKcHIKCiMjY3JlYXRlIGNvbmZ1c2lvbiBtYXRyaXgKY2YgPC0gdGFibGUocHIsZGF0YV90ZXN0X3RhcmdldF9jYXRlZ29yeSkKY2YKIyAgIGRhdGFfdGVzdF90YXJnZXRfY2F0ZWdvcnkKI3ByICAgMCAgMQojICAwIDc5IDM4CiMgIDEgMzEgMjcKCiAjI3RoaXMgZnVuY3Rpb24gZGl2aWRlcyB0aGUgY29ycmVjdCBwcmVkaWN0aW9ucyBieSB0b3RhbCBudW1iZXIgb2YgcHJlZGljdGlvbnMgdGhhdCB0ZWxsIHVzIGhvdyBhY2N1cmF0ZSB0ZWggbW9kZWwgaXMuCiAKYWNjdXJhY3kgPC0gZnVuY3Rpb24oeCl7c3VtKGRpYWcoeCkvKHN1bShyb3dTdW1zKHgpKSkpICogMTAwfQphY2N1cmFjeShjZikKIzYwLjUlIGFjY3VyYWN5IG9mIHRoZSBkYXRhIGltcHV0ZWQgd2l0aCBtZWFuCgoKI0RhdGFfaW1wdXRfd19yZWdyZXNzaW9uCgpyZXN1bHRfd19yZWdyZXNzaW9uIDwtIHNwbGl0X3Rlc3RfdHJhaW4oRGF0YV9pbXB1dF93X3JlZ3Jlc3Npb24pCgpzdW1tYXJ5KHJlc3VsdF93X3JlZ3Jlc3Npb24kdHJhaW4pCmRpbShyZXN1bHRfd19yZWdyZXNzaW9uJHRyYWluKQpkaW0ocmVzdWx0X3dfcmVncmVzc2lvbiR0ZXN0KQoKZGF0YV90cmFpbl90YXJnZXRfY2F0ZWdvcnkgPC0gcmVzdWx0X3dfcmVncmVzc2lvbiR0cmFpblssIkNsYXNzIl0KZGF0YV90ZXN0X3RhcmdldF9jYXRlZ29yeSA8LSByZXN1bHRfd19yZWdyZXNzaW9uJHRlc3RbLCJDbGFzcyJdCgoKcHIgPC0ga25uKHJlc3VsdF93X3JlZ3Jlc3Npb24kdHJhaW4scmVzdWx0X3dfcmVncmVzc2lvbiR0ZXN0LCBjbD1kYXRhX3RyYWluX3RhcmdldF9jYXRlZ29yeSkKCiMjY3JlYXRlIGNvbmZ1c2lvbiBtYXRyaXgKY2YgPC0gdGFibGUocHIsZGF0YV90ZXN0X3RhcmdldF9jYXRlZ29yeSkKY2YKIyAgICBkYXRhX3Rlc3RfdGFyZ2V0X2NhdGVnb3J5CiMgcHIgICAgMCAgIDEKIyAgIDAgMTEwICAgNgojICAgMSAgIDAgIDU5CgphY2N1cmFjeShjZikKIzk2JSBhY2N1cmFjeSBvZiB0aGUgZGF0YSBpbXB1dGVkIHdpdGggcmVncmVzc2lvbgoKCiNEYXRhX2ltcHV0X3dfcmVncmVzc2lvbl93X3BlcnR1cmJhdGlvbgoKcmVzdWx0X3dfcmVncmVzc2lvbl9wZXJ0IDwtIHNwbGl0X3Rlc3RfdHJhaW4oRGF0YV9pbXB1dF93X3JlZ3Jlc3Npb25fcGVydCkKCnN1bW1hcnkocmVzdWx0X3dfcmVncmVzc2lvbl9wZXJ0JHRyYWluKQoKZGF0YV90cmFpbl90YXJnZXRfY2F0ZWdvcnkgPC0gcmVzdWx0X3dfcmVncmVzc2lvbl9wZXJ0JHRyYWluWywiQ2xhc3MiXQpkYXRhX3Rlc3RfdGFyZ2V0X2NhdGVnb3J5IDwtIHJlc3VsdF93X3JlZ3Jlc3Npb25fcGVydCR0ZXN0WywiQ2xhc3MiXQoKCnByIDwtIGtubihyZXN1bHRfd19yZWdyZXNzaW9uX3BlcnQkdHJhaW4scmVzdWx0X3dfcmVncmVzc2lvbl9wZXJ0JHRlc3QsIGNsPWRhdGFfdHJhaW5fdGFyZ2V0X2NhdGVnb3J5KQpwcgoKIyNjcmVhdGUgY29uZnVzaW9uIG1hdHJpeApjZiA8LSB0YWJsZShwcixkYXRhX3Rlc3RfdGFyZ2V0X2NhdGVnb3J5KQpjZgojICAgIGRhdGFfdGVzdF90YXJnZXRfY2F0ZWdvcnkKIyBwciAgICAwICAgMQojICAgMCAxMTAgICA2CiMgICAxICAgMCAgNTkKCmFjY3VyYWN5KGNmKQojOTYlIGFjY3VyYWN5IG9mIHRoZSBkYXRhIGltcHV0ZWQgd2l0aCByZWdyZXNzaW9uCgojRGF0YSB3aXRoIHJvd3Mgdy8gbWlzc2luZyB2YWx1ZXMgcmVtb3ZlZAoKZGF0YV93X25vX25hID0gRGF0YVtjb21wbGV0ZS5jYXNlcyhEYXRhKSwgXQpkaW0oZGF0YV93X25vX25hKQoKcmVzdWx0X3dfbm9fbmEgPC0gc3BsaXRfdGVzdF90cmFpbihkYXRhX3dfbm9fbmEpCmRhdGFfdHJhaW5fdGFyZ2V0X2NhdGVnb3J5IDwtIHJlc3VsdF93X25vX25hJHRyYWluWywiQ2xhc3MiXQpkYXRhX3Rlc3RfdGFyZ2V0X2NhdGVnb3J5IDwtIHJlc3VsdF93X25vX25hJHRlc3RbLCJDbGFzcyJdCgoKcHIgPC0ga25uKHJlc3VsdF93X25vX25hJHRyYWluLHJlc3VsdF93X25vX25hJHRlc3QsIGNsPWRhdGFfdHJhaW5fdGFyZ2V0X2NhdGVnb3J5KQpwcgoKIyNjcmVhdGUgY29uZnVzaW9uIG1hdHJpeApjZiA8LSB0YWJsZShwcixkYXRhX3Rlc3RfdGFyZ2V0X2NhdGVnb3J5KQpjZgojICAgZGF0YV90ZXN0X3RhcmdldF9jYXRlZ29yeQojIHByICAgIDAgICAxCiMgICAwIDExMyAgIDMKIyAgIDEgICAzICA1MgoKYWNjdXJhY3koY2YpCiM5NiUgYWNjdXJhY3kgb2YgdGhlIGRhdGEgd2l0aG91dCBtaXNzaW5nIHZhbHVlcyAKCiMgVXNpbmcgS05OIG1vZGVscyB3aXRoIAojKDEpIGRhdGEgd2l0aG91dCBtaXNzaW5nIHZhbHVlcyByb3dzCiMoMikgZGF0YSBpbXB1dGVkIHdpdGggbWVhbgojKDMpIGRhdGEgaW1wdXRlZCB3aXRoIHJlZ3Jlc3Npb24KIyg0KSBkYXRhIGltcHV0ZWQgd2l0aCByZWdyZXNzaW9uIHdpdGggcGVydHVyYmF0aW9uCiMoMSkgcGVyZm9ybXMgdGhlIHdvcnN0LCB3aGlsZSB0aGUgb3RoZXIgdGhyZWUgcGVyZm9ybSBzaW1pbGFybHkgd2VsbCAKI3dpdGggYWNjdXJhY3kgb2YgOTYlIAojVGhpcyBzZWVtcyBzdXNwaWNpb3VzbHkgaGlnaC4gVGhpcyBtaWdodCBkdWUgdG8gb3ZlcmZpdHRpbmcgZnJvbSBpbXB1dGF0aW9uCiNiYXNlZCBvbiBtdWx0aXBsZSBpbmRlcGVuZGVudCB2YXJpYWJsZXMuIAoKCmBgYApRdWVzdGlvbiAxNS4xCkRlc2NyaWJlIGEgc2l0dWF0aW9uIG9yIHByb2JsZW0gZnJvbSB5b3VyIGpvYiwgZXZlcnlkYXkgbGlmZSwgY3VycmVudCBldmVudHMsIGV0Yy4sIGZvciB3aGljaCBvcHRpbWl6YXRpb24gd291bGQgYmUgYXBwcm9wcmlhdGUuIFdoYXQgZGF0YSB3b3VsZCB5b3UgbmVlZD8KCk9wdGltaXphdGlvbiBjYW4gYmUgdXNlZCBmb3IgYWlybGluZSB0byBkZWNpZGUgdGhlIG51bWJlciBvZiBmaXJzdC1jbGFzcyB0aWNrZXRzLCBwcmVtaXVtIHRpY2tldHMsIGFuZCBjb2FjaCB0aWNrZXRzIHRoZXkgc2hvdWxkIHNlbGwgdG8gbWF4aW1pemUgdGhlaXIgcHJvZml0cyBmb3IgdGhlaXIgZmxpZ2h0cy4gVGhlIGNvbXBhbnkgbWlnaHQgbmVlZCB0byBjb25zaWRlciBjZXJ0YWluIGNvbnN0cmFpbnRzLCBmb3IgZXhhbXBsZSwgb25seSBtYXguIG51bWJlciBvZiBmaXJzdCBjbGFzcyBhbmQgY29hY2ggc2VhdHMgYXJlIGF2YWlsYWJsZSBvbiB0aGUgZmxpZ2h0LiBUaGUgbnVtYmVyIG9mIHN0YWZmcyB0byBzZXJ2ZSBmaXJzdCBjbGFzcyBtaWdodCBiZSBsaW1pdGVkLCBldGMuCgoK