Name: Uyen Nguyen
Adult Income Level Prediction using Machine Learning Classification Techniques

I. Introduction

This project uses the Adult Income Dataset from the UCI Machine Learning Repository. The response variable is income level, a binary variable with two values: >50K indicates that an individual earns more than $50,000 a year, while <=50K indicates that they earn $50,000 or less a year. The explanatory variables are age, workclass, final weight, education, education number, marital status, occupation, relationship, race, sex, capital gain, capital loss, hours per week, and native country.

First, I will clean the data and then perform exploratory data analysis and data visualization to gain an initial understanding of the relationships between the features and income level. Finally, I will apply four machine learning classification models (random forest, logistic regression, support vector machine, and decision tree) to find out which method is most effective at predicting an individual's income from these factors.

II. Data

To begin, I load the necessary libraries, read the CSV file of the dataset into RStudio, and check the first six rows to make sure the data loaded correctly. The dataset has 32,561 rows and 15 columns.

#import library
library("ggplot2")
library("dplyr")

#load file
df <- read.csv('C:/Users/Gia Uyen/Downloads/Adult_Income.csv')
print(head(df))
dim(df)
[1] 32561    15

Cleaning data

Several columns do not have suitable data types. The categorical columns are stored as character strings, so I convert them to factors, and I convert the quantitative columns to numeric (this is also needed because trimming whitespace turns every column into character). In addition, missing values are not coded as NA but as a question mark, “?”. I therefore replace “?” with NA and compute the percentage of NA values in each column. NA values appear only in the workclass, occupation, and native country columns, at 5.64%, 5.66%, and 1.79% respectively. Because these percentages are small, the rows containing NA values are removed, leaving 30,162 rows. I also remove the extra whitespace at the start of many values so that it does not cause mismatches when filtering the data.

#Missing values
sum(is.na(df))
[1] 0
#replace ? with NA
df[df == " ?"] <- NA

#Calculate NA percentage in the dataset
print(sapply(df, function(col) mean(is.na(col)) * 100))
             age        workclass     final.weight        education education.number   marital.status       occupation     relationship 
        0.000000         5.638647         0.000000         0.000000         0.000000         0.000000         5.660146         0.000000 
            race              sex     capital.gain     capital.loss   hours.per.week   native.country           income 
        0.000000         0.000000         0.000000         0.000000         0.000000         1.790486         0.000000 
#NA values make up at most about 5.7% of any column, so we remove the rows that contain them
df <- na.omit(df)

sum(is.na(df))
[1] 0
#remove leading and trailing whitespace from every value (this converts all columns to character)
df[]<- lapply(df,trimws)
#change data types to numeric and factor
num_var <- c("age","final.weight","education.number",
             "capital.gain","capital.loss","hours.per.week")
df[num_var] <- sapply(df[num_var], as.numeric)

categ_var <- c("workclass","education","marital.status",
               "occupation","relationship","race",
               "sex","native.country")
df[,categ_var] <- lapply(df[,categ_var],factor)

str(df)
'data.frame':   30162 obs. of  15 variables:
 $ age             : num  39 50 38 53 28 37 49 52 31 42 ...
 $ workclass       : Factor w/ 7 levels "Federal-gov",..: 6 5 3 3 3 3 3 5 3 3 ...
 $ final.weight    : num  77516 83311 215646 234721 338409 ...
 $ education       : Factor w/ 16 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
 $ education.number: num  13 13 9 7 13 14 5 9 14 13 ...
 $ marital.status  : Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
 $ occupation      : Factor w/ 14 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
 $ relationship    : Factor w/ 6 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
 $ race            : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
 $ sex             : Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
 $ capital.gain    : num  2174 0 0 0 0 ...
 $ capital.loss    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ hours.per.week  : num  40 13 40 40 40 40 16 45 50 40 ...
 $ native.country  : Factor w/ 41 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
 $ income          : chr  "<=50K" "<=50K" "<=50K" "<=50K" ...
 - attr(*, "na.action")= 'omit' Named int [1:2399] 15 28 39 52 62 70 78 94 107 129 ...
  ..- attr(*, "names")= chr [1:2399] "15" "28" "39" "52" ...
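As an alternative sketch (not the approach used above), the “?” markers and the leading spaces can be handled while the file is read, using the na.strings and strip.white arguments of read.csv(). The three-row CSV string below is a hypothetical stand-in for the real file; only the two arguments would carry over to the real read.csv() call.

```r
# Hypothetical three-row sample in the same layout as the raw file:
# values after a comma start with a space, and missing values are written as "?".
raw <- "age,workclass,income
39, State-gov, <=50K
50, ?, >50K
38, Private, <=50K"

# strip.white = TRUE trims the spaces around each unquoted field, so
# na.strings = "?" matches the bare "?" and turns it into NA while reading.
toy <- read.csv(text = raw, na.strings = "?", strip.white = TRUE,
                stringsAsFactors = FALSE)

# Percentage of missing values per column, as in the sapply() call above.
na_pct <- colMeans(is.na(toy)) * 100
print(na_pct)

# Drop incomplete rows, as na.omit(df) does above.
toy_clean <- na.omit(toy)
print(toy_clean)
```

With this loading step, the manual “?” replacement and the trimws() call would no longer be needed, although the conversion of the categorical columns to factors still would.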

The graph below shows the correlations between income level and the quantitative features: age, final weight, education number, capital gain, capital loss, and hours per week. Because income is categorical, I recode it as 0 and 1 so that it can be correlated with the quantitative features, where 0 represents an income of $50k or less and 1 represents an income above $50k. The correlation matrix shows that final weight has the weakest correlation with income (r = -0.01), while education number has the strongest (r = 0.34).

corr_var <- c("age","final.weight","education.number",
             "capital.gain","capital.loss","hours.per.week","income")
df$income<-ifelse(df$income =='<=50K',0,1)
df$income <- as.numeric(df$income)
correlation <- round(cor(df[corr_var]),2)
correlation
                   age final.weight education.number capital.gain capital.loss hours.per.week income
age               1.00        -0.08             0.04         0.08         0.06           0.10   0.24
final.weight     -0.08         1.00            -0.04         0.00        -0.01          -0.02  -0.01
education.number  0.04        -0.04             1.00         0.12         0.08           0.15   0.34
capital.gain      0.08         0.00             0.12         1.00        -0.03           0.08   0.22
capital.loss      0.06        -0.01             0.08        -0.03         1.00           0.05   0.15
hours.per.week    0.10        -0.02             0.15         0.08         0.05           1.00   0.23
income            0.24        -0.01             0.34         0.22         0.15           0.23   1.00
library(reshape2)
melted_corr <- melt(correlation)
ggplot(data = melted_corr, aes(x=Var1, y=Var2, fill=value)) + 
  geom_tile()
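Because income is coded 0/1, each value in the income row of the correlation matrix is a point-biserial correlation: Pearson's r with a binary variable reduces to the difference between the feature's mean in the two income groups, divided by the feature's (population) standard deviation and multiplied by the square root of p times q, where p and q are the shares of 1s and 0s. The sketch below checks this identity on a small hypothetical sample; the values are made up and do not come from the dataset.

```r
# Hypothetical toy data: ages and a 0/1 income indicator (1 = >50K).
toy_age    <- c(25, 32, 47, 51, 38, 29, 60, 44)
toy_income <- c(0, 0, 1, 1, 0, 0, 1, 0)

# Pearson correlation, which is what cor() computes for the 0/1-coded column.
r_pearson <- cor(toy_age, toy_income)

# Point-biserial form: difference in group means, scaled by the population SD
# of age and by sqrt(p * q), where p and q are the shares of 1s and 0s.
p   <- mean(toy_income == 1)
q   <- 1 - p
s_n <- sqrt(mean((toy_age - mean(toy_age))^2))
r_pb <- (mean(toy_age[toy_income == 1]) - mean(toy_age[toy_income == 0])) / s_n * sqrt(p * q)

print(c(pearson = r_pearson, point_biserial = r_pb))
```

Read this way, the r = 0.34 for education number says that the two income groups differ noticeably in their average years of education, relative to the overall spread.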

Because final weight has almost no correlation with income, and the capital gain and capital loss columns consist mostly of zero values, I remove these three columns from the dataset. I also remove the native country column because a large majority of people come from the United States and only a small number come from other countries.

#final weight has almost no correlation with income, capital gain and capital loss are mostly zeros, and native country is dominated by the United States, so we remove these columns
df$final.weight = NULL
df$capital.gain = NULL
df$capital.loss = NULL
df$native.country = NULL
summary(df)
      age                   workclass            education    education.number               marital.status            occupation  
 Min.   :17.00   Federal-gov     :  943   HS-grad     :9840   Min.   : 1.00    Divorced             : 4214   Prof-specialty :4038  
 1st Qu.:28.00   Local-gov       : 2067   Some-college:6678   1st Qu.: 9.00    Married-AF-spouse    :   21   Craft-repair   :4030  
 Median :37.00   Private         :22286   Bachelors   :5044   Median :10.00    Married-civ-spouse   :14065   Exec-managerial:3992  
 Mean   :38.44   Self-emp-inc    : 1074   Masters     :1627   Mean   :10.12    Married-spouse-absent:  370   Adm-clerical   :3721  
 3rd Qu.:47.00   Self-emp-not-inc: 2499   Assoc-voc   :1307   3rd Qu.:13.00    Never-married        : 9726   Sales          :3584  
 Max.   :90.00   State-gov       : 1279   11th        :1048   Max.   :16.00    Separated            :  939   Other-service  :3212  
                 Without-pay     :   14   (Other)     :4618                    Widowed              :  827   (Other)        :7585  
         relationship                   race           sex        hours.per.week      income      
 Husband       :12463   Amer-Indian-Eskimo:  286   Female: 9782   Min.   : 1.00   Min.   :0.0000  
 Not-in-family : 7726   Asian-Pac-Islander:  895   Male  :20380   1st Qu.:40.00   1st Qu.:0.0000  
 Other-relative:  889   Black             : 2817                  Median :40.00   Median :0.0000  
 Own-child     : 4466   Other             :  231                  Mean   :40.93   Mean   :0.2489  
 Unmarried     : 3212   White             :25933                  3rd Qu.:45.00   3rd Qu.:0.0000  
 Wife          : 1406                                             Max.   :99.00   Max.   :1.0000  
                                                                                                  

Age vs Income

The figure below shows that the age distribution is right-skewed. The dashed red lines mark the quartiles: half of the entries have ages between 28 and 47, and the median age is 37.

df$age = as.numeric(df$age)

hist(df$age,col = 'lavender', main = "Age Distribution", 
     xlab = "Age", ylab = "Number of people",breaks = 100,prob = T)
abline(v=quantile(df$age, .25), col='red', lwd = 2, lty = 'dashed')
abline(v=quantile(df$age, .50), col='red', lwd = 2, lty = 'dashed')
abline(v=quantile(df$age, .75), col='red', lwd = 2, lty = 'dashed')
lines(density(df$age),col='purple',lwd = 2)
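The summary table earlier is consistent with a right skew: the mean age (38.44) is above the median (37). A moment-based skewness coefficient turns this into a single number. The helper below is a minimal hypothetical sketch, not part of the original analysis; it divides the third central moment by the cubed population standard deviation, so positive values indicate a longer right tail.

```r
# Moment coefficient of skewness: third central moment divided by the cubed
# population standard deviation. Positive values indicate a right skew.
skewness <- function(x) {
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

skewness(c(1, 2, 3, 4, 5))   # symmetric sample
skewness(c(1, 1, 1, 2, 10))  # long right tail
```

On the cleaned data, skewness(df$age) would quantify the skew visible in the histogram.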

The stacked histogram and the boxplot below show the income distribution by age. Most people who earn more than $50k are between their mid-thirties and mid-fifties.

df$income <- as.factor(df$income)
ggplot(df, aes(x = age, fill = income)) + 
  geom_histogram(alpha = 0.5, binwidth = 2, color = 'black') +
  # scale each density curve to the count axis (count * binwidth) and stack it
  # like the bars, so the curves line up with the stacked histogram
  geom_density(aes(y = after_stat(count * 2)), position = "stack", alpha = 0.5) +
  scale_fill_manual(values = c("lightgreen", "salmon")) +
  labs(x = "Age", y = "Count") +
  ggtitle('Income Classification by Age') +
  theme_classic()
ggplot(df, aes(x= income, y=age)) + 
  geom_boxplot(fill="salmon") +
  labs(x="Income", y="Age") +
  ggtitle("Age Distribution by Income Levels") +
  theme_classic()


For the workclass column, I combine “State-gov”, “Local-gov”, and “Federal-gov” into “Government”, combine “Self-emp-not-inc” and “Self-emp-inc” into “Self_Employment”, and rename “Without-pay” to “Unemployment”. This makes the models easier to interpret later on.

#Combining like factors of workclass column
df$workclass <- as.character(df$workclass)

df$workclass[df$workclass == "State-gov" | df$workclass == "Local-gov" | df$workclass == "Federal-gov"] <- "Government"

df$workclass[df$workclass == "Self-emp-not-inc" | df$workclass == "Self-emp-inc"] <- "Self_Employment"

df$workclass[df$workclass == "Without-pay"] <- "Unemployment"

unique(df$workclass)
[1] "Government"      "Self_Employment" "Private"         "Unemployment"   
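The chained == and | comparisons above can also be written with a lookup table: a named character vector that maps each original label to its combined label. The helper recode_with_map() below is a hypothetical sketch, not part of the original analysis; labels that are not in the map (such as “Private”) are left unchanged.

```r
# Hypothetical lookup table: original label -> combined label.
workclass_map <- c(
  "State-gov"        = "Government",
  "Local-gov"        = "Government",
  "Federal-gov"      = "Government",
  "Self-emp-not-inc" = "Self_Employment",
  "Self-emp-inc"     = "Self_Employment",
  "Without-pay"      = "Unemployment"
)

# Replace every value that appears in the map; leave all other values unchanged.
recode_with_map <- function(x, map) {
  x <- as.character(x)
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

recode_with_map(c("State-gov", "Private", "Self-emp-inc", "Without-pay"), workclass_map)
```

Applied to the original factor column, df$workclass <- recode_with_map(df$workclass, workclass_map) should give the same four groups as the assignments above, and the same pattern could be reused for the education, marital status, and occupation columns.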

In absolute numbers, private-sector workers make up by far the largest group earning more than $50k, largely because most people in the dataset work in the private sector (22,286 of 30,162). There is no clear difference between the Government and Self_Employment groups.

ggplot(df, aes(x=workclass, fill=income)) + 
  geom_bar() +
  labs(x="Workclass", y="Count") +
  ggtitle('Income Classification by Workclass')+
  theme_classic()
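Because the bar chart shows counts, groups of very different sizes are hard to compare directly. A within-group proportion, the share of each workclass earning more than $50k, puts them on the same scale. The sketch below uses a small hypothetical data frame with the same column names as df; the values are made up.

```r
# Hypothetical toy data with the same column names as df.
toy <- data.frame(
  workclass = c("Private", "Private", "Private", "Private",
                "Government", "Government", "Self_Employment", "Self_Employment"),
  income = factor(c(0, 0, 0, 1, 0, 1, 1, 0), levels = c(0, 1))
)

# Row-wise proportions: within each workclass, the share in each income class.
counts <- table(toy$workclass, toy$income)
share_over_50k <- prop.table(counts, margin = 1)[, "1"]
print(round(share_over_50k, 3))
```

On the cleaned data, prop.table(table(df$workclass, df$income), margin = 1) would give these shares, and geom_bar(position = "fill") draws them as bars of equal height.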

Next, I combine levels of the education column to make the data easier to interpret. “Bachelors”, “Masters”, “Doctorate”, and “Prof-school” are grouped as “Higher Education”; “Assoc-acdm” and “Assoc-voc” as “Associates Degree”; and “HS-grad”, “12th”, “11th”, and “10th” as “High School”. I keep “Some-college” (renamed “Some College”), and the remaining levels (“Preschool”, “1st-4th”, “5th-6th”, “7th-8th”, and “9th”) are grouped as “Others”.

#Combining like factors of education column
df$education <- as.character(df$education)

df$education[df$education == "Bachelors" | df$education == "Masters" | df$education == "Doctorate" | df$education == "Prof-school"] <- "Higher Education"

df$education[df$education == "Assoc-acdm" | df$education == "Assoc-voc"] <- "Associates Degree"

df$education[df$education == "HS-grad" | df$education == "12th" | df$education == "11th" | df$education == "10th"] <- "High School"

df$education[df$education == "Some-college"] <- "Some College"

df$education[df$education == "Preschool" | df$education == "1st-4th" | df$education == "5th-6th" | df$education == "7th-8th" | df$education == "9th"] <- "Others"

unique(df$education)
[1] "Higher Education"  "High School"       "Others"            "Some College"      "Associates Degree"

In this dataset, the largest group of people falls in the High School category. However, people with higher education (a Bachelor's degree or above) are the most likely to earn more than $50k. The education number shows the same pattern: people with 13 years of education (the value for a Bachelor's degree) or more are more likely to earn above $50k than others. This also explains the relatively strong correlation between education number and income noted earlier (r = 0.34).

# plot the education distribution by income, listing levels alphabetically from top to bottom
ggplot(df, aes(y = education, fill = income)) +
  geom_bar(position = "dodge") +
  scale_y_discrete(limits = rev(levels(factor(df$education)))) +
  scale_fill_manual(values = c("lightgreen", "salmon")) +
  labs(x = "Count", y = "Education Level", fill = "Income") +
  theme_classic()  

ggplot(df, aes(y = education.number,fill=income)) + 
  geom_bar() +
  labs(x = "Count", y = "Years of education", 
       title = "Income Distribution by Years of education") +
  theme_classic() 

In the marital status column, “Married-civ-spouse”, “Married-spouse-absent”, and “Married-AF-spouse” are combined into “Married”, “Never-married” is renamed “Single”, and the remaining levels are kept.

df$marital.status <- as.character(df$marital.status)

df$marital.status[df$marital.status == "Never-married"] <- "Single"

df$marital.status[df$marital.status == "Married-civ-spouse" | df$marital.status == "Married-spouse-absent" | df$marital.status == "Married-AF-spouse"] <- "Married"

unique(df$marital.status)
[1] "Single"    "Married"   "Divorced"  "Separated" "Widowed"  
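A mistyped label in an == comparison, such as "Never_married" instead of "Never-married", matches no rows and fails silently, leaving the original level in place. One defensive option, sketched below with a hypothetical helper check_labels() that is not part of the original analysis, is to confirm that every label you plan to recode actually occurs in the column before recoding it.

```r
# Hypothetical helper: stop if any label we intend to recode is absent from the data.
check_labels <- function(x, labels) {
  absent <- setdiff(labels, unique(as.character(x)))
  if (length(absent) > 0) {
    stop("labels not found in data: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

status <- c("Never-married", "Married-civ-spouse", "Divorced")
check_labels(status, c("Never-married", "Married-civ-spouse"))  # passes silently

# A typo is reported as an error instead of being ignored.
typo_caught <- tryCatch(check_labels(status, "Never_married"),
                        error = function(e) conditionMessage(e))
print(typo_caught)
```

For example, check_labels(df$marital.status, c("Never-married", "Married-civ-spouse", "Married-spouse-absent", "Married-AF-spouse")) could be run just before the recoding assignments.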

I do the same for the occupation column, combining its levels into “White-collar”, “Blue-collar”, “Professional”, “Service”, “Sales”, and “Other”.

df$occupation <- as.character(df$occupation)

df$occupation[df$occupation == "Adm-clerical" | df$occupation == "Exec-managerial"] <- "White-collar"
df$occupation[df$occupation == "Handlers-cleaners" |df$occupation ==  "Transport-moving" | df$occupation == "Farming-fishing" |df$occupation == "Machine-op-inspct" |df$occupation == "Craft-repair" ] <- "Blue-collar"
df$occupation[df$occupation == "Tech-support" | df$occupation == "Protective-serv" | df$occupation == "Priv-house-serv" | df$occupation == "Other-service"] <- "Service"
df$occupation[df$occupation == "Prof-specialty"] <- "Professional"
df$occupation[df$occupation == "Armed-Forces"] <- "Other"

unique(df$occupation)
[1] "White-collar" "Blue-collar"  "Professional" "Service"      "Sales"        "Other"       

The bar graph shows that people in white-collar and professional jobs are more likely to earn more than $50k.

ggplot(df, aes(y = reorder(occupation, -table(occupation)[occupation]), fill = income)) +
  geom_bar(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  labs(y = "Occupation", x = "Count", fill = "Income") 

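The largest number of people earning more than $50k work exactly 40 hours per week, but this mainly reflects the fact that 40 hours is by far the most common schedule (both the first quartile and the median of hours per week are 40). A count plot alone does not show that working 40 hours raises the chance of earning more than $50k; in fact, the positive correlation between hours per week and income (r = 0.23) suggests that people who work longer hours are somewhat more likely to earn above $50k.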

ggplot(df, aes(x= hours.per.week, fill=income)) + 
  geom_bar() +
  labs(x="Hours per week", y="Count") +
  ggtitle('Income Classification by Hours per week')+
  theme_classic()
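To compare the chance of earning more than $50k across working schedules, hours can be grouped into bands and the share of high earners computed within each band. The sketch below uses hypothetical toy values; cut() with breaks at 39 and 40 splits integer hours into under 40, exactly 40, and over 40.

```r
# Hypothetical toy data: weekly hours and a 0/1 income indicator (1 = >50K).
hours <- c(20, 35, 40, 40, 40, 40, 45, 50, 60, 60)
high  <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)

# Intervals (0,39], (39,40], (40,Inf]: under 40, exactly 40, over 40 hours.
band <- cut(hours, breaks = c(0, 39, 40, Inf), labels = c("<40", "40", ">40"))

# The mean of a 0/1 indicator is the proportion earning >50K in each band.
rate <- tapply(high, band, mean)
print(rate)
```

On the cleaned data, where income is a factor, tapply(df$income == "1", cut(df$hours.per.week, breaks = c(0, 39, 40, Inf)), mean) would give the corresponding shares.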

As the table and bar chart below show, 22,654 people earn $50,000 or less, while only 7,508 earn more than $50,000. This class imbalance is a common problem in classification and can distort model accuracy.

table(df$income)

    0     1 
22654  7508 
barplot(table(df$income),main = 'Income Classification',col='pink',ylab ='Number of people')


df$income <- as.factor(df$income)
#A majority of people in this dataset earn $50k or less
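The effect of the imbalance can be made concrete with the counts in the table above: a trivial classifier that always predicts the majority class (0, an income of $50k or less) would already be correct about 75% of the time. The snippet simply re-enters those two counts.

```r
# Class counts from table(df$income) above (0: <=50K, 1: >50K).
counts <- c(`0` = 22654, `1` = 7508)

# Accuracy of a "classifier" that always predicts the majority class.
baseline_accuracy <- max(counts) / sum(counts)
round(baseline_accuracy, 4)
```

The result, about 0.7511, matches the No Information Rate that caret reports in the confusion matrix later in this section, because sample.split() keeps the same class ratio in the test set (6,796 of its 9,048 rows are class 0). A useful model should clearly beat this baseline.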

To apply the machine learning models, the dataset is split into a training set containing 70% of the data and a testing set containing the remaining 30%. As mentioned earlier, the class imbalance in the response variable needs to be addressed. In a dataset with highly unbalanced classes, a classifier can reach a high accuracy simply by predicting the majority class, without learning to recognize the minority class. Therefore, I resample the training set using a combination of oversampling and undersampling.
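caTools::sample.split(), used in the code below, keeps the ratio of the income classes the same in the training and testing sets. The idea can be sketched in base R by sampling each class separately. stratified_split() is a hypothetical helper for illustration only, and its random draws will not match those of sample.split() even with the same seed.

```r
# Hypothetical helper: stratified train/test split in base R.
# Each class is sampled separately, so both sets keep the original class ratio.
stratified_split <- function(y, ratio = 0.7, seed = 200) {
  set.seed(seed)
  in_train <- logical(length(y))
  for (cls in unique(as.character(y))) {
    idx <- which(as.character(y) == cls)
    n_take <- round(length(idx) * ratio)
    in_train[idx[sample.int(length(idx), n_take)]] <- TRUE
  }
  in_train
}

# Toy response with a 70/30 class ratio.
y <- factor(rep(c(0, 1), times = c(70, 30)))
in_train <- stratified_split(y)
table(y[in_train])   # training set
table(y[!in_train])  # testing set
```

Both sets keep the 70/30 ratio of the toy response (49/21 rows in training, 21/9 in testing).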

#split data
df$income <- as.factor(df$income)

library(caTools)
set.seed(200)

sample <- sample.split(df$income,SplitRatio = 0.7)
train <- subset(df, sample == TRUE)
test <- subset(df, sample == FALSE)

head(test)
#Address imbalanced data 
library("ROSE")
balanced_data <- ovun.sample(income~.,data = train,method = "both")$data
print(table(train$income))

    0     1 
15858  5256 
print(table(balanced_data$income))

    0     1 
10583 10531 

Before resampling, the training set contains only 5256 people who earn more than $50k and 15858 who earn $50k or less, roughly a 1:3 ratio. After resampling, the two classes are roughly the same size (10531 and 10583), while the total number of rows (21114) stays the same. Only the training set is resampled; the test set keeps the original class distribution, so the evaluation reflects the real data. Balancing the classes should stop the models from simply favoring the majority class.
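The idea behind method = "both" can be sketched in base R as follows. The helper rebalance_both() is hypothetical and simplified: it fixes the class sizes at N * p and N * (1 - p), whereas ROSE draws them at random, which is likely why balanced_data has 10583 and 10531 rows rather than exactly equal halves.

```r
# Hypothetical helper illustrating combined over- and undersampling for a
# binary label: the minority class is drawn WITH replacement (oversampling)
# and the majority class WITHOUT replacement (undersampling), giving N rows
# of which a share p comes from the minority class. This mirrors the idea of
# ROSE::ovun.sample(method = "both"), not its exact implementation.
rebalance_both <- function(data, label, N = nrow(data), p = 0.5) {
  y <- data[[label]]
  counts <- table(y)
  minority <- names(which.min(counts))
  majority <- setdiff(names(counts), minority)  # assumes exactly two classes
  n_min <- round(N * p)
  n_maj <- N - n_min
  min_rows <- which(y == minority)
  maj_rows <- which(y == majority)
  # Index through sample.int() to avoid sample()'s behaviour on length-1 input
  picked <- c(min_rows[sample.int(length(min_rows), n_min, replace = TRUE)],
              maj_rows[sample.int(length(maj_rows), n_maj,
                                  replace = n_maj > length(maj_rows))])
  data[picked, , drop = FALSE]
}

# Toy training set with a 75/25 split, similar to the income classes
set.seed(200)
toy_train <- data.frame(x = 1:100, income = c(rep(0, 75), rep(1, 25)))
toy_balanced <- rebalance_both(toy_train, "income")
table(toy_balanced$income)
```

Because minority rows are drawn with replacement, some high earners appear several times in the rebalanced data, while some majority-class rows are dropped entirely.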

III. Methods and Results

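For each model, we use the confusionMatrix() function from the caret package to evaluate its predictions on the test set, reporting accuracy, sensitivity and specificity. When "1" (>50k) is set as the positive class, sensitivity is the share of high earners correctly identified and specificity is the share of people earning $50k or less correctly identified.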
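To make these metrics concrete, the sketch below recomputes them from the four cells of a confusion matrix, treating "1" (>50k) as the positive class. The counts are taken from the logistic regression confusion matrix in this section (TP = 1875, FN = 377, FP = 1642, TN = 5154); cm_metrics() is an illustrative helper, not part of caret.

```r
# Confusion-matrix metrics with "1" (>50k) as the positive class.
cm_metrics <- function(tp, fn, fp, tn) {
  sens <- tp / (tp + fn)   # share of actual >50k earners predicted as >50k
  spec <- tn / (tn + fp)   # share of actual <=50k earners predicted as <=50k
  c(accuracy = (tp + tn) / (tp + fn + fp + tn),
    sensitivity = sens,
    specificity = spec,
    balanced_accuracy = (sens + spec) / 2)
}

# Logistic regression test-set counts
log_metrics <- round(cm_metrics(tp = 1875, fn = 377, fp = 1642, tn = 5154), 4)
log_metrics
# accuracy 0.7769, sensitivity 0.8326, specificity 0.7584, balanced_accuracy 0.7955
```

Swapping which class counts as positive swaps sensitivity and specificity but leaves accuracy and balanced accuracy unchanged.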

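Logistic Regression

The logistic regression model has an accuracy of 77.69%. The confusionMatrix() call below does not set positive = "1", so caret treats the first factor level, "0" (<=50k), as the positive class, and the printed sensitivity (75.84%) and specificity (83.26%) are swapped relative to the other models. With >50k as the positive class, as for the other models, the sensitivity is 83.26% (1875 of 2252) and the specificity is 75.84% (5154 of 6796).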

library('caret')
library('lattice')
log_model <- glm(income ~ ., family = binomial(), balanced_data)
log_pred <- predict(log_model, test, type = "response")
log_pred <- ifelse(log_pred > 0.5, "1", "0")
confusionMatrix(as.factor(log_pred), as.factor(test$income))
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 5154  377
         1 1642 1875
                                          
               Accuracy : 0.7769          
                 95% CI : (0.7681, 0.7854)
    No Information Rate : 0.7511          
    P-Value [Acc > NIR] : 5.333e-09       
                                          
                  Kappa : 0.4975          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.7584          
            Specificity : 0.8326          
         Pos Pred Value : 0.9318          
         Neg Pred Value : 0.5331          
             Prevalence : 0.7511          
         Detection Rate : 0.5696          
   Detection Prevalence : 0.6113          
      Balanced Accuracy : 0.7955          
                                          
       'Positive' Class : 0               
                                          

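Judging by the size of the z statistics in the logistic regression summary below, the strongest predictors of whether an individual's income is more than $50k include hours per week, age, sex, education number, several occupation categories, relationship and, to a lesser extent, marital status. The very large standard error for workclassUnemployment (83.76) suggests that almost no one in that small group earns more than $50k, so its coefficient is not reliably estimated.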

summary(log_model)

Call:
glm(formula = income ~ ., family = binomial(), data = balanced_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.3651  -0.6007  -0.0443   0.7098   3.0277  

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  -7.560940   0.451568 -16.744  < 2e-16 ***
age                           0.034344   0.001805  19.025  < 2e-16 ***
workclassPrivate              0.002873   0.054182   0.053  0.95772    
workclassSelf_Employment     -0.283572   0.069979  -4.052 5.07e-05 ***
workclassUnemployment       -12.854263  83.755868  -0.153  0.87803    
educationHigh School          0.180923   0.101129   1.789  0.07361 .  
educationHigher Education     0.285250   0.090577   3.149  0.00164 ** 
educationOthers               0.270863   0.244922   1.106  0.26876    
educationSome College         0.176595   0.082572   2.139  0.03246 *  
education.number              0.295505   0.028453  10.386  < 2e-16 ***
marital.statusMarried         0.618989   0.157202   3.938 8.23e-05 ***
marital.statusNever-married  -0.571545   0.079204  -7.216 5.35e-13 ***
marital.statusSeparated      -0.040462   0.148019  -0.273  0.78458    
marital.statusWidowed         0.081058   0.137936   0.588  0.55677    
occupationOther               0.063129   1.148453   0.055  0.95616    
occupationProfessional        0.761634   0.070892  10.744  < 2e-16 ***
occupationSales               0.654951   0.064838  10.101  < 2e-16 ***
occupationService             0.202202   0.064654   3.127  0.00176 ** 
occupationWhite-collar        0.786312   0.052995  14.837  < 2e-16 ***
relationshipNot-in-family    -0.880874   0.153632  -5.734 9.83e-09 ***
relationshipOther-relative   -0.967560   0.185017  -5.230 1.70e-07 ***
relationshipOwn-child        -2.148161   0.192497 -11.159  < 2e-16 ***
relationshipUnmarried        -1.149195   0.170763  -6.730 1.70e-11 ***
relationshipWife              1.463172   0.102542  14.269  < 2e-16 ***
raceAsian-Pac-Islander        0.212937   0.248924   0.855  0.39231    
raceBlack                    -0.285794   0.231320  -1.235  0.21665    
raceOther                    -0.551936   0.336939  -1.638  0.10140    
raceWhite                    -0.038290   0.221093  -0.173  0.86250    
sexMale                       1.156552   0.071179  16.248  < 2e-16 ***
hours.per.week                0.037388   0.001884  19.850  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 29270  on 21113  degrees of freedom
Residual deviance: 18189  on 21084  degrees of freedom
AIC: 18249

Number of Fisher Scoring iterations: 11
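Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are easier to read. The sketch below does this for a few coefficients copied from the summary above; with the fitted model, exp(coef(log_model)) would do the same for every term. Because the model was fit on the rebalanced training data, these are associations under that artificial class balance, not causal effects.

```r
# Selected coefficients (log-odds) copied from summary(log_model)
log_odds <- c(age = 0.034344,
              hours.per.week = 0.037388,
              education.number = 0.295505,
              sexMale = 1.156552,
              "relationshipOwn-child" = -2.148161)

# exp() turns a log-odds coefficient into a multiplicative change in the odds
odds_ratios <- round(exp(log_odds), 3)
odds_ratios
```

For example, an odds ratio of about 1.035 for age means each extra year multiplies the odds of earning more than $50k by roughly 1.035, holding the other predictors fixed, while a ratio below 1 (as for relationshipOwn-child) means lower odds than the reference category.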

Random Forest

Next, I build a random forest with 500 trees (ntree = 500). The random forest model has an accuracy of 78.15%, sensitivity of 82.86%, and specificity of 76.59%.
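A random forest grows each of its trees on a bootstrap sample of the training data, considering a random subset of predictors at each split, and then classifies a new person by majority vote over the trees. The sketch below shows only that final voting step, on made-up votes from three trees; with the fitted model, predict(rf, newdata = test, type = "prob") returns the share of votes for each class instead of the winning class. The helper majority_vote() is illustrative.

```r
# Majority vote across trees, the last step of a random forest prediction.
# Each row is one test case, each column one tree's predicted class.
# The votes are made up; the model in this notebook would have 500 columns.
majority_vote <- function(votes) {
  # table() counts the votes per class; which.max() picks the most common
  # class (on a tie it returns the first one, a simplification)
  apply(votes, 1, function(row) names(which.max(table(row))))
}

toy_votes <- matrix(c("0", "0", "1",
                      "1", "1", "1",
                      "1", "0", "1"),
                    nrow = 3, byrow = TRUE)
forest_pred <- majority_vote(toy_votes)
forest_pred
```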

library('randomForest')
rf <- randomForest(income ~ ., data = balanced_data, ntree = 500)
rf.pred <- predict(rf, newdata = test)
confusionMatrix(as.factor(rf.pred),as.factor(test$income),positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 5205  386
         1 1591 1866
                                        
               Accuracy : 0.7815        
                 95% CI : (0.7728, 0.79)
    No Information Rate : 0.7511        
    P-Value [Acc > NIR] : 6.48e-12      
                                        
                  Kappa : 0.5043        
                                        
 Mcnemar's Test P-Value : < 2.2e-16     
                                        
            Sensitivity : 0.8286        
            Specificity : 0.7659        
         Pos Pred Value : 0.5398        
         Neg Pred Value : 0.9310        
             Prevalence : 0.2489        
         Detection Rate : 0.2062        
   Detection Prevalence : 0.3821        
      Balanced Accuracy : 0.7972        
                                        
       'Positive' Class : 1             
                                        

Support Vector Machine

The support vector machine model, fit with e1071's svm() using its default settings, has an accuracy of 76.77%, sensitivity of 87.30%, and specificity of 73.28%.

library('e1071')
svm_model <- svm(income ~ ., data = balanced_data)
svm.pred <- predict(svm_model, newdata = test)
confusionMatrix(svm.pred,test$income,positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 4980  286
         1 1816 1966
                                          
               Accuracy : 0.7677          
                 95% CI : (0.7588, 0.7764)
    No Information Rate : 0.7511          
    P-Value [Acc > NIR] : 0.0001257       
                                          
                  Kappa : 0.4937          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.8730          
            Specificity : 0.7328          
         Pos Pred Value : 0.5198          
         Neg Pred Value : 0.9457          
             Prevalence : 0.2489          
         Detection Rate : 0.2173          
   Detection Prevalence : 0.4180          
      Balanced Accuracy : 0.8029          
                                          
       'Positive' Class : 1               
                                          

Decision Tree

library("rpart")
library("rpart.plot")
dec_tree <- rpart(income~.,data=balanced_data,method='class')
rpart.plot(dec_tree, box.col=c("salmon", "lavender"))
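rpart chooses each split in the tree by searching for the predictor and cut point that most reduce node impurity; for classification its default impurity measure is the Gini index. The sketch below computes the Gini impurity of the root node of balanced_data (10583 vs 10531 people, so almost exactly 0.5) and of a hypothetical split whose child counts are made up for illustration. The helper names gini and split_gini are illustrative.

```r
# Gini impurity of a node: 1 minus the sum of squared class proportions.
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Impurity after a split: child impurities weighted by child size.
split_gini <- function(left, right) {
  n_left <- sum(left)
  n_right <- sum(right)
  (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
}

# Root node of balanced_data: 10583 people at <=50k, 10531 above
root_impurity <- gini(c(10583, 10531))
# Hypothetical split (made-up counts that add up to the same class totals)
split_impurity <- split_gini(left = c(9000, 2000), right = c(1583, 8531))
round(c(root = root_impurity, after_split = split_impurity), 4)
```

Among all candidate splits, the tree keeps the one with the largest drop in impurity, here from about 0.5 to about 0.28.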

The decision tree model has an accuracy of 79.05%, sensitivity of 73.49%, and specificity of 80.89%.

dec_tree.pred <- predict(dec_tree, newdata = test,type="class")
confusionMatrix(dec_tree.pred,test$income,positive = "1")
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 5497  597
         1 1299 1655
                                          
               Accuracy : 0.7905          
                 95% CI : (0.7819, 0.7988)
    No Information Rate : 0.7511          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.4924          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.7349          
            Specificity : 0.8089          
         Pos Pred Value : 0.5603          
         Neg Pred Value : 0.9020          
             Prevalence : 0.2489          
         Detection Rate : 0.1829          
   Detection Prevalence : 0.3265          
      Balanced Accuracy : 0.7719          
                                          
       'Positive' Class : 1               
                                          
IV. Conclusion

Performance Comparison

As the graph below shows, the decision tree has the highest accuracy (79.05%) and the support vector machine the lowest (76.77%) among the four models. However, the gap between the best and worst model is only about 2.3 percentage points.

accuracy<-data.frame(Model=c('Logistic Regression','Random Forest','Support Vector Machine','Decision Tree'),accuracy_of_models = c(0.7769,0.7815,0.7677,0.7905))
ggplot(accuracy,aes(x=Model,y=accuracy_of_models,fill=Model))+geom_bar(stat = 'identity')+ggtitle('Accuracy of each model')
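Because the test set is about 75% "<=50k", accuracy rewards models that lean toward the majority class. Balanced accuracy, the average of sensitivity and specificity that confusionMatrix() also reports, weighs both classes equally and does not depend on which class is treated as positive. The sketch below collects both metrics from the printouts above; by balanced accuracy the ranking changes, with the support vector machine first and the decision tree last.

```r
# Accuracy and balanced accuracy (mean of sensitivity and specificity),
# copied from the confusionMatrix() printouts for each model.
results <- data.frame(
  Model = c("Logistic Regression", "Random Forest",
            "Support Vector Machine", "Decision Tree"),
  accuracy = c(0.7769, 0.7815, 0.7677, 0.7905),
  balanced_accuracy = c(0.7955, 0.7972, 0.8029, 0.7719)
)

# Sort from best to worst balanced accuracy
results[order(results$balanced_accuracy, decreasing = TRUE), ]
```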

Application

Models that predict whether an individual's income exceeds $50k could be useful in practice. For example, they could support research on income inequality and help governments identify groups of people who may have a low standard of living and need financial assistance. One limitation of this study is that income is only recorded as a binary label (above $50k, or at or below it), so the models cannot predict an individual's actual income.
