Getting started

Life circle of data

Life circle of data

Data Wrangling is a very important part (probably no less than 50% of the time of an analysis project is allocated to this section). In this section we will use the tidyverse ecosystem to implement this task.

Practical data today is ebola. This data can be downloaded at : https://data.world/brianray/ebola-cases

# packages installations

rm(list = ls()) 

my_packages <- c("tidyverse", "ggthemes", "rvest", "magrittr","ggplot2")

install.packages(my_packages, repos = "http://cran.rstudio.com")
## package 'tidyverse' successfully unpacked and MD5 sums checked
## package 'ggthemes' successfully unpacked and MD5 sums checked
## package 'rvest' successfully unpacked and MD5 sums checked
## package 'magrittr' successfully unpacked and MD5 sums checked
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\mlcl.local\AppData\Local\Temp\RtmpqOGtM6\downloaded_packages
# Load some packages for scrapping data and data manipulation: 

p <- c("rvest", "tidyverse", "magrittr")

lapply(p, library, character.only = TRUE)
## [[1]]
## [1] "rvest"     "xml2"      "stats"     "graphics"  "grDevices" "utils"    
## [7] "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [6] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "rvest"    
## [11] "xml2"      "stats"     "graphics"  "grDevices" "utils"    
## [16] "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "magrittr"  "forcats"   "stringr"   "dplyr"     "purrr"    
##  [6] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
## [11] "rvest"     "xml2"      "stats"     "graphics"  "grDevices"
## [16] "utils"     "datasets"  "methods"   "base"
# Loading data

ebola <- read.csv("C:/Users/mlcl.local/Downloads/ebola_data_db_format.csv", header = TRUE, sep = ',')

ebola %>% head(10)
##                                                              Indicator
## 1   Cumulative number of confirmed, probable and suspected Ebola cases
## 2                           Cumulative number of confirmed Ebola cases
## 3                            Cumulative number of probable Ebola cases
## 4                           Cumulative number of suspected Ebola cases
## 5  Cumulative number of confirmed, probable and suspected Ebola deaths
## 6                          Cumulative number of confirmed Ebola deaths
## 7                           Cumulative number of probable Ebola deaths
## 8   Cumulative number of confirmed, probable and suspected Ebola cases
## 9                           Cumulative number of confirmed Ebola cases
## 10                           Cumulative number of probable Ebola cases
##    Country       Date value
## 1   Guinea 10/03/2015  3285
## 2   Guinea 10/03/2015  2871
## 3   Guinea 10/03/2015   392
## 4   Guinea 10/03/2015    22
## 5   Guinea 10/03/2015  2170
## 6   Guinea 10/03/2015  1778
## 7   Guinea 10/03/2015   392
## 8  Liberia 10/03/2015  9343
## 9  Liberia 10/03/2015  3150
## 10 Liberia 10/03/2015  1879

filter()

This command is used to filter data by line. For example, we want to extract lines that are Cumulative number of confirmed Ebola deaths in the column Indicator, we can:

df <- ebola %>% 
  filter(Indicator == "Cumulative number of confirmed Ebola deaths")

head(df)
##                                     Indicator                  Country
## 1 Cumulative number of confirmed Ebola deaths                   Guinea
## 2 Cumulative number of confirmed Ebola deaths             Sierra Leone
## 3 Cumulative number of confirmed Ebola deaths                  Nigeria
## 4 Cumulative number of confirmed Ebola deaths                  Senegal
## 5 Cumulative number of confirmed Ebola deaths                    Spain
## 6 Cumulative number of confirmed Ebola deaths United States of America
##         Date value
## 1 10/03/2015  1778
## 2 10/03/2015  3263
## 3 10/03/2015     7
## 4 10/03/2015     0
## 5 10/03/2015     0
## 6 10/03/2015     1

group_by()

Suppose we want to calculate the total number of people who have been confirmed dead by Ebola by country. This time, the group_by () command will be extremely useful. But note that the command group_by () only becomes useful if it comes with other commands, like in this case, sum() - sum:

 df %>% 
  group_by(Country) %>% 
  summarise(total = sum(value)) ->> ebola_deaths

ebola_deaths %>% 
  knitr::kable(col.names = c("Country", "Total of deaths"))
Country Total of deaths
Guinea 461211
Guinea 2 1
Italy 0
Liberia 26721
Liberia 2 215
Mali 21
Nigeria 1784
Senegal 0
Sierra Leone 797984
Spain 0
United Kingdom 0
United States of America 242
# Or want to see the first 10 of total number of deaths over time:

  df %>% 
    group_by(Date) %>% 
    summarise(total = sum(value)) %>% 
    head(10)
## # A tibble: 10 x 2
##    Date       total
##    <fct>      <dbl>
##  1 1/05/2015   5511
##  2 1/06/2015   5562
##  3 1/07/2015   5613
##  4 1/09/2015   5674
##  5 1/10/2014   2006
##  6 1/10/2015   5679
##  7 1/12/2014   2431
##  8 1/12/2015   5682
##  9 10/02/2015  8452
## 10 10/03/2015  5049

arrange()

This command is used to rearrange the rows of data by the value of, for example, the total variable column:

# Sort by descending:

  ebola_deaths %>% 
    arrange(-total)  
## # A tibble: 12 x 2
##    Country                   total
##    <fct>                     <dbl>
##  1 Sierra Leone             797984
##  2 Guinea                   461211
##  3 Liberia                   26721
##  4 Nigeria                    1784
##  5 United States of America    242
##  6 Liberia 2                   215
##  7 Mali                         21
##  8 Guinea 2                      1
##  9 Italy                         0
## 10 Senegal                       0
## 11 Spain                         0
## 12 United Kingdom                0
# Sort by acending: 

  ebola_deaths %>% 
    arrange(total)
## # A tibble: 12 x 2
##    Country                   total
##    <fct>                     <dbl>
##  1 Italy                         0
##  2 Senegal                       0
##  3 Spain                         0
##  4 United Kingdom                0
##  5 Guinea 2                      1
##  6 Mali                         21
##  7 Liberia 2                   215
##  8 United States of America    242
##  9 Nigeria                    1784
## 10 Liberia                   26721
## 11 Guinea                   461211
## 12 Sierra Leone             797984

Apply()

The syntax of this command is apply (x, MARGIN, FUN, …) where:

  • x is a matrix or an array,

  • FUN is a function applied to: (1) if MARGIN = 2, then the function for the column is applied, (2) if MARGIN = 1, then MARGIN = c (1, 2) Functions for both rows and columns.

For example we creat a matrix:

# Create a matrix X: 

set.seed(1)
X <- matrix(rnorm(30), nrow = 5, ncol = 6)

X
##            [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
## [1,] -0.6264538 -0.8204684  1.5117812 -0.04493361  0.91897737 -0.05612874
## [2,]  0.1836433  0.4874291  0.3898432 -0.01619026  0.78213630 -0.15579551
## [3,] -0.8356286  0.7383247 -0.6212406  0.94383621  0.07456498 -1.47075238
## [4,]  1.5952808  0.5757814 -2.2146999  0.82122120 -1.98935170 -0.47815006
## [5,]  0.3295078 -0.3053884  1.1249309  0.59390132  0.61982575  0.41794156
# Calculate the average of the columns of this matrix:
apply(X, MARGIN = 2 , FUN = mean) %>% head()
## [1]  0.12926990  0.13513567  0.03812297  0.45956697  0.08123054 -0.34857703
# Calculate the mean for the rows
apply(X, 1, mean)
## [1]  0.1471290  0.2785110 -0.1951493 -0.2816530  0.4634532

lapply()

The lapply statement has input data as a list, data frame, or vector. The result is a list:

# Data input is a dataframe:

attach(trees)

lapply(trees, mean)
## $Girth
## [1] 13.24839
## 
## $Height
## [1] 76
## 
## $Volume
## [1] 30.17097
# If the input data is a list:

my_list <- list(trees, iris) # Create a simulation of a list

lapply(my_list, summary)
## [[1]]
##      Girth           Height       Volume     
##  Min.   : 8.30   Min.   :63   Min.   :10.20  
##  1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
##  Median :12.90   Median :76   Median :24.20  
##  Mean   :13.25   Mean   :76   Mean   :30.17  
##  3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
##  Max.   :20.60   Max.   :87   Max.   :77.00  
## 
## [[2]]
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

sapply()

This command works similarly to *lapply. The difference is that the result is returned as the simplest data structure as possible.

sapply(trees, mean)
##    Girth   Height   Volume 
## 13.24839 76.00000 30.17097
# Check the result
is.vector(sapply(trees, mean))
## [1] TRUE

Dealing with missing data

Filling missing data

In this part we use a home loan dataset. This dataset contains lots of missing values, empty spaces…

# Load data from the internet

hmeq <- read.csv("http://www.creditriskanalytics.net/uploads/1/9/5/1/19511601/hmeq.csv")

# Data 

head(hmeq)
##   BAD LOAN MORTDUE  VALUE  REASON    JOB  YOJ DEROG DELINQ     CLAGE NINQ
## 1   1 1100   25860  39025 HomeImp  Other 10.5     0      0  94.36667    1
## 2   1 1300   70053  68400 HomeImp  Other  7.0     0      2 121.83333    0
## 3   1 1500   13500  16700 HomeImp  Other  4.0     0      0 149.46667    1
## 4   1 1500      NA     NA                  NA    NA     NA        NA   NA
## 5   0 1700   97800 112000 HomeImp Office  3.0     0      0  93.33333    0
## 6   1 1700   30548  40320 HomeImp  Other  9.0     0      0 101.46600    1
##   CLNO  DEBTINC
## 1    9       NA
## 2   14       NA
## 3   10       NA
## 4   NA       NA
## 5   14       NA
## 6    8 37.11361
  • Functions created for data manipulation
 # Function for detecting NA observations: 

  na_rate <- function(x) {x %>% is.na() %>% sum() / length(x)}
  sapply(hmeq, na_rate) %>% round(2)
##     BAD    LOAN MORTDUE   VALUE  REASON     JOB     YOJ   DEROG  DELINQ 
##    0.00    0.00    0.09    0.02    0.00    0.00    0.09    0.12    0.10 
##   CLAGE    NINQ    CLNO DEBTINC 
##    0.05    0.09    0.04    0.21
 # Function replaces NA by mean: 

  replace_by_mean <- function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    return(x)
  }
# A function imputes NA observations for categorical variables 
  
  replace_na_categorical <- function(x) {
    x %>% 
      table() %>% 
      as.data.frame() %>% 
      arrange(-Freq) ->> my_df
    
    n_obs <- sum(my_df$Freq)
    pop <- my_df$. %>% as.character()
    set.seed(29)
    x[is.na(x)] <- sample(pop, sum(is.na(x)), replace = TRUE, prob = my_df$Freq)
    return(x)
  }

Re-lables data

## Relable for "JOB"
name_job <- function(x){
  x %<>% as.character()
  ELSE <- TRUE
  job_name <- c("Mgr", "Office", "Other", "ProfExe", "Sales", "Self")
  case_when(!x %in% job_name ~ "Other", 
            ELSE ~ x) 

} 
# Relable for "REASON"
name_reason <- function(x){
  ELSE <- TRUE
  x %<>% as.character()
  case_when(!x  %in%  c("DebtCon", "HomeImp") ~ "Unknown",
            ELSE ~ x)
    
}
# Relable for "BAD"

label_rename <- function(x){
  case_when(x==1 ~ "BAD",
            x==0 ~ "GOOD")
}
# Use the two functions: 
  df <- hmeq %>% 
    mutate_if(is.factor, as.character) %>% 
    mutate(REASON = case_when(REASON == "" ~ NA_character_, TRUE ~ REASON), 
           JOB = case_when(JOB == "" ~ NA_character_, TRUE ~ JOB)) %>%
    mutate_if(is_character, as.factor) %>% 
    mutate_if(is.numeric, replace_by_mean) %>% 
    mutate_if(is.factor, replace_na_categorical) %>% 
    mutate_at("REASON", name_reason) %>% 
    mutate_at("JOB", name_job) %>% 
    mutate(BAD = label_rename(BAD))
  


# See the data after filling in 

head(df, n=20) %>% knitr::kable(col.names = c("BAD", "LOAN", "MORTDUE", "VALUE","REASON","JOB",
                             "JOY","DEROG", "DELINQ","CLAGE","NINQ","CLNO","DEBTINC"))
BAD LOAN MORTDUE VALUE REASON JOB JOY DEROG DELINQ CLAGE NINQ CLNO DEBTINC
BAD 1100 25860.00 39025 HomeImp Other 10.500000 0.0000000 0.0000000 94.36667 1.000000 9.0000 33.779915
BAD 1300 70053.00 68400 HomeImp Other 7.000000 0.0000000 2.0000000 121.83333 0.000000 14.0000 33.779915
BAD 1500 13500.00 16700 HomeImp Other 4.000000 0.0000000 0.0000000 149.46667 1.000000 10.0000 33.779915
BAD 1500 73760.82 101776 DebtCon Other 8.922268 0.2545697 0.4494424 179.76628 1.186055 21.2961 33.779915
GOOD 1700 97800.00 112000 HomeImp Office 3.000000 0.0000000 0.0000000 93.33333 0.000000 14.0000 33.779915
BAD 1700 30548.00 40320 HomeImp Other 9.000000 0.0000000 0.0000000 101.46600 1.000000 8.0000 37.113614
BAD 1800 48649.00 57037 HomeImp Other 5.000000 3.0000000 2.0000000 77.10000 1.000000 17.0000 33.779915
BAD 1800 28502.00 43034 HomeImp Other 11.000000 0.0000000 0.0000000 88.76603 0.000000 8.0000 36.884894
BAD 2000 32700.00 46740 HomeImp Other 3.000000 0.0000000 2.0000000 216.93333 1.000000 12.0000 33.779915
BAD 2000 73760.82 62250 HomeImp Sales 16.000000 0.0000000 0.0000000 115.80000 0.000000 13.0000 33.779915
BAD 2000 22608.00 101776 DebtCon Other 18.000000 0.2545697 0.4494424 179.76628 1.186055 21.2961 33.779915
BAD 2000 20627.00 29800 HomeImp Office 11.000000 0.0000000 1.0000000 122.53333 1.000000 9.0000 33.779915
BAD 2000 45000.00 55000 HomeImp Other 3.000000 0.0000000 0.0000000 86.06667 2.000000 25.0000 33.779915
GOOD 2000 64536.00 87400 DebtCon Mgr 2.500000 0.0000000 0.0000000 147.13333 0.000000 24.0000 33.779915
BAD 2100 71000.00 83850 HomeImp Other 8.000000 0.0000000 1.0000000 123.00000 0.000000 16.0000 33.779915
BAD 2200 24280.00 34687 HomeImp Other 8.922268 0.0000000 1.0000000 300.86667 0.000000 8.0000 33.779915
BAD 2200 90957.00 102600 HomeImp Mgr 7.000000 2.0000000 6.0000000 122.90000 1.000000 22.0000 33.779915
BAD 2200 23030.00 101776 DebtCon Other 19.000000 0.2545697 0.4494424 179.76628 1.186055 21.2961 3.711312
BAD 2300 28192.00 40150 HomeImp Other 4.500000 0.0000000 0.0000000 54.60000 1.000000 16.0000 33.779915
GOOD 2300 102370.00 120953 HomeImp Office 2.000000 0.0000000 0.0000000 90.99253 0.000000 13.0000 31.588503
LS0tDQp0aXRsZTogIkRhdGEgV3JhbmdsaW5nIGZyb20gMCB0byAxIg0KYXV0aG9yOiAiSmVubnkgTmd1eWVuIg0Kb3V0cHV0Og0KICBodG1sX2RvY3VtZW50Og0KICAgIGNvZGVfZG93bmxvYWQ6IHllcw0KICAgIGhpZ2hsaWdodDogcHlnbWVudHMNCiAgICB0aGVtZTogZmxhdGx5DQogICAgdG9jOiB5ZXMNCiAgICB0b2NfZmxvYXQ6IHllcw0KLS0tDQoNCmBgYHtyIHNldHVwLGluY2x1ZGU9RkFMU0V9DQprbml0cjo6b3B0c19jaHVuayRzZXQoZWNobyA9IFRSVUUsIHdhcm5pbmcgPSBGQUxTRSwgbWVzc2FnZSA9IEZBTFNFKQ0KYGBgDQoNCg0KIyAqKkdldHRpbmcgc3RhcnRlZCoqDQoNCg0KIVtMaWZlIGNpcmNsZSBvZiBkYXRhXShDOi9Vc2Vycy9tbGNsLmxvY2FsL0Rlc2t0b3AvbWxjL3Byb2Nlc3MucG5nKQ0KDQoNCkRhdGEgV3JhbmdsaW5nIGlzIGEgdmVyeSBpbXBvcnRhbnQgcGFydCAocHJvYmFibHkgbm8gbGVzcyB0aGFuIDUwJSBvZiB0aGUgdGltZSBvZiBhbiBhbmFseXNpcyBwcm9qZWN0IGlzIGFsbG9jYXRlZCB0byB0aGlzIHNlY3Rpb24pLiBJbiB0aGlzIHNlY3Rpb24gd2Ugd2lsbCB1c2UgdGhlICoqdGlkeXZlcnNlIGVjb3N5c3RlbSoqIHRvIGltcGxlbWVudCB0aGlzIHRhc2suDQoNClByYWN0aWNhbCBkYXRhIHRvZGF5IGlzICoqZWJvbGEqKi4gVGhpcyBkYXRhIGNhbiBiZSBkb3dubG9hZGVkIGF0IDogPGh0dHBzOi8vZGF0YS53b3JsZC9icmlhbnJheS9lYm9sYS1jYXNlcz4NCg0KYGBge3IgZWNobz1UUlVFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KIyBwYWNrYWdlcyBpbnN0YWxsYXRpb25zDQoNCnJtKGxpc3QgPSBscygpKSANCg0KbXlfcGFja2FnZXMgPC0gYygidGlkeXZlcnNlIiwgImdndGhlbWVzIiwgInJ2ZXN0IiwgIm1hZ3JpdHRyIiwiZ2dwbG90MiIpDQoNCmluc3RhbGwucGFja2FnZXMobXlfcGFja2FnZXMsIHJlcG9zID0gImh0dHA6Ly9jcmFuLnJzdHVkaW8uY29tIikNCg0KDQoNCg0KIyBMb2FkIHNvbWUgcGFja2FnZXMgZm9yIHNjcmFwcGluZyBkYXRhIGFuZCBkYXRhIG1hbmlwdWxhdGlvbjogDQoNCnAgPC0gYygicnZlc3QiLCAidGlkeXZlcnNlIiwgIm1hZ3JpdHRyIikNCg0KbGFwcGx5KHAsIGxpYnJhcnksIGNoYXJhY3Rlci5vbmx5ID0gVFJVRSkNCg0KYGBgDQoNCg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIExvYWRpbmcgZGF0YQ0KDQplYm9sYSA8LSByZWFkLmNzdigiQzovVXNlcnMvbWxjbC5sb2NhbC9Eb3dubG9hZHMvZWJvbGFfZGF0YV9kYl9mb3JtYXQuY3N2IiwgaGVhZGVyID0gVFJVRSwgc2VwID0gJywnKQ0KDQplYm9sYSAlPiUgaGVhZCgxMCkNCmBgYA0KDQojIyAqKmZpbHRlcigpKioNCg0KVGhpcyBjb21tYW5kIGlzIHVzZWQgdG8gZmlsdGVyIGRhdGEgYnkgbGluZS4gRm9yIGV4YW1wbGUsIHdlIHdhbnQgdG8gZXh0cmFjdCBsaW5lcyB0aGF0IGFyZSAqKkN1bXVsYXRpdmUgbnVtYmVyIG9mIGNvbmZpcm1lZCBFYm9sYSBkZWF0aHMqKiBpbiB0aGUgY29sdW1uIEluZGljYXRvciwgd2UgY2FuOg0KDQoNCmBgYHtyIGVjaG89VFJVRSwgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCmRmIDwtIGVib2xhICU+JSANCiAgZmlsdGVyKEluZGljYXRvciA9PSAiQ3VtdWxhdGl2ZSBudW1iZXIgb2YgY29uZmlybWVkIEVib2xhIGRlYXRocyIpDQoNCmhlYWQoZGYpDQoNCmBgYA0KDQoNCiMjICoqZ3JvdXBfYnkoKSoqDQoNClN1cHBvc2Ugd2Ugd2FudCB0byBjYWxjdWxhdGUgdGhlIHRvdGFsIG51bWJlciBvZiBwZW9wbGUgd2hvIGhhdmUgYmVlbiBjb25maXJtZWQgZGVhZCBieSBFYm9sYSBieSBjb3VudHJ5Lg0KVGhpcyB0aW1lLCB0aGUgZ3JvdXBfYnkgKCkgY29tbWFuZCB3aWxsIGJlIGV4dHJlbWVseSB1c2VmdWwuIEJ1dCBub3RlIHRoYXQgdGhlIGNvbW1hbmQgZ3JvdXBfYnkgKCkgb25seSBiZWNvbWVzIHVzZWZ1bCBpZiBpdCBjb21lcyB3aXRoIG90aGVyIGNvbW1hbmRzLCBsaWtlIGluIHRoaXMgY2FzZSwgKnN1bSgpKiAtICpzdW0qOg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQogZGYgJT4lIA0KICBncm91cF9ieShDb3VudHJ5KSAlPiUgDQogIHN1bW1hcmlzZSh0b3RhbCA9IHN1bSh2YWx1ZSkpIC0+PiBlYm9sYV9kZWF0aHMNCg0KZWJvbGFfZGVhdGhzICU+JSANCiAga25pdHI6OmthYmxlKGNvbC5uYW1lcyA9IGMoIkNvdW50cnkiLCAiVG90YWwgb2YgZGVhdGhzIikpDQoNCmBgYA0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQoNCiMgT3Igd2FudCB0byBzZWUgdGhlIGZpcnN0IDEwIG9mIHRvdGFsIG51bWJlciBvZiBkZWF0aHMgb3ZlciB0aW1lOg0KDQogIGRmICU+JSANCiAgICBncm91cF9ieShEYXRlKSAlPiUgDQogICAgc3VtbWFyaXNlKHRvdGFsID0gc3VtKHZhbHVlKSkgJT4lIA0KICAgIGhlYWQoMTApDQoNCmBgYA0KDQojIyAqKmFycmFuZ2UoKSoqDQoNClRoaXMgY29tbWFuZCBpcyB1c2VkIHRvIHJlYXJyYW5nZSB0aGUgcm93cyBvZiBkYXRhIGJ5IHRoZSB2YWx1ZSBvZiwgZm9yIGV4YW1wbGUsIHRoZSB0b3RhbCB2YXJpYWJsZSBjb2x1bW46DQoNCmBgYHtyIGVjaG89VFJVRSwgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCiMgU29ydCBieSBkZXNjZW5kaW5nOg0KDQogIGVib2xhX2RlYXRocyAlPiUgDQogICAgYXJyYW5nZSgtdG90YWwpICANCg0KICANCmBgYA0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIFNvcnQgYnkgYWNlbmRpbmc6IA0KDQogIGVib2xhX2RlYXRocyAlPiUgDQogICAgYXJyYW5nZSh0b3RhbCkNCmBgYA0KDQojIyAqKkFwcGx5KCkqKg0KDQpUaGUgc3ludGF4IG9mIHRoaXMgY29tbWFuZCBpcyBhcHBseSAoeCwgTUFSR0lOLCBGVU4sIC4uLikgd2hlcmU6DQoNCi0geCBpcyBhIG1hdHJpeCBvciBhbiBhcnJheSwNCg0KLSBGVU4gaXMgYSBmdW5jdGlvbiBhcHBsaWVkIHRvOiAoMSkgaWYgTUFSR0lOID0gMiwgdGhlbiB0aGUgZnVuY3Rpb24gZm9yIHRoZSBjb2x1bW4gaXMgYXBwbGllZCwgKDIpIGlmIE1BUkdJTiA9IDEsIHRoZW4gTUFSR0lOID0gYyAoMSwgMikgRnVuY3Rpb25zIGZvciBib3RoIHJvd3MgYW5kIGNvbHVtbnMuDQoNCkZvciBleGFtcGxlIHdlIGNyZWF0IGEgbWF0cml4OiANCg0KYGBge3IgZWNobz1UUlVFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KIyBDcmVhdGUgYSBtYXRyaXggWDogDQoNCnNldC5zZWVkKDEpDQpYIDwtIG1hdHJpeChybm9ybSgzMCksIG5yb3cgPSA1LCBuY29sID0gNikNCg0KWA0KDQpgYGANCg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIENhbGN1bGF0ZSB0aGUgYXZlcmFnZSBvZiB0aGUgY29sdW1ucyBvZiB0aGlzIG1hdHJpeDoNCmFwcGx5KFgsIE1BUkdJTiA9IDIgLCBGVU4gPSBtZWFuKSAlPiUgaGVhZCgpDQoNCmBgYA0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIENhbGN1bGF0ZSB0aGUgbWVhbiBmb3IgdGhlIHJvd3MNCmFwcGx5KFgsIDEsIG1lYW4pDQoNCmBgYA0KDQoNCiMjICoqbGFwcGx5KCkqKg0KDQpUaGUgbGFwcGx5IHN0YXRlbWVudCBoYXMgaW5wdXQgZGF0YSBhcyBhIGxpc3QsIGRhdGEgZnJhbWUsIG9yIHZlY3Rvci4gVGhlIHJlc3VsdCBpcyBhIGxpc3Q6DQoNCmBgYHtyIGVjaG89VFJVRSwgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCiMgRGF0YSBpbnB1dCBpcyBhIGRhdGFmcmFtZToNCg0KYXR0YWNoKHRyZWVzKQ0KDQpsYXBwbHkodHJlZXMsIG1lYW4pDQpgYGANCg0KYGBge3IgZWNobz1UUlVFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KIyBJZiB0aGUgaW5wdXQgZGF0YSBpcyBhIGxpc3Q6DQoNCm15X2xpc3QgPC0gbGlzdCh0cmVlcywgaXJpcykgIyBDcmVhdGUgYSBzaW11bGF0aW9uIG9mIGEgbGlzdA0KDQpsYXBwbHkobXlfbGlzdCwgc3VtbWFyeSkNCg0KYGBgDQoNCiMjICoqc2FwcGx5KCkqKg0KDQpUaGlzIGNvbW1hbmQgd29ya3Mgc2ltaWxhcmx5IHRvICpsYXBwbHkuIFRoZSBkaWZmZXJlbmNlIGlzIHRoYXQgdGhlIHJlc3VsdCBpcyByZXR1cm5lZCBhcyB0aGUgc2ltcGxlc3QgZGF0YSBzdHJ1Y3R1cmUgYXMgcG9zc2libGUuDQoNCg0KYGBge3IgZWNobz1UUlVFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0Kc2FwcGx5KHRyZWVzLCBtZWFuKQ0KDQoNCmBgYA0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIENoZWNrIHRoZSByZXN1bHQNCmlzLnZlY3RvcihzYXBwbHkodHJlZXMsIG1lYW4pKQ0KYGBgDQoNCg0KIyAqKkRlYWxpbmcgd2l0aCBtaXNzaW5nIGRhdGEqKg0KDQojIyAqKkZpbGxpbmcgbWlzc2luZyBkYXRhKioNCg0KSW4gdGhpcyBwYXJ0IHdlIHVzZSBhIGhvbWUgbG9hbiBkYXRhc2V0LiBUaGlzIGRhdGFzZXQgY29udGFpbnMgbG90cyBvZiBtaXNzaW5nIHZhbHVlcywgZW1wdHkgc3BhY2VzLi4uDQoNCmBgYHtyIGVjaG89VFJVRSwgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCiMgTG9hZCBkYXRhIGZyb20gdGhlIGludGVybmV0DQoNCmhtZXEgPC0gcmVhZC5jc3YoImh0dHA6Ly93d3cuY3JlZGl0cmlza2FuYWx5dGljcy5uZXQvdXBsb2Fkcy8xLzkvNS8xLzE5NTExNjAxL2htZXEuY3N2IikNCg0KIyBEYXRhIA0KDQpoZWFkKGhtZXEpDQpgYGANCg0KKiBGdW5jdGlvbnMgY3JlYXRlZCBmb3IgZGF0YSBtYW5pcHVsYXRpb24NCg0KYGBge3IgZWNobz1UUlVFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KICMgRnVuY3Rpb24gZm9yIGRldGVjdGluZyBOQSBvYnNlcnZhdGlvbnM6IA0KDQogIG5hX3JhdGUgPC0gZnVuY3Rpb24oeCkge3ggJT4lIGlzLm5hKCkgJT4lIHN1bSgpIC8gbGVuZ3RoKHgpfQ0KICBzYXBwbHkoaG1lcSwgbmFfcmF0ZSkgJT4lIHJvdW5kKDIpDQpgYGANCg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQogIyBGdW5jdGlvbiByZXBsYWNlcyBOQSBieSBtZWFuOiANCg0KICByZXBsYWNlX2J5X21lYW4gPC0gZnVuY3Rpb24oeCkgew0KICAgIHhbaXMubmEoeCldIDwtIG1lYW4oeCwgbmEucm0gPSBUUlVFKQ0KICAgIHJldHVybih4KQ0KICB9DQpgYGANCg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIEEgZnVuY3Rpb24gaW1wdXRlcyBOQSBvYnNlcnZhdGlvbnMgZm9yIGNhdGVnb3JpY2FsIHZhcmlhYmxlcyANCiAgDQogIHJlcGxhY2VfbmFfY2F0ZWdvcmljYWwgPC0gZnVuY3Rpb24oeCkgew0KICAgIHggJT4lIA0KICAgICAgdGFibGUoKSAlPiUgDQogICAgICBhcy5kYXRhLmZyYW1lKCkgJT4lIA0KICAgICAgYXJyYW5nZSgtRnJlcSkgLT4+IG15X2RmDQogICAgDQogICAgbl9vYnMgPC0gc3VtKG15X2RmJEZyZXEpDQogICAgcG9wIDwtIG15X2RmJC4gJT4lIGFzLmNoYXJhY3RlcigpDQogICAgc2V0LnNlZWQoMjkpDQogICAgeFtpcy5uYSh4KV0gPC0gc2FtcGxlKHBvcCwgc3VtKGlzLm5hKHgpKSwgcmVwbGFjZSA9IFRSVUUsIHByb2IgPSBteV9kZiRGcmVxKQ0KICAgIHJldHVybih4KQ0KICB9DQpgYGANCg0KDQoNCiMjICoqUmUtbGFibGVzIGRhdGEqKg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIyBSZWxhYmxlIGZvciAiSk9CIg0KbmFtZV9qb2IgPC0gZnVuY3Rpb24oeCl7DQogIHggJTw+JSBhcy5jaGFyYWN0ZXIoKQ0KICBFTFNFIDwtIFRSVUUNCiAgam9iX25hbWUgPC0gYygiTWdyIiwgIk9mZmljZSIsICJPdGhlciIsICJQcm9mRXhlIiwgIlNhbGVzIiwgIlNlbGYiKQ0KICBjYXNlX3doZW4oIXggJWluJSBqb2JfbmFtZSB+ICJPdGhlciIsIA0KICAgICAgICAgICAgRUxTRSB+IHgpIA0KDQp9IA0KDQpgYGANCg0KYGBge3IgZWNobz1UUlVFLCBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KIyBSZWxhYmxlIGZvciAiUkVBU09OIg0KbmFtZV9yZWFzb24gPC0gZnVuY3Rpb24oeCl7DQogIEVMU0UgPC0gVFJVRQ0KICB4ICU8PiUgYXMuY2hhcmFjdGVyKCkNCiAgY2FzZV93aGVuKCF4ICAlaW4lICBjKCJEZWJ0Q29uIiwgIkhvbWVJbXAiKSB+ICJVbmtub3duIiwNCiAgICAgICAgICAgIEVMU0UgfiB4KQ0KICAgIA0KfQ0KDQpgYGANCg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIFJlbGFibGUgZm9yICJCQUQiDQoNCmxhYmVsX3JlbmFtZSA8LSBmdW5jdGlvbih4KXsNCiAgY2FzZV93aGVuKHg9PTEgfiAiQkFEIiwNCiAgICAgICAgICAgIHg9PTAgfiAiR09PRCIpDQp9DQoNCg0KDQpgYGANCg0KDQpgYGB7ciBlY2hvPVRSVUUsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQojIFVzZSB0aGUgdHdvIGZ1bmN0aW9uczogDQogIGRmIDwtIGhtZXEgJT4lIA0KICAgIG11dGF0ZV9pZihpcy5mYWN0b3IsIGFzLmNoYXJhY3RlcikgJT4lIA0KICAgIG11dGF0ZShSRUFTT04gPSBjYXNlX3doZW4oUkVBU09OID09ICIiIH4gTkFfY2hhcmFjdGVyXywgVFJVRSB+IFJFQVNPTiksIA0KICAgICAgICAgICBKT0IgPSBjYXNlX3doZW4oSk9CID09ICIiIH4gTkFfY2hhcmFjdGVyXywgVFJVRSB+IEpPQikpICU+JQ0KICAgIG11dGF0ZV9pZihpc19jaGFyYWN0ZXIsIGFzLmZhY3RvcikgJT4lIA0KICAgIG11dGF0ZV9pZihpcy5udW1lcmljLCByZXBsYWNlX2J5X21lYW4pICU+JSANCiAgICBtdXRhdGVfaWYoaXMuZmFjdG9yLCByZXBsYWNlX25hX2NhdGVnb3JpY2FsKSAlPiUgDQogICAgbXV0YXRlX2F0KCJSRUFTT04iLCBuYW1lX3JlYXNvbikgJT4lIA0KICAgIG11dGF0ZV9hdCgiSk9CIiwgbmFtZV9qb2IpICU+JSANCiAgICBtdXRhdGUoQkFEID0gbGFiZWxfcmVuYW1lKEJBRCkpDQogIA0KDQoNCiMgU2VlIHRoZSBkYXRhIGFmdGVyIGZpbGxpbmcgaW4gDQoNCmhlYWQoZGYsIG49MjApICU+JSBrbml0cjo6a2FibGUoY29sLm5hbWVzID0gYygiQkFEIiwgIkxPQU4iLCAiTU9SVERVRSIsICJWQUxVRSIsIlJFQVNPTiIsIkpPQiIsDQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICJKT1kiLCJERVJPRyIsICJERUxJTlEiLCJDTEFHRSIsIk5JTlEiLCJDTE5PIiwiREVCVElOQyIpKQ0KICANCmBgYA0KDQoNCg0KDQoNCg0KDQo=