Required packages

library(readxl)
library(tidyr)
library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(knitr)
library(stringr)
library(outliers)

Executive Summary

This assignment uses two datasets from the World Bank. They provide key health, nutrition and population statistics gathered from a variety of international and national sources. Themes include global surgery, health financing, HIV/AIDS, immunization, infectious diseases, medical resources and usage, noncommunicable diseases, nutrition, population dynamics, reproductive health, universal health coverage, and water and sanitation. The data cover countries and regions worldwide for the years 1960 to 2018, so the collection is large.

For this assignment, I investigate only world population by age group and gender. First, I load the Excel files into data frames and print their structure, dimensions and other information. Next, I review the variables and observations in both datasets to understand how they relate and to check whether the data are tidy. I then tidy the data with the tools and libraries covered in class, join the two datasets on a common variable, and take a small subset, because the original data are large and cover many topics.

In the last part of the assignment, I scan the tidied data for missing values and outliers and handle them. As the assignment requires, I also apply a transformation to one variable and plot the result.

Data

The datasets used in this assignment are available at: https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics

The dataset contains several Excel files; I use two of them. First, I load each Excel file into a data frame and print its first rows.

#Load Data.xlsx into Data
Data <- read_excel("HNP/Data.xlsx")

#Print out the first six rows (the default for head)
head(Data)
#Load Series.xlsx into Series
Series <- read_excel("HNP/Series.xlsx")

#Print out the first six rows (the default for head)
head(Series)

Variables description (selected variables)

Country Name: the name of the country or region.
Country Code: the code of the country, usually an abbreviation of its name.
Indicator Name: a brief description of the statistic.
Indicator Code: the code of the indicator.
Year: the year the value refers to.
Series Code: the same code as Indicator Code; it identifies the category of the statistic.
Topic: the group the indicator belongs to.
Periodicity: how often the data are updated.
Aggregation method: how the data are aggregated.

Understand

I use the function “str” to inspect the variables.

From the output, Data has 63 variables and Series has 20. Most variables are character or numeric, but some of them should clearly be factors.

#Print out the structures
str(Data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   104377 obs. of  63 variables:
 $ Country Name  : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ Country Code  : chr  "ARB" "ARB" "ARB" "ARB" ...
 $ Indicator Name: chr  "Adolescent fertility rate (births per 1,000 women ages 15-19)" "Adults (ages 15+) and children (0-14 years) living with HIV" "Adults (ages 15+) and children (ages 0-14) newly infected with HIV" "Adults (ages 15+) living with HIV" ...
 $ Indicator Code: chr  "SP.ADO.TFRT" "SH.HIV.TOTL" "SH.HIV.INCD.TL" "SH.DYN.AIDS" ...
 $ 1960          : num  135 NA NA NA NA ...
 $ 1961          : num  135 NA NA NA NA ...
 $ 1962          : num  136 NA NA NA NA ...
 $ 1963          : num  136 NA NA NA NA ...
 $ 1964          : num  136 NA NA NA NA ...
 $ 1965          : num  135 NA NA NA NA ...
 $ 1966          : num  135 NA NA NA NA ...
 $ 1967          : num  134 NA NA NA NA ...
 $ 1968          : num  131 NA NA NA NA ...
 $ 1969          : num  129 NA NA NA NA ...
 $ 1970          : num  127 NA NA NA NA ...
 $ 1971          : num  124 NA NA NA NA ...
 $ 1972          : num  122 NA NA NA NA ...
 $ 1973          : num  120 NA NA NA NA ...
 $ 1974          : num  118 NA NA NA NA ...
 $ 1975          : num  115 NA NA NA NA ...
 $ 1976          : num  113 NA NA NA NA ...
 $ 1977          : num  110 NA NA NA NA ...
 $ 1978          : num  107 NA NA NA NA ...
 $ 1979          : num  103 NA NA NA NA ...
 $ 1980          : num  99 NA NA NA NA ...
 $ 1981          : num  95.4 NA NA NA NA ...
 $ 1982          : num  91.8 NA NA NA NA ...
 $ 1983          : num  88.9 NA NA NA NA ...
 $ 1984          : num  85.9 NA NA NA NA ...
 $ 1985          : num  83 NA NA NA NA ...
 $ 1986          : num  79.9 NA NA NA NA ...
 $ 1987          : num  76.9 NA NA NA NA ...
 $ 1988          : num  74.8 NA NA NA NA ...
 $ 1989          : num  72.8 NA NA NA NA ...
 $ 1990          : num  71.1 NA NA NA NA ...
 $ 1991          : num  69.1 NA NA NA NA ...
 $ 1992          : num  67.3 NA NA NA NA ...
 $ 1993          : num  65.3 NA NA NA NA ...
 $ 1994          : num  63.4 NA NA NA NA ...
 $ 1995          : num  61.2 NA NA NA NA ...
 $ 1996          : num  59.3 NA NA NA NA ...
 $ 1997          : num  57.3 NA NA NA NA ...
 $ 1998          : num  56.3 NA NA NA NA ...
 $ 1999          : num  55.3 NA NA NA NA ...
 $ 2000          : num  54.3 NA NA NA NA ...
 $ 2001          : num  53.3 NA NA NA NA ...
 $ 2002          : num  52.3 NA NA NA NA ...
 $ 2003          : num  51.9 NA NA NA NA ...
 $ 2004          : num  51.5 NA NA NA NA ...
 $ 2005          : num  51.2 NA NA NA NA ...
 $ 2006          : num  50.8 NA NA NA NA ...
 $ 2007          : num  50.5 NA NA NA NA ...
 $ 2008          : num  50.3 NA NA NA NA ...
 $ 2009          : num  50.1 NA NA NA NA ...
 $ 2010          : num  50 NA NA NA NA ...
 $ 2011          : num  49.9 NA NA NA NA ...
 $ 2012          : num  49.8 NA NA NA NA ...
 $ 2013          : num  49.3 NA NA NA NA ...
 $ 2014          : num  48.9 NA NA NA NA ...
 $ 2015          : num  48.3 NA NA NA NA ...
 $ 2016          : num  47.5 NA NA NA NA ...
 $ 2017          : num  46.7 NA NA NA NA ...
 $ 2018          : num  NA NA NA NA NA ...
#Print out structure information
str(Series)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   403 obs. of  20 variables:
 $ Series Code                        : chr  "HD.HCI.OVRL" "HD.HCI.OVRL.FE" "HD.HCI.OVRL.LB" "HD.HCI.OVRL.LB.FE" ...
 $ Topic                              : chr  "Public Sector: Policy & institutions" "Public Sector: Policy & institutions" "Public Sector: Policy & institutions" "Public Sector: Policy & institutions" ...
 $ Indicator Name                     : chr  "Human capital index (HCI) (scale 0-1)" "Human capital index (HCI), female (scale 0-1)" "Human capital index (HCI), lower bound (scale 0-1)" "Human capital index (HCI), female, lower bound (scale 0-1)" ...
 $ Short definition                   : chr  NA NA NA NA ...
 $ Long definition                    : chr  "The HCI calculates the contributions of health and education to worker productivity. The final index score rang"| __truncated__ "The HCI calculates the contributions of health and education to worker productivity. The final index score rang"| __truncated__ "The HCI lower bound reflects uncertainty in the measurement of the components and the overall index. It is obta"| __truncated__ "The HCI lower bound reflects uncertainty in the measurement of the components and the overall index. It is obta"| __truncated__ ...
 $ Unit of measure                    : logi  NA NA NA NA NA NA ...
 $ Periodicity                        : chr  NA NA NA NA ...
 $ Base Period                        : logi  NA NA NA NA NA NA ...
 $ Other notes                        : logi  NA NA NA NA NA NA ...
 $ Aggregation method                 : chr  NA NA NA NA ...
 $ Limitations and exceptions         : chr  NA NA NA NA ...
 $ Notes from original source         : chr  NA NA NA NA ...
 $ General comments                   : chr  NA NA NA NA ...
 $ Source                             : chr  "World Bank staff calculations based on the methodology described in World Bank (2018). https://openknowledge.wo"| __truncated__ "World Bank staff calculations based on the methodology described in World Bank (2018). https://openknowledge.wo"| __truncated__ "World Bank staff calculations based on the methodology described in World Bank (2018). https://openknowledge.wo"| __truncated__ "World Bank staff calculations based on the methodology described in World Bank (2018). https://openknowledge.wo"| __truncated__ ...
 $ Statistical concept and methodology: chr  NA NA NA NA ...
 $ Development relevance              : chr  NA NA NA NA ...
 $ Related source links               : logi  NA NA NA NA NA NA ...
 $ Other web links                    : logi  NA NA NA NA NA NA ...
 $ Related indicators                 : logi  NA NA NA NA NA NA ...
 $ License Type                       : chr  "CC BY-4.0" "CC BY-4.0" "CC BY-4.0" "CC BY-4.0" ...
#print out the column names
names(Data)
 [1] "Country Name"   "Country Code"   "Indicator Name" "Indicator Code" "1960"          
 [6] "1961"           "1962"           "1963"           "1964"           "1965"          
[11] "1966"           "1967"           "1968"           "1969"           "1970"          
[16] "1971"           "1972"           "1973"           "1974"           "1975"          
[21] "1976"           "1977"           "1978"           "1979"           "1980"          
[26] "1981"           "1982"           "1983"           "1984"           "1985"          
[31] "1986"           "1987"           "1988"           "1989"           "1990"          
[36] "1991"           "1992"           "1993"           "1994"           "1995"          
[41] "1996"           "1997"           "1998"           "1999"           "2000"          
[46] "2001"           "2002"           "2003"           "2004"           "2005"          
[51] "2006"           "2007"           "2008"           "2009"           "2010"          
[56] "2011"           "2012"           "2013"           "2014"           "2015"          
[61] "2016"           "2017"           "2018"          
#print out the column names.
names(Series)
 [1] "Series Code"                         "Topic"                              
 [3] "Indicator Name"                      "Short definition"                   
 [5] "Long definition"                     "Unit of measure"                    
 [7] "Periodicity"                         "Base Period"                        
 [9] "Other notes"                         "Aggregation method"                 
[11] "Limitations and exceptions"          "Notes from original source"         
[13] "General comments"                    "Source"                             
[15] "Statistical concept and methodology" "Development relevance"              
[17] "Related source links"                "Other web links"                    
[19] "Related indicators"                  "License Type"                       
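
The full `str` listing is long, so a compact summary of dimensions, column classes and missing values can be easier to read. The following is a minimal sketch on a small invented data frame (`toy` is a hypothetical stand-in for `Data`, not part of the original analysis):

```r
# Toy stand-in for the wide World Bank table (values are invented).
toy <- data.frame(
  `Country Code` = c("AAA", "BBB"),
  `Indicator Code` = c("SP.POP.0004.FE", "SP.POP.0004.MA"),
  `1960` = c(100, NA),
  `1961` = c(110, 120),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Dimensions: rows (observations) and columns (variables).
dim(toy)

# Count how many columns there are of each class.
type_counts <- table(vapply(toy, function(col) class(col)[1], character(1)))
type_counts

# Count missing values per column.
na_counts <- colSums(is.na(toy))
na_counts
```

The same three calls could be applied to `Data` and `Series` directly.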

Tidy & Manipulate Data I

The Data dataset is clearly not tidy: the year column names are values rather than variables. I therefore use the gather function to reshape it from wide to long format. After tidying Data, I select a few columns from Series and join them to Data to produce a new data frame.

# Gather some columns.
Data <- Data %>% gather(key="Year", value = "Value", -(1:4))
head(Data)
# Select a few variables from Series.
Series <- Series %>% select("Series Code", "Topic", "Periodicity", "Aggregation method")
head(Series)
# Rename the columns
names(Data)[names(Data) == "Country Code"] <- "CountryCode"
names(Data)[names(Data) == "Indicator Code"] <- "IndicatorCode"
names(Series)[names(Series) == "Series Code"] <- "IndicatorCode"
# Left join Data and Series by IndicatorCode
Data <- Data %>% left_join(Series)
Joining, by = "IndicatorCode"
head(Data)
# Filter Data to keep only the population-by-age-and-sex indicators, creating Data_Pop
# (dots are escaped so that they match a literal "." rather than any character)
Data_Pop <- Data %>% filter(str_detect(IndicatorCode, "^SP\\.POP\\.([0-9]{4}|80UP)\\.(FE|MA)$"))
head(Data_Pop)
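
The reshape-and-join step can also be sketched on a small invented table. This is an illustration only: it assumes tidyr 1.0.0 or later, where `pivot_longer` is available as the newer counterpart to `gather`, and the tables `wide` and `meta` are hypothetical stand-ins for `Data` and `Series`.

```r
library(tidyr)
library(dplyr)

# Toy wide table: one column per year (values are invented).
wide <- tibble(
  IndicatorCode = c("SP.POP.0004.FE", "SP.POP.0004.MA"),
  `1960` = c(100, 105),
  `1961` = c(110, 115)
)

# Toy metadata table keyed by the same code.
meta <- tibble(
  IndicatorCode = c("SP.POP.0004.FE", "SP.POP.0004.MA"),
  Topic = c("Health: Population: Structure", "Health: Population: Structure")
)

# Wide to long: every year column becomes a (Year, Value) pair.
long <- wide %>%
  pivot_longer(cols = -IndicatorCode, names_to = "Year", values_to = "Value")

# Naming the key explicitly avoids relying on the automatic key detection.
joined <- long %>% left_join(meta, by = "IndicatorCode")

joined
```

Passing `by = "IndicatorCode"` makes the join key visible in the code and suppresses the "Joining, by" message.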

Tidy & Manipulate Data II

This section continues tidying and manipulating the data frame.

First, I convert some character variables to factors. I then create two datasets, one for the female and one for the male population, and finally label and order the IndicatorCode variable in each.

# Print the data structure
str(Data_Pop)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   519554 obs. of  9 variables:
 $ Country Name      : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ CountryCode       : chr  "ARB" "ARB" "ARB" "ARB" ...
 $ Indicator Name    : chr  "Population ages 00-04, female" "Population ages 00-04, male" "Population ages 05-09, female" "Population ages 05-09, male" ...
 $ IndicatorCode     : chr  "SP.POP.0004.FE" "SP.POP.0004.MA" "SP.POP.0509.FE" "SP.POP.0509.MA" ...
 $ Year              : chr  "1960" "1960" "1960" "1960" ...
 $ Value             : num  8051199 8366535 6406768 6685409 5043026 ...
 $ Topic             : chr  "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" ...
 $ Periodicity       : chr  "Annual" "Annual" "Annual" "Annual" ...
 $ Aggregation method: chr  "Sum" "Sum" "Sum" "Sum" ...
# Print the unique values of the CountryCode and Year columns.
unique(Data_Pop$CountryCode)
  [1] "ARB" "CSS" "CEB" "EAR" "EAS" "EAP" "TEA" "EMU" "ECS" "ECA" "TEC" "EUU" "FCS" "HPC"
 [15] "HIC" "LTE" "LCN" "LAC" "TLA" "LDC" "LMY" "LIC" "LMC" "MEA" "MNA" "TMN" "MIC" "NAC"
 [29] "INX" "OED" "OSS" "PSS" "PST" "PRE" "SST" "SAS" "TSA" "SSF" "SSA" "TSS" "UMC" "WLD"
 [43] "AFG" "ALB" "DZA" "ASM" "AND" "AGO" "ATG" "ARG" "ARM" "ABW" "AUS" "AUT" "AZE" "BHS"
 [57] "BHR" "BGD" "BRB" "BLR" "BEL" "BLZ" "BEN" "BMU" "BTN" "BOL" "BIH" "BWA" "BRA" "VGB"
 [71] "BRN" "BGR" "BFA" "BDI" "CPV" "KHM" "CMR" "CAN" "CYM" "CAF" "TCD" "CHI" "CHL" "CHN"
 [85] "COL" "COM" "COD" "COG" "CRI" "CIV" "HRV" "CUB" "CUW" "CYP" "CZE" "DNK" "DJI" "DMA"
 [99] "DOM" "ECU" "EGY" "SLV" "GNQ" "ERI" "EST" "SWZ" "ETH" "FRO" "FJI" "FIN" "FRA" "PYF"
[113] "GAB" "GMB" "GEO" "DEU" "GHA" "GIB" "GRC" "GRL" "GRD" "GUM" "GTM" "GIN" "GNB" "GUY"
[127] "HTI" "HND" "HKG" "HUN" "ISL" "IND" "IDN" "IRN" "IRQ" "IRL" "IMN" "ISR" "ITA" "JAM"
[141] "JPN" "JOR" "KAZ" "KEN" "KIR" "PRK" "KOR" "XKX" "KWT" "KGZ" "LAO" "LVA" "LBN" "LSO"
[155] "LBR" "LBY" "LIE" "LTU" "LUX" "MAC" "MDG" "MWI" "MYS" "MDV" "MLI" "MLT" "MHL" "MRT"
[169] "MUS" "MEX" "FSM" "MDA" "MCO" "MNG" "MNE" "MAR" "MOZ" "MMR" "NAM" "NRU" "NPL" "NLD"
[183] "NCL" "NZL" "NIC" "NER" "NGA" "MKD" "MNP" "NOR" "OMN" "PAK" "PLW" "PAN" "PNG" "PRY"
[197] "PER" "PHL" "POL" "PRT" "PRI" "QAT" "ROU" "RUS" "RWA" "WSM" "SMR" "STP" "SAU" "SEN"
[211] "SRB" "SYC" "SLE" "SGP" "SXM" "SVK" "SVN" "SLB" "SOM" "ZAF" "SSD" "ESP" "LKA" "KNA"
[225] "LCA" "MAF" "VCT" "SDN" "SUR" "SWE" "CHE" "SYR" "TJK" "TZA" "THA" "TLS" "TGO" "TON"
[239] "TTO" "TUN" "TUR" "TKM" "TCA" "TUV" "UGA" "UKR" "ARE" "GBR" "USA" "URY" "UZB" "VUT"
[253] "VEN" "VNM" "VIR" "PSE" "YEM" "ZMB" "ZWE"
unique(Data_Pop$Year)
 [1] "1960" "1961" "1962" "1963" "1964" "1965" "1966" "1967" "1968" "1969" "1970" "1971"
[13] "1972" "1973" "1974" "1975" "1976" "1977" "1978" "1979" "1980" "1981" "1982" "1983"
[25] "1984" "1985" "1986" "1987" "1988" "1989" "1990" "1991" "1992" "1993" "1994" "1995"
[37] "1996" "1997" "1998" "1999" "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007"
[49] "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016" "2017" "2018"
# Convert CountryCode to factor
Data_Pop <- Data_Pop %>% mutate(CountryCode = factor(CountryCode))
# Convert Year to factor
Data_Pop <- Data_Pop %>% mutate(Year = factor(Year))
# Print structure
str(Data_Pop)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   519554 obs. of  9 variables:
 $ Country Name      : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ CountryCode       : Factor w/ 259 levels "ABW","AFG","AGO",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ Indicator Name    : chr  "Population ages 00-04, female" "Population ages 00-04, male" "Population ages 05-09, female" "Population ages 05-09, male" ...
 $ IndicatorCode     : chr  "SP.POP.0004.FE" "SP.POP.0004.MA" "SP.POP.0509.FE" "SP.POP.0509.MA" ...
 $ Year              : Factor w/ 59 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Value             : num  8051199 8366535 6406768 6685409 5043026 ...
 $ Topic             : chr  "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" ...
 $ Periodicity       : chr  "Annual" "Annual" "Annual" "Annual" ...
 $ Aggregation method: chr  "Sum" "Sum" "Sum" "Sum" ...
# Pick up all the female population data into Data_Pop_FE
Data_Pop_FE <- Data_Pop %>% filter(endsWith(IndicatorCode, "FE"))
head(Data_Pop_FE)
# Pick up all the male population data into Data_Pop_MA
Data_Pop_MA <- Data_Pop %>% filter(endsWith(IndicatorCode, "MA"))
head(Data_Pop_MA)
# Print out the structure
str(Data_Pop_FE)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   259777 obs. of  9 variables:
 $ Country Name      : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ CountryCode       : Factor w/ 259 levels "ABW","AFG","AGO",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ Indicator Name    : chr  "Population ages 00-04, female" "Population ages 05-09, female" "Population ages 10-14, female" "Population ages 15-19, female" ...
 $ IndicatorCode     : chr  "SP.POP.0004.FE" "SP.POP.0509.FE" "SP.POP.1014.FE" "SP.POP.1519.FE" ...
 $ Year              : Factor w/ 59 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Value             : num  8051199 6406768 5043026 4091892 3677343 ...
 $ Topic             : chr  "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" ...
 $ Periodicity       : chr  "Annual" "Annual" "Annual" "Annual" ...
 $ Aggregation method: chr  "Sum" "Sum" "Sum" "Sum" ...
str(Data_Pop_MA)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   259777 obs. of  9 variables:
 $ Country Name      : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ CountryCode       : Factor w/ 259 levels "ABW","AFG","AGO",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ Indicator Name    : chr  "Population ages 00-04, male" "Population ages 05-09, male" "Population ages 10-14, male" "Population ages 15-19, male" ...
 $ IndicatorCode     : chr  "SP.POP.0004.MA" "SP.POP.0509.MA" "SP.POP.1014.MA" "SP.POP.1519.MA" ...
 $ Year              : Factor w/ 59 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Value             : num  8366535 6685409 5347352 4223619 3776630 ...
 $ Topic             : chr  "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" ...
 $ Periodicity       : chr  "Annual" "Annual" "Annual" "Annual" ...
 $ Aggregation method: chr  "Sum" "Sum" "Sum" "Sum" ...
# Convert IndicatorCode to factor
Data_Pop_FE <- Data_Pop_FE %>% mutate(IndicatorCode = factor(IndicatorCode))
str(Data_Pop_FE)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   259777 obs. of  9 variables:
 $ Country Name      : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ CountryCode       : Factor w/ 259 levels "ABW","AFG","AGO",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ Indicator Name    : chr  "Population ages 00-04, female" "Population ages 05-09, female" "Population ages 10-14, female" "Population ages 15-19, female" ...
 $ IndicatorCode     : Factor w/ 17 levels "SP.POP.0004.FE",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Year              : Factor w/ 59 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Value             : num  8051199 6406768 5043026 4091892 3677343 ...
 $ Topic             : chr  "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" ...
 $ Periodicity       : chr  "Annual" "Annual" "Annual" "Annual" ...
 $ Aggregation method: chr  "Sum" "Sum" "Sum" "Sum" ...
Data_Pop_MA <- Data_Pop_MA %>% mutate(IndicatorCode = factor(IndicatorCode))
str(Data_Pop_MA)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   259777 obs. of  9 variables:
 $ Country Name      : chr  "Arab World" "Arab World" "Arab World" "Arab World" ...
 $ CountryCode       : Factor w/ 259 levels "ABW","AFG","AGO",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ Indicator Name    : chr  "Population ages 00-04, male" "Population ages 05-09, male" "Population ages 10-14, male" "Population ages 15-19, male" ...
 $ IndicatorCode     : Factor w/ 17 levels "SP.POP.0004.MA",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Year              : Factor w/ 59 levels "1960","1961",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Value             : num  8366535 6685409 5347352 4223619 3776630 ...
 $ Topic             : chr  "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" "Health: Population: Structure" ...
 $ Periodicity       : chr  "Annual" "Annual" "Annual" "Annual" ...
 $ Aggregation method: chr  "Sum" "Sum" "Sum" "Sum" ...
# Label the IndicatorCode and order it
Data_Pop_FE <- Data_Pop_FE %>% mutate(IndicatorCode = factor(IndicatorCode, levels = c("SP.POP.0004.FE","SP.POP.0509.FE","SP.POP.1014.FE","SP.POP.1519.FE","SP.POP.2024.FE","SP.POP.2529.FE","SP.POP.3034.FE","SP.POP.3539.FE","SP.POP.4044.FE","SP.POP.4549.FE","SP.POP.5054.FE","SP.POP.5559.FE","SP.POP.6064.FE","SP.POP.6569.FE","SP.POP.7074.FE","SP.POP.7579.FE","SP.POP.80UP.FE"), labels = c("<4","5-9","10-14","15-19","20-24","25-29","30-34","35-39","40-44","45-49","50-54","55-59","60-64","65-69","70-74","75-79","80>="), ordered = TRUE))
head(Data_Pop_FE, 16)
Data_Pop_MA <- Data_Pop_MA %>% mutate(IndicatorCode = factor(IndicatorCode, levels = c("SP.POP.0004.MA","SP.POP.0509.MA","SP.POP.1014.MA","SP.POP.1519.MA","SP.POP.2024.MA","SP.POP.2529.MA","SP.POP.3034.MA","SP.POP.3539.MA","SP.POP.4044.MA","SP.POP.4549.MA","SP.POP.5054.MA","SP.POP.5559.MA","SP.POP.6064.MA","SP.POP.6569.MA","SP.POP.7074.MA","SP.POP.7579.MA","SP.POP.80UP.MA"), labels = c("<4","5-9","10-14","15-19","20-24","25-29","30-34","35-39","40-44","45-49","50-54","55-59","60-64","65-69","70-74","75-79","80>="), ordered = TRUE))
head(Data_Pop_MA, 16)
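
How `levels`, `labels` and `ordered` work together in `factor` can be seen on a short invented vector. This is a sketch only; the four codes and the display labels below are illustrative and are not the full set used above.

```r
# Raw codes as they might appear in IndicatorCode (a short, made-up sample).
codes <- c("SP.POP.1014.FE", "SP.POP.0004.FE", "SP.POP.80UP.FE", "SP.POP.0509.FE")

# levels fixes the order; labels supplies display names, matched by position.
age_group <- factor(
  codes,
  levels = c("SP.POP.0004.FE", "SP.POP.0509.FE", "SP.POP.1014.FE", "SP.POP.80UP.FE"),
  labels = c("0-4", "5-9", "10-14", "80+"),
  ordered = TRUE
)

age_group

# Ordered factors support comparisons and sort by level, not alphabetically.
age_group[1] > age_group[2]
sort(age_group)
```

Because labels are matched to levels by position, the two vectors must list the age groups in the same order.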

Scan I

Find missing values and replace each one with the mean value of its age group.

length(which(is.na(Data_Pop_FE$Value)))
[1] 25262
length(which(is.na(Data_Pop_MA$Value)))
[1] 25262
# Get the mean value of each age group for the female group
Pop_FE_Mean <- Data_Pop_FE %>% group_by(IndicatorCode) %>% summarise(Value = mean(Value, na.rm = TRUE))
kable(Pop_FE_Mean)

|IndicatorCode |    Value|
|:-------------|--------:|
|<4            | 10683790|
|5-9           |  9989082|
|10-14         |  9392177|
|15-19         |  8764482|
|20-24         |  8114239|
|25-29         |  7459173|
|30-34         |  6763063|
|35-39         |  6101743|
|40-44         |  5477162|
|45-49         |  4852887|
|50-54         |  4232561|
|55-59         |  3631057|
|60-64         |  3044799|
|65-69         |  2443921|
|70-74         |  1860036|
|75-79         |  1302138|
|80>=          |  1313987|

# Get all mean value for male group
Pop_MA_Mean <- Data_Pop_MA %>% group_by(IndicatorCode) %>% summarise(Value = mean(Value, na.rm = TRUE))
kable(Pop_MA_Mean)

|IndicatorCode |      Value|
|:-------------|----------:|
|<4            | 11282938.2|
|5-9           | 10538623.7|
|10-14         |  9886519.6|
|15-19         |  9187551.1|
|20-24         |  8452430.8|
|25-29         |  7722641.5|
|30-34         |  6969390.3|
|35-39         |  6252901.2|
|40-44         |  5571914.3|
|45-49         |  4879638.8|
|50-54         |  4179905.9|
|55-59         |  3490302.2|
|60-64         |  2812459.0|
|65-69         |  2130532.6|
|70-74         |  1496447.4|
|75-79         |   938941.1|
|80>=          |   748580.0|

# Replace missing values by mean in female group.
Data_Pop_FE <- Data_Pop_FE %>% left_join(Pop_FE_Mean, by="IndicatorCode") %>% mutate(Value = ifelse(is.na(Value.x), Value.y,Value.x)) %>% select(-Value.x, -Value.y)
# Print how many missing value after replacing.
length(which(is.na(Data_Pop_FE$Value)))
[1] 0
# Replace missing values by mean in male group.
Data_Pop_MA <- Data_Pop_MA %>% left_join(Pop_MA_Mean, by="IndicatorCode") %>% mutate(Value = ifelse(is.na(Value.x), Value.y,Value.x)) %>% select(-Value.x, -Value.y)
# Print how many missing values remain in the male group after replacing.
length(which(is.na(Data_Pop_MA$Value)))
[1] 0
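
The same group-wise mean imputation can be written without the intermediate table of means. The following is a sketch on invented data (the tibble `pop` is hypothetical), using a grouped `mutate` rather than the join used above:

```r
library(dplyr)

# Toy long table with one missing value per age group (values are invented).
pop <- tibble(
  IndicatorCode = c("0-4", "0-4", "0-4", "5-9", "5-9", "5-9"),
  Value = c(10, NA, 30, 100, 200, NA)
)

# Within each group, replace NA with that group's mean of the observed values.
imputed <- pop %>%
  group_by(IndicatorCode) %>%
  mutate(Value = if_else(is.na(Value), mean(Value, na.rm = TRUE), Value)) %>%
  ungroup()

imputed

# Number of missing values left after imputation.
sum(is.na(imputed$Value))
```

Both approaches fill a gap with the mean across all countries, regions and years in that age group, which may be far from the true value for a small country.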

Scan II

Handle outliers in the female and male groups. For each group, I select the population aged 30 to 34, compute z-scores, and replace values whose absolute z-score exceeds 3 with the mean.

#Pick up all the female population aged between 30 to 34
Data_Pop_FE_30_34 <- Data_Pop_FE %>% filter(IndicatorCode == "30-34")
head(Data_Pop_FE_30_34, 16)
# Get z scores
z.scores <- Data_Pop_FE_30_34$Value %>% scores(type="z")
z.scores %>% summary()
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.31186 -0.30828 -0.29648  0.00000 -0.07227 12.94023 
# Replace outliers by mean
Data_Pop_FE_30_34$Value[which(abs(z.scores) > 3)] <- mean(Data_Pop_FE_30_34$Value, na.rm = TRUE)
#Pick up all the male population aged between 30 to 34
Data_Pop_MA_30_34 <- Data_Pop_MA %>% filter(IndicatorCode == "30-34")
head(Data_Pop_MA_30_34, 16)
# Get z scores
z.scores <- Data_Pop_MA_30_34$Value %>% scores(type="z")
z.scores %>% summary()
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.31060 -0.30699 -0.29581  0.00000 -0.07884 12.99371 
# Replace outliers by mean
Data_Pop_MA_30_34$Value[which(abs(z.scores) > 3)] <- mean(Data_Pop_MA_30_34$Value, na.rm = TRUE)
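
The z-score rule can be checked by hand on a small invented vector. This sketch computes the z-scores directly from their definition in base R instead of calling the `scores` function; the sample `x` is hypothetical.

```r
# Toy sample: nineteen ordinary values and one extreme one (invented data).
x <- c(1:19, 1000)

# z-score by definition: distance from the mean in units of standard deviation.
z <- (x - mean(x)) / sd(x)

# Flag values more than 3 standard deviations from the mean.
is_outlier <- abs(z) > 3
which(is_outlier)

# Replace flagged values with the mean of the full sample, as in the text.
cleaned <- x
cleaned[is_outlier] <- mean(x)
cleaned[is_outlier]
```

Note that the replacement mean is computed with the outlier still included, so it is pulled towards the extreme value. Using the mean of the unflagged values is one possible alternative.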

Transform

The population values are strongly right-skewed, so I apply a base-10 logarithm to Value in both the female and the male group. A histogram is plotted before and after the transformation to compare the distributions.

# Print histogram before the transformation
hist(Data_Pop_FE$Value, main = "Histogram Data in Female Group", xlab = "Population")

# Apply log10 transformation to Value
Data_Pop_FE$Value <- log10(Data_Pop_FE$Value)
# Print histogram after the transformation
hist(Data_Pop_FE$Value, main = "Histogram Data in Female Group", xlab = "log10(Population)")

# Print histogram before the transformation
hist(Data_Pop_MA$Value, main = "Histogram Data in Male Group", xlab = "Population")

# Apply log10 transformation to Value
Data_Pop_MA$Value <- log10(Data_Pop_MA$Value)
# Print histogram after the transformation
hist(Data_Pop_MA$Value, main = "Histogram Data in Male Group", xlab = "log10(Population)")
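
The effect of the log transformation on a right-skewed variable can be illustrated with a few invented numbers. This is a sketch only; the vector `pop` is hypothetical and comparing mean and median is just a rough indicator of skew.

```r
# Toy right-skewed sample spanning several orders of magnitude (invented data).
pop <- c(1e3, 2e3, 5e3, 1e4, 5e4, 1e6, 1e8)

# Before: the mean is far above the median, a sign of right skew.
c(mean = mean(pop), median = median(pop))

# log10 compresses large values; each unit now means a tenfold change.
log_pop <- log10(pop)
c(mean = mean(log_pop), median = median(log_pop))

# The transformation is reversible: 10^log10(x) recovers x.
all.equal(10^log_pop, pop)
```

Since `log10(0)` is `-Inf` in R, any zero counts would need to be handled before applying the transformation.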

T1AuMjUyOS5GRSIsIlNQLlBPUC4zMDM0LkZFIiwiU1AuUE9QLjM1MzkuRkUiLCJTUC5QT1AuNDA0NC5GRSIsIlNQLlBPUC40NTQ5LkZFIiwiU1AuUE9QLjUwNTQuRkUiLCJTUC5QT1AuNTU1OS5GRSIsIlNQLlBPUC42MDY0LkZFIiwiU1AuUE9QLjY1NjkuRkUiLCJTUC5QT1AuNzA3NC5GRSIsIlNQLlBPUC43NTc5LkZFIiwiU1AuUE9QLjgwVVAuRkUiKSwgbGFiZWxzID0gYygiPDQiLCI1LTkiLCIxMC0xNCIsIjE1LTE5IiwiMjAtMjQiLCIyNS0yOSIsIjMwLTM0IiwiMzUtMzkiLCI0MC00NCIsIjQ1LTQ5IiwiNTAtNTQiLCI1NS01OSIsIjYwLTY0IiwiNjUtNjkiLCI3MC03NCIsIjc1LTc5IiwiODA+PSIpLCBvcmRlcmVkID0gVFJVRSkpCgpoZWFkKERhdGFfUG9wX0ZFLCAxNikKCkRhdGFfUG9wX01BIDwtIERhdGFfUG9wX01BICU+JSBtdXRhdGUoSW5kaWNhdG9yQ29kZSA9IGZhY3RvcihJbmRpY2F0b3JDb2RlLCBsZXZlbHMgPSBjKCJTUC5QT1AuMDAwNC5NQSIsIlNQLlBPUC4wNTA5Lk1BIiwiU1AuUE9QLjEwMTQuTUEiLCJTUC5QT1AuMTUxOS5NQSIsIlNQLlBPUC4yMDI0Lk1BIiwiU1AuUE9QLjI1MjkuTUEiLCJTUC5QT1AuMzAzNC5NQSIsIlNQLlBPUC4zNTM5Lk1BIiwiU1AuUE9QLjQwNDQuTUEiLCJTUC5QT1AuNDU0OS5NQSIsIlNQLlBPUC41MDU0Lk1BIiwiU1AuUE9QLjU1NTkuTUEiLCJTUC5QT1AuNjA2NC5NQSIsIlNQLlBPUC42NTY5Lk1BIiwiU1AuUE9QLjcwNzQuTUEiLCJTUC5QT1AuNzU3OS5NQSIsIlNQLlBPUC44MFVQLk1BIiksIGxhYmVscyA9IGMoIjw0IiwiNS05IiwiMTAtMTQiLCIxNS0xOSIsIjIwLTI0IiwiMjUtMjkiLCIzMC0zNCIsIjM1LTM5IiwiNDAtNDQiLCI0NS00OSIsIjUwLTU0IiwiNTUtNTkiLCI2MC02NCIsIjY1LTY5IiwiNzAtNzQiLCI3NS03OSIsIjgwPj0iKSwgb3JkZXJlZCA9IFRSVUUpKQoKaGVhZChEYXRhX1BvcF9NQSwgMTYpCgpgYGAKCgojIwlTY2FuIEkgCgpGaW5kIG1pc3NuaW5nIHZhbHVlcyBhbmQgcmVwbGFjZSB0aGVtIHdpdGggbWVhbiB2YWx1ZS4KCmBgYHtyfQoKbGVuZ3RoKHdoaWNoKGlzLm5hKERhdGFfUG9wX0ZFJFZhbHVlKSkpCgpsZW5ndGgod2hpY2goaXMubmEoRGF0YV9Qb3BfTUEkVmFsdWUpKSkKCiMgR2V0IGFsbCBtZWFuIHZhbHVlIGZvciBmYW1hbGUgZ3JvdXAKUG9wX0ZFX01lYW4gPC0gRGF0YV9Qb3BfRkUgJT4lIGdyb3VwX2J5KEluZGljYXRvckNvZGUpICU+JSBzdW1tYXJpc2UoVmFsdWUgPSBtZWFuKFZhbHVlLCBuYS5ybSA9IFRSVUUpKQoKa2FibGUoUG9wX0ZFX01lYW4pCgojIEdldCBhbGwgbWVhbiB2YWx1ZSBmb3IgbWFsZSBncm91cApQb3BfTUFfTWVhbiA8LSBEYXRhX1BvcF9NQSAlPiUgZ3JvdXBfYnkoSW5kaWNhdG9yQ29kZSkgJT4lIHN1bW1hcmlzZShWYWx1ZSA9IG1lYW4oVmFsdWUsIG5hLnJtID0gVFJVRSkpCgprYWJsZShQb3BfTUFfTWVhbikKCiMgUmVwbGFjZSBtc3NpbmcgdmFsdWVzIGJ5IG1lYW4gaW4gZmVtYWxlIGdyb3VwLgpEYXRhX1BvcF9GRSA8LSBE
YXRhX1BvcF9GRSAlPiUgbGVmdF9qb2luKFBvcF9GRV9NZWFuLCBieT0iSW5kaWNhdG9yQ29kZSIpICU+JSBtdXRhdGUoVmFsdWUgPSBpZmVsc2UoaXMubmEoVmFsdWUueCksIFZhbHVlLnksVmFsdWUueCkpICU+JSBzZWxlY3QoLVZhbHVlLngsIC1WYWx1ZS55KQoKIyBQcmludCBob3cgbWFueSBtaXNzaW5nIHZhbHVlIGFmdGVyIHJlcGxhY2luZy4KbGVuZ3RoKHdoaWNoKGlzLm5hKERhdGFfUG9wX0ZFJFZhbHVlKSkpCgojIFJlcGxhY2UgbXNzaW5nIHZhbHVlcyBieSBtZWFuIGluIG1hbGUgZ3JvdXAuCkRhdGFfUG9wX01BIDwtIERhdGFfUG9wX01BICU+JSBsZWZ0X2pvaW4oUG9wX01BX01lYW4sIGJ5PSJJbmRpY2F0b3JDb2RlIikgJT4lIG11dGF0ZShWYWx1ZSA9IGlmZWxzZShpcy5uYShWYWx1ZS54KSwgVmFsdWUueSxWYWx1ZS54KSkgJT4lIHNlbGVjdCgtVmFsdWUueCwgLVZhbHVlLnkpCgojIFByaW50IGhvdyBtYW55IG1pc3NpbmcgdmFsdWUgYWZ0ZXIgcmVwbGFjaW5nLgpsZW5ndGgod2hpY2goaXMubmEoRGF0YV9Qb3BfRkUkVmFsdWUpKSkKCmBgYAoKCiMjCVNjYW4gSUkKCkRlYWwgd2l0aCBvdXRsaWVycyBmb3IgZmVtYWxlIGdyb3VwIGFuZCBtYWxlIGdyb3VwLiBQaWNrIHVwIHRob3NlIHBvcHVsYXRpb24gYWdlcyBiZXR3ZWVuIDMwIHRvIDM0IGFuZCByZXBsYWNlIHRoZSBvdXRsaWVycyB2YWx1ZXMgYnkgbWVhbgoKYGBge3J9CgojUGljayB1cCBhbGwgdGhlIGZlbWFsZSBwb3B1bGF0aW9uIGFnZWQgYmV0d2VlbiAzMCB0byAzNApEYXRhX1BvcF9GRV8zMF8zNCA8LSBEYXRhX1BvcF9GRSAlPiUgZmlsdGVyKEluZGljYXRvckNvZGUgPT0gIjMwLTM0IikKaGVhZChEYXRhX1BvcF9GRV8zMF8zNCwgMTYpCgojIEdldCB6IHNjcm9lcwp6LnNjb3JlcyA8LSBEYXRhX1BvcF9GRV8zMF8zNCRWYWx1ZSAlPiUgc2NvcmVzKHR5cGU9InoiKQp6LnNjb3JlcyAlPiUgc3VtbWFyeSgpCgojIFJlcGxhY2Ugb3V0bGllcnMgYnkgbWVhbgpEYXRhX1BvcF9GRV8zMF8zNCRWYWx1ZVt3aGljaChhYnMoei5zY29yZXMpID4gMyldIDwtIG1lYW4oRGF0YV9Qb3BfRkVfMzBfMzQkVmFsdWUsIG5hLnJtID0gVFJVRSkKCiNQaWNrIHVwIGFsbCB0aGUgZmVtYWxlIHBvcHVsYXRpb24gYWdlZCBiZXR3ZWVuIDMwIHRvIDM0CkRhdGFfUG9wX01BXzMwXzM0IDwtIERhdGFfUG9wX01BICU+JSBmaWx0ZXIoSW5kaWNhdG9yQ29kZSA9PSAiMzAtMzQiKQpoZWFkKERhdGFfUG9wX01BXzMwXzM0LCAxNikKCiMgR2V0IHogc2Nyb2VzCnouc2NvcmVzIDwtIERhdGFfUG9wX01BXzMwXzM0JFZhbHVlICU+JSBzY29yZXModHlwZT0ieiIpCnouc2NvcmVzICU+JSBzdW1tYXJ5KCkKCiMgUmVwbGFjZSBvdXRsaWVycyBieSBtZWFuCkRhdGFfUG9wX01BXzMwXzM0JFZhbHVlW3doaWNoKGFicyh6LnNjb3JlcykgPiAzKV0gPC0gbWVhbihEYXRhX1BvcF9NQV8zMF8zNCRWYWx1ZSwgbmEucm0gPSBUUlVFKQpgYGAKCgojIwlUcmFuc2Zvcm0gCgpBcHBseSBhbiBhcHByb3ByaWF0
ZSB0cmFuc2Zvcm1hdGlvbiBmb3IgYXQgbGVhc3Qgb25lIG9mIHRoZSB2YXJpYWJsZXMuIEluIGFkZGl0aW9uIHRvIHRoZSBSIGNvZGVzIGFuZCBvdXRwdXRzLCBleHBsYWluIGV2ZXJ5dGhpbmcgdGhhdCB5b3UgZG8gaW4gdGhpcyBzdGVwLiBJbiB0aGlzIHN0ZXAsIHlvdSBzaG91bGQgZnVsZmlsIHRoZSBtaW5pbXVtIHJlcXVpcmVtZW50ICM5LgoKYGBge3J9CgojIFByaW50IGhpc3RvcmdyYW1zIGJlZm9yZSB0cmFuc2Zvcm0KaGlzdChEYXRhX1BvcF9GRSRWYWx1ZSwgbWFpbiA9ICJIaXN0b2dyYW0gRGF0YSBpbiBGZW1hbGUgR3JvdXAiLCB4bGFiID0gIlBvcHVsYXRpb24iKQoKIyBBcHBseSB0cmFuc2Zvcm0gdG8gdmFsdWUKRGF0YV9Qb3BfRkUkVmFsdWUgPC0gbG9nMTAoRGF0YV9Qb3BfRkUkVmFsdWUpCgojIFByaW50IGhpc3RvZ3JhbXMgYWZ0ZXIgdHJhbnNmb3JtLgpoaXN0KERhdGFfUG9wX0ZFJFZhbHVlLCBtYWluID0gIkhpc3RvZ3JhbSBEYXRhIGluIEZlbWFsZSBHcm91cCIseGxhYiA9ICJQb3B1bGF0aW9uIikKCiMgUHJpbnQgaGlzdG9yZ3JhbXMgYWZ0ZXIgdHJhbnNmb3JtCmhpc3QoRGF0YV9Qb3BfTUEkVmFsdWUsIG1haW4gPSAiSGlzdG9ncmFtIERhdGEgaW4gTWFsZSBHcm91cCIpCgojIEFwcGx5IHRyYW5zZm9ybSB0byB2YWx1ZQpEYXRhX1BvcF9NQSRWYWx1ZSA8LSBsb2cxMChEYXRhX1BvcF9NQSRWYWx1ZSkKCiMgUHJpbnQgaGlzdG9yZ3JhbXMgYWZ0ZXIgdHJhbnNmb3JtCmhpc3QoRGF0YV9Qb3BfTUEkVmFsdWUsIG1haW4gPSAiSGlzdG9ncmFtIERhdGEgaW4gTWFsZSBHcm91cCIseGxhYiA9ICJQb3B1bGF0aW9uIikKCmBgYAo=