Required packages

Installed and loaded the necessary packages.

library(readr) # useful for importing data
library(magrittr) #useful for pipe operator
library(tidyr) #useful for tidying data
library(dplyr) #useful for data manipulation
library(Hmisc) #to replace the missing values
library(outliers) #useful for finding the outliers
library(lubridate) #useful for date transformation
library(car) # useful for plotting qqPlot

Executive Summary

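The objective of this report is to find an open data set with a Creative Commons licence and apply the data preprocessing concepts acquired through the Data Preprocessing course. The preprocessing steps, in order, were: importing the two CSV files, merging them into a single data set, checking and correcting data types, creating new variables, handling missing values, handling outliers, and transforming the derived BMI variable.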

Data

The data set contains the official details of the 11538 athletes who competed in the 2016 Olympic Games in Rio de Janeiro, along with details of their respective countries. The data set was collected from Kaggle ( https://www.kaggle.com/rio2016/olympic-games ).

Two data files were considered: ‘athletes.csv’ with 11 columns and ‘countries.csv’ with 4 columns.

The athletes data set contains the following columns:
id: Athlete ID
name: Athlete name
nationality: IOC country code of the athlete
sex: Athlete gender
dob: Athlete date of birth
height: Athlete height
weight: Athlete weight
sport: The sport in which the athlete competes
gold: Number of gold medals
silver: Number of silver medals
bronze: Number of bronze medals

The countries table contains the following columns:
country: Country name
code: IOC country code
population: Total population of the country
gdp_per_capita: GDP per capita of the country

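The data files were imported using the base R function read.csv(), with the automatic conversion of character columns to factors disabled (stringsAsFactors = FALSE). Using the merge() function, the athletes table was joined with the countries table on the common attribute, i.e. the IOC country code (nationality in athletes, code in countries), to form the olympics data set, and the first few rows were displayed using head(). Because merge() performs an inner join by default, athletes whose nationality code has no match in the countries table are dropped, which is why the merged data set has 11464 rows rather than 11538.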

athletes <- read.csv("athletes.csv",stringsAsFactors = FALSE)
head(athletes)
countries <- read.csv("countries.csv",stringsAsFactors = FALSE)
head(countries)
olympics <- merge(athletes,countries,by.x = "nationality",by.y = "code")
head(olympics)
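
The effect of the default inner join can be illustrated with a small sketch. The two toy data frames below are made up for illustration (they are not the Kaggle files), and "XXX" is a hypothetical code with no match. Setting all.x = TRUE keeps unmatched athletes with NA country details, and setdiff() lists the codes that would otherwise be dropped.

```r
# Toy stand-ins for athletes.csv and countries.csv (hypothetical values)
athletes_toy <- data.frame(
  name = c("A", "B", "C"),
  nationality = c("AFG", "ALB", "XXX"),
  stringsAsFactors = FALSE
)
countries_toy <- data.frame(
  country = c("Afghanistan", "Albania"),
  code = c("AFG", "ALB"),
  stringsAsFactors = FALSE
)

# Default merge(): inner join, the unmatched athlete "C" is dropped
inner <- merge(athletes_toy, countries_toy,
               by.x = "nationality", by.y = "code")

# all.x = TRUE: left join, "C" is kept with NA in the country column
left <- merge(athletes_toy, countries_toy,
              by.x = "nationality", by.y = "code", all.x = TRUE)

# Codes present in athletes but missing from countries
unmatched <- setdiff(athletes_toy$nationality, countries_toy$code)

nrow(inner)
nrow(left)
unmatched
```

Whether to keep or drop unmatched athletes is a design choice. The inner join used in this report keeps only rows with country details available.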

Understand

The variable types and summary statistics were reviewed using the ‘summary()’ function, and the data structure of each variable was inspected using the ‘str()’ function. Certain data types were captured incorrectly, so dob was converted from character to date, and the sex, nationality and country variables were converted to factors.

summary(olympics)
 nationality              id                name               sex                dob           
 Length:11464       Min.   :    18347   Length:11464       Length:11464       Length:11464      
 Class :character   1st Qu.:245072255   Class :character   Class :character   Class :character  
 Mode  :character   Median :499491784   Mode  :character   Mode  :character   Mode  :character  
                    Mean   :499588457                                                           
                    3rd Qu.:753180230                                                           
                    Max.   :999987786                                                           
                                                                                                
     height          weight          sport                gold             silver            bronze       
 Min.   :1.210   Min.   : 31.00   Length:11464       Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:1.690   1st Qu.: 60.00   Class :character   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :1.760   Median : 70.00   Mode  :character   Median :0.00000   Median :0.00000   Median :0.00000  
 Mean   :1.766   Mean   : 72.04                      Mean   :0.05792   Mean   :0.05714   Mean   :0.06132  
 3rd Qu.:1.840   3rd Qu.: 81.00                      3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :2.210   Max.   :170.00                      Max.   :5.00000   Max.   :2.00000   Max.   :2.00000  
 NA's   :325     NA's   :654                                                                              
   country            population        gdp_per_capita    
 Length:11464       Min.   :1.022e+04   Min.   :   277.1  
 Class :character   1st Qu.:1.035e+07   1st Qu.:  8027.7  
 Mode  :character   Median :4.342e+07   Median : 18002.2  
                    Mean   :1.240e+08   Mean   : 24858.1  
                    3rd Qu.:8.141e+07   3rd Qu.: 41313.3  
                    Max.   :1.371e+09   Max.   :101450.0  
                    NA's   :83          NA's   :509       
str(olympics)
'data.frame':   11464 obs. of  14 variables:
 $ nationality   : chr  "AFG" "AFG" "AFG" "ALB" ...
 $ id            : int  103254143 289057786 152408417 539021692 103773001 324317073 345441615 915002256 997380920 690873472 ...
 $ name          : chr  "Kamia Yousufi" "Mohammad Tawfiq Bakhshi" "Abdul Wahab Zahiri" "Nikol Merizaj" ...
 $ sex           : chr  "female" "male" "male" "female" ...
 $ dob           : chr  "5/20/96" "3/11/86" "5/27/92" "8/7/98" ...
 $ height        : num  1.65 1.81 1.75 1.8 1.6 1.95 1.7 1.59 1.93 1.9 ...
 $ weight        : int  55 99 68 65 52 86 69 45 87 100 ...
 $ sport         : chr  "athletics" "judo" "athletics" "aquatics" ...
 $ gold          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ silver        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ bronze        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ country       : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Albania" ...
 $ population    : int  32526562 32526562 32526562 2889167 2889167 2889167 2889167 2889167 2889167 39666519 ...
 $ gdp_per_capita: num  594 594 594 3945 3945 ...
olympics$dob <- dmy(format(as.Date(olympics$dob, format ="%m/%d/%y") ,"%d-%m-%y") )
olympics$sex <- as.factor(olympics$sex)
olympics$nationality <- as.factor(olympics$nationality)
olympics$country <- as.factor(olympics$country)
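
One caveat with two-digit years is worth checking. When parsing %y, R prefixes values 00 to 68 with 20 and values 69 to 99 with 19, so any date of birth before 1969 would be read as a year in the 2000s. The sketch below uses base R only. The dob strings are made up, and the helper name fix_century and the reference date (the opening day of the Rio Games) are assumptions introduced for illustration.

```r
# Made-up dob strings in the same m/d/yy layout as the data set
raw <- c("5/20/96", "3/11/86", "6/1/54")

parsed <- as.Date(raw, format = "%m/%d/%y")
parsed  # the third value is parsed into the 2000s, not the 1900s

# Hypothetical helper: shift any date after the reference date back 100 years
fix_century <- function(d, ref) {
  lt <- as.POSIXlt(d)
  future <- !is.na(d) & d > ref
  lt$year[future] <- lt$year[future] - 100
  as.Date(lt)
}

# Assumed reference date: no athlete can be born after the Games started
fixed <- fix_century(parsed, ref = as.Date("2016-08-05"))
fixed
```

A quick check such as sum(olympics$dob > as.Date("2016-08-05"), na.rm = TRUE) would show whether any rows in the real data are affected.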

Tidy & Manipulate Data I

In order to tidy up the data set, columns not needed for the analysis were removed: gdp_per_capita was dropped using the subset() function. On further inspection, the data set did not need any structural reshaping.

head(olympics)
olympics <- subset(olympics,select = -c(gdp_per_capita))

Tidy & Manipulate Data II

A new column, Total_Medals, was created using the mutate() function to hold the total number of medals received by each athlete, i.e. the sum of the gold, silver and bronze medals. Another column, population_interval, was created to give a better understanding of the population variable: using ‘mutate’ and ‘case_when’, each population was assigned to an interval (50 million wide up to 350 million, followed by a single interval for larger populations). The factor() function was then used to convert ‘population_interval’ into a labelled, ordered factor.

The structure of the tidied data set was checked using the str() function.

olympics <- olympics %>% mutate(Total_Medals=gold+silver+bronze)
olympics <- olympics %>% mutate (population_interval=case_when(
  population>0 & population <50000000 ~"1",
  population>=50000000 & population <100000000 ~"2",
  population>=100000000& population <150000000 ~"3",
  population>=150000000& population <200000000 ~"4",
  population>=200000000& population <250000000 ~"5",
  population>=250000000& population <300000000 ~"6",
  population>=300000000& population <350000000 ~"7",
  population>=350000000& population <2000000000 ~"8" ))
olympics$population_interval <-factor(olympics$population_interval, 
                                      levels = c("1","2","3","4","5","6","7","8"),
                                      labels =c("0-50M","50M-100M","100M-150M","150M-200M",
                                                "200M-250M","250M-300M","300M-350M","350M+"),
                                      ordered = TRUE)
str(olympics)
'data.frame':   11464 obs. of  15 variables:
 $ nationality        : Factor w/ 199 levels "AFG","ALB","ALG",..: 1 1 1 2 2 2 2 2 2 3 ...
 $ id                 : int  103254143 289057786 152408417 539021692 103773001 324317073 345441615 915002256 997380920 690873472 ...
 $ name               : chr  "Kamia Yousufi" "Mohammad Tawfiq Bakhshi" "Abdul Wahab Zahiri" "Nikol Merizaj" ...
 $ sex                : Factor w/ 2 levels "female","male": 1 2 2 1 1 2 2 1 2 2 ...
 $ dob                : Date, format: "1996-05-20" "1986-03-11" "1992-05-27" "1998-08-07" ...
 $ height             : num  1.65 1.81 1.75 1.8 1.6 1.95 1.7 1.59 1.93 1.9 ...
 $ weight             : int  55 99 68 65 52 86 69 45 87 100 ...
 $ sport              : chr  "athletics" "judo" "athletics" "aquatics" ...
 $ gold               : int  0 0 0 0 0 0 0 0 0 0 ...
 $ silver             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ bronze             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ country            : Factor w/ 199 levels "Afghanistan",..: 1 1 1 2 2 2 2 2 2 3 ...
 $ population         : int  32526562 32526562 32526562 2889167 2889167 2889167 2889167 2889167 2889167 39666519 ...
 $ Total_Medals       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ population_interval: Ord.factor w/ 8 levels "0-50M"<"50M-100M"<..: 1 1 1 1 1 1 1 1 1 1 ...
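
As an alternative sketch, the same binning can be written in one call with base R's cut(), which returns a labelled, ordered factor directly. This is not the code used in the report. It differs from the case_when version in two small ways, both assumptions of this sketch: a population of exactly 0 falls into the first interval, and the last interval has no upper limit. The population values are illustrative.

```r
# Interval boundaries and labels matching the report's eight categories
breaks <- c(0, 50e6, 100e6, 150e6, 200e6, 250e6, 300e6, 350e6, Inf)
labels <- c("0-50M", "50M-100M", "100M-150M", "150M-200M",
            "200M-250M", "250M-300M", "300M-350M", "350M+")

# Illustrative populations, including a boundary value and a missing value
population <- c(10224, 50e6, 1371e6, NA)

# right = FALSE makes each interval closed on the left: [lower, upper)
population_interval <- cut(population, breaks = breaks, labels = labels,
                           right = FALSE, ordered_result = TRUE)
population_interval
```

Missing populations stay NA, as they do with case_when when no condition matches.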

Scan I

In this step we scanned the data set for missing values. Using the colSums() function, we found that 5 columns contained missing values. To locate special values such as Inf, NaN and NA, we introduced a user-defined function, is.nullcheck(). The dob and population columns had very few missing values (1 and 83 of 11464 rows), while height (325, about 2.8%) and weight (654, about 5.7%) had considerably more.

To deal with the missing values in the height and weight columns, we split the data set into olympics_m and olympics_f using the filter() function and checked the normality of both variables for each gender using qqPlot, although with samples this large the central limit theorem makes the check less critical for the group means. Since height was approximately normally distributed, NAs in the height column were replaced with the mean height of the respective gender using the group_by() and mutate() functions. Since the weight distribution was right-skewed, NAs in the weight column were replaced with the median weight of the respective gender in the same way. Once these substitutions were made, we removed the rows with NA values in any other field using the na.omit() function. Finally, we ran colSums() again to confirm that no NAs remained in the data set.

par(mfrow=c(1,2))
colSums(is.na(olympics))
        nationality                  id                name                 sex                 dob 
                  0                   0                   0                   0                   1 
             height              weight               sport                gold              silver 
                325                 654                   0                   0                   0 
             bronze             country          population        Total_Medals population_interval 
                  0                   0                  83                   0                  83 
is.nullcheck <- function(x){(is.infinite(x) | is.nan(x) | is.na(x))}
which(sapply(olympics$height, is.nullcheck))
  [1]   111   398   423   433   434   519   604   610   650   674   675   689   752   787   942   943   948
 [18]   949   952   954   957   958   959   961   965   972   974   975   976   977   978  1273  1275  1277
 [35]  1278  1279  1280  1281  1282  1284  1350  1411  1423  1432  1435  1500  1670  1673  1765  1781  1783
 [52]  1785  1793  1800  1801  1862  2129  2196  2200  2694  2695  2696  2794  2819  2877  3075  3443  3465
 [69]  3480  3499  3539  3540  3542  3543  3544  3546  3548  3549  3551  3552  3553  3556  3742  3923  3936
 [86]  3959  3972  3973  3987  3991  4003  4049  4484  4485  4486  4487  5351  5352  5353  5354  5355  5356
[103]  5360  5361  5363  5428  5459  5460  5461  5463  5464  5465  5505  5507  5509  5511  5778  5789  5797
[120]  5817  5839  5847  5852  5861  5875  5884  6111  6112  6113  6116  6141  6182  6431  6443  6461  6617
[137]  6884  6970  6976  6979  6993  7022  7037  7135  7200  7246  7273  7309  7310  7312  7313  7315  7316
[154]  7317  7319  7323  7324  7325  7327  7328  7329  7330  7335  7336  7337  7535  7536  7537  7538  7723
[171]  7738  7751  8072  8075  8113  8115  8116  8117  8118  8119  8120  8121  8122  8123  8124  8125  8126
[188]  8128  8129  8130  8132  8134  8135  8136  8137  8139  8140  8141  8142  8143  8144  8145  8148  8149
[205]  8150  8151  8152  8153  8154  8156  8157  8158  8161  8164  8166  8167  8168  8169  8170  8383  8389
[222]  8392  8447  8457  8458  8459  8462  8476  8517  8571  8652  8682  8689  8853  9067  9095  9146  9232
[239]  9237  9267  9294  9300  9320  9335  9342  9399  9420  9426  9453  9455  9456  9457  9459  9463  9464
[256]  9466  9467  9525  9532  9533  9534  9535  9605  9606  9607  9608  9835  9836  9837  9838  9839 10063
[273] 10064 10066 10069 10070 10151 10152 10153 10155 10226 10228 10232 10233 10278 10295 10419 10421 10422
[290] 10423 10424 10425 10426 10427 10428 10429 10430 10431 10432 10433 10434 10435 10436 10437 10439 10440
[307] 10441 10467 10545 10600 10791 10905 10985 11106 11144 11148 11149 11235 11256 11304 11335 11383 11416
[324] 11418 11426
olympics_m <- olympics %>% filter(sex=="male")
olympics_f <- olympics %>% filter(sex=="female")
qqPlot(olympics_m$height,dist="norm",main=" Male Height")
[1] 5907 3033
qqPlot(olympics_f$height,dist="norm",main="Female Height")
[1] 3087  653

# Group by sex only: a second group_by() call replaces the first one
olympics <- olympics %>% group_by(sex) %>% 
  mutate(height=ifelse(is.na(height),mean(height,na.rm = TRUE),height))
which(sapply(olympics$height, is.nullcheck))
integer(0)
which(sapply(olympics$weight, is.nullcheck))
  [1]    20    29    45    47    62    64    67    75   111   182   198   236   237   279   295   346   351
 [18]   352   353   372   398   423   433   434   501   511   512   519   542   584   590   604   606   610
 [35]   650   674   675   689   752   787   801   890   892   904   911   917   918   923   925   930   933
 [52]   939   942   943   948   949   950   952   954   957   958   959   961   963   965   971   972   974
 [69]   975   976   977   978  1140  1177  1228  1273  1275  1277  1278  1279  1280  1281  1282  1284  1306
 [86]  1350  1352  1397  1411  1423  1432  1433  1435  1500  1522  1628  1653  1668  1670  1673  1757  1758
[103]  1765  1781  1783  1785  1793  1800  1801  1823  1826  1838  1862  1866  1921  1961  2093  2129  2196
[120]  2200  2208  2258  2300  2336  2361  2454  2468  2505  2538  2614  2625  2649  2658  2662  2673  2682
[137]  2687  2690  2694  2696  2718  2781  2819  2826  2830  2842  2867  2877  2956  2960  2973  2989  2994
[154]  3011  3014  3046  3048  3049  3057  3075  3077  3359  3375  3383  3392  3397  3402  3407  3441  3443
[171]  3445  3465  3480  3499  3510  3523  3539  3540  3542  3543  3544  3546  3548  3549  3551  3552  3553
[188]  3556  3564  3719  3742  3923  3936  3959  3991  4003  4049  4051  4075  4080  4096  4131  4217  4267
[205]  4268  4293  4317  4339  4388  4435  4475  4484  4485  4486  4487  4510  4526  4547  4565  4597  4618
[222]  4628  4636  4642  4710  4739  4780  4810  4817  4832  4836  4847  4849  4957  4971  4983  5029  5152
[239]  5306  5351  5352  5353  5354  5355  5356  5361  5362  5363  5428  5459  5460  5461  5463  5464  5505
[256]  5507  5509  5511  5580  5722  5724  5778  5789  5797  5801  5817  5839  5847  5852  5855  5861  5862
[273]  5875  5884  5890  5958  5976  5977  5982  5995  6003  6008  6026  6046  6111  6112  6113  6116  6133
[290]  6141  6159  6165  6182  6265  6282  6382  6383  6389  6443  6461  6492  6498  6508  6553  6617  6845
[307]  6850  6855  6856  6864  6875  6877  6884  6895  6906  6909  6910  6925  6965  6970  6987  7003  7022
[324]  7037  7039  7090  7135  7136  7184  7194  7200  7222  7246  7273  7309  7310  7312  7313  7315  7316
[341]  7317  7319  7323  7324  7325  7327  7328  7329  7330  7335  7336  7337  7359  7392  7408  7427  7428
[358]  7434  7437  7438  7440  7447  7466  7535  7536  7537  7538  7568  7595  7604  7626  7657  7660  7668
[375]  7670  7683  7686  7693  7697  7701  7717  7723  7724  7737  7738  7745  7751  7756  7760  7820  7846
[392]  7888  8027  8035  8037  8049  8050  8052  8055  8061  8064  8078  8079  8081  8082  8083  8085  8091
[409]  8093  8095  8098  8102  8112  8113  8114  8115  8116  8117  8118  8119  8120  8121  8122  8123  8124
[426]  8125  8126  8127  8128  8129  8130  8131  8132  8133  8134  8135  8136  8137  8139  8140  8141  8142
[443]  8143  8144  8145  8146  8147  8148  8149  8150  8151  8152  8153  8154  8155  8156  8157  8158  8160
[460]  8161  8162  8163  8164  8165  8166  8167  8168  8169  8170  8171  8172  8383  8389  8392  8400  8448
[477]  8452  8457  8458  8459  8462  8469  8476  8517  8558  8571  8595  8652  8682  8689  8849  8853  8909
[494]  8917  8927  9067  9095  9129  9146  9188  9200  9203  9232  9235  9237  9241  9267  9284  9294  9300
[511]  9305  9316  9320  9335  9342  9367  9384  9399  9419  9420  9426  9453  9455  9456  9457  9459  9463
[528]  9464  9466  9467  9494  9525  9532  9534  9605  9606  9607  9608  9835  9836  9837  9838  9839 10052
[545] 10063 10088 10093 10097 10111 10127 10134 10139 10148 10149 10150 10151 10152 10153 10155 10181 10183
[562] 10189 10217 10220 10226 10228 10232 10233 10240 10251 10269 10278 10280 10312 10326 10360 10389 10404
[579] 10406 10419 10421 10422 10423 10424 10425 10426 10427 10428 10429 10430 10431 10432 10433 10434 10435
[596] 10436 10437 10438 10439 10440 10441 10467 10470 10518 10545 10580 10600 10620 10622 10628 10750 10791
[613] 10837 10846 10905 10958 10960 10978 10985 10996 11039 11106 11144 11148 11149 11231 11235 11251 11253
[630] 11256 11257 11260 11261 11264 11268 11276 11282 11294 11304 11311 11325 11326 11332 11335 11338 11349
[647] 11372 11373 11379 11383 11386 11416 11418 11426
qqPlot(olympics_m$weight,dist="norm",main="Male Weight")
[1] 3268 4987
qqPlot(olympics_f$weight,dist="norm",main="Female Weight")
[1] 1346 3740

# Group by sex only: a second group_by() call replaces the first one
olympics <- olympics %>% group_by(sex) %>% 
  mutate(weight=ifelse(is.na(weight),median(weight,na.rm = TRUE),weight))
which(sapply(olympics$weight, is.nullcheck))
integer(0)
colSums(is.na(olympics))
        nationality                  id                name                 sex                 dob 
                  0                   0                   0                   0                   1 
             height              weight               sport                gold              silver 
                  0                   0                   0                   0                   0 
             bronze             country          population        Total_Medals population_interval 
                  0                   0                  83                   0                  83 
olympics <- na.omit(olympics)
colSums(is.na(olympics))
        nationality                  id                name                 sex                 dob 
                  0                   0                   0                   0                   0 
             height              weight               sport                gold              silver 
                  0                   0                   0                   0                   0 
             bronze             country          population        Total_Medals population_interval 
                  0                   0                   0                   0                   0 
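
The group-wise imputation described above can be sketched on a toy data frame. The values below are made up for illustration. The sketch groups by sex only, since in dplyr a second group_by() call replaces the first by default, and it ends with ungroup() so that later steps do not inherit the grouping, a step the report's code does not include.

```r
library(dplyr)

# Made-up heights (m) and weights (kg) with missing values in both groups
toy <- data.frame(
  sex = c("male", "male", "male", "female", "female", "female"),
  height = c(1.80, NA, 1.90, 1.60, 1.70, NA),
  weight = c(80, 90, NA, NA, 55, 65)
)

imputed <- toy %>%
  group_by(sex) %>%
  mutate(
    # mean for the roughly symmetric variable
    height = ifelse(is.na(height), mean(height, na.rm = TRUE), height),
    # median for the right-skewed variable
    weight = ifelse(is.na(weight), median(weight, na.rm = TRUE), weight)
  ) %>%
  ungroup()

imputed
colSums(is.na(imputed))
```

Each missing value is filled from its own group, so a missing male height is replaced by the male mean and a missing female weight by the female median.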

Scan II

To deal with any outliers in the height and weight columns, boxplots of height and weight were plotted for each gender. Outliers were present in all 4 plots. We used the capping (winsorising) method to deal with them: values lying outside the outlier fences (1.5 times the IQR below the first quartile or above the third quartile) were replaced with the lower and upper fence values respectively. For further analysis of the whole data set, we combined olympics_m and olympics_f into a new data set, ‘olympics_final’, using the rbind() function.

#Male 
par(mfrow=c(1,2))
boxplot(olympics_m$height,main="Male Height Distribution",ylab="Height (m)",col = "cyan")
# Outlier fences: 1.5 * IQR beyond the first and third quartiles
iqr <- IQR(olympics_m$height, na.rm = TRUE) 
q1 <- quantile(olympics_m$height, .25, na.rm = TRUE) 
q3 <- quantile(olympics_m$height, .75, na.rm = TRUE)
benchq1 <-  (q1-1.5 * iqr )
benchq3 <-  (q3+1.5 * iqr )
olympics_m$height[olympics_m$height > benchq3] <- benchq3
olympics_m$height[olympics_m$height < benchq1] <- benchq1
boxplot(olympics_m$height,main="Male Height Dist. (Handled Outliers)",ylab="Height (m)",col = "cyan")

boxplot.stats(olympics_m$height)$out
numeric(0)
boxplot(olympics_m$weight,main="Male Weight Distribution",ylab="Weight (kg)",col = "cyan")
# Outlier fences: 1.5 * IQR beyond the first and third quartiles
iqr <- IQR(olympics_m$weight, na.rm = TRUE) 
q1 <- quantile(olympics_m$weight, .25, na.rm = TRUE) 
q3 <- quantile(olympics_m$weight, .75, na.rm = TRUE)
benchq1 <-  (q1-1.5 * iqr )
benchq3 <-  (q3+1.5 * iqr )
olympics_m$weight[olympics_m$weight > benchq3] <- benchq3
olympics_m$weight[olympics_m$weight < benchq1] <- benchq1
boxplot(olympics_m$weight,main="Male Weight Dist. (Handled Outliers)",ylab="Weight (kg)",col = "cyan")

# Check the weight column (not height) for remaining outliers
boxplot.stats(olympics_m$weight)$out
numeric(0)
#Female
boxplot(olympics_f$height,main="Female Height Distribution",ylab="Height (m)",col = "deeppink")
# Outlier fences: 1.5 * IQR beyond the first and third quartiles
iqr <- IQR(olympics_f$height, na.rm = TRUE) 
q1 <- quantile(olympics_f$height, .25, na.rm = TRUE) 
q3 <- quantile(olympics_f$height, .75, na.rm = TRUE)
benchq1 <-  (q1-1.5 * iqr )
benchq3 <-  (q3+1.5 * iqr )
olympics_f$height[olympics_f$height > benchq3] <- benchq3
olympics_f$height[olympics_f$height < benchq1] <- benchq1
boxplot(olympics_f$height,main="Female Height Dist. (Handled Outliers)",ylab="Height (m)",col = "deeppink")

boxplot.stats(olympics_f$height)$out
numeric(0)
boxplot(olympics_f$weight,main="Female Weight Distribution",ylab="Weight (kg)",col = "deeppink")
# Outlier fences: 1.5 * IQR beyond the first and third quartiles
iqr <- IQR(olympics_f$weight, na.rm = TRUE) 
q1 <- quantile(olympics_f$weight, .25, na.rm = TRUE) 
q3 <- quantile(olympics_f$weight, .75, na.rm = TRUE)
benchq1 <-  (q1-1.5 * iqr )
benchq3 <-  (q3+1.5 * iqr )
olympics_f$weight[olympics_f$weight > benchq3] <- benchq3
olympics_f$weight[olympics_f$weight < benchq1] <- benchq1
boxplot(olympics_f$weight,main="Female Weight Dist. (Handled Outliers)",ylab="Weight (kg)",col = "deeppink")

boxplot.stats(olympics_f$weight)$out
numeric(0)
olympics_final <- rbind(olympics_m,olympics_f)
str(olympics_final)
'data.frame':   11464 obs. of  15 variables:
 $ nationality        : Factor w/ 199 levels "AFG","ALB","ALG",..: 1 1 2 2 2 3 3 3 3 3 ...
 $ id                 : int  289057786 152408417 324317073 345441615 997380920 690873472 268626951 545134894 133974151 218421111 ...
 $ name               : chr  "Mohammad Tawfiq Bakhshi" "Abdul Wahab Zahiri" "Izmir Smajlaj" "Briken Calja" ...
 $ sex                : Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 2 2 2 ...
 $ dob                : Date, format: "1986-03-11" "1992-05-27" "1993-03-29" "1990-02-19" ...
 $ height             : num  1.81 1.75 1.95 1.7 1.93 1.9 1.6 1.68 1.85 1.78 ...
 $ weight             : num  99 68 86 69 87 100 60 62 79 70 ...
 $ sport              : chr  "judo" "athletics" "athletics" "weightlifting" ...
 $ gold               : int  0 0 0 0 0 0 0 0 0 0 ...
 $ silver             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ bronze             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ country            : Factor w/ 199 levels "Afghanistan",..: 1 1 2 2 2 3 3 3 3 3 ...
 $ population         : int  32526562 32526562 2889167 2889167 2889167 39666519 39666519 39666519 39666519 39666519 ...
 $ Total_Medals       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ population_interval: Ord.factor w/ 8 levels "0-50M"<"50M-100M"<..: 1 1 1 1 1 1 1 1 1 1 ...
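
The four capping blocks above repeat the same steps, so they could be folded into one helper. The sketch below is a hedged alternative, not the code used in the report: the function name cap_outliers and the toy heights are assumptions for illustration. It also addresses one thing to watch: olympics_m and olympics_f were filtered in Scan I before the imputation and na.omit() steps, which is why olympics_final still has 11464 rows. Capping by group on the cleaned data frame is one way to carry the imputed values forward.

```r
# Hypothetical helper: cap values at the 1.5 * IQR fences (winsorising)
cap_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  # pmax/pmin leave NA values as NA
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

# Simple check on a vector with one extreme value and one NA
x <- c(1, 2, 3, 4, 5, 100, NA)
capped <- cap_outliers(x)
capped

# Group-wise use with base R's ave() on made-up heights (m)
toy <- data.frame(
  sex = c(rep("male", 6), rep("female", 6)),
  height = c(1.70, 1.75, 1.80, 1.85, 1.90, 2.60,
             1.55, 1.60, 1.65, 1.70, 1.75, 1.10)
)
toy$height_capped <- ave(toy$height, toy$sex, FUN = cap_outliers)
toy
```

With dplyr, the same helper could be applied as olympics %>% group_by(sex) %>% mutate(height = cap_outliers(height)) %>% ungroup(), which keeps everything in a single data frame and avoids the rbind() step.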

Transform

To analyse the body mass index (BMI) of the athletes, we introduced a new variable called BMI, computed from the height and weight attributes as weight divided by height squared. To check whether BMI is normally distributed for each gender, we used qqPlot and histograms. These plots show that the BMI distribution is right-skewed for both genders. To bring the distributions closer to normal, we applied a logarithmic transformation to BMI for both genders.

olympics_final <- olympics_final %>% mutate(BMI=weight/height^2)
olympics_m <- olympics_m %>% mutate(BMI=weight/height^2)
olympics_f <- olympics_f %>% mutate(BMI=weight/height^2)

qqPlot(olympics_m$BMI,dist="norm",main="Male BMI")
[1] 4695 6178

qqPlot(olympics_f$BMI,dist="norm",main = "Female BMI")
[1] 1588 3743
par(mfrow=c(1,2))

# Index by olympics_final$sex: the rows of olympics_final are ordered differently from olympics
hist(olympics_final$BMI[olympics_final$sex=="male"],main = "Distribution of Male BMI",xlab = "BMI")
hist(olympics_final$BMI[olympics_final$sex=="female"],main = "Distribution of Female BMI",xlab = "BMI")

hist(log(olympics_final$BMI[olympics_final$sex=="male"]),main = "Male BMI (Log Transformation)",xlab = "BMI with log transformation")
hist(log(olympics_final$BMI[olympics_final$sex=="female"]),main = "Female BMI (Log Transformation)",xlab = "BMI with log transformation")
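
The effect of the log transformation can also be checked numerically rather than only by eye. The sketch below uses simulated right-skewed values, not the Olympic data. The helper names bmi and skewness, the seed, and the lognormal parameters are all assumptions chosen for illustration.

```r
# BMI = weight (kg) / height (m)^2, as used in the report
bmi <- function(weight_kg, height_m) weight_kg / height_m^2
bmi(68, 1.75)  # a 68 kg, 1.75 m athlete has a BMI of about 22.2

# Moment-based sample skewness: 0 for a symmetric distribution
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
}

# Simulated right-skewed "BMI" values (lognormal, hypothetical parameters)
set.seed(1)
sim_bmi <- rlnorm(5000, meanlog = log(22), sdlog = 0.15)

skew_raw <- skewness(sim_bmi)
skew_log <- skewness(log(sim_bmi))
c(raw = skew_raw, log = skew_log)
```

A skewness much closer to zero after the transformation supports what the histograms suggest. To use the transformed values in later analysis, they would need to be stored in a new column, since the code above only plots them.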



ZWNrZWQgZm9yIE5BcyB1c2luZyBDb2xTdW1zKCkgZnVuY3Rpb24gdG8gY29uZmlybSB0aGF0ICBhbGwgTkFzIHdlcmUgcmVtb3ZlZCBmcm9tIHRoZSBkYXRhc2V0Lg0KDQpgYGB7cn0NCnBhcihtZnJvdz1jKDEsMikpDQpjb2xTdW1zKGlzLm5hKG9seW1waWNzKSkNCmlzLm51bGxjaGVjayA8LSBmdW5jdGlvbih4KXsoaXMuaW5maW5pdGUoeCkgfCBpcy5uYW4oeCkgfCBpcy5uYSh4KSl9DQp3aGljaChzYXBwbHkob2x5bXBpY3MkaGVpZ2h0LCBpcy5udWxsY2hlY2spKQ0Kb2x5bXBpY3NfbSA8LSBvbHltcGljcyAlPiUgZmlsdGVyKHNleD09Im1hbGUiKQ0Kb2x5bXBpY3NfZiA8LSBvbHltcGljcyAlPiUgZmlsdGVyKHNleD09ImZlbWFsZSIpDQpxcVBsb3Qob2x5bXBpY3NfbSRoZWlnaHQsZGlzdD0ibm9ybSIsbWFpbj0iIE1hbGUgSGVpZ2h0IikNCnFxUGxvdChvbHltcGljc19mJGhlaWdodCxkaXN0PSJub3JtIixtYWluPSJGZW1hbGUgSGVpZ2h0IikNCm9seW1waWNzIDwtIG9seW1waWNzICU+JSBncm91cF9ieShjb3VudHJ5KSAlPiUgZ3JvdXBfYnkoc2V4KSAlPiUgDQogIG11dGF0ZShoZWlnaHQ9aWZlbHNlKGlzLm5hKGhlaWdodCksbWVhbihoZWlnaHQsbmEucm0gPSBUUlVFKSwoaGVpZ2h0KSkpDQp3aGljaChzYXBwbHkob2x5bXBpY3MkaGVpZ2h0LCBpcy5udWxsY2hlY2spKQ0Kd2hpY2goc2FwcGx5KG9seW1waWNzJHdlaWdodCwgaXMubnVsbGNoZWNrKSkNCnFxUGxvdChvbHltcGljc19tJHdlaWdodCxkaXN0PSJub3JtIixtYWluPSJNYWxlIFdlaWdodCIpDQpxcVBsb3Qob2x5bXBpY3NfZiR3ZWlnaHQsZGlzdD0ibm9ybSIsbWFpbj0iRmVtYWxlIFdlaWdodCIpDQpvbHltcGljcyA8LSBvbHltcGljcyAlPiUgZ3JvdXBfYnkoY291bnRyeSkgJT4lZ3JvdXBfYnkoc2V4KSAlPiUgDQogIG11dGF0ZSh3ZWlnaHQ9aWZlbHNlKGlzLm5hKHdlaWdodCksbWVkaWFuKHdlaWdodCxuYS5ybSA9IFRSVUUpLCh3ZWlnaHQpKSkNCndoaWNoKHNhcHBseShvbHltcGljcyR3ZWlnaHQsIGlzLm51bGxjaGVjaykpDQpjb2xTdW1zKGlzLm5hKG9seW1waWNzKSkNCm9seW1waWNzIDwtIG5hLm9taXQob2x5bXBpY3MpDQpjb2xTdW1zKGlzLm5hKG9seW1waWNzKSkNCmBgYA0KIyMJU2NhbiBJSQ0KDQpUbyBkZWFsIHdpdGggdGhlIG91dGxpZXJzLCBpZiBhbnksIGluIGhlaWdodCBhbmQgd2VpZ2h0IGNvbHVtbnMgZm9yIGVhY2ggZ2VuZGVyLCAuIEJveHBsb3RzIGZvciBib3RoIHRoZSBnZW5kZXJzIHdlcmUgcGxvdHRlZCBmb3IgY29ycmVzcG9uZGluZyBoZWlnaHQgYW5kIHdlaWdodC4gVGhlIHByZXNlbmNlIG9mIG91dGxpZXJzIHdlcmUgaWRlbnRpZmllZCBpbiBhbGwgdGhlIDQgcGxvdHMgdGhyb3VnaCBhbmFseXNpcy4gV2UgdXNlZCBjYXBwaW5nKHdpbnNvcmlzaW5nKSBtZXRob2QgdG8gZGVhbCB3aXRoIHRoZSBvdXRsaWVycy4gV2UgcmVwbGFjZWQgdGhlIHZhbHVlcyB0aGF0IGxpZSBvdXRzaWRlIHRoZSBvdXRsaWVyIGZlbmNlIHdpdGggbG93ZXIg
YW5kIHVwcGVyIG91dGxpZXIgdmFsdWVzIHJlc3BlY3RpdmVseS4gRm9yIGZ1cnRoZXIgYW5hbHlzaXMgb2YgdGhlIHdob2xlIGRhdGEgc2V0LCB3ZSBjb21iaW5lZCBvbHltcGljc19tIGFuZCBvbHltcGljc19mIGludG8gYSBuZXcgZGF0YSBzZXQg4oCYb2x5bXBpY3NfZmluYWzigJkgdXNpbmcgcmJpbmQoKSBmdW5jdGlvbi4NCmBgYHtyfQ0KI01hbGUgDQpwYXIobWZyb3c9YygxLDIpKQ0KYm94cGxvdChvbHltcGljc19tJGhlaWdodCxtYWluPSJNYWxlIEhlaWdodCBESXN0cmlidXRpb24iLHlsYWI9IkhlaWdodChNKSIsY29sID0gImN5YW4iKQ0KSVFSIDwtIElRUihvbHltcGljc19tJGhlaWdodCwgbmEucm0gPSBUUlVFKSANCnExIDwtIHF1YW50aWxlKG9seW1waWNzX20kaGVpZ2h0LCAuMjUsIG5hLnJtID0gVFJVRSkgDQpxMyA8LSBxdWFudGlsZShvbHltcGljc19tJGhlaWdodCwgLjc1LCBuYS5ybSA9IFRSVUUpDQpiZW5jaHExIDwtICAocTEtMS41ICogSVFSICkNCmJlbmNocTMgPC0gIChxMysxLjUgKiBJUVIgKQ0Kb2x5bXBpY3NfbSRoZWlnaHRbb2x5bXBpY3NfbSRoZWlnaHQgPiBiZW5jaHEzXSA8LSBiZW5jaHEzDQpvbHltcGljc19tJGhlaWdodFtvbHltcGljc19tJGhlaWdodCA8IGJlbmNocTFdIDwtIGJlbmNocTENCmJveHBsb3Qob2x5bXBpY3NfbSRoZWlnaHQsbWFpbj0iTWFsZSBIZWlnaHQgRElzdC4gKEhhbmRsZWQgT3V0bGllcnMpIix5bGFiPSJIZWlnaHQoTSkiLGNvbCA9ICJjeWFuIikNCmJveHBsb3Quc3RhdHMob2x5bXBpY3NfbSRoZWlnaHQpJG91dA0KYm94cGxvdChvbHltcGljc19tJHdlaWdodCxtYWluPSJNYWxlIFdlaWdodCBESXN0cmlidXRpb24iLHlsYWI9IldlaWdodChrZykiLGNvbCA9ICJjeWFuIikNCklRUiA8LSBJUVIob2x5bXBpY3NfbSR3ZWlnaHQsIG5hLnJtID0gVFJVRSkgDQpxMSA8LSBxdWFudGlsZShvbHltcGljc19tJHdlaWdodCwgLjI1LCBuYS5ybSA9IFRSVUUpIA0KcTMgPC0gcXVhbnRpbGUob2x5bXBpY3NfbSR3ZWlnaHQsIC43NSwgbmEucm0gPSBUUlVFKQ0KYmVuY2hxMSA8LSAgKHExLTEuNSAqIElRUiApDQpiZW5jaHEzIDwtICAocTMrMS41ICogSVFSICkNCm9seW1waWNzX20kd2VpZ2h0W29seW1waWNzX20kd2VpZ2h0ID4gYmVuY2hxM10gPC0gYmVuY2hxMw0Kb2x5bXBpY3NfbSR3ZWlnaHRbb2x5bXBpY3NfbSR3ZWlnaHQgPCBiZW5jaHExXSA8LSBiZW5jaHExDQpib3hwbG90KG9seW1waWNzX20kd2VpZ2h0LG1haW49Ik1hbGUgV2VpZ2h0IERJc3QuIChIYW5kbGVkIE91dGxpZXJzKSIseWxhYj0iV2VpZ2h0KGtnKSIsY29sID0gImN5YW4iKQ0KYm94cGxvdC5zdGF0cyhvbHltcGljc19tJGhlaWdodCkkb3V0DQojRmVtYWxlDQpib3hwbG90KG9seW1waWNzX2YkaGVpZ2h0LG1haW49IkZlbWFsZSBIZWlnaHQgRElzdHJpYnV0aW9uIix5bGFiPSJIZWlnaHQoTSkiLGNvbCA9ICJkZWVwcGluayIpDQpJUVIgPC0gSVFSKG9seW1waWNzX2YkaGVpZ2h0LCBuYS5ybSA9IFRSVUUpIA0KcTEg
PC0gcXVhbnRpbGUob2x5bXBpY3NfZiRoZWlnaHQsIC4yNSwgbmEucm0gPSBUUlVFKSANCnEzIDwtIHF1YW50aWxlKG9seW1waWNzX2YkaGVpZ2h0LCAuNzUsIG5hLnJtID0gVFJVRSkNCmJlbmNocTEgPC0gIChxMS0xLjUgKiBJUVIgKQ0KYmVuY2hxMyA8LSAgKHEzKzEuNSAqIElRUiApDQpvbHltcGljc19mJGhlaWdodFtvbHltcGljc19mJGhlaWdodCA+IGJlbmNocTNdIDwtIGJlbmNocTMNCm9seW1waWNzX2YkaGVpZ2h0W29seW1waWNzX2YkaGVpZ2h0IDwgYmVuY2hxMV0gPC0gYmVuY2hxMQ0KYm94cGxvdChvbHltcGljc19mJGhlaWdodCxtYWluPSJNYWxlIEhlaWdodCBESXN0LiAoSGFuZGxlZCBPdXRsaWVycykiLHlsYWI9IkhlaWdodChNKSIsY29sID0gImRlZXBwaW5rIikNCmJveHBsb3Quc3RhdHMob2x5bXBpY3NfZiRoZWlnaHQpJG91dA0KYm94cGxvdChvbHltcGljc19mJHdlaWdodCxtYWluPSJGZW1hbGUgV2VpZ2h0IERJc3RyaWJ1dGlvbiIseWxhYj0iV2VpZ2h0KGtnKSIsY29sID0gImRlZXBwaW5rIikNCklRUiA8LSBJUVIob2x5bXBpY3NfZiR3ZWlnaHQsIG5hLnJtID0gVFJVRSkgDQpxMSA8LSBxdWFudGlsZShvbHltcGljc19mJHdlaWdodCwgLjI1LCBuYS5ybSA9IFRSVUUpIA0KcTMgPC0gcXVhbnRpbGUob2x5bXBpY3NfZiR3ZWlnaHQsIC43NSwgbmEucm0gPSBUUlVFKQ0KYmVuY2hxMSA8LSAgKHExLTEuNSAqIElRUiApDQpiZW5jaHEzIDwtICAocTMrMS41ICogSVFSICkNCm9seW1waWNzX2Ykd2VpZ2h0W29seW1waWNzX2Ykd2VpZ2h0ID4gYmVuY2hxM10gPC0gYmVuY2hxMw0Kb2x5bXBpY3NfZiR3ZWlnaHRbb2x5bXBpY3NfZiR3ZWlnaHQgPCBiZW5jaHExXSA8LSBiZW5jaHExDQpib3hwbG90KG9seW1waWNzX2Ykd2VpZ2h0LG1haW49IkZlbWFsZSBXZWlnaHQgRGlzdC4gKEhhbmRsZWQgT3V0bGllcnMpIix5bGFiPSJXZWlnaHQoa2cpIixjb2wgPSAiZGVlcHBpbmsiKQ0KYm94cGxvdC5zdGF0cyhvbHltcGljc19mJHdlaWdodCkkb3V0DQpvbHltcGljc19maW5hbCA8LSByYmluZChvbHltcGljc19tLG9seW1waWNzX2YpDQpzdHIob2x5bXBpY3NfZmluYWwpDQpgYGANCg0KIyMJVHJhbnNmb3JtIA0KVG8gYW5hbHl6ZSB0aGUgYm9keSBtYXNzIGluZGV4IChCTUkpIG9mIHRoZSBhdGhsZXRlcywgd2UgaW50cm9kdWNlZCBhIG5ldyB2YXJpYWJsZSBjYWxsZWQgQk1JIHVzaW5nIHRoZSBoZWlnaHQgYW5kIHdlaWdodCBhdHRyaWJ1dGVzLiBUbyBjaGVjayB3aGV0aGVyIHRoZSBCTUkgYWdhaW5zdCBlYWNoIGdlbmRlciBmb2xsb3dzIG5vcm1hbGl0eSB3ZSB1c2VkIHFxUGxvdCBhbmQgaGlzdG9ncmFtLiBGcm9tIHRoZXNlIHBsb3RzIGl04oCZcyBvYnNlcnZlZCB0aGF0IGJvdGggZ2VuZGVycyBmb2xsb3dzIHJpZ2h0IHNrZXdlZCBub3JtYWxpdHkuIFNvLCBpbiBvcmRlciB0byBtYWtlIHRoZSBkaXN0cmlidXRpb24gbm9ybWFsIHdlIGhhZCB0YWtlbiB0aGUgbG9nYXJpdGhtaWMgdHJhbnNmb3JtYXRpb24gb2YgQk1JIG9u
IGJvdGggdGhlIGdlbmRlcnMuIA0KDQpgYGB7cn0NCm9seW1waWNzX2ZpbmFsIDwtIG9seW1waWNzX2ZpbmFsICU+JSBtdXRhdGUoQk1JPXdlaWdodC9oZWlnaHReMikNCm9seW1waWNzX20gPC0gb2x5bXBpY3NfbSAlPiUgbXV0YXRlKEJNST13ZWlnaHQvaGVpZ2h0XjIpDQpvbHltcGljc19mIDwtIG9seW1waWNzX2YgJT4lIG11dGF0ZShCTUk9d2VpZ2h0L2hlaWdodF4yKQ0KDQpxcVBsb3Qob2x5bXBpY3NfbSRCTUksZGlzdD0ibm9ybSIsbWFpbj0iTWFsZSBCTUkiKQ0KcXFQbG90KG9seW1waWNzX2YkQk1JLGRpc3Q9Im5vcm0iLG1haW4gPSAiRmVtYWxlIEJNSSIpDQpwYXIobWZyb3c9YygxLDIpKQ0KaGlzdChvbHltcGljc19maW5hbCRCTUlbb2x5bXBpY3Mkc2V4PT0ibWFsZSJdLG1haW4gPSAiRGlzdHJpYnV0aW9uIG9mIE1hbGUgQk1JIix4bGFiID0gIkJNSSIpDQpoaXN0KG9seW1waWNzX2ZpbmFsJEJNSVtvbHltcGljcyRzZXg9PSJmZW1hbGUiXSxtYWluID0gIkRpc3RyaWJ1dGlvbiBvZiBGZW1hbGUgQk1JIix4bGFiID0gIkJNSSIpDQpoaXN0KGxvZyhvbHltcGljc19maW5hbCRCTUlbb2x5bXBpY3Mkc2V4PT0ibWFsZSJdKSxtYWluID0gIk1hbGUgQk1JIChMb2cgVHJhbnNmb3JtYXRpb24pICIseGxhYiA9ICJCTUkgd2l0aCBsb2cgdHJhbnNmb3JtYXRpb24iKQ0KaGlzdChsb2cob2x5bXBpY3NfZmluYWwkQk1JW29seW1waWNzJHNleD09ImZlbWFsZSJdKSxtYWluID0gIkZlbWFsZSBCTUkgKExvZyBUcmFuc2Zvcm1hdGlvbikgIix4bGFiID0gIkJNSSB3aXRoIGxvZyB0cmFuc2Zvcm1hdGlvbiIpDQpgYGANCjxicj4NCjxicj4NCg==