Required packages

library(readr)
library(tidyr)
library(dplyr)
library(Hmisc)
library(outliers)
library(lubridate)

Executive Summary

In the first section of the report, the required packages and datasets were imported into the R workspace. Next, we performed the required type conversions on several variables, i.e., character to factor and character to date-time. Following this, the data was checked against the tidy data principles. After confirming that each variable had its own column, each observation had its own row and each value had its own cell, we added a new variable, shipping days, to the dataset. Next, the dataset was scanned for NA, NaN and infinite values. Since such values can cause problems in later analysis, rows containing missing values were removed. Outliers in the numeric variables were then detected with box plots and handled by either removing or capping them. In the final section of the report, we apply a log transformation to one variable to reduce its right skewness and bring it closer to a normal distribution.

Data

Data sets source - https://www.kaggle.com/olistbr/brazilian-ecommerce

The data sets used for this assignment were obtained from Kaggle. They are part of the Brazilian E-Commerce Public Dataset released by Olist, an e-commerce company for sellers that links merchants and their products to the major marketplaces of Brazil.

Below are the details of the variables in olist_orders_dataset.csv:

order_id - unique identifier of the order.
customer_id - key to the customer dataset. Each order has a unique customer_id.
order_status - reference to the order status (delivered, shipped, etc.).
order_purchase_timestamp - the purchase timestamp.
order_approved_at - the payment approval timestamp.
order_delivered_carrier_date - the order posting timestamp, i.e., when the order was handed to the logistics partner.
order_delivered_customer_date - the actual date the order was delivered to the customer.
order_estimated_delivery_date - the estimated delivery date given to the customer at the time of purchase.

Below are the details of the variables in olist_order_payments_dataset.csv:

order_id - unique identifier of an order.
payment_sequential - a customer may pay for an order with more than one payment method. If they do so, a sequence is created to accommodate all payments.
payment_type - method of payment chosen by the customer.
payment_installments - number of installments chosen by the customer.
payment_value - transaction value.

First, both datasets were imported using read.csv, with the option for converting strings to factors set to FALSE. Both datasets were then merged on the order_id column using the merge function. The merged dataset was named orders_final.

ord <- read.csv("olist_orders_dataset.csv", stringsAsFactors = FALSE)
ord
payment <- read.csv("olist_order_payments_dataset.csv", stringsAsFactors = FALSE)
payment
#Merging orders and payment into orders_final
orders_final <- merge(payment,ord, by = "order_id")
orders_final
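
The merge step can be illustrated on a small scale. The sketch below uses two hypothetical toy data frames (ord_toy and payment_toy, with made-up values rather than Olist data) to show that merge() performs an inner join by default, so order_ids present in only one file are silently dropped. Checking the unmatched keys is one way to see how many rows would be lost; it is an illustration, not part of the original analysis.

```r
# Toy stand-ins for the two CSV files (hypothetical values, not Olist data)
ord_toy <- data.frame(
  order_id = c("a", "b", "c"),
  order_status = c("delivered", "shipped", "delivered"),
  stringsAsFactors = FALSE
)
payment_toy <- data.frame(
  order_id = c("a", "a", "b", "d"),
  payment_sequential = c(1L, 2L, 1L, 1L),
  payment_value = c(10, 5, 20, 7),
  stringsAsFactors = FALSE
)

# merge() defaults to an inner join: only order_ids found in both inputs survive
merged_toy <- merge(payment_toy, ord_toy, by = "order_id")

# Keys that the inner join drops from each side
unmatched_payments <- setdiff(payment_toy$order_id, ord_toy$order_id)
unmatched_orders <- setdiff(ord_toy$order_id, payment_toy$order_id)

nrow(merged_toy)
unmatched_payments
unmatched_orders
```

Because an order can have several payment records, the merged result has one row per payment record rather than one row per order.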

Understand

The orders_final dataframe was summarized using the str() function. Following this, the categorical columns were converted to factors using the as.factor function and the levels were viewed using the levels function. After this, the columns containing dates were converted from character to datetime using the lubridate function ymd_hms. This was applied to columns 8 to 12 using the lapply function.

#checking structure of the merged data set
str(orders_final)
'data.frame':   103886 obs. of  12 variables:
 $ order_id                     : chr  "00010242fe8c5a6d1ba2dd792cb16214" "00018f77f2f0320c557190d7a144bdd3" "000229ec398224ef6ca0657da4fc703e" "00024acbcdf0a6daa1e931b038114c75" ...
 $ payment_sequential           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ payment_type                 : chr  "credit_card" "credit_card" "credit_card" "credit_card" ...
 $ payment_installments         : int  2 3 5 2 3 1 1 10 3 1 ...
 $ payment_value                : num  72.2 259.8 216.9 25.8 218 ...
 $ customer_id                  : chr  "3ce436f183e68e07877b285a838db11a" "f6dd3ec061db4e3987629fe6b26e5cce" "6489ae5e4333f3693df5ad4372dab6d3" "d4eb9395c8c0431ee92fce09860c5a06" ...
 $ order_status                 : chr  "delivered" "delivered" "delivered" "delivered" ...
 $ order_purchase_timestamp     : chr  "2017-09-13 08:59:02" "2017-04-26 10:53:06" "2018-01-14 14:33:31" "2018-08-08 10:00:35" ...
 $ order_approved_at            : chr  "2017-09-13 09:45:35" "2017-04-26 11:05:13" "2018-01-14 14:48:30" "2018-08-08 10:10:18" ...
 $ order_delivered_carrier_date : chr  "2017-09-19 18:34:16" "2017-05-04 14:35:00" "2018-01-16 12:36:48" "2018-08-10 13:28:00" ...
 $ order_delivered_customer_date: chr  "2017-09-20 23:43:48" "2017-05-12 16:04:24" "2018-01-22 13:19:16" "2018-08-14 13:32:39" ...
 $ order_estimated_delivery_date: chr  "2017-09-29 00:00:00" "2017-05-15 00:00:00" "2018-02-05 00:00:00" "2018-08-20 00:00:00" ...
#Converting payment type and order status to factors
orders_final$payment_type <- as.factor(orders_final$payment_type)
levels(orders_final$payment_type)
[1] "boleto"      "credit_card" "debit_card"  "not_defined" "voucher"    
orders_final$order_status <- as.factor(orders_final$order_status)
levels(orders_final$order_status)
[1] "approved"    "canceled"    "created"     "delivered"   "invoiced"    "processing" 
[7] "shipped"     "unavailable"
#Converting character to date format for last five columns
orders_final[8:12] <- lapply(orders_final[8:12], ymd_hms)
sapply(orders_final[8:12], class)
     order_purchase_timestamp order_approved_at order_delivered_carrier_date
[1,] "POSIXct"                "POSIXct"         "POSIXct"                   
[2,] "POSIXt"                 "POSIXt"          "POSIXt"                    
     order_delivered_customer_date order_estimated_delivery_date
[1,] "POSIXct"                     "POSIXct"                    
[2,] "POSIXt"                      "POSIXt"                     
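
Selecting columns 8 to 12 by position works only as long as the column order does not change. As a hedged alternative, the sketch below selects the date columns by name on a hypothetical toy data frame. It uses base R's as.POSIXct in place of lubridate's ymd_hms so that it runs without extra packages; that substitution is ours, not the report's. It also assumes that missing dates arrive as empty strings, which then become NA after parsing.

```r
# Toy data frame with two of the date columns (values copied from the str() output,
# plus one empty string standing in for a missing delivery date)
toy <- data.frame(
  order_purchase_timestamp = c("2017-09-13 08:59:02", "2017-04-26 10:53:06"),
  order_delivered_customer_date = c("2017-09-20 23:43:48", ""),
  stringsAsFactors = FALSE
)

# Select the date columns by name instead of by position
date_cols <- c("order_purchase_timestamp", "order_delivered_customer_date")

# Parse each column; strings that do not match the format become NA
toy[date_cols] <- lapply(
  toy[date_cols],
  function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)

sapply(toy[date_cols], function(x) class(x)[1])
sum(is.na(toy$order_delivered_customer_date))
```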

Tidy & Manipulate Data I

Check if the data conforms to the tidy data principles. If your data is untidy, reshape your data into a tidy format (minimum requirement #5). In addition to the R codes and outputs, explain everything that you do in this step.

The dataset was checked to make sure it adheres to the tidy data principles:
1) Each variable must have its own column.
2) Each observation must have its own row.
3) Each value must have its own cell.

Our dataset follows these principles and is therefore in a tidy format.

orders_final
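
Part of the "one observation per row" check can be done programmatically. The sketch below assumes, as our reading of the variable descriptions, that an observation is one payment record of an order, identified by order_id together with payment_sequential. It counts duplicated keys in a hypothetical toy data frame; a count of zero is consistent with each observation having its own row.

```r
# Toy data frame: order "a" was paid with two payment records
toy <- data.frame(
  order_id = c("a", "a", "b"),
  payment_sequential = c(1L, 2L, 1L),
  payment_value = c(10, 5, 20),
  stringsAsFactors = FALSE
)

# Assumed key of an observation: one payment record within one order
key_cols <- c("order_id", "payment_sequential")

# duplicated() on a data frame flags rows whose key combination was already seen
n_duplicate_keys <- sum(duplicated(toy[key_cols]))
n_duplicate_keys
```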

Tidy & Manipulate Data II

The shipping_days column was created as the difference in days between the order purchase timestamp and the customer delivery date. It was added with the mutate function, using difftime with the units set to days. The floor function was then applied to every observation in the new column, since days are counted as whole numbers.

#creating shipping days variable using mutate
orders_final <- mutate(orders_final,shipping_days = difftime(order_delivered_customer_date, order_purchase_timestamp, units = "days"))
#Round shipping days down to whole days using floor
orders_final$shipping_days <- floor(orders_final$shipping_days)
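
As a worked example of this calculation, the sketch below applies difftime and floor to the first two purchase and delivery timestamps shown in the str() output above. Converting the result to a plain integer is an extra step we add for illustration (the report keeps the difftime class); it can make later numeric checks simpler.

```r
# First two purchase and delivery timestamps from the str() output
purchased <- as.POSIXct(c("2017-09-13 08:59:02", "2017-04-26 10:53:06"), tz = "UTC")
delivered <- as.POSIXct(c("2017-09-20 23:43:48", "2017-05-12 16:04:24"), tz = "UTC")

# Elapsed time in fractional days
elapsed <- difftime(delivered, purchased, units = "days")

# Round down to whole days and drop the difftime class
shipping_days <- as.integer(floor(as.numeric(elapsed, units = "days")))
shipping_days
```

The first order took 7 days and about 15 hours, so it is recorded as 7 whole days; the second is recorded as 16.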

Scan I

#Checking for NA values(Missing values)
sum(is.na(orders_final))
[1] 8327
colSums(is.na(orders_final))
                     order_id            payment_sequential 
                            0                             0 
                 payment_type          payment_installments 
                            0                             0 
                payment_value                   customer_id 
                            0                             0 
                 order_status      order_purchase_timestamp 
                            0                             0 
            order_approved_at  order_delivered_carrier_date 
                          175                          1888 
order_delivered_customer_date order_estimated_delivery_date 
                         3132                             0 
                shipping_days 
                         3132 
#3132/103886 * 100 = 3.015 percent missing in order_delivered_customer_date
#Percentage of missing values calculation
percentage_missing_values <- sum(is.na(orders_final$order_delivered_customer_date))/length(orders_final$order_delivered_customer_date) * 100
percentage_missing_values
[1] 3.014843
#After removing missing values
orders_final <- na.omit(orders_final)
#validating after updating the dataset
colSums(is.na(orders_final))
                     order_id            payment_sequential 
                            0                             0 
                 payment_type          payment_installments 
                            0                             0 
                payment_value                   customer_id 
                            0                             0 
                 order_status      order_purchase_timestamp 
                            0                             0 
            order_approved_at  order_delivered_carrier_date 
                            0                             0 
order_delivered_customer_date order_estimated_delivery_date 
                            0                             0 
                shipping_days 
                            0 
#Code for checking any NaN (Not a number)
sum(sapply(orders_final, is.nan))
[1] 0
#Code for checking any infinite values
sum(sapply(orders_final, is.infinite))
[1] 0
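
The three kinds of special values behave differently, which the following sketch illustrates on a hypothetical toy data frame. It restricts the checks to numeric columns and counts per column, which is a variation on the whole-data-frame sums used above. Note that is.na() also returns TRUE for NaN, so na.omit() removes rows containing NaN as well, while infinite values are not treated as missing.

```r
# Toy data frame with one NaN, one Inf and one NA (made-up values)
toy <- data.frame(
  payment_value = c(10, NaN, Inf, 25),
  payment_installments = c(1, 2, NA, 3),
  payment_type = c("boleto", "voucher", "credit_card", "boleto"),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, where NaN and Inf can occur
numeric_cols <- toy[sapply(toy, is.numeric)]

# Per-column counts of each kind of special value
nan_counts <- sapply(numeric_cols, function(x) sum(is.nan(x)))
inf_counts <- sapply(numeric_cols, function(x) sum(is.infinite(x)))
na_counts <- sapply(numeric_cols, function(x) sum(is.na(x)))  # includes NaN

nan_counts
inf_counts
na_counts
```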

Scan II

pay_sequence_outliers <- boxplot(orders_final$payment_sequential, main="Distribution of payment sequential", col = "grey")

#calculating percentage of outliers in payment sequential
percentage_outliers <- length(pay_sequence_outliers$out)/length(orders_final$payment_sequential) * 100
percentage_outliers
[1] 4.325038
#Payment sequence - removing outliers
orders_final <- orders_final[!(orders_final$payment_sequential %in% pay_sequence_outliers$out), ]
boxplot(orders_final$payment_sequential, main="Distribution of updated payment sequential", col = "lightblue")

#capping function
cap <- function(x){
  quantiles <- quantile( x, c(.05, 0.25, 0.75, .95 ) )
  x[ x < quantiles[2] - 1.5*IQR(x) ] <- quantiles[1]
  x[ x > quantiles[3] + 1.5*IQR(x) ] <- quantiles[4]
  x
}
pay_installments_outliers <- boxplot(orders_final$payment_installments, main="Distribution of payment installments", col = "grey")

#payment installment - capping method
orders_final$payment_installments <- orders_final$payment_installments %>% cap()
boxplot(orders_final$payment_installments, main="Distribution of updated payment installments", col = "lightblue")

pay_value_outliers <- boxplot(orders_final$payment_value, main="Distribution of payment value", col = "grey")

#payment value - capping method
orders_final$payment_value <- orders_final$payment_value %>% cap()
boxplot(orders_final$payment_value, main="Distribution of updated payment value", col = "lightblue")
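
To make the capping rule concrete, the sketch below reproduces the cap function from above and applies it to a hypothetical toy vector with one extreme value. Values beyond the 1.5 * IQR fences are replaced by the 5th or 95th percentile of the data, not by the fence itself.

```r
# Same capping function as in the report
cap <- function(x){
  quantiles <- quantile(x, c(.05, 0.25, 0.75, .95))
  x[x < quantiles[2] - 1.5 * IQR(x)] <- quantiles[1]
  x[x > quantiles[3] + 1.5 * IQR(x)] <- quantiles[4]
  x
}

# Toy vector: 1 to 19 plus one extreme value (made-up data)
x <- c(1:19, 100)

# Upper fence used to flag outliers: Q3 + 1.5 * IQR
upper_fence <- unname(quantile(x, 0.75) + 1.5 * IQR(x))

capped <- cap(x)
upper_fence
max(capped)
```

Here the upper fence is 29.5, so only the value 100 is flagged. It is replaced by the 95th percentile of the toy vector (about 23.05), while the other values are left unchanged.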

Transform

For the final task, we decided to perform a logarithmic transformation on the payment value variable. The reason for choosing this transformation was to reduce the right skewness of the variable and bring its distribution closer to normal. The transformation was performed using the log() function. Before and after histograms, along with a normal distribution curve, were plotted to visualize the transformation.

hist(orders_final$payment_value, main = "Histogram of payment value", xlab = "Payment value")

#logarithmic transformation of payment value variable as it was right skewed
orders_final <- orders_final %>% mutate(log_payment_value = log(payment_value))
#after the transformation, the values are approximately normally distributed.
h <- hist(orders_final$log_payment_value, breaks = 20, xlim=c(2,8), main = "Histogram of log transformed payment value", xlab = "Payment value in terms of log")
xfit <- seq(min(orders_final$log_payment_value), max(orders_final$log_payment_value), length = length(orders_final$log_payment_value)) 
yfit <- dnorm(xfit, mean = mean(orders_final$log_payment_value), sd = sd(orders_final$log_payment_value)) 
yfit <- yfit * diff(h$mids[1:2]) * length(orders_final$log_payment_value) 
lines(xfit, yfit, col = "blue", lwd = 2)
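
The effect of the log transformation can also be checked numerically rather than only by eye. The sketch below uses simulated right-skewed data (drawn with rlnorm, not the Olist payment values) and a small hypothetical helper, skewness, that computes the moment-based sample skewness. It is an illustration under these assumptions, not a result from the report.

```r
# Simulated right-skewed, strictly positive values (not the Olist data)
set.seed(1)
values <- rlnorm(1000, meanlog = 4, sdlog = 1)

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

skew_raw <- skewness(values)       # clearly positive for right-skewed data
skew_log <- skewness(log(values))  # close to zero after the transformation

skew_raw
skew_log
```

The log is only defined for strictly positive values; log(0) returns -Inf, so a variable containing zeros would need to be handled before transforming.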


