Overview

This data has been uploaded to Kaggle and contains house sale prices for King County, WA. The data is for one year between May ’14 and May ’15. I’ll do some brief analysis and plotting in R.

https://www.kaggle.com/harlfoxem/housesalesprediction

Packages

options(warn=-1)
library(tidyverse)
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats
library(corrplot)
library(caret)
Loading required package: lattice

Attaching package: <U+393C><U+3E31>caret<U+393C><U+3E32>

The following object is masked from <U+393C><U+3E31>package:purrr<U+393C><U+3E32>:

    lift
library(gbm)
Loading required package: survival

Attaching package: <U+393C><U+3E31>survival<U+393C><U+3E32>

The following object is masked from <U+393C><U+3E31>package:caret<U+393C><U+3E32>:

    cluster

Loading required package: splines
Loading required package: parallel
Loaded gbm 2.1.3
library(e1071)
library(randomForest)
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.

Attaching package: <U+393C><U+3E31>randomForest<U+393C><U+3E32>

The following object is masked from <U+393C><U+3E31>package:dplyr<U+393C><U+3E32>:

    combine

The following object is masked from <U+393C><U+3E31>package:ggplot2<U+393C><U+3E32>:

    margin
library(lubridate)

Attaching package: <U+393C><U+3E31>lubridate<U+393C><U+3E32>

The following object is masked from <U+393C><U+3E31>package:base<U+393C><U+3E32>:

    date
library(xgboost)

Attaching package: <U+393C><U+3E31>xgboost<U+393C><U+3E32>

The following object is masked from <U+393C><U+3E31>package:dplyr<U+393C><U+3E32>:

    slice

Data Input

Read the data from the .csv file into R.

df<-read.csv('kc_house_data.csv')
str(df)
'data.frame':   21613 obs. of  21 variables:
 $ id           : num  7.13e+09 6.41e+09 5.63e+09 2.49e+09 1.95e+09 ...
 $ date         : Factor w/ 372 levels "20140502T000000",..: 165 221 291 221 284 11 57 252 340 306 ...
 $ price        : num  221900 538000 180000 604000 510000 ...
 $ bedrooms     : int  3 3 2 4 3 4 3 3 3 3 ...
 $ bathrooms    : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
 $ sqft_living  : int  1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
 $ sqft_lot     : int  5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
 $ floors       : num  1 2 1 1 1 1 2 1 1 2 ...
 $ waterfront   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ view         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ condition    : int  3 3 3 5 3 3 3 3 3 3 ...
 $ grade        : int  7 7 6 7 8 11 7 7 7 7 ...
 $ sqft_above   : int  1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
 $ sqft_basement: int  0 400 0 910 0 1530 0 0 730 0 ...
 $ yr_built     : int  1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
 $ yr_renovated : int  0 1991 0 0 0 0 0 0 0 0 ...
 $ zipcode      : int  98178 98125 98028 98136 98074 98053 98003 98198 98146 98038 ...
 $ lat          : num  47.5 47.7 47.7 47.5 47.6 ...
 $ long         : num  -122 -122 -122 -122 -122 ...
 $ sqft_living15: int  1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
 $ sqft_lot15   : int  5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...

The dataset is quite small and manageable, along with having decently formatted headers.

Data Clean-up

The date_string variable needs to be converted into a proper data as well as changing a few variables to factors.

date_string <- substr(df$date,1,8)
df$date <- as.Date(date_string, format = '%Y%m%d')
df$waterfront<- as.factor(df$waterfront)
df$zipcode <- as.factor(df$zipcode)

Exploratory Data Analysis

Data Checking

A quick check shows no missing values.

apply(apply(df,2,is.na),2,sum)
           id          date         price      bedrooms     bathrooms   sqft_living      sqft_lot        floors    waterfront 
            0             0             0             0             0             0             0             0             0 
         view     condition         grade    sqft_above sqft_basement      yr_built  yr_renovated       zipcode           lat 
            0             0             0             0             0             0             0             0             0 
         long sqft_living15    sqft_lot15 
            0             0             0 

Exploratory Plots

I will plot out all the numeric variables using facet_wrap. There is a lot of great information to see in this output. All of this can be cleaned up and dived into further.

df %>%
  keep(is.numeric) %>% 
  gather() %>% 
  ggplot(aes(value)) +
  facet_wrap(~ key, scales = "free") +
  geom_density(fill= 'blue') +
  theme_bw()

Similar plotting for factor variables.

df %>%
  keep(is.factor) %>% 
  gather() %>% 
  ggplot(aes(value)) +
  facet_wrap(~ key, scales = "free") +
  geom_bar(fill = 'blue') + 
  theme_bw()
attributes are not identical across measure variables; they will be dropped

I will now plot the numeric variables against the price dependent variable using the same technique as before.

df %>%
  select(-id) %>%
  keep(is.numeric) %>% 
  gather(key,value,-price) %>%
  ggplot(aes(x=value,y=price)) +
  geom_jitter(color = 'light blue',alpha = .6) +
  geom_smooth(method = 'gam', color= 'dark blue', fill = 'grey', alpha = .2) +
  facet_wrap(~key, scales = 'free') + 
  theme_bw()

Factor variables plotted against price.

df %>%
  select(waterfront,price,zipcode) %>%
  gather(key,value,-price) %>%
  ggplot(aes(x=value,y=price)) +
  geom_boxplot(color= 'dark blue', fill = 'grey', alpha = .2) +
  facet_wrap(~key, scales = 'free') + 
  theme_bw() +
  coord_flip()
attributes are not identical across measure variables; they will be dropped

Outliers

The outlier in the bedroom field seems to be an actual mistake and not a real value. I will replace it with 3 as it seems to be reasonable based on the distributions.

df %>% 
  ggplot(aes(x=bedrooms, y=bathrooms)) +
  geom_count(color = 'dark green')

df[df$bedrooms == 33,'bedrooms'] <- 3

Variable Correlation

To see how our predictors relate I will do a correlation matrix. Not an extreme amount of correlation between the numeric variable. We do see SHOT_DIST with strong negative correlation with the players FG%, typical three point shooters would be expected to have lower accuracy.

df %>%
  select (-date) %>%
  apply(2,as.character) %>%
  apply(2,as.numeric) %>%
  cor(use='complete',method='pearson') %>%
  corrplot(type='lower', diag = F)

Modeling

Feature Creation

The time of the year may have an impact on sale price, a few dummy variables can be created and tested.

df$year_2015 <- as.factor(ifelse(year(df$date) == '2015',1,0))
df$winter <- as.factor(ifelse(month(df$date) %in% c(11,12,1),1,0))
df$summer <- as.factor(ifelse(month(df$date) %in% c(5,6,7),1,0))
df$fall <- as.factor(ifelse(month(df$date) %in% c(8,9,10),1,0))

The time of the renovation is also interesting and can be broken out as well.

df %>%
  ggplot(aes(x=yr_renovated)) +
  geom_area(stat='bin', fill = 'light blue') +
  xlim(1900,2017)

  
df$renovated_t_f <- as.factor(ifelse(df$yr_renovated == 0, 0,1))
df$renovated_b_1950 <- as.factor(ifelse(df$yr_renovated > 0 & df$yr_renovated <= 1950 , 1, 0))
df$renovated_b_1975 <- as.factor(ifelse(df$yr_renovated > 1950 & df$yr_renovated <= 1975 , 1, 0))
df$renovated_b_2000 <- as.factor(ifelse(df$yr_renovated > 1975 & df$yr_renovated <= 2000 , 1, 0))
df$renovated_a_2000 <- as.factor(ifelse(df$yr_renovated > 2000 , 1, 0))

A dummy variable may be more appropriate than the actual basement parameters due to the low occurrence of basements

df$basement_t_f <- as.factor(ifelse(df$sqft_basement == 0, 0, 1))

There is a lot of interesting information using the zip code, for now a few tiers can be created based on their deviation from the average.

df %>%
  group_by(zipcode) %>%
  summarize(count = n(), zip_ave = mean(price)) %>%
  mutate(zip_diff = zip_ave - mean(zip_ave)) %>%
  mutate(zip_difference = ifelse(zip_diff > 0, 'Below Average' , 'Above Average')) %>%
  arrange((zip_diff)) %>%
  #filter(count > 20) %>%
  ggplot(aes( x = factor(zipcode, levels =zipcode), y = zip_diff)) + 
  geom_bar(stat = 'identity', aes(fill=zip_difference), width = .6) + 
  coord_flip() + 
  labs(x = 'Zip Code', y = 'Percentage Point Difference', title = 'Price Averages per Zip Code') + 
  theme_bw()

df$tier1_zip <-ifelse(df$zipcode %in% c(98039),1,0)
df$tier2_zip <-ifelse(df$zipcode %in% c(98004,98040,98112,98102,98109,98105,98006,98119),1,0)
df$tier4_zip <-ifelse(df$zipcode %in% c(98002,98168,98032,98001,98148),1,0)
df$tier3_zip <- 1 - df$tier1_zip - df$tier2_zip - df$tier4_zip

Model Tuning

The data can be split into training and test using the caret package. We will hold out 30% to test our final model on while doing cross-validation on the other 70% to discover the strongest modeling type.

set.seed(57)
train_flag <- createDataPartition(df$price, p = .7, list = FALSE, times = 1)
df_train <- df[train_flag,]
df_test <- df[-train_flag,]

The datasets can be narrowed down to only include the columns to be used as predictors.

df_train<- subset(df_train, select = c(price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view,
                                       condition, grade, sqft_above, basement_t_f, yr_built, lat, long, sqft_living15, sqft_lot15,
                                       year_2015, winter, summer, fall, renovated_b_1950,
                                       renovated_b_1975, renovated_b_2000, renovated_a_2000, tier1_zip, tier2_zip,  tier4_zip ) )
df_test<- subset(df_train, select = c(price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view,
                                       condition, grade, sqft_above, basement_t_f, yr_built, lat, long, sqft_living15, sqft_lot15,
                                       year_2015, winter, summer, fall, renovated_b_1950,
                                       renovated_b_1975, renovated_b_2000, renovated_a_2000, tier1_zip, tier2_zip, tier4_zip ) )

Parameters tuned for linear regrestion and XG Boost Linear models.

fitControl <- trainControl( method = "repeatedcv",
                            number = 5,
                            repeats = 2)
lr_fit <- train(price ~ ., data = as.data.frame(df_train),
                 method = 'lm',
                 trControl = fitControl)
xgb_fit <- train(price ~ ., data = as.data.frame(df_train),
                method = 'xgbLinear',
                trControl = fitControl)

Variable Importance

The grade feature was quite popular for XGB, I would need to research what this field means. It is perhaps the quality of the home which would make sense.

ggplot(varImp(xgb_fit)) + 
  geom_col() + 
  coord_flip()

Model Selection

XG Boost did much better than the linear regression.

summary(resamples(list(LR = lr_fit, XGB = xgb_fit)))

Call:
summary.resamples(object = resamples(list(LR = lr_fit, XGB = xgb_fit)))

Models: LR, XGB 
Number of resamples: 10 

RMSE 
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
LR  169596.1 173920.6 179855.0 181637.9 184371.0 200397.2    0
XGB 109458.8 123821.3 132609.7 132231.7 140281.9 148709.0    0

Rsquared 
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
LR  0.7377668 0.7441515 0.7550750 0.7552494 0.7641244 0.7771353    0
XGB 0.8324944 0.8543750 0.8697135 0.8706166 0.8873433 0.9029150    0

For better predictions, more feature engineering would be needed. Lat/Long data could perhaps be treated better. The predictive power of other model types also can be investigated, though it would be hard to beat XGB. As there is no competition for this dataset, I will chose to end after these basic steps.

LS0tDQp0aXRsZTogIkVzdGltYXRpbmcgSG91c2UgU2FsZXMiDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KDQojT3ZlcnZpZXcNCg0KVGhpcyBkYXRhIGhhcyBiZWVuIHVwbG9hZGVkIHRvIEthZ2dsZSBhbmQgY29udGFpbnMgaG91c2Ugc2FsZSBwcmljZXMgZm9yIEtpbmcgQ291bnR5LCBXQS4gVGhlIGRhdGEgaXMgZm9yIG9uZSB5ZWFyIGJldHdlZW4gTWF5ICcxNCBhbmQgTWF5ICcxNS4gSSdsbCBkbyBzb21lIGJyaWVmIGFuYWx5c2lzIGFuZCBwbG90dGluZyBpbiBSLg0KDQpodHRwczovL3d3dy5rYWdnbGUuY29tL2hhcmxmb3hlbS9ob3VzZXNhbGVzcHJlZGljdGlvbg0KDQojI1BhY2thZ2VzDQoNCmBgYHtyfQ0Kb3B0aW9ucyh3YXJuPS0xKQ0KDQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxpYnJhcnkoY29ycnBsb3QpDQpsaWJyYXJ5KGNhcmV0KQ0KbGlicmFyeShnYm0pDQpsaWJyYXJ5KGUxMDcxKQ0KbGlicmFyeShyYW5kb21Gb3Jlc3QpDQpsaWJyYXJ5KGx1YnJpZGF0ZSkNCmxpYnJhcnkoeGdib29zdCkNCmBgYA0KDQojI0RhdGEgSW5wdXQNCg0KUmVhZCB0aGUgZGF0YSBmcm9tIHRoZSAuY3N2IGZpbGUgaW50byBSLg0KYGBge3J9DQpkZjwtcmVhZC5jc3YoJ2tjX2hvdXNlX2RhdGEuY3N2JykNCg0Kc3RyKGRmKQ0KYGBgDQpUaGUgZGF0YXNldCBpcyBxdWl0ZSBzbWFsbCBhbmQgbWFuYWdlYWJsZSwgYWxvbmcgd2l0aCBoYXZpbmcgZGVjZW50bHkgZm9ybWF0dGVkIGhlYWRlcnMuDQoNCiMjRGF0YSBDbGVhbi11cA0KVGhlIGRhdGVfc3RyaW5nIHZhcmlhYmxlIG5lZWRzIHRvIGJlIGNvbnZlcnRlZCBpbnRvIGEgcHJvcGVyIGRhdGEgYXMgd2VsbCBhcyBjaGFuZ2luZyBhIGZldyB2YXJpYWJsZXMgdG8gZmFjdG9ycy4NCmBgYHtyfQ0KZGF0ZV9zdHJpbmcgPC0gc3Vic3RyKGRmJGRhdGUsMSw4KQ0KZGYkZGF0ZSA8LSBhcy5EYXRlKGRhdGVfc3RyaW5nLCBmb3JtYXQgPSAnJVklbSVkJykNCg0KZGYkd2F0ZXJmcm9udDwtIGFzLmZhY3RvcihkZiR3YXRlcmZyb250KQ0KZGYkemlwY29kZSA8LSBhcy5mYWN0b3IoZGYkemlwY29kZSkNCg0KDQpgYGANCg0KI0V4cGxvcmF0b3J5IERhdGEgQW5hbHlzaXMNCg0KIyNEYXRhIENoZWNraW5nDQpBIHF1aWNrIGNoZWNrIHNob3dzIG5vIG1pc3NpbmcgdmFsdWVzLg0KYGBge3J9DQphcHBseShhcHBseShkZiwyLGlzLm5hKSwyLHN1bSkNCmBgYA0KIyNFeHBsb3JhdG9yeSBQbG90cw0KSSB3aWxsIHBsb3Qgb3V0IGFsbCB0aGUgbnVtZXJpYyB2YXJpYWJsZXMgdXNpbmcgZmFjZXRfd3JhcC4gVGhlcmUgaXMgYSBsb3Qgb2YgZ3JlYXQgaW5mb3JtYXRpb24gdG8gc2VlIGluIHRoaXMgb3V0cHV0LiBBbGwgb2YgdGhpcyBjYW4gYmUgY2xlYW5lZCB1cCBhbmQgZGl2ZWQgaW50byBmdXJ0aGVyLg0KYGBge3J9DQpkZiAlPiUNCiAga2VlcChpcy5udW1lcmljKSAlPiUgDQogIGdhdGhlcigpICU+JSANCiAgZ2dwbG90KGFlcyh2YWx1ZSkpICsNCiAgZmFjZXRfd3JhcCh+IGtleSwgc2NhbGVzID0gImZyZWUiKSArDQogIGdlb21fZGVuc2l0eShmaWxsPSAnYmx1ZScpICsNCiAgdGhlbWVfYncoKQ0KYGBgDQoNClNpbWlsYXIgcGxvdHRpbmcgZm9yIGZhY3RvciB2YXJpYWJsZXMuDQpgYGB7cn0NCmRmICU+JQ0KICBrZWVwKGlzLmZhY3RvcikgJT4lIA0KICBnYXRoZXIoKSAlPiUgDQogIGdncGxvdChhZXModmFsdWUpKSArDQogIGZhY2V0X3dyYXAofiBrZXksIHNjYWxlcyA9ICJmcmVlIikgKw0KICBnZW9tX2JhcihmaWxsID0gJ2JsdWUnKSArIA0KICB0aGVtZV9idygpDQpgYGANCg0KSSB3aWxsIG5vdyBwbG90IHRoZSBudW1lcmljIHZhcmlhYmxlcyBhZ2FpbnN0IHRoZSBwcmljZSBkZXBlbmRlbnQgdmFyaWFibGUgdXNpbmcgdGhlIHNhbWUgdGVjaG5pcXVlIGFzIGJlZm9yZS4NCmBgYHtyfQ0KZGYgJT4lDQogIHNlbGVjdCgtaWQpICU+JQ0KICBrZWVwKGlzLm51bWVyaWMpICU+JSANCiAgZ2F0aGVyKGtleSx2YWx1ZSwtcHJpY2UpICU+JQ0KICBnZ3Bsb3QoYWVzKHg9dmFsdWUseT1wcmljZSkpICsNCiAgZ2VvbV9qaXR0ZXIoY29sb3IgPSAnbGlnaHQgYmx1ZScsYWxwaGEgPSAuNikgKw0KICBnZW9tX3Ntb290aChtZXRob2QgPSAnZ2FtJywgY29sb3I9ICdkYXJrIGJsdWUnLCBmaWxsID0gJ2dyZXknLCBhbHBoYSA9IC4yKSArDQogIGZhY2V0X3dyYXAofmtleSwgc2NhbGVzID0gJ2ZyZWUnKSArIA0KICB0aGVtZV9idygpDQpgYGANCg0KRmFjdG9yIHZhcmlhYmxlcyBwbG90dGVkIGFnYWluc3QgcHJpY2UuDQpgYGB7cn0NCmRmICU+JQ0KICBzZWxlY3Qod2F0ZXJmcm9udCxwcmljZSx6aXBjb2RlKSAlPiUNCiAgZ2F0aGVyKGtleSx2YWx1ZSwtcHJpY2UpICU+JQ0KICBnZ3Bsb3QoYWVzKHg9dmFsdWUseT1wcmljZSkpICsNCiAgZ2VvbV9ib3hwbG90KGNvbG9yPSAnZGFyayBibHVlJywgZmlsbCA9ICdncmV5JywgYWxwaGEgPSAuMikgKw0KICBmYWNldF93cmFwKH5rZXksIHNjYWxlcyA9ICdmcmVlJykgKyANCiAgdGhlbWVfYncoKSArDQogIGNvb3JkX2ZsaXAoKQ0KYGBgDQoNCiMjT3V0bGllcnMNClRoZSBvdXRsaWVyIGluIHRoZSBiZWRyb29tIGZpZWxkIHNlZW1zIHRvIGJlIGFuIGFjdHVhbCBtaXN0YWtlIGFuZCBub3QgYSByZWFsIHZhbHVlLiBJIHdpbGwgcmVwbGFjZSBpdCB3aXRoIDMgYXMgaXQgc2VlbXMgdG8gYmUgcmVhc29uYWJsZSBiYXNlZCBvbiB0aGUgZGlzdHJpYnV0aW9ucy4NCmBgYHtyfQ0KZGYgJT4lIA0KICBnZ3Bsb3QoYWVzKHg9YmVkcm9vbXMsIHk9YmF0aHJvb21zKSkgKw0KICBnZW9tX2NvdW50KGNvbG9yID0gJ2RhcmsgZ3JlZW4nKQ0KDQoNCmRmW2RmJGJlZHJvb21zID09IDMzLCdiZWRyb29tcyddIDwtIDMNCg0KDQpgYGANCg0KIyNWYXJpYWJsZSBDb3JyZWxhdGlvbiAgDQoNClRvIHNlZSBob3cgb3VyIHByZWRpY3RvcnMgcmVsYXRlIEkgd2lsbCBkbyBhIGNvcnJlbGF0aW9uIG1hdHJpeC4gTm90IGFuIGV4dHJlbWUgYW1vdW50IG9mIGNvcnJlbGF0aW9uIGJldHdlZW4gdGhlIG51bWVyaWMgdmFyaWFibGUuIFdlIGRvIHNlZSBTSE9UX0RJU1Qgd2l0aCBzdHJvbmcgbmVnYXRpdmUgY29ycmVsYXRpb24gd2l0aCB0aGUgcGxheWVycyBGRyUsIHR5cGljYWwgdGhyZWUgcG9pbnQgc2hvb3RlcnMgd291bGQgYmUgZXhwZWN0ZWQgdG8gaGF2ZSBsb3dlciBhY2N1cmFjeS4NCmBgYHtyfQ0KZGYgJT4lDQogIHNlbGVjdCAoLWRhdGUpICU+JQ0KICBhcHBseSgyLGFzLmNoYXJhY3RlcikgJT4lDQogIGFwcGx5KDIsYXMubnVtZXJpYykgJT4lDQogIGNvcih1c2U9J2NvbXBsZXRlJyxtZXRob2Q9J3BlYXJzb24nKSAlPiUNCiAgY29ycnBsb3QodHlwZT0nbG93ZXInLCBkaWFnID0gRikNCmBgYA0KI01vZGVsaW5nDQojI0ZlYXR1cmUgQ3JlYXRpb24NClRoZSB0aW1lIG9mIHRoZSB5ZWFyIG1heSBoYXZlIGFuIGltcGFjdCBvbiBzYWxlIHByaWNlLCBhIGZldyBkdW1teSB2YXJpYWJsZXMgY2FuIGJlIGNyZWF0ZWQgYW5kIHRlc3RlZC4NCmBgYHtyfQ0KZGYkeWVhcl8yMDE1IDwtIGFzLmZhY3RvcihpZmVsc2UoeWVhcihkZiRkYXRlKSA9PSAnMjAxNScsMSwwKSkNCmRmJHdpbnRlciA8LSBhcy5mYWN0b3IoaWZlbHNlKG1vbnRoKGRmJGRhdGUpICVpbiUgYygxMSwxMiwxKSwxLDApKQ0KZGYkc3VtbWVyIDwtIGFzLmZhY3RvcihpZmVsc2UobW9udGgoZGYkZGF0ZSkgJWluJSBjKDUsNiw3KSwxLDApKQ0KZGYkZmFsbCA8LSBhcy5mYWN0b3IoaWZlbHNlKG1vbnRoKGRmJGRhdGUpICVpbiUgYyg4LDksMTApLDEsMCkpDQpgYGANClRoZSB0aW1lIG9mIHRoZSByZW5vdmF0aW9uIGlzIGFsc28gaW50ZXJlc3RpbmcgYW5kIGNhbiBiZSBicm9rZW4gb3V0IGFzIHdlbGwuDQpgYGB7cn0NCmRmICU+JQ0KICBnZ3Bsb3QoYWVzKHg9eXJfcmVub3ZhdGVkKSkgKw0KICBnZW9tX2FyZWEoc3RhdD0nYmluJywgZmlsbCA9ICdsaWdodCBibHVlJykgKw0KICB4bGltKDE5MDAsMjAxNykNCiAgDQpkZiRyZW5vdmF0ZWRfdF9mIDwtIGFzLmZhY3RvcihpZmVsc2UoZGYkeXJfcmVub3ZhdGVkID09IDAsIDAsMSkpDQpkZiRyZW5vdmF0ZWRfYl8xOTUwIDwtIGFzLmZhY3RvcihpZmVsc2UoZGYkeXJfcmVub3ZhdGVkID4gMCAmIGRmJHlyX3Jlbm92YXRlZCA8PSAxOTUwICwgMSwgMCkpDQpkZiRyZW5vdmF0ZWRfYl8xOTc1IDwtIGFzLmZhY3RvcihpZmVsc2UoZGYkeXJfcmVub3ZhdGVkID4gMTk1MCAmIGRmJHlyX3Jlbm92YXRlZCA8PSAxOTc1ICwgMSwgMCkpDQpkZiRyZW5vdmF0ZWRfYl8yMDAwIDwtIGFzLmZhY3RvcihpZmVsc2UoZGYkeXJfcmVub3ZhdGVkID4gMTk3NSAmIGRmJHlyX3Jlbm92YXRlZCA8PSAyMDAwICwgMSwgMCkpDQpkZiRyZW5vdmF0ZWRfYV8yMDAwIDwtIGFzLmZhY3RvcihpZmVsc2UoZGYkeXJfcmVub3ZhdGVkID4gMjAwMCAsIDEsIDApKQ0KYGBgDQpBIGR1bW15IHZhcmlhYmxlIG1heSBiZSBtb3JlIGFwcHJvcHJpYXRlIHRoYW4gdGhlIGFjdHVhbCBiYXNlbWVudCBwYXJhbWV0ZXJzIGR1ZSB0byB0aGUgbG93IG9jY3VycmVuY2Ugb2YgYmFzZW1lbnRzDQpgYGB7cn0NCmRmJGJhc2VtZW50X3RfZiA8LSBhcy5mYWN0b3IoaWZlbHNlKGRmJHNxZnRfYmFzZW1lbnQgPT0gMCwgMCwgMSkpDQpgYGANClRoZXJlIGlzIGEgbG90IG9mIGludGVyZXN0aW5nIGluZm9ybWF0aW9uIHVzaW5nIHRoZSB6aXAgY29kZSwgZm9yIG5vdyBhIGZldyB0aWVycyBjYW4gYmUgY3JlYXRlZCBiYXNlZCBvbiB0aGVpciBkZXZpYXRpb24gZnJvbSB0aGUgYXZlcmFnZS4NCmBgYHtyfQ0KZGYgJT4lDQogIGdyb3VwX2J5KHppcGNvZGUpICU+JQ0KICBzdW1tYXJpemUoY291bnQgPSBuKCksIHppcF9hdmUgPSBtZWFuKHByaWNlKSkgJT4lDQogIG11dGF0ZSh6aXBfZGlmZiA9IHppcF9hdmUgLSBtZWFuKHppcF9hdmUpKSAlPiUNCiAgbXV0YXRlKHppcF9kaWZmZXJlbmNlID0gaWZlbHNlKHppcF9kaWZmID4gMCwgJ0JlbG93IEF2ZXJhZ2UnICwgJ0Fib3ZlIEF2ZXJhZ2UnKSkgJT4lDQogIGFycmFuZ2UoKHppcF9kaWZmKSkgJT4lDQogICNmaWx0ZXIoY291bnQgPiAyMCkgJT4lDQogIGdncGxvdChhZXMoIHggPSBmYWN0b3IoemlwY29kZSwgbGV2ZWxzID16aXBjb2RlKSwgeSA9IHppcF9kaWZmKSkgKyANCiAgZ2VvbV9iYXIoc3RhdCA9ICdpZGVudGl0eScsIGFlcyhmaWxsPXppcF9kaWZmZXJlbmNlKSwgd2lkdGggPSAuNikgKyANCiAgY29vcmRfZmxpcCgpICsgDQogIGxhYnMoeCA9ICdaaXAgQ29kZScsIHkgPSAnUGVyY2VudGFnZSBQb2ludCBEaWZmZXJlbmNlJywgdGl0bGUgPSAnUHJpY2UgQXZlcmFnZXMgcGVyIFppcCBDb2RlJykgKyANCiAgdGhlbWVfYncoKQ0KDQpkZiR0aWVyMV96aXAgPC1pZmVsc2UoZGYkemlwY29kZSAlaW4lIGMoOTgwMzkpLDEsMCkNCmRmJHRpZXIyX3ppcCA8LWlmZWxzZShkZiR6aXBjb2RlICVpbiUgYyg5ODAwNCw5ODA0MCw5ODExMiw5ODEwMiw5ODEwOSw5ODEwNSw5ODAwNiw5ODExOSksMSwwKQ0KZGYkdGllcjRfemlwIDwtaWZlbHNlKGRmJHppcGNvZGUgJWluJSBjKDk4MDAyLDk4MTY4LDk4MDMyLDk4MDAxLDk4MTQ4KSwxLDApDQpkZiR0aWVyM196aXAgPC0gMSAtIGRmJHRpZXIxX3ppcCAtIGRmJHRpZXIyX3ppcCAtIGRmJHRpZXI0X3ppcA0KYGBgDQoNCiMjTW9kZWwgVHVuaW5nDQpUaGUgZGF0YSBjYW4gYmUgc3BsaXQgaW50byB0cmFpbmluZyBhbmQgdGVzdCB1c2luZyB0aGUgY2FyZXQgcGFja2FnZS4gV2Ugd2lsbCBob2xkIG91dCAzMCUgdG8gdGVzdCBvdXIgZmluYWwgbW9kZWwgb24gd2hpbGUgZG9pbmcgY3Jvc3MtdmFsaWRhdGlvbiBvbiB0aGUgb3RoZXIgNzAlIHRvIGRpc2NvdmVyIHRoZSBzdHJvbmdlc3QgbW9kZWxpbmcgdHlwZS4NCmBgYHtyfQ0Kc2V0LnNlZWQoNTcpDQp0cmFpbl9mbGFnIDwtIGNyZWF0ZURhdGFQYXJ0aXRpb24oZGYkcHJpY2UsIHAgPSAuNywgbGlzdCA9IEZBTFNFLCB0aW1lcyA9IDEpDQpkZl90cmFpbiA8LSBkZlt0cmFpbl9mbGFnLF0NCmRmX3Rlc3QgPC0gZGZbLXRyYWluX2ZsYWcsXQ0KYGBgDQpUaGUgZGF0YXNldHMgY2FuIGJlIG5hcnJvd2VkIGRvd24gdG8gb25seSBpbmNsdWRlIHRoZSBjb2x1bW5zIHRvIGJlIHVzZWQgYXMgcHJlZGljdG9ycy4NCmBgYHtyfQ0KZGZfdHJhaW48LSBzdWJzZXQoZGZfdHJhaW4sIHNlbGVjdCA9IGMocHJpY2UsIGJlZHJvb21zLCBiYXRocm9vbXMsIHNxZnRfbGl2aW5nLCBzcWZ0X2xvdCwgZmxvb3JzLCB3YXRlcmZyb250LCB2aWV3LA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgY29uZGl0aW9uLCBncmFkZSwgc3FmdF9hYm92ZSwgYmFzZW1lbnRfdF9mLCB5cl9idWlsdCwgbGF0LCBsb25nLCBzcWZ0X2xpdmluZzE1LCBzcWZ0X2xvdDE1LA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgeWVhcl8yMDE1LCB3aW50ZXIsIHN1bW1lciwgZmFsbCwgcmVub3ZhdGVkX2JfMTk1MCwNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHJlbm92YXRlZF9iXzE5NzUsIHJlbm92YXRlZF9iXzIwMDAsIHJlbm92YXRlZF9hXzIwMDAsIHRpZXIxX3ppcCwgdGllcjJfemlwLCAgdGllcjRfemlwICkgKQ0KDQpkZl90ZXN0PC0gc3Vic2V0KGRmX3RyYWluLCBzZWxlY3QgPSBjKHByaWNlLCBiZWRyb29tcywgYmF0aHJvb21zLCBzcWZ0X2xpdmluZywgc3FmdF9sb3QsIGZsb29ycywgd2F0ZXJmcm9udCwgdmlldywNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGNvbmRpdGlvbiwgZ3JhZGUsIHNxZnRfYWJvdmUsIGJhc2VtZW50X3RfZiwgeXJfYnVpbHQsIGxhdCwgbG9uZywgc3FmdF9saXZpbmcxNSwgc3FmdF9sb3QxNSwNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHllYXJfMjAxNSwgd2ludGVyLCBzdW1tZXIsIGZhbGwsIHJlbm92YXRlZF9iXzE5NTAsDQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICByZW5vdmF0ZWRfYl8xOTc1LCByZW5vdmF0ZWRfYl8yMDAwLCByZW5vdmF0ZWRfYV8yMDAwLCB0aWVyMV96aXAsIHRpZXIyX3ppcCwgdGllcjRfemlwICkgKQ0KYGBgDQpQYXJhbWV0ZXJzIHR1bmVkIGZvciBsaW5lYXIgcmVncmVzdGlvbiBhbmQgWEcgQm9vc3QgTGluZWFyIG1vZGVscy4NCmBgYHtyfQ0KZml0Q29udHJvbCA8LSB0cmFpbkNvbnRyb2woIG1ldGhvZCA9ICJyZXBlYXRlZGN2IiwNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICBudW1iZXIgPSA1LA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgIHJlcGVhdHMgPSAyKQ0KDQpscl9maXQgPC0gdHJhaW4ocHJpY2UgfiAuLCBkYXRhID0gYXMuZGF0YS5mcmFtZShkZl90cmFpbiksDQogICAgICAgICAgICAgICAgIG1ldGhvZCA9ICdsbScsDQogICAgICAgICAgICAgICAgIHRyQ29udHJvbCA9IGZpdENvbnRyb2wpDQoNCnhnYl9maXQgPC0gdHJhaW4ocHJpY2UgfiAuLCBkYXRhID0gYXMuZGF0YS5mcmFtZShkZl90cmFpbiksDQogICAgICAgICAgICAgICAgbWV0aG9kID0gJ3hnYkxpbmVhcicsDQogICAgICAgICAgICAgICAgdHJDb250cm9sID0gZml0Q29udHJvbCkNCmBgYA0KIyNWYXJpYWJsZSBJbXBvcnRhbmNlDQoNClRoZSBncmFkZSBmZWF0dXJlIHdhcyBxdWl0ZSBwb3B1bGFyIGZvciBYR0IsIEkgd291bGQgbmVlZCB0byByZXNlYXJjaCB3aGF0IHRoaXMgZmllbGQgbWVhbnMuIEl0IGlzIHBlcmhhcHMgdGhlIHF1YWxpdHkgb2YgdGhlIGhvbWUgd2hpY2ggd291bGQgbWFrZSBzZW5zZS4NCmBgYHtyfQ0KZ2dwbG90KHZhckltcCh4Z2JfZml0KSkgKyANCiAgZ2VvbV9jb2woKSArIA0KICBjb29yZF9mbGlwKCkNCmBgYA0KDQojI01vZGVsIFNlbGVjdGlvbg0KWEcgQm9vc3QgZGlkIG11Y2ggYmV0dGVyIHRoYW4gdGhlIGxpbmVhciByZWdyZXNzaW9uLg0KYGBge3J9DQpzdW1tYXJ5KHJlc2FtcGxlcyhsaXN0KExSID0gbHJfZml0LCBYR0IgPSB4Z2JfZml0KSkpDQpgYGANCg0KRm9yIGJldHRlciBwcmVkaWN0aW9ucywgbW9yZSBmZWF0dXJlIGVuZ2luZWVyaW5nIHdvdWxkIGJlIG5lZWRlZC4gTGF0L0xvbmcgZGF0YSBjb3VsZCBwZXJoYXBzIGJlIHRyZWF0ZWQgYmV0dGVyLiBUaGUgcHJlZGljdGl2ZSBwb3dlciBvZiBvdGhlciBtb2RlbCB0eXBlcyBhbHNvIGNhbiBiZSBpbnZlc3RpZ2F0ZWQsIHRob3VnaCBpdCB3b3VsZCBiZSBoYXJkIHRvIGJlYXQgWEdCLiBBcyB0aGVyZSBpcyBubyBjb21wZXRpdGlvbiBmb3IgdGhpcyBkYXRhc2V0LCBJIHdpbGwgY2hvc2UgdG8gZW5kIGFmdGVyIHRoZXNlIGJhc2ljIHN0ZXBzLg0K