---
title: "GooglePlayStore-Analysis"
author: "Khawla-BanyDOmi"
date: "29/12/2020"
output:
  pdf_document: default
  html_document: default
---

# 1. Loading Data

gg = read.csv("googleplaystore.csv")
review = read.csv("googleplaystore_user_reviews.csv")
library(e1071)
library(tidyverse)
review1 = review %>% select(App, Translated_Review)
head(review1)
knitr::kable(head(review1))

| App | Translated_Review |
|:----|:------------------|
| 10 Best Foods for You | I like eat delicious food. That’s I’m cooking food myself, case “10 Best Foods” helps lot, also “Best Before (Shelf Life)” |
| 10 Best Foods for You | This help eating healthy exercise regular basis |
| 10 Best Foods for You | nan |
| 10 Best Foods for You | Works great especially going grocery store |
| 10 Best Foods for You | Best idea us |
| 10 Best Foods for You | Best way |

head(review)
head(gg)

# 2. Data Preprocessing

str(gg)
'data.frame':   10841 obs. of  13 variables:
 $ App           : Factor w/ 9660 levels "- Free Comics - Comic Apps",..: 7206 2551 8970 8089 7272 7103 8149 5568 4926 5806 ...
 $ Category      : Factor w/ 34 levels "1.9","ART_AND_DESIGN",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Rating        : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
 $ Reviews       : Factor w/ 6002 levels "0","1","10","100",..: 1183 5924 5681 1947 5924 1310 1464 3385 816 485 ...
 $ Size          : Factor w/ 462 levels "1,000+","1.0M",..: 55 30 368 102 64 222 55 118 146 120 ...
 $ Installs      : Factor w/ 22 levels "0","0+","1,000,000,000+",..: 8 20 13 16 11 17 17 4 4 8 ...
 $ Type          : Factor w/ 4 levels "0","Free","NaN",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Price         : Factor w/ 93 levels "$0.99","$1.00",..: 92 92 92 92 92 92 92 92 92 92 ...
 $ Content.Rating: Factor w/ 7 levels "","Adults only 18+",..: 3 3 3 6 3 3 3 3 3 3 ...
 $ Genres        : Factor w/ 120 levels "Action","Action;Action & Adventure",..: 10 13 10 10 12 10 10 10 10 12 ...
 $ Last.Updated  : Factor w/ 1378 levels "1.0.19","April 1, 2016",..: 562 482 117 825 757 901 76 726 1317 670 ...
 $ Current.Ver   : Factor w/ 2834 levels "","0.0.0.2","0.0.1",..: 122 1020 468 2827 280 116 280 2393 1457 1431 ...
 $ Android.Ver   : Factor w/ 35 levels "","1.0 and up",..: 17 17 17 20 22 10 17 20 12 17 ...

Many variables that are really numeric (for example Reviews, Size, Installs and Price) were read in as factors and need to be converted.
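One pitfall of this conversion is worth a quick illustration: calling `as.numeric()` directly on a factor returns the internal level codes, not the numbers spelled out by the labels. The sketch below uses a made-up vector (`reviews_f` is hypothetical and not part of the dataset) to show the difference.

```r
# Hypothetical factor holding review counts as text labels
reviews_f <- factor(c("159", "967", "87510", "967"))

# Direct conversion yields the level codes (positions in the sorted level set)
codes <- as.numeric(reviews_f)

# Converting through character first recovers the actual numbers
counts <- as.numeric(as.character(reviews_f))

print(codes)
print(counts)
```

Going through `as.character()` is the usual way to get the values that the labels represent.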

## 2.1 Converting variable types

library(lubridate)
library(tidyverse)
library(dplyr)
gg.new <- gg %>%
  mutate(
    # Eliminate "+" to transform Installs to numeric variable
   # Installs = gsub("\\+", "", as.character(Installs)),
   # Installs = as.numeric(gsub(",", "", Installs)),
    # Eliminate "M" to transform Size to numeric variable
    Size = gsub("M", "", Size),
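    # For cells with k, divide by 1024 (1024 kB = 1 MB), so that Size is in MB throughout
    Size = ifelse(grepl("k", Size),as.numeric(gsub("k", "", Size))/1024, as.numeric(Size)),
    # Transform reviews to numeric; convert via character first, because
    # as.numeric() on a factor returns the level codes rather than the counts
    Reviews = as.numeric(as.character(Reviews)),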
    # Remove "$" from Price to transform it to numeric
    Price = as.numeric(gsub("\\$", "", as.character(Price))),
    # Convert Last Updated to date format
    Last.Updated = mdy(Last.Updated),
    # Replace "Varies with device" with NA since the version is unknown
    Min.Android.Ver = gsub("Varies with device", NA, Android.Ver),
    # Keep only the version number to 1 decimal place, as it is the most representative
    Min.Android.Ver = as.numeric(substr(Min.Android.Ver, start = 1, stop = 3)),
    # Drop old Android version column
    Android.Ver = NULL
  ) %>% 
  filter(
    # Two apps had type as 0 or NA, they will be removed 
    Type %in% c("Free", "Paid")
 )
Problem with `mutate()` input `Size`.
ℹ NAs introduced by coercion
ℹ Input `Size` is `ifelse(...)`.
Problem with `mutate()` input `Price`.
ℹ NAs introduced by coercion
ℹ Input `Price` is `as.numeric(gsub("\\$", "", as.character(Price)))`.
Problem with `mutate()` input `Last.Updated`.
ℹ 1 failed to parse.
ℹ Input `Last.Updated` is `mdy(Last.Updated)`.
str(gg.new)
'data.frame':   10839 obs. of  13 variables:
 $ App            : Factor w/ 9660 levels "- Free Comics - Comic Apps",..: 7206 2551 8970 8089 7272 7103 8149 5568 4926 5806 ...
 $ Category       : Factor w/ 34 levels "1.9","ART_AND_DESIGN",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Rating         : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
 $ Reviews        : num  1183 5924 5681 1947 5924 ...
 $ Size           : num  19 14 8.7 25 2.8 5.6 19 29 33 3.1 ...
 $ Installs       : Factor w/ 22 levels "0","0+","1,000,000,000+",..: 8 20 13 16 11 17 17 4 4 8 ...
 $ Type           : Factor w/ 4 levels "0","Free","NaN",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Price          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Content.Rating : Factor w/ 7 levels "","Adults only 18+",..: 3 3 3 6 3 3 3 3 3 3 ...
 $ Genres         : Factor w/ 120 levels "Action","Action;Action & Adventure",..: 10 13 10 10 12 10 10 10 10 12 ...
 $ Last.Updated   : Date, format: "2018-01-07" "2018-01-15" ...
 $ Current.Ver    : Factor w/ 2834 levels "","0.0.0.2","0.0.1",..: 122 1020 468 2827 280 116 280 2393 1457 1431 ...
 $ Min.Android.Ver: num  4 4 4 4.2 4.4 2.3 4 4.2 3 4 ...
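The Size conversion can also be wrapped in a small helper that handles the `M` and `k` suffixes explicitly and returns `NA` for anything else, such as "Varies with device". This is only a sketch: `size_to_mb` is a hypothetical name, and the input labels are made up.

```r
# Hypothetical helper: convert Play Store size labels to megabytes
size_to_mb <- function(size) {
  size <- as.character(size)
  # Strip a trailing unit letter; labels that are not numbers become NA
  num <- suppressWarnings(as.numeric(sub("[Mk]$", "", size)))
  # kB values are divided by 1024; labels without a known unit stay NA
  ifelse(grepl("k$", size), num / 1024,
         ifelse(grepl("M$", size), num, NA_real_))
}

sizes <- c("19M", "8.7M", "512k", "Varies with device")
size_to_mb(sizes)
```

Handling the unit explicitly keeps the coercion warnings out of the log and makes it clear which labels end up as missing values.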
options(scipen=999)
table(gg.new$Installs)

             0             0+ 1,000,000,000+     1,000,000+         1,000+ 
             0             14             58           1579            907 
            1+    10,000,000+        10,000+            10+   100,000,000+ 
            67           1252           1054            386            409 
      100,000+           100+     5,000,000+         5,000+             5+ 
          1169            719            752            477             82 
   50,000,000+        50,000+            50+   500,000,000+       500,000+ 
           289            479            205             72            539 
          500+           Free 
           330              0 
gg.new$Installs%>%str()
 Factor w/ 22 levels "0","0+","1,000,000,000+",..: 8 20 13 16 11 17 17 4 4 8 ...
gg.new %>% filter(Installs == "500,000+") %>% print
library(highcharter)
gg.new %>% select(-Min.Android.Ver) %>% 
  # Count the missing values in every column (funs() is deprecated in current dplyr)
  summarise_all(~ sum(is.na(.))) %>%
  gather() %>%
  # Only show columns with NA
  filter(value > 0) %>%
  arrange(-value) %>%
  hchart('column', hcaes(x = 'key', y = 'value', color = 'key')) %>%
  hc_add_theme(hc_theme_elementary()) %>%
  hc_title(text = "Columns with Missing Values")

Boxplot of ratings for the different install categories

ggplot(data = gg.new) +
  geom_boxplot(aes(x = reorder(Installs.cat, -Rating), y = Rating)) + 
  labs(x = "Installment Categories",y = "Rating")
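The boxplot relies on a three-level `Installs.cat` column. One way such a column could be derived from labels like "10,000+" is sketched below. The helper name `install_category` and the cutoffs (below 10,000 is low, below 1,000,000 is medium, otherwise high) are assumptions made for illustration.

```r
# Hypothetical helper: bin install labels such as "10,000+" into three groups
install_category <- function(installs) {
  # Strip "+" and "," so that "10,000+" becomes 10000; other text becomes NA
  n <- suppressWarnings(as.numeric(gsub("[+,]", "", as.character(installs))))
  # Assumed cutoffs: below 10,000 = low, below 1,000,000 = medium, else high
  cut(n, breaks = c(-Inf, 1e4, 1e6, Inf), right = FALSE,
      labels = c("low", "medium", "high"))
}

install_category(c("0+", "5,000+", "10,000+", "500,000+", "1,000,000+", "Free"))
```

Unlike a manual collapse of factor levels, this version orders the levels as low, medium, high and turns stray labels such as "Free" into `NA`.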

## 2.3 Delete duplicated rows

# number of observations before deleting duplicated rows
(original_num_rows = nrow(gg.new))
[1] 10839
gg.new.uniq = gg.new %>% distinct
# number of rows after deleting duplicated rows
(uniq_num_rows = nrow(gg.new.uniq))
[1] 10356
# number of duplicated rows
(dup_rows = original_num_rows - uniq_num_rows)
[1] 483
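Note that `distinct` only drops rows that are identical in every column. Since `App` has 9660 levels while 10356 rows remain, some app names presumably still occur more than once with differing values. A base R sketch of the same row-level check, on a made-up data frame:

```r
# Toy data frame with one fully duplicated row (values are made up)
apps <- data.frame(
  App    = c("A", "B", "A", "C"),
  Rating = c(4.1, 3.9, 4.1, 4.7),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of an earlier row
n_dup <- sum(duplicated(apps))

# Dropping the flagged rows is the base R counterpart of dplyr::distinct()
apps_uniq <- apps[!duplicated(apps), ]

n_dup
nrow(apps_uniq)
```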

## 2.4 Merge Category into 7 groups

# gg.new.uniq %>% filter (!is.na(Category)) %>% print
levels(gg.new.uniq$Category)
 [1] "1.9"                 "ART_AND_DESIGN"      "AUTO_AND_VEHICLES"  
 [4] "BEAUTY"              "BOOKS_AND_REFERENCE" "BUSINESS"           
 [7] "COMICS"              "COMMUNICATION"       "DATING"             
[10] "EDUCATION"           "ENTERTAINMENT"       "EVENTS"             
[13] "FAMILY"              "FINANCE"             "FOOD_AND_DRINK"     
[16] "GAME"                "HEALTH_AND_FITNESS"  "HOUSE_AND_HOME"     
[19] "LIBRARIES_AND_DEMO"  "LIFESTYLE"           "MAPS_AND_NAVIGATION"
[22] "MEDICAL"             "NEWS_AND_MAGAZINES"  "PARENTING"          
[25] "PERSONALIZATION"     "PHOTOGRAPHY"         "PRODUCTIVITY"       
[28] "SHOPPING"            "SOCIAL"              "SPORTS"             
[31] "TOOLS"               "TRAVEL_AND_LOCAL"    "VIDEO_PLAYERS"      
[34] "WEATHER"            
# "1.9" is a stray category level; the filter removes any such rows, but the
# level itself remains in the factor, so it is still assigned to a group below
mydata1 = gg.new.uniq %>%
  filter(Category != "1.9") %>%
  mutate(Cat.cat = fct_collapse(Category,
    Education = c("EDUCATION", "BOOKS_AND_REFERENCE", "LIBRARIES_AND_DEMO", "ART_AND_DESIGN"),
    Personalization = c("PERSONALIZATION", "BEAUTY", "SHOPPING", "DATING", "PHOTOGRAPHY"),
    Lifestyle = c("HEALTH_AND_FITNESS", "MEDICAL", "LIFESTYLE", "SPORTS", "FOOD_AND_DRINK"),
    Family = c("FAMILY", "PARENTING", "HOUSE_AND_HOME", "1.9"),
    Entertainment = c("ENTERTAINMENT", "GAME", "COMICS", "VIDEO_PLAYERS"),
    Business = c("BUSINESS", "FINANCE", "PRODUCTIVITY", "TOOLS", "NEWS_AND_MAGAZINES", "EVENTS", "SOCIAL", "COMMUNICATION"),
    Travel = c("MAPS_AND_NAVIGATION", "AUTO_AND_VEHICLES", "TRAVEL_AND_LOCAL", "WEATHER")))
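The same kind of merge can be expressed as a named lookup vector, which makes it easy to spot categories that were not assigned to any group. The sketch below is illustrative only: `group_of` is a hypothetical name and contains just a few of the mappings.

```r
# Hypothetical lookup from raw category to merged group (only a few entries)
group_of <- c(
  EDUCATION = "Education",
  BOOKS_AND_REFERENCE = "Education",
  GAME = "Entertainment",
  COMICS = "Entertainment",
  WEATHER = "Travel"
)

category <- c("GAME", "WEATHER", "EDUCATION", "COMICS", "DATING")

# Index the lookup by name; categories missing from it come back as NA
merged <- unname(group_of[category])
table(merged, useNA = "ifany")
```

Here "DATING" is deliberately left out of the lookup, so it shows up as `NA` in the table.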
mydata2 = mydata1 %>% mutate(Interval = difftime(time1 = today(), time2 = Last.Updated))
str(mydata2)
'data.frame':   10356 obs. of  16 variables:
 $ App            : Factor w/ 9660 levels "- Free Comics - Comic Apps",..: 7206 2551 8970 8089 7272 7103 8149 5568 4926 5806 ...
 $ Category       : Factor w/ 34 levels "1.9","ART_AND_DESIGN",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Rating         : num  4.1 3.9 4.7 4.5 4.3 4.4 3.8 4.1 4.4 4.7 ...
 $ Reviews        : num  1183 5924 5681 1947 5924 ...
 $ Size           : num  19 14 8.7 25 2.8 5.6 19 29 33 3.1 ...
 $ Installs       : Factor w/ 22 levels "0","0+","1,000,000,000+",..: 8 20 13 16 11 17 17 4 4 8 ...
 $ Type           : Factor w/ 4 levels "0","Free","NaN",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Price          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Content.Rating : Factor w/ 7 levels "","Adults only 18+",..: 3 3 3 6 3 3 3 3 3 3 ...
 $ Genres         : Factor w/ 120 levels "Action","Action;Action & Adventure",..: 10 13 10 10 12 10 10 10 10 12 ...
 $ Last.Updated   : Date, format: "2018-01-07" "2018-01-15" ...
 $ Current.Ver    : Factor w/ 2834 levels "","0.0.0.2","0.0.1",..: 122 1020 468 2827 280 116 280 2393 1457 1431 ...
 $ Min.Android.Ver: num  4 4 4 4.2 4.4 2.3 4 4.2 3 4 ...
 $ Installs.cat   : Factor w/ 3 levels "low","high","medium": 3 3 2 2 3 3 3 2 2 3 ...
 $ Cat.cat        : Factor w/ 7 levels "Family","Education",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Interval       : 'difftime' num  1089 1081 883 937 ...
  ..- attr(*, "units")= chr "days"
mydata2 %>% filter(Installs.cat == "low") %>% print

Impute missing values

#missForest
library(missForest)
#impute missing values; maxiter and ntree are set below their defaults (10 and 100)
gg.new.imp <- missForest(data.matrix(mydata2), maxiter = 5, ntree = 10)
  missForest iteration 1 in progress...done!
  missForest iteration 2 in progress...done!
  missForest iteration 3 in progress...done!
  missForest iteration 4 in progress...done!
  missForest iteration 5 in progress...done!
#check imputed values
# gg.new.imp$ximp
#check imputation error
gg.new.imp$OOBerror
      NRMSE 
0.001104393 
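Because `missForest` is given `data.matrix(mydata2)`, every factor column reaches it as integer level codes and is treated as a numeric variable. The sketch below shows what that conversion does on a made-up data frame (`toy` is hypothetical).

```r
# Toy data frame with a factor column and a missing rating (values are made up)
toy <- data.frame(
  Type   = factor(c("Free", "Paid", "Free")),
  Rating = c(4.1, NA, 4.7)
)

# data.matrix() replaces each factor by its integer level codes
m <- data.matrix(toy)
m
```

This is also why app names have to be recovered by row number after imputation: in the matrix they are only codes.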

Get the sentiment score

# install.packages("stringr")
# install.packages("tidytext")
library(stringr)
library(tidytext)
# read in user reviews
user_review = read.csv("googleplaystore_user_reviews.csv")
str(user_review)
'data.frame':   64295 obs. of  5 variables:
 $ App                   : Factor w/ 1074 levels "10 Best Foods for You",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Translated_Review     : Factor w/ 27996 levels "","___ ___ ___ ___ ___ 0",..: 9279 23853 17229 27355 2076 2168 1032 17229 15968 13280 ...
 $ Sentiment             : Factor w/ 4 levels "nan","Negative",..: 4 4 1 4 4 4 4 1 3 3 ...
 $ Sentiment_Polarity    : num  1 0.25 NaN 0.4 1 1 0.6 NaN 0 0 ...
 $ Sentiment_Subjectivity: num  0.533 0.288 NaN 0.875 0.3 ...
user_review %>% print
head(user_review)
# get sentiment data frame
sents = get_sentiments("afinn") %>% print
# recent versions of the AFINN lexicon name the score column `value`;
# `sents$score` does not exist there and returns NULL
range(sents$value)
# left join the sentiment chart and the user reviews to get score
t1 = user_review %>% mutate(review = as.character(Translated_Review)) %>% unnest_tokens(word, review)
# t2 = user_review[1:500, ]
# sum the lexicon's `value` column within each app, then average over all words
user_score = left_join(t1, sents) %>% group_by(App) %>% summarise(n = n(), score = sum(value, na.rm = TRUE)) %>% mutate(avg.score = score / n) %>% print
Joining, by = "word"
`summarise()` ungrouping output (override with `.groups` argument)
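The scoring step (tokenise each review, look every word up in the lexicon, then aggregate per app) can be sketched in base R on toy data. The lexicon entries and reviews below are made up; the real AFINN lexicon is much larger.

```r
# Toy lexicon and reviews; the words and scores are made up for illustration
lexicon <- data.frame(word  = c("great", "best", "bad"),
                      value = c(3, 3, -3),
                      stringsAsFactors = FALSE)
reviews <- data.frame(App  = c("A", "A", "B"),
                      text = c("Works great", "Best idea", "Bad bad app"),
                      stringsAsFactors = FALSE)

# Tokenise: lower-case, then split on anything that is not a letter
tokens <- strsplit(tolower(reviews$text), "[^a-z]+")
words <- data.frame(App  = rep(reviews$App, lengths(tokens)),
                    word = unlist(tokens),
                    stringsAsFactors = FALSE)

# Left join on the lexicon; words without a score get NA
scored <- merge(words, lexicon, by = "word", all.x = TRUE)

# Per app: number of words, summed score, and average score per word
n_words <- tapply(scored$value, scored$App, length)
total   <- tapply(scored$value, scored$App, sum, na.rm = TRUE)
user_score_toy <- data.frame(App = names(n_words),
                             n = as.vector(n_words),
                             score = as.vector(total),
                             avg.score = as.vector(total / n_words))
user_score_toy
```

Dividing by the number of all words, rather than only the scored ones, means that long reviews with few sentiment words pull the average towards zero.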
# range(user_score $ avg.score)
user_review %>% group_by(App) %>% count
# Installs is a factor of labels such as "5,000+", so compare against the labels
t11 = user_score %>% inner_join(gg.new) %>% filter(Installs != "5,000+") %>% filter(Installs != "1,000,000,000+")
Joining, by = "App"
ggplot(t11) + geom_line(aes(x = Installs, y = avg.score))

ggplot(t11) + geom_boxplot(aes(x = reorder(as.factor(Installs), -avg.score), y = avg.score)) + labs(x = "Installments", y = "Average Score") + coord_flip()

# recover app name after data imputation
# add a row number to mydata2 so that the imputed matrix can be joined back to it
mydata2 = mydata2 %>% mutate(r = row_number()) 
# change the imputed matrix (first element of the missForest result) to a data frame
gg.df = gg.new.imp[[1]] %>% unlist()
gg.data = data.frame(gg.df) %>% mutate(r = row_number()) 
t1 = left_join(gg.data, mydata2, by = "r") %>% 
  select(Rating.x, Reviews.y, Size.x, Installs.cat.y, Price.y, Content.Rating.y, Cat.cat.y, Interval.y) %>% print
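# split data: stratified 80/20 split by install category
# (set the seed before sampling so that the split is reproducible)
set.seed(415)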
(total_row = nrow(t1))
[1] 10356
ins.l= which(t1$Installs.cat.y == "low")
ins.m= which(t1$Installs.cat.y == "medium")
ins.h= which(t1$Installs.cat.y == "high")
train.id = c(sample(ins.l, size = trunc(0.8 *length(ins.l))),
             sample(ins.m, size = trunc(0.8 *length(ins.m))), 
             sample(ins.h, size = trunc(0.8 *length(ins.h))))
train.gg = t1[train.id, ]
test.gg = t1[-train.id, ]
levels(train.gg$Installs.cat.y)
[1] "low"    "high"   "medium"
table(train.gg$Installs.cat.y)

   low   high medium 
  2519   3243   2522 
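The split above samples 80% of the rows within each install category. A compact sketch of the same idea on toy labels, using `split()` and `lapply()` so it works for any number of classes (the label vector `y` is made up):

```r
set.seed(415)  # fix the seed before sampling so the split can be reproduced

# Toy class labels (made up); in the notebook this would be Installs.cat.y
y <- factor(rep(c("low", "medium", "high"), times = c(50, 30, 20)))

# Sample 80% of the row indices within each class
idx_by_class <- split(seq_along(y), y)
train_id <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = trunc(0.8 * length(idx)))]
}), use.names = FALSE)
test_id <- setdiff(seq_along(y), train_id)

table(y[train_id])
table(y[test_id])
```

Indexing with `sample.int()` avoids the surprise of `sample()` treating a single number as a range when a class has only one row.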
# random forest with mtry equal to the number of predictors, i.e. bagging
set.seed(415)
library(randomForest)
table(factor(train.gg$Installs.cat.y))

   low   high medium 
  2519   3243   2522 
bag.gg=randomForest(Installs.cat.y~., data=train.gg, mtry = ncol(train.gg) - 1,importance=TRUE)
bag.gg

Call:
 randomForest(formula = Installs.cat.y ~ ., data = train.gg, mtry = ncol(train.gg) -      1, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 7

        OOB estimate of  error rate: 35.01%
Confusion matrix:
        low high medium class.error
low    1623  282    614   0.3556967
high    118 2505    620   0.2275671
medium  409  857   1256   0.5019826
# predict on the test set
yhat.bag = predict(bag.gg, newdata=test.gg) 
# test error
(forest.test.err = mean(yhat.bag != test.gg$Installs.cat.y))
[1] 0.3397683
# get the importance
importance(bag.gg)
                       low      high   medium MeanDecreaseAccuracy MeanDecreaseGini
Rating.x          83.41493 145.52376 42.23615            157.29426         919.4855
Reviews.y        171.97873 124.97380 53.34318            191.67516        1621.0679
Size.x            41.47474 149.90416 15.38727            126.17961        1078.8905
Price.y           65.55221 126.03555 31.48082            125.50710         163.8186
Content.Rating.y  16.06423  19.74333 11.55403             27.17451         131.4630
Cat.cat.y         18.65104  78.45468 18.60626             75.61766         357.2444
Interval.y        38.97942 150.75211 15.99744            125.52877        1207.8295
varImpPlot(bag.gg)
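The OOB error rate and the `class.error` column can be reproduced from the confusion matrix itself. The sketch below copies the counts from the `bag.gg` output, with rows as the true class.

```r
# OOB confusion matrix copied from the bag.gg output (rows = true class)
conf <- matrix(c(1623,  282,  614,
                  118, 2505,  620,
                  409,  857, 1256),
               nrow = 3, byrow = TRUE,
               dimnames = list(truth = c("low", "high", "medium"),
                               predicted = c("low", "high", "medium")))

# Overall error: share of off-diagonal counts
overall_err <- 1 - sum(diag(conf)) / sum(conf)

# Per-class error: share of each row that falls off the diagonal
class_err <- 1 - diag(conf) / rowSums(conf)

round(overall_err, 4)
round(class_err, 4)
```

The per-class figures show that the medium category is the hardest to classify: about half of its apps are assigned to one of the other two groups.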

# tree
set.seed(415)
library(tree)
#train.gg
#colnames(train.gg)[1] = "Rating"
#colnames(train.gg)[2] = "Reviews"
#colnames(train.gg)[3] = "Size"
#colnames(train.gg)[5] = "Price"
#colnames(train.gg)[6] = "Content Rating"
#colnames(train.gg)[7] = "Category"
#colnames(train.gg)[1] = "Time Since Last Update"
#train.gg
train.gg
tree.gg = tree(Installs.cat.y~., data = train.gg)
NAs introduced by coercion
summary(tree.gg)

Classification tree:
tree(formula = Installs.cat.y ~ ., data = train.gg)
Variables actually used in tree construction:
[1] "Reviews.y" "Size.x"    "Rating.x"  "Price.y"  
Number of terminal nodes:  8 
Residual mean deviance:  1.688 = 13970 / 8276 
Misclassification error rate: 0.4013 = 3324 / 8284 
plot(tree.gg)
text(tree.gg, pretty = 1, cex = 1)

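# type = "class" returns predicted labels; the default returns a matrix of
# class probabilities, and comparing that matrix with the labels gives an error of 1
yhat.tree = predict(tree.gg, newdata=test.gg, type = "class") 
NAs introduced by coercion
# test error
(tree.test.err = mean(yhat.tree != test.gg$Installs.cat.y))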
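Comparing a matrix of class probabilities with a factor of labels cannot give a meaningful error rate, which is presumably what produced a test error of exactly 1 for the unpruned tree. The sketch below, on made-up probabilities, shows how such a matrix can be turned into labels before comparing.

```r
# Toy class-probability matrix, shaped like the default predict() output of a
# classification tree (the numbers are made up)
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6,
                  0.2, 0.5, 0.3),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("low", "high", "medium")))
truth <- factor(c("low", "medium", "medium"),
                levels = c("low", "high", "medium"))

# Pick the most probable class per row, then compare labels with labels
pred <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
               levels = levels(truth))
mean(pred != truth)
```

Requesting `type = "class"` from `predict()` does this step directly.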
# prune the tree
cv.gg.tree=cv.tree(tree.gg,FUN=prune.misclass)
NAs introduced by coercion (warning repeated many times)
cv.gg.tree
$size
[1] 8 7 6 5 4 3 2 1

$dev
[1] 3391 3456 3674 3728 3768 3850 4317 5041

$k
[1] -Inf   67   93  100  109  138  486  724

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"
# par(mfrow=c(1,2))
# plot(cv.gg.tree$size,cv.gg.tree$dev / length(train.gg),ylab="cv error", xlab="size",type="b")
# plot(cv.gg.tree$k, cv.gg.tree$dev / length(train.gg),ylab="cv error", xlab="k",type="b")
# predict using the pruned tree; the CV deviance above is lowest at size 8,
# so best = 8 keeps the full tree
prune.tree=prune.misclass(tree.gg,best=8)
tree.pred=predict(prune.tree, test.gg,type="class")
NAs introduced by coercion
table(tree.pred, test.gg$Installs.cat.y)
         
tree.pred low high medium
   low    305   10     64
   high    68  545    176
   medium 257  256    391
(test.tree.err = mean(tree.pred != test.gg$Installs.cat.y)) 
[1] 0.4010618
# plot the tree
plot(prune.tree)
text(prune.tree, pretty = 0, cex = 1)

As both the single tree and the random forest show, Reviews is the most important predictor. Digging into the reviews, we find that approximately 1000 apps have more than 100 relevant text reviews or comments.

SVM on training set

set.seed(415)
# get data frame ready to use
train.gg
table(factor(train.gg$Installs.cat.y))

   low   high medium 
  2519   3243   2522 
costVals = c(1, 5, 10, 50)
# linear kernel
# running too slow, be careful to change predictors
svm1 <- tune(svm, as.factor(Installs.cat.y) ~ ., data = train.gg,
             kernel = "linear",
             ranges = list("cost" = costVals)) 
summary(svm1)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:

- best performance: 0.4470082 

- Detailed performance results:
# find the best cost under linear kernel
best_mod_linear = svm1$best.model
summary(best_mod_linear)

Call:
best.tune(method = svm, train.x = as.factor(Installs.cat.y) ~ ., data = train.gg, 
    ranges = list(cost = costVals), kernel = "linear")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  10 

Number of Support Vectors:  6894

 ( 2211 2447 2236 )


Number of Classes:  3 

Levels: 
 low high medium
# thus the cost of the best model is 10.
# get the test error of the best model of the linear kernel
test.gg %>% str()
'data.frame':   2072 obs. of  8 variables:
 $ Rating.x        : num  4.1 4.3 4.7 4.7 4.7 4.2 4.4 3.8 4.7 4.2 ...
 $ Reviews.y       : num  1183 5924 485 443 3284 ...
 $ Size.x          : num  19 2.8 3.1 23 4.2 ...
 $ Installs.cat.y  : Factor w/ 3 levels "low","high","medium": 3 3 3 3 3 3 2 3 2 3 ...
 $ Price.y         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Content.Rating.y: Factor w/ 7 levels "","Adults only 18+",..: 3 3 3 3 6 3 3 3 3 3 ...
 $ Cat.cat.y       : Factor w/ 7 levels "Family","Education",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Interval.y      : 'difftime' num  1089 925 912 908 ...
  ..- attr(*, "units")= chr "days"
pred_test_linear = predict(best_mod_linear, newdata = test.gg)
table(predict = pred_test_linear, truth = test.gg$Installs.cat.y)
        truth
predict  low high medium
  low    351  105    225
  high   144  622    226
  medium 135   84    180
(test_err_linear = mean(pred_test_linear != test.gg$Installs.cat.y))
[1] 0.4435328
set.seed(415)
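# radial kernel
# note: gamma is passed straight to svm() here instead of through `ranges`,
# so tune() does not search over gammaVals
gammaVals = c(1, 2, 3, 4)
svm_radial <- tune(svm, as.factor(Installs.cat.y) ~ ., data = train.gg, 
                   kernel = "radial",
                   cost = 100,
                   gamma = gammaVals)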
summary(svm_radial)

Error estimation of ‘svm’ using 10-fold cross validation: 0.4412131
best_mod_radial = svm_radial$best.model
summary(best_mod_radial)

Call:
best.tune(method = svm, train.x = as.factor(Installs.cat.y) ~ ., data = train.gg, 
    kernel = "radial", cost = 100, gamma = gammaVals)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  100 

Number of Support Vectors:  6009

 ( 1772 2206 2031 )


Number of Classes:  3 

Levels: 
 low high medium
# get test error of kernel of the radial
pred_test_radial = predict(best_mod_radial, newdata = test.gg)
(test_err_radial = mean(pred_test_radial != test.gg$Installs.cat.y))
[1] 0.4377413

Is it true that people tend to leave a text review when they rate the app very positively?

# left join the user_score table and t3
mydata2 = mydata2 %>% mutate(r = row_number()) %>% print 
gg.df = gg.new.imp[[1]] %>% unlist()
gg.data = data.frame(gg.df) %>% mutate(r = row_number()) %>% print
t3 = left_join(gg.data, mydata2, by = "r") %>% 
  select(Rating.x, Reviews.y, App.y, Installs.cat.y) %>% print
colnames(t3)[3] = "App"
t2 = inner_join(user_score, t3, by = "App") %>% print
# rating and avg score
# add main title manually, which is "rating vs average sentimental score"
ggplot(data = t2, aes(x = Rating.x, y = avg.score)) + geom_bar(stat = "identity") + labs(x = "Rating", y = "Average Sentimental Score", title = "Rating vs Average sentimental Score") 

ggplot(data = t2, aes(x = as.factor(Installs.cat.y), y = avg.score)) + geom_boxplot() + labs(x = "Installment Category", y = "Average Sentimental Score")

#boxplot(t2$Installs.cat.y ~ t2$avg.score)
# number of reviews vs avg score
ggplot(data = t2, aes(x = Reviews.y, y = avg.score)) + geom_bar(stat = "identity") + labs(x = "Number of Reviews", y = "Average Sentimental Score", title = "Number of Reviews vs Average sentimental Score") 

High average scores tend to be concentrated at ratings of 4.0 and above.

data frame that might not be used

final1 = left_join(gg.data, mydata2, by = "r") %>% select(App.y, Reviews.y, Rating.x, Interval.y, Size.x, Price.y, Cat.cat.y, Content.Rating.y) %>% print
# rename all eight columns in one step
colnames(final1) = c("App", "Reviews", "Rating", "Interval", "Size", "Price", "Category", "Content")
show(final1)
plot(final1)

bmV3KSAlPiUgZmlsdGVyKEluc3RhbGxzICE9IDUwMDApICU+JSBmaWx0ZXIoSW5zdGFsbHMgIT0gMTAwMDAwMDAwMCkKZ2dwbG90KHQxMSkgKyBnZW9tX2xpbmUoYWVzKHggPSBJbnN0YWxscywgeSA9IGF2Zy5zY29yZSkpCmdncGxvdCh0MTEpICsgZ2VvbV9ib3hwbG90KGFlcyh4ID0gcmVvcmRlcihhcy5mYWN0b3IoSW5zdGFsbHMpLCAtYXZnLnNjb3JlKSwgeSA9IGF2Zy5zY29yZSkpICsgbGFicyh4ID0gIkluc3RhbGxtZW50cyIsIHkgPSAiQXZlcmFnZSBTY29yZSIpICsgY29vcmRfZmxpcCgpCmBgYApgYGB7cn0KIyByZWNvdmVyIGFwcCBuYW1lIGFmdGVyIGRhdGEgaW1wdXRhdGlvbgojIGFkZCBudW1fcm93IHRvIGdnLm5ldwpteWRhdGEyID0gbXlkYXRhMiAlPiUgbXV0YXRlKHIgPSByb3dfbnVtYmVyKCkpIAojIHNwbGl0IGRhdGEgaW50byB0cmFpbmluZyBhbmQgdGVzdCBkYXRhCiMgY2hhbmdlIHRoZSBsaXN0IHRvIGRhdGEgZnJhbWUgCmdnLmRmID0gZ2cubmV3LmltcFtbMV1dICU+JSB1bmxpc3QoKQpnZy5kYXRhID0gZGF0YS5mcmFtZShnZy5kZikgJT4lIG11dGF0ZShyID0gcm93X251bWJlcigpKSAKdDEgPSBsZWZ0X2pvaW4oZ2cuZGF0YSwgbXlkYXRhMiwgYnkgPSAiciIpICU+JSAKICBzZWxlY3QoUmF0aW5nLngsIFJldmlld3MueSwgU2l6ZS54LCBJbnN0YWxscy5jYXQueSwgUHJpY2UueSwgQ29udGVudC5SYXRpbmcueSwgQ2F0LmNhdC55LCBJbnRlcnZhbC55KSAlPiUgcHJpbnQKIyBzcGxpdCBkYXRhCih0b3RhbF9yb3cgPSBucm93KHQxKSkKaW5zLmw9IHdoaWNoKHQxJEluc3RhbGxzLmNhdC55ID09ICJsb3ciKQppbnMubT0gd2hpY2godDEkSW5zdGFsbHMuY2F0LnkgPT0gIm1lZGl1bSIpCmlucy5oPSB3aGljaCh0MSRJbnN0YWxscy5jYXQueSA9PSAiaGlnaCIpCnRyYWluLmlkID0gYyhzYW1wbGUoaW5zLmwsIHNpemUgPSB0cnVuYygwLjggKmxlbmd0aChpbnMubCkpKSwKICAgICAgICAgICAgIHNhbXBsZShpbnMubSwgc2l6ZSA9IHRydW5jKDAuOCAqbGVuZ3RoKGlucy5tKSkpLCAKICAgICAgICAgICAgIHNhbXBsZShpbnMuaCwgc2l6ZSA9IHRydW5jKDAuOCAqbGVuZ3RoKGlucy5oKSkpKQp0cmFpbi5nZyA9IHQxW3RyYWluLmlkLCBdCnRlc3QuZ2cgPSB0MVstdHJhaW4uaWQsIF0KbGV2ZWxzKHRyYWluLmdnJGBJbnN0YWxsc2ApCnRhYmxlKHRyYWluLmdnJGBJbnN0YWxsc2ApCmBgYAoKCmBgYHtyfQojIHJhbmRvbSBmb3Jlc3QKc2V0LnNlZWQoNDE1KQpsaWJyYXJ5KHJhbmRvbUZvcmVzdCkKdGFibGUoZmFjdG9yKHRyYWluLmdnJEluc3RhbGxzLmNhdC55KSkKYmFnLmdnPXJhbmRvbUZvcmVzdChJbnN0YWxscy5jYXQueX4uLCBkYXRhPXRyYWluLmdnLCBtdHJ5ID0gbmNvbCh0cmFpbi5nZykgLSAxLGltcG9ydGFuY2U9VFJVRSkKYmFnLmdnCiMgcGxvdAp5aGF0LmJhZyA9IHByZWRpY3QoYmFnLmdnLCBuZXdkYXRhPXRlc3QuZ2cpIAojIHRlc3QgZXJyb3IKKGZvcmVzdC50ZXN0LmVyciA9IG1lYW4oeWhhdC5iYWcg
IT0gdGVzdC5nZyRJbnN0YWxscy5jYXQueSkpCiMgZ2V0IHRoZSBpbXBvcnRhbmNlCmltcG9ydGFuY2UoYmFnLmdnKQp2YXJJbXBQbG90KGJhZy5nZykKYGBgCgpgYGB7cn0KIyB0cmVlCnNldC5zZWVkKDQxNSkKbGlicmFyeSh0cmVlKQojdHJhaW4uZ2cKI2NvbG5hbWVzKHRyYWluLmdnKVsxXSA9ICJSYXRpbmciCiNjb2xuYW1lcyh0cmFpbi5nZylbMl0gPSAiUmV2aWV3cyIKI2NvbG5hbWVzKHRyYWluLmdnKVszXSA9ICJTaXplIgojY29sbmFtZXModHJhaW4uZ2cpWzVdID0gIlByaWNlIgojY29sbmFtZXModHJhaW4uZ2cpWzZdID0gIkNvbnRlbnQgUmF0aW5nIgojY29sbmFtZXModHJhaW4uZ2cpWzddID0gIkNhdGVnb3J5IgojY29sbmFtZXModHJhaW4uZ2cpWzFdID0gIlRpbWUgU2luY2UgTGFzdCBVcGRhdGUiCiN0cmFpbi5nZwp0cmFpbi5nZwp0cmVlLmdnID0gdHJlZShJbnN0YWxscy5jYXQueX4uLCBkYXRhID0gdHJhaW4uZ2cpCnN1bW1hcnkodHJlZS5nZykKcGxvdCh0cmVlLmdnKQp0ZXh0KHRyZWUuZ2csIHByZXR0eSA9IDEsIGNleCA9IDEpCnloYXQudHJlZSA9IHByZWRpY3QodHJlZS5nZywgbmV3ZGF0YT10ZXN0LmdnKSAKIyB0ZXN0IGVycm9yCih0cmVlLnRlc3QuZXJyID0gbWVhbih5aGF0LnRyZWUgIT0gdGVzdC5nZyRJbnN0YWxscy5jYXQueSkpCmBgYAogCgoKYGBge3J9CiMgcHJ1bmUgdGhlIHRyZWUKY3YuZ2cudHJlZT1jdi50cmVlKHRyZWUuZ2csRlVOPXBydW5lLm1pc2NsYXNzKQpjdi5nZy50cmVlCiMgcGFyKG1mcm93PWMoMSwyKSkKIyBwbG90KGN2LmdnLnRyZWUkc2l6ZSxjdi5nZy50cmVlJGRldiAvIGxlbmd0aCh0cmFpbi5nZykseWxhYj0iY3YgZXJyb3IiLCB4bGFiPSJzaXplIix0eXBlPSJiIikKIyBwbG90KGN2LmdnLnRyZWUkaywgY3YuZ2cudHJlZSRkZXYgLyBsZW5ndGgodHJhaW4uZ2cpLHlsYWI9ImN2IGVycm9yIiwgeGxhYj0iayIsdHlwZT0iYiIpCiMgcHJlZGljdCB1c2luZyBwcnVuaW5nIHRyZWUKcHJ1bmUudHJlZT1wcnVuZS5taXNjbGFzcyh0cmVlLmdnLGJlc3Q9OCkKdHJlZS5wcmVkPXByZWRpY3QocHJ1bmUudHJlZSwgdGVzdC5nZyx0eXBlPSJjbGFzcyIpCnRhYmxlKHRyZWUucHJlZCwgdGVzdC5nZyRJbnN0YWxscy5jYXQueSkKKHRlc3QudHJlZS5lcnIgPSBtZWFuKHRyZWUucHJlZCAhPSB0ZXN0LmdnJEluc3RhbGxzLmNhdC55KSkgCiMgcGxvdCB0aGUgdHJlZQpwbG90KHBydW5lLnRyZWUpCnRleHQocHJ1bmUudHJlZSwgcHJldHR5ID0gMCwgY2V4ID0gMSkKYGBgCgpBcyB3ZSBjYW4gc2VlIGluIGJvdGggc2luZ2xlIHRyZWUgYW5kIHJhbmRvbSBmb3Jlc3QsIHJldmlld3MgaXMgdGhlIG1vc3QgaW1wb3J0YW50IHByZWRpY3Rvci4gV2hlbiB3ZSBkaWcgaW50byB0aGUgcmV2aWV3cywgd2UgZmlndXJlIG91dCB0aGF0IGFwcHJveGlhbXRlbHkgMTAwMCBhcHBzIGhhdmUgbW9yZSB0aGFuIDEwMCByZWxldmFudCB0ZXh0IHJldmlld3MgLyBjb21tZW50cy4gCgojIyMjIFNWTSBvbiB0cmFuaW5n
IHNldApgYGB7cn0Kc2V0LnNlZWQoNDE1KQojIGdldCBkYXRhIGZyYW1lIHJlYWR5IHRvIHVzZQp0cmFpbi5nZwp0YWJsZShmYWN0b3IodHJhaW4uZ2ckSW5zdGFsbHMuY2F0LnkpKQpjb3N0VmFscyA9IGMoMSwgNSwgMTAsIDUwKQojIGxpbmVhciBrZXJuZWwKIyBydW5uaW5nIHRvbyBzbG93LCBiZSBjYXJlZnVsIHRvIGNoYW5nZSBwcmVkaWN0b3JzCnN2bTEgPC0gdHVuZShzdm0sIGFzLmZhY3RvcihJbnN0YWxscy5jYXQueSkgfiAuLCBkYXRhID0gdHJhaW4uZ2csCiAgICAgICAgICAgICBrZXJuZWwgPSAibGluZWFyIiwKICAgICAgICAgICAgIHJhbmdlcyA9IGxpc3QoImNvc3QiID0gY29zdFZhbHMpKSAKc3VtbWFyeShzdm0xKQojIGZpbmQgdGhlIGJlc3QgY29zdCB1bmRlciBsaW5lYXIga2VybmVsCmJlc3RfbW9kX2xpbmVhciA9IHN2bTEkYmVzdC5tb2RlbApzdW1tYXJ5KGJlc3RfbW9kX2xpbmVhcikKIyB0aHVzIHRoZSBjb3N0IG9mIHRoZSBiZXN0IG1vZGVsIHNpIDUwLgpgYGAKCmBgYHtyfQojIGdldCB0aGUgdGVzdCBlcnJvciBvZiB0aGUgYmVzdCBtb2RlbCBvZiB0aGUgbGluZWFyIGtlcm5lbAp0ZXN0LmdnICU+JSBzdHIoKQpwcmVkX3Rlc3RfbGluZWFyID0gcHJlZGljdChiZXN0X21vZF9saW5lYXIsIG5ld2RhdGEgPSB0ZXN0LmdnKQp0YWJsZShwcmVkaWN0ID0gcHJlZF90ZXN0X2xpbmVhciwgdHJ1dGggPSB0ZXN0LmdnJEluc3RhbGxzLmNhdC55KQoodGVzdF9lcnJfbGluZWFyID0gbWVhbihwcmVkX3Rlc3RfbGluZWFyICE9IHRlc3QuZ2ckSW5zdGFsbHMuY2F0LnkpKQpgYGAKCmBgYHtyfQpzZXQuc2VlZCg0MTUpCiMga2VybmVsIHJhZGlhbApnYW1tYVZhbHMgPSBjKDEsIDIsIDMsIDQpCnN2bV9yYWRpYWwgPC10dW5lKHN2bSwgYXMuZmFjdG9yKEluc3RhbGxzLmNhdC55KSB+IC4sIGRhdGEgPSB0cmFpbi5nZywgCiAgICAgICAgICAgICAgICAgIGtlcm5lbCA9ICJyYWRpYWwiLAogICAgICAgICAgICAgICAgICBjb3N0ID0gMTAwLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgZ2FtbWEgPWdhbW1hVmFscykKc3VtbWFyeShzdm1fcmFkaWFsKQpgYGAKCmBgYHtyfQpiZXN0X21vZF9yYWRpYWwgPSBzdm1fcmFkaWFsJGJlc3QubW9kZWwKc3VtbWFyeShiZXN0X21vZF9yYWRpYWwpCmBgYAoKYGBge3J9CiMgZ2V0IHRlc3QgZXJyb3Igb2Yga2VybmVsIG9mIHRoZSByYWRpYWwKcHJlZF90ZXN0X3JhZGlhbCA9IHByZWRpY3QoYmVzdF9tb2RfcmFkaWFsLCBuZXdkYXRhID0gdGVzdC5nZykKKHRlc3RfZXJyX3JhZGlhbCA9IG1lYW4ocHJlZF90ZXN0X3JhZGlhbCAhPSB0ZXN0LmdnJEluc3RhbGxzLmNhdC55KSkKYGBgCgoKCgoKCgpJcyBpdCB0cnVlIHRoYXQgcGVvcGxlIHRlbmRzIHRvIGdpdmUgdGV4dCByZXZpZXcgd2hlbiB0aGV5IGhpZ2hseSBwb3NpdGl2ZWx5IHJldmlldyB0aGUgYXBwPwpgYGB7cn0KIyBsZWZ0IGpvaW4gdGhlIHVzZXJfc2NvcmUgdGFibGUgYW5kIHQzCm15ZGF0YTIgPSBteWRhdGEyICU+JSBt
dXRhdGUociA9IHJvd19udW1iZXIoKSkgJT4lIHByaW50IApnZy5kZiA9IGdnLm5ldy5pbXBbWzFdXSAlPiUgdW5saXN0KCkKZ2cuZGF0YSA9IGRhdGEuZnJhbWUoZ2cuZGYpICU+JSBtdXRhdGUociA9IHJvd19udW1iZXIoKSkgJT4lIHByaW50CnQzID0gbGVmdF9qb2luKGdnLmRhdGEsIG15ZGF0YTIsIGJ5ID0gInIiKSAlPiUgCiAgc2VsZWN0KFJhdGluZy54LCBSZXZpZXdzLnksIEFwcC55LCBJbnN0YWxscy5jYXQueSkgJT4lIHByaW50CmNvbG5hbWVzKHQzKVszXSA9ICJBcHAiCnQyID0gaW5uZXJfam9pbih1c2VyX3Njb3JlLCB0MywgYnkgPSAiQXBwIikgJT4lIHByaW50CiMgcmFpbmcgYW5kIGF2ZyBzY29yZQojIGFkZCBtYWluIHRpdGxlIG1hbnVhbGx5LCB3aGljaCBpcyAicmF0aW5nIHZzIGFhdmVyYWdlIHNlbnRpbWVudGFsIHNjb3JlIgpnZ3Bsb3QoZGF0YSA9IHQyLCBhZXMoeCA9IFJhdGluZy54LCB5ID0gYXZnLnNjb3JlKSkgKyBnZW9tX2JhcihzdGF0ID0gImlkZW50aXR5IikgKyBsYWJzKHggPSAiUmF0aW5nIiwgeSA9ICJBdmVyYWdlIFNlbnRpbWVudGFsIFNjb3JlIiwgdGl0bGUgPSAiUmF0aW5nIHZzIEF2ZXJhZ2Ugc2VudGltZW50YWwgU2NvcmUiKSAKZ2dwbG90KGRhdGEgPSB0MiwgYWVzKHggPSBhcy5mYWN0b3IoSW5zdGFsbHMuY2F0LnkpLCB5ID0gYXZnLnNjb3JlKSkgKyBnZW9tX2JveHBsb3QoKSArIGxhYnMoeCA9ICJJbnN0YWxsbWVudCBDYXRlZ29yeSIsIHkgPSAiQXZlcmFnZSBTZW50aW1lbnRhbCBTY29yZSIpCiNib3hwbG90KHQyJEluc3RhbGxzLmNhdC55IH4gdDIkYXZnLnNjb3JlKQojIHJhdGluZyB2cyByZXZpZXdzCmdncGxvdChkYXRhID0gdDIsIGFlcyh4ID0gUmV2aWV3cy55LCB5ID0gYXZnLnNjb3JlKSkgKyBnZW9tX2JhcihzdGF0ID0gImlkZW50aXR5IikgKyBsYWJzKHggPSAiTnVtYmVyIG9mICNSZXZpZXdzIiwgeSA9ICJBdmVyYWdlIFNlbnRpbWVudGFsIFNjb3JlIiwgdGl0bGUgPSAiTnVtYmVyIG9mIFJldmlld3MgdnMgQXZlcmFnZSBzZW50aW1lbnRhbCBTY29yZSIpIApgYGAKCkhpZ2ggYXZnIHNjb3JlIHRlbmRzIHRvIGNvbmNlbnRyYXRlZCBhdCByYXRpbmcgYWJvdmUgYW5kIGluY2x1ZGluZyA0LjAKCgoKCgoKCgoKCiMjIyMgZGF0YSBmcmFtZSB0aGF0IG1pZ2h0IG5vdCBiZSB1c2VkCmBgYHtyfQpmaW5hbDEgPSBsZWZ0X2pvaW4oZ2cuZGF0YSwgbXlkYXRhMiwgYnkgPSAiciIpICU+JSBzZWxlY3QoQXBwLnksIFJldmlld3MueSwgUmF0aW5nLngsIEludGVydmFsLnksIFNpemUueCwgUHJpY2UueSwgQ2F0LmNhdC55LCBDb250ZW50LlJhdGluZy55KSAlPiUgcHJpbnQKY29sbmFtZXMoZmluYWwxKVsxXSA9ICJBcHAiCmNvbG5hbWVzKGZpbmFsMSlbMl0gPSAiUmV2aWV3cyIKY29sbmFtZXMoZmluYWwxKVszXSA9ICJSYXRpbmciCmNvbG5hbWVzKGZpbmFsMSlbNF0gPSAiSW50ZXJ2YWwiCmNvbG5hbWVzKGZpbmFsMSlbNV0gPSAiU2l6ZSIKY29sbmFtZXMoZmluYWwxKVs2XSA9ICJQcmlj
ZSIKY29sbmFtZXMoZmluYWwxKVs3XSA9ICJDYXRlZ29yeSIKY29sbmFtZXMoZmluYWwxKVs4XSA9ICJDb250ZW50IgpzaG93KChmaW5hbDEpKQpwbG90KGZpbmFsMSkKYGBg