Load libraries and data
library(cem)
## Loading required package: nlme
## Loading required package: lattice
## Loading required package: randomForest
## randomForest 4.6-7
## Type rfNews() to see new features/changes/bug fixes.
## Loading required package: combinat
##
## Attaching package: 'combinat'
##
## The following object is masked from 'package:utils':
##
## combn
##
##
## How to use CEM? Type vignette("cem")
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:nlme':
##
## collapse
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape)
## Loading required package: plyr
##
## Attaching package: 'plyr'
##
## The following objects are masked from 'package:dplyr':
##
## arrange, desc, failwith, id, mutate, summarise, summarize
##
##
## Attaching package: 'reshape'
##
## The following objects are masked from 'package:plyr':
##
## rename, round_any
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following object is masked from 'package:reshape':
##
## expand
##
## Loading required package: Rcpp
##
## Attaching package: 'lme4'
##
## The following object is masked from 'package:nlme':
##
## lmList
setwd("~/Dropbox/Shared folder with Mandy/Jorge/Yelp/Detailed Reviews")
load("reviewsneighborhood_all.RData") #contains individual reviews for each restaurant in DC. we will use them to calculate the average rating and number of reviews over time.
load("restlist_all_unique.RData") #this contains the list of restaurants we are interested in, as well as possible covariates.
load("yipitall.RData") # contains Yipit deal information; we need the dates for this analysis
data("LeLonde") # ships with cem; must sit on its own line, otherwise it is swallowed by the comment above
load("periodDeals.Rda")
Le <- data.frame(na.omit(LeLonde))
## Error: object 'LeLonde' not found
Compare size of treated and control group
tr <- which(Le$treated == 1)
## Error: error in evaluating the argument 'x' in selecting a method for function 'which': Error: object 'Le' not found
ct <- which(Le$treated == 0)
## Error: error in evaluating the argument 'x' in selecting a method for function 'which': Error: object 'Le' not found
ntr <- length(tr)
## Error: object 'tr' not found
nct <- length(ct)
## Error: object 'ct' not found
The (unadjusted and therefore likely biased) difference in means is then:
mean(Le$re78[tr]) - mean(Le$re78[ct])
## Error: error in evaluating the argument 'x' in selecting a method for function 'mean': Error: object 'Le' not found
mean(Le$re74[tr]) - mean(Le$re74[ct])
## Error: error in evaluating the argument 'x' in selecting a method for function 'mean': Error: object 'Le' not found
Because the variable treated was not randomly assigned, the pre-treatment covariates differ between the treated and control groups. To see this, we focus on these pre-treatment covariates:
vars <- c("age", "education", "black", "married", "nodegree", "re74", "re75",
"hispanic", "u74", "u75", "q1")
We compute the L1 statistic, as well as several unidimensional measures of imbalance, via the imbalance function. In our running example:
imbalance(group = Le$treated, data = Le[vars])
## Error: object 'Le' not found
mat <- cem(treatment = "treated", data = Le, drop = "re78", eval.imbalance = TRUE,
keep.all = TRUE)
## Error: object 'Le' not found
mat
## Error: object 'mat' not found
In general, we want to set the coarsening for each variable so that substantively indistinguishable values are grouped and assigned the same numerical value. Groups may be of different sizes if appropriate. Recall that any coarsening during CEM is used only for matching; the original values of the variables are passed on to the analysis stage for all matched observations.
Let's look at how categorical variables are treated differently from numerical ones. q1 records the answer to a survey question on a 5-level scale:
levels(Le$q1)
## Error: object 'Le' not found
For categorical variables, we can use the ordered() function to convert a factor into an ordered factor:
q1.ord <- ordered(Le$q1, levels = c("strongly disagree", "disagree", "neutral",
"agree", "strongly agree"), exclude = "no opinion")
## Error: object 'Le' not found
Or we can define the grouping ourselves:
q1.grp <- list(c("strongly agree", "agree"), c("neutral", "no opinion"), c("strongly disagree",
"disagree"))
The following discretization of years of education corresponds to different levels of school: grade school (0–6), middle school (7–8), high school (9–12), college (13–16), and graduate school (> 16).
table(Le$education)
## Error: object 'Le' not found
qplot(data = Le, education, geom = "histogram")
## Error: object 'Le' not found
Define the cutpoints:
educut <- c(0, 6.5, 8.5, 12.5, 17)
Re-run the matching:
mat1 <- cem(treatment = "treated", data = Le, drop = "re78", cutpoints = list(education = educut),
grouping = list(q1 = q1.grp))
## Error: object 'Le' not found
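To make the two options for q1 concrete, here is a minimal base R sketch on a made-up q1 vector. The data and the group labels agree/neutral/disagree are assumptions for illustration; they are not part of LeLonde or of cem. ordered() turns "no opinion" into NA and puts the remaining levels on a scale, while assigning a named list to levels() collapses levels in the same spirit as the q1.grp grouping passed to cem.

```r
# Hypothetical answers, one per respondent
q1 <- factor(c("agree", "no opinion", "strongly disagree", "neutral", "strongly agree"))

# Option 1: ordered factor; "no opinion" is excluded and becomes NA
q1.ord <- ordered(q1, levels = c("strongly disagree", "disagree", "neutral",
                                 "agree", "strongly agree"), exclude = "no opinion")

# Option 2: manual grouping; the names of the list are hypothetical labels
q1.grp <- list(c("strongly agree", "agree"), c("neutral", "no opinion"),
               c("strongly disagree", "disagree"))
names(q1.grp) <- c("agree", "neutral", "disagree")
q1.coarse <- q1
levels(q1.coarse) <- q1.grp   # base R: a named list maps old levels to new ones

data.frame(q1, q1.ord, q1.coarse)
```

With the grouping, "no opinion" stays in the data and is matched together with "neutral"; with ordered() and exclude it is treated as missing.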
What about relaxing matches?
tab <- relax.cem(mat, Le, depth = 1, perc = 0.3)
## Error: object 'mat' not found
Using the output from cem, we can estimate SATT via the att function. The simplest approach requires a weighted difference in means (unless k2k was used, in which case no weights are required). For convenience, we compute this as a regression of the outcome variable on a constant and the treatment variable,
att(mat, re78 ~ treated, data = Le)
## Error: object 'mat' not found
att(mat1, re78 ~ treated, data = Le)
## Error: object 'mat1' not found
or include a more complex model:
att(mat, re78 ~ treated + re74, data = Le)
## Error: object 'mat' not found
att(mat1, re78 ~ treated + re74, data = Le)
## Error: object 'mat1' not found
Rating/Review ~ Characteristics + Treatment (no time)
Compare size of treated and control group
tr <- which(restlist_all_unique$inyipit == TRUE)
ct <- which(restlist_all_unique$inyipit == FALSE)
ntr <- length(tr) #189
nct <- length(ct) #1832
The (unadjusted and therefore likely biased) difference in means in the outcome is then:
mean(restlist_all_unique$reviewscrrating.num[tr], na.rm = TRUE) - mean(restlist_all_unique$reviewscrrating.num[ct],
na.rm = TRUE)
## [1] 0.02885
mean(restlist_all_unique$reviewscrreviews[tr], na.rm = TRUE) - mean(restlist_all_unique$reviewscrreviews[ct],
na.rm = TRUE)
## [1] 69.47
# slightly higher rating and much higher number of reviews in the TREATED
# group
Because the treatment (inyipit) was not randomly assigned, the pre-treatment covariates differ between the treated and control groups. To see this, we focus on these pre-treatment covariates:
# Bad balance:
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscracceptcredit)
##
## 0 No Yes
## FALSE 628 63 1141
## TRUE 8 0 181
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrparking)
##
## 0 Garage Garage, Street Garage, Street, Private Lot
## FALSE 974 13 33 3
## TRUE 24 1 8 0
##
## Garage, Street, Valet Garage, Street, Validated Garage, Validated
## FALSE 6 5 1
## TRUE 2 1 0
##
## Garage, Validated, Private Lot Private Lot Street
## FALSE 3 6 696
## TRUE 0 1 131
##
## Street, Private Lot Street, Private Lot, Valet Street, Valet
## FALSE 13 1 39
## TRUE 2 0 13
##
## Street, Validated Street, Validated, Private Lot
## FALSE 7 3
## TRUE 2 0
##
## Street, Validated, Valet Valet Validated Validated, Private Lot
## FALSE 3 21 3 2
## TRUE 0 3 0 0
##
## Validated, Valet
## FALSE 0
## TRUE 1
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrattire)
##
## 0 Casual Dressy Formal (Jacket Required)
## FALSE 672 1075 76 9
## TRUE 10 162 17 0
table(restlist_all_unique$inyipit, restlist_all_unique$numcat)
##
## 0 1 2 3 4
## FALSE 420 957 401 52 2
## TRUE 6 120 52 11 0
# Good balance:
table(restlist_all_unique$inyipit, restlist_all_unique$pricepoint)
##
## 1 2 3 4
## FALSE 1173 527 108 24
## TRUE 42 116 29 2
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrreviews)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## FALSE 478 104 57 45 37 40 36 32 22 23 22 16 23 21 20 10
## TRUE 5 2 4 0 1 0 1 1 0 0 0 0 2 1 5 2
##
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
## FALSE 10 12 14 12 12 6 16 12 5 10 10 8 8 12 10 12
## TRUE 0 2 2 2 3 0 0 0 1 0 0 1 3 2 0 1
##
## 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
## FALSE 8 8 7 10 6 6 7 4 4 8 10 2 5 5 8 6
## TRUE 2 0 0 1 0 1 2 2 2 0 1 1 2 1 1 1
##
## 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
## FALSE 5 7 4 7 7 7 4 4 2 7 3 4 3 6 1 4
## TRUE 1 1 1 1 0 0 1 1 5 0 0 2 0 0 0 2
##
## 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
## FALSE 5 2 9 4 7 3 2 6 7 2 0 7 4 5 2 2
## TRUE 1 0 2 1 1 0 1 1 1 1 1 1 0 0 0 1
##
## 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
## FALSE 3 2 6 2 5 5 2 0 5 2 4 2 6 3 3 1
## TRUE 0 0 2 0 0 0 1 3 5 2 0 0 1 0 1 0
##
## 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
## FALSE 2 6 2 8 3 3 1 5 2 1 6 1 5 4 4 3
## TRUE 0 2 2 0 0 2 0 0 0 0 1 0 0 1 2 4
##
## 112 113 114 115 116 118 119 120 121 122 123 124 125 126 127 128
## FALSE 7 6 5 9 2 1 4 2 2 4 4 4 3 1 1 2
## TRUE 0 1 2 0 1 1 3 0 2 2 0 2 1 0 0 0
##
## 129 130 131 133 134 135 136 137 138 140 141 142 143 145 146 147
## FALSE 4 4 6 4 3 4 1 4 3 3 1 3 1 1 2 2
## TRUE 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1
##
## 149 151 152 153 154 155 156 157 158 159 160 161 163 164 165 167
## FALSE 1 5 2 2 1 1 3 1 1 2 2 2 3 0 0 4
## TRUE 1 0 1 0 0 0 0 0 1 0 1 1 1 1 1 0
##
## 168 169 170 172 173 174 175 176 177 178 179 180 181 183 185 186
## FALSE 2 2 1 0 0 4 1 2 1 2 1 1 3 1 1 2
## TRUE 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 2
##
## 187 189 190 191 193 194 195 197 198 201 202 203 204 205 206 207
## FALSE 1 2 1 0 1 1 1 1 2 1 3 2 2 1 3 3
## TRUE 0 1 0 1 0 0 0 1 0 0 0 1 0 1 0 1
##
## 208 210 213 214 216 217 218 219 220 221 223 224 226 227 228 230
## FALSE 4 4 1 1 2 1 0 1 1 1 1 1 1 3 2 1
## TRUE 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0
##
## 231 232 233 236 240 241 242 243 244 245 247 248 252 253 254 257
## FALSE 0 1 3 2 2 2 1 2 1 1 2 0 1 3 1 1
## TRUE 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
##
## 258 259 260 262 265 266 271 272 274 275 277 279 281 282 283 286
## FALSE 2 2 0 1 1 1 2 2 1 1 1 1 1 1 1 1
## TRUE 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0
##
## 287 288 291 292 294 295 297 298 299 300 302 303 306 308 310 311
## FALSE 1 1 1 1 1 0 1 0 1 1 0 1 2 1 1 1
## TRUE 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0
##
## 313 315 317 320 321 324 326 328 335 338 344 346 347 348 350 351
## FALSE 2 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1
## TRUE 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
##
## 362 365 367 368 373 378 381 386 388 392 400 402 406 409 417 422
## FALSE 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0
## TRUE 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1
##
## 425 430 456 459 464 468 470 479 484 485 497 515 517 526 528 533
## FALSE 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
## TRUE 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
##
## 539 554 562 565 566 579 589 626 631 640 651 662 669 678 709 719
## FALSE 1 1 1 0 1 1 1 1 0 0 0 1 1 1 0 1
## TRUE 0 0 0 1 0 0 0 0 1 1 1 0 0 0 1 0
##
## 720 768 815 839 844 968 1377 1455 1619 1963
## FALSE 1 1 1 1 1 1 0 1 1 1
## TRUE 0 0 0 0 0 0 1 0 0 0
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrgoodforgroups)
##
## 0 No Yes
## FALSE 715 287 830
## TRUE 13 27 149
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrgoodforkids)
##
## 0 No Yes
## FALSE 734 429 669
## TRUE 16 85 88
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrwaiter)
##
## 0 No Yes
## FALSE 682 559 591
## TRUE 11 43 135
table(restlist_all_unique$inyipit, restlist_all_unique$american)
##
## 0 1
## FALSE 1544 288
## TRUE 148 41
table(restlist_all_unique$inyipit, restlist_all_unique$european)
##
## FALSE TRUE
## FALSE 1676 156
## TRUE 152 37
table(restlist_all_unique$inyipit, restlist_all_unique$asian)
##
## FALSE TRUE
## FALSE 1656 176
## TRUE 153 36
table(restlist_all_unique$inyipit, restlist_all_unique$bar)
##
## 0 1
## FALSE 1618 214
## TRUE 155 34
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrwifi)
##
## 0 Free No Paid
## FALSE 1104 153 566 9
## TRUE 51 22 115 1
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscralcohol)
##
## 0 Beer & Wine Only Full Bar No
## FALSE 721 81 535 495
## TRUE 15 13 128 33
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrnoiselevel)
##
## 0 Average Loud Quiet Very Loud
## FALSE 862 617 109 194 50
## TRUE 17 126 12 23 11
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrsmoking)
##
## 0 No Outdoor Area/ Patio Only Yes
## FALSE 1670 75 78 9
## TRUE 164 9 15 1
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscroutdorseating)
##
## 0 No Yes
## FALSE 682 636 514
## TRUE 10 85 94
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrwheelchair)
##
## 0 No Yes
## FALSE 963 181 688
## TRUE 47 39 103
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrdelivery)
##
## 0 No Yes
## FALSE 764 848 220
## TRUE 18 117 54
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrtakeout)
##
## 0 No Yes
## FALSE 689 154 989
## TRUE 15 24 150
table(restlist_all_unique$inyipit, restlist_all_unique$africanmiddleeastern)
##
## FALSE TRUE
## FALSE 1729 103
## TRUE 162 27
table(restlist_all_unique$inyipit, restlist_all_unique$indian)
##
## FALSE TRUE
## FALSE 1808 24
## TRUE 177 12
# To fix: create dummies:
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrgoodfor)
##
## 0 Breakfast Breakfast, Brunch Brunch Dessert Dessert, Breakfast
## FALSE 966 14 2 7 5 3
## TRUE 24 0 0 1 0 0
##
## Dessert, Late Night Dessert, Late Night, Lunch, Dinner
## FALSE 1 2
## TRUE 0 0
##
## Dessert, Late Night, Lunch, Dinner, Breakfast, Brunch
## FALSE 1
## TRUE 0
##
## Dessert, Lunch Dessert, Lunch, Breakfast
## FALSE 3 2
## TRUE 1 0
##
## Dessert, Lunch, Breakfast, Brunch Dessert, Lunch, Brunch
## FALSE 2 1
## TRUE 0 0
##
## Dessert, Lunch, Dinner Dinner Dinner, Breakfast, Brunch
## FALSE 5 285 1
## TRUE 0 83 0
##
## Dinner, Brunch Late Night Late Night, Breakfast Late Night, Dinner
## FALSE 8 57 2 46
## TRUE 2 13 1 9
##
## Late Night, Dinner, Brunch Late Night, Lunch
## FALSE 1 4
## TRUE 0 0
##
## Late Night, Lunch, Breakfast Late Night, Lunch, Brunch
## FALSE 1 1
## TRUE 0 0
##
## Late Night, Lunch, Dinner Late Night, Lunch, Dinner, Breakfast
## FALSE 23 1
## TRUE 3 0
##
## Lunch Lunch, Breakfast Lunch, Breakfast, Brunch Lunch, Brunch
## FALSE 183 32 5 1
## TRUE 15 2 0 1
##
## Lunch, Dinner Lunch, Dinner, Breakfast
## FALSE 161 1
## TRUE 32 1
##
## Lunch, Dinner, Breakfast, Brunch Lunch, Dinner, Brunch
## FALSE 3 2
## TRUE 0 1
table(restlist_all_unique$inyipit, restlist_all_unique$reviewscrambiance)
##
## 0 Casual Casual, Intimate Classy Classy, Casual
## FALSE 1190 375 6 25 4
## TRUE 44 66 3 6 3
##
## Classy, Casual, Intimate Classy, Intimate Classy, Trendy
## FALSE 1 3 3
## TRUE 0 3 1
##
## Classy, Trendy, Upscale Classy, Trendy, Upscale, Casual
## FALSE 2 1
## TRUE 0 0
##
## Classy, Trendy, Upscale, Touristy, Casual Classy, Upscale
## FALSE 0 12
## TRUE 1 1
##
## Classy, Upscale, Casual Divey Divey, Casual Hipster
## FALSE 0 28 22 9
## TRUE 1 4 4 2
##
## Hipster, Casual Hipster, Divey Hipster, Divey, Casual
## FALSE 9 1 1
## TRUE 6 0 0
##
## Hipster, Romantic, Classy, Trendy, Intimate Hipster, Trendy
## FALSE 1 1
## TRUE 0 1
##
## Hipster, Trendy, Casual Hipster, Trendy, Casual, Intimate
## FALSE 5 1
## TRUE 1 0
##
## Hipster, Trendy, Intimate Intimate Romantic Romantic, Casual
## FALSE 1 6 2 3
## TRUE 0 2 2 1
##
## Romantic, Casual, Intimate Romantic, Classy
## FALSE 2 7
## TRUE 1 0
##
## Romantic, Classy, Casual, Intimate Romantic, Classy, Intimate
## FALSE 0 1
## TRUE 2 3
##
## Romantic, Classy, Trendy, Casual
## FALSE 1
## TRUE 0
##
## Romantic, Classy, Trendy, Intimate
## FALSE 2
## TRUE 0
##
## Romantic, Classy, Trendy, Upscale Romantic, Classy, Upscale
## FALSE 2 4
## TRUE 0 0
##
## Romantic, Classy, Upscale, Intimate Romantic, Intimate
## FALSE 1 12
## TRUE 0 2
##
## Romantic, Touristy, Casual, Intimate Romantic, Trendy
## FALSE 1 1
## TRUE 0 0
##
## Romantic, Trendy, Casual, Intimate Romantic, Upscale
## FALSE 1 0
## TRUE 0 1
##
## Romantic, Upscale, Intimate Touristy Touristy, Casual Trendy
## FALSE 1 6 2 53
## TRUE 0 0 0 20
##
## Trendy, Casual Trendy, Upscale Upscale Upscale, Casual
## FALSE 16 2 4 1
## TRUE 5 2 1 0
vars <- c("pricepoint", "reviewscrreviews", "reviewscrgoodforgroups", "reviewscrgoodforkids",
"reviewscrwaiter", "american", "european", "asian", "bar")
temp <- na.omit(select(restlist_all_unique, reviewscrrating.num, inyipit, pricepoint,
reviewscrreviews, reviewscrgoodforgroups, reviewscrgoodforkids, reviewscrwaiter,
american, european, asian, bar))
temp$reviewscrgoodforgroups <- factor(temp$reviewscrgoodforgroups)
temp$reviewscrgoodforkids <- factor(temp$reviewscrgoodforkids)
temp$reviewscrwaiter <- factor(temp$reviewscrwaiter)
temp$european <- as.integer(temp$european)
temp$asian <- as.integer(temp$asian)
temp$inyipit <- as.integer(temp$inyipit)
# temp[temp == '0']<- NA
sum(is.na(temp))
## [1] 0
We compute the L1 statistic, as well as several unidimensional measures of imbalance, via the imbalance function. In our running example:
imbalance(group = temp$inyipit, data = temp, drop = c("reviewscrrating.num",
"inyipit"))
##
## Multivariate Imbalance Measure: L1=0.633
## Percentage of local common support: LCS=19.4%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## pricepoint -0.37856 (diff) 0.31368 0 -1 -1 0 0
## reviewscrreviews -51.70169 (diff) 0.00000 0 -33 -58 -55 586
## reviewscrgoodforgroups 29.28125 (Chi2) 0.19283 NA NA NA NA NA
## reviewscrgoodforkids 26.24321 (Chi2) 0.14586 NA NA NA NA NA
## reviewscrwaiter 61.26229 (Chi2) 0.29943 NA NA NA NA NA
## american -0.01355 (diff) 0.01355 0 0 0 0 0
## european -0.09178 (diff) 0.09178 0 0 0 0 0
## asian -0.06071 (diff) 0.06071 0 0 0 0 0
## bar -0.03560 (diff) 0.03560 0 0 0 0 0
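As a rough guide to reading the multivariate L1 value, here is a base R sketch that computes it by hand on toy data. The data frame and the helper name l1_imbalance are made up for illustration. The sketch assumes the covariates are already coarsened, whereas imbalance() picks bins for continuous variables itself, so its numbers will not be reproduced exactly. L1 is half the sum, over all cells of the cross-tabulated covariates, of the absolute difference between the treated and control relative frequencies: 0 means identical distributions, 1 means no overlap.

```r
# Toy data (hypothetical): treatment flag and two coarsened covariates
toy <- data.frame(
  treated = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  price   = c("low", "low", "high", "high", "low", "low", "low", "low", "high", "low"),
  waiter  = c("yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no")
)

# Hypothetical helper: multivariate L1 on already-coarsened covariates
l1_imbalance <- function(treat, covars) {
  cell <- interaction(covars, drop = TRUE)      # one label per multivariate cell
  f <- prop.table(table(cell[treat == 1]))      # treated relative frequencies
  g <- prop.table(table(cell[treat == 0]))      # control relative frequencies
  0.5 * sum(abs(f - g))
}

l1 <- l1_imbalance(toy$treated, toy[c("price", "waiter")])
l1
```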
res <- cem(treatment = "inyipit", data = temp, drop = "reviewscrrating.num",
eval.imbalance = TRUE, keep.all = TRUE)
res
## G0 G1
## All 1354 184
## Matched 783 161
## Unmatched 571 23
##
##
## Multivariate Imbalance Measure: L1=0.353
## Percentage of local common support: LCS=50.9%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## pricepoint -2.220e-16 (diff) 1.388e-17 0 0 0 0 0
## reviewscrreviews -1.076e+01 (diff) 0.000e+00 0 -17 -14 -11 27
## reviewscrgoodforgroups 6.640e+00 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscrgoodforkids 1.525e+01 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscrwaiter 3.328e+01 (Chi2) 0.000e+00 NA NA NA NA NA
## american 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
## european -2.776e-17 (diff) 0.000e+00 0 0 0 0 0
## asian -2.776e-17 (diff) 1.388e-17 0 0 0 0 0
## bar 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
levels(temp$reviewscrgoodforgroups)
## [1] "0" "No" "Yes"
levels(temp$reviewscrgoodforkids)
## [1] "0" "No" "Yes"
levels(temp$reviewscrwaiter)
## [1] "0" "No" "Yes"
For categorical variables, we can use the ordered() function to convert factors into ordered factors. Open question: for which variables should we do this?
# example: q1.ord <- ordered(Le$q1, levels=c('strongly
# disagree','disagree','neutral','agree','strongly agree'), exclude='no
# opinion')
Or we can define the grouping ourselves:
# example: q1.grp <- list(c('strongly agree', 'agree'), c('neutral', 'no
# opinion'),c('strongly disagree', 'disagree'))
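In the LeLonde example, the discretization of years of education corresponds to different levels of school: grade school (0–6), middle school (7–8), high school (9–12), college (13–16), and graduate school (> 16). Here we look for analogous cutpoints for the number of reviews, starting from its distribution: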
table(temp$reviewscrreviews)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 106 61 45 38 40 37 33 22 23 22 16 25 22 25 12
## 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## 10 14 16 14 15 6 16 12 6 10 10 9 11 14 10
## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
## 13 10 8 7 11 6 7 9 6 6 8 11 3 7 6
## 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 9 7 6 8 5 8 7 7 5 5 7 7 3 6 3
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## 6 1 6 6 2 11 5 8 3 3 7 8 3 1 8
## 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## 4 5 2 3 3 2 8 2 5 5 3 3 10 4 4
## 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105
## 2 7 3 4 1 2 8 4 8 3 5 1 5 2 1
## 106 107 108 109 110 111 112 113 114 115 116 118 119 120 121
## 7 1 5 5 6 7 7 7 7 9 3 2 7 2 4
## 122 123 124 125 126 127 128 129 130 131 133 134 135 136 137
## 6 4 6 4 1 1 2 4 4 6 5 3 4 2 5
## 138 140 141 142 143 145 146 147 149 151 152 153 154 155 156
## 3 3 2 4 1 2 3 3 2 5 3 2 1 1 3
## 157 158 159 160 161 163 164 165 167 168 169 170 172 173 174
## 1 2 2 3 3 4 1 1 4 2 2 1 1 1 4
## 175 176 177 178 179 180 181 183 185 186 187 189 190 191 193
## 2 2 2 3 1 1 3 2 1 4 1 3 1 1 1
## 194 195 197 198 201 202 203 204 205 206 207 208 210 213 214
## 1 1 2 2 1 3 3 2 2 3 4 4 4 2 1
## 216 217 218 219 220 221 223 224 226 227 228 230 231 232 233
## 2 1 1 1 2 1 1 2 1 3 2 1 1 1 3
## 236 240 241 242 243 244 245 247 248 252 253 254 257 258 259
## 2 2 2 1 2 1 1 3 1 1 3 1 1 2 2
## 260 262 265 266 271 272 274 275 277 279 281 282 283 286 287
## 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1
## 288 291 292 294 295 297 298 299 300 302 303 306 308 310 311
## 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1
## 313 315 317 320 321 324 326 328 335 338 344 346 347 348 350
## 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 351 362 365 367 368 373 378 381 386 388 392 400 402 406 409
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
## 417 422 425 430 456 459 464 468 470 479 484 485 497 515 517
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 526 528 533 539 554 562 565 566 579 589 626 631 640 651 662
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 669 678 709 719 720 768 815 839 844 968 1377 1455 1619 1963
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1
qplot(data = temp, reviewscrreviews, geom = "histogram", fill = inyipit) + xlim(1,
100)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## Error: 'x' and 'units' must have length > 0
Define the cutpoints:
reviewscrreviewsCut <- c(1, 9.1, 38.1, 113.1, 1963)
Re-run the matching:
res1 <- cem(treatment = "inyipit", data = temp, drop = "reviewscrrating.num",
cutpoints = list(reviewscrreviews = reviewscrreviewsCut))
res1
## G0 G1
## All 1354 184
## Matched 677 161
## Unmatched 677 23
# Let's compare
res$breaks$reviewscrreviews
## [1] 1.0 179.4 357.7 536.1 714.5 892.8 1071.2 1249.5 1427.9 1606.3
## [11] 1784.6 1963.0
res1$breaks$reviewscrreviews
## [1] 1.0 9.1 38.1 113.1 1963.0
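To see why the two sets of breaks matter, here is a small base R sketch that bins a few made-up review counts both ways. The counts are hypothetical, and cut() only approximates how cem assigns bins (the handling of values exactly on a boundary is an assumption). The automatic breaks are 12 equally spaced points between 1 and 1963, so nearly every restaurant with fewer than about 180 reviews lands in the same bin, while the custom cutpoints separate small, medium, and large review counts.

```r
reviews <- c(3, 20, 60, 150, 900)               # hypothetical review counts

auto_breaks   <- seq(1, 1963, length.out = 12)  # reproduces res$breaks$reviewscrreviews
custom_breaks <- c(1, 9.1, 38.1, 113.1, 1963)   # reviewscrreviewsCut

auto_bin   <- cut(reviews, breaks = auto_breaks, include.lowest = TRUE, labels = FALSE)
custom_bin <- cut(reviews, breaks = custom_breaks, include.lowest = TRUE, labels = FALSE)

data.frame(reviews, auto_bin, custom_bin)
```

Finer bins at the low end make matches stricter where most restaurants are, which is consistent with the drop in matched controls from 783 to 677.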
What about relaxing matches?
restab <- relax.cem(res, temp, depth = 1, perc = 0.3)
## Executing 22 different relaxations
##
Using the output from cem, we can estimate SATT via the att function. The simplest approach requires a weighted difference in means (unless k2k was used, in which case no weights are required). For convenience, we compute this as a regression of the outcome variable on a constant and the treatment variable,
att(res, reviewscrrating.num ~ inyipit, data = temp)
##
## G0 G1
## All 1354 184
## Matched 783 161
## Unmatched 571 23
##
## Linear regression model on CEM matched data:
##
## SATT point estimate: -0.026872 (p.value=0.600785)
## 95% conf. interval: [-0.127489, 0.073745]
att(res1, reviewscrrating.num ~ inyipit, data = temp)
##
## G0 G1
## All 1354 184
## Matched 677 161
## Unmatched 677 23
##
## Linear regression model on CEM matched data:
##
## SATT point estimate: -0.061423 (p.value=0.203177)
## 95% conf. interval: [-0.155952, 0.033106]
or include a more complex model:
summary(m.model1 <- att(res, reviewscrrating.num ~ inyipit * pricepoint + reviewscrreviews +
reviewscrgoodforgroups + reviewscrgoodforkids + reviewscrwaiter + american +
european + asian + bar, data = temp))
##
## Treatment effect estimation for data:
##
## G0 G1
## All 1354 184
## Matched 783 161
## Unmatched 571 23
##
## Linear regression model estimated on matched data only
##
## Coefficients:
## Estimate Std. Error t value p-value
## (Intercept) 2.699062 0.131637 20.50 < 2e-16 ***
## inyipit 0.364090 0.165923 2.19 0.0285 *
## pricepoint 0.197989 0.048830 4.05 5.4e-05 ***
## reviewscrreviews 0.000851 0.000196 4.34 1.6e-05 ***
## reviewscrgoodforgroupsNo 0.521378 0.123674 4.22 2.7e-05 ***
## reviewscrgoodforgroupsYes 0.429280 0.120931 3.55 0.0004 ***
## reviewscrgoodforkidsNo -0.084496 0.188871 -0.45 0.6547
## reviewscrgoodforkidsYes 0.058372 0.181015 0.32 0.7472
## reviewscrwaiterNo -0.092230 0.204304 -0.45 0.6518
## reviewscrwaiterYes -0.291434 0.210521 -1.38 0.1666
## american 0.006222 0.056789 0.11 0.9128
## european 0.081945 0.053991 1.52 0.1294
## asian -0.165354 0.055685 -2.97 0.0031 **
## bar 0.017077 0.053775 0.32 0.7509
## inyipit:pricepoint -0.206475 0.081686 -2.53 0.0116 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m.model2 <- att(res1, reviewscrrating.num ~ inyipit + pricepoint + reviewscrreviews +
reviewscrgoodforgroups + reviewscrgoodforkids + reviewscrwaiter + american +
european + asian + bar, data = temp))
##
## Treatment effect estimation for data:
##
## G0 G1
## All 1354 184
## Matched 677 161
## Unmatched 677 23
##
## Linear regression model estimated on matched data only
##
## Coefficients:
## Estimate Std. Error t value p-value
## (Intercept) 2.990962 0.125199 23.89 < 2e-16 ***
## inyipit -0.060133 0.045700 -1.32 0.1886
## pricepoint 0.128223 0.045935 2.79 0.0054 **
## reviewscrreviews 0.000733 0.000110 6.64 5.7e-11 ***
## reviewscrgoodforgroupsNo 0.731882 0.120876 6.05 2.1e-09 ***
## reviewscrgoodforgroupsYes 0.679881 0.116619 5.83 8.0e-09 ***
## reviewscrgoodforkidsNo -0.219102 0.182509 -1.20 0.2303
## reviewscrgoodforkidsYes -0.183155 0.175066 -1.05 0.2958
## reviewscrwaiterNo -0.209767 0.197228 -1.06 0.2878
## reviewscrwaiterYes -0.457910 0.204105 -2.24 0.0251 *
## american -0.046376 0.051917 -0.89 0.3720
## european 0.051803 0.052948 0.98 0.3282
## asian -0.119551 0.054775 -2.18 0.0293 *
## bar -0.018993 0.054319 -0.35 0.7267
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
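The weighted difference in means behind the simplest att() call can be sketched in base R. The toy data below are made up. The weight formula (1 for treated units; for controls, the ratio of treated to control counts within the stratum, rescaled by the overall control-to-treated ratio) is our reading of the cem documentation, so treat it as an assumption rather than the package's exact code. The sketch shows that a weighted regression of the outcome on a constant and the treatment gives the same number as the weighted difference in means.

```r
# Toy matched sample (hypothetical): stratum id, treatment flag, outcome
toy <- data.frame(
  stratum = c(1, 1, 1, 2, 2, 2, 2),
  treated = c(1, 0, 0, 1, 1, 0, 0),
  y       = c(4.0, 3.5, 3.0, 4.5, 3.5, 4.0, 3.0)
)

mT <- sum(toy$treated == 1)                            # matched treated units
mC <- sum(toy$treated == 0)                            # matched control units
nT_s <- ave(toy$treated, toy$stratum, FUN = sum)       # treated count in own stratum
nC_s <- ave(1 - toy$treated, toy$stratum, FUN = sum)   # control count in own stratum

# Assumed CEM-style weights: 1 for treated, rescaled stratum ratio for controls
toy$w <- ifelse(toy$treated == 1, 1, (nT_s / nC_s) * (mC / mT))

# Weighted regression on a constant and the treatment variable
fit <- lm(y ~ treated, data = toy, weights = w)
satt_reg <- unname(coef(fit)["treated"])

# Same quantity as a weighted difference in means
satt_diff <- with(toy, weighted.mean(y[treated == 1], w[treated == 1]) -
                       weighted.mean(y[treated == 0], w[treated == 0]))

c(regression = satt_reg, difference = satt_diff)
```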
We now want to match the restaurants using time in an LME model instead of a simple linear model. The idea is that we need a 6-month treatment/no-treatment panel where the dependent variables are the average ratings of the reviews in each period.
## 1. Restrict yipitall to contain only restaurants that have restaurant
## reviews
yipitall.inyelp <- yipitall
restlist.inyipit <- subset(restlist_all_unique, inyipit)
# restlist.inyipit has the restlist of restaurants that are in yipit
yipitall.inyelp$inyelp <- (yipitall$Phone %in% restlist.inyipit$restphones)
yipitall.inyelp <- (yipitall.inyelp[yipitall.inyelp$inyelp == TRUE, ]) #317 deals with
summarise(yipitall.inyelp, count = n_distinct(Phone))
## count
## 1 189
length(unique(yipitall.inyelp$Phone)) # 189 unique restaurants in the deals set
## [1] 189
# 317 - 189 = 128 deals with repeated restaurants
## 2. Create Time Periods of 2, 4 weeks for both the start and end date of
## the deal
periodLength <- 2 #meaning 2 weeks
# min time (earliest deal date)
minDate <- min(min(yipitall.inyelp$Date.Added.Num), min(yipitall.inyelp$Date.Ended.Num))
yipitall.inyelp$Date.Added.Per <- yipitall.inyelp$Date.Added.Num - minDate +
1
yipitall.inyelp$Date.Added.Per <- ceiling(yipitall.inyelp$Date.Added.Per/(7 *
periodLength))
yipitall.inyelp$Date.Ended.Per <- yipitall.inyelp$Date.Ended.Num - minDate +
1
yipitall.inyelp$Date.Ended.Per <- ceiling(yipitall.inyelp$Date.Ended.Per/(7 *
periodLength))
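The period arithmetic above can be checked on a handful of dates. This is a minimal base R sketch with made-up dates; the only thing taken from the analysis is the formula itself. Day 1 is the earliest date, and every block of 7 * periodLength days gets the next period index.

```r
periodLength <- 2   # weeks per period, as above

# Hypothetical deal dates, converted to days since 1970-01-01
dates <- as.numeric(as.Date(c("2011-12-16", "2011-12-29", "2011-12-30", "2012-01-20")))
minDate <- min(dates)

# Days since the first date (1-based), then grouped into 14-day periods
period <- ceiling((dates - minDate + 1) / (7 * periodLength))

data.frame(date = as.Date(dates, origin = "1970-01-01"), period)
```

The first 14 days (offsets 0 to 13) fall in period 1, so a deal added on day 14 after the start opens period 2.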
# Let's look at deal duration by Deal Company
yipitall.inyelp$Date.Duration <- yipitall.inyelp$Date.Ended.Num - yipitall.inyelp$Date.Added.Num
yipitall.inyelp$Date.Duration.Weeks <- yipitall.inyelp$Date.Duration/7
qplot(data = yipitall.inyelp, y = Date.Duration.Weeks, x = Site, geom = "boxplot") +
ylim(0, 4)
## Warning: Removed 3 rows containing non-finite values (stat_boxplot).
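# let's summarize this (plyr was attached after dplyr, so summarise() here is
# plyr::summarise, which ignores group_by() and returns one overall mean)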
summarise(group_by(yipitall.inyelp, Site), duration = mean(Date.Duration))
## duration
## 1 4.984
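The single row above is an overall mean, not a per-site summary: the loading messages show that plyr's summarise masks dplyr's, and plyr's version does not use the grouping. A per-site table can be sketched in base R without depending on which package wins. The site names and durations below are hypothetical.

```r
# Toy deals table (hypothetical site names and durations in days)
deals <- data.frame(
  Site = c("SiteA", "SiteA", "SiteB", "SiteB", "SiteB"),
  Date.Duration = c(2, 4, 7, 7, 10)
)

overall <- mean(deals$Date.Duration)   # what an ungrouped summarise returns
by_site <- aggregate(Date.Duration ~ Site, data = deals, FUN = mean)

overall
by_site

# With both packages attached, the explicit call would be:
# dplyr::summarise(dplyr::group_by(deals, Site), duration = mean(Date.Duration))
```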
## 3. Let's now create the matrix of deals in time periods
# how many periods do we have
maxPeriods <- max(max(yipitall.inyelp$Date.Added.Per), max(yipitall.inyelp$Date.Ended.Per))
Each restaurant is repeated over the 15 periods (15 rows per restaurant).
# Copy restlist_all_unique for the CEM analysis. It will end up melted
restlist_melt <- restlist_all_unique
# Create matrix of 2000 x 15 (periods)
matrixPeriods1 <- matrix(0, nrow(restlist_all_unique), maxPeriods)
# bind the restlist and the matrix
restlist_melt <- cbind(restlist_melt, matrixPeriods1)
# melt the data
restlist_melt <- melt(restlist_melt, names(restlist_melt)[1:49])
names(restlist_melt)[names(restlist_melt) == "variable"] <- "Period"
names(restlist_melt)[names(restlist_melt) == "restphones"] <- "Phone"
# restlist_melt$value <- NULL #unsure of whether I should take away these 0
# values
# Can I merge with my existing PeriodDeal df? Are all my PeriodDeal phones
# in the restlist?
sum(periodDeals$Phone %in% restlist_melt$Phone)
## [1] 2790
# so all my phones in periodDeals are in the restlist_melt How many deals
# are in the PeriodDeal? That way I can verify I still have those deals.
sum(periodDeals$numDeals) #414, so I should have this many deals after I merge
## [1] 414
# Now I can merge with the data of PeriodDeals that was already calculated
restlist_melt <- (merge(x = restlist_melt, y = periodDeals, by = c("Phone",
"Period"), all = TRUE))
sum(restlist_melt$numDeals, na.rm = TRUE) #414 as expected
## [1] 414
sum(unique(restlist_melt$Phone[!is.na(restlist_melt$numDeals)]) %in% unique(periodDeals$Phone)) #186 restaurants with deals
## [1] 186
# Let's look at one restaurant View(restlist_melt[restlist_melt$Phone ==
# '2022231818' | restlist_melt$Phone =='2022341969',]) Seems to have worked.
# However it added NAs to the numDeals column since there were no deals. We
# must replace this by 0
restlist_melt$oldNumDeals <- restlist_melt$numDeals # If oldNumDeals == 0, there was no deal for that Yipit restaurant in period i. If oldNumDeals is NA, the restaurant is not in Yipit.
restlist_melt$numDeals[is.na(restlist_melt$numDeals)] <- 0
restlist_melt$numDealsTr <- restlist_melt$numDeals > 0
sum(restlist_melt$numDealsTr) # 363 treatment YES rows out of 30315
## [1] 363
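The merge-then-fill pattern used here can be sketched on a tiny panel. Everything in the sketch (phone ids, periods, deal counts) is hypothetical. It shows the same three checks as above: the deal total survives the merge, missing combinations come back as NA, and the NAs are turned into 0 before building the treatment flag.

```r
# Toy restaurant x period panel (hypothetical ids): 3 restaurants, 3 periods
panel <- expand.grid(Phone = c("p1", "p2", "p3"), Period = 1:3,
                     stringsAsFactors = FALSE)

# Toy deal counts, only for restaurant-periods that had a deal
deals <- data.frame(Phone = c("p1", "p1", "p3"),
                    Period = c(1L, 3L, 2L),
                    numDeals = c(2, 1, 1))

merged <- merge(x = panel, y = deals, by = c("Phone", "Period"), all = TRUE)

# Sanity check: no deals lost or duplicated by the merge
stopifnot(sum(merged$numDeals, na.rm = TRUE) == sum(deals$numDeals))

# No deal recorded means zero deals; then flag treated rows
merged$numDeals[is.na(merged$numDeals)] <- 0
merged$numDealsTr <- merged$numDeals > 0

merged
```

Checking nrow(merged) against the size of the panel is a cheap extra guard: with all = TRUE, a phone or period in the deals table that is missing from the panel would silently add rows.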
# restrict restlist to only ID columns
restlist_temp <- restlist_all_unique[c("restnames", "restphones", "restaddresses")]
names(restlist_temp)[1] <- "reviewscrname"
names(restlist_temp)[2] <- "Phone"
names(restlist_temp)[3] <- "reviewscraddress"
nrow(reviewsneighborhood_all) #143745
## [1] 143745
reviews.subset <- merge(x = reviewsneighborhood_all, y = restlist_temp, by = c("reviewscrname",
"reviewscraddress"), all.x = FALSE, all.y = FALSE)
nrow(reviews.subset) #all.x TRUE = 143824 FALSE= 134835
## [1] 134835
## all.x= FALSE means keep only the ones where we have a match and a phone.
## Probably a good idea even if we lose some reviews.
length(unique(reviews.subset$Phone)) #all.x TRUE = 1535 FALSE = 1534
## [1] 1534
length(unique(reviews.subset$reviewscrname)) # 1314 unique names with all.x = FALSE
## [1] 1314
# test cases: 1. restaurant with phone AMS
reviews.subset[1, ]
## reviewscrname reviewscraddress
## 1 14 K Restaurant & Lounge 1001 14th St NW, Washington, DC 20005
## reviewsids reviewsusername
## 1 review_BmCtCQT2cYya3IkYjfhIMg \n\t\t\t\t\t\talex m.\n\t\t\t\t
## reviewslocation reviewsfriendscount reviewscount reviewsrating
## 1 Norfolk, VA 4 friends 68 reviews 4
## reviewsdates reviewscheckin
## 1 9/11/2012 0
## reviewscomments
## 1 Excellent food... Loved my burger.Staff was very friendly, helpful, and attentive.Will go back.
## reviewsuseful reviewsfunny reviewscool numdate Phone
## 1 15594 2022187575
restlist_all_unique[restlist_all_unique$restnames == "Amsterdam Falafelshop",
2:5]
## restnames restlinks
## 4350 Amsterdam Falafelshop /biz/amsterdam-falafelshop-washington
## restaddresses restphones
## 4350 2425 18th St NW, Washington, DC 20009 2022341969
# Phone should match
# 2. restaurant without a phone in restlist (take away those reviews)
restlist_all_unique[restlist_all_unique$restnames == "Alero Restaurant", 2:5]
## restnames restlinks
## 4085 Alero Restaurant /biz/alero-restaurant-washington-4
## 5043 Alero Restaurant /biz/alero-restaurant-washington
## restaddresses restphones
## 4085 1301 U St NW, Washington, DC 20009 2024622322
## 5043 3500 Connecticut Ave NW, Washington, DC 20008 2029662530
# This restaurant name covers 3 different restaurants: 2 with phones and 1
# without.
# View(reviews.subset[reviews.subset$reviewscrname == 'Alero Restaurant',])
# What we get is the rows for which a phone was found, and not the
# rows/reviews for which there was no match in the restlist.
# Blank phones: View(reviews.subset[reviews.subset$Phone =='',])
# This is for the crepes at the market.
## 2. Now let's get those dependent variables. The objective is to have a time
## series of the new reviews and the avg. rating in each time period.
# a. Compare the number of unique restaurants in both lists
# How many unique restaurants in restlist_melt?
length(unique(restlist_melt$Phone)) #2021
## [1] 2021
length(unique(restlist_melt$restnames)) #1744
## [1] 1744
length(unique(restlist_melt$restaddresses)) #1688
## [1] 1688
# How many unique restaurants in our reviews subset?
length(unique(reviews.subset$Phone)) #1534
## [1] 1534
length(unique(reviews.subset$reviewscrname)) #1314
## [1] 1314
length(unique(reviews.subset$reviewscraddress)) #1357
## [1] 1357
# Seems we should keep merging at the Phone level.
# Must make sure both tables have the same phones for the merge to work well.
# Find the minimum date for the deals:
# minDate <- 15324 # it's what's calculated above.
# b. Create time period column: express each review date as a period number.
reviews.subset$reviewsdatesNum <- as.numeric(as.Date(reviews.subset$reviewsdates,
"%m/%d/%Y"))
reviews.subset$reviewdatesPer <- reviews.subset$reviewsdatesNum - minDate +
1
reviews.subset$reviewdatesPer <- ceiling(reviews.subset$reviewdatesPer/(7 *
periodLength))
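The period arithmetic above can be checked on a few made-up dates. This is only a sketch: minDate is the value quoted in the comment above, while periodLength = 4 weeks and the toy dates are assumptions chosen for illustration, not values taken from the project.

```r
# minDate is quoted in the comment above; periodLength is an ASSUMED value.
minDate <- 15324       # days since 1970-01-01
periodLength <- 4      # weeks per period (hypothetical)

# Hypothetical review dates in the same m/d/Y format as reviewsdates
toyDates <- c("12/15/2011", "12/16/2011", "1/12/2012", "1/13/2012")
toyNum <- as.numeric(as.Date(toyDates, "%m/%d/%Y"))

# Same formula as above: shift so that minDate is day 1, then bucket
toyPer <- ceiling((toyNum - minDate + 1) / (7 * periodLength))
print(data.frame(toyDates, toyNum, toyPer))
```

A date one day before minDate lands in period 0, which is why the later filter keeps only periods greater than 0.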
# 1. Save the reviews.inyipit.subset into another variable so that we can
# compute before-deal summaries, before we remove the reviews before/after
# the deal period:
# reviews.subset.temp <- reviews.subset
nrow(reviews.subset) #134835
## [1] 134835
# c. Discard the negative periods and the ones over our yipit dataset
reviews.subset <- subset(reviews.subset, reviewdatesPer > 0)
reviews.subset <- subset(reviews.subset, reviewdatesPer <= maxPeriods)
nrow(reviews.subset) #23169
## [1] 23169
# Hence by discarding the reviews before and after deal periods we lose
# 111666 reviews
detach("package:dplyr", unload = TRUE)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, desc, failwith, id, mutate, summarise, summarize
##
## The following object is masked from 'package:nlme':
##
## collapse
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# d. Use group by to summarize
reviews.subset <- group_by(reviews.subset, Phone, reviewdatesPer)
reviews.subset.summary <- summarize(reviews.subset, meanRating = mean(reviewsrating),
numReviews = n(), sdRating = sd(reviewsrating))
names(reviews.subset.summary)[2] <- "Period"
# View(subset(reviews.subset, Phone == '' & reviewdatesPer == 1)) Seems to
# have worked
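A minimal sketch of the same group-and-summarize step on made-up data (the toy data frame and its values are assumptions, not project data). Calling the dplyr functions with an explicit namespace is one way to sidestep the plyr masking that the detach/reload above works around. Note that a group with a single review gets sdRating = NA.

```r
library(dplyr)

# Toy reviews (hypothetical values)
toyReviews <- data.frame(
  Phone = c("A", "A", "A", "B"),
  reviewdatesPer = c(1, 1, 2, 1),
  reviewsrating = c(4, 2, 5, 3),
  stringsAsFactors = FALSE
)

# One row per Phone x period, as in the summary above
toyGrouped <- dplyr::group_by(toyReviews, Phone, reviewdatesPer)
toySummary <- dplyr::summarise(toyGrouped,
  meanRating = mean(reviewsrating),
  numReviews = n(),
  sdRating = sd(reviewsrating))
toySummary <- as.data.frame(toySummary)
print(toySummary)
```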
# e. Merge with molten restlist
length(unique(reviews.subset.summary$Phone)) #1263
## [1] 1263
length(unique(restlist_melt$Phone)) #2021
## [1] 2021
# 2021-1263 = 758 restaurants without any reviews at any of the deal
# periods. They will get NAs in the DVs: numReviews, Ratings, Std dev.
# Those are okay as well.
sum(unique(reviews.subset.summary$Phone) %in% restlist_melt$Phone) #1263
## [1] 1263
# all the reviews have a phone match in the restlist as by design
restlist_melt_rating <- (merge(restlist_melt, reviews.subset.summary, by = c("Phone",
"Period"), all = TRUE))
# The NAs in the numReviews means that they did not have any reviews coming
# in for that period. We should set those to 0.
sum(is.na(restlist_melt_rating$numReviews)) #20918 out of 30315 or 9397 periods where there are reviews coming
## [1] 20918
restlist_melt_rating$numReviews[is.na(restlist_melt_rating$numReviews)] <- 0
sum(restlist_melt_rating$numReviews == 0) #20918
## [1] 20918
# However, leave meanRating and SDRating as NAs since they are really NAs.
tail(restlist_melt_rating[c("Phone", "restnames", "numDeals", "meanRating",
"numReviews")])
## Phone restnames numDeals meanRating numReviews
## 30310 9144892897 CapMac 0 3.333 3
## 30311 9144892897 CapMac 0 4.000 1
## 30312 9144892897 CapMac 0 4.000 3
## 30313 9144892897 CapMac 0 3.500 2
## 30314 9144892897 CapMac 0 NA 0
## 30315 9144892897 CapMac 0 NA 0
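The merge-and-fill logic above can be sketched on toy data. The two small data frames below are hypothetical stand-ins for restlist_melt and reviews.subset.summary; only the column names follow the real tables.

```r
# Hypothetical restaurant-period panel: 2 restaurants x 2 periods
toyRest <- data.frame(
  Phone = c("A", "A", "B", "B"),
  Period = c(1L, 2L, 1L, 2L),
  numDeals = c(0, 1, 0, 0),
  stringsAsFactors = FALSE
)
# Hypothetical review summary: only restaurant A, period 2 has reviews
toySum <- data.frame(Phone = "A", Period = 2L, meanRating = 4.5,
                     numReviews = 2, stringsAsFactors = FALSE)

# Full outer join on Phone and Period, as above
toyMerged <- merge(toyRest, toySum, by = c("Phone", "Period"), all = TRUE)

# No reviews in a period means a count of 0; meanRating stays NA
toyMerged$numReviews[is.na(toyMerged$numReviews)] <- 0
print(toyMerged)
```

Only the count is zero-filled, since an average rating over zero reviews is undefined.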
With our new molten data set, we can do a basic matching. That is, we can match on:
1. Restaurant characteristics
2. Previous deal history (never did a deal, did a deal in the last 6 months, did a deal in the last 12 months)
3. Time periods
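As a rough illustration of what exact matching on such variables means, here is a base R sketch on made-up rows. It is not the cem call itself: it builds one stratum per combination of matching variables and keeps only strata that contain both treated and control rows. The toy data frame, its values, and the choice of pricepoint and Period as matching variables are assumptions for illustration.

```r
# Hypothetical restaurant-period rows
toy <- data.frame(
  treated = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  pricepoint = c(2, 2, 1, 3, 3, 4),
  Period = c(1, 1, 1, 2, 2, 2)
)

# One stratum per combination of the matching variables
toy$stratum <- paste(toy$pricepoint, toy$Period, sep = "_")

# Keep only strata containing at least one treated and one control row
hasTreated <- tapply(toy$treated, toy$stratum, any)
hasControl <- tapply(!toy$treated, toy$stratum, any)
keep <- names(hasTreated)[hasTreated & hasControl]
toyMatched <- toy[toy$stratum %in% keep, ]
print(toyMatched)
```

Rows in strata with no counterpart in the other group are dropped, which is the price paid for comparing like with like.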
Compare size of treated and control group
tr <- which(restlist_melt_rating$numDealsTr == 1)
ct <- which(restlist_melt_rating$numDealsTr == 0)
ntr <- length(tr)
nct <- length(ct)
The (unadjusted and therefore likely biased) difference in means is then:
mean(restlist_melt_rating$meanRating[tr], na.rm = TRUE) - mean(restlist_melt_rating$meanRating[ct],
na.rm = TRUE)
## [1] -0.2234
# Per period, the rating is .22 smaller on average for treated restaurants
mean(restlist_melt_rating$numReviews[tr], na.rm = TRUE) - mean(restlist_melt_rating$numReviews[ct],
na.rm = TRUE)
## [1] 1.295
# Per period, the number of reviews is 1.29 higher on average for treated
# restaurants
Because the treatment (numDealsTr) was not randomly assigned, the pre-treatment covariates differ between the treated and control groups. To see this, we focus on these pre-treatment covariates.
We are going to match first on restaurant characteristics.
# What are the treatment variables?
table(restlist_melt_rating$inyipit) #2835 rows where the restaurant is in yipit, i.e. 189 restaurants
##
## FALSE TRUE
## 27480 2835
table(restlist_melt_rating$numDealsTr) #363 restaurant periods where treatment is YES
##
## FALSE TRUE
## 29952 363
# Testing all variables spread
# table(restlist_melt_rating$numDealsTr, restlist_melt_rating$meanRating)
# table(restlist_melt_rating$numDealsTr, restlist_melt_rating$)
###### Good Balance: Early round:
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$Period)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13
## FALSE 2016 1979 1973 1991 2002 2001 1987 1999 1995 1999 2000 1995 1998
## TRUE 5 42 48 30 19 20 34 22 26 22 21 26 23
##
## 14 15
## FALSE 1999 2018
## TRUE 22 3
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$pricepoint)
##
## 1 2 3 4
## FALSE 18151 9416 2000 385
## TRUE 74 229 55 5
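Raw counts are hard to compare when the control group is about 80 times larger than the treated group, so row proportions may be easier to read. The sketch below re-enters the pricepoint counts printed above by hand (the matrix is typed in, not read from the data) and converts each row to shares.

```r
# Counts copied from the pricepoint table above
pricepointCounts <- matrix(
  c(18151, 9416, 2000, 385,
       74,  229,   55,   5),
  nrow = 2, byrow = TRUE,
  dimnames = list(numDealsTr = c("FALSE", "TRUE"),
                  pricepoint = as.character(1:4))
)

# Share of each price point within control (FALSE) and treated (TRUE) rows
pricepointShare <- prop.table(pricepointCounts, margin = 1)
print(round(pricepointShare, 3))
```

On the real data the equivalent would be prop.table(table(restlist_melt_rating$numDealsTr, restlist_melt_rating$pricepoint), 1).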
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrgoodforgroups)
##
## 0 No Yes
## FALSE 10894 4672 14386
## TRUE 26 38 299
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrgoodforkids)
##
## 0 No Yes
## FALSE 11219 7549 11184
## TRUE 31 161 171
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrwaiter)
##
## 0 No Yes
## FALSE 10374 8958 10620
## TRUE 21 72 270
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$american) #84
##
## 0 1
## FALSE 25101 4851
## TRUE 279 84
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$asian) #76
##
## FALSE TRUE
## FALSE 26848 3104
## TRUE 287 76
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$bar) #72
##
## 0 1
## FALSE 26304 3648
## TRUE 291 72
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$european) #70
##
## FALSE TRUE
## FALSE 27127 2825
## TRUE 293 70
#### Still to explore:
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscralcohol)
##
## 0 Beer & Wine Only Full Bar No
## FALSE 11009 1388 9694 7861
## TRUE 31 22 251 59
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrnoiselevel)
##
## 0 Average Loud Quiet Very Loud
## FALSE 13147 10896 1792 3222 895
## TRUE 38 249 23 33 20
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscroutdorseating)
##
## 0 No Yes
## FALSE 10358 10671 8923
## TRUE 22 144 197
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrwheelchair)
##
## 0 No Yes
## FALSE 15061 3223 11668
## TRUE 89 77 197
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrdelivery)
##
## 0 No Yes
## FALSE 11696 14244 4012
## TRUE 34 231 98
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrtakeout)
##
## 0 No Yes
## FALSE 10529 2620 16803
## TRUE 31 50 282
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$yipit.appearances) ## Tells us the deal likelihood of the restaurants in our yipit set. It does not tell us whether restaurants outside our DealPeriods ever did a deal.
##
## 0 1 2 3 4 5 6 7 8 9 10
## FALSE 27480 830 390 307 116 223 111 73 68 39 40
## TRUE 0 100 45 38 19 32 24 17 22 6 5
##
## 11 12 14 17 19 21 22 24 27
## FALSE 59 39 60 27 14 24 28 12 12
## TRUE 16 6 15 3 1 6 2 3 3
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$url.dummy)
##
## FALSE TRUE
## FALSE 13253 16699
## TRUE 37 326
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$africanmiddleeastern) #45
##
## FALSE TRUE
## FALSE 28047 1905
## TRUE 318 45
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$pizza) #32
##
## FALSE TRUE
## FALSE 27839 2113
## TRUE 331 32
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$breakfast) #31
##
## FALSE TRUE
## FALSE 27178 2774
## TRUE 332 31
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$sandwiches) #23
##
## FALSE TRUE
## FALSE 25160 4792
## TRUE 340 23
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$indian) #22
##
## FALSE TRUE
## FALSE 29434 518
## TRUE 341 22
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$latin) #18
##
## FALSE TRUE
## FALSE 28005 1947
## TRUE 345 18
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$southern) #11
##
## FALSE TRUE
## FALSE 29348 604
## TRUE 352 11
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$steakhouse) #7
##
## FALSE TRUE
## FALSE 29569 383
## TRUE 356 7
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$seafood) #6
##
## FALSE TRUE
## FALSE 29268 684
## TRUE 357 6
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$vegetarian) #6
##
## FALSE TRUE
## FALSE 29583 369
## TRUE 357 6
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$foodstands) #1
##
## FALSE TRUE
## FALSE 29248 704
## TRUE 362 1
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$fastfood) #0
##
## FALSE TRUE
## FALSE 29097 855
## TRUE 363 0
###### Bad Balance
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscracceptcredit) #No
##
## 0 No Yes
## FALSE 9527 945 19480
## TRUE 13 0 350
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrwifi) #Paid
##
## 0 Free No Paid
## FALSE 17240 2577 9986 149
## TRUE 85 48 229 1
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrsmoking) #Yes
##
## 0 No Outdoor Area/ Patio Only Yes
## FALSE 27199 1234 1370 149
## TRUE 311 26 25 1
###### Unsure:
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrreviews) #two restaurants where it shows 0 reviews in the restlist, yet it was in our reviews table Primi Piatti, Young Chow
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12
## FALSE 7237 1588 904 675 569 600 551 494 330 345 330 240 373
## TRUE 8 2 11 0 1 0 4 1 0 0 0 0 2
##
## 13 14 15 16 17 18 19 20 21 22 23 24 25
## FALSE 328 366 178 150 207 234 208 221 90 240 180 89 150
## TRUE 2 9 2 0 3 6 2 4 0 0 0 1 0
##
## 26 27 28 29 30 31 32 33 34 35 36 37 38
## FALSE 150 132 160 208 150 194 146 120 105 164 90 104 131
## TRUE 0 3 5 2 0 1 4 0 0 1 0 1 4
##
## 39 40 41 42 43 44 45 46 47 48 49 50 51
## FALSE 88 88 120 163 43 100 89 134 104 88 119 72 118
## TRUE 2 2 0 2 2 5 1 1 1 2 1 3 2
##
## 52 53 54 55 56 57 58 59 60 61 62 63 64
## FALSE 105 105 74 74 97 105 45 86 45 90 15 88 88
## TRUE 0 0 1 1 8 0 0 4 0 0 0 2 2
##
## 65 66 67 68 69 70 71 72 73 74 75 76 77
## FALSE 30 163 74 118 45 40 104 116 44 14 119 60 75
## TRUE 0 2 1 2 0 5 1 4 1 1 1 0 0
##
## 78 79 80 81 82 83 84 85 86 87 88 89 90
## FALSE 30 44 45 30 117 30 75 75 43 40 140 51 60
## TRUE 0 1 0 0 3 0 0 0 2 5 10 9 0
##
## 91 92 93 94 95 96 97 98 99 100 101 102 103
## FALSE 30 102 45 58 15 30 113 58 120 45 72 15 75
## TRUE 0 3 0 2 0 0 7 2 0 0 3 0 0
##
## 104 105 106 107 108 109 110 111 112 113 114 115 116
## FALSE 30 15 99 15 75 72 87 94 105 104 103 135 43
## TRUE 0 0 6 0 0 3 3 11 0 1 2 0 2
##
## 118 119 120 121 122 123 124 125 126 127 128 129 130
## FALSE 24 97 30 58 86 60 87 59 15 15 30 60 60
## TRUE 6 8 0 2 4 0 3 1 0 0 0 0 0
##
## 131 133 134 135 136 137 138 140 141 142 143 145 146
## FALSE 90 74 45 60 27 74 45 45 29 59 15 28 44
## TRUE 0 1 0 0 3 1 0 0 1 1 0 2 1
##
## 147 149 151 152 153 154 155 156 157 158 159 160 161
## FALSE 44 28 75 44 30 15 15 45 15 28 30 43 43
## TRUE 1 2 0 1 0 0 0 0 0 2 0 2 2
##
## 163 164 165 167 168 169 170 172 173 174 175 176 177
## FALSE 59 11 11 60 30 30 15 14 14 60 28 30 28
## TRUE 1 4 4 0 0 0 0 1 1 0 2 0 2
##
## 178 179 180 181 183 185 186 187 189 190 191 193 194
## FALSE 44 15 15 45 29 15 57 15 43 15 14 15 15
## TRUE 1 0 0 0 1 0 3 0 2 0 1 0 0
##
## 195 197 198 201 202 203 204 205 206 207 208 210 213
## FALSE 15 28 30 15 45 44 30 28 45 58 60 60 29
## TRUE 0 2 0 0 0 1 0 2 0 2 0 0 1
##
## 214 216 217 218 219 220 221 223 224 226 227 228 230
## FALSE 15 30 15 13 15 27 15 15 23 15 45 30 15
## TRUE 0 0 0 2 0 3 0 0 7 0 0 0 0
##
## 231 232 233 236 240 241 242 243 244 245 247 248 252
## FALSE 14 15 45 30 30 30 15 30 15 15 43 12 15
## TRUE 1 0 0 0 0 0 0 0 0 0 2 3 0
##
## 253 254 257 258 259 260 262 265 266 271 272 274 275
## FALSE 45 15 15 30 30 14 15 15 15 44 43 15 15
## TRUE 0 0 0 0 0 1 0 0 0 1 2 0 0
##
## 277 279 281 282 283 286 287 288 291 292 294 295 297
## FALSE 15 15 15 15 15 15 15 15 15 15 15 14 15
## TRUE 0 0 0 0 0 0 0 0 0 0 0 1 0
##
## 298 299 300 302 303 306 308 310 311 313 315 317 320
## FALSE 14 15 15 14 15 44 15 15 15 43 13 10 15
## TRUE 1 0 0 1 0 1 0 0 0 2 2 5 0
##
## 321 324 326 328 335 338 344 346 347 348 350 351 362
## FALSE 15 15 10 15 15 15 15 15 15 15 15 15 15
## TRUE 0 0 5 0 0 0 0 0 0 0 0 0 0
##
## 365 367 368 373 378 381 386 388 392 400 402 406 409
## FALSE 15 15 4 15 15 15 15 13 15 15 15 15 27
## TRUE 0 0 11 0 0 0 0 2 0 0 0 0 3
##
## 417 422 425 430 456 459 464 468 470 479 484 485 497
## FALSE 15 14 15 15 15 15 13 15 15 15 15 15 15
## TRUE 0 1 0 0 0 0 2 0 0 0 0 0 0
##
## 515 517 526 528 533 539 554 562 565 566 579 589 626
## FALSE 15 13 15 15 15 15 15 15 13 15 15 15 15
## TRUE 0 2 0 0 0 0 0 0 2 0 0 0 0
##
## 631 640 651 662 669 678 709 719 720 768 815 839 844
## FALSE 13 14 12 15 15 15 12 15 15 15 15 15 15
## TRUE 2 1 3 0 0 0 3 0 0 0 0 0 0
##
## 968 1377 1455 1619 1963
## FALSE 15 13 15 15 15
## TRUE 0 2 0 0 0
##### Requires work
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrcategory) # split categories in the text string
##
## Afghan Afghan, Ethnic Food Afghan, Hookah Bars
## FALSE 30 15 13
## TRUE 0 0 2
##
## Afghan, Persian/Iranian, Indian African African, American (New)
## FALSE 15 30 15
## TRUE 0 0 0
##
## African, Bars African, Ethnic Food African, Portuguese
## FALSE 15 15 30
## TRUE 0 0 0
##
## American (New) American (New), American (Traditional)
## FALSE 967 60
## TRUE 8 0
##
## American (New), Asian Fusion American (New), Bars
## FALSE 30 45
## TRUE 0 0
##
## American (New), Bars, Music Venues American (New), Belgian
## FALSE 15 13
## TRUE 0 2
##
## American (New), Breakfast & Brunch American (New), Breweries
## FALSE 58 15
## TRUE 2 0
##
## American (New), Cajun/Creole, Fast Food
## FALSE 15
## TRUE 0
##
## American (New), Dance Clubs American (New), Diners
## FALSE 13 15
## TRUE 2 0
##
## American (New), French American (New), French, Italian
## FALSE 15 14
## TRUE 0 1
##
## American (New), Ice Cream & Frozen Yogurt
## FALSE 30
## TRUE 0
##
## American (New), Italian American (New), Italian, Mediterranean
## FALSE 15 15
## TRUE 0 0
##
## American (New), Lounges American (New), Lounges, Mediterranean
## FALSE 69 13
## TRUE 6 2
##
## American (New), Music Venues American (New), Pubs, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## American (New), Sandwiches American (New), Seafood
## FALSE 30 28
## TRUE 0 2
##
## American (New), Specialty Food American (New), Steakhouses
## FALSE 13 45
## TRUE 2 0
##
## American (New), Sushi Bars American (New), Tapas/Small Plates
## FALSE 15 15
## TRUE 0 0
##
## American (New), Wine Bars American (Traditional)
## FALSE 30 520
## TRUE 0 5
##
## American (Traditional), American (New)
## FALSE 10
## TRUE 5
##
## American (Traditional), Barbeque, Pubs
## FALSE 15
## TRUE 0
##
## American (Traditional), Bars
## FALSE 45
## TRUE 0
##
## American (Traditional), Bookstores, Sandwiches
## FALSE 15
## TRUE 0
##
## American (Traditional), Breakfast & Brunch
## FALSE 43
## TRUE 2
##
## American (Traditional), Breakfast & Brunch, Coffee & Tea
## FALSE 15
## TRUE 0
##
## American (Traditional), Burgers American (Traditional), Cafes
## FALSE 30 15
## TRUE 0 0
##
## American (Traditional), Chicken Wings, Burgers
## FALSE 9
## TRUE 6
##
## American (Traditional), Coffee & Tea
## FALSE 30
## TRUE 0
##
## American (Traditional), Comfort Food
## FALSE 15
## TRUE 0
##
## American (Traditional), Dive Bars
## FALSE 29
## TRUE 1
##
## American (Traditional), Fast Food
## FALSE 15
## TRUE 0
##
## American (Traditional), Jazz & Blues, Lounges
## FALSE 11
## TRUE 4
##
## American (Traditional), Karaoke
## FALSE 15
## TRUE 0
##
## American (Traditional), Lounges, Breakfast & Brunch
## FALSE 13
## TRUE 2
##
## American (Traditional), Nightlife American (Traditional), Pubs
## FALSE 15 15
## TRUE 0 0
##
## American (Traditional), Pubs, Sports Bars
## FALSE 15
## TRUE 0
##
## American (Traditional), Sandwiches
## FALSE 15
## TRUE 0
##
## American (Traditional), Social Clubs
## FALSE 15
## TRUE 0
##
## American (Traditional), Soul Food
## FALSE 30
## TRUE 0
##
## American (Traditional), Southern, Bars
## FALSE 15
## TRUE 0
##
## American (Traditional), Spanish
## FALSE 15
## TRUE 0
##
## American (Traditional), Sports Bars
## FALSE 15
## TRUE 0
##
## American (Traditional), Steakhouses
## FALSE 12
## TRUE 3
##
## American (Traditional), Wine Bars Asian Fusion
## FALSE 15 85
## TRUE 0 5
##
## Asian Fusion, Japanese Asian Fusion, Lounges
## FALSE 23 15
## TRUE 7 0
##
## Asian Fusion, Pakistani, Indian Asian Fusion, Sushi Bars
## FALSE 15 60
## TRUE 0 0
##
## Asian Fusion, Thai Bagels, Coffee & Tea, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Bakeries, Cafes Bakeries, Caterers, Delis
## FALSE 30 15
## TRUE 0 0
##
## Bakeries, Caterers, Delis, Bagels
## FALSE 15
## TRUE 0
##
## Bakeries, Coffee & Tea, Sandwiches
## FALSE 45
## TRUE 0
##
## Bakeries, Coffee & Tea, Specialty Food, Delis
## FALSE 15
## TRUE 0
##
## Bakeries, French, Coffee & Tea Bakeries, Sandwiches
## FALSE 15 45
## TRUE 0 0
##
## Bakeries, Sandwiches, Coffee & Tea Bakeries, Soup, Sandwiches
## FALSE 15 30
## TRUE 0 0
##
## Bakeries, Vegan, Coffee & Tea Bakeries, Vegetarian Barbeque
## FALSE 15 15 42
## TRUE 0 0 3
##
## Barbeque, Bars
## FALSE 30
## TRUE 0
##
## Barbeque, Breakfast & Brunch, Caterers, Ethnic Food, Sandwiches
## FALSE 11
## TRUE 4
##
## Barbeque, Caribbean Barbeque, Caterers Barbeque, Diners
## FALSE 15 15 15
## TRUE 0 0 0
##
## Barbeque, Food Stands, Street Vendors Barbeque, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Barbeque, Soul Food Bars, American (New)
## FALSE 14 135
## TRUE 1 0
##
## Bars, American (New), American (Traditional)
## FALSE 15
## TRUE 0
##
## Bars, American (New), Barbeque Bars, American (New), Gastropubs
## FALSE 15 15
## TRUE 0 0
##
## Bars, American (New), Social Clubs
## FALSE 15
## TRUE 0
##
## Bars, American (New), Steakhouses Bars, American (Traditional)
## FALSE 15 133
## TRUE 0 2
##
## Bars, American (Traditional), Southern
## FALSE 14
## TRUE 1
##
## Bars, American (Traditional), Venues & Event Spaces
## FALSE 15
## TRUE 0
##
## Bars, Breakfast & Brunch, American (Traditional)
## FALSE 15
## TRUE 0
##
## Bars, Burgers, American (New) Bars, Comfort Food
## FALSE 15 15
## TRUE 0 0
##
## Bars, French, Creperies Bars, Irish, Delis Bars, Italian
## FALSE 15 15 15
## TRUE 0 0 0
##
## Bars, Italian, Greek Bars, Latin American
## FALSE 15 15
## TRUE 0 0
##
## Bars, Pool Halls, American (Traditional) Bars, Pool Halls, Delis
## FALSE 15 15
## TRUE 0 0
##
## Bars, Restaurants Bars, Seafood
## FALSE 15 30
## TRUE 0 0
##
## Bars, Soul Food, American (Traditional) Bars, Sushi Bars
## FALSE 15 15
## TRUE 0 0
##
## Bars, Tapas/Small Plates Beer, Wine & Spirits, Caterers, Delis
## FALSE 15 15
## TRUE 0 0
##
## Beer, Wine & Spirits, Delis
## FALSE 30
## TRUE 0
##
## Beer, Wine & Spirits, Delis, Seafood Markets Belgian
## FALSE 15 30
## TRUE 0 0
##
## Belgian, Breakfast & Brunch, Coffee & Tea
## FALSE 30
## TRUE 0
##
## Belgian, Modern European Belgian, Pubs, American (Traditional)
## FALSE 15 15
## TRUE 0 0
##
## Bookstores, American (New) Brazilian Brazilian, Steakhouses
## FALSE 15 13 15
## TRUE 0 2 0
##
## Breakfast & Brunch Breakfast & Brunch, American (Traditional)
## FALSE 105 60
## TRUE 0 0
##
## Breakfast & Brunch, Asian Fusion, Sandwiches
## FALSE 14
## TRUE 1
##
## Breakfast & Brunch, Cafes
## FALSE 15
## TRUE 0
##
## Breakfast & Brunch, Coffee & Tea, Belgian
## FALSE 15
## TRUE 0
##
## Breakfast & Brunch, Diners, Steakhouses
## FALSE 15
## TRUE 0
##
## Breakfast & Brunch, Irish, Pubs Breakfast & Brunch, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Breakfast & Brunch, Sandwiches, Bakeries
## FALSE 15
## TRUE 0
##
## Breweries, American (New) British, American (New) British, Pubs
## FALSE 15 13 30
## TRUE 0 2 0
##
## Buffets Buffets, Breakfast & Brunch, Sandwiches Buffets, Delis
## FALSE 60 15 15
## TRUE 0 0 0
##
## Buffets, Sandwiches Burgers Burgers, American (Traditional)
## FALSE 15 297 15
## TRUE 0 3 0
##
## Burgers, Bars Burgers, Fast Food Burgers, Sandwiches Burmese
## FALSE 30 15 15 15
## TRUE 0 0 0 0
##
## Butcher, Delis, Sandwiches Cafes Cafes, Bakeries, Desserts
## FALSE 15 180 15
## TRUE 0 0 0
##
## Cafes, Breakfast & Brunch Cafes, Coffee & Tea
## FALSE 14 30
## TRUE 1 0
##
## Cafes, Coffee & Tea, Sandwiches Cafes, Delis Cafes, Ethiopian
## FALSE 15 15 30
## TRUE 0 0 0
##
## Cafes, Ice Cream & Frozen Yogurt Cafes, Portuguese, Bakeries
## FALSE 15 15
## TRUE 0 0
##
## Cafes, Sandwiches Cafes, Sandwiches, American (Traditional)
## FALSE 15 12
## TRUE 0 3
##
## Cajun/Creole Cajun/Creole, Beer, Wine & Spirits, Seafood
## FALSE 30 15
## TRUE 0 0
##
## Cajun/Creole, Breakfast & Brunch
## FALSE 12
## TRUE 3
##
## Cajun/Creole, Breakfast & Brunch, Sandwiches
## FALSE 15
## TRUE 0
##
## Cajun/Creole, Caterers Caribbean Caribbean, Caterers, Dive Bars
## FALSE 15 119 12
## TRUE 0 1 3
##
## Caribbean, Dance Clubs Caribbean, Ethnic Food
## FALSE 15 15
## TRUE 0 0
##
## Caribbean, Soul Food, Lounges Caterers, Delis
## FALSE 14 15
## TRUE 1 0
##
## Caterers, Food Stands, Asian Fusion Cheesesteaks, Food Stands
## FALSE 15 15
## TRUE 0 0
##
## Chicken Wings Chicken Wings, Delis, Cheesesteaks
## FALSE 30 15
## TRUE 0 0
##
## Chicken Wings, Fast Food, Chinese Chinese Chinese, Asian Fusion
## FALSE 15 862 15
## TRUE 0 8 0
##
## Chinese, Asian Fusion, Gluten-Free Chinese, Buffets, Cafes
## FALSE 15 15
## TRUE 0 0
##
## Chinese, Fast Food Chinese, Food Delivery Services
## FALSE 30 15
## TRUE 0 0
##
## Chinese, Korean, Vietnamese Chinese, Lounges Chinese, Pizza
## FALSE 15 15 15
## TRUE 0 0 0
##
## Chinese, Seafood Chinese, Sushi Bars
## FALSE 15 14
## TRUE 0 1
##
## Chinese, Sushi Bars, Asian Fusion Chinese, Thai
## FALSE 30 13
## TRUE 0 2
##
## Chinese, Vegetarian
## FALSE 30
## TRUE 0
##
## Chocolatiers and Shops, Belgian, Breakfast & Brunch
## FALSE 15
## TRUE 0
##
## Coffee & Tea, American (Traditional) Coffee & Tea, Asian Fusion
## FALSE 15 30
## TRUE 0 0
##
## Coffee & Tea, Bagels, Sandwiches Coffee & Tea, Bakeries, Chinese
## FALSE 29 15
## TRUE 1 0
##
## Coffee & Tea, Bakeries, Sandwiches
## FALSE 30
## TRUE 0
##
## Coffee & Tea, Bars, Bakeries, Cafes
## FALSE 15
## TRUE 0
##
## Coffee & Tea, Beer, Wine & Spirits, Sandwiches
## FALSE 15
## TRUE 0
##
## Coffee & Tea, Breakfast & Brunch
## FALSE 15
## TRUE 0
##
## Coffee & Tea, Breakfast & Brunch, Lounges
## FALSE 14
## TRUE 1
##
## Coffee & Tea, Cafes, Caterers Coffee & Tea, Delis
## FALSE 15 45
## TRUE 0 0
##
## Coffee & Tea, Food Stands Coffee & Tea, Korean, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Coffee & Tea, Restaurants Coffee & Tea, Sandwiches
## FALSE 15 30
## TRUE 0 0
##
## Coffee & Tea, Sandwiches, Pizza Coffee & Tea, Soul Food
## FALSE 15 15
## TRUE 0 0
##
## Convenience Stores, Cafes Convenience Stores, Ethiopian
## FALSE 15 15
## TRUE 0 0
##
## Convenience Stores, Restaurants Convenience Stores, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Creperies Creperies, Coffee & Tea Cuban Cuban, Dance Clubs
## FALSE 72 15 15 15
## TRUE 3 0 0 0
##
## Cuban, Latin American Dance Clubs, American (New)
## FALSE 14 29
## TRUE 1 1
##
## Dance Clubs, Bars, American (New) Delis
## FALSE 15 1228
## TRUE 0 2
##
## Delis, Buffets, Caterers Delis, Burgers Delis, Caterers
## FALSE 15 15 45
## TRUE 0 0 0
##
## Delis, Caterers, Sandwiches Delis, Coffee & Tea
## FALSE 15 45
## TRUE 0 0
##
## Delis, Coffee & Tea, Bagels Delis, Greek, Mediterranean
## FALSE 30 15
## TRUE 0 0
##
## Delis, Grocery, Sandwiches Delis, Italian Delis, Korean
## FALSE 14 15 15
## TRUE 1 0 0
##
## Delis, Meat Shops Delis, Sandwiches Delis, Sandwiches, Pizza
## FALSE 15 120 15
## TRUE 0 0 0
##
## Delis, Soup, Sandwiches Delis, Specialty Food Delis, Sushi Bars
## FALSE 15 15 14
## TRUE 0 0 1
##
## Desserts, American (New)
## FALSE 15
## TRUE 0
##
## Desserts, American (Traditional), American (New)
## FALSE 15
## TRUE 0
##
## Desserts, Coffee & Tea, Sandwiches Desserts, Creperies
## FALSE 15 15
## TRUE 0 0
##
## Desserts, Food Stands
## FALSE 60
## TRUE 0
##
## Desserts, Food Stands, Ice Cream & Frozen Yogurt Dim Sum Diners
## FALSE 15 30 71
## TRUE 0 0 4
##
## Diners, American (New), American (Traditional)
## FALSE 15
## TRUE 0
##
## Diners, American (Traditional) Diners, Bars Diners, Burgers
## FALSE 15 15 15
## TRUE 0 0 0
##
## Diners, Cafes Diners, Coffee & Tea, Wine Bars Diners, Pizza
## FALSE 15 15 15
## TRUE 0 0 0
##
## Diners, Soul Food Dive Bars, American (Traditional)
## FALSE 15 30
## TRUE 0 0
##
## Dive Bars, Greek Dive Bars, Italian
## FALSE 15 15
## TRUE 0 0
##
## Dive Bars, Lounges, Restaurants Dive Bars, Pubs, Irish
## FALSE 15 15
## TRUE 0 0
##
## Ethiopian Ethiopian, African Ethiopian, Coffee & Tea, Cafes
## FALSE 268 15 15
## TRUE 2 0 0
##
## Ethiopian, Ethnic Food Ethiopian, Italian Ethnic Food, African
## FALSE 15 15 15
## TRUE 0 0 0
##
## Ethnic Food, Food Stands, Middle Eastern Ethnic Food, Mexican
## FALSE 15 15
## TRUE 0 0
##
## Ethnic Food, Middle Eastern Fast Food
## FALSE 15 435
## TRUE 0 0
##
## Fast Food, American (Traditional) Fast Food, Burgers
## FALSE 15 75
## TRUE 0 0
##
## Fast Food, Mexican Fast Food, Peruvian Fast Food, Sandwiches
## FALSE 15 15 45
## TRUE 0 0 0
##
## Fondue Food Stands Food Stands, Asian Fusion
## FALSE 15 105 30
## TRUE 0 0 0
##
## Food Stands, Asian Fusion, Caterers Food Stands, Bakeries
## FALSE 15 30
## TRUE 0 0
##
## Food Stands, Burgers Food Stands, Caribbean Food Stands, Indian
## FALSE 15 15 15
## TRUE 0 0 0
##
## Food Stands, Italian, Street Vendors Food Stands, Latin American
## FALSE 15 30
## TRUE 0 0
##
## Food Stands, Sandwiches Food Stands, Vegetarian French
## FALSE 14 15 264
## TRUE 1 0 6
##
## French, American (New) French, Art Galleries, Cafes
## FALSE 14 14
## TRUE 1 1
##
## French, Bars French, Bars, Brasseries French, Belgian
## FALSE 15 15 15
## TRUE 0 0 0
##
## French, Brasseries French, Creperies French, Seafood, Wine Bars
## FALSE 15 29 15
## TRUE 0 1 0
##
## French, Wine Bars Fruits & Veggies, Food Stands Gastropubs
## FALSE 15 15 15
## TRUE 0 0 0
##
## Gastropubs, American (Traditional), Bars Gay Bars, Italian
## FALSE 15 15
## TRUE 0 0
##
## Gay Bars, Sports Bars, Restaurants German German, Bars
## FALSE 15 29 15
## TRUE 0 1 0
##
## Gluten-Free, Vegetarian, Greek Greek
## FALSE 15 45
## TRUE 0 0
##
## Greek, American (Traditional)
## FALSE 15
## TRUE 0
##
## Greek, Burgers, Sandwiches, Mediterranean Greek, Caterers
## FALSE 15 15
## TRUE 0 0
##
## Greek, Mediterranean Greek, Mediterranean, Middle Eastern
## FALSE 90 14
## TRUE 0 1
##
## Greek, Mediterranean, Turkish Greek, Pizza, Mediterranean
## FALSE 15 15
## TRUE 0 0
##
## Greek, Seafood, Mediterranean Greek, Turkish
## FALSE 15 13
## TRUE 0 2
##
## Greek, Turkish, Mediterranean
## FALSE 15
## TRUE 0
##
## Grocery, Beer, Wine & Spirits, Delis Grocery, Cafes
## FALSE 15 15
## TRUE 0 0
##
## Grocery, Chinese Grocery, Delis Grocery, Delis, Sandwiches
## FALSE 15 30 15
## TRUE 0 0 0
##
## Grocery, Pizza Grocery, Wholesale Stores, Halal
## FALSE 15 15
## TRUE 0 0
##
## Hookah Bars, Moroccan Hot Dogs Hot Dogs, Burgers
## FALSE 14 14 15
## TRUE 1 1 0
##
## Hotels, Restaurants Ice Cream & Frozen Yogurt, Sandwiches
## FALSE 165 15
## TRUE 0 0
##
## Ice Cream & Frozen Yogurt, Vegan Indian
## FALSE 15 274
## TRUE 0 11
##
## Indian, Buffets, Vegetarian Indian, Fast Food
## FALSE 10 15
## TRUE 5 0
##
## Indian, Himalayan/Nepalese Indian, Pakistani Irish, Pubs
## FALSE 13 132 13
## TRUE 2 3 2
##
## Irish, Pubs, Beer, Wine & Spirits
## FALSE 15
## TRUE 0
##
## IT Services & Computer Repair, Delis Italian Italian, Bars
## FALSE 15 797 15
## TRUE 0 28 0
##
## Italian, Bars, American (New) Italian, Breakfast & Brunch, Pizza
## FALSE 15 14
## TRUE 0 1
##
## Italian, Cafes Italian, Caterers
## FALSE 15 15
## TRUE 0 0
##
## Italian, Ice Cream & Frozen Yogurt, Coffee & Tea
## FALSE 15
## TRUE 0
##
## Italian, Modern European Italian, Pizza Italian, Wine Bars
## FALSE 15 75 15
## TRUE 0 0 0
##
## Japanese Japanese, Dance Clubs Japanese, Fast Food
## FALSE 133 15 15
## TRUE 2 0 0
##
## Japanese, Karaoke Japanese, Sushi Bars
## FALSE 15 96
## TRUE 0 9
##
## Jazz & Blues, American (New)
## FALSE 15
## TRUE 0
##
## Jazz & Blues, American (Traditional)
## FALSE 14
## TRUE 1
##
## Jazz & Blues, American (Traditional), Ethiopian
## FALSE 12
## TRUE 3
##
## Jazz & Blues, Bars, Cajun/Creole Jazz & Blues, Cajun/Creole
## FALSE 15 14
## TRUE 0 1
##
## Juice Bars & Smoothies, Sandwiches
## FALSE 15
## TRUE 0
##
## Juice Bars & Smoothies, Vegetarian Karaoke, Korean Korean
## FALSE 15 15 60
## TRUE 0 0 0
##
## Korean, Food Stands Korean, Sandwiches Korean, Sushi Bars
## FALSE 15 15 28
## TRUE 0 0 2
##
## Kosher, Mediterranean, Tapas/Small Plates Latin American
## FALSE 15 282
## TRUE 0 3
##
## Latin American, Asian Fusion Latin American, Bars, Dance Clubs
## FALSE 14 15
## TRUE 1 0
##
## Latin American, Caribbean Latin American, Lounges, Hookah Bars
## FALSE 15 15
## TRUE 0 0
##
## Latin American, Mexican Latin American, Peruvian
## FALSE 74 45
## TRUE 1 0
##
## Latin American, Sandwiches Latin American, Sandwiches, Hot Dogs
## FALSE 15 15
## TRUE 0 0
##
## Lounges, American (New) Lounges, American (New), Cafes
## FALSE 15 15
## TRUE 0 0
##
## Lounges, American (New), Italian Lounges, American (Traditional)
## FALSE 15 30
## TRUE 0 0
##
## Lounges, Belgian Lounges, Dance Clubs, Tapas/Small Plates
## FALSE 13 15
## TRUE 2 0
##
## Lounges, Latin American Lounges, Restaurants
## FALSE 15 15
## TRUE 0 0
##
## Lounges, Tapas Bars, Hookah Bars Lounges, Vegetarian Malaysian
## FALSE 15 15 15
## TRUE 0 0 0
##
## Mediterranean Mediterranean, Delis, Food
## FALSE 148 14
## TRUE 2 1
##
## Mediterranean, Fast Food, Pizza Mediterranean, Greek
## FALSE 15 15
## TRUE 0 0
##
## Mediterranean, Greek, Turkish Mediterranean, Indian
## FALSE 15 14
## TRUE 0 1
##
## Mediterranean, Italian Mediterranean, Lebanese
## FALSE 15 15
## TRUE 0 0
##
## Mediterranean, Lounges, Hookah Bars
## FALSE 15
## TRUE 0
##
## Mediterranean, Middle Eastern Mediterranean, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Mediterranean, Vegetarian Mexican Mexican, Asian Fusion
## FALSE 15 716 15
## TRUE 0 4 0
##
## Mexican, Bakeries Mexican, Bars Mexican, Fast Food
## FALSE 15 15 60
## TRUE 0 0 0
##
## Mexican, Food Stands Mexican, Latin American Mexican, Peruvian
## FALSE 30 30 15
## TRUE 0 0 0
##
## Mexican, Salvadoran Mexican, Spanish Mexican, Tapas Bars
## FALSE 15 15 15
## TRUE 0 0 0
##
## Mexican, Tex-Mex Mexican, Tex-Mex, Breakfast & Brunch
## FALSE 45 15
## TRUE 0 0
##
## Middle Eastern Middle Eastern, Afghan Middle Eastern, Bars
## FALSE 156 14 11
## TRUE 9 1 4
##
## Middle Eastern, Fast Food Middle Eastern, Halal
## FALSE 15 15
## TRUE 0 0
##
## Middle Eastern, Hookah Bars Middle Eastern, Indian, Halal
## FALSE 15 15
## TRUE 0 0
##
## Middle Eastern, Mediterranean Middle Eastern, Persian/Iranian
## FALSE 43 30
## TRUE 2 0
##
## Middle Eastern, Pizza Middle Eastern, Vegetarian, Ethnic Food
## FALSE 15 14
## TRUE 0 1
##
## Modern European, Bakeries, Coffee & Tea Mongolian Moroccan
## FALSE 15 15 30
## TRUE 0 0 0
##
## Moroccan, Tapas Bars Music Venues, American (New)
## FALSE 13 15
## TRUE 2 0
##
## Music Venues, Cajun/Creole
## FALSE 15
## TRUE 0
##
## Newspapers & Magazines, Grocery, Restaurants
## FALSE 15
## TRUE 0
##
## Newspapers & Magazines, Restaurants Nightlife, Greek
## FALSE 15 13
## TRUE 0 2
##
## Nightlife, Latin American Nightlife, Restaurants
## FALSE 15 75
## TRUE 0 0
##
## Nightlife, Tobacco Shops, Restaurants Pakistani Peruvian Pizza
## FALSE 15 15 29 1297
## TRUE 0 0 1 23
##
## Pizza, American (New) Pizza, American (Traditional)
## FALSE 15 30
## TRUE 0 0
##
## Pizza, American (Traditional), Food Delivery Services
## FALSE 15
## TRUE 0
##
## Pizza, Caterers Pizza, Delis, Italian Pizza, Dive Bars
## FALSE 15 15 15
## TRUE 0 0 0
##
## Pizza, Food Stands Pizza, Gluten-Free Pizza, Gluten-Free, Bars
## FALSE 15 15 15
## TRUE 0 0 0
##
## Pizza, Ice Cream & Frozen Yogurt Pizza, Italian
## FALSE 15 218
## TRUE 0 7
##
## Pizza, Italian, Chicken Wings Pizza, Italian, Greek
## FALSE 15 14
## TRUE 0 1
##
## Pizza, Italian, Pubs Pizza, Italian, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Pizza, Sandwiches Pizza, Sandwiches, Bars Pizza, Seafood
## FALSE 60 15 15
## TRUE 0 0 0
##
## Pizza, Vegan, Gluten-Free Pizza, Vegetarian
## FALSE 45 15
## TRUE 0 0
##
## Pool Halls, American (New), Pubs
## FALSE 11
## TRUE 4
##
## Pool Halls, Sports Bars, Burgers Pool Halls, Sports Bars, Cafes
## FALSE 15 15
## TRUE 0 0
##
## Pubs, American (Traditional) Pubs, British, Fish & Chips
## FALSE 58 15
## TRUE 2 0
##
## Pubs, Fish & Chips Pubs, Irish
## FALSE 15 74
## TRUE 0 1
##
## Pubs, Italian, American (Traditional) Pubs, Music Venues, Irish
## FALSE 15 15
## TRUE 0 0
##
## Pubs, Vietnamese, American (Traditional) Restaurants
## FALSE 15 5741
## TRUE 0 4
##
## Restaurants, Bakeries Restaurants, Barbers Restaurants, Bars
## FALSE 15 15 45
## TRUE 0 0 0
##
## Restaurants, Coffee & Tea
## FALSE 45
## TRUE 0
##
## Restaurants, Coffee & Tea, Juice Bars & Smoothies
## FALSE 15
## TRUE 0
##
## Restaurants, Comedy Clubs Restaurants, Dance Clubs
## FALSE 15 30
## TRUE 0 0
##
## Restaurants, Dive Bars Restaurants, Food Restaurants, Gay Bars
## FALSE 74 15 60
## TRUE 1 0 0
##
## Restaurants, Grocery Restaurants, Hookah Bars
## FALSE 15 30
## TRUE 0 0
##
## Restaurants, Jazz & Blues Restaurants, Lawyers
## FALSE 15 15
## TRUE 0 0
##
## Restaurants, Lounges Restaurants, Lounges, Coffee & Tea
## FALSE 29 15
## TRUE 1 0
##
## Restaurants, Music Venues Restaurants, Property Management
## FALSE 15 15
## TRUE 0 0
##
## Restaurants, Pubs Restaurants, Tea Rooms
## FALSE 60 15
## TRUE 0 0
##
## Restaurants, Venues & Event Spaces Restaurants, Wine Bars Salad
## FALSE 15 15 15
## TRUE 0 0 0
##
## Salad, American (New) Salad, Gluten-Free Sandwiches
## FALSE 15 15 1634
## TRUE 0 0 1
##
## Sandwiches, American (Traditional) Sandwiches, Bagels
## FALSE 15 15
## TRUE 0 0
##
## Sandwiches, Breakfast & Brunch Sandwiches, Buffets
## FALSE 15 15
## TRUE 0 0
##
## Sandwiches, Cafes Sandwiches, Cafes, Vegetarian
## FALSE 13 15
## TRUE 2 0
##
## Sandwiches, Caterers Sandwiches, Coffee & Tea
## FALSE 14 45
## TRUE 1 0
##
## Sandwiches, Coffee & Tea, Breakfast & Brunch
## FALSE 15
## TRUE 0
##
## Sandwiches, Coffee & Tea, Convenience Stores
## FALSE 15
## TRUE 0
##
## Sandwiches, Delis, Chinese Sandwiches, Fast Food
## FALSE 15 30
## TRUE 0 0
##
## Sandwiches, Italian Sandwiches, Middle Eastern
## FALSE 15 11
## TRUE 0 4
##
## Sandwiches, Pizza Sandwiches, Salad Seafood
## FALSE 15 15 313
## TRUE 0 0 2
##
## Seafood Markets, Restaurants Seafood Markets, Seafood
## FALSE 15 15
## TRUE 0 0
##
## Seafood, American (New) Seafood, American (Traditional)
## FALSE 14 14
## TRUE 1 1
##
## Seafood, Cajun/Creole Seafood, Dim Sum Seafood, Food Stands
## FALSE 15 15 15
## TRUE 0 0 0
##
## Seafood, Lounges Seafood, Seafood Markets
## FALSE 15 30
## TRUE 0 0
##
## Seafood, Soul Food, Bars Seafood, Steakhouses
## FALSE 15 15
## TRUE 0 0
##
## Shopping Centers, Museums, Food Stands Soul Food
## FALSE 15 60
## TRUE 0 0
##
## Soul Food, American (New) Soul Food, Southern
## FALSE 15 15
## TRUE 0 0
##
## Soul Food, Vegan, Vegetarian Soup, Sandwiches Southern
## FALSE 15 15 60
## TRUE 0 0 0
##
## Southern, American (Traditional) Southern, Bars, Barbeque
## FALSE 15 15
## TRUE 0 0
##
## Southern, Bars, Burgers Southern, Breakfast & Brunch
## FALSE 15 15
## TRUE 0 0
##
## Southern, Fast Food Southern, Soul Food
## FALSE 15 30
## TRUE 0 0
##
## Southern, Soul Food, Caterers
## FALSE 11
## TRUE 4
##
## Spanish, Basque, Tapas/Small Plates Spanish, Caterers
## FALSE 15 14
## TRUE 0 1
##
## Spanish, Tapas Bars Spanish, Tapas Bars, Caterers
## FALSE 30 15
## TRUE 0 0
##
## Spanish, Tapas/Small Plates Specialty Food, American (New)
## FALSE 15 15
## TRUE 0 0
##
## Specialty Food, Delis, Beer, Wine & Spirits
## FALSE 15
## TRUE 0
##
## Sports Bars, American (New) Sports Bars, American (Traditional)
## FALSE 15 75
## TRUE 0 0
##
## Sports Bars, American (Traditional), Burgers Sports Bars, Irish
## FALSE 15 15
## TRUE 0 0
##
## Steakhouses Steakhouses, American (New)
## FALSE 101 15
## TRUE 4 0
##
## Steakhouses, American (Traditional) Steakhouses, Bars
## FALSE 45 15
## TRUE 0 0
##
## Steakhouses, Breakfast & Brunch Steakhouses, Burgers
## FALSE 15 15
## TRUE 0 0
##
## Steakhouses, Italian Steakhouses, Seafood
## FALSE 15 45
## TRUE 0 0
##
## Street Vendors, American (New), Comfort Food
## FALSE 15
## TRUE 0
##
## Street Vendors, Cajun/Creole
## FALSE 15
## TRUE 0
##
## Street Vendors, Food Delivery Services, Food Stands
## FALSE 15
## TRUE 0
##
## Street Vendors, Food Stands Street Vendors, Food Stands, Italian
## FALSE 15 15
## TRUE 0 0
##
## Street Vendors, Italian, American (Traditional)
## FALSE 15
## TRUE 0
##
## Street Vendors, Sandwiches Street Vendors, Vietnamese
## FALSE 15 30
## TRUE 0 0
##
## Sushi Bars Sushi Bars, Asian Fusion Sushi Bars, Buffets
## FALSE 100 44 15
## TRUE 5 1 0
##
## Sushi Bars, Chinese Sushi Bars, Japanese
## FALSE 14 54
## TRUE 1 6
##
## Sushi Bars, Japanese, Lounges Sushi Bars, Thai, Asian Fusion
## FALSE 11 15
## TRUE 4 0
##
## Taiwanese, Japanese Tapas Bars Tapas Bars, Italian, Bars
## FALSE 13 15 15
## TRUE 2 0 0
##
## Tapas Bars, Mediterranean
## FALSE 15
## TRUE 0
##
## Tapas Bars, Modern European, Venues & Event Spaces
## FALSE 11
## TRUE 4
##
## Tapas Bars, Spanish Tea Rooms, Asian Fusion Tex-Mex
## FALSE 15 15 30
## TRUE 0 0 0
##
## Tex-Mex, Mexican, Latin American Tex-Mex, Tapas Bars Thai
## FALSE 15 13 416
## TRUE 0 2 4
##
## Thai, Asian Fusion Thai, Food Delivery Services Thai, Japanese
## FALSE 15 14 15
## TRUE 0 1 0
##
## Thai, Sushi Bars Turkish Turkish, Do-It-Yourself Food
## FALSE 29 29 15
## TRUE 1 1 0
##
## Turkish, Ethnic Food, Tea Rooms Turkish, Tapas Bars Vegan
## FALSE 15 13 15
## TRUE 0 2 0
##
## Vegan, Live/Raw Food Vegetarian, American (New)
## FALSE 15 15
## TRUE 0 0
##
## Vegetarian, Cafes Vegetarian, Mediterranean
## FALSE 15 15
## TRUE 0 0
##
## Vegetarian, Mediterranean, Middle Eastern Vegetarian, Sandwiches
## FALSE 15 15
## TRUE 0 0
##
## Vietnamese Vietnamese, Salad Vietnamese, Street Vendors
## FALSE 160 15 15
## TRUE 5 0 0
##
## Vietnamese, Thai Vietnamese, Thai, Chinese
## FALSE 8 15
## TRUE 7 0
##
## Wine Bars, American (New)
## FALSE 45
## TRUE 0
##
## Wine Bars, American (New), Latin American Wine Bars, Italian
## FALSE 15 15
## TRUE 0 0
##
## Wine Bars, Mediterranean
## FALSE 14
## TRUE 1
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrparking) # group parking categories
##
## 0 Garage Garage, Street Garage, Street, Private Lot
## FALSE 14928 205 601 45
## TRUE 42 5 14 0
##
## Garage, Street, Valet Garage, Street, Validated Garage, Validated
## FALSE 116 88 15
## TRUE 4 2 0
##
## Garage, Validated, Private Lot Private Lot Street
## FALSE 45 104 12160
## TRUE 0 1 245
##
## Street, Private Lot Street, Private Lot, Valet Street, Valet
## FALSE 223 15 746
## TRUE 2 0 34
##
## Street, Validated Street, Validated, Private Lot
## FALSE 130 45
## TRUE 5 0
##
## Street, Validated, Valet Valet Validated Validated, Private Lot
## FALSE 45 354 45 30
## TRUE 0 6 0 0
##
## Validated, Valet
## FALSE 12
## TRUE 3
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrattire) # Group Dressy and Formal together
##
## 0 Casual Dressy Formal (Jacket Required)
## FALSE 10209 18253 1355 135
## TRUE 21 302 40 0
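The note on the attire table suggests merging Dressy and Formal. A minimal base R sketch of that regrouping follows, on a made-up vector that reuses the level labels printed above. The names `attire` and `attire_grouped` are illustrative and are not objects from this analysis.

```r
# Toy data: labels copied from the attire table, counts invented for illustration.
attire <- factor(c("0", "Casual", "Dressy", "Formal (Jacket Required)",
                   "Casual", "Dressy"))

# Assigning a named list to levels() merges the listed old levels
# into the new level named on the left-hand side.
attire_grouped <- attire
levels(attire_grouped) <- list(
  "0"             = "0",
  "Casual"        = "Casual",
  "Dressy/Formal" = c("Dressy", "Formal (Jacket Required)")
)

table(attire_grouped)
```

The same pattern would apply to the "Group 3 and 4" note on `numcat`, after converting that column to a factor.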
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrgoodfor) # Group by Breakfast, Lunch, Dinner,
##
## 0 Breakfast Breakfast, Brunch Brunch Dessert
## FALSE 14807 210 30 118 75
## TRUE 43 0 0 2 0
##
## Dessert, Breakfast Dessert, Late Night
## FALSE 45 15
## TRUE 0 0
##
## Dessert, Late Night, Lunch, Dinner
## FALSE 30
## TRUE 0
##
## Dessert, Late Night, Lunch, Dinner, Breakfast, Brunch
## FALSE 15
## TRUE 0
##
## Dessert, Lunch Dessert, Lunch, Breakfast
## FALSE 58 30
## TRUE 2 0
##
## Dessert, Lunch, Breakfast, Brunch Dessert, Lunch, Brunch
## FALSE 30 15
## TRUE 0 0
##
## Dessert, Lunch, Dinner Dinner Dinner, Breakfast, Brunch
## FALSE 75 5343 15
## TRUE 0 177 0
##
## Dinner, Brunch Late Night Late Night, Breakfast Late Night, Dinner
## FALSE 145 1025 41 811
## TRUE 5 25 4 14
##
## Late Night, Dinner, Brunch Late Night, Lunch
## FALSE 15 60
## TRUE 0 0
##
## Late Night, Lunch, Breakfast Late Night, Lunch, Brunch
## FALSE 15 15
## TRUE 0 0
##
## Late Night, Lunch, Dinner Late Night, Lunch, Dinner, Breakfast
## FALSE 384 15
## TRUE 6 0
##
## Lunch Lunch, Breakfast Lunch, Breakfast, Brunch Lunch, Brunch
## FALSE 2950 508 75 29
## TRUE 20 2 0 1
##
## Lunch, Dinner Lunch, Dinner, Breakfast
## FALSE 2837 29
## TRUE 58 1
##
## Lunch, Dinner, Breakfast, Brunch Lunch, Dinner, Brunch
## FALSE 45 42
## TRUE 0 3
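The "good for" field is a comma-separated multi-label string, so grouping it by meal is easier with one indicator column per meal than with merged factor levels. A small sketch under that assumption, using a handful of label strings copied from the table above (the vector `goodfor` and the choice of three meals are illustrative):

```r
# A few label combinations as they appear in the table; "0" means no label.
goodfor <- c("0", "Lunch, Dinner", "Late Night, Dinner",
             "Breakfast, Brunch", "Dessert, Lunch")

# One logical column per meal: TRUE when the meal name occurs in the string.
meals <- c("Breakfast", "Lunch", "Dinner")
indicators <- sapply(meals, function(m) grepl(m, goodfor, fixed = TRUE))

indicators
```

Each column could then be used as a separate matching covariate instead of the 30-odd raw combinations.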
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrambiance) # Group by bigger categories
##
## 0 Casual Casual, Intimate Classy Classy, Casual
## FALSE 18432 6491 131 453 100
## TRUE 78 124 4 12 5
##
## Classy, Casual, Intimate Classy, Intimate Classy, Trendy
## FALSE 15 83 58
## TRUE 0 7 2
##
## Classy, Trendy, Upscale Classy, Trendy, Upscale, Casual
## FALSE 30 15
## TRUE 0 0
##
## Classy, Trendy, Upscale, Touristy, Casual Classy, Upscale
## FALSE 14 193
## TRUE 1 2
##
## Classy, Upscale, Casual Divey Divey, Casual Hipster
## FALSE 13 476 384 160
## TRUE 2 4 6 5
##
## Hipster, Casual Hipster, Divey Hipster, Divey, Casual
## FALSE 202 15 15
## TRUE 23 0 0
##
## Hipster, Romantic, Classy, Trendy, Intimate Hipster, Trendy
## FALSE 15 29
## TRUE 0 1
##
## Hipster, Trendy, Casual Hipster, Trendy, Casual, Intimate
## FALSE 89 15
## TRUE 1 0
##
## Hipster, Trendy, Intimate Intimate Romantic Romantic, Casual
## FALSE 15 117 55 56
## TRUE 0 3 5 4
##
## Romantic, Casual, Intimate Romantic, Classy
## FALSE 43 105
## TRUE 2 0
##
## Romantic, Classy, Casual, Intimate Romantic, Classy, Intimate
## FALSE 28 50
## TRUE 2 10
##
## Romantic, Classy, Trendy, Casual
## FALSE 15
## TRUE 0
##
## Romantic, Classy, Trendy, Intimate
## FALSE 30
## TRUE 0
##
## Romantic, Classy, Trendy, Upscale Romantic, Classy, Upscale
## FALSE 30 60
## TRUE 0 0
##
## Romantic, Classy, Upscale, Intimate Romantic, Intimate
## FALSE 15 208
## TRUE 0 2
##
## Romantic, Touristy, Casual, Intimate Romantic, Trendy
## FALSE 15 15
## TRUE 0 0
##
## Romantic, Trendy, Casual, Intimate Romantic, Upscale
## FALSE 15 13
## TRUE 0 2
##
## Romantic, Upscale, Intimate Touristy Touristy, Casual Trendy
## FALSE 15 90 30 1053
## TRUE 0 0 0 42
##
## Trendy, Casual Trendy, Upscale Upscale Upscale, Casual
## FALSE 308 56 72 15
## TRUE 7 4 3 0
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$numcat) # Group 3 and 4
##
## 0 1 2 3 4
## FALSE 6382 15926 6692 922 30
## TRUE 8 229 103 23 0
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrrating.num) # Calculate the rating before the deal period!
##
## 1 1.5 2 2.5 3 3.5 4 4.5 5
## FALSE 345 404 1042 2250 5423 7271 4610 1010 360
## TRUE 0 1 8 30 112 154 40 10 0
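The note on the rating table asks for the rating before the deal period. One way to get it is to average each restaurant's reviews dated before its deal start. The sketch below assumes a hypothetical layout (a `reviews` table with `Phone`, `date`, `rating` and a `deals` table with `Phone`, `dealStart`); the real objects loaded earlier may be shaped differently.

```r
# Hypothetical toy inputs; column names and values are assumptions.
reviews <- data.frame(
  Phone  = c("A", "A", "A", "B", "B"),
  date   = as.Date(c("2012-01-10", "2012-03-05", "2012-08-20",
                     "2012-02-01", "2012-09-15")),
  rating = c(4, 5, 2, 3, 4)
)
deals <- data.frame(
  Phone     = c("A", "B"),
  dealStart = as.Date(c("2012-06-01", "2012-06-01"))
)

# Attach each restaurant's deal date, keep only earlier reviews, then average.
merged    <- merge(reviews, deals, by = "Phone")
pre       <- merged[merged$date < merged$dealStart, ]
preRating <- aggregate(rating ~ Phone, data = pre, FUN = mean)

preRating
```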
######## Variables to match:
vars <- c("pricepoint", "reviewscrreviews", "reviewscrgoodforgroups", "reviewscrgoodforkids",
"reviewscrwaiter", "american", "european", "asian", "bar")
temp <- na.omit(select(restlist_melt_rating, meanRating, numReviews, reviewscrrating.num,
reviewscrreviews, inyipit, numDealsTr, pricepoint, reviewscrgoodforgroups,
reviewscrgoodforkids, reviewscrwaiter, american, european, asian, bar, Period,
Phone))
# 20918 NAs in meanRating, 0 in numReviews, 7245 in reviewscrrating.num, 0 in
# inyipit, 0 in numDealsTr, 0 in pricepoint, 0 in reviewscrreviews, 0 in
# reviewscrgoodforgroups, 0 in reviewscrgoodforkids, 0 in reviewscrwaiter,
# 0 in american, european, asian, and bar
temp$reviewscrgoodforgroups <- factor(temp$reviewscrgoodforgroups)
temp$reviewscrgoodforkids <- factor(temp$reviewscrgoodforkids)
temp$reviewscrwaiter <- factor(temp$reviewscrwaiter)
temp$european <- as.integer(temp$european)
temp$asian <- as.integer(temp$asian)
temp$inyipit <- as.integer(temp$inyipit)
temp$numDealsTr <- as.integer(temp$numDealsTr)
# we should have 0 NAs
sum(is.na(temp))
## [1] 0
We compute the L1 statistic, as well as several unidimensional measures of imbalance, via the imbalance function from the cem package:
imbalance(group = temp$numDealsTr, data = temp, drop = c("meanRating", "numReviews",
"reviewscrrating.num", "numDealsTr", "inyipit", "Phone"))
##
## Multivariate Imbalance Measure: L1=0.922
## Percentage of local common support: LCS=3.1%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrreviews -18.53434 (diff) 0.00000 -1 -27 -5 -16 586
## pricepoint -0.15432 (diff) 0.14891 0 -1 0 0 0
## reviewscrgoodforgroups 13.39442 (Chi2) 0.09691 NA NA NA NA NA
## reviewscrgoodforkids 1.35001 (Chi2) 0.02600 NA NA NA NA NA
## reviewscrwaiter 14.44491 (Chi2) 0.12934 NA NA NA NA NA
## american 0.02897 (diff) 0.02897 0 0 0 1 0
## european -0.03190 (diff) 0.03190 0 0 0 0 0
## asian -0.04799 (diff) 0.04799 0 0 0 0 0
## bar 0.01353 (diff) 0.01353 0 0 0 0 0
## Period 49.86355 (Chi2) 0.11039 NA NA NA NA NA
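For intuition, the multivariate L1 measure is half the sum of absolute differences between the treated and control relative frequencies over the cells of the cross-tabulated covariates. The sketch below computes that quantity by hand on strata that are already defined; `l1_imbalance` is a helper written for this illustration, and `imbalance()` chooses its own binning, so its numbers will not match a hand-binned version.

```r
# L1 = 0.5 * sum over cells of |share of treated in cell - share of controls in cell|
l1_imbalance <- function(treated, strata) {
  tab <- table(strata, treated)          # cells x treatment status (columns "0", "1")
  rel <- prop.table(tab, margin = 2)     # within-group relative frequencies
  0.5 * sum(abs(rel[, "1"] - rel[, "0"]))
}

# Toy example: 4 controls split over cells a and b, 2 treated split over a and c.
treated <- c(0, 0, 0, 0, 1, 1)
strata  <- c("a", "a", "b", "b", "a", "c")

l1_imbalance(treated, strata)
```

A value of 0 means the two groups have identical cell distributions; a value of 1 means they share no cells at all.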
res <- cem(treatment = "numDealsTr", data = temp, drop = c("meanRating", "numReviews",
"reviewscrrating.num", "inyipit", "Phone", "Period"), eval.imbalance = TRUE,
keep.all = TRUE)
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
res1 <- cem(treatment = "numDealsTr", data = temp, drop = c("meanRating", "numReviews",
"reviewscrrating.num", "inyipit", "Phone"), eval.imbalance = TRUE, keep.all = TRUE) # matching on Period
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
res
## G0 G1
## All 9136 247
## Matched 5779 247
## Unmatched 3357 0
##
##
## Multivariate Imbalance Measure: L1=0.267
## Percentage of local common support: LCS=67.9%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrreviews -2.677e+00 (diff) 0.000e+00 -1 -13 3 7 0
## pricepoint 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
## reviewscrgoodforgroups 3.406e+00 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscrgoodforkids 6.729e+00 (Chi2) 2.776e-17 NA NA NA NA NA
## reviewscrwaiter 1.273e+01 (Chi2) 0.000e+00 NA NA NA NA NA
## american 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
## european 0.000e+00 (diff) 5.551e-17 0 0 0 0 0
## asian -2.776e-17 (diff) 0.000e+00 0 0 0 0 0
## bar 0.000e+00 (diff) 5.551e-17 0 0 0 0 0
res1
## G0 G1
## All 9136 247
## Matched 1608 197
## Unmatched 7528 50
##
##
## Multivariate Imbalance Measure: L1=0.611
## Percentage of local common support: LCS=32.1%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrreviews -1.758e+00 (diff) 5.551e-17 1 -18 -1 19 27
## pricepoint 0.000e+00 (diff) 6.939e-17 0 0 0 0 0
## reviewscrgoodforgroups 4.845e+00 (Chi2) 6.332e-17 NA NA NA NA NA
## reviewscrgoodforkids 1.757e+01 (Chi2) 8.413e-17 NA NA NA NA NA
## reviewscrwaiter 2.743e+00 (Chi2) 6.939e-17 NA NA NA NA NA
## american 2.776e-17 (diff) 8.327e-17 0 0 0 0 0
## european 2.776e-17 (diff) 6.939e-17 0 0 0 0 0
## asian 0.000e+00 (diff) 6.939e-17 0 0 0 0 0
## bar 0.000e+00 (diff) 6.939e-17 0 0 0 0 0
## Period 7.868e+00 (Chi2) 9.021e-17 NA NA NA NA NA
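The drop in matched units when Period is added follows from how coarsened exact matching works: covariates are binned, the bins are crossed into strata, and only strata containing both treated and control units are kept. A toy base R sketch of that idea (the data, cutpoints, and column names are invented; `cem()` selects its own cutpoints):

```r
# Invented toy data: 3 treated and 5 control units.
d <- data.frame(
  treated = c(1, 0, 0, 1, 0, 0, 1, 0),
  reviews = c(12, 15, 80, 85, 300, 310, 500, 20),
  price   = c(1, 1, 2, 2, 2, 3, 4, 1)
)

# Coarsen the continuous covariate, then cross it with the discrete one.
d$reviews_bin <- cut(d$reviews, breaks = c(0, 50, 200, Inf))
d$stratum     <- interaction(d$reviews_bin, d$price, drop = TRUE)

# A stratum is matched only if it holds at least one treated and one control unit.
has_t <- tapply(d$treated == 1, d$stratum, any)
has_c <- tapply(d$treated == 0, d$stratum, any)
d$matched <- as.vector((has_t & has_c)[as.character(d$stratum)])

table(treated = d$treated, matched = d$matched)
```

Every extra matching variable multiplies the number of strata, so fewer of them contain both groups.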
Using the output from cem, we can estimate the SATT (sample average treatment effect on the treated) via the att function. The simplest approach is a weighted difference in means (unless k2k was used, in which case no weights are required). For convenience, we compute this as a regression of the outcome variable on a constant and the treatment variable:
####### LM: mean rating in time period
att(res, meanRating ~ numDealsTr, data = temp)
##
## G0 G1
## All 9136 247
## Matched 5779 247
## Unmatched 3357 0
##
## Linear regression model on CEM matched data:
##
## SATT point estimate: -0.145474 (p.value=0.030234)
## 95% conf. interval: [-0.277018, -0.013930]
att(res1, meanRating ~ numDealsTr, data = temp)
##
## G0 G1
## All 9136 247
## Matched 1608 197
## Unmatched 7528 50
##
## Linear regression model on CEM matched data:
##
## SATT point estimate: -0.244992 (p.value=0.001417)
## 95% conf. interval: [-0.395226, -0.094758]
# number of reviews in time period
att(res, numReviews ~ numDealsTr, data = temp)
##
## G0 G1
## All 9136 247
## Matched 5779 247
## Unmatched 3357 0
##
## Linear regression model on CEM matched data:
##
## SATT point estimate: 0.318526 (p.value=0.067998)
## 95% conf. interval: [-0.023491, 0.660543]
att(res1, numReviews ~ numDealsTr, data = temp)
##
## G0 G1
## All 9136 247
## Matched 1608 197
## Unmatched 7528 50
##
## Linear regression model on CEM matched data:
##
## SATT point estimate: 0.383041 (p.value=0.004572)
## 95% conf. interval: [0.118630, 0.647452]
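The equivalence between the weighted difference in means and a weighted regression on the treatment indicator can be checked on a toy example. The sketch below builds CEM-style weights by hand (treated units get 1, matched controls in stratum s get (treated in s / controls in s) times (matched controls / matched treated), unmatched units get 0) and compares the two estimates. The data and strata are invented for illustration.

```r
# Invented toy data; stratum "c" has no treated unit, so it is unmatched.
d <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b", "b", "c"),
  treated = c(1, 0, 0, 1, 1, 0, 0, 0),
  y       = c(3, 4, 5, 2, 3, 4, 4, 5)
)

s  <- as.character(d$stratum)
nT <- tapply(d$treated == 1, s, sum)      # treated count per stratum
nC <- tapply(d$treated == 0, s, sum)      # control count per stratum
in_match <- (nT[s] > 0) & (nC[s] > 0)     # unit sits in a stratum with both groups
mT <- sum(d$treated == 1 & in_match)      # matched treated units overall
mC <- sum(d$treated == 0 & in_match)      # matched control units overall

# CEM-style weights as described in the lead-in.
d$w <- as.numeric(ifelse(!in_match, 0,
                  ifelse(d$treated == 1, 1, (nT[s] / nC[s]) * (mC / mT))))

# Weighted difference in means ...
trt <- d$treated == 1 & d$w > 0
ctl <- d$treated == 0 & d$w > 0
diff_means <- mean(d$y[trt]) - weighted.mean(d$y[ctl], d$w[ctl])

# ... equals the treatment coefficient of the weighted regression.
fit <- lm(y ~ treated, data = d, weights = w)
c(diff_means = diff_means, lm_coef = unname(coef(fit)["treated"]))
```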
######## LME (p-values reported by att are broken; values of 2 appear): mean rating in time period
att(res, meanRating ~ numDealsTr, data = temp, model = "lme")
##
## G0 G1
## All 9136 247
## Matched 5779 247
## Unmatched 3357 0
##
## Linear random effect model on CEM matched data:
##
## SATT point estimate: -0.152910 (p.value=2.000000)
## 95% conf. interval: [-0.178045, -0.127776]
att(res1, meanRating ~ numDealsTr, data = temp, model = "lme")
##
## G0 G1
## All 9136 247
## Matched 1608 197
## Unmatched 7528 50
##
## Linear random effect model on CEM matched data:
##
## SATT point estimate: -0.231997 (p.value=2.000000)
## 95% conf. interval: [-0.244379, -0.219615]
# number of reviews in time period
att(res, numReviews ~ numDealsTr, data = temp, model = "lme")
##
## G0 G1
## All 9136 247
## Matched 5779 247
## Unmatched 3357 0
##
## Linear random effect model on CEM matched data:
##
## SATT point estimate: 0.321328 (p.value=0.000000)
## 95% conf. interval: [0.262938, 0.379719]
att(res1, numReviews ~ numDealsTr, data = temp, model = "lme")
##
## G0 G1
## All 9136 247
## Matched 1608 197
## Unmatched 7528 50
##
## Linear random effect model on CEM matched data:
##
## SATT point estimate: 0.475685 (p.value=0.000000)
## 95% conf. interval: [0.450712, 0.500657]
model1 <- att(res, meanRating ~ +numDealsTr + pricepoint + reviewscrreviews +
reviewscrgoodforgroups + reviewscrgoodforkids + reviewscrwaiter + american +
european + asian + bar, data = temp, model = "lme")
model1$att.model
## (Intercept) numDealsTr pricepoint reviewscrreviews
## Estimate 3.236e+00 -0.17620 0.0938 9.319e-04
## Std. Error 3.825e-02 0.01275 0.0103 2.663e-05
## DF 5.946e+03 5946.00000 67.0000 5.946e+03
## t value 8.458e+01 -13.81602 9.1088 3.499e+01
## p-value 0.000e+00 2.00000 0.0000 0.000e+00
## reviewscrgoodforgroupsNo reviewscrgoodforgroupsYes
## Estimate 0.87933 0.71533
## Std. Error 0.04312 0.04248
## DF 67.00000 67.00000
## t value 20.39053 16.83802
## p-value 0.00000 0.00000
## reviewscrgoodforkidsNo reviewscrgoodforkidsYes
## Estimate -0.83511 -0.71241
## Std. Error 0.04281 0.04215
## DF 67.00000 67.00000
## t value -19.50558 -16.90304
## p-value 2.00000 2.00000
## reviewscrwaiterYes american european asian bar
## Estimate -0.19184 0.02768 0.02824 -0.13004 0.002304
## Std. Error 0.01669 0.01279 0.01325 0.01455 0.013218
## DF 67.00000 67.00000 67.00000 67.00000 67.000000
## t value -11.49388 2.16476 2.13083 -8.93655 0.174285
## p-value 2.00000 0.03041 0.03310 2.00000 0.861642
model2 <- att(res1, meanRating ~ +numDealsTr * pricepoint + reviewscrreviews +
reviewscrgoodforgroups + reviewscrgoodforkids + reviewscrwaiter + american +
european + asian + bar, data = temp, model = "lme")
model2$att.model
## (Intercept) numDealsTr pricepoint reviewscrreviews
## Estimate 3.7374 -0.71759 2.223e-02 1.255e-03
## Std. Error 0.0311 0.02341 8.888e-03 2.597e-05
## DF 1623.0000 1623.00000 1.680e+02 1.623e+03
## t value 120.1604 -30.65701 2.501e+00 4.833e+01
## p-value 0.0000 2.00000 1.237e-02 0.000e+00
## reviewscrgoodforgroupsNo reviewscrgoodforgroupsYes
## Estimate -0.03288 -0.11365
## Std. Error 0.05955 0.05926
## DF 168.00000 168.00000
## t value -0.55210 -1.91778
## p-value 1.41912 1.94486
## reviewscrgoodforkidsNo reviewscrgoodforkidsYes
## Estimate -0.35464 -0.27385
## Std. Error 0.06612 0.06582
## DF 168.00000 168.00000
## t value -5.36322 -4.16070
## p-value 2.00000 1.99997
## reviewscrwaiterYes american european asian bar
## Estimate -0.1970 1.179e-01 6.218e-02 -0.117918 -0.172967
## Std. Error 0.0102 6.925e-03 7.536e-03 0.005869 0.006937
## DF 168.0000 1.680e+02 1.680e+02 168.000000 168.000000
## t value -19.3133 1.703e+01 8.250e+00 -20.091933 -24.934850
## p-value 2.0000 0.000e+00 2.220e-16 2.000000 2.000000
## numDealsTr:pricepoint
## Estimate 0.2245
## Std. Error 0.0116
## DF 1623.0000
## t value 19.3582
## p-value 0.0000
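The impossible p-values (anything above 1) in the lme output all belong to negative t values. The printed numbers are consistent with a one-sided formula, 2 * (1 - pnorm(t)), being applied without taking the absolute value of t. This is an inference from the printed values, not something checked against the package source. The sketch below reproduces three reported p-values and recomputes them as two-sided values.

```r
# t values and p-values copied from the att.model tables above.
t_reported <- c(-0.55210, -1.91778, 2.16476)
p_reported <- c(1.41912, 1.94486, 0.03041)

# Formula that appears to generate the printed values (breaks for t < 0).
p_as_printed <- 2 * (1 - pnorm(t_reported))

# Two-sided p-value using |t|, normal approximation.
p_two_sided <- 2 * pnorm(-abs(t_reported))

round(cbind(t_reported, p_reported, p_as_printed, p_two_sided), 5)
```

Under this reading, a printed p-value of 2 corresponds to a very small two-sided p-value, so the large negative t values are strongly significant rather than non-significant.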
# Looks like lmer works better because the att function doesn't handle the
# random effect on Phone. Interaction of numDealsTr * pricepoint
summary(lmer(data = temp, meanRating ~ numDealsTr * (pricepoint) + reviewscrreviews +
reviewscrgoodforgroups + reviewscrgoodforkids + american + european + asian +
bar + (1 | Phone), weights = res1$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint) + reviewscrreviews + reviewscrgoodforgroups +
## reviewscrgoodforkids + american + european + asian + bar +
## (1 | Phone)
## Data: temp
## Weights: res1$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.35 0.00 0.00 0.00 8.76
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.112 0.335
## Residual 0.112 0.335
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.858148 0.156366 24.67
## numDealsTr -0.717091 0.132100 -5.43
## pricepoint -0.141831 0.040268 -3.52
## reviewscrreviews 0.001266 0.000201 6.30
## reviewscrgoodforgroupsNo 0.051703 0.289710 0.18
## reviewscrgoodforgroupsYes 0.018672 0.286877 0.07
## reviewscrgoodforkidsNo -0.525707 0.325679 -1.61
## reviewscrgoodforkidsYes -0.386037 0.321399 -1.20
## american 0.155546 0.051069 3.05
## european 0.129354 0.055065 2.35
## asian -0.218899 0.051730 -4.23
## bar -0.212786 0.053730 -3.96
## numDealsTr:pricepoint 0.272868 0.064517 4.23
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr rvwscrgdfrgN rvwscrgdfrgY
## numDealsTr -0.127
## pricepoint -0.270 0.260
## revwscrrvws 0.032 -0.033 -0.154
## rvwscrgdfrgN 0.004 0.056 -0.026 -0.038
## rvwscrgdfrgY 0.008 0.057 -0.044 -0.059 0.980
## rvwscrgdfrkN -0.417 -0.049 -0.089 -0.007 -0.854 -0.862
## rvwscrgdfrkY -0.443 -0.049 -0.012 0.003 -0.867 -0.876
## american 0.034 -0.020 -0.129 0.051 -0.012 -0.025
## european 0.046 -0.036 -0.173 0.105 -0.010 -0.032
## asian 0.057 -0.037 -0.212 -0.029 -0.020 -0.034
## bar -0.053 -0.004 0.210 -0.063 -0.001 -0.022
## nmDlsTr:prc 0.124 -0.964 -0.273 0.015 -0.050 -0.049
## rvwscrgdfrkN rvwscrgdfrkY amercn europn asian bar
## numDealsTr
## pricepoint
## revwscrrvws
## rvwscrgdfrgN
## rvwscrgdfrgY
## rvwscrgdfrkN
## rvwscrgdfrkY 0.987
## american -0.047 0.001
## european -0.018 0.002 0.359
## asian 0.015 0.002 0.266 0.259
## bar -0.064 0.000 -0.109 -0.059 0.076
## nmDlsTr:prc 0.044 0.044 0.019 0.025 0.036 0.007
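The "REML criterion at convergence: Inf" line and the scaled-residual quartiles of exactly 0.00 are warning signs. A likely cause is that the CEM weights are 0 for every unmatched row and those rows are still passed to lmer. One possible workaround is to drop zero-weight rows before fitting. The sketch below shows the pattern on lme4's built-in `sleepstudy` data with invented weights; it is not a re-run of the model above.

```r
library(lme4)

# Built-in example data with made-up weights; zeros stand in for unmatched units.
d   <- sleepstudy
d$w <- rep(c(0, 1, 2), length.out = nrow(d))

# Keep only rows with positive weight, then fit the weighted mixed model.
keep <- d$w > 0
fit  <- lmer(Reaction ~ Days + (1 | Subject), data = d[keep, ], weights = w)

REMLcrit(fit)   # should now be a finite number
```

Applied to the analysis above, this would mean fitting on `temp[res1$w > 0, ]` with weights `res1$w[res1$w > 0]`; that has not been run here, so the coefficients may change.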
####### Interaction of NumDealsTr*PricePoint and NumDealsTr*reviewscrreviews
summary(lmer(data = temp, meanRating ~ numDealsTr * (pricepoint) + numDealsTr *
reviewscrreviews + reviewscrgoodforgroups + reviewscrgoodforkids + american +
european + asian + bar + (1 | Phone), weights = res1$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint) + numDealsTr * reviewscrreviews +
## reviewscrgoodforgroups + reviewscrgoodforkids + american +
## european + asian + bar + (1 | Phone)
## Data: temp
## Weights: res1$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.36 0.00 0.00 0.00 8.77
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.112 0.334
## Residual 0.112 0.334
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.844987 0.156349 24.59
## numDealsTr -0.610515 0.136364 -4.48
## pricepoint -0.148912 0.040313 -3.69
## reviewscrreviews 0.001557 0.000221 7.04
## reviewscrgoodforgroupsNo 0.066589 0.289614 0.23
## reviewscrgoodforgroupsYes 0.041561 0.286836 0.14
## reviewscrgoodforkidsNo -0.554908 0.325660 -1.70
## reviewscrgoodforkidsYes -0.410402 0.321342 -1.28
## american 0.157805 0.051050 3.09
## european 0.126794 0.055045 2.30
## asian -0.220464 0.051708 -4.26
## bar -0.215849 0.053713 -4.02
## numDealsTr:pricepoint 0.300995 0.065111 4.62
## numDealsTr:reviewscrreviews -0.001267 0.000405 -3.13
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr rvwscrgdfrgN rvwscrgdfrgY
## numDealsTr -0.129
## pricepoint -0.268 0.237
## revwscrrvws 0.017 0.076 -0.163
## rvwscrgdfrgN 0.003 0.059 -0.027 -0.027
## rvwscrgdfrgY 0.008 0.062 -0.046 -0.043 0.980
## rvwscrgdfrkN -0.416 -0.055 -0.087 -0.019 -0.854 -0.862
## rvwscrgdfrkY -0.442 -0.053 -0.010 -0.007 -0.867 -0.876
## american 0.034 -0.016 -0.130 0.052 -0.011 -0.025
## european 0.046 -0.039 -0.172 0.089 -0.011 -0.033
## asian 0.057 -0.038 -0.211 -0.030 -0.020 -0.034
## bar -0.053 -0.009 0.210 -0.065 -0.001 -0.023
## nmDlsTr:prc 0.119 -0.890 -0.278 0.072 -0.048 -0.045
## nmDlsTr:rvw 0.027 -0.250 0.056 -0.420 -0.016 -0.026
## rvwscrgdfrkN rvwscrgdfrkY amercn europn asian bar
## numDealsTr
## pricepoint
## revwscrrvws
## rvwscrgdfrgN
## rvwscrgdfrgY
## rvwscrgdfrkN
## rvwscrgdfrkY 0.987
## american -0.048 0.001
## european -0.018 0.003 0.359
## asian 0.015 0.002 0.265 0.259
## bar -0.063 0.000 -0.109 -0.059 0.076
## nmDlsTr:prc 0.040 0.040 0.020 0.023 0.035 0.004
## nmDlsTr:rvw 0.029 0.024 -0.014 0.015 0.010 0.018
## nmDlsTr:p
## numDealsTr
## pricepoint
## revwscrrvws
## rvwscrgdfrgN
## rvwscrgdfrgY
## rvwscrgdfrkN
## rvwscrgdfrkY
## american
## european
## asian
## bar
## nmDlsTr:prc
## nmDlsTr:rvw -0.138
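With interactions in the model, the deal effect is no longer a single number: it is the numDealsTr coefficient plus each interaction coefficient times the covariate value. The sketch below evaluates that expression using the fixed-effect estimates printed above, taken at face value. The grid of values (pricepoint 1 to 4, review counts 0 and 200) is an assumption for illustration, and `deal_effect` is a helper defined here.

```r
# Fixed-effect estimates copied from the output above.
b_deal         <- -0.610515   # numDealsTr
b_deal_price   <-  0.300995   # numDealsTr:pricepoint
b_deal_reviews <- -0.001267   # numDealsTr:reviewscrreviews

# Implied effect of a deal on meanRating at given covariate values.
deal_effect <- function(pricepoint, reviews) {
  b_deal + b_deal_price * pricepoint + b_deal_reviews * reviews
}

# Assumed grid: price levels 1-4, few vs. many existing reviews.
grid <- expand.grid(pricepoint = 1:4, reviews = c(0, 200))
grid$effect <- deal_effect(grid$pricepoint, grid$reviews)
grid
```

The positive price interaction and the negative review-count interaction pull in opposite directions, which is what the "+/-" in the note refers to.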
# Both interactions are significant: positive with pricepoint, negative with reviewscrreviews
####### Interaction of NumDealsTr*PricePoint and NumDealsTr*reviewscrreviews and
####### numDealsTr*reviewscrrating.num
summary(lmer(data = temp, meanRating ~ numDealsTr * (pricepoint) + numDealsTr *
reviewscrreviews + numDealsTr * reviewscrrating.num + reviewscrgoodforgroups +
reviewscrgoodforkids + american + european + asian + bar + (1 | Phone),
weights = res1$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint) + numDealsTr * reviewscrreviews +
## numDealsTr * reviewscrrating.num + reviewscrgoodforgroups +
## reviewscrgoodforkids + american + european + asian + bar +
## (1 | Phone)
## Data: temp
## Weights: res1$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.47 0.00 0.00 0.00 9.43
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.103 0.321
## Residual 0.103 0.321
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.609654 0.192210 3.17
## numDealsTr -0.802641 0.291892 -2.75
## pricepoint -0.137964 0.038669 -3.57
## reviewscrreviews 0.000722 0.000214 3.37
## reviewscrrating.num 0.930045 0.034144 27.24
## reviewscrgoodforgroupsNo -0.921659 0.280197 -3.29
## reviewscrgoodforgroupsYes -0.747060 0.276689 -2.70
## reviewscrgoodforkidsNo 0.413761 0.314588 1.32
## reviewscrgoodforkidsYes 0.482014 0.310218 1.55
## american 0.204005 0.049044 4.16
## european 0.138072 0.052902 2.61
## asian -0.015492 0.050151 -0.31
## bar -0.170230 0.051551 -3.30
## numDealsTr:pricepoint 0.323641 0.062587 5.17
## numDealsTr:reviewscrreviews -0.000544 0.000390 -1.40
## numDealsTr:reviewscrrating.num 0.026987 0.075068 0.36
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr rvwsc. rvwscrgdfrgN rvwscrgdfrgY
## numDealsTr -0.220
## pricepoint -0.217 0.113
## revwscrrvws 0.103 -0.001 -0.163
## rvwscrrtng. -0.625 0.249 0.012 -0.143
## rvwscrgdfrgN 0.084 -0.010 -0.028 -0.008 -0.131
## rvwscrgdfrgY 0.072 -0.003 -0.047 -0.027 -0.106 0.980
## rvwscrgdfrkN -0.396 0.018 -0.085 -0.035 0.118 -0.856 -0.864
## rvwscrgdfrkY -0.413 0.021 -0.009 -0.023 0.112 -0.869 -0.877
## american -0.003 0.033 -0.129 0.045 0.045 -0.017 -0.030
## european 0.045 -0.073 -0.172 0.089 -0.012 -0.009 -0.031
## asian -0.048 0.015 -0.207 -0.051 0.148 -0.039 -0.049
## bar -0.056 -0.019 0.210 -0.067 0.025 -0.004 -0.025
## nmDlsTr:prc 0.099 -0.456 -0.277 0.072 -0.008 -0.046 -0.044
## nmDlsTr:rvw -0.011 -0.139 0.056 -0.423 0.053 -0.023 -0.031
## nmDlsTr:rv. 0.204 -0.894 -0.007 0.041 -0.291 0.042 0.036
## rvwscrgdfrkN rvwscrgdfrkY amercn europn asian bar
## numDealsTr
## pricepoint
## revwscrrvws
## rvwscrrtng.
## rvwscrgdfrgN
## rvwscrgdfrgY
## rvwscrgdfrkN
## rvwscrgdfrkY 0.987
## american -0.041 0.006
## european -0.020 0.000 0.355
## asian 0.032 0.018 0.269 0.255
## bar -0.060 0.003 -0.109 -0.057 0.079
## nmDlsTr:prc 0.037 0.038 0.018 0.027 0.033 0.006
## nmDlsTr:rvw 0.034 0.029 -0.013 0.017 0.018 0.021
## nmDlsTr:rv. -0.049 -0.051 -0.046 0.063 -0.038 0.016
## nmDlsTr:p nmDlsTr:r
## numDealsTr
## pricepoint
## revwscrrvws
## rvwscrrtng.
## rvwscrgdfrgN
## rvwscrgdfrgY
## rvwscrgdfrkN
## rvwscrgdfrkY
## american
## european
## asian
## bar
## nmDlsTr:prc
## nmDlsTr:rvw -0.135
## nmDlsTr:rv. 0.065 0.030
# Only the interaction with pricepoint is significant now. Drop the interaction
# with the baseline rating (reviewscrrating.num)
####### Interactions with cuisine
summary(lmer(data = temp, meanRating ~ numDealsTr * (pricepoint + reviewscrreviews +
american + european + asian + bar) + (1 | Phone), weights = res1$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint + reviewscrreviews + american +
## european + asian + bar) + (1 | Phone)
## Data: temp
## Weights: res1$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.45 0.00 0.00 0.00 8.73
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.111 0.333
## Residual 0.111 0.333
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.578426 0.060972 58.7
## numDealsTr -0.586730 0.137475 -4.3
## pricepoint -0.217798 0.035312 -6.2
## reviewscrreviews 0.001470 0.000219 6.7
## american 0.082597 0.050863 1.6
## european 0.211485 0.056878 3.7
## asian -0.235812 0.052974 -4.5
## bar -0.262292 0.051395 -5.1
## numDealsTr:pricepoint 0.329038 0.070189 4.7
## numDealsTr:reviewscrreviews -0.001465 0.000406 -3.6
## numDealsTr:american 0.237473 0.097562 2.4
## numDealsTr:european -0.636808 0.098653 -6.5
## numDealsTr:asian 0.167516 0.104558 1.6
## numDealsTr:bar -0.098334 0.105653 -0.9
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr amercn europn asian bar
## numDealsTr -0.331
## pricepoint -0.834 0.276
## revwscrrvws -0.124 0.073 -0.250
## american 0.134 -0.054 -0.362 0.037
## european 0.081 -0.046 -0.305 0.075 0.348
## asian 0.001 -0.017 -0.222 -0.033 0.300 0.267
## bar -0.068 0.026 0.031 -0.115 -0.303 -0.148 0.097
## nmDlsTr:prc 0.303 -0.844 -0.348 0.083 0.127 0.108 0.069 -0.036
## nmDlsTr:rvw 0.072 -0.232 0.088 -0.428 -0.031 -0.017 0.008 0.038
## nmDlsTr:mrc -0.056 0.109 0.129 -0.044 -0.304 -0.127 -0.084 0.143
## nmDlsTr:rpn -0.041 0.022 0.110 -0.027 -0.133 -0.310 -0.085 0.067
## numDlsTr:sn -0.004 -0.067 0.060 -0.002 -0.081 -0.076 -0.259 -0.004
## numDlsTr:br 0.041 -0.111 -0.041 0.015 0.139 0.058 -0.001 -0.309
## nmDlsTr:p nmDlsTr:rv nmDlsTr:m nmDlsTr:rp nmDlsTr:s
## numDealsTr
## pricepoint
## revwscrrvws
## american
## european
## asian
## bar
## nmDlsTr:prc
## nmDlsTr:rvw -0.163
## nmDlsTr:mrc -0.349 0.068
## nmDlsTr:rpn -0.270 0.094 0.362
## numDlsTr:sn -0.132 -0.022 0.284 0.278
## numDlsTr:br 0.072 -0.026 -0.306 -0.135 0.064
# Interactions with pricepoint, reviews, american, and european are significant.
# Story so far: there are interesting interactions with pricepoint,
# reviewscrreviews, and some cuisines (american and european). This is
# evidence that deal effects are not as simple as the literature assumes;
# consumer quality evaluations depend on a variety of factors.
# Pricepoint: if numDealsTr = 1, pricepoint slope = -0.2177977 + 0.3290378 = 0.1112401
# Deal effect at Pricepoint=1: -0.5867296 + 0.3290378 * 1
# Pricepoint=2:
# Pricepoint=3:
# Pricepoint=4:
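The pricepoint arithmetic above can be carried through for all four price levels. This is a minimal sketch that reuses the two coefficients quoted in the comment and assumes the other moderators (reviewscrreviews and the cuisine dummies) are held at 0; the object names are illustrative only.

```r
# Coefficients copied from the comment above (numDealsTr and numDealsTr:pricepoint).
b_deal <- -0.5867296
b_deal_price <- 0.3290378

pricepoint <- 1:4
# Deal effect with reviewscrreviews = 0 and all cuisine dummies = 0 (an assumption).
deal_effect <- b_deal + b_deal_price * pricepoint
print(data.frame(pricepoint, deal_effect = round(deal_effect, 4)))
```

Under these assumptions the deal effect is negative at pricepoint 1 and positive from pricepoint 2 upward.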
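###### 1. Use zip code: extract the zip code from restaddresses
# trimws() drops the leading space left behind for "MD" addresses (the pattern has no space after MD)
restlist_melt_rating$zip <- trimws(gsub(".+(DC |VA |MD)(.+)", "\\2", restlist_melt_rating$restaddresses))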
restlist_melt_rating$zip[!grepl("2", restlist_melt_rating$zip)] <- NA #Mostly food trucks and mkieDs
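The zip extraction can be checked on a few made-up addresses. This is a sketch only: the addresses are hypothetical, the real restaddresses format may differ, and the pattern is a slight variant of the one above (no space inside the alternation, whitespace trimmed afterwards).

```r
# Hypothetical addresses, made up for illustration.
addr <- c("1 Example St NW Washington, DC 20001",
          "2 Sample Ave Chevy Chase, MD 20815",
          "Food truck, location varies")

# Capture everything after the state abbreviation, then trim stray whitespace.
zip <- trimws(gsub(".+(DC|VA|MD)(.+)", "\\2", addr))

# Entries without a "2" (no DC-area zip found) become NA, as in the code above.
zip[!grepl("2", zip)] <- NA
print(zip)
```

When the pattern does not match, gsub() returns the input unchanged, which is why the grepl() filter is needed to turn those rows into NA.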
######## Combine the zips
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$zip)
##
## 20815 20001 20002 20003 20004 20005 20006 20007 20008 20009 20010
## FALSE 180 4409 2526 1411 1316 1749 1688 2183 1386 4033 1266
## TRUE 0 31 24 14 19 21 22 37 24 62 24
##
## 20011 20012 20013 20015 20016 20020 20024 20032 20033 20036 20037
## FALSE 59 15 15 293 713 225 204 30 15 3080 1016
## TRUE 1 0 0 7 7 0 6 0 0 40 4
##
## 20050 20052 20059 20201 20210 20212 20223 20224 20301 20393 20420
## FALSE 427 90 15 15 15 30 15 15 15 15 15
## TRUE 8 0 0 0 0 0 0 0 0 0 0
##
## 20426 20441 20502 20515 20530 20540 20547 20565 20566 22201 22209
## FALSE 15 30 15 30 15 15 15 15 30 397 672
## TRUE 0 0 0 0 0 0 0 0 0 8 3
##
## 22216
## FALSE 29
## TRUE 1
table(restlist_melt_rating$zip)/15
##
## 20815 20001 20002 20003 20004 20005 20006 20007 20008 20009
## 12 296 170 95 89 118 114 148 94 273
## 20010 20011 20012 20013 20015 20016 20020 20024 20032 20033
## 86 4 1 1 20 48 15 14 2 1
## 20036 20037 20050 20052 20059 20201 20210 20212 20223 20224
## 208 68 29 6 1 1 1 2 1 1
## 20301 20393 20420 20426 20441 20502 20515 20530 20540 20547
## 1 1 1 1 2 1 2 1 1 1
## 20565 20566 22201 22209 22216
## 1 2 27 45 2
bigzips <- c("20001", "20002", "20003", "20004", "20005", "20006", "20007",
"20008", "20009", "20010", "20036")
restlist_melt_rating$zipgrp <- restlist_melt_rating$zip
restlist_melt_rating$zipgrp[!(restlist_melt_rating$zipgrp %in% bigzips)] <- "Other"
restlist_melt_rating$zipgrp <- factor(restlist_melt_rating$zipgrp, levels = c("Other",
bigzips))
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$zipgrp)
##
## Other 20001 20002 20003 20004 20005 20006 20007 20008 20009 20010
## FALSE 4905 4409 2526 1411 1316 1749 1688 2183 1386 4033 1266
## TRUE 45 31 24 14 19 21 22 37 24 62 24
##
## 20036
## FALSE 3080
## TRUE 40
##### Doing it programmatically (was taking too long)
# df <- select(restlist_melt_rating, Phone, Period, numDealsTr, zip)
# smallzips <- select(subset(summarize(group_by(df, zip), count = n(),
#     dis = n_distinct(Phone), disBin = n_distinct(Phone) < 70), disBin == TRUE), zip)
# smallzips <- smallzips[-35, ]
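A lighter way to pick the large zip codes programmatically, sketched in base R on toy data. The toy data frame and the cutoff of 3 are assumptions made for illustration; the commented-out code above uses a cutoff of 70 distinct restaurants.

```r
# Toy stand-in for restlist_melt_rating: one row per restaurant (Phone) and Period.
toy <- data.frame(
  Phone  = rep(c("p1", "p2", "p3", "p4"), each = 2),
  Period = rep(1:2, times = 4),
  zip    = rep(c("20001", "20001", "20001", "20037"), each = 2),
  stringsAsFactors = FALSE
)
min_rest <- 3  # hypothetical cutoff; the commented-out code above uses 70

# Distinct restaurants per zip code.
rest_per_zip <- tapply(toy$Phone, toy$zip, function(p) length(unique(p)))
bigzips_toy <- names(rest_per_zip)[rest_per_zip >= min_rest]

# Collapse every other zip into "Other", with "Other" as the reference level.
toy$zipgrp <- ifelse(toy$zip %in% bigzips_toy, toy$zip, "Other")
toy$zipgrp <- factor(toy$zipgrp, levels = c("Other", bigzips_toy))
print(table(toy$zipgrp))
```

Counting distinct Phone values rather than rows keeps the cutoff independent of the number of periods per restaurant.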
######## 2. Calculate the number of deals per zip code per period
# find the number of deals in the zipcode per period.
# View(summarize(group_by(restlist_melt_rating, zipgrp, Period), count=n(),
# dis=n_distinct(Phone), deals=sum(numDealsTr)))
# summarize(group_by(restlist_melt_rating, zipgrp, Period),
# distinct=n_distinct(Phone))
# dplyr:: is explicit because plyr, attached after dplyr, masks summarize()
df <- dplyr::summarize(group_by(restlist_melt_rating, zipgrp, Period), restZipPeriod = n_distinct(Phone),
dealsZipPeriod = sum(numDealsTr))
restlist_melt_rating <- (merge(x = restlist_melt_rating, y = df, by = c("zipgrp",
"Period"), all = TRUE))
temp.loc <- na.omit(select(restlist_melt_rating, meanRating, numReviews, reviewscrrating.num,
reviewscrreviews, inyipit, numDealsTr, pricepoint, reviewscrgoodforgroups,
reviewscrgoodforkids, reviewscrwaiter, american, european, asian, bar, Period,
Phone, zipgrp, restZipPeriod, dealsZipPeriod, inyipit, yipit.appearances))
# NA counts: 20918 in meanRating, 0 in numReviews, 7245 in reviewscrrating.num,
# 0 in inyipit, 0 in numDealsTr, 0 in pricepoint, 0 in reviewscrreviews, 0 in
# reviewscrgoodforgroups, 0 in reviewscrgoodforkids, 0 in reviewscrwaiter,
# 0 in american, european, asian, and bar, 195 in zip, 0 in
# yipit.appearances
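How the restZipPeriod and dealsZipPeriod columns come about can be seen on a toy panel. This is a base R sketch with made-up rows, not the dplyr pipeline used above; only the column names are taken from the analysis.

```r
# Toy panel: Phone identifies a restaurant, numDealsTr flags a deal in that period.
toy <- data.frame(
  Phone      = c("p1", "p1", "p2", "p2", "p3", "p3"),
  Period     = c(1, 2, 1, 2, 1, 2),
  zipgrp     = c("20001", "20001", "20001", "20001", "Other", "Other"),
  numDealsTr = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Restaurants and deals per zip group and period.
rest  <- aggregate(Phone ~ zipgrp + Period, data = toy,
                   FUN = function(p) length(unique(p)))
deals <- aggregate(numDealsTr ~ zipgrp + Period, data = toy, FUN = sum)
names(rest)[3]  <- "restZipPeriod"
names(deals)[3] <- "dealsZipPeriod"

# Attach the group-level counts to every restaurant-period row.
toy <- merge(merge(toy, rest, by = c("zipgrp", "Period")),
             deals, by = c("zipgrp", "Period"))
print(toy)
```

Note that dealsZipPeriod counts every deal in the group, so for a treated restaurant it includes that restaurant's own deal.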
temp.loc$reviewscrgoodforgroups <- factor(temp.loc$reviewscrgoodforgroups)
temp.loc$reviewscrgoodforkids <- factor(temp.loc$reviewscrgoodforkids)
temp.loc$reviewscrwaiter <- factor(temp.loc$reviewscrwaiter)
temp.loc$european <- as.integer(temp.loc$european)
temp.loc$asian <- as.integer(temp.loc$asian)
temp.loc$inyipit <- as.integer(temp.loc$inyipit)
temp.loc$numDealsTr <- as.integer(temp.loc$numDealsTr)
# we should have 0 NAs
sum(is.na(temp.loc))
## [1] 0
######## Matching
res.zip <- cem(treatment = "numDealsTr", data = temp.loc, drop = c("meanRating",
"numReviews", "reviewscrrating.num", "inyipit", "Phone", "zipgrp", "restZipPeriod",
"dealsZipPeriod", "yipit.appearances"), eval.imbalance = TRUE, keep.all = TRUE) # matching on Period
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
res.zip
## G0 G1
## All 9136 247
## Matched 1608 197
## Unmatched 7528 50
##
##
## Multivariate Imbalance Measure: L1=0.611
## Percentage of local common support: LCS=32.1%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrreviews -1.758e+00 (diff) 5.551e-17 1 -18 -1 19 27
## pricepoint 0.000e+00 (diff) 6.939e-17 0 0 0 0 0
## reviewscrgoodforgroups 4.845e+00 (Chi2) 6.332e-17 NA NA NA NA NA
## reviewscrgoodforkids 1.757e+01 (Chi2) 8.413e-17 NA NA NA NA NA
## reviewscrwaiter 2.743e+00 (Chi2) 6.939e-17 NA NA NA NA NA
## american 2.776e-17 (diff) 8.327e-17 0 0 0 0 0
## european 2.776e-17 (diff) 6.939e-17 0 0 0 0 0
## asian 0.000e+00 (diff) 6.939e-17 0 0 0 0 0
## bar 0.000e+00 (diff) 6.939e-17 0 0 0 0 0
## Period 7.868e+00 (Chi2) 9.021e-17 NA NA NA NA NA
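For intuition about the weights stored in res.zip$w, here is a base R sketch of exact matching on strata and the resulting weights. The toy data are made up, and the weight formula is the one I understand the cem documentation to describe (treated units get 1, matched controls get the stratum treated/control ratio rescaled by the overall matched control/treated ratio); cem() itself remains the authority.

```r
# Toy data: a binary treatment and one covariate that is already coarsened into bins.
toy <- data.frame(
  treated = c(1, 1, 1, 0, 0, 0, 0, 0),
  stratum = c("a", "a", "b", "a", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# A stratum is matched when it holds at least one treated and one control unit.
n_t <- tapply(toy$treated == 1, toy$stratum, sum)
n_c <- tapply(toy$treated == 0, toy$stratum, sum)
matched <- as.vector((n_t > 0 & n_c > 0)[toy$stratum])

m_t <- sum(toy$treated == 1 & matched)  # matched treated units overall
m_c <- sum(toy$treated == 0 & matched)  # matched control units overall

# Stratum-level treated/control ratio, expanded to one value per row.
ratio <- as.vector(n_t[toy$stratum] / n_c[toy$stratum])

# Treated units get weight 1, unmatched units 0, matched controls are reweighted.
w <- ifelse(!matched, 0, ifelse(toy$treated == 1, 1, ratio * (m_c / m_t)))
print(cbind(toy, w))
```

In the toy data the control weights sum to the number of matched controls, and units in stratum "c" (no treated unit) get weight 0, which is the role the 7528 unmatched controls play in res.zip.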
Using zip code as an independent variable (IV)
########## meanRating ~ Tr * Chars + Neighborhood
summary(lmer(data = temp.loc, meanRating ~ numDealsTr * pricepoint + numDealsTr *
reviewscrreviews + numDealsTr * american + numDealsTr * european + asian +
bar + (1 | Phone) + zipgrp, weights = res.zip$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * pricepoint + numDealsTr * reviewscrreviews +
## numDealsTr * american + numDealsTr * european + asian + bar +
## (1 | Phone) + zipgrp
## Data: temp.loc
## Weights: res.zip$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.48 0.00 0.00 0.00 8.83
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.11 0.331
## Residual 0.11 0.331
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.572196 0.072878 49.0
## numDealsTr -0.575208 0.136696 -4.2
## pricepoint -0.202131 0.035423 -5.7
## reviewscrreviews 0.001720 0.000222 7.7
## american 0.064957 0.050198 1.3
## european 0.264426 0.056487 4.7
## asian -0.255255 0.051726 -4.9
## bar -0.282368 0.049031 -5.8
## zipgrp20001 0.211902 0.066052 3.2
## zipgrp20002 -0.169014 0.082346 -2.1
## zipgrp20003 0.223585 0.088949 2.5
## zipgrp20004 -0.325557 0.087169 -3.7
## zipgrp20005 0.055821 0.084810 0.7
## zipgrp20006 -0.081185 0.088493 -0.9
## zipgrp20007 0.024689 0.079225 0.3
## zipgrp20008 -0.035623 0.079681 -0.4
## zipgrp20009 -0.207117 0.062843 -3.3
## zipgrp20010 0.526237 0.111886 4.7
## zipgrp20036 -0.358310 0.071723 -5.0
## numDealsTr:pricepoint 0.349111 0.069429 5.0
## numDealsTr:reviewscrreviews -0.001501 0.000406 -3.7
## numDealsTr:american 0.148998 0.087827 1.7
## numDealsTr:european -0.679846 0.093411 -7.3
##
## Correlation matrix not shown by default, as p = 23 > 20.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
# Main findings hold: numDealsTr, pricepoint, reviews, and the european
# interaction stay significant; numDealsTr:american weakens (t = 1.7).
# Zip groups with |t| > 2 above: 20001, 20002, 20003, 20004, 20009, 20010, 20036
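The fits above report "REML criterion at convergence: Inf" and scaled-residual quartiles of exactly 0. One possible reason, stated here as an assumption rather than something the output confirms, is that the 7578 unmatched observations (7528 + 50 of 9383) enter with a CEM weight of 0. The base R sketch below uses lm() on toy data to show that dropping zero-weight rows leaves a weighted least-squares fit unchanged.

```r
set.seed(1)
# Toy data standing in for temp.loc; w plays the role of res.zip$w.
toy <- data.frame(x = rnorm(20))
toy$y <- 1 + 2 * toy$x + rnorm(20)
w <- rep(c(0, 1, 0.5, 2), times = 5)  # a quarter of the rows are "unmatched"

keep <- w > 0
fit_all     <- lm(y ~ x, data = toy, weights = w)               # zero weights kept in
fit_matched <- lm(y ~ x, data = toy[keep, ], weights = w[keep]) # zero weights removed

# For weighted least squares the two fits agree, so dropping the rows loses nothing.
print(all.equal(coef(fit_all), coef(fit_matched)))
```

The analogous step for the mixed models would be to fit on temp.loc[res.zip$w > 0, ] with weights res.zip$w[res.zip$w > 0]; whether that removes the Inf criterion would need to be checked by rerunning lmer.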
########## meanRating ~ Tr*Chars + Tr* Neighborhood
summary(lmer(data = temp.loc, meanRating ~ numDealsTr * pricepoint + numDealsTr *
reviewscrreviews + numDealsTr * american + numDealsTr * european + asian +
bar + (1 | Phone) + numDealsTr * zipgrp, weights = res.zip$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * pricepoint + numDealsTr * reviewscrreviews +
## numDealsTr * american + numDealsTr * european + asian + bar +
## (1 | Phone) + numDealsTr * zipgrp
## Data: temp.loc
## Weights: res.zip$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.50 0.00 0.00 0.00 8.87
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.109 0.33
## Residual 0.109 0.33
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.568557 0.073602 48.5
## numDealsTr -0.485394 0.166997 -2.9
## pricepoint -0.202225 0.035408 -5.7
## reviewscrreviews 0.001771 0.000223 7.9
## american 0.060823 0.050175 1.2
## european 0.252688 0.056496 4.5
## asian -0.254409 0.051868 -4.9
## bar -0.298708 0.049129 -6.1
## zipgrp20001 0.235093 0.067214 3.5
## zipgrp20002 -0.124749 0.084872 -1.5
## zipgrp20003 0.205484 0.090891 2.3
## zipgrp20004 -0.296908 0.089971 -3.3
## zipgrp20005 0.042017 0.086715 0.5
## zipgrp20006 -0.117254 0.090739 -1.3
## zipgrp20007 0.122115 0.082758 1.5
## zipgrp20008 -0.070981 0.081062 -0.9
## zipgrp20009 -0.228792 0.064742 -3.5
## zipgrp20010 0.642584 0.122421 5.2
## zipgrp20036 -0.383078 0.074881 -5.1
## numDealsTr:pricepoint 0.262598 0.073145 3.6
## numDealsTr:reviewscrreviews -0.001096 0.000422 -2.6
## numDealsTr:american 0.239042 0.090811 2.6
## numDealsTr:european -0.517680 0.100280 -5.2
## numDealsTr:zipgrp20001 -0.332423 0.155924 -2.1
## numDealsTr:zipgrp20002 -0.328621 0.167487 -2.0
## numDealsTr:zipgrp20003 0.203894 0.179679 1.1
## numDealsTr:zipgrp20004 -0.218599 0.182470 -1.2
## numDealsTr:zipgrp20005 0.252674 0.195424 1.3
## numDealsTr:zipgrp20006 0.319214 0.195207 1.6
## numDealsTr:zipgrp20007 -0.626215 0.172691 -3.6
## numDealsTr:zipgrp20008 0.490259 0.178764 2.7
## numDealsTr:zipgrp20009 0.145653 0.129604 1.1
## numDealsTr:zipgrp20010 -0.389955 0.189973 -2.1
## numDealsTr:zipgrp20036 0.086634 0.145572 0.6
##
## Correlation matrix not shown by default, as p = 34 > 20.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
# Above + numDealsTr interactions with |t| of about 2 or more for 20001, 20002, 20007, 20008, 20010
Using dealsZipPeriod (the number of deals in a zip code in a given period) and restZipPeriod (the number of restaurants in a zip code, which does not change over periods)
########## meanRating ~ Tr * Chars + Deal Competition
summary(lmer(data = temp.loc, meanRating ~ numDealsTr * (pricepoint + reviewscrreviews +
american + european + asian + bar) + (1 | Phone) + restZipPeriod + dealsZipPeriod,
weights = res.zip$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint + reviewscrreviews + american +
## european + asian + bar) + (1 | Phone) + restZipPeriod + dealsZipPeriod
## Data: temp.loc
## Weights: res.zip$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.55 0.00 0.00 0.00 8.92
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.111 0.333
## Residual 0.111 0.333
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.607624 0.075315 47.9
## numDealsTr -0.555972 0.137801 -4.0
## pricepoint -0.222074 0.035450 -6.3
## reviewscrreviews 0.001497 0.000219 6.8
## american 0.072267 0.050936 1.4
## european 0.205596 0.056880 3.6
## asian -0.238367 0.053221 -4.5
## bar -0.258744 0.051385 -5.0
## restZipPeriod 0.000117 0.000202 0.6
## dealsZipPeriod -0.015418 0.004690 -3.3
## numDealsTr:pricepoint 0.321046 0.070236 4.6
## numDealsTr:reviewscrreviews -0.001474 0.000406 -3.6
## numDealsTr:american 0.247471 0.097667 2.5
## numDealsTr:european -0.628833 0.098645 -6.4
## numDealsTr:asian 0.165899 0.104511 1.6
## numDealsTr:bar -0.111582 0.105701 -1.1
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr amercn europn asian bar rstZpP
## numDealsTr -0.278
## pricepoint -0.725 0.274
## revwscrrvws -0.057 0.073 -0.256
## american 0.104 -0.058 -0.358 0.035
## european 0.059 -0.048 -0.302 0.073 0.349
## asian 0.057 -0.022 -0.228 -0.027 0.300 0.266
## bar -0.046 0.027 0.029 -0.113 -0.304 -0.148 0.097
## restZipPerd -0.546 0.044 0.078 -0.059 -0.016 0.000 -0.101 -0.007
## dealsZipPrd -0.076 -0.070 0.030 -0.032 0.062 0.031 0.022 -0.020 -0.246
## nmDlsTr:prc 0.260 -0.844 -0.348 0.084 0.128 0.109 0.073 -0.037 -0.039
## nmDlsTr:rvw 0.031 -0.230 0.092 -0.430 -0.031 -0.017 0.004 0.037 0.044
## nmDlsTr:mrc -0.015 0.109 0.123 -0.040 -0.304 -0.128 -0.079 0.144 -0.040
## nmDlsTr:rpn -0.023 0.023 0.108 -0.025 -0.134 -0.311 -0.083 0.067 -0.009
## numDlsTr:sn -0.001 -0.067 0.059 -0.002 -0.080 -0.076 -0.257 -0.004 -0.006
## numDlsTr:br 0.017 -0.112 -0.037 0.012 0.141 0.059 -0.003 -0.310 0.013
## dlsZpP nmDlsTr:p nmDlsTr:rv nmDlsTr:m nmDlsTr:rp nmDlsTr:s
## numDealsTr
## pricepoint
## revwscrrvws
## american
## european
## asian
## bar
## restZipPerd
## dealsZipPrd
## nmDlsTr:prc 0.037
## nmDlsTr:rvw 0.003 -0.164
## nmDlsTr:mrc -0.028 -0.348 0.066
## nmDlsTr:rpn -0.024 -0.270 0.093 0.363
## numDlsTr:sn 0.005 -0.131 -0.022 0.284 0.278
## numDlsTr:br 0.036 0.073 -0.025 -0.308 -0.136 0.064
########## meanRating ~ Tr * (Chars + Deal Competition)
summary(lmer(data = temp.loc, meanRating ~ numDealsTr * (pricepoint + reviewscrreviews +
american + european + asian + bar + restZipPeriod + dealsZipPeriod) + (1 |
Phone), weights = res.zip$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint + reviewscrreviews + american +
## european + asian + bar + restZipPeriod + dealsZipPeriod) +
## (1 | Phone)
## Data: temp.loc
## Weights: res.zip$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.63 0.00 0.00 0.00 9.05
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.111 0.333
## Residual 0.111 0.333
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.607482 0.076041 47.4
## numDealsTr -0.688552 0.164995 -4.2
## pricepoint -0.223950 0.035408 -6.3
## reviewscrreviews 0.001513 0.000219 6.9
## american 0.073287 0.050886 1.4
## european 0.204622 0.056810 3.6
## asian -0.248615 0.053173 -4.7
## bar -0.263446 0.051344 -5.1
## restZipPeriod 0.000265 0.000209 1.3
## dealsZipPeriod -0.024839 0.004994 -5.0
## numDealsTr:pricepoint 0.374930 0.070881 5.3
## numDealsTr:reviewscrreviews -0.001556 0.000406 -3.8
## numDealsTr:american 0.255937 0.097889 2.6
## numDealsTr:european -0.628341 0.098541 -6.4
## numDealsTr:asian 0.214573 0.104791 2.0
## numDealsTr:bar -0.072619 0.106004 -0.7
## numDealsTr:restZipPeriod -0.001254 0.000434 -2.9
## numDealsTr:dealsZipPeriod 0.075556 0.013921 5.4
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr amercn europn asian bar rstZpP
## numDealsTr -0.308
## pricepoint -0.721 0.243
## revwscrrvws -0.051 0.037 -0.256
## american 0.098 -0.032 -0.357 0.034
## european 0.055 -0.027 -0.301 0.072 0.349
## asian 0.056 -0.011 -0.227 -0.028 0.299 0.266
## bar -0.041 0.007 0.028 -0.112 -0.305 -0.149 0.097
## restZipPerd -0.555 0.130 0.079 -0.063 -0.008 0.004 -0.101 -0.017
## dealsZipPrd -0.080 0.032 0.033 -0.037 0.059 0.032 0.032 -0.016 -0.252
## nmDlsTr:prc 0.261 -0.741 -0.347 0.087 0.126 0.106 0.067 -0.037 -0.030
## nmDlsTr:rvw 0.030 -0.183 0.092 -0.430 -0.031 -0.017 0.005 0.038 0.039
## nmDlsTr:mrc -0.027 0.134 0.124 -0.043 -0.300 -0.125 -0.079 0.141 -0.018
## nmDlsTr:rpn -0.018 0.003 0.107 -0.024 -0.135 -0.311 -0.083 0.068 -0.015
## numDlsTr:sn -0.007 -0.049 0.059 -0.003 -0.078 -0.075 -0.259 -0.006 0.013
## numDlsTr:br 0.007 -0.068 -0.036 0.010 0.143 0.060 -0.005 -0.312 0.035
## nmDlsTr:rZP 0.127 -0.379 -0.016 0.027 -0.030 -0.019 0.015 0.038 -0.249
## nmDlsTr:dZP 0.029 -0.248 -0.014 0.022 -0.003 -0.008 -0.035 -0.009 0.084
## dlsZpP nmDlsTr:p nmDlsTr:rv nmDlsTr:m nmDlsTr:rp nmDlsTr:s
## numDealsTr
## pricepoint
## revwscrrvws
## american
## european
## asian
## bar
## restZipPerd
## dealsZipPrd
## nmDlsTr:prc -0.016
## nmDlsTr:rvw 0.016 -0.167
## nmDlsTr:mrc -0.025 -0.344 0.065
## nmDlsTr:rpn -0.024 -0.265 0.092 0.359
## numDlsTr:sn -0.021 -0.120 -0.025 0.286 0.276
## numDlsTr:br 0.016 0.078 -0.027 -0.298 -0.137 0.071
## nmDlsTr:rZP 0.119 -0.031 0.014 -0.081 0.026 -0.075
## nmDlsTr:dZP -0.348 0.144 -0.037 -0.002 0.007 0.075
## nmDlsTr:b nmDlsTr:rZP
## numDealsTr
## pricepoint
## revwscrrvws
## american
## european
## asian
## bar
## restZipPerd
## dealsZipPrd
## nmDlsTr:prc
## nmDlsTr:rvw
## nmDlsTr:mrc
## nmDlsTr:rpn
## numDlsTr:sn
## numDlsTr:br
## nmDlsTr:rZP -0.091
## nmDlsTr:dZP 0.052 -0.342
########## meanRating ~ Tr * (Chars + Deal Competition) + location
summary(lmer(data = temp.loc, meanRating ~ numDealsTr * (pricepoint + reviewscrreviews +
american + european + asian + bar + restZipPeriod + dealsZipPeriod) + (1 |
Phone) + zipgrp, weights = res.zip$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint + reviewscrreviews + american +
## european + asian + bar + restZipPeriod + dealsZipPeriod) +
## (1 | Phone) + zipgrp
## Data: temp.loc
## Weights: res.zip$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.60 0.00 0.00 0.00 9.04
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.109 0.33
## Residual 0.109 0.33
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.639460 0.182560 14.46
## numDealsTr -0.775101 0.165305 -4.69
## pricepoint -0.201684 0.035475 -5.69
## reviewscrreviews 0.001745 0.000222 7.84
## american 0.050072 0.050833 0.99
## european 0.248942 0.056766 4.39
## asian -0.292945 0.053485 -5.48
## bar -0.277414 0.051475 -5.39
## restZipPeriod 0.003067 0.000591 5.19
## dealsZipPeriod -0.016107 0.005125 -3.14
## zipgrp20001 0.291384 0.060278 4.83
## zipgrp20002 0.282683 0.099986 2.83
## zipgrp20003 0.894695 0.137210 6.52
## zipgrp20004 0.361215 0.136979 2.64
## zipgrp20005 0.655369 0.124596 5.26
## zipgrp20006 0.537022 0.128358 4.18
## zipgrp20007 0.551362 0.105548 5.22
## zipgrp20008 0.650183 0.131061 4.96
## zipgrp20009 -0.040063 0.055446 -0.72
## zipgrp20010 1.233177 0.156496 7.88
## numDealsTr:pricepoint 0.381204 0.070961 5.37
## numDealsTr:reviewscrreviews -0.001592 0.000406 -3.92
## numDealsTr:american 0.222884 0.097592 2.28
## numDealsTr:european -0.613081 0.098392 -6.23
## numDealsTr:asian 0.252185 0.104731 2.41
## numDealsTr:bar -0.006064 0.106123 -0.06
## numDealsTr:restZipPeriod -0.000932 0.000434 -2.15
## numDealsTr:dealsZipPeriod 0.075557 0.013844 5.46
##
## Correlation matrix not shown by default, as p = 28 > 20.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
# Non-identifiable because restZipPeriod is a linear combination of the zipgrp
# dummies (note that zipgrp20036 is missing from the coefficient table above)
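The identification problem can be reproduced on a small design matrix. This sketch assumes restZipPeriod takes a single value per zip group; the counts 296 and 208 are the restaurant counts tabulated earlier for 20001 and 20036, while the value 50 for "Other" is made up.

```r
# Toy data: restZipPeriod takes one value per zip group, as it does when the
# set of restaurants in a zip code does not change across periods.
toy <- data.frame(
  zipgrp = factor(rep(c("Other", "20001", "20036"), each = 4),
                  levels = c("Other", "20001", "20036"))
)
counts <- c(Other = 50, `20001` = 296, `20036` = 208)
toy$restZipPeriod <- unname(counts[as.character(toy$zipgrp)])

X <- model.matrix(~ zipgrp + restZipPeriod, data = toy)
# Four columns but rank three: restZipPeriod is reproduced exactly by the
# intercept and the zip dummies, so one coefficient cannot be estimated.
print(c(columns = ncol(X), rank = qr(X)$rank))
```

With a rank-deficient design, the coefficients on restZipPeriod and the zip dummies cannot be separated, so their individual estimates in the model above should not be interpreted.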
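########## meanRating ~ Tr * (Chars + baseline rating + Deal Competition) + yipit.appearances
# binary version of yipit.appearances; the model below uses the raw count instead
temp.loc$yipit.appearances.bin <- temp.loc$yipit.appearances > 0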
summary(lmer(data = temp.loc, meanRating ~ +numDealsTr * (pricepoint + reviewscrrating.num +
reviewscrreviews + american + european + asian + bar + restZipPeriod + dealsZipPeriod) +
(1 | Phone) + yipit.appearances, weights = res.zip$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ +numDealsTr * (pricepoint + reviewscrrating.num +
## reviewscrreviews + american + european + asian + bar + restZipPeriod +
## dealsZipPeriod) + (1 | Phone) + yipit.appearances
## Data: temp.loc
## Weights: res.zip$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -9.69 0.00 0.00 0.00 9.68
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.102 0.32
## Residual 0.102 0.32
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.86e-01 1.36e-01 4.32
## numDealsTr -8.94e-01 3.13e-01 -2.85
## pricepoint -1.80e-01 3.41e-02 -5.29
## reviewscrrating.num 8.91e-01 3.37e-02 26.42
## reviewscrreviews 8.05e-04 2.13e-04 3.78
## american 1.65e-01 4.90e-02 3.36
## european 2.25e-01 5.47e-02 4.11
## asian -3.13e-02 5.19e-02 -0.60
## bar -1.90e-01 4.96e-02 -3.83
## restZipPeriod -5.65e-05 2.01e-04 -0.28
## dealsZipPeriod -2.14e-02 4.80e-03 -4.46
## yipit.appearances -4.48e-03 5.98e-03 -0.75
## numDealsTr:pricepoint 3.97e-01 6.84e-02 5.80
## numDealsTr:reviewscrrating.num 3.03e-02 7.65e-02 0.40
## numDealsTr:reviewscrreviews -8.16e-04 3.91e-04 -2.09
## numDealsTr:american 1.61e-01 9.42e-02 1.70
## numDealsTr:european -5.15e-01 9.49e-02 -5.42
## numDealsTr:asian 1.25e-01 1.01e-01 1.23
## numDealsTr:bar 1.98e-02 1.03e-01 0.19
## numDealsTr:restZipPeriod -1.17e-03 4.18e-04 -2.81
## numDealsTr:dealsZipPeriod 7.23e-02 1.34e-02 5.39
##
## Correlation matrix not shown by default, as p = 21 > 20.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
# Yipit appearances are not significant
###### Using ggmap
library(ggmap)
q <- qmap("Washington, DC", zoom = 12)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Washington,+DC&zoom=12&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Washington,+DC&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
# dplyr:: is explicit because plyr, attached after dplyr, masks summarize()
df <- dplyr::summarize(group_by(restlist_melt_rating, zip), rest = n_distinct(Phone),
deals = sum(numDealsTr))
# let's add coordinates
df$lon <- 0
df$lat <- 0
df[, c("lon", "lat")] <- (geocode(df$zip)) #takes a long time.
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=+20815&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=NA&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
# with the dot being the number of restaurants in the zip code
q + geom_point(data = df, aes(x = lon, y = lat, size = rest), alpha = 0.4) +
scale_size(range = c(3, 20))
## Warning: Removed 1 rows containing missing values (geom_point).
# with the dot being the number of deals in the zip code
q + geom_point(data = df, aes(x = lon, y = lat, size = deals), color = "red",
alpha = 0.7) + scale_size(range = c(3, 20))
## Warning: Removed 1 rows containing missing values (geom_point).
# Best map:
q + geom_point(data = df, aes(x = lon, y = lat, size = rest), alpha = 0.5) +
geom_point(data = df, aes(x = lon, y = lat, size = deals * 2), alpha = 0.5,
color = "red") + scale_size(range = c(3, 20))
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
q + geom_text(data = df, aes(x = lon, y = lat, label = zip, size = deals), color = "red")
## Warning: Removed 1 rows containing missing values (geom_text).
#### Still to explore:
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscralcohol)
##
## 0 Beer & Wine Only Full Bar No
## FALSE 11009 1388 9694 7861
## TRUE 31 22 251 59
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrnoiselevel)
##
## 0 Average Loud Quiet Very Loud
## FALSE 13147 10896 1792 3222 895
## TRUE 38 249 23 33 20
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscroutdorseating)
##
## 0 No Yes
## FALSE 10358 10671 8923
## TRUE 22 144 197
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrwheelchair)
##
## 0 No Yes
## FALSE 15061 3223 11668
## TRUE 89 77 197
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrdelivery)
##
## 0 No Yes
## FALSE 11696 14244 4012
## TRUE 34 231 98
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$reviewscrtakeout)
##
## 0 No Yes
## FALSE 10529 2620 16803
## TRUE 31 50 282
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$url.dummy)
##
## FALSE TRUE
## FALSE 13253 16699
## TRUE 37 326
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$africanmiddleeastern) #45
##
## FALSE TRUE
## FALSE 28047 1905
## TRUE 318 45
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$pizza) #32
##
## FALSE TRUE
## FALSE 27839 2113
## TRUE 331 32
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$breakfast) #31
##
## FALSE TRUE
## FALSE 27178 2774
## TRUE 332 31
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$sandwiches) #23
##
## FALSE TRUE
## FALSE 25160 4792
## TRUE 340 23
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$indian) #22
##
## FALSE TRUE
## FALSE 29434 518
## TRUE 341 22
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$latin) #18
##
## FALSE TRUE
## FALSE 28005 1947
## TRUE 345 18
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$southern) #11
##
## FALSE TRUE
## FALSE 29348 604
## TRUE 352 11
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$steakhouse) #7
##
## FALSE TRUE
## FALSE 29569 383
## TRUE 356 7
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$seafood) #6
##
## FALSE TRUE
## FALSE 29268 684
## TRUE 357 6
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$vegetarian) #6
##
## FALSE TRUE
## FALSE 29583 369
## TRUE 357 6
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$foodstands) #1
##
## FALSE TRUE
## FALSE 29248 704
## TRUE 362 1
table(restlist_melt_rating$numDealsTr, restlist_melt_rating$fastfood) #0
##
## FALSE TRUE
## FALSE 29097 855
## TRUE 363 0
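The repeated table() calls above could also be generated in one pass over a vector of column names. A minimal sketch with made-up toy data follows; `toy` and `crosstab_by_treatment` are hypothetical names that are not part of the original analysis, and only the column names are borrowed from it.

```r
# Toy stand-in for restlist_melt_rating (values are made up for illustration)
toy <- data.frame(
  numDealsTr = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  pizza      = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
  seafood    = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Cross-tabulate the treatment flag against each named column
crosstab_by_treatment <- function(data, cols, treat = "numDealsTr") {
  out <- lapply(cols, function(v) {
    table(data[[treat]], data[[v]], dnn = c(treat, v))
  })
  names(out) <- cols
  out
}

tabs <- crosstab_by_treatment(toy, c("pizza", "seafood"))

# Treated rows falling in each category (the count noted after each call above)
treated_counts <- sapply(tabs, function(tb) tb["TRUE", "TRUE"])
print(treated_counts)
```

Sorting `treated_counts` would give the same ordering by treated count that the trailing comments (#45, #32, ...) record by hand.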
temp.chars <- na.omit(select(restlist_melt_rating, meanRating, numReviews, reviewscrrating.num,
reviewscrreviews, inyipit, numDealsTr, pricepoint, reviewscrgoodforgroups,
reviewscrgoodforkids, reviewscrwaiter, american, european, asian, bar, Period,
Phone, zipgrp, restZipPeriod, dealsZipPeriod, inyipit, yipit.appearances,
reviewscralcohol, reviewscrnoiselevel, reviewscroutdorseating, reviewscrwheelchair,
reviewscrdelivery, reviewscrtakeout))
# 20918 NAs in meanRating, 0 in numReviews, 7245 in reviewscrrating.num, 0 in
# inyipit, 0 in numDealsTr, 0 in pricepoint, 0 in reviewscrreviews, 0 in
# reviewscrgoodforgroups, 0 in reviewscrgoodforkids, 0 in reviewscrwaiter,
# 0 in american, european, asian, and bar, 195 NAs in zip, 0 in inyipit and
# yipit.appearances, 0 in reviewscralcohol through reviewscrtakeout
# sum(is.na(restlist_melt_rating$reviewscrtakeout))
temp.chars$reviewscrgoodforgroups <- factor(temp.chars$reviewscrgoodforgroups)
temp.chars$reviewscrgoodforkids <- factor(temp.chars$reviewscrgoodforkids)
temp.chars$reviewscrwaiter <- factor(temp.chars$reviewscrwaiter)
temp.chars$reviewscralcohol <- factor(temp.chars$reviewscralcohol)
temp.chars$reviewscrnoiselevel <- factor(temp.chars$reviewscrnoiselevel)
temp.chars$reviewscroutdorseating <- factor(temp.chars$reviewscroutdorseating)
temp.chars$reviewscrwheelchair <- factor(temp.chars$reviewscrwheelchair)
temp.chars$reviewscrdelivery <- factor(temp.chars$reviewscrdelivery)
temp.chars$reviewscrtakeout <- factor(temp.chars$reviewscrtakeout)
temp.chars$european <- as.integer(temp.chars$european)
temp.chars$asian <- as.integer(temp.chars$asian)
temp.chars$inyipit <- as.integer(temp.chars$inyipit)
temp.chars$numDealsTr <- as.integer(temp.chars$numDealsTr)
# after na.omit() there should be 0 NAs left
sum(is.na(temp.chars))
## [1] 0
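The column-by-column conversions above can be written more compactly by converting groups of columns at once. This is a sketch on made-up toy data; `toy`, `factor_cols` and `integer_cols` are hypothetical names, and only a few of the original columns are shown.

```r
# Toy stand-in for temp.chars (values are made up)
toy <- data.frame(
  reviewscrtakeout  = c("Yes", "No", "0", "Yes"),
  reviewscrdelivery = c("No", "No", "Yes", "0"),
  european          = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors  = FALSE
)

factor_cols  <- c("reviewscrtakeout", "reviewscrdelivery")
integer_cols <- c("european")

# Convert whole groups of columns instead of one assignment per column
toy[factor_cols]  <- lapply(toy[factor_cols], factor)
toy[integer_cols] <- lapply(toy[integer_cols], as.integer)

str(toy)
```

Keeping the column names in two vectors also makes it easier to see which variables are treated as categorical when they are later passed to cem().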
imbalance(group = temp.loc$numDealsTr, data = temp.loc, drop = c("Phone", "meanRating",
"numReviews", "reviewscrrating.num", "inyipit", "reviewscrwaiter", "zipgrp",
"restZipPeriod", "dealsZipPeriod", "yipit.appearances", "reviewscralcohol"))
## Error: binary operation on non-conformable arrays
cem(treatment = "numDealsTr", data = temp.chars, drop = c("meanRating", "numReviews",
"reviewscrrating.num", "inyipit", "Phone", "reviewscrwaiter", "zipgrp",
"restZipPeriod", "dealsZipPeriod", "yipit.appearances", "reviewscralcohol",
"reviewscrnoiselevel"), eval.imbalance = TRUE, keep.all = TRUE)
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## G0 G1
## All 9136 247
## Matched 258 121
## Unmatched 8878 126
##
##
## Multivariate Imbalance Measure: L1=0.587
## Percentage of local common support: LCS=31.1%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrreviews -1.224e+01 (diff) 0.000e+00 3 -19 -14 0 46
## pricepoint 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
## reviewscrgoodforgroups 1.331e+00 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscrgoodforkids 2.348e+00 (Chi2) 0.000e+00 NA NA NA NA NA
## american 5.551e-17 (diff) 0.000e+00 0 0 0 0 0
## european 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
## asian 0.000e+00 (diff) 0.000e+00 0 0 0 0 0
## bar 0.000e+00 (diff) 5.551e-17 0 0 0 0 0
## Period 7.681e+00 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscroutdorseating 2.308e+00 (Chi2) 2.776e-17 NA NA NA NA NA
## reviewscrwheelchair 7.089e+00 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscrdelivery 3.101e-01 (Chi2) 0.000e+00 NA NA NA NA NA
## reviewscrtakeout 8.078e-01 (Chi2) 0.000e+00 NA NA NA NA NA
res.chars <- cem(treatment = "numDealsTr", data = temp.chars, drop = c("Phone",
"meanRating", "numReviews", "inyipit", "reviewscrwaiter", "zipgrp", "restZipPeriod",
"dealsZipPeriod", "yipit.appearances", "reviewscralcohol", "reviewscroutdorseating",
"reviewscrnoiselevel", "reviewscrwheelchair", "reviewscrdelivery"), eval.imbalance = TRUE,
keep.all = TRUE, cutpoints = list(reviewscrrating.num = c(0, 1.1, 2.1, 2.6,
3.1, 3.6, 4.1, 4.6, 5.1)))
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
## Warning: Chi-squared approximation may be incorrect
res.chars
## G0 G1
## All 9136 247
## Matched 634 148
## Unmatched 8502 99
##
##
## Multivariate Imbalance Measure: L1=0.596
## Percentage of local common support: LCS=33.5%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrrating.num -8.882e-16 (diff) 5.551e-17 0 0 0 0 0
## reviewscrreviews -4.189e+00 (diff) 5.551e-17 1 -10 6 -16 46
## pricepoint -2.220e-16 (diff) 1.249e-16 0 0 0 0 0
## reviewscrgoodforgroups 1.389e+00 (Chi2) 1.149e-16 NA NA NA NA NA
## reviewscrgoodforkids 1.781e+01 (Chi2) 8.370e-17 NA NA NA NA NA
## american -2.776e-17 (diff) 6.939e-17 0 0 0 0 0
## european 0.000e+00 (diff) 1.249e-16 0 0 0 0 0
## asian -2.776e-17 (diff) 6.939e-17 0 0 0 0 0
## bar -1.388e-17 (diff) 6.245e-17 0 0 0 0 0
## Period 1.011e+01 (Chi2) 1.006e-16 NA NA NA NA NA
## reviewscrtakeout 1.021e+01 (Chi2) 6.245e-17 NA NA NA NA NA
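To make the matching step above less of a black box, here is a hand-rolled sketch of coarsened exact matching in base R on made-up toy data. It is not the cem package's implementation: the interval closure (`right = FALSE`), the toy data frame and all variable names are assumptions for illustration, and the weight formula is the standard CEM one (1 for matched treated units, a stratum-size ratio for matched controls, 0 for unmatched units).

```r
# Toy data (made up): treatment flag, a numeric rating, and a price level
toy <- data.frame(
  treated    = c(1, 1, 1, 0, 0, 0, 0, 0),
  rating     = c(3.5, 4.0, 2.0, 3.5, 3.5, 4.0, 4.5, 1.0),
  pricepoint = c(2, 2, 1, 2, 2, 2, 3, 1)
)

# 1. Coarsen the numeric covariate with the cutpoints used above
cuts <- c(0, 1.1, 2.1, 2.6, 3.1, 3.6, 4.1, 4.6, 5.1)
toy$rating_bin <- cut(toy$rating, breaks = cuts, right = FALSE)

# 2. A stratum is one combination of coarsened values
toy$stratum <- interaction(toy$rating_bin, toy$pricepoint, drop = TRUE)

# 3. Keep only strata that contain both treated and control units
n_t <- tapply(toy$treated == 1, toy$stratum, sum)
n_c <- tapply(toy$treated == 0, toy$stratum, sum)
ok  <- names(n_t)[n_t > 0 & n_c > 0]
toy$matched <- toy$stratum %in% ok

# 4. Weights: 0 if unmatched, 1 for matched treated, and for matched controls
#    (treated / controls in the stratum) * (matched controls / matched treated)
m_t <- sum(toy$matched & toy$treated == 1)
m_c <- sum(toy$matched & toy$treated == 0)
s   <- as.character(toy$stratum)
toy$w <- as.numeric(
  ifelse(!toy$matched, 0,
         ifelse(toy$treated == 1, 1, (n_t[s] / n_c[s]) * (m_c / m_t)))
)

print(toy[c("treated", "rating", "pricepoint", "matched", "w")])
```

In this toy example the control weights sum to the number of matched controls, which is the property that lets the weights be used directly in a weighted regression.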
summary(lmer(data = temp.chars, meanRating ~ numDealsTr * (pricepoint + reviewscrreviews +
american + european + asian + bar + restZipPeriod + dealsZipPeriod) + (1 |
Phone), weights = res.chars$w))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## meanRating ~ numDealsTr * (pricepoint + reviewscrreviews + american +
## european + asian + bar + restZipPeriod + dealsZipPeriod) +
## (1 | Phone)
## Data: temp.chars
## Weights: res.chars$w
##
## REML criterion at convergence: Inf
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -14.5 0.0 0.0 0.0 14.4
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.0477 0.218
## Residual 0.0477 0.218
## Number of obs: 9383, groups: Phone, 1261
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.54e+00 7.56e-02 46.8
## numDealsTr -6.66e-01 1.48e-01 -4.5
## pricepoint -1.76e-01 3.88e-02 -4.5
## reviewscrreviews 1.91e-03 2.13e-04 9.0
## american 3.48e-01 5.78e-02 6.0
## european 1.72e-01 5.72e-02 3.0
## asian -3.37e-01 4.97e-02 -6.8
## bar -3.44e-01 6.52e-02 -5.3
## restZipPeriod -4.86e-05 2.01e-04 -0.2
## dealsZipPeriod -4.58e-02 4.99e-03 -9.2
## numDealsTr:pricepoint 4.12e-01 7.12e-02 5.8
## numDealsTr:reviewscrreviews -1.24e-03 3.42e-04 -3.6
## numDealsTr:american -3.55e-01 8.14e-02 -4.4
## numDealsTr:european -5.38e-01 8.96e-02 -6.0
## numDealsTr:asian -1.09e-01 8.89e-02 -1.2
## numDealsTr:bar 3.96e-01 1.01e-01 3.9
## numDealsTr:restZipPeriod -1.38e-03 3.69e-04 -3.7
## numDealsTr:dealsZipPeriod 8.05e-02 1.12e-02 7.2
##
## Correlation of Fixed Effects:
## (Intr) nmDlsT prcpnt rvwscr amercn europn asian bar rstZpP
## numDealsTr -0.382
## pricepoint -0.738 0.296
## revwscrrvws -0.066 0.020 -0.296
## american 0.189 -0.062 -0.276 0.005
## european 0.036 0.035 -0.237 0.094 0.336
## asian 0.128 -0.033 -0.308 0.097 0.278 0.256
## bar -0.022 0.001 -0.037 0.050 -0.464 -0.237 0.020
## restZipPerd -0.480 0.144 -0.008 0.012 -0.164 0.016 -0.093 0.053
## dealsZipPrd -0.100 0.051 0.066 -0.065 0.035 0.000 0.019 0.001 -0.267
## nmDlsTr:prc 0.307 -0.751 -0.396 0.104 0.128 0.033 0.098 -0.004 0.004
## nmDlsTr:rvw 0.033 -0.150 0.118 -0.328 -0.073 -0.032 -0.044 0.001 -0.018
## nmDlsTr:mrc -0.072 0.151 0.137 -0.058 -0.358 -0.115 -0.117 0.169 0.062
## nmDlsTr:rpn 0.039 -0.086 0.044 -0.028 -0.116 -0.350 -0.080 0.079 -0.026
## numDlsTr:sn -0.025 0.008 0.089 -0.036 -0.095 -0.069 -0.324 -0.002 0.029
## numDlsTr:br -0.001 0.009 0.007 -0.019 0.162 0.075 -0.004 -0.368 0.006
## nmDlsTr:rZP 0.146 -0.371 0.014 -0.009 0.022 -0.035 0.022 0.021 -0.339
## nmDlsTr:dZP 0.056 -0.209 -0.044 0.014 0.019 0.010 -0.010 -0.029 0.120
## dlsZpP nmDlsTr:p nmDlsTr:rv nmDlsTr:m nmDlsTr:rp nmDlsTr:s
## numDealsTr
## pricepoint
## revwscrrvws
## american
## european
## asian
## bar
## restZipPerd
## dealsZipPrd
## nmDlsTr:prc -0.024
## nmDlsTr:rvw 0.001 -0.312
## nmDlsTr:mrc -0.038 -0.314 0.156
## nmDlsTr:rpn -0.026 -0.177 0.200 0.354
## numDlsTr:sn -0.035 -0.203 0.134 0.301 0.248
## numDlsTr:br -0.002 -0.018 0.070 -0.319 -0.212 0.031
## nmDlsTr:rZP 0.149 -0.061 0.162 -0.129 0.105 -0.065
## nmDlsTr:dZP -0.439 0.158 -0.089 -0.053 -0.029 0.070
## nmDlsTr:b nmDlsTr:rZP
## numDealsTr
## pricepoint
## revwscrrvws
## american
## european
## asian
## bar
## restZipPerd
## dealsZipPrd
## nmDlsTr:prc
## nmDlsTr:rvw
## nmDlsTr:mrc
## nmDlsTr:rpn
## numDlsTr:sn
## numDlsTr:br
## nmDlsTr:rZP -0.082
## nmDlsTr:dZP 0.034 -0.365
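The fit above reports a REML criterion of Inf while using all 9383 rows, most of which are unmatched and therefore carry a CEM weight of 0. One option, offered here as an untested assumption rather than a diagnosis, is to drop the zero-weight rows before fitting so that the data and the weights stay row-aligned. The sketch below uses made-up toy data and lm() as a base-R stand-in; the same filtered data frame and weight column could be passed to lmer().

```r
# Toy data (made up); `w` plays the role of res.chars$w
toy <- data.frame(
  y       = c(3.0, 4.0, 3.5, 2.5, 4.5, 3.0),
  treated = c(1, 1, 0, 0, 0, 0),
  price   = c(1, 2, 1, 2, 2, 3),
  w       = c(1, 1, 1.5, 0.75, 0.75, 0)
)

# Keep only matched rows (positive weight); weights travel with their rows
matched <- toy[toy$w > 0, ]

# lm() is a stand-in so the sketch needs only base R
fit <- lm(y ~ treated + price, data = matched, weights = w)
print(nobs(fit))
```

Storing the weights as a column of the data frame, rather than passing res.chars$w separately, avoids any mismatch between rows and weights after subsetting.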
imbalance(group = temp.chars$numDealsTr, data = temp.chars, drop = c("Phone",
"meanRating", "numReviews", "reviewscrrating.num", "inyipit", "reviewscrwaiter",
"zipgrp", "restZipPeriod", "dealsZipPeriod", "yipit.appearances", "reviewscralcohol",
"reviewscroutdorseating", "reviewscrnoiselevel", "reviewscrwheelchair",
"reviewscrdelivery"))
##
## Multivariate Imbalance Measure: L1=1.000
## Percentage of local common support: LCS=0.0%
##
## Univariate Imbalance Measures:
##
## statistic type L1 min 25% 50% 75% max
## reviewscrreviews -18.53434 (diff) 0.00000 -1 -27 -5 -16 586
## numDealsTr -1.00000 (diff) 1.00000 -1 -1 -1 -1 -1
## pricepoint -0.15432 (diff) 0.14891 0 -1 0 0 0
## reviewscrgoodforgroups 13.39442 (Chi2) 0.09691 NA NA NA NA NA
## reviewscrgoodforkids 1.35001 (Chi2) 0.02600 NA NA NA NA NA
## american 0.02897 (diff) 0.02897 0 0 0 1 0
## european -0.03190 (diff) 0.03190 0 0 0 0 0
## asian -0.04799 (diff) 0.04799 0 0 0 0 0
## bar 0.01353 (diff) 0.01353 0 0 0 0 0
## Period 49.86355 (Chi2) 0.11039 NA NA NA NA NA
## reviewscrtakeout 2.29667 (Chi2) 0.03070 NA NA NA NA NA
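The table above lists numDealsTr itself among the covariates, with a univariate L1 of 1, alongside a multivariate L1 of 1.000 and LCS of 0%. A group indicator left in the covariate set puts treated and control rows in disjoint cells, which on its own produces exactly that result, so it probably belongs in `drop`. The sketch below computes the multivariate L1 distance by hand on made-up data to show the effect; `l1_imbalance` is a hypothetical helper, not the cem package's imbalance(), and it assumes the covariates are already discrete.

```r
# Multivariate L1 distance between treated and control cell frequencies.
# `covars` is a data frame of already-coarsened (discrete) covariates.
l1_imbalance <- function(group, covars) {
  cell <- interaction(covars, drop = TRUE)   # one level per occupied cell
  f <- prop.table(table(cell[group == 1]))   # treated relative frequencies
  g <- prop.table(table(cell[group == 0]))   # control relative frequencies
  sum(abs(f - g)) / 2
}

# Toy data (made up)
toy <- data.frame(
  group = c(1, 1, 0, 0, 0, 0),
  price = c(1, 2, 1, 1, 1, 2)
)

# Covariates only: partial overlap between the groups
l1_price <- l1_imbalance(toy$group, toy["price"])

# Group indicator left in the covariates: no shared cells, so L1 is 1
l1_with_group <- l1_imbalance(toy$group, toy[c("group", "price")])

print(c(l1_price = l1_price, l1_with_group = l1_with_group))
```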
names(temp.chars)
## [1] "meanRating" "numReviews"
## [3] "reviewscrrating.num" "reviewscrreviews"
## [5] "inyipit" "numDealsTr"
## [7] "pricepoint" "reviewscrgoodforgroups"
## [9] "reviewscrgoodforkids" "reviewscrwaiter"
## [11] "american" "european"
## [13] "asian" "bar"
## [15] "Period" "Phone"
## [17] "zipgrp" "restZipPeriod"
## [19] "dealsZipPeriod" "yipit.appearances"
## [21] "reviewscralcohol" "reviewscrnoiselevel"
## [23] "reviewscroutdorseating" "reviewscrwheelchair"
## [25] "reviewscrdelivery" "reviewscrtakeout"
Still to do: run with all variables; try the CEM relax function; fix the categorical variables; use ggplot density plots to check whether they are well balanced.
vars <- c("numDealsTr", "Period", "reviewscrrating.num", "reviewscrreviews",
"pricepoint", "Phone")
df <- restlist_melt_rating[vars]
# group must be a vector and data must be supplied; drop rows with NAs as a precaution
df <- na.omit(df)
imbalance(group = df$numDealsTr, data = df, drop = c("numDealsTr", "Phone"))
head(na.omit(restlist_melt_rating[vars]))
## numDealsTr Period reviewscrrating.num reviewscrreviews pricepoint
## 2 FALSE 1 2.0 5 1
## 4 FALSE 1 4.0 30 1
## 5 FALSE 1 4.5 3 1
## 6 FALSE 1 3.5 709 3
## 8 FALSE 1 4.0 49 1
## 9 FALSE 1 3.5 43 2
## Phone
## 2 2023470771
## 4 2024080417
## 5 2023478790
## 6 2026823123
## 8 2026567287
## 9 2027892289
ggplot(data=restlist_melt_rating,aes(x=reviewscrgoodforkids,y = ..density.., fill=numDealsTr)) + geom_histogram(position="dodge")
ggplot(data=restlist_melt_rating,aes(x=reviewscrgoodforkids,y = ..density.., fill=numDealsTr, group=numDealsTr)) + geom_histogram(position="dodge")
ggplot(data=restlist_melt_rating,aes(x=reviewscrgoodforkids,y = ..density.., fill=numDealsTr, group=numDealsTr)) + geom_histogram(position="dodge") + facet_wrap(~ Period, nrow=3)
ggplot(data=subset(restlist_melt_rating, reviewscrgoodforkids !=0),aes(x=reviewscrgoodforkids,y = ..density.., fill=numDealsTr, group=numDealsTr)) + geom_histogram(position="dodge") + facet_wrap(~ Period, nrow=3)
ggplot(data=subset(restlist_melt_rating, reviewscrgoodforkids !=0),aes(x=reviewscrgoodforkids,y = ..density.., fill=numDealsTr, group=numDealsTr)) + geom_histogram(position="dodge")
ggplot(data=restlist_melt_rating,aes(x=reviewscrgoodforgroups,y = ..density.., fill=numDealsTr, group=numDealsTr)) + geom_histogram(position="dodge") + facet_wrap(~ Period, nrow=3)
ggplot(data=restlist_melt_rating,aes(x=reviewscrgoodforgroups,y = ..density.., fill=numDealsTr, group=numDealsTr)) + geom_histogram(position="dodge")