This report examines the Groupon effect found by Byers, Mitzenmacher, and Zervas (2012), referred to below as BMZ. In their article, the authors found a discontinuity in the numerical ratings at the start date of the deal. The suggestion was to check whether this discontinuity also exists in our data set.
I first reshape our data set to obtain numerical offsets and the average ratings of the restaurants with a deal.
The following steps are performed:
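The steps themselves are not listed in this report. As a minimal sketch of how such a summary could be built, assume one row per review and one deal start date per restaurant. The toy data and the column names `deal.start` and `rating` are hypothetical; only `offset`, `after`, `mean`, and `length` mirror the columns used in the plots that follow. Base R is used here to keep the sketch self-contained.

```r
# Toy data (hypothetical values, for illustration only).
reviews <- data.frame(
  restphones = c("A", "A", "A", "B", "B", "B"),
  numdate    = c(95, 100, 103, 198, 200, 203),  # review date, days since epoch
  rating     = c(4, 5, 3, 4, 2, 3)
)
deals <- data.frame(
  restphones = c("A", "B"),
  deal.start = c(100, 200)                        # deal start, days since epoch
)

# Offset = days between a review and the start of that restaurant's deal.
merged <- merge(reviews, deals, by = "restphones")
merged$offset <- merged$numdate - merged$deal.start
merged$after  <- merged$offset >= 0

# One row per offset: mean rating and number of reviews over all restaurants.
offset.summary <- aggregate(
  rating ~ offset + after, data = merged,
  FUN = function(x) c(mean = mean(x), length = length(x))
)
offset.summary <- do.call(data.frame, offset.summary)  # flatten matrix column
names(offset.summary) <- c("offset", "after", "mean", "length")
print(offset.summary)
```

Pooling by offset rather than by calendar date is what lets restaurants with different deal dates be overlaid on one axis.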
Now we can plot the data as BMZ do. They present two graphs. The first shows the arrival of reviews before and after the deal.
# *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = length)) + geom_bar(stat = "identity") +
facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red") + xlab("Offset (in days)") +
ylab("Number of new reviews")
## 100 offset
ggplot(data = subset(reviews.inyipit.summary, offset > -100 & offset < 100),
aes(x = offset, y = length)) + geom_bar(stat = "identity") + facet_grid(. ~
after, scales = "free") + geom_smooth(colour = "red") + xlab("Offset (in days)") +
ylab("Number of new reviews")
Note that there is indeed an increase in the number of reviews. BMZ call these “Groupon reviews” because they contain the word 'Groupon'. We have not explored this yet. However, unlike in their graph, there seems to be a sharp decrease in the number of reviews of these restaurants after the deal.
# Now let's find the Groupon effect *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = mean)) + geom_point() +
facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red", span = 1.5) +
xlab("Offset (in days)") + ylab("Mean rating of all Groupon restaurants") +
ggtitle("Groupon-Effect Graph 2: Mean ratings change")
## 100 offset
ggplot(data = subset(reviews.inyipit.summary, offset > -100 & offset < 100),
aes(x = offset, y = mean)) + geom_point() + facet_grid(. ~ after, scales = "free") +
geom_smooth(colour = "red", span = 1.5) + xlab("Offset (in days)") + ylab("Mean rating of all Groupon restaurants") +
ggtitle("Groupon-Effect Graph 2: Mean ratings change")
Note that the discontinuity is fairly small and probably not significant in comparison to the BMZ graph.
Perhaps we are unable to see the Groupon effect because we need to smooth over a longer period. I tried smoothing over 5 days.
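The binning code is not shown in this report. One way to form 5-day bins, sketched here with hypothetical offsets, is to floor each offset to the lower edge of its bin. Using `floor` keeps negative offsets in pre-deal bins and offsets from 0 upward in post-deal bins, so no bin straddles the deal date.

```r
# Hypothetical offsets in days relative to the deal start.
offsets   <- c(-7, -5, -1, 0, 4, 5, 12)
bin.width <- 5

# Lower edge of the 5-day bin each offset falls into.
offset.bin <- floor(offsets / bin.width) * bin.width

# Number of reviews per bin.
reviews.per.bin <- table(offset.bin)
print(reviews.per.bin)
```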
The arrival of reviews now looks like this:
# *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset.bin, y = length)) + geom_bar(stat = "identity") +
facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red")
The mean ratings change plot now looks like this:
# Now let's find the Groupon effect *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset.bin, y = mean)) + geom_point() +
facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red")
Now let's observe how creating quantiles for the number of reviews changes the plots.
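The grouping step is not shown in this report. A minimal sketch of one way to build quartile groups with `quantile` and `cut` follows; the review counts and the labels `Q1` to `Q4` are made up, and `numreviews.group` only mirrors the facet variable used in the plots.

```r
# Hypothetical total review counts for eight restaurants.
num.reviews <- c(3, 10, 25, 40, 80, 150, 300, 900)

# Quartile break points; include.lowest keeps the minimum in the first group.
breaks <- quantile(num.reviews, probs = seq(0, 1, 0.25))
numreviews.group <- cut(num.reviews, breaks = breaks,
                        include.lowest = TRUE,
                        labels = paste0("Q", 1:4))
print(table(numreviews.group))
```

With heavily tied counts the quantile breaks may not be unique, in which case `cut` stops with an error and the breaks need to be deduplicated first.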
The plots for the number of reviews and mean changes now look like this:
# plot for quantiles for number of reviews and mean change with smoother
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = length)) + geom_bar(stat = "identity") +
facet_grid(numreviews.group ~ after, scales = "free", space = "free_y") +
geom_smooth(colour = "red") + ggtitle("number of reviews in quantiles of number of reviews")
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = mean)) + geom_bar(stat = "identity") +
facet_grid(numreviews.group ~ after, scales = "free", space = "free_y") +
geom_smooth(colour = "red") + ggtitle("mean change rating in quantiles of number reviews")
# Next step: new plot of the histogram over 90 days to see whether the trend
# shows up over a longer period
We need to correct the analysis because the data collection of reviews possibly ends before the offsets we have tried so far. We therefore examine the dates in our data set.
## The last deal start date is:
as.Date(max(restlist_temp$yipit.first.date.added), origin = "1970-01-01")
## [1] "2012-06-22"
## The last restaurant review date is:
as.Date(max(reviews.inyipit$numdate), origin = "1970-01-01")
## [1] "2012-10-11"
max(reviews.inyipit$numdate) - max(restlist_temp$yipit.first.date.added)
## [1] 111
# MAX offset should be around 100 days
# as.Date(15513, origin = '1970-01-01')
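The reasoning behind that limit can be sketched as follows. For each deal, the largest observable offset is the gap between the deal start and the last review date, and the smallest of those gaps bounds the window in which every deal is fully observed. The last review date and the latest deal start are taken from the output above; the two earlier start dates are illustrative.

```r
last.review <- as.Date("2012-10-11")          # last review date (from above)
deal.start  <- as.Date(c("2011-12-16",        # illustrative
                         "2012-03-27",        # illustrative
                         "2012-06-22"))       # last deal start (from above)

# Largest offset that can be observed for each deal, in days.
max.offset <- as.numeric(last.review - deal.start)
print(max.offset)

# Offsets beyond this value are only observed for a subset of the deals.
print(min(max.offset))
```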
Let's compare the Groupon effect graphs with a control: a group of restaurants that do not run a deal. For these restaurants the offset is set arbitrarily to the median of the deal data period, which is “2012-03-27”.
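A minimal sketch of how such a placebo offset could be assigned to the control restaurants. All dates here are made up and chosen so that the median matches the date quoted above; `pseudo.start` and `control.reviews` are hypothetical names.

```r
# Hypothetical deal start dates; their median serves as the placebo start.
deal.dates   <- as.Date(c("2012-01-10", "2012-03-27", "2012-06-20"))
pseudo.start <- median(deal.dates)

# Hypothetical reviews of restaurants that never ran a deal.
control.reviews <- data.frame(
  date = as.Date(c("2012-03-20", "2012-03-27", "2012-04-05"))
)
control.reviews$offset <- as.numeric(control.reviews$date - pseudo.start)
control.reviews$after  <- control.reviews$offset >= 0
print(control.reviews)
```

Because nothing happens to the control restaurants on that date, any jump at offset 0 in the control facet would point to a calendar effect rather than a deal effect.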
In the top facet (NO-DEAL), you can see how reviews arrive in our data set before and after the arbitrary offset. In the bottom facet (DEAL), you can see how reviews arrive before and after the deal. The initial spike (discontinuity) is visible in the deal facet.
# *keep: facet for inyipit and colour for after the deal
ggplot(data = reviews.all.summary, aes(x = offset, y = length, color = after)) +
geom_bar(stat = "identity", position = "identity", alpha = 0.7) + facet_grid(inyipit ~
., scales = "free", space = "free_x") + geom_smooth() + xlab("Offset (in days)") +
ylab("Number of new reviews") + theme(panel.spacing = unit(0, "null")) +
labs(title = "Number of reviews by offset. Restaurants without deals on top")
In the top facet (NO-DEAL), there is no apparent discontinuity in the rating. In the bottom facet (DEAL), however, you can see a negative discontinuity in the rating.
ggplot(data = reviews.all.summary, aes(x = offset, y = mean, color = after)) +
geom_point(stat = "identity", position = "identity", alpha = 0.7) + facet_grid(inyipit ~
., scales = "free", space = "free_x") + geom_smooth() + xlab("Offset (in days)") +
ylab("Mean rating of all Groupon restaurants") + theme(panel.spacing = unit(0, "null")) +
labs(title = "Ratings by offset. Restaurants without deals on top.")
The role of competition is unexplored. Let's first explore how competition can be measured in our data set, and start exploring how we can visualize it.
Location effects are going to be part of competition. In its most basic form, co-located restaurants are competing with each other. The first way we can explore location is by neighborhood.
Our data has neighborhood data, so the first thing we can do is observe whether the effects are more obvious in some neighborhoods than in others.
Two basic forms of competition can be measured:
Note that Adams Morgan, Dupont Circle, and Georgetown lead in both the number of restaurants and the number of restaurants with deals.
# stat = "identity" plots the precomputed counts instead of counting rows
ggplot(data = nbhd.count) + geom_bar(aes(x = nbhd, y = all.count), stat = "identity",
fill = "red", alpha = 0.7) + geom_bar(aes(x = nbhd, y = inyipit.count), stat = "identity",
fill = "blue", alpha = 0.7) + theme(text = element_text(size = 20), axis.text.x = element_text(angle = 90,
vjust = 1)) + labs(title = "Restaurants by neighborhood (Blue for DEAL)")
Group the restaurants in quantiles:
Keep in mind a large number of restaurants don't have a neighborhood, so NA is also shown.
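The counting and grouping code is not shown in this report. A minimal sketch under stated assumptions follows: the toy data is made up, the split is a simple two-way split at the median, and only the names `nbhd.count`, `all.count`, `inyipit.count`, and `all.count.grp` mirror the objects used in the plots. `addNA` turns the missing neighborhood into its own level so that it is counted too.

```r
# Toy restaurant list (hypothetical values).
rest <- data.frame(
  nbhd    = c("Adams Morgan", "Adams Morgan", "Adams Morgan",
              "Dupont Circle", "Dupont Circle", "Shaw", NA, NA),
  inyipit = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Count all restaurants and deal restaurants per neighborhood, NA included.
nbhd.f <- addNA(factor(rest$nbhd))
nbhd.count <- data.frame(
  nbhd          = levels(nbhd.f),
  all.count     = as.vector(table(nbhd.f)),
  inyipit.count = as.vector(table(nbhd.f[rest$inyipit])),
  stringsAsFactors = FALSE
)

# Attach the neighborhood count to each restaurant (match pairs NA with NA),
# then split restaurants at the median count.
rest$all.count <- nbhd.count$all.count[match(rest$nbhd, nbhd.count$nbhd)]
rest$all.count.grp <- cut(rest$all.count,
                          breaks = quantile(rest$all.count, c(0, 0.5, 1)),
                          include.lowest = TRUE, labels = c("low", "high"))
print(nbhd.count)
print(rest)
```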
ggplot(data = subset(reviews.all.summary.1, inyipit == TRUE), aes(x = offset,
y = length, color = after, fill = after)) + geom_bar(stat = "identity",
position = "identity", alpha = 0.5) + facet_grid(all.count.grp ~ ., scales = "free") +
geom_smooth() + xlab("Offset (in days)") + ylab("Number of new reviews") +
theme(panel.spacing = unit(0, "null")) + labs(title = "Num Reviews by # restaurants in neighborhood") +
xlim(-100, 100)
ggplot(data = subset(reviews.all.summary.2, inyipit == TRUE), aes(x = offset,
y = length, color = after, fill = after)) + geom_bar(stat = "identity",
position = "identity", alpha = 0.5) + facet_grid(inyipit.count.grp ~ .,
scales = "free") + geom_smooth() + xlab("Offset (in days)") + ylab("Number of new reviews") +
theme(panel.spacing = unit(0, "null")) + labs(title = "Num Reviews by # restaurant with DEALS") +
xlim(-100, 100)
Now let's explore the effect on the rating:
ggplot(data = subset(reviews.all.summary.1, inyipit == TRUE), aes(x = offset,
y = mean, color = after, fill = after)) + geom_point() + facet_grid(all.count.grp ~
., scales = "free") + geom_smooth() + xlab("Offset (in days)") + ylab("Ratings") +
labs(title = "Rating by # restaurants in neighborhood") + xlim(-100, 100)
ggplot(data = subset(reviews.all.summary.2, inyipit == TRUE), aes(x = offset,
y = mean, color = after, fill = after)) + geom_point() + facet_grid(inyipit.count.grp ~
., scales = "free") + geom_smooth() + xlab("Offset (in days)") + ylab("Ratings") +
labs(title = "Rating by # restaurant with DEALS") + xlim(-100, 100)
Idea: Observe how rating/numreviews of NODEAL restaurants are affected by DEAL restaurants around them.
Dependent variables:
Pools of data for the dependent variable:
Independent variables:
Timeframe: 6 months of the deal data set. The ranges for deal start and deal end are:
as.Date(range(yipitall$Date.Added.Num), origin = "1970-01-01")
## [1] "2011-12-16" "2012-06-29"
as.Date(range(yipitall$Date.Ended.Num), origin = "1970-01-01")
## [1] "2012-01-01" "2012-06-30"
The histogram for date added:
qplot(data = yipitall, Date.Added.Num, geom = "histogram", binwidth = 1)
ggplot(yipitall, aes(x = Date.Added.Num)) + geom_histogram(binwidth = 1, aes(fill = Date.Ended.Num))
## Error: 'x' and 'units' must have length > 0
DVs = rating/reviews; IVs = number of deals in the time period.
The time period is 1, 2, or 4 weeks.
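As a worked example of the period arithmetic used in the code that follows, day numbers are shifted so that the earliest date becomes day 1 and are then divided into blocks of `7 * periodLength` days. The helper name `period_index` and the day numbers are hypothetical; the formula is the one applied to `Date.Added.Num` and `Date.Ended.Num`.

```r
# Map a day number to a 1-based period index of the given length in weeks.
period_index <- function(date.num, min.date, period.length.weeks) {
  ceiling((date.num - min.date + 1) / (7 * period.length.weeks))
}

days <- c(100, 113, 114, 127, 128)   # hypothetical day numbers, earliest = 100

two.week  <- period_index(days, min.date = 100, period.length.weeks = 2)
four.week <- period_index(days, min.date = 100, period.length.weeks = 4)
print(two.week)
print(four.week)
```

The `+ 1` makes the earliest day fall in period 1 rather than period 0, which is why review periods are later filtered with `reviewdatesPer > 0`.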
## 1. Restrict yipitall to contain only restaurants that have restaurant
## reviews
yipitall.inyelp <- yipitall
# restlist.inyipit has the restlist of restaurants that are in yipit
yipitall.inyelp$inyelp <- (yipitall$Phone %in% restlist.inyipit$restphones)
yipitall.inyelp <- (yipitall.inyelp[yipitall.inyelp$inyelp == TRUE, ]) #317 deals with
summarise(yipitall.inyelp, count = n_distinct(Phone))
## count
## 1 189
length(unique(yipitall.inyelp$Phone)) # unique restaurants in the deals set (189 in this run)
## [1] 189
# deals on repeated restaurants = number of deals - number of unique restaurants
## 2. Create Time Periods of 2, 4 weeks for both the start and end date of
## the deal
periodLength <- 2 #meaning 2 weeks
# min time (earliest deal date)
minDate <- min(min(yipitall.inyelp$Date.Added.Num), min(yipitall.inyelp$Date.Ended.Num))
yipitall.inyelp$Date.Added.Per <- yipitall.inyelp$Date.Added.Num - minDate +
1
yipitall.inyelp$Date.Added.Per <- ceiling(yipitall.inyelp$Date.Added.Per/(7 *
periodLength))
yipitall.inyelp$Date.Ended.Per <- yipitall.inyelp$Date.Ended.Num - minDate +
1
yipitall.inyelp$Date.Ended.Per <- ceiling(yipitall.inyelp$Date.Ended.Per/(7 *
periodLength))
# Let's look at deal duration by Deal Company
yipitall.inyelp$Date.Duration <- yipitall.inyelp$Date.Ended.Num - yipitall.inyelp$Date.Added.Num
yipitall.inyelp$Date.Duration.Weeks <- yipitall.inyelp$Date.Duration/7
qplot(data = yipitall.inyelp, y = Date.Duration.Weeks, x = Site, geom = "boxplot") +
ylim(0, 4)
# let's summarize this
summarise(group_by(yipitall.inyelp, Site), duration = mean(Date.Duration))
## Source: local data frame [22 x 2]
##
## Site duration
## 1 Amazon Local 3.000
## 2 Daily Candy 14.000
## 3 Deal Chicken 4.778
## 4 DealFind 2.333
## 5 Eversave 2.000
## 6 Gilt City 10.917
## 7 Google Offers 5.727
## 8 Google Offers Partners 3.733
## 9 Groupon 3.571
## 10 HomeRun 3.500
## 11 LivingSocial 4.649
## 12 OpenTable 2.000
## 13 Recoup 6.667
## 14 Rue La La 6.000
## 15 SalesVote 6.444
## 16 Savored 10.250
## 17 Scoutmob 3.000
## 18 Signpost 10.000
## 19 Specialicious 4.500
## 20 The Capitol Deal 4.867
## 21 Travelzoo Local Deals 8.400
## 22 kgb deals 10.500
## 3. Let's now create the matrix of deals in time periods
# how many periods do we have
maxPeriods <- max(max(yipitall.inyelp$Date.Added.Per), max(yipitall.inyelp$Date.Ended.Per))
# now let's create the matrix
matrixPeriods <- matrix(0, nrow(yipitall.inyelp), maxPeriods)
for (i in 1:nrow(yipitall.inyelp)) {
matrixPeriods[i, ] <- 1:maxPeriods %in% yipitall.inyelp$Date.Added.Per[i]:yipitall.inyelp$Date.Ended.Per[i]
}
yipitall.inyelp <- (cbind(yipitall.inyelp, matrixPeriods))
# in theory melt from Reshape can help us get the data in a form we like
yipitall.melt <- yipitall.inyelp[, c("Phone", "Site", paste0(1:maxPeriods))]
yipitall.melt <- (melt(yipitall.melt, id = c("Phone", "Site")))
colnames(yipitall.melt)[3] <- "Period"
yipitall.melt <- group_by(yipitall.melt, Phone, Period)
# yipitall.melt <- subset(x=yipitall.melt, subset=value>0) #This would erase
# the 0s
periodDeals <- summarise(group_by(yipitall.melt, Phone, Period), numDeals = sum(value))
qplot(data = periodDeals, x = Period, y = numDeals, group = factor(Phone), color = factor(Phone),
geom = "line")
# unique restaurants by phone number
# yipitRestUnique = unique(periodDeals$Phone)
# yipitRestSample <- sort(sample(yipitRestUnique, size=20, replace=FALSE))
periodDeals$numDeals[periodDeals$numDeals == 0] <- NA # from here on, NA means 0 deals
periodDeals$numDealsJitter <- periodDeals$numDeals + rnorm(n = nrow(periodDeals),
mean = 0, sd = 0.1)
ggplot(data = periodDeals, aes(x = Period, y = numDealsJitter, group = factor(Phone),
color = factor(Phone))) + geom_line() + geom_point(size = 2)
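The activity matrix of step 3 can also be illustrated without the loop and the melt. The sketch below is an alternative formulation, not the code used in this report: it uses hypothetical deals, marks a deal as active in every period between its start and end period, and sums the active deals per restaurant with `rowsum`.

```r
# Three hypothetical deals: two for restaurant A, one for restaurant B.
added.per   <- c(1, 2, 4)   # period in which each deal starts
ended.per   <- c(2, 2, 5)   # period in which each deal ends
phone       <- c("A", "A", "B")
max.periods <- 5

# active[i, p] is 1 when deal i is running in period p, else 0.
active <- outer(seq_along(added.per), 1:max.periods,
                function(i, p) p >= added.per[i] & p <= ended.per[i]) * 1

# Number of active deals per restaurant and period.
num.deals <- rowsum(active, group = phone)
print(num.deals)
```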
## 4. Now let's get the dependent variables. The objective is to have a time
## series of the new reviews and the avg. rating in the time period.
# copy reviews.inyipit since we will be modifying it
reviews.inyipit.subset <- reviews.inyipit
# a. Truncate reviews to include only the restaurants that do deals.
length(unique(periodDeals$Phone)) #189
## [1] 189
length(unique(reviews.inyipit$restphones)) #186
## [1] 186
# We have reviews for 186 of the 189 deal restaurants, so we need to take
# out the periodDeal rows for the restaurants we don't have reviews for.
periodDeals <- (subset(periodDeals, Phone %in% reviews.inyipit$restphones))
length(unique(periodDeals$Phone)) #is now 186
## [1] 186
# b. Create the time period column: express the review dates in periods.
reviews.inyipit.subset$reviewsdatesNum <- as.numeric(as.Date(reviews.inyipit.subset$reviewsdates,
"%m/%d/%Y"))
reviews.inyipit.subset$reviewdatesPer <- reviews.inyipit.subset$reviewsdatesNum -
minDate + 1
reviews.inyipit.subset$reviewdatesPer <- ceiling(reviews.inyipit.subset$reviewdatesPer/(7 *
periodLength))
# 1. Save the reviews.inyipit.subset into another variable so that we can
# compute before deal summaries
reviews.temp <- reviews.inyipit.subset #before we remove the reviews before/after the deal period
# c. Discard the negative periods and the ones over our yipit dataset
reviews.inyipit.subset <- subset(reviews.inyipit.subset, reviewdatesPer > 0)
reviews.inyipit.subset <- subset(reviews.inyipit.subset, reviewdatesPer <= maxPeriods)
# d. Use group by to summarize
reviews.inyipit.subset <- group_by(reviews.inyipit.subset, restphones, reviewdatesPer)
reviews.inyipit.subset.summary <- summarize(reviews.inyipit.subset, meanRating = mean(reviewsrating),
numReviews = n(), sdRating = sd(reviewsrating))
names(reviews.inyipit.subset.summary)[1:2] <- c("Phone", "Period")
# e. Merge with other deals table
periodDeals$numDeals[is.na(periodDeals$numDeals)] <- 0 #go back to adding the zeros
# Restart HERE if you need to redo the periodDealRating merge with restlist.periods
periodDealRating <- (merge(periodDeals, reviews.inyipit.subset.summary, by = c("Phone",
"Period"), all = TRUE))
# Model1:
periodDealRating$Period <- as.numeric(as.character(periodDealRating$Period))
# explore: time relationships mean rating
tmp <- summarize(group_by(periodDealRating, Period), meanRating = mean(meanRating,
na.rm = TRUE))
ggplot(data = periodDealRating, aes(x = Period, y = meanRating)) + geom_line(aes(group = factor(Phone),
color = factor(Phone))) + geom_line(data = tmp, aes(x = Period, y = meanRating),
color = "black")
## Warning: Removed 415 rows containing missing values (geom_path).
# numReviews
tmp2 <- summarize(group_by(periodDealRating, Period), numReviews = mean(numReviews,
na.rm = TRUE))
ggplot(data = periodDealRating, aes(x = Period, y = numReviews)) + geom_line(aes(group = factor(Phone),
color = factor(Phone))) + geom_line(data = tmp2, aes(x = Period, y = numReviews),
color = "black")
## Warning: Removed 415 rows containing missing values (geom_path).
# stDev
tmp3 <- summarize(group_by(periodDealRating, Period), sdRating = mean(sdRating,
na.rm = TRUE))
ggplot(data = periodDealRating, aes(x = Period, y = sdRating)) + geom_line(aes(group = factor(Phone),
color = factor(Phone))) + geom_line(data = tmp3, aes(x = Period, y = sdRating),
color = "black")
## Warning: Removed 709 rows containing missing values (geom_path).
# meanRating and numDeals
# pdf('Figure: Boxplot of meanRating vs numDeals.pdf')
ggplot(data = periodDealRating, aes(x = numDeals, y = meanRating, group = factor(numDeals))) +
geom_boxplot()
## Warning: Removed 1004 rows containing non-finite values (stat_boxplot).
# dev.off()
ggplot(data = periodDealRating, aes(x = numDeals, y = sdRating, group = factor(numDeals))) +
geom_boxplot()
## Warning: Removed 1748 rows containing non-finite values (stat_boxplot).
ggplot(data = periodDealRating, aes(x = numDeals, y = numReviews, group = factor(numDeals))) +
geom_boxplot()
## Warning: Removed 1004 rows containing non-finite values (stat_boxplot).
# Density helps us see the effect on numReviews from numDeals
ggplot(data = periodDealRating, aes(x = numReviews, color = numDeals, group = numDeals)) +
geom_density(adjust = 3)
## Warning: Removed 894 rows containing non-finite values (stat_density).
## Warning: Removed 94 rows containing non-finite values (stat_density).
## Warning: Removed 15 rows containing non-finite values (stat_density).
## Warning: Removed 1 rows containing non-finite values (stat_density).
# now let's fit the most basic model
periodDealRating$numReviews[is.na(periodDealRating$numReviews)] <- 0
lmer(data = periodDealRating, meanRating ~ numDeals + (1 | Phone))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ numDeals + (1 | Phone)
## Data: periodDealRating
## REML criterion at convergence: 5065
## Random effects:
## Groups Name Std.Dev.
## Phone (Intercept) 0.540
## Residual 0.928
## Number of obs: 1786, groups: Phone, 182
## Fixed Effects:
## (Intercept) numDeals
## 3.3025 -0.0582
lmer(data = periodDealRating, numReviews ~ numDeals + (1 | Phone))
## Linear mixed model fit by REML ['lmerMod']
## Formula: numReviews ~ numDeals + (1 | Phone)
## Data: periodDealRating
## REML criterion at convergence: 10772
## Random effects:
## Groups Name Std.Dev.
## Phone (Intercept) 1.95
## Residual 1.49
## Number of obs: 2790, groups: Phone, 186
## Fixed Effects:
## (Intercept) numDeals
## 1.682 0.181
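Both fits above are random-intercept models: each restaurant (`Phone`) gets its own baseline, and `numDeals` shifts the outcome by a common slope. One way to read the random-effects block is the intraclass correlation, the share of the modeled variance that comes from differences between restaurants. The helper name `icc` is hypothetical; the standard deviations are copied from the two outputs above.

```r
# Share of variance due to the grouping factor in a random-intercept model.
icc <- function(sd.group, sd.resid) {
  sd.group^2 / (sd.group^2 + sd.resid^2)
}

icc.rating  <- icc(sd.group = 0.540, sd.resid = 0.928)  # meanRating model
icc.reviews <- icc(sd.group = 1.95,  sd.resid = 1.49)   # numReviews model
print(round(c(rating = icc.rating, reviews = icc.reviews), 3))
```

Under this reading, restaurant identity accounts for roughly a quarter of the variance in period ratings but well over half of the variance in review counts, which is one motivation for adding the baseline controls below.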
# Add controls: Starting Rating, Starting num reviews,
# Baseline summary is the summary before the deals get started
baseline.summary <- summarize(group_by(subset(reviews.temp, reviewsdatesNum <
minDate), restphones), bRating = mean(reviewsrating), bNumReviews = n())
names(baseline.summary)[1] <- "Phone"
# Expand periodDealRating with baseline numReviews and baseline Rating
periodDealRating <- (merge(periodDealRating, baseline.summary, by = "Phone",
all = TRUE))
# add 0s for num reviews NAs
periodDealRating$bNumReviews[is.na(periodDealRating$bNumReviews)] <- 0
# Now let's fit a model where we control for the starting Rating and
# Starting num Reviews
qplot(data = baseline.summary, x = bRating, geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(data = baseline.summary, x = bNumReviews, geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(data = periodDealRating, x = numDeals, geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
table(periodDealRating$numDeals)
##
## 0 1 2 3 6
## 2427 321 36 5 1
summary(lmer(data = periodDealRating, meanRating ~ +factor(numDeals) + bRating +
bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ +factor(numDeals) + bRating + bNumReviews + (1 | Phone)
## Data: periodDealRating
##
## REML criterion at convergence: 4788
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.109 0.331
## Residual 0.863 0.929
## Number of obs: 1722, groups: Phone, 170
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.289939 0.268132 1.08
## factor(numDeals)1 -0.081799 0.069560 -1.18
## factor(numDeals)2 -0.517718 0.215857 -2.40
## factor(numDeals)3 1.316826 0.489018 2.69
## factor(numDeals)6 0.032589 0.965481 0.03
## bRating 0.872917 0.079695 10.95
## bNumReviews 0.000529 0.000251 2.10
##
## Correlation of Fixed Effects:
## (Intr) fc(D)1 fc(D)2 fc(D)3 fc(D)6 bRatng
## fctr(nmDl)1 -0.040
## fctr(nmDl)2 -0.035 0.041
## fctr(nmDl)3 -0.053 0.023 0.020
## fctr(nmDl)6 0.040 0.005 0.001 -0.001
## bRating -0.984 0.008 0.025 0.046 -0.044
## bNumReviews 0.024 -0.014 -0.007 0.009 0.015 -0.140
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ +factor(numDeals) +
bRating + bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ +factor(numDeals) + bRating + bNumReviews + (1 | Phone)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4780
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.110 0.331
## Residual 0.864 0.929
## Number of obs: 1717, groups: Phone, 170
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.279958 0.268824 1.04
## factor(numDeals)1 -0.081297 0.069602 -1.17
## factor(numDeals)2 -0.515471 0.216009 -2.39
## bRating 0.875876 0.079902 10.96
## bNumReviews 0.000528 0.000252 2.10
##
## Correlation of Fixed Effects:
## (Intr) fc(D)1 fc(D)2 bRatng
## fctr(nmDl)1 -0.040
## fctr(nmDl)2 -0.036 0.041
## bRating -0.984 0.009 0.026
## bNumReviews 0.025 -0.014 -0.007 -0.140
summary(lmer(data = subset(periodDealRating, numDeals <= 2), numReviews ~ +factor(numDeals) +
bRating + bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: numReviews ~ +factor(numDeals) + bRating + bNumReviews + (1 | Phone)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 9871
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 1.86 1.36
## Residual 2.21 1.49
## Number of obs: 2589, groups: Phone, 173
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.268398 0.786449 -0.34
## factor(numDeals)1 0.156309 0.094649 1.65
## factor(numDeals)2 0.424482 0.275975 1.54
## bRating 0.236840 0.234539 1.01
## bNumReviews 0.011092 0.000837 13.25
##
## Correlation of Fixed Effects:
## (Intr) fc(D)1 fc(D)2 bRatng
## fctr(nmDl)1 -0.020
## fctr(nmDl)2 -0.020 0.037
## bRating -0.984 0.007 0.016
## bNumReviews 0.018 -0.010 0.000 -0.135
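The summaries above report t values but no p-values. As a rough aid to reading them, the sketch below computes Wald-type 95% intervals as estimate ± z × standard error. This treats the t statistics as approximately normal, which is an assumption and not something these fits establish; `wald_ci` is a hypothetical helper, and the inputs are copied from the `meanRating` model restricted to `numDeals <= 2`.

```r
# Approximate 95% interval from an estimate and its standard error.
wald_ci <- function(estimate, se, z = qnorm(0.975)) {
  c(lower = estimate - z * se, upper = estimate + z * se)
}

ci.one.deal  <- wald_ci(-0.081297, 0.069602)  # factor(numDeals)1
ci.two.deals <- wald_ci(-0.515471, 0.216009)  # factor(numDeals)2
print(round(rbind(one.deal = ci.one.deal, two.deals = ci.two.deals), 3))
```

Under this approximation the interval for one deal spans zero while the interval for two concurrent deals lies below zero. Only a few dozen restaurant-periods have two deals, so the second result should be read with caution.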
Let's add the price point, the neighborhood, and the previous yipit appearances (historical).
# 5. Now let's add restaurant characteristics.
# 5.0 Make sure we have the right rows and right columns
restlist.periods <- restlist.inyipit
sum((periodDealRating$Phone) %in% (restlist.periods$restphones)) #all in periodDealRating Covered
## [1] 2790
dim(restlist.periods) #is 189 but should be 186
## [1] 189 49
restlist.periods <- (subset(restlist.periods, restlist.periods$restphones %in%
periodDealRating$Phone))
dim(restlist.periods) #is now 186
## [1] 186 49
# select the columns I want to merge
restlist.periods <- (select(restlist.periods, restphones, restaddresses, restneighborhoods,
yipit.appearances, pricepoint, url.dummy))
# change the name of phone
names(restlist.periods)[1] <- "Phone"
Let's add neighborhood variables. First, let's visualize what this data looks like. To do so, we need to geocode the addresses of the restaurants.
Plot neighborhood information:
q <- qmap("Washington, DC", zoom = 13)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Washington,+DC&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Washington,+DC&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
q + geom_point(data = restlist.periods, aes(x = lon, y = lat, color = factor(restneighborhoods),
size = yipit.appearances), alpha = 0.7) + scale_size(range = c(3, 15))
## Warning: Removed 14 rows containing missing values (geom_point).
# It appears we should fix our NAs.
dim(restlist.periods[is.na(restlist.periods$restneighborhoods), c("restaddresses",
"restneighborhoods")])[1]/dim(restlist.periods)[1]
## [1] 0.3387
# About 34% don't have a neighborhood. Let's fix that!
## Now let's plot the ones that are still missing.
dfna <- restlist.periods[is.na(restlist.periods$restneighborhoods), ]
q + geom_text(data = dfna, aes(x = lon, y = lat), label = 1:dim(dfna)[1], color = "red")
## Warning: Removed 2 rows containing missing values (geom_text).
# you can see how the ones missing are in border areas, so it makes sense
# that they are NAs
grestlist.periods <- group_by(restlist.periods, restneighborhoods)
summarize(grestlist.periods, count = n())
## Source: local data frame [24 x 2]
##
## restneighborhoods count
## 1 Adams Morgan 17
## 2 Capitol Hill/Northeast 3
## 3 Capitol Hill/Southeast 7
## 4 Chevy Chase 6
## 5 Chinatown 6
## 6 Cleveland Park 1
## 7 Columbia Heights 4
## 8 Dupont Circle 17
## 9 Federal Triangle 2
## 10 Foggy Bottom 2
## 11 Georgetown 12
## 12 Glover Park 1
## 13 H Street Corridor/Atlas District/Near Northeast 7
## 14 Lincoln Park 1
## 15 Logan Circle 2
## 16 Mount Pleasant 1
## 17 Park View 2
## 18 Penn Quarter 3
## 19 Shaw 4
## 20 Tenleytown 6
## 21 U Street Corridor 10
## 22 Van Ness/Forest Hills 3
## 23 Woodley Park 6
## 24 NA 63
Let's fix the neighborhoods: Foggy Bottom and Penn Quarter
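The fix that follows assigns a neighborhood to every restaurant inside a latitude/longitude box. The sketch below shows the same idea on toy coordinates, with one variation that is an assumption rather than what the report's code does: it only fills missing labels, so restaurants that already carry a neighborhood (for example Chinatown) keep it. `in_box` is a hypothetical helper; the box limits are the Foggy Bottom and Penn Quarter limits used below.

```r
# TRUE for points inside the box; points with missing coordinates are FALSE.
in_box <- function(lat, lon, lat.min, lat.max, lon.min, lon.max) {
  !is.na(lat) & !is.na(lon) &
    lat >= lat.min & lat <= lat.max & lon >= lon.min & lon <= lon.max
}

# Toy restaurants (hypothetical coordinates and labels).
rest <- data.frame(
  lat = c(38.900, 38.899, 38.950, NA),
  lon = c(-77.045, -77.020, -77.045, -77.040),
  restneighborhoods = c(NA, "Chinatown", NA, NA),
  stringsAsFactors = FALSE
)

foggy <- in_box(rest$lat, rest$lon, 38.892373, 38.906, -77.052573, -77.036651)
penn  <- in_box(rest$lat, rest$lon, 38.892373, 38.906, -77.036652, -77.01455)

# Fill only the missing labels.
rest$restneighborhoods[foggy & is.na(rest$restneighborhoods)] <- "Foggy Bottom"
rest$restneighborhoods[penn  & is.na(rest$restneighborhoods)] <- "Penn Quarter"
print(rest)
```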
# 5.1 Fix neighborhoods, option 1: draw a bounding box; whoever is inside is in
# that neighborhood. Foggy Bottom first.
# Plot Foggy Bottom ones
subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) & (lon >= -77.052573) &
(lon <= -77.036651))
## Phone restaddresses
## 2570 2022962333 2157 P St NW, Washington, DC 20037
## 2598 2022931425 1915 I St NW, Washington, DC 20006
## 1246 2022932765 1924 I St NW, Washington, DC 20006
## 1944 2029730404 2001 L St NW, Washington, DC 20036
## 1353 2025305500 1915 I St NW, Washington, DC 20006
## 2007 2022937760 1725 Desales St NW, Washington, DC 20036
## 2130 2023317574 1132 19th St NW, Washington, DC 20036
## 1377 2025586188 2153 P St NW, Washington, DC 20037
## 2038 2029559494 1909 K St NW, Washington, DC 20006
## 2024 2027851110 1825 M St NW, Washington, DC 20036
## 1972 2028728700 2030 M St NW, Washington, DC 20036
## 2087 2027755660 1810 K St NW, Washington, DC 20006
## 2155 2022231818 1103 19th St NW, Washington, DC 20036
## 2163 2023313232 919 19th St NW, Washington, DC 20006
## 2192 2023315800 1900 I St NW, Washington, DC 20006
## 2255 2024571111 1815 M St NW, Washington, DC 20036
## 4953 2022233600 2013 I St NW, Washington, DC 20006
## 2676 2022967700 2000 Pennsylvania Ave NW, Washington, DC 20006
## 1851 2022960125 2140 F St NW, Washington, DC 20037
## 1982 2022988488 1716 I St NW, Washington, DC 20006
## 2015 2026592696 1720 I St NW, Washington, DC 20050
## 2021 2024291701 1020 19th St NW, Washington, DC 20036
## 2123 2022932057 2100 Pennsylvania Ave NW, Washington, DC 20037
## 2128 2029744260 836 17th St NW, Washington, DC 20006
## 2758 2023272252 1120 19th St NW, Washington, DC 20036
## restneighborhoods yipit.appearances pricepoint url.dummy lon lat
## 2570 <NA> 1 1 TRUE -77.05 38.90
## 2598 <NA> 3 1 FALSE -77.04 38.90
## 1246 <NA> 1 2 TRUE -77.04 38.90
## 1944 <NA> 12 1 TRUE -77.05 38.90
## 1353 <NA> 4 3 TRUE -77.04 38.90
## 2007 <NA> 14 3 TRUE -77.04 38.90
## 2130 <NA> 1 2 TRUE -77.04 38.90
## 1377 <NA> 4 2 TRUE -77.05 38.90
## 2038 <NA> 1 3 TRUE -77.04 38.90
## 2024 <NA> 2 2 TRUE -77.04 38.91
## 1972 <NA> 6 3 TRUE -77.05 38.91
## 2087 <NA> 21 2 TRUE -77.04 38.90
## 2155 <NA> 5 2 TRUE -77.04 38.90
## 2163 <NA> 4 3 TRUE -77.04 38.90
## 2192 <NA> 14 2 TRUE -77.04 38.90
## 2255 <NA> 1 1 TRUE -77.04 38.91
## 4953 <NA> 10 1 FALSE -77.05 38.90
## 2676 Foggy Bottom 2 3 TRUE -77.05 38.90
## 1851 Foggy Bottom 3 1 TRUE -77.05 38.90
## 1982 <NA> 2 1 TRUE -77.04 38.90
## 2015 <NA> 9 2 TRUE -77.04 38.90
## 2021 <NA> 5 2 FALSE -77.04 38.90
## 2123 <NA> 3 1 TRUE -77.05 38.90
## 2128 <NA> 1 2 TRUE -77.04 38.90
## 2758 <NA> 2 1 TRUE -77.04 38.90
q + geom_point(data = subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) &
(lon >= -77.052573) & (lon <= -77.036651)), aes(x = lon, y = lat, color = factor(restneighborhoods),
size = yipit.appearances), alpha = 0.7) + scale_size(range = c(3, 15))
restlist.periods$restneighborhoods[(restlist.periods$lat <= 38.906) & (restlist.periods$lat >=
38.892373) & (restlist.periods$lon >= -77.052573) & (restlist.periods$lon <=
-77.036651)] <- "Foggy Bottom"
# Penn Quarter now
subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) & (lon >= -77.036652) &
(lon <= -77.01455))
## Phone restaddresses
## 490 2026823123 465 K St NW, Washington, DC 20001
## 1280 2027377275 920 14th St Nw, Washington, DC 20005
## 1296 2027892800 1401 K St NW, Washington, DC 20005
## 1352 2023794366 1155 14th St NW, Washington, DC 20005
## 1386 2023932292 1004 Vermont Ave NW, Washington, DC 20005
## 4522 2022346870 1100 P St NW, Washington, DC 20005
## 3686 2027837007 444 7th St NW, Washington, DC 20004
## 3692 2025562050 410 7th St NW, Washington, DC 20004
## 3703 2026399727 650 F St NW, Washington, DC 20004
## 3713 2026380800 701 9th St NW, Washington, DC 20001
## 3719 2022893600 707 6th St NW, Washington, DC 20001
## 3724 2023932929 781 7th St NW, Washington, DC 20001
## 3729 2022169550 1100 New York Ave, Washington, DC 20005
## 3731 2026383434 901 F St NW, Washington, DC 20004
## 3742 2023932905 701 7th St, Washington, DC 20001
## 3746 2023935444 518 10th St NW, Washington, DC 20004
## 3751 2027370101 734 11th St NW, Washington, DC 20001
## 3756 2022891001 617 H St NW, Washington, DC 20001
## 3769 2028421405 507 H St NW, Washington, DC 20001
## 3781 2026294355 915 E St NW, Washington, DC 20004
## 3790 2025062888 608 H St NW, Washington, DC 20001
## 3814 2025891504 1201 New York Ave NW, Washington, DC 20005
## 3816 2024081288 611 H St NW, Washington, DC 20001
## 3822 2024089292 817 7th St NW, Washington, DC 20001
## 3835 2023712295 1100 Pennsylvania Ave NW, Washington, DC 20004
## 3840 2026280980 555 11th St NW, Washington, DC 20004
## 3859 2022897482 1100 Pennsylvania Ave NW, Washington, DC 20004
## 3985 2026399830 1099 New York Ave NW, Washington, DC 20001
## 3994 2026379770 1108 K St NW, Washington, DC 20005
## 4147 2026829333 800 K Street NW, Washington, DC 20001
## restneighborhoods yipit.appearances pricepoint url.dummy lon lat
## 490 <NA> 2 3 TRUE -77.02 38.90
## 1280 <NA> 2 2 TRUE -77.03 38.90
## 1296 <NA> 3 3 TRUE -77.03 38.90
## 1352 <NA> 17 3 TRUE -77.03 38.91
## 1386 <NA> 21 2 FALSE -77.03 38.90
## 4522 <NA> 2 2 TRUE -77.03 38.90
## 3686 Penn Quarter 1 3 TRUE -77.02 38.90
## 3692 Penn Quarter 6 2 TRUE -77.02 38.90
## 3703 Penn Quarter 12 1 TRUE -77.02 38.90
## 3713 <NA> 1 3 TRUE -77.02 38.90
## 3719 <NA> 1 3 TRUE -77.02 38.90
## 3724 Chinatown 22 3 TRUE -77.02 38.90
## 3729 <NA> 1 3 TRUE -77.03 38.90
## 3731 <NA> 1 2 TRUE -77.02 38.90
## 3742 <NA> 6 2 TRUE -77.02 38.90
## 3746 <NA> 4 2 TRUE -77.03 38.90
## 3751 <NA> 1 3 TRUE -77.03 38.90
## 3756 Chinatown 1 2 FALSE -77.02 38.90
## 3769 Chinatown 1 2 TRUE -77.02 38.90
## 3781 <NA> 8 2 TRUE -77.02 38.90
## 3790 Chinatown 1 2 TRUE -77.02 38.90
## 3814 <NA> 1 3 TRUE -77.03 38.90
## 3816 Chinatown 1 2 TRUE -77.02 38.90
## 3822 Chinatown 11 2 TRUE -77.02 38.90
## 3835 Federal Triangle 2 1 TRUE -77.03 38.89
## 3840 <NA> 19 2 TRUE -77.03 38.90
## 3859 Federal Triangle 1 1 FALSE -77.03 38.89
## 3985 <NA> 12 3 TRUE -77.03 38.90
## 3994 <NA> 1 1 TRUE -77.03 38.90
## 4147 <NA> 3 1 FALSE -77.02 38.90
q + geom_point(data = subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) &
(lon >= -77.036652) & (lon <= -77.01455)), aes(x = lon, y = lat, color = factor(restneighborhoods),
size = yipit.appearances), alpha = 0.7) + scale_size(range = c(3, 15))
restlist.periods$restneighborhoods[(restlist.periods$lat <= 38.906) & (restlist.periods$lat >=
38.892373) & (restlist.periods$lon >= -77.036652) & (restlist.periods$lon <=
-77.01455)] <- "Penn Quarter"
### Add yipit-per-neighborhood count
grestlist.periods <- group_by(restlist.periods, restneighborhoods)
sumtemp <- summarize(grestlist.periods, count = n())
restlist.periods$yipitNeighborhoodCount <- 0
for (i in 1:nrow(restlist.periods)) {
restlist.periods$yipitNeighborhoodCount[i] <- sumtemp$count[sumtemp$restneighborhoods ==
restlist.periods$restneighborhoods[i]][1]
}
# 5.2 Add Yelp count of restaurants by neighborhood.
grestlist_all <- group_by(restlist_all_unique, restneighborhoods)
sumtemp <- summarize(grestlist_all, count = n())
restlist.periods$yelpNeighborhoodCount <- 0
for (i in 1:nrow(restlist.periods)) {
restlist.periods$yelpNeighborhoodCount[i] <- sumtemp$count[sumtemp$restneighborhoods ==
restlist.periods$restneighborhoods[i]][1]
}
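The two lookup loops above can also be written without an explicit loop. The following is a minimal sketch on a toy data frame, using base R `table()` and `match()`. The `toy` data, its values, and the `neighborhoodCount` column are made up for illustration; only the column name `restneighborhoods` comes from the analysis.

```r
# Toy stand-in for restlist.periods; the values are illustrative only.
toy <- data.frame(
  Phone = c("a", "b", "c", "d", "e"),
  restneighborhoods = c("Chinatown", "Penn Quarter", "Chinatown", NA, "Penn Quarter"),
  stringsAsFactors = FALSE
)

# Count restaurants per neighborhood (table() drops NA by default).
counts <- table(toy$restneighborhoods)

# Look up each row's neighborhood count; rows with an NA neighborhood get NA.
toy$neighborhoodCount <- as.integer(counts[match(toy$restneighborhoods, names(counts))])

print(toy)
```

As in the loops, restaurants with a missing neighborhood end up with an NA count rather than 0.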
# now that our restlist.periods looks good, we merge it with
# periodDealRating
periodDealRating <- (merge(x = periodDealRating, y = restlist.periods, by = "Phone",
all = TRUE))
# Model for ratings
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances +
(numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 |
Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4776
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.0987 0.314
## Residual 0.8602 0.927
## Number of obs: 1717, groups: Phone, 170
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.911385 0.313670 2.91
## yipit.appearances -0.016805 0.007193 -2.34
## numDeals -0.898868 0.561460 -1.60
## factor(pricepoint)2 -0.305429 0.099048 -3.08
## factor(pricepoint)3 -0.238002 0.127767 -1.86
## factor(pricepoint)4 0.041826 0.320895 0.13
## bRating 0.778135 0.083943 9.27
## bNumReviews 0.000676 0.000259 2.61
## numDeals:factor(pricepoint)2 0.409118 0.180225 2.27
## numDeals:factor(pricepoint)3 0.726391 0.215447 3.37
## numDeals:factor(pricepoint)4 0.211875 0.374038 0.57
## numDeals:bRating 0.110708 0.153149 0.72
##
## Correlation of Fixed Effects:
## (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw nD:()2
## yipt.pprncs -0.182
## numDeals -0.259 -0.016
## fctr(prcp)2 -0.452 -0.058 0.133
## fctr(prcp)3 -0.328 -0.094 0.088 0.655
## fctr(prcp)4 -0.160 -0.238 0.059 0.270 0.219
## bRating -0.960 0.115 0.245 0.251 0.183 0.113
## bNumReviews 0.132 -0.061 0.014 -0.199 -0.312 -0.006 -0.184
## nmDls:fc()2 0.137 -0.011 -0.442 -0.277 -0.174 -0.074 -0.069 -0.010
## nmDls:fc()3 0.099 -0.016 -0.253 -0.191 -0.245 -0.058 -0.042 0.003 0.675
## nmDls:fc()4 0.093 -0.007 -0.326 -0.119 -0.088 -0.218 -0.062 -0.003 0.425
## nmDls:bRtng 0.239 0.017 -0.958 -0.064 -0.034 -0.037 -0.247 -0.016 0.195
## nD:()3 nD:()4
## yipt.pprncs
## numDeals
## fctr(prcp)2
## fctr(prcp)3
## fctr(prcp)4
## bRating
## bNumReviews
## nmDls:fc()2
## nmDls:fc()3
## nmDls:fc()4 0.330
## nmDls:bRtng 0.040 0.212
# Model for number of reviews
summary(lmer(data = subset(periodDealRating, numDeals <= 2), numReviews ~ +(numDeals) +
yipit.appearances + bRating + bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: numReviews ~ +(numDeals) + yipit.appearances + bRating + bNumReviews + (1 | Phone)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 9877
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 1.87 1.37
## Residual 2.21 1.49
## Number of obs: 2589, groups: Phone, 173
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.189476 0.811270 -0.23
## numDeals 0.174294 0.079385 2.20
## yipit.appearances -0.008755 0.021597 -0.41
## bRating 0.224525 0.236861 0.95
## bNumReviews 0.011112 0.000841 13.22
##
## Correlation of Fixed Effects:
## (Intr) numDls ypt.pp bRatng
## numDeals -0.023
## yipt.pprncs -0.236 -0.016
## bRating -0.977 0.012 0.121
## bNumReviews 0.032 -0.007 -0.063 -0.141
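This is a good baseline model. In the ratings model, note the interaction between numDeals and the price point of the restaurant (t = 2.27 and 3.37 for price points 2 and 3). The interaction between numDeals and baseline rating is weak (t = 0.72).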
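To read the interaction terms, it can help to compute the implied slope of numDeals for each price point. The sketch below copies the fixed-effect estimates from the ratings model output above. The baseline rating of 3.5 is an arbitrary illustrative value, not a quantity taken from the data, and the variable names are hypothetical.

```r
# Fixed-effect estimates copied from the ratings model summary above.
b_numDeals <- -0.898868
# numDeals:factor(pricepoint) terms; price point 1 is the reference level.
b_price <- c(p1 = 0, p2 = 0.409118, p3 = 0.726391, p4 = 0.211875)
# numDeals:bRating term.
b_bRating <- 0.110708

# Assumed baseline rating, chosen only for illustration.
bRating <- 3.5

# Implied change in meanRating per additional deal, by price point.
slope <- b_numDeals + b_price + b_bRating * bRating
print(round(slope, 3))
```

At this assumed baseline rating, the point estimate of the slope is negative for price points 1, 2, and 4 and positive for price point 3. Standard errors are not propagated here, so this is only a reading aid for the point estimates.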
# Let's add neighborhood variables
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances +
(numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 |
Phone) + factor(restneighborhoods)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone) + factor(restneighborhoods)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4261
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.113 0.336
## Residual 0.852 0.923
## Number of obs: 1533, groups: Phone, 151
##
## Fixed effects:
## Estimate
## (Intercept) 0.994810
## yipit.appearances -0.019866
## numDeals -0.743955
## factor(pricepoint)2 -0.336186
## factor(pricepoint)3 -0.223962
## factor(pricepoint)4 0.131833
## bRating 0.740444
## bNumReviews 0.000799
## factor(restneighborhoods)Capitol Hill/Northeast 0.082580
## factor(restneighborhoods)Capitol Hill/Southeast 0.286942
## factor(restneighborhoods)Chevy Chase 0.143144
## factor(restneighborhoods)Cleveland Park 0.260625
## factor(restneighborhoods)Columbia Heights 0.061772
## factor(restneighborhoods)Dupont Circle -0.058277
## factor(restneighborhoods)Foggy Bottom -0.006993
## factor(restneighborhoods)Georgetown 0.040817
## factor(restneighborhoods)Glover Park -0.240572
## factor(restneighborhoods)H Street Corridor/Atlas District/Near Northeast 0.120371
## factor(restneighborhoods)Lincoln Park 0.294341
## factor(restneighborhoods)Logan Circle 0.127744
## factor(restneighborhoods)Mount Pleasant -0.063818
## factor(restneighborhoods)Park View 0.504486
## factor(restneighborhoods)Penn Quarter -0.008578
## factor(restneighborhoods)Shaw 0.017048
## factor(restneighborhoods)Tenleytown 0.324642
## factor(restneighborhoods)U Street Corridor 0.135683
## factor(restneighborhoods)Van Ness/Forest Hills 0.248572
## factor(restneighborhoods)Woodley Park 0.098378
## numDeals:factor(pricepoint)2 0.334358
## numDeals:factor(pricepoint)3 0.689189
## numDeals:factor(pricepoint)4 0.147056
## numDeals:bRating 0.081658
## Std. Error
## (Intercept) 0.395146
## yipit.appearances 0.008152
## numDeals 0.575307
## factor(pricepoint)2 0.122963
## factor(pricepoint)3 0.148352
## factor(pricepoint)4 0.393186
## bRating 0.101535
## bNumReviews 0.000294
## factor(restneighborhoods)Capitol Hill/Northeast 0.349102
## factor(restneighborhoods)Capitol Hill/Southeast 0.208556
## factor(restneighborhoods)Chevy Chase 0.230459
## factor(restneighborhoods)Cleveland Park 0.446505
## factor(restneighborhoods)Columbia Heights 0.278673
## factor(restneighborhoods)Dupont Circle 0.162642
## factor(restneighborhoods)Foggy Bottom 0.154332
## factor(restneighborhoods)Georgetown 0.183997
## factor(restneighborhoods)Glover Park 0.532294
## factor(restneighborhoods)H Street Corridor/Atlas District/Near Northeast 0.216449
## factor(restneighborhoods)Lincoln Park 0.598233
## factor(restneighborhoods)Logan Circle 0.319339
## factor(restneighborhoods)Mount Pleasant 0.471066
## factor(restneighborhoods)Park View 0.395421
## factor(restneighborhoods)Penn Quarter 0.150577
## factor(restneighborhoods)Shaw 0.378374
## factor(restneighborhoods)Tenleytown 0.248162
## factor(restneighborhoods)U Street Corridor 0.193107
## factor(restneighborhoods)Van Ness/Forest Hills 0.362898
## factor(restneighborhoods)Woodley Park 0.238282
## numDeals:factor(pricepoint)2 0.205128
## numDeals:factor(pricepoint)3 0.235840
## numDeals:factor(pricepoint)4 0.383884
## numDeals:bRating 0.156131
## t value
## (Intercept) 2.52
## yipit.appearances -2.44
## numDeals -1.29
## factor(pricepoint)2 -2.73
## factor(pricepoint)3 -1.51
## factor(pricepoint)4 0.34
## bRating 7.29
## bNumReviews 2.72
## factor(restneighborhoods)Capitol Hill/Northeast 0.24
## factor(restneighborhoods)Capitol Hill/Southeast 1.38
## factor(restneighborhoods)Chevy Chase 0.62
## factor(restneighborhoods)Cleveland Park 0.58
## factor(restneighborhoods)Columbia Heights 0.22
## factor(restneighborhoods)Dupont Circle -0.36
## factor(restneighborhoods)Foggy Bottom -0.05
## factor(restneighborhoods)Georgetown 0.22
## factor(restneighborhoods)Glover Park -0.45
## factor(restneighborhoods)H Street Corridor/Atlas District/Near Northeast 0.56
## factor(restneighborhoods)Lincoln Park 0.49
## factor(restneighborhoods)Logan Circle 0.40
## factor(restneighborhoods)Mount Pleasant -0.14
## factor(restneighborhoods)Park View 1.28
## factor(restneighborhoods)Penn Quarter -0.06
## factor(restneighborhoods)Shaw 0.05
## factor(restneighborhoods)Tenleytown 1.31
## factor(restneighborhoods)U Street Corridor 0.70
## factor(restneighborhoods)Van Ness/Forest Hills 0.68
## factor(restneighborhoods)Woodley Park 0.41
## numDeals:factor(pricepoint)2 1.63
## numDeals:factor(pricepoint)3 2.92
## numDeals:factor(pricepoint)4 0.38
## numDeals:bRating 0.52
##
## Correlation matrix not shown by default, as p = 32 > 20.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
This does not look too helpful: none of the neighborhood terms is significant (the largest |t| is 1.38). What if we limit the data to the big neighborhoods?
# big neighborhoods
summary(lmer(data = subset(periodDealRating, numDeals <= 2 & restneighborhoods %in%
c("Adams Morgan", "Dupont Circle", "Foggy Bottom", "Georgetown", "Penn Quarter")),
meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals *
bRating + bNumReviews + (1 | Phone) + factor(restneighborhoods)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone) + factor(restneighborhoods)
## Data: subset(periodDealRating, numDeals <= 2 & restneighborhoods %in% c("Adams Morgan", "Dupont Circle", "Foggy Bottom", "Georgetown", "Penn Quarter"))
##
## REML criterion at convergence: 2744
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.100 0.316
## Residual 0.854 0.924
## Number of obs: 984, groups: Phone, 95
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.010432 0.459823 2.20
## yipit.appearances -0.007702 0.009781 -0.79
## numDeals -0.317202 0.745289 -0.43
## factor(pricepoint)2 -0.402797 0.139644 -2.88
## factor(pricepoint)3 -0.230625 0.160660 -1.44
## factor(pricepoint)4 -0.619083 0.517688 -1.20
## bRating 0.745109 0.119065 6.26
## bNumReviews 0.000854 0.000337 2.54
## factor(restneighborhoods)Dupont Circle -0.040756 0.159402 -0.26
## factor(restneighborhoods)Foggy Bottom -0.059413 0.151317 -0.39
## factor(restneighborhoods)Georgetown 0.017384 0.178911 0.10
## factor(restneighborhoods)Penn Quarter -0.063607 0.147740 -0.43
## numDeals:factor(pricepoint)2 0.314164 0.260438 1.21
## numDeals:factor(pricepoint)3 0.785105 0.275405 2.85
## numDeals:factor(pricepoint)4 0.399445 0.544052 0.73
## numDeals:bRating -0.088142 0.200154 -0.44
##
## Correlation of Fixed Effects:
## (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw fc()DC
## yipt.pprncs -0.102
## numDeals -0.248 -0.022
## fctr(prcp)2 -0.451 -0.026 0.128
## fctr(prcp)3 -0.232 -0.058 0.073 0.658
## fctr(prcp)4 -0.209 -0.443 0.083 0.299 0.212
## bRating -0.939 0.052 0.250 0.258 0.100 0.189
## bNumReviews 0.149 -0.003 0.003 -0.190 -0.327 -0.054 -0.200
## fctr(rst)DC -0.323 -0.019 0.021 -0.071 -0.050 -0.117 0.189 -0.050
## fctr(rst)FB -0.311 -0.108 0.012 0.151 -0.036 0.102 0.107 0.047 0.538
## fctr(rstn)G -0.166 -0.056 -0.080 -0.001 -0.040 0.025 0.019 -0.026 0.458
## fctr(rst)PQ -0.275 -0.140 0.002 0.066 -0.083 0.088 0.107 -0.175 0.572
## nmDls:fc()2 0.125 0.006 -0.534 -0.256 -0.171 -0.070 -0.065 -0.004 -0.021
## nmDls:fc()3 0.086 -0.007 -0.280 -0.199 -0.236 -0.052 -0.023 0.003 -0.024
## nmDls:fc()4 0.128 0.013 -0.501 -0.122 -0.087 -0.217 -0.107 -0.001 -0.016
## nmDls:bRtng 0.232 0.019 -0.951 -0.058 -0.014 -0.065 -0.258 -0.004 -0.013
## fc()FB fct()G fc()PQ nD:()2 nD:()3 nD:()4
## yipt.pprncs
## numDeals
## fctr(prcp)2
## fctr(prcp)3
## fctr(prcp)4
## bRating
## bNumReviews
## fctr(rst)DC
## fctr(rst)FB
## fctr(rstn)G 0.492
## fctr(rst)PQ 0.632 0.514
## nmDls:fc()2 -0.029 0.048 -0.023
## nmDls:fc()3 -0.038 0.032 -0.030 0.740
## nmDls:fc()4 -0.018 0.044 -0.011 0.479 0.361
## nmDls:bRtng 0.002 0.072 0.009 0.277 0.024 0.390
No. None of the neighborhood terms is significant in this subset either.
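The t values above compare each neighborhood with the reference level one at a time. A joint test of the whole factor would be a likelihood-ratio test between nested models. The REML criteria printed above cannot be used for this directly: the neighborhood models use 1533 observations against 1717 in the baseline, and REML criteria are not comparable across models with different fixed effects. A minimal sketch on simulated data follows; the `sim` data frame, the `hood` column, and all values are made up for illustration.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for periodDealRating; all values are made up.
n_rest <- 40
sim <- data.frame(
  Phone = factor(rep(seq_len(n_rest), each = 10)),
  numDeals = rep(c(0, 1), times = n_rest * 5),
  hood = factor(rep(sample(c("A", "B", "C"), n_rest, replace = TRUE), each = 10))
)
rest_effect <- rnorm(n_rest, sd = 0.3)
sim$meanRating <- 3.5 + 0.1 * sim$numDeals + rest_effect[as.integer(sim$Phone)] +
  rnorm(nrow(sim), sd = 0.9)

# Drop rows with a missing neighborhood so both models use the same data.
sim <- sim[!is.na(sim$hood), ]

# Fit with maximum likelihood (REML = FALSE) so the fits are comparable.
m0 <- lmer(meanRating ~ numDeals + (1 | Phone), data = sim, REML = FALSE)
m1 <- lmer(meanRating ~ numDeals + hood + (1 | Phone), data = sim, REML = FALSE)

# Likelihood-ratio test for the neighborhood factor as a whole.
cmp <- anova(m0, m1)
print(cmp)
```

On the real data, the same pattern would apply to the baseline and neighborhood formulas, after restricting both fits to the rows with a non-missing `restneighborhoods`.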
# Add the Yipit count of restaurants in the neighborhood
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances +
(numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 |
Phone) + (yipitNeighborhoodCount)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone) + (yipitNeighborhoodCount)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4257
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.0921 0.304
## Residual 0.8524 0.923
## Number of obs: 1533, groups: Phone, 151
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.047509 0.332924 3.15
## yipit.appearances -0.016961 0.007218 -2.35
## numDeals -0.750246 0.567913 -1.32
## factor(pricepoint)2 -0.323581 0.110118 -2.94
## factor(pricepoint)3 -0.201480 0.135702 -1.48
## factor(pricepoint)4 0.001166 0.319539 0.00
## bRating 0.771336 0.085550 9.02
## bNumReviews 0.000741 0.000260 2.85
## yipitNeighborhoodCount -0.007235 0.003939 -1.84
## numDeals:factor(pricepoint)2 0.329384 0.203417 1.62
## numDeals:factor(pricepoint)3 0.691606 0.234628 2.95
## numDeals:factor(pricepoint)4 0.147422 0.382831 0.39
## numDeals:bRating 0.083008 0.154441 0.54
##
## Correlation of Fixed Effects:
## (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw yptNgC
## yipt.pprncs -0.157
## numDeals -0.259 -0.021
## fctr(prcp)2 -0.482 -0.035 0.141
## fctr(prcp)3 -0.289 -0.050 0.092 0.659
## fctr(prcp)4 -0.201 -0.235 0.068 0.310 0.228
## bRating -0.932 0.093 0.250 0.234 0.137 0.123
## bNumReviews 0.174 -0.045 0.009 -0.203 -0.272 -0.022 -0.207
## yptNghbrhdC -0.290 -0.032 0.008 0.172 -0.081 0.114 0.093 -0.150
## nmDls:fc()2 0.143 -0.009 -0.435 -0.287 -0.186 -0.089 -0.063 -0.008 -0.035
## nmDls:fc()3 0.109 -0.018 -0.267 -0.213 -0.254 -0.073 -0.037 0.008 -0.039
## nmDls:fc()4 0.103 -0.005 -0.341 -0.139 -0.103 -0.225 -0.062 0.000 -0.023
## nmDls:bRtng 0.231 0.023 -0.945 -0.056 -0.026 -0.038 -0.253 -0.013 0.008
## nD:()2 nD:()3 nD:()4
## yipt.pprncs
## numDeals
## fctr(prcp)2
## fctr(prcp)3
## fctr(prcp)4
## bRating
## bNumReviews
## yptNghbrhdC
## nmDls:fc()2
## nmDls:fc()3 0.720
## nmDls:fc()4 0.469 0.384
## nmDls:bRtng 0.147 0.010 0.194
# Interact numDeals with the Yipit neighborhood count
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances +
(numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 |
Phone) + numDeals * (yipitNeighborhoodCount)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone) + numDeals * (yipitNeighborhoodCount)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4259
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.0962 0.310
## Residual 0.8477 0.921
## Number of obs: 1533, groups: Phone, 151
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.974261 0.336954 2.89
## yipit.appearances -0.016752 0.007289 -2.30
## numDeals -0.303012 0.595549 -0.51
## factor(pricepoint)2 -0.313174 0.111135 -2.82
## factor(pricepoint)3 -0.213396 0.137043 -1.56
## factor(pricepoint)4 0.025281 0.323016 0.08
## bRating 0.778514 0.086329 9.02
## bNumReviews 0.000732 0.000263 2.78
## yipitNeighborhoodCount -0.004729 0.004114 -1.15
## numDeals:factor(pricepoint)2 0.319371 0.203056 1.57
## numDeals:factor(pricepoint)3 0.863292 0.244463 3.53
## numDeals:factor(pricepoint)4 0.047913 0.384060 0.12
## numDeals:bRating 0.033816 0.155418 0.22
## numDeals:yipitNeighborhoodCount -0.018130 0.007446 -2.43
##
## Correlation of Fixed Effects:
## (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw yptNgC
## yipt.pprncs -0.157
## numDeals -0.270 -0.016
## fctr(prcp)2 -0.483 -0.035 0.144
## fctr(prcp)3 -0.284 -0.050 0.075 0.656
## fctr(prcp)4 -0.202 -0.235 0.073 0.310 0.226
## bRating -0.931 0.093 0.245 0.234 0.136 0.124
## bNumReviews 0.174 -0.045 0.003 -0.204 -0.272 -0.023 -0.207
## yptNghbrhdC -0.301 -0.028 0.085 0.176 -0.087 0.117 0.097 -0.149
## nmDls:fc()2 0.143 -0.009 -0.421 -0.285 -0.183 -0.089 -0.063 -0.008 -0.039
## nmDls:fc()3 0.078 -0.014 -0.155 -0.191 -0.251 -0.061 -0.026 0.002 0.037
## nmDls:fc()4 0.111 -0.006 -0.356 -0.141 -0.097 -0.224 -0.064 0.002 -0.049
## nmDls:bRtng 0.237 0.021 -0.932 -0.060 -0.021 -0.041 -0.252 -0.010 -0.025
## nmDls:yptNC 0.086 -0.012 -0.307 -0.038 0.037 -0.029 -0.030 0.017 -0.254
## nD:()2 nD:()3 nD:()4 nmDl:R
## yipt.pprncs
## numDeals
## fctr(prcp)2
## fctr(prcp)3
## fctr(prcp)4
## bRating
## bNumReviews
## yptNghbrhdC
## nmDls:fc()2
## nmDls:fc()3 0.683
## nmDls:fc()4 0.468 0.335
## nmDls:bRtng 0.148 -0.027 0.205
## nmDls:yptNC 0.022 -0.288 0.107 0.128
# Add the Yelp count of restaurants in the neighborhood
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances +
(numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 |
Phone) + (yelpNeighborhoodCount)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone) + (yelpNeighborhoodCount)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4262
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.092 0.303
## Residual 0.854 0.924
## Number of obs: 1533, groups: Phone, 151
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.976830 0.329742 2.96
## yipit.appearances -0.018067 0.007234 -2.50
## numDeals -0.725251 0.568410 -1.28
## factor(pricepoint)2 -0.280568 0.108669 -2.58
## factor(pricepoint)3 -0.230046 0.135414 -1.70
## factor(pricepoint)4 0.110757 0.319268 0.35
## bRating 0.775029 0.085614 9.05
## bNumReviews 0.000674 0.000257 2.62
## yelpNeighborhoodCount -0.001042 0.000832 -1.25
## numDeals:factor(pricepoint)2 0.312644 0.203437 1.54
## numDeals:factor(pricepoint)3 0.671711 0.234608 2.86
## numDeals:factor(pricepoint)4 0.128673 0.382986 0.34
## numDeals:bRating 0.081387 0.154563 0.53
##
## Correlation of Fixed Effects:
## (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw ylpNgC
## yipt.pprncs -0.187
## numDeals -0.253 -0.023
## fctr(prcp)2 -0.427 -0.035 0.143
## fctr(prcp)3 -0.328 -0.049 0.091 0.680
## fctr(prcp)4 -0.142 -0.240 0.069 0.301 0.232
## bRating -0.939 0.103 0.247 0.214 0.150 0.102
## bNumReviews 0.137 -0.051 0.011 -0.181 -0.289 -0.004 -0.196
## ylpNghbrhdC -0.257 0.076 -0.024 -0.060 0.050 -0.107 0.101 -0.014
## nmDls:fc()2 0.131 -0.009 -0.435 -0.286 -0.189 -0.087 -0.059 -0.014 0.014
## nmDls:fc()3 0.096 -0.019 -0.267 -0.210 -0.257 -0.070 -0.032 0.002 0.010
## nmDls:fc()4 0.096 -0.005 -0.341 -0.138 -0.105 -0.224 -0.060 -0.003 0.005
## nmDls:bRtng 0.230 0.024 -0.946 -0.060 -0.024 -0.041 -0.251 -0.012 0.020
## nD:()2 nD:()3 nD:()4
## yipt.pprncs
## numDeals
## fctr(prcp)2
## fctr(prcp)3
## fctr(prcp)4
## bRating
## bNumReviews
## ylpNghbrhdC
## nmDls:fc()2
## nmDls:fc()3 0.720
## nmDls:fc()4 0.468 0.383
## nmDls:bRtng 0.147 0.011 0.194
# Interact numDeals with the Yelp neighborhood count
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances +
(numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 |
Phone) + numDeals * (yelpNeighborhoodCount)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | Phone) + numDeals * (yelpNeighborhoodCount)
## Data: subset(periodDealRating, numDeals <= 2)
##
## REML criterion at convergence: 4273
##
## Random effects:
## Groups Name Variance Std.Dev.
## Phone (Intercept) 0.092 0.303
## Residual 0.854 0.924
## Number of obs: 1533, groups: Phone, 151
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.965461 0.330512 2.92
## yipit.appearances -0.018105 0.007236 -2.50
## numDeals -0.623178 0.600674 -1.04
## factor(pricepoint)2 -0.281853 0.108716 -2.59
## factor(pricepoint)3 -0.228674 0.135464 -1.69
## factor(pricepoint)4 0.108250 0.319361 0.34
## bRating 0.776448 0.085673 9.06
## bNumReviews 0.000671 0.000257 2.61
## yelpNeighborhoodCount -0.000935 0.000856 -1.09
## numDeals:factor(pricepoint)2 0.319299 0.203878 1.57
## numDeals:factor(pricepoint)3 0.660877 0.235564 2.81
## numDeals:factor(pricepoint)4 0.152065 0.385645 0.39
## numDeals:bRating 0.067150 0.156947 0.43
## numDeals:yelpNeighborhoodCount -0.000774 0.001470 -0.53
##
## Correlation of Fixed Effects:
## (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw ylpNgC
## yipt.pprncs -0.186
## numDeals -0.260 -0.025
## fctr(prcp)2 -0.424 -0.034 0.128
## fctr(prcp)3 -0.329 -0.049 0.093 0.679
## fctr(prcp)4 -0.141 -0.239 0.061 0.301 0.232
## bRating -0.939 0.103 0.244 0.213 0.150 0.101
## bNumReviews 0.137 -0.051 0.005 -0.180 -0.289 -0.003 -0.197
## ylpNghbrhdC -0.265 0.071 0.054 -0.064 0.053 -0.108 0.105 -0.017
## nmDls:fc()2 0.127 -0.009 -0.391 -0.287 -0.187 -0.088 -0.057 -0.015 0.028
## nmDls:fc()3 0.101 -0.018 -0.280 -0.207 -0.258 -0.068 -0.035 0.003 -0.011
## nmDls:fc()4 0.088 -0.007 -0.284 -0.139 -0.102 -0.224 -0.055 -0.005 0.033
## nmDls:bRtng 0.237 0.026 -0.937 -0.055 -0.027 -0.038 -0.253 -0.009 -0.021
## nmDls:ylpNC 0.065 0.010 -0.323 0.022 -0.019 0.015 -0.032 0.016 -0.236
## nD:()2 nD:()3 nD:()4 nmDl:R
## yipt.pprncs
## numDeals
## fctr(prcp)2
## fctr(prcp)3
## fctr(prcp)4
## bRating
## bNumReviews
## ylpNghbrhdC
## nmDls:fc()2
## nmDls:fc()3 0.710
## nmDls:fc()4 0.471 0.369
## nmDls:bRtng 0.134 0.025 0.170
## nmDls:ylpNC -0.062 0.087 -0.115 0.172