Finding the Groupon Effect in our data set

Introduction

This report examines the Groupon effect described by Byers, Mitzenmacher, and Zervas (2012), abbreviated BMZ below. The authors found a discontinuity in the numerical ratings at the start date of a deal. The goal here is to check whether this discontinuity also exists in our data set.

Methods

I reshape our data set so that each review has a numerical offset (in days) from its restaurant's first deal, and then compute the average ratings of the restaurants with a deal.

Data transformation

The following steps are performed:

  1. Create a list of restaurants that have yipit appearances (at least one deal).
  2. Restrict the individual reviews data (reviewsList) to restaurants that ran deals.
  3. Restrict the restaurant data (restaurantList) to restaurants that ran deals.
  4. Add the date of the first deal to reviewsList.
  5. Create the offset: Review Date - First Deal Date.
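
The steps above can be sketched in base R as follows. This is only an illustration on toy data: the values are invented, and the column names (`restphones`, `numdate`, `yipit.appearances`, `yipit.first.date.added`) are assumptions modelled on names that appear later in this report, so the real reviewsList and restaurantList may be laid out differently.

```r
# Toy stand-ins for the real tables (hypothetical values and column names).
restaurantList <- data.frame(
  restphones = c("2021110001", "2021110002", "2021110003"),
  yipit.appearances = c(2, 0, 1),
  yipit.first.date.added = as.numeric(as.Date(c("2012-03-01", NA, "2012-04-15"))),
  stringsAsFactors = FALSE
)
reviewsList <- data.frame(
  restphones = c("2021110001", "2021110001", "2021110002", "2021110003"),
  numdate = as.numeric(as.Date(c("2012-02-20", "2012-03-05", "2012-03-10", "2012-04-15"))),
  reviewsrating = c(4, 3, 5, 2),
  stringsAsFactors = FALSE
)

# 1. Restaurants with at least one yipit appearance
deal.phones <- restaurantList$restphones[restaurantList$yipit.appearances > 0]

# 2-3. Restrict both tables to those restaurants
reviews.inyipit <- reviewsList[reviewsList$restphones %in% deal.phones, ]
restlist.inyipit <- restaurantList[restaurantList$restphones %in% deal.phones, ]

# 4. Attach the first deal date to every review
reviews.inyipit <- merge(
  reviews.inyipit,
  restlist.inyipit[, c("restphones", "yipit.first.date.added")],
  by = "restphones"
)

# 5. Offset in days: review date minus first deal date
reviews.inyipit$offset <- reviews.inyipit$numdate - reviews.inyipit$yipit.first.date.added
reviews.inyipit$after <- reviews.inyipit$offset >= 0
```

Negative offsets are reviews written before the first deal; the `after` flag is what the plots below facet on.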

Results

Now we can plot the data as BMZ do. They present two graphs. The first shows the arrival of reviews before and after the deal.

Groupon Effect Plot 1: The arrival of the reviews before and after the deal

# *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = length)) + geom_bar(stat = "identity") + 
    facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red") + xlab("Offset (in days)") + 
    ylab("Number of new reviews")

plot of chunk unnamed-chunk-4

## 100 offset
ggplot(data = subset(reviews.inyipit.summary, offset > -100 & offset < 100), 
    aes(x = offset, y = length)) + geom_bar(stat = "identity") + facet_grid(. ~ 
    after, scales = "free") + geom_smooth(colour = "red") + xlab("Offset (in days)") + 
    ylab("Number of new reviews")

plot of chunk unnamed-chunk-4

Note that there is indeed an increase in the number of reviews. BMZ call these “Groupon Reviews” because they contain the word 'Groupon'. We have not explored this yet. However, unlike in their graph, there seems to be a sharp decrease in the number of reviews of these restaurants after the deal.

Groupon Effect Plot 2: The change in mean ratings

# Now let's find the Groupon effect *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = mean)) + geom_point() + 
    facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red", span = 1.5) + 
    xlab("Offset (in days)") + ylab("Mean rating of all Groupon restaurants") + 
    ggtitle("Groupon-Effect Graph 2: Mean ratings change")

plot of chunk unnamed-chunk-5

## 100 offset
ggplot(data = subset(reviews.inyipit.summary, offset > -100 & offset < 100), 
    aes(x = offset, y = mean)) + geom_point() + facet_grid(. ~ after, scales = "free") + 
    geom_smooth(colour = "red", span = 1.5) + xlab("Offset (in days)") + ylab("Mean rating of all Groupon restaurants") + 
    ggtitle("Groupon-Effect Graph 2: Mean ratings change")

plot of chunk unnamed-chunk-5

Note that the discontinuity is fairly small and, compared with the BMZ graph, probably not significant.

Creating bins to smooth over 5 days

Perhaps we are unable to see the Groupon effect because we need to smooth over a longer period. I tried smoothing over 5-day bins.
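
The report does not show how `offset.bin` was built. One plausible construction, sketched here on invented data, rounds each offset down to the start of its 5-day bin and then recomputes the `mean` and `length` summaries per bin. The `floor()` rule is an assumption; it is chosen so that day 0 opens the first post-deal bin and no bin straddles the deal date.

```r
bin.width <- 5

# Toy per-review data (hypothetical): offset in days and star rating
daily <- data.frame(
  offset = c(-7, -5, -1, 0, 3, 4, 9),
  rating = c(4, 5, 3, 2, 4, 3, 5)
)

# Round each offset down to the start of its bin: -7 -> -10, -1 -> -5, 3 -> 0, 9 -> 5
daily$offset.bin <- floor(daily$offset / bin.width) * bin.width

# Per-bin summaries, named like the columns used in the plots
binned <- data.frame(
  offset.bin = sort(unique(daily$offset.bin)),
  mean = as.numeric(tapply(daily$rating, daily$offset.bin, mean)),
  length = as.numeric(tapply(daily$rating, daily$offset.bin, length))
)
binned
```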

The arrival of reviews now looks like this:

# *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset.bin, y = length)) + geom_bar(stat = "identity") + 
    facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red")

plot of chunk unnamed-chunk-7

The mean ratings change plot now looks like this:

# Now let's find the Groupon effect *keep
ggplot(data = reviews.inyipit.summary, aes(x = offset.bin, y = mean)) + geom_point() + 
    facet_grid(. ~ after, scales = "free") + geom_smooth(colour = "red")

plot of chunk unnamed-chunk-8

Creating Quantiles for the number of reviews

Now let's observe how creating quantiles for the number of reviews changes the plots.
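
As a rough sketch of the grouping step, restaurants can be assigned to quantile groups of their review counts with `quantile()` and `cut()`. The data are invented, and the use of four groups (quartiles) and the labels Q1 to Q4 are assumptions; the report does not say how many groups `numreviews.group` has.

```r
# Toy restaurant table (hypothetical values)
rest <- data.frame(
  restphones = paste0("r", 1:8),
  numreviews = c(3, 8, 15, 22, 40, 75, 120, 300)
)

# Quartile break points of the review counts
breaks <- quantile(rest$numreviews, probs = seq(0, 1, 0.25))

# include.lowest = TRUE keeps the restaurant with the minimum count in Q1
rest$numreviews.group <- cut(
  rest$numreviews,
  breaks = breaks,
  include.lowest = TRUE,
  labels = paste0("Q", 1:4)
)
table(rest$numreviews.group)
```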

The plots for the number of reviews and mean changes now look like this:

# plot for quantiles for number of reviews and mean change with smoother
ggplot(data = reviews.inyipit.summary, aes(x = offset, y = length)) + geom_bar(stat = "identity") + 
    facet_grid(numreviews.group ~ after, scales = "free", space = "free_y") + 
    geom_smooth(colour = "red") + ggtitle("number of reviews in quantiles of number of reviews")

plot of chunk unnamed-chunk-10


ggplot(data = reviews.inyipit.summary, aes(x = offset, y = mean)) + geom_bar(stat = "identity") + 
    facet_grid(numreviews.group ~ after, scales = "free", space = "free_y") + 
    geom_smooth(colour = "red") + ggtitle("mean change rating in quantiles of number reviews")

plot of chunk unnamed-chunk-10


# Next step: new plot to show the histogram in 90 days to see if the trend
# is more over a long time

Calculating the right deal offset to avoid missing data bias

We need to correct the analysis because the collection of review data may end before the offsets we have tried so far. We therefore examine the dates in our data set.

## The last deal start date is:
as.Date(max(restlist_temp$yipit.first.date.added), origin = "1970-01-01")
## [1] "2012-06-22"
## [1] '2012-06-22'

## The last restaurant review date is:
as.Date(max(reviews.inyipit$numdate), origin = "1970-01-01")
## [1] "2012-10-11"
# [1] '2012-10-11'

max(reviews.inyipit$numdate) - max(restlist_temp$yipit.first.date.added)
## [1] 111

# MAX offset should be around 100s as.Date(15513, origin = '1970-01-01')
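
A small sketch of how this bound could be applied. The two dates are the ones printed above; the cutoff of 100 days and the toy offsets are assumptions used only for illustration.

```r
last.deal.start <- as.Date("2012-06-22")  # last deal start date, from the output above
last.review <- as.Date("2012-10-11")      # last review date, from the output above

# Days of review data available after the latest deal
max.offset <- as.numeric(last.review - last.deal.start)

# Assumed cutoff, chosen to stay below max.offset
window <- 100

# Toy offsets (hypothetical); keep only those inside the symmetric window
toy <- data.frame(offset = c(-250, -100, -3, 0, 99, 100, 140))
toy.window <- subset(toy, abs(offset) <= window)
```

With the window below `max.offset`, every deal restaurant has had the chance to accumulate reviews over the full post-deal range, so the right edge of the plots is not thinned out by the end of data collection.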

Adding a Control (no deal)

Let's compare the Groupon effect graphs with a control: a group of restaurants that did not run a deal. For the control, the offset reference date is chosen arbitrarily as the median of the deal data period, which is “2012-03-27”.
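
One way the control offsets could be assigned is sketched below on invented data. Deal restaurants keep their own first deal date as the reference, while no-deal restaurants all share the arbitrary date 2012-03-27. The column names `first.deal` and `reference` are hypothetical.

```r
# Arbitrary reference date for restaurants without a deal
control.date <- as.numeric(as.Date("2012-03-27"))

# Toy reviews (hypothetical): two from a deal restaurant, two from a control
reviews.all <- data.frame(
  inyipit = c(TRUE, TRUE, FALSE, FALSE),
  numdate = as.numeric(as.Date(c("2012-03-01", "2012-05-01", "2012-03-20", "2012-04-06"))),
  first.deal = as.numeric(as.Date(c("2012-04-01", "2012-04-01", NA, NA)))
)

# Pick the reference date per review, then compute the offset in days
reviews.all$reference <- ifelse(reviews.all$inyipit, reviews.all$first.deal, control.date)
reviews.all$offset <- reviews.all$numdate - reviews.all$reference
reviews.all$after <- reviews.all$offset >= 0
```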

Groupon Effect Plot 1: Arrival of reviews (with control)

In the top facet (NO-DEAL), you can see how reviews arrive in our data set before and after the arbitrary offset. In the bottom facet (DEAL), you can see how reviews arrive before and after the deal. Note the initial spike (discontinuity) in the DEAL facet.

# *keep: facet for inyipit and colour for after the deal
ggplot(data = reviews.all.summary, aes(x = offset, y = length, color = after)) + 
    geom_bar(stat = "identity", position = "identity", alpha = 0.7) + facet_grid(inyipit ~ 
    ., scales = "free", space = "free_x") + geom_smooth() + xlab("Offset (in days)") + 
    ylab("Number of new reviews") + opts(panel.margin = rep(unit(0, "null"), 
    1)) + labs(title = "Number of reviews by offset. Restaurants without deals on top")

plot of chunk unnamed-chunk-14

Groupon Effect Plot 2: Ratings (with control)

In the top facet (NO-DEAL), there is no apparent discontinuity in the rating. In the bottom facet (DEAL), however, you can see a negative discontinuity in the rating.

ggplot(data = reviews.all.summary, aes(x = offset, y = mean, color = after)) + 
    geom_point(stat = "identity", position = "identity", alpha = 0.7) + facet_grid(inyipit ~ 
    ., scales = "free", space = "free_x") + geom_smooth() + xlab("Offset (in days)") + 
    ylab("Mean rating of all Groupon restaurants") + opts(panel.margin = rep(unit(0, 
    "null"), 1)) + labs(title = "Ratings by offset. Restaurants without deals on top.")

plot of chunk unnamed-chunk-15

Exploring the role of competition

The role of competition is unexplored. Let's first explore how competition can be measured in our data set, and start exploring how we can visualize it.

Measuring competition by location

Location effects are going to be part of competition. In its most basic form, co-located restaurants are competing with each other. The first way we can explore location is by neighborhood.

Neighborhood-level

Our data has neighborhood data, so the first thing we can do is observe whether the effects are more obvious in some neighborhoods than in others.

Two basic forms of competition can be measured:

  1. How many restaurants are there in your neighborhood?
  2. How many restaurants with deals are there in your neighborhood?
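
Both measures can be computed with a per-neighborhood count, sketched here in base R on invented data. The output columns are named after those used in the histogram code (`nbhd`, `all.count`, `inyipit.count`); how the real `nbhd.count` table was built is not shown in this report, so this is only one possible construction.

```r
# Toy restaurant table (hypothetical): neighborhood and whether the restaurant ran a deal
rest <- data.frame(
  nbhd = c("Adams Morgan", "Adams Morgan", "Dupont Circle",
           "Dupont Circle", "Dupont Circle", "Georgetown"),
  inyipit = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# 1. All restaurants per neighborhood; 2. restaurants with deals per neighborhood
all.count <- tapply(rest$inyipit, rest$nbhd, length)
inyipit.count <- tapply(rest$inyipit, rest$nbhd, sum)

nbhd.count <- data.frame(
  nbhd = names(all.count),
  all.count = as.integer(all.count),
  inyipit.count = as.integer(inyipit.count)
)
nbhd.count
```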

Histogram of restaurants by neighborhood

Note that Adams Morgan, Dupont, and Georgetown lead in the number of restaurants and in the number of restaurants with deals.

ggplot(data = nbhd.count) + geom_bar(aes(x = nbhd, y = all.count), fill = "red", 
    alpha = 0.7) + geom_bar(aes(x = nbhd, y = inyipit.count), fill = "blue", 
    alpha = 0.7) + theme(text = element_text(size = 20), axis.text.x = element_text(angle = 90, 
    vjust = 1)) + labs(title = "Restaurants by neighborhood (Blue for DEAL)")

plot of chunk unnamed-chunk-17

Groupon effect plot 1 (by neighborhood)

Group the restaurants in quantiles:

  1. Low number of restaurants
  2. Medium number of restaurants
  3. High number of restaurants

Keep in mind a large number of restaurants don't have a neighborhood, so NA is also shown.
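
A sketch of how the three groups might be formed and why NA shows up as its own facet. The data are invented, the tercile break points are an assumption about what "quantiles" means here, and `all.count.grp` follows the name used in the plotting code.

```r
# Toy neighborhood counts (hypothetical)
nbhd.count <- data.frame(
  nbhd = c("A", "B", "C", "D", "E", "F"),
  all.count = c(2, 5, 9, 14, 30, 60),
  stringsAsFactors = FALSE
)

# Terciles of the restaurant count per neighborhood
breaks <- quantile(nbhd.count$all.count, probs = c(0, 1/3, 2/3, 1))
nbhd.count$all.count.grp <- cut(
  nbhd.count$all.count,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("Low", "Medium", "High")
)

# Restaurants inherit the group of their neighborhood; a restaurant with no
# neighborhood has no match and therefore gets an NA group
rest <- data.frame(
  restphones = c("r1", "r2", "r3"),
  nbhd = c("A", "F", NA),
  stringsAsFactors = FALSE
)
rest$all.count.grp <- nbhd.count$all.count.grp[match(rest$nbhd, nbhd.count$nbhd)]
```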


ggplot(data = subset(reviews.all.summary.1, inyipit == TRUE), aes(x = offset, 
    y = length, color = after, fill = after)) + geom_bar(stat = "identity", 
    position = "identity", alpha = 0.5) + facet_grid(all.count.grp ~ ., scales = "free") + 
    geom_smooth() + xlab("Offset (in days)") + ylab("Number of new reviews") + 
    opts(panel.margin = rep(unit(0, "null"), 1)) + labs(title = "Num Reviews by # restaurants in neighborhood") + 
    xlim(-100, 100)

plot of chunk unnamed-chunk-19


ggplot(data = subset(reviews.all.summary.2, inyipit == TRUE), aes(x = offset, 
    y = length, color = after, fill = after)) + geom_bar(stat = "identity", 
    position = "identity", alpha = 0.5) + facet_grid(inyipit.count.grp ~ ., 
    scales = "free") + geom_smooth() + xlab("Offset (in days)") + ylab("Number of new reviews") + 
    opts(panel.margin = rep(unit(0, "null"), 1)) + labs(title = "Num Reviews by # restaurant with DEALS") + 
    xlim(-100, 100)

plot of chunk unnamed-chunk-19

Groupon Effect Plot 2: Ratings (by neighborhood competition)

Now let's explore the effect on the rating:

ggplot(data = subset(reviews.all.summary.1, inyipit == TRUE), aes(x = offset, 
    y = mean, color = after, fill = after)) + geom_point() + facet_grid(all.count.grp ~ 
    ., scales = "free") + geom_smooth() + xlab("Offset (in days)") + ylab("Ratings") + 
    labs(title = "Rating by # restaurants in neighborhood") + xlim(-100, 100)

plot of chunk unnamed-chunk-20


ggplot(data = subset(reviews.all.summary.2, inyipit == TRUE), aes(x = offset, 
    y = mean, color = after, fill = after)) + geom_point() + facet_grid(inyipit.count.grp ~ 
    ., scales = "free") + geom_smooth() + xlab("Offset (in days)") + ylab("Ratings") + 
    labs(title = "Rating by # restaurant with DEALS") + xlim(-100, 100)

plot of chunk unnamed-chunk-20

Time Series Model #1

Idea: Observe how rating/numreviews of NODEAL restaurants are affected by DEAL restaurants around them.

Dependent variables:

Pools of data for the dependent variable:

Independent variables:

Timeframe: 6 months of deal data set. The range is for deal start and for deal end:

as.Date(range(yipitall$Date.Added.Num), origin = "1970-01-01")
## [1] "2011-12-16" "2012-06-29"
as.Date(range(yipitall$Date.Ended.Num), origin = "1970-01-01")
## [1] "2012-01-01" "2012-06-30"

The histogram for date added:

qplot(data = yipitall, Date.Added.Num, geom = "histogram", binwidth = 1)

plot of chunk unnamed-chunk-22

ggplot(yipitall, aes(x = Date.Added.Num)) + geom_histogram(binwidth = 1, aes(fill = Date.Ended.Num))
## Error: 'x' and 'units' must have length > 0

Exploring a basic time series model

DVs = rating/reviews
IVs = number of deals in time period

Time period is 1, 2, or 4 weeks
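
The mapping from a numeric date to a period index used in the code below can be wrapped in a small helper, which makes it easy to compare the 1-, 2-, and 4-week choices. The function name `to_period` and the example dates are hypothetical; the formula itself mirrors the `ceiling(... / (7 * periodLength))` computation in the code that follows.

```r
# Map numeric dates to 1-based period indices of period.length weeks,
# counting from min.date (the earliest date falls in period 1)
to_period <- function(date.num, min.date, period.length) {
  ceiling((date.num - min.date + 1) / (7 * period.length))
}

# Example dates (hypothetical); the first one plays the role of minDate
min.date <- as.numeric(as.Date("2011-12-16"))
dates <- as.numeric(as.Date(c("2011-12-16", "2011-12-29", "2011-12-30", "2012-01-13")))

one.week <- to_period(dates, min.date, 1)
two.weeks <- to_period(dates, min.date, 2)
four.weeks <- to_period(dates, min.date, 4)
```

Longer periods trade temporal resolution for more reviews per restaurant-period cell, which matters because many cells are empty at the 2-week level.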


## 1.  Restrict yipitall to contain only restaurants that have restaurant
## reviews

yipitall.inyelp <- yipitall

# restlist.inyipit has the restlist of restaurants that are in yipit
yipitall.inyelp$inyelp <- (yipitall$Phone %in% restlist.inyipit$restphones)
yipitall.inyelp <- (yipitall.inyelp[yipitall.inyelp$inyelp == TRUE, ])  #317 deals with 
summarise(yipitall.inyelp, count = n_distinct(Phone))
##   count
## 1   189
length(unique(yipitall.inyelp$Phone))  #189 unique restaurants in the deals set
## [1] 189
# 317-189 = 128 deals with repeated restaurants


## 2. Create Time Periods of 2, 4 weeks for both the start and end date of
## the deal
periodLength <- 2  #meaning 2 weeks

# min time (earliest deal date)
minDate <- min(min(yipitall.inyelp$Date.Added.Num), min(yipitall.inyelp$Date.Ended.Num))
yipitall.inyelp$Date.Added.Per <- yipitall.inyelp$Date.Added.Num - minDate + 
    1
yipitall.inyelp$Date.Added.Per <- ceiling(yipitall.inyelp$Date.Added.Per/(7 * 
    periodLength))

yipitall.inyelp$Date.Ended.Per <- yipitall.inyelp$Date.Ended.Num - minDate + 
    1
yipitall.inyelp$Date.Ended.Per <- ceiling(yipitall.inyelp$Date.Ended.Per/(7 * 
    periodLength))

# Let's look at deal duration by Deal Company
yipitall.inyelp$Date.Duration <- yipitall.inyelp$Date.Ended.Num - yipitall.inyelp$Date.Added.Num
yipitall.inyelp$Date.Duration.Weeks <- yipitall.inyelp$Date.Duration/7

qplot(data = yipitall.inyelp, y = Date.Duration.Weeks, x = Site, geom = "boxplot") + 
    ylim(0, 4)

plot of chunk unnamed-chunk-23


# let's summarize this
summarise(group_by(yipitall.inyelp, Site), duration = mean(Date.Duration))
## Source: local data frame [22 x 2]
## 
##                      Site duration
## 1            Amazon Local    3.000
## 2             Daily Candy   14.000
## 3            Deal Chicken    4.778
## 4                DealFind    2.333
## 5                Eversave    2.000
## 6               Gilt City   10.917
## 7           Google Offers    5.727
## 8  Google Offers Partners    3.733
## 9                 Groupon    3.571
## 10                HomeRun    3.500
## 11           LivingSocial    4.649
## 12              OpenTable    2.000
## 13                 Recoup    6.667
## 14              Rue La La    6.000
## 15              SalesVote    6.444
## 16                Savored   10.250
## 17               Scoutmob    3.000
## 18               Signpost   10.000
## 19          Specialicious    4.500
## 20       The Capitol Deal    4.867
## 21  Travelzoo Local Deals    8.400
## 22              kgb deals   10.500

## 3. Let's now create the matrix of deals in time periods

# how many periods do we have
maxPeriods <- max(max(yipitall.inyelp$Date.Added.Per), max(yipitall.inyelp$Date.Ended.Per))

# now let's create the matrix
matrixPeriods <- matrix(0, nrow(yipitall.inyelp), maxPeriods)
for (i in 1:nrow(yipitall.inyelp)) {
    matrixPeriods[i, ] <- 1:maxPeriods %in% yipitall.inyelp$Date.Added.Per[i]:yipitall.inyelp$Date.Ended.Per[i]
}
yipitall.inyelp <- (cbind(yipitall.inyelp, matrixPeriods))

# in theory melt from Reshape can help us get the data in a form we like
yipitall.melt <- yipitall.inyelp[, c("Phone", "Site", paste0(1:maxPeriods))]
yipitall.melt <- (melt(yipitall.melt, id = c("Phone", "Site")))
colnames(yipitall.melt)[3] <- "Period"

yipitall.melt <- group_by(yipitall.melt, Phone, Period)
# yipitall.melt <- subset(x=yipitall.melt, subset=value>0) #This would erase
# the 0s

periodDeals <- summarise(group_by(yipitall.melt, Phone, Period), numDeals = sum(value))

qplot(data = periodDeals, x = Period, y = numDeals, group = factor(Phone), color = factor(Phone), 
    geom = "line")

plot of chunk unnamed-chunk-23


# unique restaurants of phone numbers yipitRestUnique =
# unique(periodDeals$Phone) yipitRestSample <- sort(sample(yipitRestUnique,
# size=20, replace=FALSE))
periodDeals$numDeals[periodDeals$numDeals == 0] <- NA  # the NAs are 0
periodDeals$numDealsJitter <- periodDeals$numDeals + rnorm(n = nrow(periodDeals), 
    mean = 0, sd = 0.1)
ggplot(data = periodDeals, aes(x = Period, y = numDealsJitter, group = factor(Phone), 
    color = factor(Phone))) + geom_line() + geom_point(size = 2)

plot of chunk unnamed-chunk-23



## 4. Now let's get those dependent variables. Objective is to have a time
## series of the new reviews and the avg. rating in the time period.

# copy reviews.inyipit since we will be modifying it

reviews.inyipit.subset <- reviews.inyipit

# a.  Truncate reviews to include only the restaurants that do deals.

length(unique(periodDeals$Phone))  #189
## [1] 189
length(unique(reviews.inyipit$restphones))  #186
## [1] 186
# We have reviews for 186 of the 189 deal restaurants, so we need to take
# out the periodDeal rows for the restaurants we don't have reviews for.

periodDeals <- (subset(periodDeals, Phone %in% reviews.inyipit$restphones))
length(unique(periodDeals$Phone))  #is now 186
## [1] 186

# b.  Create time period column. Let's express the review dates as period indices.
reviews.inyipit.subset$reviewsdatesNum <- as.numeric(as.Date(reviews.inyipit.subset$reviewsdates, 
    "%m/%d/%Y"))
reviews.inyipit.subset$reviewdatesPer <- reviews.inyipit.subset$reviewsdatesNum - 
    minDate + 1
reviews.inyipit.subset$reviewdatesPer <- ceiling(reviews.inyipit.subset$reviewdatesPer/(7 * 
    periodLength))

# 1.  Save the reviews.inyipit.subset into another variable so that we can
# compute before deal summaries
reviews.temp <- reviews.inyipit.subset  #before we remove the reviews before/after the deal period

# c.  Discard the negative periods and the ones over our yipit dataset
reviews.inyipit.subset <- subset(reviews.inyipit.subset, reviewdatesPer > 0)
reviews.inyipit.subset <- subset(reviews.inyipit.subset, reviewdatesPer <= maxPeriods)

# d.  Use group by to summarize
reviews.inyipit.subset <- group_by(reviews.inyipit.subset, restphones, reviewdatesPer)
reviews.inyipit.subset.summary <- summarize(reviews.inyipit.subset, meanRating = mean(reviewsrating), 
    numReviews = n(), sdRating = sd(reviewsrating))

names(reviews.inyipit.subset.summary)[1:2] <- c("Phone", "Period")

# e.  Merge with other deals table
periodDeals$numDeals[is.na(periodDeals$numDeals)] <- 0  #go back to adding the zeros
# HERE if u need to redo PeriodDealRating merging with restlist.period

periodDealRating <- (merge(periodDeals, reviews.inyipit.subset.summary, by = c("Phone", 
    "Period"), all = TRUE))

# Model1:
periodDealRating$Period <- as.numeric(as.character(periodDealRating$Period))

# explore: time relationships mean rating
tmp <- summarize(group_by(periodDealRating, Period), meanRating = mean(meanRating, 
    na.rm = TRUE))

ggplot(data = periodDealRating, aes(x = Period, y = meanRating)) + geom_line(aes(group = factor(Phone), 
    color = factor(Phone))) + geom_line(data = tmp, aes(x = Period, y = meanRating), 
    color = "black")
## Warning: Removed 415 rows containing missing values (geom_path).

plot of chunk unnamed-chunk-24


# numReviews
tmp2 <- summarize(group_by(periodDealRating, Period), numReviews = mean(numReviews, 
    na.rm = TRUE))

ggplot(data = periodDealRating, aes(x = Period, y = numReviews)) + geom_line(aes(group = factor(Phone), 
    color = factor(Phone))) + geom_line(data = tmp2, aes(x = Period, y = numReviews), 
    color = "black")
## Warning: Removed 415 rows containing missing values (geom_path).

plot of chunk unnamed-chunk-24


# stDev
tmp3 <- summarize(group_by(periodDealRating, Period), sdRating = mean(sdRating, 
    na.rm = TRUE))

ggplot(data = periodDealRating, aes(x = Period, y = sdRating)) + geom_line(aes(group = factor(Phone), 
    color = factor(Phone))) + geom_line(data = tmp3, aes(x = Period, y = sdRating), 
    color = "black")
## Warning: Removed 709 rows containing missing values (geom_path).

plot of chunk unnamed-chunk-24


# meanRating and numDeals pdf('Figure: Boxplot of meanRating vs
# numDeals.pdf')
ggplot(data = periodDealRating, aes(x = numDeals, y = meanRating, group = factor(numDeals))) + 
    geom_boxplot()
## Warning: Removed 1004 rows containing non-finite values (stat_boxplot).

plot of chunk unnamed-chunk-24

# dev.off()

ggplot(data = periodDealRating, aes(x = numDeals, y = sdRating, group = factor(numDeals))) + 
    geom_boxplot()
## Warning: Removed 1748 rows containing non-finite values (stat_boxplot).

plot of chunk unnamed-chunk-24


ggplot(data = periodDealRating, aes(x = numDeals, y = numReviews, group = factor(numDeals))) + 
    geom_boxplot()
## Warning: Removed 1004 rows containing non-finite values (stat_boxplot).

plot of chunk unnamed-chunk-24


# Density helps us see the effect on numReviews from numDeals
ggplot(data = periodDealRating, aes(x = numReviews, color = numDeals, group = numDeals)) + 
    geom_density(adjust = 3)
## Warning: Removed 894 rows containing non-finite values (stat_density).
## Warning: Removed 94 rows containing non-finite values (stat_density).
## Warning: Removed 15 rows containing non-finite values (stat_density).
## Warning: Removed 1 rows containing non-finite values (stat_density).

plot of chunk unnamed-chunk-24



# now let's fit the model most basic model
periodDealRating$numReviews[is.na(periodDealRating$numReviews)] <- 0

lmer(data = periodDealRating, meanRating ~ numDeals + (1 | Phone))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ numDeals + (1 | Phone) 
##    Data: periodDealRating 
## REML criterion at convergence: 5065 
## Random effects:
##  Groups   Name        Std.Dev.
##  Phone    (Intercept) 0.540   
##  Residual             0.928   
## Number of obs: 1786, groups: Phone, 182
## Fixed Effects:
## (Intercept)     numDeals  
##      3.3025      -0.0582

lmer(data = periodDealRating, numReviews ~ numDeals + (1 | Phone))
## Linear mixed model fit by REML ['lmerMod']
## Formula: numReviews ~ numDeals + (1 | Phone) 
##    Data: periodDealRating 
## REML criterion at convergence: 10772 
## Random effects:
##  Groups   Name        Std.Dev.
##  Phone    (Intercept) 1.95    
##  Residual             1.49    
## Number of obs: 2790, groups: Phone, 186
## Fixed Effects:
## (Intercept)     numDeals  
##       1.682        0.181

# Add controls: Starting Rating, Starting num reviews,

# Baseline summary is the summary before the deals get started

baseline.summary <- summarize(group_by(subset(reviews.temp, reviewsdatesNum < 
    minDate), restphones), bRating = mean(reviewsrating), bNumReviews = n())

names(baseline.summary)[1] <- "Phone"

# Expand periodDealRating with baseline numReviews and baseline Rating
periodDealRating <- (merge(periodDealRating, baseline.summary, by = "Phone", 
    all = TRUE))

# add 0s for num reviews NAs
periodDealRating$bNumReviews[is.na(periodDealRating$bNumReviews)] <- 0

# Now let's fit a model where we control for the starting Rating and
# Starting num Reviews
qplot(data = baseline.summary, x = bRating, geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-24

qplot(data = baseline.summary, x = bNumReviews, geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-24

qplot(data = periodDealRating, x = numDeals, geom = "histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-24

table(periodDealRating$numDeals)
## 
##    0    1    2    3    6 
## 2427  321   36    5    1

summary(lmer(data = periodDealRating, meanRating ~ +factor(numDeals) + bRating + 
    bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ +factor(numDeals) + bRating + bNumReviews + (1 |      Phone) 
##    Data: periodDealRating 
## 
## REML criterion at convergence: 4788 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.109    0.331   
##  Residual             0.863    0.929   
## Number of obs: 1722, groups: Phone, 170
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)        0.289939   0.268132    1.08
## factor(numDeals)1 -0.081799   0.069560   -1.18
## factor(numDeals)2 -0.517718   0.215857   -2.40
## factor(numDeals)3  1.316826   0.489018    2.69
## factor(numDeals)6  0.032589   0.965481    0.03
## bRating            0.872917   0.079695   10.95
## bNumReviews        0.000529   0.000251    2.10
## 
## Correlation of Fixed Effects:
##             (Intr) fc(D)1 fc(D)2 fc(D)3 fc(D)6 bRatng
## fctr(nmDl)1 -0.040                                   
## fctr(nmDl)2 -0.035  0.041                            
## fctr(nmDl)3 -0.053  0.023  0.020                     
## fctr(nmDl)6  0.040  0.005  0.001 -0.001              
## bRating     -0.984  0.008  0.025  0.046 -0.044       
## bNumReviews  0.024 -0.014 -0.007  0.009  0.015 -0.140
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ +factor(numDeals) + 
    bRating + bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ +factor(numDeals) + bRating + bNumReviews + (1 |      Phone) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4780 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.110    0.331   
##  Residual             0.864    0.929   
## Number of obs: 1717, groups: Phone, 170
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)        0.279958   0.268824    1.04
## factor(numDeals)1 -0.081297   0.069602   -1.17
## factor(numDeals)2 -0.515471   0.216009   -2.39
## bRating            0.875876   0.079902   10.96
## bNumReviews        0.000528   0.000252    2.10
## 
## Correlation of Fixed Effects:
##             (Intr) fc(D)1 fc(D)2 bRatng
## fctr(nmDl)1 -0.040                     
## fctr(nmDl)2 -0.036  0.041              
## bRating     -0.984  0.009  0.026       
## bNumReviews  0.025 -0.014 -0.007 -0.140
summary(lmer(data = subset(periodDealRating, numDeals <= 2), numReviews ~ +factor(numDeals) + 
    bRating + bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: numReviews ~ +factor(numDeals) + bRating + bNumReviews + (1 |      Phone) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 9871 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 1.86     1.36    
##  Residual             2.21     1.49    
## Number of obs: 2589, groups: Phone, 173
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)       -0.268398   0.786449   -0.34
## factor(numDeals)1  0.156309   0.094649    1.65
## factor(numDeals)2  0.424482   0.275975    1.54
## bRating            0.236840   0.234539    1.01
## bNumReviews        0.011092   0.000837   13.25
## 
## Correlation of Fixed Effects:
##             (Intr) fc(D)1 fc(D)2 bRatng
## fctr(nmDl)1 -0.020                     
## fctr(nmDl)2 -0.020  0.037              
## bRating     -0.984  0.007  0.016       
## bNumReviews  0.018 -0.010  0.000 -0.135
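
Under the usual reading of the lme4 formula `meanRating ~ factor(numDeals) + bRating + bNumReviews + (1 | Phone)`, the controlled models above are random-intercept models. Written out, with symbols introduced here purely for illustration (i indexes restaurants, t indexes periods, and k runs over the non-zero deal counts):

```latex
\text{meanRating}_{it}
  = \beta_0
  + \sum_{k \ge 1} \beta_k \,\mathbf{1}[\text{numDeals}_{it} = k]
  + \gamma_1\,\text{bRating}_i
  + \gamma_2\,\text{bNumReviews}_i
  + u_i + \varepsilon_{it},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
```

Here u_i is the restaurant-level intercept from `(1 | Phone)`. In the first controlled model the estimated variances are 0.109 for u_i and 0.863 for the residual, so roughly 0.109 / (0.109 + 0.863) ≈ 0.11 of the unexplained variance sits between restaurants. The model for numReviews has the same form with numReviews on the left-hand side.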

Adding restaurant characteristics:

Let's add the price point, the neighborhood, and the previous yipit appearances (historical).

# 5. Now let's add restaurant characteristics.

# 5.0 Make sure we have the right rows and right columns
restlist.periods <- restlist.inyipit
sum((periodDealRating$Phone) %in% (restlist.periods$restphones))  #all in periodDealRating Covered
## [1] 2790
dim(restlist.periods)  #is 189 but should be 186
## [1] 189  49
restlist.periods <- (subset(restlist.periods, restlist.periods$restphones %in% 
    periodDealRating$Phone))
dim(restlist.periods)  #is now 186
## [1] 186  49
# select the columns I want to merge
restlist.periods <- (select(restlist.periods, restphones, restaddresses, restneighborhoods, 
    yipit.appearances, pricepoint, url.dummy))
# change the name of phone
names(restlist.periods)[1] <- "Phone"

Spatial Visualization and Competition


Let's add neighborhood variables. First, let's visualize how this data looks. To do that, we need to geocode the addresses of the restaurants.

Plot neighborhood information:

q <- qmap("Washington, DC", zoom = 13)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Washington,+DC&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Washington,+DC&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
q + geom_point(data = restlist.periods, aes(x = lon, y = lat, color = factor(restneighborhoods), 
    size = yipit.appearances), alpha = 0.7) + scale_size(range = c(3, 15))
## Warning: Removed 14 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-27


# It appears we should fix our NAs.
dim(restlist.periods[is.na(restlist.periods$restneighborhoods), c("restaddresses", 
    "restneighborhoods")])[1]/dim(restlist.periods)[1]
## [1] 0.3387
# About 34% don't have a neighborhood.  Let's fix that!


## Now let's plot the ones that are still missing.
dfna <- restlist.periods[is.na(restlist.periods$restneighborhoods), ]
q + geom_text(data = dfna, aes(x = lon, y = lat), label = 1:dim(dfna)[1], color = "red")
## Warning: Removed 2 rows containing missing values (geom_text).

plot of chunk unnamed-chunk-27

# you can see how the ones missing are in border areas, so it makes sense
# that they are NAs
grestlist.periods <- group_by(restlist.periods, restneighborhoods)
summarize(grestlist.periods, count = n())
## Source: local data frame [24 x 2]
## 
##                                  restneighborhoods count
## 1                                     Adams Morgan    17
## 2                           Capitol Hill/Northeast     3
## 3                           Capitol Hill/Southeast     7
## 4                                      Chevy Chase     6
## 5                                        Chinatown     6
## 6                                   Cleveland Park     1
## 7                                 Columbia Heights     4
## 8                                    Dupont Circle    17
## 9                                 Federal Triangle     2
## 10                                    Foggy Bottom     2
## 11                                      Georgetown    12
## 12                                     Glover Park     1
## 13 H Street Corridor/Atlas District/Near Northeast     7
## 14                                    Lincoln Park     1
## 15                                    Logan Circle     2
## 16                                  Mount Pleasant     1
## 17                                       Park View     2
## 18                                    Penn Quarter     3
## 19                                            Shaw     4
## 20                                      Tenleytown     6
## 21                               U Street Corridor    10
## 22                           Van Ness/Forest Hills     3
## 23                                    Woodley Park     6
## 24                                              NA    63

Let's fix the neighborhoods: Foggy Bottom and Penn Quarter
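
The bounding-box idea used for this fix can be sketched as a small helper that labels only restaurants whose neighborhood is still missing. The function name `assign_by_box` and the toy rows are hypothetical; the box limits are the Foggy Bottom coordinates that appear in the `subset()` call below. Whether every restaurant inside such a box really belongs to the neighborhood remains a judgment call.

```r
# Fill missing neighborhoods for restaurants inside a lat/lon rectangle.
# Rows that already have a neighborhood, or lack coordinates, are left untouched.
assign_by_box <- function(df, name, lat.min, lat.max, lon.min, lon.max) {
  inside <- !is.na(df$lat) & !is.na(df$lon) &
    df$lat >= lat.min & df$lat <= lat.max &
    df$lon >= lon.min & df$lon <= lon.max
  fill <- inside & is.na(df$restneighborhoods)
  df$restneighborhoods[fill] <- name
  df
}

# Toy rows (hypothetical): inside the box, outside it, already labelled, no coordinates
toy <- data.frame(
  restneighborhoods = c(NA, NA, "Dupont Circle", NA),
  lat = c(38.9005, 38.9200, 38.9000, NA),
  lon = c(-77.0450, -77.0450, -77.0400, -77.0400),
  stringsAsFactors = FALSE
)

toy <- assign_by_box(toy, "Foggy Bottom", 38.892373, 38.906, -77.052573, -77.036651)
toy$restneighborhoods
```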


# 5.1 Fix neighborhoods option 1: draw lines and whomever is inside is in
# that neighborhood Foggy Bottom first

# Plot Foggy Bottom ones
subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) & (lon >= -77.052573) & 
    (lon <= -77.036651))
##           Phone                                  restaddresses
## 2570 2022962333             2157 P St NW, Washington, DC 20037
## 2598 2022931425             1915 I St NW, Washington, DC 20006
## 1246 2022932765             1924 I St NW, Washington, DC 20006
## 1944 2029730404             2001 L St NW, Washington, DC 20036
## 1353 2025305500             1915 I St NW, Washington, DC 20006
## 2007 2022937760       1725 Desales St NW, Washington, DC 20036
## 2130 2023317574          1132 19th St NW, Washington, DC 20036
## 1377 2025586188             2153 P St NW, Washington, DC 20037
## 2038 2029559494             1909 K St NW, Washington, DC 20006
## 2024 2027851110             1825 M St NW, Washington, DC 20036
## 1972 2028728700             2030 M St NW, Washington, DC 20036
## 2087 2027755660             1810 K St NW, Washington, DC 20006
## 2155 2022231818          1103 19th St NW, Washington, DC 20036
## 2163 2023313232           919 19th St NW, Washington, DC 20006
## 2192 2023315800             1900 I St NW, Washington, DC 20006
## 2255 2024571111            1815 M St  NW, Washington, DC 20036
## 4953 2022233600             2013 I St NW, Washington, DC 20006
## 2676 2022967700 2000 Pennsylvania Ave NW, Washington, DC 20006
## 1851 2022960125             2140 F St NW, Washington, DC 20037
## 1982 2022988488             1716 I St NW, Washington, DC 20006
## 2015 2026592696             1720 I St NW, Washington, DC 20050
## 2021 2024291701          1020 19th St NW, Washington, DC 20036
## 2123 2022932057 2100 Pennsylvania Ave NW, Washington, DC 20037
## 2128 2029744260           836 17th St NW, Washington, DC 20006
## 2758 2023272252          1120 19th St NW, Washington, DC 20036
##      restneighborhoods yipit.appearances pricepoint url.dummy    lon   lat
## 2570              <NA>                 1          1      TRUE -77.05 38.90
## 2598              <NA>                 3          1     FALSE -77.04 38.90
## 1246              <NA>                 1          2      TRUE -77.04 38.90
## 1944              <NA>                12          1      TRUE -77.05 38.90
## 1353              <NA>                 4          3      TRUE -77.04 38.90
## 2007              <NA>                14          3      TRUE -77.04 38.90
## 2130              <NA>                 1          2      TRUE -77.04 38.90
## 1377              <NA>                 4          2      TRUE -77.05 38.90
## 2038              <NA>                 1          3      TRUE -77.04 38.90
## 2024              <NA>                 2          2      TRUE -77.04 38.91
## 1972              <NA>                 6          3      TRUE -77.05 38.91
## 2087              <NA>                21          2      TRUE -77.04 38.90
## 2155              <NA>                 5          2      TRUE -77.04 38.90
## 2163              <NA>                 4          3      TRUE -77.04 38.90
## 2192              <NA>                14          2      TRUE -77.04 38.90
## 2255              <NA>                 1          1      TRUE -77.04 38.91
## 4953              <NA>                10          1     FALSE -77.05 38.90
## 2676      Foggy Bottom                 2          3      TRUE -77.05 38.90
## 1851      Foggy Bottom                 3          1      TRUE -77.05 38.90
## 1982              <NA>                 2          1      TRUE -77.04 38.90
## 2015              <NA>                 9          2      TRUE -77.04 38.90
## 2021              <NA>                 5          2     FALSE -77.04 38.90
## 2123              <NA>                 3          1      TRUE -77.05 38.90
## 2128              <NA>                 1          2      TRUE -77.04 38.90
## 2758              <NA>                 2          1      TRUE -77.04 38.90

q + geom_point(data = subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) & 
    (lon >= -77.052573) & (lon <= -77.036651)), aes(x = lon, y = lat, color = factor(restneighborhoods), 
    size = yipit.appearances), alpha = 0.7) + scale_size(range = c(3, 15))

plot of chunk unnamed-chunk-28


restlist.periods$restneighborhoods[(restlist.periods$lat <= 38.906) & (restlist.periods$lat >= 
    38.892373) & (restlist.periods$lon >= -77.052573) & (restlist.periods$lon <= 
    -77.036651)] <- "Foggy Bottom"

# Penn Quarter now
subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) & (lon >= -77.036652) & 
    (lon <= -77.01455))
##           Phone                                  restaddresses
## 490  2026823123              465 K St NW, Washington, DC 20001
## 1280 2027377275           920 14th St Nw, Washington, DC 20005
## 1296 2027892800             1401 K St NW, Washington, DC 20005
## 1352 2023794366          1155 14th St NW, Washington, DC 20005
## 1386 2023932292      1004 Vermont Ave NW, Washington, DC 20005
## 4522 2022346870             1100 P St NW, Washington, DC 20005
## 3686 2027837007            444 7th St NW, Washington, DC 20004
## 3692 2025562050            410 7th St NW, Washington, DC 20004
## 3703 2026399727              650 F St NW, Washington, DC 20004
## 3713 2026380800            701 9th St NW, Washington, DC 20001
## 3719 2022893600            707 6th St NW, Washington, DC 20001
## 3724 2023932929            781 7th St NW, Washington, DC 20001
## 3729 2022169550        1100 New York Ave, Washington, DC 20005
## 3731 2026383434              901 F St NW, Washington, DC 20004
## 3742 2023932905               701 7th St, Washington, DC 20001
## 3746 2023935444           518 10th St NW, Washington, DC 20004
## 3751 2027370101           734 11th St NW, Washington, DC 20001
## 3756 2022891001              617 H St NW, Washington, DC 20001
## 3769 2028421405              507 H St NW, Washington, DC 20001
## 3781 2026294355              915 E St NW, Washington, DC 20004
## 3790 2025062888              608 H St NW, Washington, DC 20001
## 3814 2025891504     1201 New York Ave NW, Washington, DC 20005
## 3816 2024081288              611 H St NW, Washington, DC 20001
## 3822 2024089292            817 7th St NW, Washington, DC 20001
## 3835 2023712295 1100 Pennsylvania Ave NW, Washington, DC 20004
## 3840 2026280980           555 11th St NW, Washington, DC 20004
## 3859 2022897482 1100 Pennsylvania Ave NW, Washington, DC 20004
## 3985 2026399830     1099 New York Ave NW, Washington, DC 20001
## 3994 2026379770             1108 K St NW, Washington, DC 20005
## 4147 2026829333          800 K Street NW, Washington, DC 20001
##      restneighborhoods yipit.appearances pricepoint url.dummy    lon   lat
## 490               <NA>                 2          3      TRUE -77.02 38.90
## 1280              <NA>                 2          2      TRUE -77.03 38.90
## 1296              <NA>                 3          3      TRUE -77.03 38.90
## 1352              <NA>                17          3      TRUE -77.03 38.91
## 1386              <NA>                21          2     FALSE -77.03 38.90
## 4522              <NA>                 2          2      TRUE -77.03 38.90
## 3686      Penn Quarter                 1          3      TRUE -77.02 38.90
## 3692      Penn Quarter                 6          2      TRUE -77.02 38.90
## 3703      Penn Quarter                12          1      TRUE -77.02 38.90
## 3713              <NA>                 1          3      TRUE -77.02 38.90
## 3719              <NA>                 1          3      TRUE -77.02 38.90
## 3724         Chinatown                22          3      TRUE -77.02 38.90
## 3729              <NA>                 1          3      TRUE -77.03 38.90
## 3731              <NA>                 1          2      TRUE -77.02 38.90
## 3742              <NA>                 6          2      TRUE -77.02 38.90
## 3746              <NA>                 4          2      TRUE -77.03 38.90
## 3751              <NA>                 1          3      TRUE -77.03 38.90
## 3756         Chinatown                 1          2     FALSE -77.02 38.90
## 3769         Chinatown                 1          2      TRUE -77.02 38.90
## 3781              <NA>                 8          2      TRUE -77.02 38.90
## 3790         Chinatown                 1          2      TRUE -77.02 38.90
## 3814              <NA>                 1          3      TRUE -77.03 38.90
## 3816         Chinatown                 1          2      TRUE -77.02 38.90
## 3822         Chinatown                11          2      TRUE -77.02 38.90
## 3835  Federal Triangle                 2          1      TRUE -77.03 38.89
## 3840              <NA>                19          2      TRUE -77.03 38.90
## 3859  Federal Triangle                 1          1     FALSE -77.03 38.89
## 3985              <NA>                12          3      TRUE -77.03 38.90
## 3994              <NA>                 1          1      TRUE -77.03 38.90
## 4147              <NA>                 3          1     FALSE -77.02 38.90

q + geom_point(data = subset(restlist.periods, (lat <= 38.906) & (lat >= 38.892373) & 
    (lon >= -77.036652) & (lon <= -77.01455)), aes(x = lon, y = lat, color = factor(restneighborhoods), 
    size = yipit.appearances), alpha = 0.7) + scale_size(range = c(3, 15))

plot of chunk unnamed-chunk-28


restlist.periods$restneighborhoods[(restlist.periods$lat <= 38.906) & (restlist.periods$lat >= 
    38.892373) & (restlist.periods$lon >= -77.036652) & (restlist.periods$lon <= 
    -77.01455)] <- "Penn Quarter"
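
The listing above shows that the Penn Quarter box also contains restaurants already labeled Chinatown or Federal Triangle, and the assignment relabels them. The following is a minimal sketch, not the code used in this report, of a helper that avoids repeating the box coordinates and of a variant that fills only missing neighborhoods. The function `in_box` and the toy rows are hypothetical; only the box coordinates are taken from the code above.

```r
# Hypothetical helper: TRUE for rows whose coordinates fall inside a
# latitude/longitude rectangle; rows with missing coordinates give FALSE.
in_box <- function(df, lat_min, lat_max, lon_min, lon_max) {
    !is.na(df$lat) & !is.na(df$lon) &
        df$lat >= lat_min & df$lat <= lat_max &
        df$lon >= lon_min & df$lon <= lon_max
}

# Toy rows standing in for restlist.periods (made-up data)
toy <- data.frame(
    lat = c(38.900, 38.900, 38.950, NA),
    lon = c(-77.045, -77.020, -77.045, -77.020),
    restneighborhoods = c(NA, "Chinatown", NA, NA),
    stringsAsFactors = FALSE
)

# Same box coordinates as in the code above
foggy <- in_box(toy, 38.892373, 38.906, -77.052573, -77.036651)
penn  <- in_box(toy, 38.892373, 38.906, -77.036652, -77.01455)

# Behavior of the code above: every row in a box is relabeled
labels_overwrite <- toy$restneighborhoods
labels_overwrite[foggy] <- "Foggy Bottom"
labels_overwrite[penn]  <- "Penn Quarter"

# Variant: only rows without a neighborhood are filled in
labels_fill <- toy$restneighborhoods
labels_fill[foggy & is.na(labels_fill)] <- "Foggy Bottom"
labels_fill[penn & is.na(labels_fill)]  <- "Penn Quarter"

print(data.frame(toy, labels_overwrite, labels_fill))
```

Which variant is appropriate depends on whether Chinatown and Federal Triangle should be folded into Penn Quarter for the competition measure; the report's code folds them in.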

### Add yipit-per-neighborhood count
grestlist.periods <- group_by(restlist.periods, restneighborhoods)
sumtemp <- summarize(grestlist.periods, count = n())
restlist.periods$yipitNeighborhoodCount <- 0
for (i in 1:nrow(restlist.periods)) {
    restlist.periods$yipitNeighborhoodCount[i] <- sumtemp$count[sumtemp$restneighborhoods == 
        restlist.periods$restneighborhoods[i]][1]
}

# 5.2 Add Yelp count of restaurants by neighborhood.

grestlist_all <- group_by(restlist_all_unique, restneighborhoods)
sumtemp <- summarize(grestlist_all, count = n())

restlist.periods$yelpNeighborhoodCount <- 0
for (i in 1:nrow(restlist.periods)) {
    restlist.periods$yelpNeighborhoodCount[i] <- sumtemp$count[sumtemp$restneighborhoods == 
        restlist.periods$restneighborhoods[i]][1]
}


# now that our restlist.periods looks good, we merge it with
# periodDealRating
periodDealRating <- (merge(x = periodDealRating, y = restlist.periods, by = "Phone", 
    all = TRUE))
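
The two neighborhood-count loops above look up one row at a time. As a hedged alternative, the same lookup can be sketched without a loop using base R's `table()` and `match()`. The helper `count_in` and the toy data frames `deals` and `all_yelp` are hypothetical stand-ins for `restlist.periods` and `restlist_all_unique`; rows with a missing neighborhood get `NA` in this sketch.

```r
# Toy stand-ins for restlist.periods and restlist_all_unique (made-up rows)
deals <- data.frame(
    Phone = 1:5,
    restneighborhoods = c("Foggy Bottom", "Shaw", "Foggy Bottom", NA, "Penn Quarter"),
    stringsAsFactors = FALSE
)
all_yelp <- data.frame(
    restneighborhoods = c("Foggy Bottom", "Foggy Bottom", "Foggy Bottom",
                          "Shaw", "Shaw", "Penn Quarter", NA),
    stringsAsFactors = FALSE
)

# Hypothetical helper: for each value in `x`, how often it occurs in `pool`
count_in <- function(x, pool) {
    counts <- table(pool)                        # NA values are not counted
    as.integer(counts)[match(x, names(counts))]  # no match (or NA) gives NA
}

# Number of deal restaurants per neighborhood
deals$yipitNeighborhoodCount <- count_in(deals$restneighborhoods,
                                         deals$restneighborhoods)
# Number of all Yelp restaurants per neighborhood
deals$yelpNeighborhoodCount <- count_in(deals$restneighborhoods,
                                        all_yelp$restneighborhoods)
print(deals)
```

Because rows without a neighborhood carry `NA` counts, they drop out of any model that uses these counts as predictors.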

Model1: meanRating ~ restaurant characteristics

# Model for ratings
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances + 
    (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | 
    Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4776 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.0987   0.314   
##  Residual             0.8602   0.927   
## Number of obs: 1717, groups: Phone, 170
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                   0.911385   0.313670    2.91
## yipit.appearances            -0.016805   0.007193   -2.34
## numDeals                     -0.898868   0.561460   -1.60
## factor(pricepoint)2          -0.305429   0.099048   -3.08
## factor(pricepoint)3          -0.238002   0.127767   -1.86
## factor(pricepoint)4           0.041826   0.320895    0.13
## bRating                       0.778135   0.083943    9.27
## bNumReviews                   0.000676   0.000259    2.61
## numDeals:factor(pricepoint)2  0.409118   0.180225    2.27
## numDeals:factor(pricepoint)3  0.726391   0.215447    3.37
## numDeals:factor(pricepoint)4  0.211875   0.374038    0.57
## numDeals:bRating              0.110708   0.153149    0.72
## 
## Correlation of Fixed Effects:
##             (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw nD:()2
## yipt.pprncs -0.182                                                        
## numDeals    -0.259 -0.016                                                 
## fctr(prcp)2 -0.452 -0.058  0.133                                          
## fctr(prcp)3 -0.328 -0.094  0.088  0.655                                   
## fctr(prcp)4 -0.160 -0.238  0.059  0.270  0.219                            
## bRating     -0.960  0.115  0.245  0.251  0.183  0.113                     
## bNumReviews  0.132 -0.061  0.014 -0.199 -0.312 -0.006 -0.184              
## nmDls:fc()2  0.137 -0.011 -0.442 -0.277 -0.174 -0.074 -0.069 -0.010       
## nmDls:fc()3  0.099 -0.016 -0.253 -0.191 -0.245 -0.058 -0.042  0.003  0.675
## nmDls:fc()4  0.093 -0.007 -0.326 -0.119 -0.088 -0.218 -0.062 -0.003  0.425
## nmDls:bRtng  0.239  0.017 -0.958 -0.064 -0.034 -0.037 -0.247 -0.016  0.195
##             nD:()3 nD:()4
## yipt.pprncs              
## numDeals                 
## fctr(prcp)2              
## fctr(prcp)3              
## fctr(prcp)4              
## bRating                  
## bNumReviews              
## nmDls:fc()2              
## nmDls:fc()3              
## nmDls:fc()4  0.330       
## nmDls:bRtng  0.040  0.212

# Model for number of reviews
summary(lmer(data = subset(periodDealRating, numDeals <= 2), numReviews ~ +(numDeals) + 
    yipit.appearances + bRating + bNumReviews + (1 | Phone)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: numReviews ~ +(numDeals) + yipit.appearances + bRating + bNumReviews +      (1 | Phone) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 9877 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 1.87     1.37    
##  Residual             2.21     1.49    
## Number of obs: 2589, groups: Phone, 173
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)       -0.189476   0.811270   -0.23
## numDeals           0.174294   0.079385    2.20
## yipit.appearances -0.008755   0.021597   -0.41
## bRating            0.224525   0.236861    0.95
## bNumReviews        0.011112   0.000841   13.22
## 
## Correlation of Fixed Effects:
##             (Intr) numDls ypt.pp bRatng
## numDeals    -0.023                     
## yipt.pprncs -0.236 -0.016              
## bRating     -0.977  0.012  0.121       
## bNumReviews  0.032 -0.007 -0.063 -0.141

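This is a good baseline model. Note the interaction between numDeals and the price point of the restaurant in the ratings model (t = 2.27 and 3.37 for price points 2 and 3). The interaction between numDeals and baseline rating is much weaker (t = 0.72).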
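
The lmer summaries report t values but no p-values. As a rough gauge only, the t values can be converted to two-sided p-values under a standard normal approximation. That approximation is an assumption made here for illustration, not part of the fitted model; the t values are copied from the ratings model output above.

```r
# t values copied from the fixed-effects table of the ratings model above
t_vals <- c(
    "numDeals:factor(pricepoint)2" = 2.27,
    "numDeals:factor(pricepoint)3" = 3.37,
    "numDeals:factor(pricepoint)4" = 0.57,
    "numDeals:bRating"             = 0.72
)

# Two-sided p-values, treating each t value as standard normal (assumption)
p_approx <- 2 * pnorm(-abs(t_vals))
print(round(p_approx, 4))
```

The same two lines apply to any of the fixed-effects tables in this report; a likelihood-ratio comparison of nested models would be a more careful check.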

Model2: Add competition as a factor(neighborhood)


# Let's add neighborhood variables
summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances + 
    (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | 
    Phone) + factor(restneighborhoods)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) + factor(restneighborhoods) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4261 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.113    0.336   
##  Residual             0.852    0.923   
## Number of obs: 1533, groups: Phone, 151
## 
## Fixed effects:
##                                                                           Estimate
## (Intercept)                                                               0.994810
## yipit.appearances                                                        -0.019866
## numDeals                                                                 -0.743955
## factor(pricepoint)2                                                      -0.336186
## factor(pricepoint)3                                                      -0.223962
## factor(pricepoint)4                                                       0.131833
## bRating                                                                   0.740444
## bNumReviews                                                               0.000799
## factor(restneighborhoods)Capitol Hill/Northeast                           0.082580
## factor(restneighborhoods)Capitol Hill/Southeast                           0.286942
## factor(restneighborhoods)Chevy Chase                                      0.143144
## factor(restneighborhoods)Cleveland Park                                   0.260625
## factor(restneighborhoods)Columbia Heights                                 0.061772
## factor(restneighborhoods)Dupont Circle                                   -0.058277
## factor(restneighborhoods)Foggy Bottom                                    -0.006993
## factor(restneighborhoods)Georgetown                                       0.040817
## factor(restneighborhoods)Glover Park                                     -0.240572
## factor(restneighborhoods)H Street Corridor/Atlas District/Near Northeast  0.120371
## factor(restneighborhoods)Lincoln Park                                     0.294341
## factor(restneighborhoods)Logan Circle                                     0.127744
## factor(restneighborhoods)Mount Pleasant                                  -0.063818
## factor(restneighborhoods)Park View                                        0.504486
## factor(restneighborhoods)Penn Quarter                                    -0.008578
## factor(restneighborhoods)Shaw                                             0.017048
## factor(restneighborhoods)Tenleytown                                       0.324642
## factor(restneighborhoods)U Street Corridor                                0.135683
## factor(restneighborhoods)Van Ness/Forest Hills                            0.248572
## factor(restneighborhoods)Woodley Park                                     0.098378
## numDeals:factor(pricepoint)2                                              0.334358
## numDeals:factor(pricepoint)3                                              0.689189
## numDeals:factor(pricepoint)4                                              0.147056
## numDeals:bRating                                                          0.081658
##                                                                          Std. Error
## (Intercept)                                                                0.395146
## yipit.appearances                                                          0.008152
## numDeals                                                                   0.575307
## factor(pricepoint)2                                                        0.122963
## factor(pricepoint)3                                                        0.148352
## factor(pricepoint)4                                                        0.393186
## bRating                                                                    0.101535
## bNumReviews                                                                0.000294
## factor(restneighborhoods)Capitol Hill/Northeast                            0.349102
## factor(restneighborhoods)Capitol Hill/Southeast                            0.208556
## factor(restneighborhoods)Chevy Chase                                       0.230459
## factor(restneighborhoods)Cleveland Park                                    0.446505
## factor(restneighborhoods)Columbia Heights                                  0.278673
## factor(restneighborhoods)Dupont Circle                                     0.162642
## factor(restneighborhoods)Foggy Bottom                                      0.154332
## factor(restneighborhoods)Georgetown                                        0.183997
## factor(restneighborhoods)Glover Park                                       0.532294
## factor(restneighborhoods)H Street Corridor/Atlas District/Near Northeast   0.216449
## factor(restneighborhoods)Lincoln Park                                      0.598233
## factor(restneighborhoods)Logan Circle                                      0.319339
## factor(restneighborhoods)Mount Pleasant                                    0.471066
## factor(restneighborhoods)Park View                                         0.395421
## factor(restneighborhoods)Penn Quarter                                      0.150577
## factor(restneighborhoods)Shaw                                              0.378374
## factor(restneighborhoods)Tenleytown                                        0.248162
## factor(restneighborhoods)U Street Corridor                                 0.193107
## factor(restneighborhoods)Van Ness/Forest Hills                             0.362898
## factor(restneighborhoods)Woodley Park                                      0.238282
## numDeals:factor(pricepoint)2                                               0.205128
## numDeals:factor(pricepoint)3                                               0.235840
## numDeals:factor(pricepoint)4                                               0.383884
## numDeals:bRating                                                           0.156131
##                                                                          t value
## (Intercept)                                                                 2.52
## yipit.appearances                                                          -2.44
## numDeals                                                                   -1.29
## factor(pricepoint)2                                                        -2.73
## factor(pricepoint)3                                                        -1.51
## factor(pricepoint)4                                                         0.34
## bRating                                                                     7.29
## bNumReviews                                                                 2.72
## factor(restneighborhoods)Capitol Hill/Northeast                             0.24
## factor(restneighborhoods)Capitol Hill/Southeast                             1.38
## factor(restneighborhoods)Chevy Chase                                        0.62
## factor(restneighborhoods)Cleveland Park                                     0.58
## factor(restneighborhoods)Columbia Heights                                   0.22
## factor(restneighborhoods)Dupont Circle                                     -0.36
## factor(restneighborhoods)Foggy Bottom                                      -0.05
## factor(restneighborhoods)Georgetown                                         0.22
## factor(restneighborhoods)Glover Park                                       -0.45
## factor(restneighborhoods)H Street Corridor/Atlas District/Near Northeast    0.56
## factor(restneighborhoods)Lincoln Park                                       0.49
## factor(restneighborhoods)Logan Circle                                       0.40
## factor(restneighborhoods)Mount Pleasant                                    -0.14
## factor(restneighborhoods)Park View                                          1.28
## factor(restneighborhoods)Penn Quarter                                      -0.06
## factor(restneighborhoods)Shaw                                               0.05
## factor(restneighborhoods)Tenleytown                                         1.31
## factor(restneighborhoods)U Street Corridor                                  0.70
## factor(restneighborhoods)Van Ness/Forest Hills                              0.68
## factor(restneighborhoods)Woodley Park                                       0.41
## numDeals:factor(pricepoint)2                                                1.63
## numDeals:factor(pricepoint)3                                                2.92
## numDeals:factor(pricepoint)4                                                0.38
## numDeals:bRating                                                            0.52
## 
## Correlation matrix not shown by default, as p = 32 > 20.
## Use print(x, correlation=TRUE)  or
##     vcov(x)   if you need it

This does not look too helpful: none of the neighborhood terms is significant (all |t| < 1.4). What if we limit the data to the big neighborhoods?

# big neighborhoods
summary(lmer(data = subset(periodDealRating, numDeals <= 2 & restneighborhoods %in% 
    c("Adams Morgan", "Dupont Circle", "Foggy Bottom", "Georgetown", "Penn Quarter")), 
    meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) + numDeals * 
        bRating + bNumReviews + (1 | Phone) + factor(restneighborhoods)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) + factor(restneighborhoods) 
##    Data: subset(periodDealRating, numDeals <= 2 & restneighborhoods %in%      c("Adams Morgan", "Dupont Circle", "Foggy Bottom", "Georgetown",          "Penn Quarter")) 
## 
## REML criterion at convergence: 2744 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.100    0.316   
##  Residual             0.854    0.924   
## Number of obs: 984, groups: Phone, 95
## 
## Fixed effects:
##                                         Estimate Std. Error t value
## (Intercept)                             1.010432   0.459823    2.20
## yipit.appearances                      -0.007702   0.009781   -0.79
## numDeals                               -0.317202   0.745289   -0.43
## factor(pricepoint)2                    -0.402797   0.139644   -2.88
## factor(pricepoint)3                    -0.230625   0.160660   -1.44
## factor(pricepoint)4                    -0.619083   0.517688   -1.20
## bRating                                 0.745109   0.119065    6.26
## bNumReviews                             0.000854   0.000337    2.54
## factor(restneighborhoods)Dupont Circle -0.040756   0.159402   -0.26
## factor(restneighborhoods)Foggy Bottom  -0.059413   0.151317   -0.39
## factor(restneighborhoods)Georgetown     0.017384   0.178911    0.10
## factor(restneighborhoods)Penn Quarter  -0.063607   0.147740   -0.43
## numDeals:factor(pricepoint)2            0.314164   0.260438    1.21
## numDeals:factor(pricepoint)3            0.785105   0.275405    2.85
## numDeals:factor(pricepoint)4            0.399445   0.544052    0.73
## numDeals:bRating                       -0.088142   0.200154   -0.44
## 
## Correlation of Fixed Effects:
##             (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw fc()DC
## yipt.pprncs -0.102                                                        
## numDeals    -0.248 -0.022                                                 
## fctr(prcp)2 -0.451 -0.026  0.128                                          
## fctr(prcp)3 -0.232 -0.058  0.073  0.658                                   
## fctr(prcp)4 -0.209 -0.443  0.083  0.299  0.212                            
## bRating     -0.939  0.052  0.250  0.258  0.100  0.189                     
## bNumReviews  0.149 -0.003  0.003 -0.190 -0.327 -0.054 -0.200              
## fctr(rst)DC -0.323 -0.019  0.021 -0.071 -0.050 -0.117  0.189 -0.050       
## fctr(rst)FB -0.311 -0.108  0.012  0.151 -0.036  0.102  0.107  0.047  0.538
## fctr(rstn)G -0.166 -0.056 -0.080 -0.001 -0.040  0.025  0.019 -0.026  0.458
## fctr(rst)PQ -0.275 -0.140  0.002  0.066 -0.083  0.088  0.107 -0.175  0.572
## nmDls:fc()2  0.125  0.006 -0.534 -0.256 -0.171 -0.070 -0.065 -0.004 -0.021
## nmDls:fc()3  0.086 -0.007 -0.280 -0.199 -0.236 -0.052 -0.023  0.003 -0.024
## nmDls:fc()4  0.128  0.013 -0.501 -0.122 -0.087 -0.217 -0.107 -0.001 -0.016
## nmDls:bRtng  0.232  0.019 -0.951 -0.058 -0.014 -0.065 -0.258 -0.004 -0.013
##             fc()FB fct()G fc()PQ nD:()2 nD:()3 nD:()4
## yipt.pprncs                                          
## numDeals                                             
## fctr(prcp)2                                          
## fctr(prcp)3                                          
## fctr(prcp)4                                          
## bRating                                              
## bNumReviews                                          
## fctr(rst)DC                                          
## fctr(rst)FB                                          
## fctr(rstn)G  0.492                                   
## fctr(rst)PQ  0.632  0.514                            
## nmDls:fc()2 -0.029  0.048 -0.023                     
## nmDls:fc()3 -0.038  0.032 -0.030  0.740              
## nmDls:fc()4 -0.018  0.044 -0.011  0.479  0.361       
## nmDls:bRtng  0.002  0.072  0.009  0.277  0.024  0.390

No. None of the neighborhood terms is significant in the big neighborhoods either.

Model2: Add competition as the number of Yipit restaurants in neighborhood.

summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances + 
    (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | 
    Phone) + (yipitNeighborhoodCount)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) + (yipitNeighborhoodCount) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4257 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.0921   0.304   
##  Residual             0.8524   0.923   
## Number of obs: 1533, groups: Phone, 151
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                   1.047509   0.332924    3.15
## yipit.appearances            -0.016961   0.007218   -2.35
## numDeals                     -0.750246   0.567913   -1.32
## factor(pricepoint)2          -0.323581   0.110118   -2.94
## factor(pricepoint)3          -0.201480   0.135702   -1.48
## factor(pricepoint)4           0.001166   0.319539    0.00
## bRating                       0.771336   0.085550    9.02
## bNumReviews                   0.000741   0.000260    2.85
## yipitNeighborhoodCount       -0.007235   0.003939   -1.84
## numDeals:factor(pricepoint)2  0.329384   0.203417    1.62
## numDeals:factor(pricepoint)3  0.691606   0.234628    2.95
## numDeals:factor(pricepoint)4  0.147422   0.382831    0.39
## numDeals:bRating              0.083008   0.154441    0.54
## 
## Correlation of Fixed Effects:
##             (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw yptNgC
## yipt.pprncs -0.157                                                        
## numDeals    -0.259 -0.021                                                 
## fctr(prcp)2 -0.482 -0.035  0.141                                          
## fctr(prcp)3 -0.289 -0.050  0.092  0.659                                   
## fctr(prcp)4 -0.201 -0.235  0.068  0.310  0.228                            
## bRating     -0.932  0.093  0.250  0.234  0.137  0.123                     
## bNumReviews  0.174 -0.045  0.009 -0.203 -0.272 -0.022 -0.207              
## yptNghbrhdC -0.290 -0.032  0.008  0.172 -0.081  0.114  0.093 -0.150       
## nmDls:fc()2  0.143 -0.009 -0.435 -0.287 -0.186 -0.089 -0.063 -0.008 -0.035
## nmDls:fc()3  0.109 -0.018 -0.267 -0.213 -0.254 -0.073 -0.037  0.008 -0.039
## nmDls:fc()4  0.103 -0.005 -0.341 -0.139 -0.103 -0.225 -0.062  0.000 -0.023
## nmDls:bRtng  0.231  0.023 -0.945 -0.056 -0.026 -0.038 -0.253 -0.013  0.008
##             nD:()2 nD:()3 nD:()4
## yipt.pprncs                     
## numDeals                        
## fctr(prcp)2                     
## fctr(prcp)3                     
## fctr(prcp)4                     
## bRating                         
## bNumReviews                     
## yptNghbrhdC                     
## nmDls:fc()2                     
## nmDls:fc()3  0.720              
## nmDls:fc()4  0.469  0.384       
## nmDls:bRtng  0.147  0.010  0.194

summary(lmer(data = subset(periodDealRating, numDeals <= 2), meanRating ~ yipit.appearances + 
    (numDeals) * factor(pricepoint) + numDeals * bRating + bNumReviews + (1 | 
    Phone) + numDeals * (yipitNeighborhoodCount)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) + numDeals *      (yipitNeighborhoodCount) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4259 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.0962   0.310   
##  Residual             0.8477   0.921   
## Number of obs: 1533, groups: Phone, 151
## 
## Fixed effects:
##                                  Estimate Std. Error t value
## (Intercept)                      0.974261   0.336954    2.89
## yipit.appearances               -0.016752   0.007289   -2.30
## numDeals                        -0.303012   0.595549   -0.51
## factor(pricepoint)2             -0.313174   0.111135   -2.82
## factor(pricepoint)3             -0.213396   0.137043   -1.56
## factor(pricepoint)4              0.025281   0.323016    0.08
## bRating                          0.778514   0.086329    9.02
## bNumReviews                      0.000732   0.000263    2.78
## yipitNeighborhoodCount          -0.004729   0.004114   -1.15
## numDeals:factor(pricepoint)2     0.319371   0.203056    1.57
## numDeals:factor(pricepoint)3     0.863292   0.244463    3.53
## numDeals:factor(pricepoint)4     0.047913   0.384060    0.12
## numDeals:bRating                 0.033816   0.155418    0.22
## numDeals:yipitNeighborhoodCount -0.018130   0.007446   -2.43
## 
## Correlation of Fixed Effects:
##             (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw yptNgC
## yipt.pprncs -0.157                                                        
## numDeals    -0.270 -0.016                                                 
## fctr(prcp)2 -0.483 -0.035  0.144                                          
## fctr(prcp)3 -0.284 -0.050  0.075  0.656                                   
## fctr(prcp)4 -0.202 -0.235  0.073  0.310  0.226                            
## bRating     -0.931  0.093  0.245  0.234  0.136  0.124                     
## bNumReviews  0.174 -0.045  0.003 -0.204 -0.272 -0.023 -0.207              
## yptNghbrhdC -0.301 -0.028  0.085  0.176 -0.087  0.117  0.097 -0.149       
## nmDls:fc()2  0.143 -0.009 -0.421 -0.285 -0.183 -0.089 -0.063 -0.008 -0.039
## nmDls:fc()3  0.078 -0.014 -0.155 -0.191 -0.251 -0.061 -0.026  0.002  0.037
## nmDls:fc()4  0.111 -0.006 -0.356 -0.141 -0.097 -0.224 -0.064  0.002 -0.049
## nmDls:bRtng  0.237  0.021 -0.932 -0.060 -0.021 -0.041 -0.252 -0.010 -0.025
## nmDls:yptNC  0.086 -0.012 -0.307 -0.038  0.037 -0.029 -0.030  0.017 -0.254
##             nD:()2 nD:()3 nD:()4 nmDl:R
## yipt.pprncs                            
## numDeals                               
## fctr(prcp)2                            
## fctr(prcp)3                            
## fctr(prcp)4                            
## bRating                                
## bNumReviews                            
## yptNghbrhdC                            
## nmDls:fc()2                            
## nmDls:fc()3  0.683                     
## nmDls:fc()4  0.468  0.335              
## nmDls:bRtng  0.148 -0.027  0.205       
## nmDls:yptNC  0.022 -0.288  0.107  0.128
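
As a reading aid for the fixed-effects table above: each t value is the estimate divided by its standard error, and a rough 95% Wald interval is the estimate plus or minus about 1.96 standard errors. The sketch below reproduces this for the numDeals:yipitNeighborhoodCount row, with the two numbers copied from the output. The normal approximation is an assumption (lmer reports neither degrees of freedom nor p-values), and the variable names are illustrative only.

```r
# Values copied from the fixed-effects table above
estimate  <- -0.018130   # numDeals:yipitNeighborhoodCount
std_error <-  0.007446

# t value as reported by summary(): estimate / standard error
t_value <- estimate / std_error

# Approximate 95% Wald interval (normal approximation; an assumption)
wald_ci <- estimate + c(-1, 1) * qnorm(0.975) * std_error

print(round(t_value, 2))
print(round(wald_ci, 4))
```

Under this approximation the interval lies entirely below zero, which agrees with the absolute t value exceeding 2.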

Model2: Add competition, measured as the number of Yelp restaurants in the neighborhood.

# yelpNeighborhoodCount enters as a main effect only (no interaction with numDeals).
summary(lmer(meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +
    numDeals * bRating + bNumReviews + (1 | Phone) + (yelpNeighborhoodCount),
    data = subset(periodDealRating, numDeals <= 2)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) + (yelpNeighborhoodCount) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4262 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.092    0.303   
##  Residual             0.854    0.924   
## Number of obs: 1533, groups: Phone, 151
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                   0.976830   0.329742    2.96
## yipit.appearances            -0.018067   0.007234   -2.50
## numDeals                     -0.725251   0.568410   -1.28
## factor(pricepoint)2          -0.280568   0.108669   -2.58
## factor(pricepoint)3          -0.230046   0.135414   -1.70
## factor(pricepoint)4           0.110757   0.319268    0.35
## bRating                       0.775029   0.085614    9.05
## bNumReviews                   0.000674   0.000257    2.62
## yelpNeighborhoodCount        -0.001042   0.000832   -1.25
## numDeals:factor(pricepoint)2  0.312644   0.203437    1.54
## numDeals:factor(pricepoint)3  0.671711   0.234608    2.86
## numDeals:factor(pricepoint)4  0.128673   0.382986    0.34
## numDeals:bRating              0.081387   0.154563    0.53
## 
## Correlation of Fixed Effects:
##             (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw ylpNgC
## yipt.pprncs -0.187                                                        
## numDeals    -0.253 -0.023                                                 
## fctr(prcp)2 -0.427 -0.035  0.143                                          
## fctr(prcp)3 -0.328 -0.049  0.091  0.680                                   
## fctr(prcp)4 -0.142 -0.240  0.069  0.301  0.232                            
## bRating     -0.939  0.103  0.247  0.214  0.150  0.102                     
## bNumReviews  0.137 -0.051  0.011 -0.181 -0.289 -0.004 -0.196              
## ylpNghbrhdC -0.257  0.076 -0.024 -0.060  0.050 -0.107  0.101 -0.014       
## nmDls:fc()2  0.131 -0.009 -0.435 -0.286 -0.189 -0.087 -0.059 -0.014  0.014
## nmDls:fc()3  0.096 -0.019 -0.267 -0.210 -0.257 -0.070 -0.032  0.002  0.010
## nmDls:fc()4  0.096 -0.005 -0.341 -0.138 -0.105 -0.224 -0.060 -0.003  0.005
## nmDls:bRtng  0.230  0.024 -0.946 -0.060 -0.024 -0.041 -0.251 -0.012  0.020
##             nD:()2 nD:()3 nD:()4
## yipt.pprncs                     
## numDeals                        
## fctr(prcp)2                     
## fctr(prcp)3                     
## fctr(prcp)4                     
## bRating                         
## bNumReviews                     
## ylpNghbrhdC                     
## nmDls:fc()2                     
## nmDls:fc()3  0.720              
## nmDls:fc()4  0.468  0.383       
## nmDls:bRtng  0.147  0.011  0.194

# As above, but numDeals also interacts with yelpNeighborhoodCount.
summary(lmer(meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +
    numDeals * bRating + bNumReviews + (1 | Phone) +
    numDeals * (yelpNeighborhoodCount),
    data = subset(periodDealRating, numDeals <= 2)))
## Linear mixed model fit by REML ['lmerMod']
## Formula: meanRating ~ yipit.appearances + (numDeals) * factor(pricepoint) +      numDeals * bRating + bNumReviews + (1 | Phone) + numDeals *      (yelpNeighborhoodCount) 
##    Data: subset(periodDealRating, numDeals <= 2) 
## 
## REML criterion at convergence: 4273 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Phone    (Intercept) 0.092    0.303   
##  Residual             0.854    0.924   
## Number of obs: 1533, groups: Phone, 151
## 
## Fixed effects:
##                                 Estimate Std. Error t value
## (Intercept)                     0.965461   0.330512    2.92
## yipit.appearances              -0.018105   0.007236   -2.50
## numDeals                       -0.623178   0.600674   -1.04
## factor(pricepoint)2            -0.281853   0.108716   -2.59
## factor(pricepoint)3            -0.228674   0.135464   -1.69
## factor(pricepoint)4             0.108250   0.319361    0.34
## bRating                         0.776448   0.085673    9.06
## bNumReviews                     0.000671   0.000257    2.61
## yelpNeighborhoodCount          -0.000935   0.000856   -1.09
## numDeals:factor(pricepoint)2    0.319299   0.203878    1.57
## numDeals:factor(pricepoint)3    0.660877   0.235564    2.81
## numDeals:factor(pricepoint)4    0.152065   0.385645    0.39
## numDeals:bRating                0.067150   0.156947    0.43
## numDeals:yelpNeighborhoodCount -0.000774   0.001470   -0.53
## 
## Correlation of Fixed Effects:
##             (Intr) ypt.pp numDls fct()2 fct()3 fct()4 bRatng bNmRvw ylpNgC
## yipt.pprncs -0.186                                                        
## numDeals    -0.260 -0.025                                                 
## fctr(prcp)2 -0.424 -0.034  0.128                                          
## fctr(prcp)3 -0.329 -0.049  0.093  0.679                                   
## fctr(prcp)4 -0.141 -0.239  0.061  0.301  0.232                            
## bRating     -0.939  0.103  0.244  0.213  0.150  0.101                     
## bNumReviews  0.137 -0.051  0.005 -0.180 -0.289 -0.003 -0.197              
## ylpNghbrhdC -0.265  0.071  0.054 -0.064  0.053 -0.108  0.105 -0.017       
## nmDls:fc()2  0.127 -0.009 -0.391 -0.287 -0.187 -0.088 -0.057 -0.015  0.028
## nmDls:fc()3  0.101 -0.018 -0.280 -0.207 -0.258 -0.068 -0.035  0.003 -0.011
## nmDls:fc()4  0.088 -0.007 -0.284 -0.139 -0.102 -0.224 -0.055 -0.005  0.033
## nmDls:bRtng  0.237  0.026 -0.937 -0.055 -0.027 -0.038 -0.253 -0.009 -0.021
## nmDls:ylpNC  0.065  0.010 -0.323  0.022 -0.019  0.015 -0.032  0.016 -0.236
##             nD:()2 nD:()3 nD:()4 nmDl:R
## yipt.pprncs                            
## numDeals                               
## fctr(prcp)2                            
## fctr(prcp)3                            
## fctr(prcp)4                            
## bRating                                
## bNumReviews                            
## ylpNghbrhdC                            
## nmDls:fc()2                            
## nmDls:fc()3  0.710                     
## nmDls:fc()4  0.471  0.369              
## nmDls:bRtng  0.134  0.025  0.170       
## nmDls:ylpNC -0.062  0.087 -0.115  0.172
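
The two Yelp-competition fits above differ only in the numDeals:yelpNeighborhoodCount term (t = -0.53). Their REML criteria (4262 and 4273) are not directly comparable, because the REML likelihood depends on the fixed-effects specification. One common way to compare such nested models is a likelihood ratio test on maximum-likelihood refits, which anova() in lme4 performs by default. The sketch below illustrates the idea on simulated data; the data frame sim, its coefficients, and the reduced formulas are hypothetical stand-ins, not the report's periodDealRating data or results.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for periodDealRating (hypothetical, not the report's data)
n_groups <- 40
n_per    <- 10
sim <- data.frame(
  Phone                 = factor(rep(seq_len(n_groups), each = n_per)),
  numDeals              = rbinom(n_groups * n_per, 1, 0.4),
  yelpNeighborhoodCount = rep(rpois(n_groups, 50), each = n_per)
)
group_effect <- rnorm(n_groups, sd = 0.3)
sim$meanRating <- 3.5 + 0.2 * sim$numDeals -
  0.001 * sim$yelpNeighborhoodCount +
  group_effect[as.integer(sim$Phone)] +
  rnorm(nrow(sim), sd = 0.9)

# Nested pair: main effects only vs. main effects plus interaction
m_main <- lmer(meanRating ~ numDeals + yelpNeighborhoodCount + (1 | Phone),
               data = sim)
m_int  <- lmer(meanRating ~ numDeals * yelpNeighborhoodCount + (1 | Phone),
               data = sim)

# anova() refits both models with ML (refit = TRUE is the default)
# before computing the likelihood ratio test
lrt <- anova(m_main, m_int)
print(lrt)
```

With the report's data, the same anova() call would take the two lmer fits from above, stored as objects rather than passed straight to summary().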