library(tidyverse)
In November 2005, Michelin published its first-ever guide to hotels and restaurants in New York City (Anonymous, 2005). According to the guide, inclusion is based on Michelin’s “meticulous and highly confidential evaluation process (in which) Michelin inspectors – American and European – conducted anonymous visits to New York City restaurants and hotels. … Inside the premier edition of the Michelin Guide New York City you’ll find a selection of restaurants by level of comfort; those with the best cuisine have been awarded our renowned Michelin stars. … From the best casual, neighborhood eateries to the city’s most impressive gourmet restaurants, the Michelin Guide New York City provides trusted advice for an unbeatable experience, every time.”
By contrast, the Zagat Survey 2006: New York City Restaurants (Gathje and Diuguid, 2005) is based purely on views submitted by customers through mail-in or online surveys.
We restrict our comparison of the two restaurant guides to the 164 French restaurants included in the Zagat Survey 2006: New York City Restaurants. We want to model p, the probability that a French restaurant is included in the 2006 Michelin Guide New York City, based on customer views from the Zagat survey. We begin by looking at the effect of x, the customer rating of food, on p. The MichelinFood table below (its first rows are printed with head(MichelinFood)) classifies the 164 French restaurants according to whether they were included in the Michelin Guide New York City, for each value of the food rating.
In what follows, I read the data, draw a scatter plot of the sample proportions, and finally overlay the fitted logistic curve.
Reading data:
MichelinFood <- read.table("./MichelinFood.txt", header=TRUE)
attach(MichelinFood)
head(MichelinFood)
##   Food InMichelin NotInMichelin mi proportion
## 1   15          0             1  1       0.00
## 2   16          0             1  1       0.00
## 3   17          0             8  8       0.00
## 4   18          2            13 15       0.13
## 5   19          5            13 18       0.28
## 6   20          8            25 33       0.24
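The column meanings are not documented in the output, so the following is a small sanity check under an assumption read off the printed rows: that mi is the number of restaurants at each food rating and proportion is the share of them included in the Michelin guide, rounded to two decimals. The data frame head_rows is a hypothetical name holding only the six rows shown above, not the full data set.

```r
# First six rows exactly as printed by head(MichelinFood);
# the remaining rows of the file are not reproduced here
head_rows <- data.frame(
  Food          = c(15, 16, 17, 18, 19, 20),
  InMichelin    = c(0, 0, 0, 2, 5, 8),
  NotInMichelin = c(1, 1, 8, 13, 13, 25),
  mi            = c(1, 1, 8, 15, 18, 33),
  proportion    = c(0.00, 0.00, 0.00, 0.13, 0.28, 0.24)
)

# Assumption: mi is the total number of restaurants at each rating
stopifnot(all(with(head_rows, InMichelin + NotInMichelin == mi)))

# Assumption: proportion is InMichelin / mi, rounded to two decimals
recomputed <- with(head_rows, round(InMichelin / mi, 2))
stopifnot(isTRUE(all.equal(recomputed, head_rows$proportion)))
```

Both checks pass on the displayed rows, which supports reading proportion as the sample proportion of inclusion at each rating.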
Scatter plot of the sample proportions against the food rating:
plot(Food, proportion, ylab = "Sample proportion", xlab = "Zagat Food Rating")
Let x denote the Zagat food rating for a given restaurant and p(x) the probability that this restaurant is included in the Michelin guide. Our logistic regression model for the response p(x), based on the predictor x, is

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}.$$

The model is fitted in R with glm() and a binomial family. The two-column response cbind(InMichelin, NotInMichelin) supplies the counts of included and excluded restaurants at each food rating. The call and its summary follow.
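To make the shape of this model concrete, here is a minimal sketch of the logistic function and its inverse, the logit (log-odds). The helper names logistic and logit and the grid eta are illustrative and not part of the original analysis; plogis() and qlogis() are the base R equivalents.

```r
# Logistic (inverse-logit) function, written out as in the model formula
logistic <- function(eta) 1 / (1 + exp(-eta))

# Logit: the log-odds, which inverts the logistic function
logit <- function(p) log(p / (1 - p))

eta <- seq(-4, 4, by = 0.5)   # illustrative values of intercept + slope * x
p   <- logistic(eta)

# The logit recovers the linear predictor, so the model is linear
# in x on the log-odds scale
stopifnot(isTRUE(all.equal(logit(p), eta)))

# Base R provides the same pair of functions as plogis() and qlogis()
stopifnot(isTRUE(all.equal(p, plogis(eta))))
stopifnot(isTRUE(all.equal(logit(p), qlogis(p))))
```

Because the logistic function maps any real number into the interval (0, 1), the fitted p(x) is always a valid probability, whatever the values of the intercept and slope.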
m1 <- glm(cbind(InMichelin, NotInMichelin) ~ Food, family = binomial)
summary(m1)
##
## Call:
## glm(formula = cbind(InMichelin, NotInMichelin) ~ Food, family = binomial)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4850  -0.7987  -0.1679   0.5913   1.5889
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.84154    1.86236  -5.821 5.84e-09 ***
## Food          0.50124    0.08768   5.717 1.08e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 61.427 on 13 degrees of freedom
## Residual deviance: 11.368 on 12 degrees of freedom
## AIC: 41.491
##
## Number of Fisher Scoring iterations: 4
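One way to read this output is to turn the estimates into odds and probabilities. The sketch below copies the coefficients and deviances from the summary as literals, so it runs without the data file. The variable names and the rating of 20 are illustrative choices. The deviance goodness-of-fit test relies on a chi-squared approximation that may be rough here, since some ratings have very few restaurants.

```r
# Coefficient estimates copied from the summary(m1) output
b0 <- -10.84154   # intercept
b1 <-   0.50124   # slope for Food

# Multiplicative change in the odds of inclusion per one-point rise in Food
odds_ratio <- exp(b1)

# Estimated probability of inclusion at an illustrative rating of 20
p_20 <- plogis(b0 + b1 * 20)

# Food rating at which the estimated probability equals 0.5
x_half <- -b0 / b1

# Deviances and degrees of freedom, also copied from the output
null_dev <- 61.427; null_df <- 13
res_dev  <- 11.368; res_df  <- 12

# Likelihood-ratio test of the Food term (drop in deviance on 1 df)
p_lrt <- pchisq(null_dev - res_dev, df = null_df - res_df, lower.tail = FALSE)

# Deviance goodness-of-fit test (approximate; see the caveat above)
p_gof <- pchisq(res_dev, df = res_df, lower.tail = FALSE)

round(c(odds_ratio = odds_ratio, p_20 = p_20, x_half = x_half), 3)
c(p_lrt = p_lrt, p_gof = p_gof)
```

Under these numbers, each additional point of food rating multiplies the estimated odds of inclusion by about 1.65, and the estimated probability crosses 0.5 at a rating of roughly 21.6. The residual deviance is close to its degrees of freedom, which gives no indication of lack of fit.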
Finally, the fitted logistic curve is drawn over the sample proportions:
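# Grid of food ratings spanning the plotted range
x <- seq(15, 28, 0.05)
# Fitted probabilities from the estimated intercept and slope
y <- 1 / (1 + exp(-(coef(m1)[1] + coef(m1)[2] * x)))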
plot(Food, proportion, ylab = "Probability of inclusion in the Michelin Guide", xlab = "Zagat Food Rating")
lines(x, y)