Load the data. I have embedded it in a hidden code chunk, so the compiled document carries the data with it.
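For anyone reproducing this, a minimal sketch of what such a hidden chunk might contain (assumed, not my actual chunk; the file name is hypothetical, since my source types the values in directly). In R Markdown, a chunk with echo=FALSE (or include=FALSE) runs without displaying its code:
# Hypothetical sketch of a hidden data-loading chunk; the compiled document
# shows only results that use Restaurants, not this code.
Restaurants <- read.csv("Restaurants.csv", stringsAsFactors = TRUE)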
> summary(Restaurants)
Location Food. Décor Service
City :50 Min. :16.00 Min. : 8.00 Min. :14.00
Suburban:50 1st Qu.:20.00 1st Qu.:16.00 1st Qu.:18.00
Median :21.00 Median :19.00 Median :20.00
Mean :21.51 Mean :18.84 Mean :20.15
3rd Qu.:23.25 3rd Qu.:21.00 3rd Qu.:22.00
Max. :26.00 Max. :27.00 Max. :26.00
Summated.Rating Coded.Location Cost
Min. :46.0 Min. :0.0 Min. :22.00
1st Qu.:55.0 1st Qu.:0.0 1st Qu.:38.75
Median :61.0 Median :0.5 Median :46.50
Mean :60.5 Mean :0.5 Mean :46.98
3rd Qu.:65.0 3rd Qu.:1.0 3rd Qu.:55.00
Max. :76.0 Max. :1.0 Max. :85.00
Almost all of the variables are quantitative, save Location, and even Location has a 0/1 counterpart in Coded.Location. In short, all of the data are quantitatively represented (a quick check of that coding appears after the list below).
- Location is split half and half between City and Suburban.
- Restaurant costs range from $22 to $85, with the mean ($46.98) and median ($46.50) both close to $47.
- The scales of the individual ratings are not documented in the data. They are integers, appear to run roughly from 0 to 30, and have means and medians near 20.
- The summated rating, which is just the sum of Food, Decor, and Service, accordingly has its mean and median around 60.
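Since Coded.Location should simply be Location recoded as 0/1, a quick cross-tabulation can confirm it (a sketch; the grouped summaries below show City = 0 and Suburban = 1):
# Sketch: verify that Coded.Location is Location recoded as 0/1.
with(Restaurants, table(Location, Coded.Location))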
Let me generate two scatterplot matrices to examine these relationships. First, I will look at a generic scatterplot matrix. It is worth noting this occasionally fails when there are character variables in the data; if needed, remove them from the data first. Here, that could be Location. The way to do that is the select command with negation, i.e. select everything except Location: select(-Location).
> library(car)
Loading required package: carData
> library(gplots)
Attaching package: 'gplots'
The following object is masked from 'package:stats':
lowess
> library(tidyverse)
-- Attaching packages ------------------- tidyverse 1.2.1 --
v ggplot2 3.1.0 v purrr 0.2.5
v tibble 1.4.2 v dplyr 0.7.8
v tidyr 0.8.2 v stringr 1.3.1
v readr 1.2.1 v forcats 0.3.0
-- Conflicts ---------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
x dplyr::recode() masks car::recode()
x purrr::some() masks car::some()
> Restaurants2 <- Restaurants %>% select(-Location)
> plot(Restaurants2)
Cost is on the y-axis throughout the last row of the scatterplot matrix. It is a bit lower in the suburbs, increases with Decor and Service, and increases somewhat less with Food. Cost also rises with the summated rating.
Next let me visualize the correlations among the variables. Note, this requires that I eliminate character variables.
> library(corrplot)
corrplot 0.84 loaded
> corrplot(cor(Restaurants2), method="circle")
Let’s first look at costs overall and then broken down between cities and suburbs.
> library(skimr)
> Restaurants %>% group_by(Location) %>% skim()
Skim summary statistics
n obs: 100
n variables: 7
group variables: Location
-- Variable type:integer -----------------------------------
Location variable missing complete n mean sd p0 p25 p50 p75 p100
City Coded.Location 0 50 50 0 0 0 0 0 0 0
City Cost 0 50 50 47.68 12.31 24 39.25 46.5 55 85
City Décor 0 50 50 18.84 3.47 11 16 19 21 26
City Food. 0 50 50 21.36 2.35 16 20 21 23 26
City Service 0 50 50 19.9 2.28 14 18 20 21 26
City Summated.Rating 0 50 50 60.1 6.81 46 54.25 61.5 65 75
Suburban Coded.Location 0 50 50 1 0 1 1 1 1 1
Suburban Cost 0 50 50 46.28 11.5 22 38.25 46.5 53 71
Suburban Décor 0 50 50 18.84 4.18 8 16 18 21.75 27
Suburban Food. 0 50 50 21.66 2.58 16 20 22 24 26
Suburban Service 0 50 50 20.4 2.15 16 19 20 22 24
Suburban Summated.Rating 0 50 50 60.9 7.21 47 55 60.5 66.75 76
[skim's inline histogram column does not render in this output and is omitted.]
Let’s describe average costs and ask whether or not they differ by location. Restaurants are not paired; these are independent samples of city and suburban restaurants.
> with(Restaurants, (t.test(Cost,alternative='two.sided', mu=0.0, conf.level=.95)))
One Sample t-test
data: Cost
t = 39.58, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
44.6248 49.3352
sample estimates:
mean of x
46.98
> t.test(Cost~Location, alternative='two.sided', conf.level=.95, var.equal=FALSE, data=Restaurants)
Welch Two Sample t-test
data: Cost by Location
t = 0.58779, df = 97.547, p-value = 0.558
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.326909 6.126909
sample estimates:
mean in group City mean in group Suburban
47.68 46.28
Overall, the mean cost is $46.98 with a 95% confidence interval ranging from $44.62 to $49.34. With no other information taken into account, average cost shows little evidence of differing systematically between cities and suburbs: with 95% confidence, the city mean could be anywhere from $3.33 below to $6.13 above the suburban mean. We can use a plot of means with 95% confidence intervals to display this.
> plotmeans(Cost~Location, data=Restaurants, ci.label = T, ylim=c(30,60))
This section takes up three questions:
1. Are summated ratings related to Cost?
2. Net of ratings, are cities and suburbs different?
3. Is the relationship between the summated rating and Cost the same in cities and suburbs, or different?
Let me start with the scatterplot.
> with(Restaurants, plot(x=Summated.Rating, y=Cost))
> abline(lm(Cost~Summated.Rating, data=Restaurants))
And the regression to estimate the least-squares line.
> SRatings <- lm(Cost~Summated.Rating, data=Restaurants)
> summary(SRatings)
Call:
lm(formula = Cost ~ Summated.Rating, data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-16.794 -5.232 -0.148 4.801 25.710
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -31.1862 6.7756 -4.603 1.25e-05 ***
Summated.Rating 1.2920 0.1113 11.612 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.74 on 98 degrees of freedom
Multiple R-squared: 0.5791, Adjusted R-squared: 0.5748
F-statistic: 134.8 on 1 and 98 DF, p-value: < 2.2e-16
The estimated equation is:
\[ \text{Cost} = -31.1862 + 1.2920 \times \text{Summated.Rating} + \varepsilon \]
Taken literally, a restaurant with no rating would serve meals with cash attached; the negative intercept is an extrapolation well below the observed ratings. For each rating point, cost increases by about $1.30. Overall, the summated rating explains about 58% of the variation in costs. Half of the predictions fall within roughly $5 of the actual value (the first and third residual quartiles are -$5.23 and $4.80), with a residual standard error of $7.74, driven in part by residuals as extreme as +$25.71 and -$16.79. Let's now look at the residuals.
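As a quick check on reading the equation, here is a sketch of the fitted cost at a summated rating of 60 (a hypothetical value near the mean), first by hand and then with predict():
# Sketch: fitted cost at a summated rating of 60, from the coefficients by
# hand and via predict(); both should agree at about $46.33.
coef(SRatings)
-31.1862 + 1.2920 * 60
predict(SRatings, newdata = data.frame(Summated.Rating = 60))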
> Restaurants<- within(Restaurants, {
+ residuals.SRatings <- residuals(SRatings)
+ })
> qqPlot(Restaurants$residuals.SRatings)
[1] 47 36
> shapiro.test(Restaurants$residuals.SRatings)
Shapiro-Wilk normality test
data: Restaurants$residuals.SRatings
W = 0.97124, p-value = 0.02753
The residuals are not normal. There are a few too many costs that are much higher than predicted [top right]. These points will require explanation. Let’s first see if that is simply a result of city/suburb differences.
> SRandLoc <- lm(Cost~Coded.Location+Summated.Rating, data=Restaurants)
> summary(SRandLoc)
Call:
lm(formula = Cost ~ Coded.Location + Summated.Rating, data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-18.0603 -5.3309 0.2408 4.8460 24.5649
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -30.5759 6.7347 -4.540 1.62e-05 ***
Coded.Location -2.4417 1.5386 -1.587 0.116
Summated.Rating 1.3021 0.1106 11.774 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.68 on 97 degrees of freedom
Multiple R-squared: 0.5898, Adjusted R-squared: 0.5813
F-statistic: 69.73 on 2 and 97 DF, p-value: < 2.2e-16
The estimated equation is:
\[ \text{Cost} = -30.58 + 1.302 \times \text{Summated.Rating} - 2.44 \times \text{Suburban} + \varepsilon \]
where Suburban is Coded.Location (1 for suburban restaurants, 0 for city restaurants).
> Restaurants<- within(Restaurants, {
+ residuals.SRandLoc <- residuals(SRandLoc)
+ })
> qqPlot(Restaurants$residuals.SRandLoc)
[1] 47 36
> shapiro.test(Restaurants$residuals.SRandLoc)
Shapiro-Wilk normality test
data: Restaurants$residuals.SRandLoc
W = 0.97983, p-value = 0.1289
That seems to have fixed the problem: with location in the model, the Shapiro-Wilk test no longer rejects normality (p = 0.13). Now let me turn to a few residual diagnostics.
The diagnostic plots below check out. So what does the relationship look like?
> # The following code plots four panels in a graphic and returns the graphics settings to a single panel.
> oldpar <- par(oma=c(0,0,3,0), mfrow=c(2,2))
> plot(SRandLoc)
> par(oldpar)
> library(effects)
> plot(allEffects(SRandLoc, partial.residuals=TRUE), span=1)
In the effect plots, the line is the fitted value and the shaded region is the 95% confidence interval for the average cost at a given summated rating.
> Confint(SRandLoc, level=0.95)
Estimate 2.5 % 97.5 %
(Intercept) -30.575857 -43.94241 -17.2093020
Coded.Location -2.441675 -5.49540 0.6120497
Summated.Rating 1.302094 1.08260 1.5215879
Each summated rating point is worth between $1.08 and $1.52 in cost. A focus on the summated rating assumes that the slopes on Decor, Food, and Service are all the same. Is this true?
> ThreeRatings <- lm(Cost~Décor+Food.+Service, data=Restaurants)
> summary(ThreeRatings)
Call:
lm(formula = Cost ~ Décor + Food. + Service, data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-13.945 -5.635 -0.910 5.082 22.812
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.70814 7.06629 -2.931 0.004228 **
Décor 1.81781 0.22536 8.066 2.06e-12 ***
Food. -0.09981 0.39941 -0.250 0.803200
Service 1.76613 0.49427 3.573 0.000554 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.173 on 96 degrees of freedom
Multiple R-squared: 0.6459, Adjusted R-squared: 0.6348
F-statistic: 58.37 on 3 and 96 DF, p-value: < 2.2e-16
It seems not: Decor and Service each add roughly $1.80 per point while Food's slope is near zero. Let's compare the two regressions formally.
> anova(SRatings, ThreeRatings)
Analysis of Variance Table
Model 1: Cost ~ Summated.Rating
Model 2: Cost ~ Décor + Food. + Service
Res.Df RSS Df Sum of Sq F Pr(>F)
1 98 5870.4
2 96 4939.2 2 931.15 9.0491 0.000251 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F statistic tests the claim that the two models explain the same amount of variance; the alternative is that the separate rating slopes fit better. The very small p-value (the probability of a gap this large if the single-slope model were adequate) suggests that the ratings have different impacts on cost.
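As a sketch of where that F comes from, it can be recomputed by hand from the two residual sums of squares reported above:
# Sketch: the partial F statistic from the RSS values in the anova table.
rss_restricted <- 5870.4   # Cost ~ Summated.Rating (98 df)
rss_full <- 4939.2         # Cost ~ Décor + Food. + Service (96 df)
F_stat <- ((rss_restricted - rss_full) / 2) / (rss_full / 96)
F_stat                                              # about 9.05, as reported
pf(F_stat, df1 = 2, df2 = 96, lower.tail = FALSE)   # about 0.00025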
> Restaurants<- within(Restaurants, {
+ residuals.ThreeRatings <- residuals(ThreeRatings)
+ })
> qqPlot(Restaurants$residuals.ThreeRatings)
[1] 47 36
> shapiro.test(Restaurants$residuals.ThreeRatings)
Shapiro-Wilk normality test
data: Restaurants$residuals.ThreeRatings
W = 0.97196, p-value = 0.03131
The residuals are again not normal. Perhaps we need to account for the city/suburb difference once more.
> RegModel.5 <- lm(Cost~Coded.Location+Décor+Food.+Service, data=Restaurants)
> summary(RegModel.5)
Call:
lm(formula = Cost ~ Coded.Location + Décor + Food. + Service,
data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-15.1604 -5.7500 -0.5895 4.9411 21.7674
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.6592 7.0098 -2.947 0.004035 **
Coded.Location -2.2964 1.4367 -1.598 0.113275
Décor 1.7909 0.2242 7.988 3.19e-12 ***
Food. -0.1145 0.3963 -0.289 0.773258
Service 1.8616 0.4939 3.769 0.000285 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.115 on 95 degrees of freedom
Multiple R-squared: 0.6552, Adjusted R-squared: 0.6406
F-statistic: 45.12 on 4 and 95 DF, p-value: < 2.2e-16
> anova(ThreeRatings, RegModel.5)
Analysis of Variance Table
Model 1: Cost ~ Décor + Food. + Service
Model 2: Cost ~ Coded.Location + Décor + Food. + Service
Res.Df RSS Df Sum of Sq F Pr(>F)
1 96 4939.2
2 95 4809.9 1 129.35 2.5549 0.1133
> Restaurants<- within(Restaurants, {
+ residuals.RegModel.5 <- residuals(RegModel.5)
+ })
> qqPlot(Restaurants$residuals.RegModel.5)
[1] 47 24
> shapiro.test(Restaurants$residuals.RegModel.5)
Shapiro-Wilk normality test
data: Restaurants$residuals.RegModel.5
W = 0.98143, p-value = 0.1715
That works. Now I have a model. Given a set of ratings, I can use a confidence interval to predict the mean cost or a prediction interval to predict an individual restaurant's cost (a sketch of both appears after the effect plots below). I can also describe confidence intervals for the coefficients. Let me do that here.
> confint(RegModel.5)
2.5 % 97.5 %
(Intercept) -34.5753968 -6.7429184
Coded.Location -5.1486597 0.5557957
Décor 1.3457949 2.2359313
Food. -0.9013008 0.6722778
Service 0.8809753 2.8421666
Decor and Service influence costs. Food appears not to matter. Suburbs are a bit less expensive, but there is much uncertainty and the 95 percent confidence interval includes 0. I should also mention that Decor and Service could have the same effect: their intervals overlap, so there are common values of the cost per point.
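One way to check that directly is an equal-slopes test with car's linearHypothesis (a sketch; the Ambience construction below is the route I actually take):
# Sketch: test whether the Décor and Service slopes in RegModel.5 are equal.
# A large p-value is consistent with a single common cost per point.
linearHypothesis(RegModel.5, "Décor = Service")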
> plot(allEffects(RegModel.5, partial.residuals=TRUE), span=1)
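And here is the promised sketch of a mean versus an individual prediction from this model, at a hypothetical set of ratings (a city restaurant scoring 19 on Decor, 21 on Food, and 20 on Service):
# Sketch: hypothetical ratings, not a restaurant from the data set.
new_rest <- data.frame(Coded.Location = 0, Décor = 19, Food. = 21, Service = 20)
predict(RegModel.5, newdata = new_rest, interval = "confidence")  # mean cost
predict(RegModel.5, newdata = new_rest, interval = "prediction")  # one restaurant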
I will create a new variable that is the sum of Decor and Service. I call it Ambience. Let me fit a regression of cost as a function of location, Food, and Ambience, and compare it to the regression with three ratings.
> Restaurants$Ambience <- with(Restaurants, Décor+ Service)
> AmbienceLocF <- lm(Cost~Ambience+Coded.Location+Food., data=Restaurants)
> summary(AmbienceLocF)
Call:
lm(formula = Cost ~ Ambience + Coded.Location + Food., data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-15.1923 -5.8603 -0.6167 4.9205 21.7746
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.42085 6.64832 -3.072 0.00277 **
Ambience 1.80884 0.15656 11.554 < 2e-16 ***
Coded.Location -2.27670 1.41863 -1.605 0.11181
Food. -0.09241 0.34252 -0.270 0.78790
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.079 on 96 degrees of freedom
Multiple R-squared: 0.6551, Adjusted R-squared: 0.6443
F-statistic: 60.78 on 3 and 96 DF, p-value: < 2.2e-16
> anova(AmbienceLocF, RegModel.5)
Analysis of Variance Table
Model 1: Cost ~ Ambience + Coded.Location + Food.
Model 2: Cost ~ Coded.Location + Décor + Food. + Service
Res.Df RSS Df Sum of Sq F Pr(>F)
1 96 4810.5
2 95 4809.9 1 0.64209 0.0127 0.9106
It would seem that Ambience is a useful simplification: the fit is essentially unchanged (p = 0.91).
> Restaurants<- within(Restaurants, {
+ residuals.AmbienceLocF <- residuals(AmbienceLocF)
+ })
> qqPlot(Restaurants$residuals.AmbienceLocF)
[1] 47 24
> shapiro.test(Restaurants$residuals.AmbienceLocF)
Shapiro-Wilk normality test
data: Restaurants$residuals.AmbienceLocF
W = 0.98087, p-value = 0.1554
From here, it is important to point out a few things. We can use this model for prediction or for characterization, but Food remains part of the model even though its slope is indistinguishable from zero. Can we eliminate it?
> Ambience.NoF <- lm(Cost ~ Location +Ambience, data=Restaurants)
> summary(Ambience.NoF)
Call:
lm(formula = Cost ~ Location + Ambience, data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-15.4358 -5.8003 -0.5707 4.8410 22.0012
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.526 5.211 -4.131 7.67e-05 ***
LocationSuburban -2.293 1.411 -1.626 0.107
Ambience 1.786 0.132 13.530 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.045 on 97 degrees of freedom
Multiple R-squared: 0.6548, Adjusted R-squared: 0.6477
F-statistic: 92.02 on 2 and 97 DF, p-value: < 2.2e-16
> anova(Ambience.NoF,AmbienceLocF)
Analysis of Variance Table
Model 1: Cost ~ Location + Ambience
Model 2: Cost ~ Ambience + Coded.Location + Food.
Res.Df RSS Df Sum of Sq F Pr(>F)
1 97 4814.1
2 96 4810.5 1 3.6472 0.0728 0.7879
> Restaurants<- within(Restaurants, {
+ residuals.Ambience.NoF <- residuals(Ambience.NoF)
+ })
> qqPlot(Restaurants$residuals.Ambience.NoF)
[1] 47 24
> shapiro.test(Restaurants$residuals.Ambience.NoF)
Shapiro-Wilk normality test
data: Restaurants$residuals.Ambience.NoF
W = 0.98263, p-value = 0.212
Eliminating Food changes essentially nothing (p = 0.79), and the residuals remain acceptable. That leaves the third question: is the relationship between Ambience and Cost the same in cities and suburbs? Allow the slope to differ by location and compare.
> LinearModel.12 <- lm(Cost ~ Location + Ambience:Location, data=Restaurants)
> summary(LinearModel.12)
Call:
lm(formula = Cost ~ Location + Ambience:Location, data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-15.212 -5.797 -0.558 4.719 21.900
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.4745 7.3943 -2.769 0.00675 **
LocationSuburban -4.3778 10.4445 -0.419 0.67604
LocationCity:Ambience 1.7593 0.1891 9.303 4.68e-15 ***
LocationSuburban:Ambience 1.8127 0.1862 9.733 5.57e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.08 on 96 degrees of freedom
Multiple R-squared: 0.655, Adjusted R-squared: 0.6442
F-statistic: 60.75 on 3 and 96 DF, p-value: < 2.2e-16
> anova(Ambience.NoF, LinearModel.12)
Analysis of Variance Table
Model 1: Cost ~ Location + Ambience
Model 2: Cost ~ Location + Ambience:Location
Res.Df RSS Df Sum of Sq F Pr(>F)
1 97 4814.1
2 96 4812.1 1 2.0343 0.0406 0.8408
No: allowing separate slopes adds essentially nothing (F = 0.04, p = 0.84), so the Ambience-Cost relationship is the same in cities and suburbs.
We have a final model: Cost as a function of Ambience and Location.
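Written in the same form as the earlier fits, using the Ambience.NoF coefficients above:
\[ \text{Cost} = -21.53 + 1.786 \times \text{Ambience} - 2.293 \times \text{Suburban} + \varepsilon \]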
It is important to note that we could remove Location as well, but if we do, the residuals drift away from normality because suburban restaurants are typically, though not universally, somewhat cheaper.
> RegModel.13 <- lm(Cost~Ambience, data=Restaurants)
> summary(RegModel.13)
Call:
lm(formula = Cost ~ Ambience, data = Restaurants)
Residuals:
Min 1Q Median 3Q Max
-14.2088 -5.6033 -0.7277 5.3314 23.1077
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -22.281 5.234 -4.257 4.75e-05 ***
Ambience 1.776 0.133 13.357 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.104 on 98 degrees of freedom
Multiple R-squared: 0.6454, Adjusted R-squared: 0.6418
F-statistic: 178.4 on 1 and 98 DF, p-value: < 2.2e-16
> anova(Ambience.NoF, RegModel.13)
Analysis of Variance Table
Model 1: Cost ~ Location + Ambience
Model 2: Cost ~ Ambience
Res.Df RSS Df Sum of Sq F Pr(>F)
1 97 4814.1
2 98 4945.3 -1 -131.18 2.6432 0.1072
> Restaurants<- within(Restaurants, {
+ residuals.RegModel.13 <- residuals(RegModel.13)
+ })
> qqPlot(Restaurants$residuals.RegModel.13)
[1] 47 36
> shapiro.test(Restaurants$residuals.RegModel.13)
Shapiro-Wilk normality test
data: Restaurants$residuals.RegModel.13
W = 0.97429, p-value = 0.0475
> plot(density(Restaurants$residuals.RegModel.13))
It appears as though Decor and Service are the key drivers, and it is not clear that they have different effects. Location does not clearly matter, but a few high-priced city restaurants skew the residuals when Location is not taken into account. A simple graphical depiction is shown below. The 95% confidence intervals for the final model are:
> confint(Ambience.NoF)
2.5 % 97.5 %
(Intercept) -31.868961 -11.1830439
LocationSuburban -5.092707 0.5062849
Ambience 1.524363 2.0484818
> plot(allEffects(Ambience.NoF, partial.residuals=TRUE), span=0.95)