Lab 2 - Josh Goldhaber

## # A tibble: 6 × 17
##    ...1 index date        week weekday area   count rabate price operator driver
##   <dbl> <dbl> <date>     <dbl>   <dbl> <chr>  <dbl> <lgl>  <dbl> <chr>    <chr> 
## 1     1     1 2014-03-01     9       6 Camden     5 TRUE    65.7 Rhonda   Taylor
## 2     2     2 2014-03-01     9       6 Westm…     2 FALSE   27.0 Rhonda   Butch…
## 3     3     3 2014-03-01     9       6 Westm…     3 FALSE   41.0 Allanah  Butch…
## 4     4     4 2014-03-01     9       6 Brent      2 FALSE   26.0 Allanah  Taylor
## 5     5     5 2014-03-01     9       6 Brent      5 TRUE    57.6 Rhonda   Carter
## 6     6     6 2014-03-01     9       6 Camden     1 FALSE   14.0 Allanah  Taylor
## # ℹ 6 more variables: delivery_min <dbl>, temperature <dbl>,
## #   wine_ordered <dbl>, wine_delivered <dbl>, wrongpizza <lgl>, quality <chr>

##  [1] "...1"           "index"          "date"           "week"          
##  [5] "weekday"        "area"           "count"          "rabate"        
##  [9] "price"          "operator"       "driver"         "delivery_min"  
## [13] "temperature"    "wine_ordered"   "wine_delivered" "wrongpizza"    
## [17] "quality"

Hypothesis:

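price = 5 + 20 × wine_ordered + 0.5 × delivery_min + 15 × count - 10 × rabate + error

Expected effects: wine_ordered adds $20 per bottle of wine, delivery_min adds $0.50 per minute of delivery time, count adds $15 per pizza, and rabate subtracts $10 when it is TRUE. The constant is a $5 baseline, and error captures everything the model leaves out.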

Interpretation: Some pairs of variables are fairly highly correlated, but none at a concerning level, and there is certainly no perfect collinearity.
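A minimal sketch of how predictor correlations could be screened numerically. The data frame `sim` is simulated purely for illustration (it is not the lab's `pizza` data), the link between `count` and `rabate` is invented, and the 0.8 cutoff is an arbitrary assumption rather than a standard rule.

```r
# Simulated stand-in for the lab data (NOT the real pizza data set).
set.seed(1)
n <- 200
sim <- data.frame(
  wine_ordered = rbinom(n, 1, 0.2),
  count        = sample(1:8, n, replace = TRUE),
  delivery_min = runif(n, 10, 60)
)
# Illustrative assumption: a rebate is applied to larger orders,
# which makes count and rabate correlated in this fake data.
sim$rabate <- as.numeric(sim$count >= 4)

# Pairwise correlations; use = "complete.obs" drops rows containing NAs.
cor_mat <- cor(sim, use = "complete.obs")
print(round(cor_mat, 2))

# List predictor pairs whose absolute correlation exceeds the chosen cutoff.
cutoff <- 0.8
high <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
print(flagged)
```

A high but imperfect correlation inflates standard errors without violating the no-perfect-collinearity assumption; only a correlation of exactly 1 or -1 (or an exact linear combination of several predictors) does.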

## 
## Call:
## lm(formula = price ~ wine_ordered, data = pizza)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.570 -14.007  -1.017  10.359  63.041 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   43.9870     0.5859   75.08   <2e-16 ***
## wine_ordered  30.3535     1.4822   20.48   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.62 on 1195 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.2598, Adjusted R-squared:  0.2592 
## F-statistic: 419.4 on 1 and 1195 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = price ~ wine_ordered + count, data = pizza)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.4712  -3.1707   0.2092   3.5142  20.0108 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.6008     0.4086   11.26   <2e-16 ***
## wine_ordered  31.4329     0.4545   69.16   <2e-16 ***
## count         11.3885     0.1061  107.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.708 on 1194 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.9305, Adjusted R-squared:  0.9304 
## F-statistic:  7991 on 2 and 1194 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = price ~ wine_ordered + count + delivery_min, data = pizza)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.0633  -3.2107   0.2249   3.4361  20.3481 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.37159    0.55027   6.127 1.21e-09 ***
## wine_ordered 31.31640    0.45399  68.980  < 2e-16 ***
## count        11.37502    0.10574 107.575  < 2e-16 ***
## delivery_min  0.05043    0.01520   3.318 0.000935 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.684 on 1193 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.9311, Adjusted R-squared:  0.9309 
## F-statistic:  5376 on 3 and 1193 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = price ~ wine_ordered + count + delivery_min + rabate, 
##     data = pizza)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -21.6048  -2.8197   0.0533   3.0642  17.9538 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.37691    0.51734   2.662  0.00788 ** 
## wine_ordered 31.63795    0.41417  76.388  < 2e-16 ***
## count        13.06341    0.14464  90.317  < 2e-16 ***
## delivery_min  0.03615    0.01388   2.605  0.00931 ** 
## rabateTRUE   -7.03613    0.44956 -15.651  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.179 on 1192 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.9429, Adjusted R-squared:  0.9427 
## F-statistic:  4917 on 4 and 1192 DF,  p-value: < 2.2e-16

## R² values:

## Model 1: 0.2597706

## Model 2: 0.9304851

## Model 3: 0.9311206

## Model 4: 0.9428623

## 
## Regression Results
## =================================================================================================================================
##                                                                  Dependent variable:                                             
##                     -------------------------------------------------------------------------------------------------------------
##                                                                         Price                                                    
##                             Wine Only                Wine + Pizza                Add Delivery                 Full Model         
##                                (1)                        (2)                         (3)                         (4)            
## ---------------------------------------------------------------------------------------------------------------------------------
## wine_ordered                30.354***                  31.433***                   31.316***                   31.638***         
##                              (1.482)                    (0.455)                     (0.454)                     (0.414)          
## count                                                  11.388***                   11.375***                   13.063***         
##                                                         (0.106)                     (0.106)                     (0.145)          
## delivery_min                                                                       0.050***                    0.036***          
##                                                                                     (0.015)                     (0.014)          
## rabate                                                                                                         -7.036***         
##                                                                                                                 (0.450)          
## Constant                    43.987***                  4.601***                    3.372***                    1.377***          
##                              (0.586)                    (0.409)                     (0.550)                     (0.517)          
## ---------------------------------------------------------------------------------------------------------------------------------
## Observations                  1,197                      1,197                       1,197                       1,197           
## R2                            0.260                      0.930                       0.931                       0.943           
## Adjusted R2                   0.259                      0.930                       0.931                       0.943           
## Residual Std. Error    18.619 (df = 1195)          5.708 (df = 1194)           5.684 (df = 1193)           5.179 (df = 1192)     
## F Statistic         419.364*** (df = 1; 1195) 7,991.087*** (df = 2; 1194) 5,375.712*** (df = 3; 1193) 4,917.472*** (df = 4; 1192)
## =================================================================================================================================
## Note:                                                                                                 *p<0.1; **p<0.05; ***p<0.01

4a: Full model: price ~ wine_ordered + count + delivery_min + rabate, data = pizza

Holding the other variables constant, each additional bottle of wine increases the bill by about $31.64, while each additional pizza adds about $13.06.

Delivery time has only a small effect, with each extra minute adding roughly 3.6 cents.

If a rebate is applied, the total bill decreases by about $7.04.

The intercept of about $1.38 is the predicted price when every predictor is zero (no wine, no pizzas, zero delivery minutes, no rebate). No real order looks like that, so it has little practical meaning.
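As a worked illustration of these coefficients, the fitted equation can be evaluated for a hypothetical order. The order itself (1 bottle of wine, 3 pizzas, a 30-minute delivery, no rebate) is an assumption chosen only for the arithmetic; the coefficients are the Model 4 estimates from the output above.

$$
\begin{aligned}
\widehat{\text{price}} &= 1.3769 + 31.6380\,\text{wine\_ordered} + 13.0634\,\text{count} + 0.03615\,\text{delivery\_min} - 7.0361\,\text{rabate} \\
&= 1.3769 + 31.6380(1) + 13.0634(3) + 0.03615(30) - 7.0361(0) \\
&= 1.3769 + 31.6380 + 39.1902 + 1.0845 \\
&\approx 73.29
\end{aligned}
$$

With a rebate applied to the same order, the prediction drops by the rabate coefficient to about $73.29 - 7.04 = 66.25$.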

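4b: The full model (Model 4) fits best. It has the highest R² and adjusted R² (both about 0.943) and the lowest residual standard error (5.18). The overall F-test p-value cannot separate the models, because it is below 2.2e-16 for all four. What does support the full model is that every one of its coefficients is statistically significant at the 1% level, so each added variable contributes.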
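One way such a comparison could be made explicit is with adjusted R² plus partial F-tests on the nested models. The sketch below uses simulated data (`sim` is a made-up stand-in, with coefficients borrowed from the hypothesis rather than from the real `pizza` data), so its numbers are illustrative only.

```r
# Simulated stand-in data (NOT the lab's pizza data set).
set.seed(42)
n <- 300
sim <- data.frame(
  wine_ordered = rbinom(n, 1, 0.2),
  count        = sample(1:6, n, replace = TRUE),
  delivery_min = runif(n, 10, 60)
)
sim$price <- 5 + 20 * sim$wine_ordered + 15 * sim$count +
  0.5 * sim$delivery_min + rnorm(n, sd = 5)

# Nested models, each adding one predictor to the previous one.
m1 <- lm(price ~ wine_ordered, data = sim)
m2 <- lm(price ~ wine_ordered + count, data = sim)
m3 <- lm(price ~ wine_ordered + count + delivery_min, data = sim)

# Adjusted R-squared penalizes extra predictors, unlike plain R-squared.
adj_r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3),
                 function(m) summary(m)$adj.r.squared)
print(round(adj_r2, 3))

# Partial F-tests: does each added predictor significantly reduce the
# residual sum of squares? Models must be fit to the same rows.
print(anova(m1, m2, m3))
```

Plain R² can never fall when a predictor is added, so adjusted R² and the partial F-tests are the more informative comparison for nested models.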

4c:

4d: The residual plot shows heteroskedasticity: the spread of the residuals widens as the fitted values increase, which gives the plot a cone (fan) shape.
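The visual impression from a residual plot could be backed by a formal test. The sketch below computes the studentized (Koenker) version of the Breusch-Pagan test by hand in base R; the data are simulated with an error spread that grows with `x`, so this is an illustration of the technique and not a result for the lab's model.

```r
# Simulated data with deliberately heteroskedastic errors (illustration only).
set.seed(7)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)   # error sd grows with x
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the original regressor(s).
aux <- lm(resid(fit)^2 ~ x)

# LM statistic = n * R^2 of the auxiliary regression; under the null of
# homoskedasticity it is approximately chi-squared with df equal to the
# number of regressors in the auxiliary regression (here 1).
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
cat("LM statistic:", round(lm_stat, 2), " p-value:", signif(p_value, 3), "\n")
```

A small p-value rejects constant error variance, which matches what a cone-shaped residual plot suggests.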

Gauss-Markov assumptions

Linearity in parameters: The model is linear in its coefficients, so this assumption is satisfied.
Random sampling and independent observations: Because I don’t know how the data were gathered, I can’t be sure if this holds. Orders from the same store, customer, or time period could be correlated.
No perfect multicollinearity: My correlation plot showed no perfect correlations between predictors, and lm() estimated every coefficient without dropping any as NA, so this assumption holds.
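Zero conditional mean (exogeneity): This cannot be verified directly from the residuals, because OLS residuals are uncorrelated with the regressors by construction. It would fail if an omitted factor that affects price is also related to the included variables, so I can't be sure it holds.
Normality of errors (not a Gauss-Markov assumption, but relevant for exact t- and F-tests): The QQ plot of residuals is roughly linear through the middle, indicating approximate normality. Slight deviations in the tails suggest that a few extreme values cause mild non-normality, but with 1,197 observations this is unlikely to affect inference much.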
Homoskedasticity: The residual plot shows residuals spread out slightly more for higher fitted values, suggesting mild heteroskedasticity. The coefficient estimates remain unbiased, but the usual standard errors may be unreliable unless robust ones are used.
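A minimal sketch of what "robust" standard errors mean in practice, computed by hand in base R as the HC1 sandwich estimator. The data are simulated with heteroskedastic errors and are not the lab's `pizza` data; in a real analysis a package such as sandwich would normally be used instead of this manual calculation.

```r
# Simulated data with heteroskedastic errors (illustration only).
set.seed(7)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

X <- model.matrix(fit)   # n x k design matrix, including the intercept
e <- resid(fit)          # OLS residuals
k <- ncol(X)

# Sandwich estimator: (X'X)^-1 [sum of e_i^2 x_i x_i'] (X'X)^-1,
# with the HC1 small-sample correction n / (n - k).
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)   # each row of X scaled by its residual
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_robust    <- sqrt(diag(vcov_hc1))
print(cbind(se_classical, se_robust))
```

The coefficient estimates are unchanged; only the standard errors, and therefore the t-statistics and p-values, differ between the two columns.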

The model performs well overall. It explains about 94% of the variation in price, and the coefficients all have sensible signs and magnitudes. However, because some assumptions like exogeneity and homoskedasticity may not fully hold, I would trust the model mainly for describing relationships and predicting prices, not for making strong causal claims.

2025-10-01