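library(s20x)
# Read the data and plot energy content against price
McDonalds.df = read.table("McDonalds.txt", header = TRUE)
plot(Energy ~ Price, data = McDonalds.df)
# Fit a simple linear model and check its assumptions
McDonalds.fit = lm(Energy ~ Price, data = McDonalds.df)
plot(McDonalds.fit, which = 1)  # residuals versus fitted values
normcheck(McDonalds.fit)        # normality of the residuals
cooks20x(McDonalds.fit)         # Cook's distance for influential points
summary(McDonalds.fit)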
Call:
lm(formula = Energy ~ Price, data = McDonalds.df)
Residuals:
    Min      1Q  Median      3Q     Max
-801.23 -294.28   60.52  191.58  865.33
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   168.11     154.58   1.088    0.289
Price         284.37      35.67   7.973 6.24e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 402.5 on 22 degrees of freedom
Multiple R-squared: 0.7429, Adjusted R-squared: 0.7312
F-statistic: 63.57 on 1 and 22 DF, p-value: 6.245e-08
confint(McDonalds.fit)
                2.5 %   97.5 %
(Intercept) -152.4617 488.6825
Price        210.4043 358.3437
# Refit without observation 5, which was flagged by the Cook's distance plot
McDonalds.fit2 = lm(Energy ~ Price, data = McDonalds.df[-5, ])
summary(McDonalds.fit2)
Call:
lm(formula = Energy ~ Price, data = McDonalds.df[-5, ])
Residuals:
    Min      1Q  Median      3Q     Max
-664.34 -252.76  -23.76  230.66  943.97
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   271.99     151.53   1.795   0.0871 .
Price         245.54      37.79   6.497 1.95e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 373.6 on 21 degrees of freedom
Multiple R-squared: 0.6678, Adjusted R-squared: 0.652
F-statistic: 42.21 on 1 and 21 DF, p-value: 1.946e-06
plot(McDonalds.fit2, which = 1)
normcheck(McDonalds.fit2)
pred.df = data.frame(Price = c(5.9))
predict(McDonalds.fit2, pred.df, interval = "prediction")
       fit      lwr      upr
1 1720.677 903.7747 2537.579
# Scatter plot of all observations with the fitted line from the final model
plot(Energy ~ Price, main = "Energy versus Price", data = McDonalds.df)
abline(McDonalds.fit2)
Method and Assumptions:
A scatter plot of energy content versus product price showed a roughly linear association with approximately constant scatter, so a simple linear regression model was fitted.
Observation 5 stood out in the Cook’s distance plot. Refitting without it changes the slope by more than one standard error (β1 drops from 284.37 to 245.54, a change of 38.83 against a standard error of 35.67), so the observation was discarded. The intercept moves from 168.11 to 271.99, a change of 103.88, which is less than its standard error of 154.58.
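The comparison of coefficient changes with their standard errors can be reproduced from the two summary tables alone. The following is a minimal sketch in R that hard-codes the reported estimates and standard errors; the vector names (est.full, se.full, est.drop) are illustrative and not part of the original analysis.

```r
# Estimates and standard errors copied from summary(McDonalds.fit) (all data)
est.full = c(Intercept = 168.11, Price = 284.37)
se.full  = c(Intercept = 154.58, Price = 35.67)
# Estimates copied from summary(McDonalds.fit2) (observation 5 removed)
est.drop = c(Intercept = 271.99, Price = 245.54)

# Absolute change in each coefficient, and that change in standard-error units
change = abs(est.drop - est.full)
ratio  = change / se.full
print(round(change, 2))
print(round(ratio, 2))
```

With these inputs only the Price ratio exceeds 1, so the slope is the coefficient that shifts by more than one standard error.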
The linearity, equal-variance and normality assumptions appear to be satisfied for the refitted model.
The final model is $\text{Energy}_i = \beta_0 + \beta_1 \times \text{Price}_i + \varepsilon_i$, where $\varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)$.
Executive Summary:
A professional teaching fellow from the Engineering department wanted to predict the energy content of McDonald’s food items from their price. We found that more expensive items tend to have a higher energy content.
We estimate that each additional dollar in the price of a McDonald’s item is associated with an increase in expected energy content of between 170 and 320 kilojoules.
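The output of confint(McDonalds.fit2) is not shown in the transcript, so the interval for the slope can be checked by hand. This is a minimal sketch that assumes the usual t-based 95% interval and takes the estimate, standard error and residual degrees of freedom from summary(McDonalds.fit2); the variable names are illustrative.

```r
# Slope estimate, standard error and residual df from summary(McDonalds.fit2)
b1       = 245.54
se.b1    = 37.79
resid.df = 21

# 95% confidence interval: estimate +/- t quantile * standard error
t.crit   = qt(0.975, resid.df)
slope.ci = b1 + c(-1, 1) * t.crit * se.b1
print(round(slope.ci, 1))
```

The unrounded interval is roughly 167 to 324 kilojoules per dollar, which the summary reports to two significant figures.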
The model explains 67% of the variation in the energy content of McDonald’s food items, so it should be moderately useful for prediction. We predict with 95% confidence that a food item costing $5.90 would contain between 900 and 2540 kilojoules.
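The prediction interval can also be approximated from the summary table without the raw data. The sketch below assumes the standard simple-linear-regression formulas for the coefficient standard errors, uses them to recover the sum of squares of Price (Sxx) and the mean price (xbar), and assumes the mean price is positive. All variable names are illustrative, and the inputs are the rounded values printed by summary(McDonalds.fit2).

```r
# Quantities copied from summary(McDonalds.fit2)
b0 = 271.99; se.b0 = 151.53
b1 = 245.54; se.b1 = 37.79
s        = 373.6           # residual standard error
resid.df = 21
n        = resid.df + 2    # two estimated coefficients

# Recover Sxx and the mean price from the standard-error formulas:
#   se(b1) = s / sqrt(Sxx)
#   se(b0) = s * sqrt(1/n + xbar^2 / Sxx)
Sxx  = (s / se.b1)^2
xbar = sqrt(((se.b0 / s)^2 - 1 / n) * Sxx)  # assumes the mean price is positive

# 95% prediction interval for a new item at Price = 5.90
x0   = 5.90
fit  = b0 + b1 * x0
half = qt(0.975, resid.df) * s * sqrt(1 + 1 / n + (x0 - xbar)^2 / Sxx)
pred.int = c(fit = fit, lwr = fit - half, upr = fit + half)
print(round(pred.int, 1))
```

Because the inputs are rounded, the result should agree with the predict() output only approximately, to within about one kilojoule.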