To run this notebook, first download the data file
A1.RData from Canvas. Also make sure you have loaded the
MSR package in your R session.
Question 1
Question 1a
For Question 1a, you can report either the histograms or the association between Sales and the independent variables (IVs). Here, I will show the association for each IV.
Price vs. Sales
## [1] -0.8336852
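The value above is a correlation between Price and Sales. A minimal sketch of how such a figure can be computed in base R follows. The data frame `toy` and every number in it are invented for illustration; they are not the assignment data, so the printed value will differ from the one reported above.

```r
# Toy stand-in for the assignment data (all values invented for illustration).
set.seed(42)
toy <- data.frame(Price = runif(68, min = 12, max = 17))
toy$Sales <- 700 - 40 * toy$Price + rnorm(68, sd = 40)

# Pearson correlation: negative when higher prices go with lower sales.
r <- cor(toy$Price, toy$Sales)
print(r)

# cor.test() adds a significance test and a confidence interval.
ct <- cor.test(toy$Price, toy$Sales)
print(ct$conf.int)
```

With the real data, the same call would take the Price and Sales columns of the loaded data frame; `hist()` on a single column gives the histogram alternative.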
Feature vs. Sales
Because Feature is a categorical (Yes/No) variable, a simple correlation is not the appropriate summary. Instead, you can run a one-way ANOVA to test whether mean Sales differ significantly between featured and non-featured days.
##             Df Sum Sq Mean Sq F value  Pr(>F)
## Feature      1 256515  256515   54.47 3.4e-10 ***
## Residuals   66 310795    4709
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
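As a rough sketch of how a table like this can be produced, the block below fits a one-way ANOVA with base R's `aov()` on an invented data frame (`toy`, with made-up group sizes and effect size, not the assignment data). It also checks a useful fact: with only two groups, the ANOVA F statistic equals the square of the pooled-variance t statistic.

```r
# Toy data: 50 non-featured and 18 featured observations (invented numbers).
set.seed(1)
toy <- data.frame(Feature = factor(rep(c("No", "Yes"), times = c(50, 18))))
toy$Sales <- 120 + 130 * (toy$Feature == "Yes") + rnorm(68, sd = 65)

# One-way ANOVA of Sales on the two-level factor.
fit <- aov(Sales ~ Feature, data = toy)
print(summary(fit))

# Extract the F statistic and compare it with the squared t statistic
# from an equal-variance two-sample t-test.
f_value <- summary(fit)[[1]][["F value"]][1]
t_value <- unname(t.test(Sales ~ Feature, data = toy, var.equal = TRUE)$statistic)
print(c(F = f_value, t_squared = t_value^2))
```

In the reported table, the F value is the ratio of the two mean squares: 256515 / 4709 is approximately 54.47.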
We can do the same analysis for Display.
##             Df Sum Sq Mean Sq F value   Pr(>F)
## Display      1 174016  174016    29.2 9.62e-07 ***
## Residuals   66 393293    5959
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Question 1c
To obtain the variance inflation factor (VIF) of each variable, we use the
vif function. The function takes the names of the
variables and the data frame.
## FeatureYes DisplayYes      Price
##   1.733758   1.138592   1.679551
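To make the numbers above less of a black box: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below computes this by hand in base R. It does not use the MSR package's vif function, and the helper `manual_vif` and the data frame `toy` (with 0/1 dummy columns and invented values) are hypothetical names introduced only for this illustration.

```r
# Toy predictors: two 0/1 dummies and a price that is lower when featured.
set.seed(7)
n <- 68
toy <- data.frame(FeatureYes = rbinom(n, 1, 0.3),
                  DisplayYes = rbinom(n, 1, 0.3))
toy$Price <- 15 - 1.5 * toy$FeatureYes + rnorm(n, sd = 1)

# For each column, regress it on all other columns and apply 1 / (1 - R^2).
manual_vif <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(toy)
print(vifs)
```

Read in reverse, a VIF of 1.733758 corresponds to an auxiliary R² of 1 - 1/1.733758, or about 0.42.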
Question 1d-1h
In Question 1, we run two regressions. The first regression is for Question 1b to Question 1i. The regression equation is:
\[Sales = \beta_0 + \beta_1*Price + \beta_2*Feature + \beta_3*Display + e\]
The first regression:
##
## Call:
## lm(formula = Sales ~ Price + Feature + Display, data = cadbury)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -110.681  -18.362    1.126   18.994  114.695
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  714.113     68.773  10.384 2.34e-15 ***
## Price        -41.542      4.492  -9.248 2.09e-13 ***
## FeatureYes    33.897     14.550   2.330    0.023 *
## DisplayYes    75.399     13.645   5.526 6.44e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 40.2 on 64 degrees of freedom
## Multiple R-squared: 0.8177, Adjusted R-squared: 0.8091
## F-statistic: 95.68 on 3 and 64 DF, p-value: < 2.2e-16
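One way to read the coefficient table is to plug the estimates back into the regression equation. The sketch below does that with the reported coefficients. The helper `predict_sales` and the price of 15 are hypothetical choices made for illustration only; the price should be replaced with a value that actually occurs in the data.

```r
# Coefficients copied from the regression output above.
coefs <- c(Intercept = 714.113, Price = -41.542,
           FeatureYes = 33.897, DisplayYes = 75.399)

# Predicted Sales for a given price and promotion status (TRUE counts as 1).
predict_sales <- function(price, feature = FALSE, display = FALSE) {
  unname(coefs["Intercept"] + coefs["Price"] * price +
           coefs["FeatureYes"] * feature + coefs["DisplayYes"] * display)
}

base  <- predict_sales(price = 15)   # 15 is an assumed, illustrative price
promo <- predict_sales(price = 15, feature = TRUE, display = TRUE)

# The gap between the two predictions is the sum of the two dummy coefficients.
print(promo - base)

# A one-unit price increase shifts the prediction by the Price coefficient.
print(predict_sales(16) - predict_sales(15))
```

Each t value in the table is the estimate divided by its standard error, for example -41.542 / 4.492 for Price.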
Question 1i and 1j
The second regression is for Question 1j and 1k. The regression equation is:
\[Sales = \beta_0 + \beta_1*Price + \beta_2*Feature + \beta_3*Display + \beta_4*Sunny + \beta_5*Cloudy + e\]
To run the second regression, we first need to set the baseline of
Weather to Rainy. For this, we use the relevel
function.
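A small sketch of what relevel does, using an invented factor and invented sales figures rather than the assignment data. By default R orders factor levels alphabetically, so Cloudy would be the baseline; `relevel()` with `ref = "Rainy"` moves Rainy to the first position, which makes it the omitted category in `lm()`.

```r
# Invented weather observations for illustration.
weather <- factor(c("Sunny", "Rainy", "Cloudy", "Rainy", "Sunny", "Cloudy"))
print(levels(weather))             # default order is alphabetical

# Make Rainy the reference (first) level.
weather <- relevel(weather, ref = "Rainy")
print(levels(weather))

# The dummy variables created by lm() are now measured against Rainy.
toy <- data.frame(Weather = weather, Sales = c(210, 180, 220, 175, 205, 215))
coef_names <- names(coef(lm(Sales ~ Weather, data = toy)))
print(coef_names)
```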
After setting the baseline, we run the second regression.
##
## Call:
## lm(formula = Sales ~ Price + Feature + Display + Weather, data = cadbury)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -98.209 -23.309  -2.101  20.927  98.209
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)    698.273     67.641  10.323 4.30e-15 ***
## Price          -41.257      4.409  -9.357 1.82e-13 ***
## FeatureYes      33.858     14.437   2.345   0.0222 *
## DisplayYes      75.844     13.302   5.702 3.53e-07 ***
## WeatherCloudy   27.539     12.069   2.282   0.0259 *
## WeatherSunny    16.552     11.229   1.474   0.1455
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39.08 on 62 degrees of freedom
## Multiple R-squared: 0.8331, Adjusted R-squared: 0.8196
## F-statistic: 61.89 on 5 and 62 DF, p-value: < 2.2e-16
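Because Rainy is the baseline, each Weather coefficient is a difference in expected Sales relative to a rainy day, holding the other variables fixed. The sketch below lays out all pairwise differences implied by the reported estimates; the vector name `weather_effect` is introduced here only for illustration.

```r
# Weather effects relative to the Rainy baseline (estimates from the output above).
weather_effect <- c(Rainy = 0, Cloudy = 27.539, Sunny = 16.552)

# Entry [row, column] is the expected Sales on a "row" day minus a "column" day.
diffs <- outer(weather_effect, weather_effect, "-")
print(diffs)

# Cloudy versus Sunny is implied by the two coefficients.
print(diffs["Cloudy", "Sunny"])
```

The table only tests each level against Rainy. Testing Cloudy against Sunny directly would require refitting the model with one of those two levels as the baseline.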
Question 2
First, following the instructions in Question 2a, we run a
regression of ratings on the following variables:
form, noapply, disinfect,
bio, and price.
##
## Call:
## lm(formula = ratings ~ form + noapply + disinfect + bio + price,
## data = cleanser)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -4.193 -1.354  0.178  1.098  4.328
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)        5.0204     0.3187  15.751  < 2e-16 ***
## formConcentrate    0.3665     0.2398   1.528   0.1274
## formPremix        -0.2978     0.2569  -1.159   0.2472
## noapply100 times  -0.1180     0.2471  -0.478   0.6333
## noapply50 times   -0.4726     0.2565  -1.843   0.0663 .
## disinfectYes       0.9433     0.2154   4.379 1.62e-05 ***
## bioYes             0.1497     0.2103   0.712   0.4771
## price49 cents     -1.3947     0.2512  -5.553 5.90e-08 ***
## price79 cents     -2.8187     0.2502 -11.264  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.817 on 321 degrees of freedom
## Multiple R-squared: 0.3139, Adjusted R-squared: 0.2968
## F-statistic: 18.36 on 8 and 321 DF, p-value: < 2.2e-16
From these results, you can apply the three rules to transform the coefficients into partworths. Please see the suggested solutions for more details.
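The three rules themselves are given in the suggested solutions and are not reproduced here. As a rough illustration only, the sketch below applies one common convention, which is an assumption and may differ from the course rules: the omitted baseline level of each attribute gets a coefficient of 0, partworths are centered to sum to zero within each attribute, and attribute importance is the attribute's partworth range as a share of the total range. The label `baseline` is a placeholder because the omitted level names do not appear in the output.

```r
# Coefficients from the output above; "baseline" is a placeholder for the
# omitted level of each attribute, whose coefficient is 0 by construction.
coefs <- list(
  form      = c(baseline = 0, Concentrate = 0.3665, Premix = -0.2978),
  noapply   = c(baseline = 0, `100 times` = -0.1180, `50 times` = -0.4726),
  disinfect = c(baseline = 0, Yes = 0.9433),
  bio       = c(baseline = 0, Yes = 0.1497),
  price     = c(baseline = 0, `49 cents` = -1.3947, `79 cents` = -2.8187)
)

# Assumed convention: center partworths to sum to zero within each attribute.
partworths <- lapply(coefs, function(b) b - mean(b))

# Assumed convention: importance is the range share across attributes.
ranges <- sapply(partworths, function(p) diff(range(p)))
importance <- ranges / sum(ranges)

print(partworths$price)
print(round(100 * importance, 1))
```

Under this convention, price has the widest partworth range and therefore the largest importance share, which matches the size of its coefficients in the regression output.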