Sameer Mathur (2018)
Regression Analysis
Sales CompPrice Income Advertising Population Price ShelveLoc Age
1 9.50 138 73 11 276 120 0-Bad 42
2 4.15 141 64 3 340 128 0-Bad 38
3 10.81 124 113 13 501 72 0-Bad 78
4 9.01 121 78 9 150 100 0-Bad 26
5 10.14 145 119 16 294 113 0-Bad 42
6 2.99 103 74 0 359 97 0-Bad 55
Education Urban US Revenue Profit
1 17 Yes Yes 1140.00 228.00
2 13 Yes No 531.20 106.24
3 16 No Yes 778.32 155.66
4 10 No Yes 901.00 180.20
5 12 Yes Yes 1145.82 229.16
6 11 Yes Yes 290.03 58.01
# data structure of the dataframe
str(carSeats.df)
'data.frame': 400 obs. of 13 variables:
$ Sales : num 9.5 4.15 10.81 9.01 10.14 ...
$ CompPrice : int 138 141 124 121 145 103 104 130 119 157 ...
$ Income : int 73 64 113 78 119 74 99 60 98 53 ...
$ Advertising: int 11 3 13 9 16 0 15 0 0 0 ...
$ Population : int 276 340 501 150 294 359 226 144 18 403 ...
$ Price : int 120 128 72 100 113 97 102 138 126 124 ...
$ ShelveLoc : Factor w/ 3 levels "0-Bad","1-Medium",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Age : int 42 38 78 26 42 55 58 38 73 58 ...
$ Education : int 17 13 16 10 12 11 17 10 17 16 ...
$ Urban : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 2 2 1 1 2 ...
$ US : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 1 1 1 ...
$ Revenue : num 1140 531 778 901 1146 ...
$ Profit : num 228 106 156 180 229 ...
Call:
lm(formula = p0, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-171.85 -34.38 -3.78 35.97 168.85
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 147.3290 4.0320 36.540 < 2e-16 ***
Advertising 3.0660 0.4295 7.139 4.49e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 57.05 on 398 degrees of freedom
Multiple R-squared: 0.1135, Adjusted R-squared: 0.1113
F-statistic: 50.97 on 1 and 398 DF, p-value: 4.493e-12
IMPACT OF ADVERTISING ON PROFIT
Call:
lm(formula = p1, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-163.350 -33.330 -1.365 31.033 153.050
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 120.041 4.808 24.97 < 2e-16 ***
ShelveLoc1-Medium 43.309 5.767 7.51 3.93e-13 ***
ShelveLoc2-Good 112.558 7.016 16.04 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 47.11 on 397 degrees of freedom
Multiple R-squared: 0.3971, Adjusted R-squared: 0.394
F-statistic: 130.7 on 2 and 397 DF, p-value: < 2.2e-16
IMPACT OF SHELF LOCATION ON PROFIT
Call:
lm(formula = p2, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-145.446 -25.160 0.039 24.796 104.054
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 103.014 4.886 21.085 < 2e-16 ***
Advertising 2.738 0.328 8.347 1.18e-15 ***
ShelveLoc1-Medium 42.433 5.325 7.968 1.73e-14 ***
ShelveLoc2-Good 109.453 6.489 16.867 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 43.5 on 396 degrees of freedom
Multiple R-squared: 0.4873, Adjusted R-squared: 0.4834
F-statistic: 125.4 on 3 and 396 DF, p-value: < 2.2e-16
IMPACT OF ADVERTISING ON PROFIT
IMPACT OF SHELF LOCATION ON PROFIT
A change in Shelf Location from Bad to Good, increases the profit by $109,453
Notice that when we jointly consider the role of Advertising and Shelf Location, the 'answers' are different.
We were 'over-predicting' the impact of Advertising when we ignored the impact of Shelf Location (See Model 0)
We were 'over-predicting' the impact of Shelf Location when we ignored the impact of Advertising (See Model 1)
Call:
lm(formula = p3, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-143.299 -24.554 1.032 24.480 106.201
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 108.6051 6.1743 17.590 < 2e-16 ***
Advertising 1.8390 0.6903 2.664 0.00803 **
ShelveLoc1-Medium 34.6937 7.4206 4.675 4.04e-06 ***
ShelveLoc2-Good 103.1563 9.3093 11.081 < 2e-16 ***
Advertising:ShelveLoc1-Medium 1.2276 0.8190 1.499 0.13472
Advertising:ShelveLoc2-Good 0.9950 0.9812 1.014 0.31118
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 43.48 on 394 degrees of freedom
Multiple R-squared: 0.4902, Adjusted R-squared: 0.4837
F-statistic: 75.77 on 5 and 394 DF, p-value: < 2.2e-16
p4 <- Profit ~ Advertising + ShelveLoc + CompPrice + Population + Income + Age + Education + Urban + US
fitp4 = lm(p4 ,data=carSeats.df )
summary (fitp4)
Call:
lm(formula = p4, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-159.654 -17.856 2.075 20.401 68.740
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -60.136334 18.171281 -3.309 0.00102 **
Advertising 2.959638 0.334864 8.838 < 2e-16 ***
ShelveLoc1-Medium 44.625673 3.797403 11.752 < 2e-16 ***
ShelveLoc2-Good 109.212841 4.608355 23.699 < 2e-16 ***
CompPrice 1.563453 0.101946 15.336 < 2e-16 ***
Population 0.005803 0.011149 0.521 0.60299
Income 0.370657 0.055560 6.671 8.73e-11 ***
Age -0.952238 0.095685 -9.952 < 2e-16 ***
Education -0.679343 0.593883 -1.144 0.25337
UrbanYes 2.981043 3.402003 0.876 0.38143
USYes -5.783463 4.511614 -1.282 0.20064
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 30.69 on 389 degrees of freedom
Multiple R-squared: 0.7493, Adjusted R-squared: 0.7429
F-statistic: 116.3 on 10 and 389 DF, p-value: < 2.2e-16
NOTES
# Print the summary of the two regression models
summary(fitp2)
Call:
lm(formula = p2, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-145.446 -25.160 0.039 24.796 104.054
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 103.014 4.886 21.085 < 2e-16 ***
Advertising 2.738 0.328 8.347 1.18e-15 ***
ShelveLoc1-Medium 42.433 5.325 7.968 1.73e-14 ***
ShelveLoc2-Good 109.453 6.489 16.867 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 43.5 on 396 degrees of freedom
Multiple R-squared: 0.4873, Adjusted R-squared: 0.4834
F-statistic: 125.4 on 3 and 396 DF, p-value: < 2.2e-16
#summary (fitp4)
# Print the summary of the two regression models
#summary(fitp2)
summary (fitp4)
Call:
lm(formula = p4, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-159.654 -17.856 2.075 20.401 68.740
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -60.136334 18.171281 -3.309 0.00102 **
Advertising 2.959638 0.334864 8.838 < 2e-16 ***
ShelveLoc1-Medium 44.625673 3.797403 11.752 < 2e-16 ***
ShelveLoc2-Good 109.212841 4.608355 23.699 < 2e-16 ***
CompPrice 1.563453 0.101946 15.336 < 2e-16 ***
Population 0.005803 0.011149 0.521 0.60299
Income 0.370657 0.055560 6.671 8.73e-11 ***
Age -0.952238 0.095685 -9.952 < 2e-16 ***
Education -0.679343 0.593883 -1.144 0.25337
UrbanYes 2.981043 3.402003 0.876 0.38143
USYes -5.783463 4.511614 -1.282 0.20064
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 30.69 on 389 degrees of freedom
Multiple R-squared: 0.7493, Adjusted R-squared: 0.7429
F-statistic: 116.3 on 10 and 389 DF, p-value: < 2.2e-16
Method 1: Compare the Adjusted R Squared
(summary(fitp2))$adj.r.squared
[1] 0.483383
(summary(fitp4))$adj.r.squared
[1] 0.7428985
Method 2: Compare the AIC
AIC(fitp2)
[1] 4159.3
AIC(fitp4)
[1] 3887.034
Method 3: ANOVA (since Model 2 is nested within Model 4)
anova(fitp2, fitp4)
Analysis of Variance Table
Model 1: Profit ~ Advertising + ShelveLoc
Model 2: Profit ~ Advertising + ShelveLoc + CompPrice + Population + Income +
Age + Education + Urban + US
Res.Df RSS Df Sum of Sq F Pr(>F)
1 396 749257
2 389 366287 7 382970 58.103 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
p5 <- Profit ~ Advertising*Income + ShelveLoc + CompPrice + Population + Age + Education + Urban + US
fitp5 = lm(p5 ,data=carSeats.df )
summary (fitp5)
Call:
lm(formula = p5, data = carSeats.df)
Residuals:
Min 1Q Median 3Q Max
-159.421 -17.994 1.991 19.878 73.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -53.675367 18.315500 -2.931 0.00358 **
Advertising 1.650310 0.679306 2.429 0.01558 *
Income 0.248046 0.078290 3.168 0.00165 **
ShelveLoc1-Medium 44.793177 3.779306 11.852 < 2e-16 ***
ShelveLoc2-Good 109.503315 4.587353 23.871 < 2e-16 ***
CompPrice 1.572174 0.101517 15.487 < 2e-16 ***
Population 0.004853 0.011102 0.437 0.66225
Age -0.937659 0.095438 -9.825 < 2e-16 ***
Education -0.700463 0.591011 -1.185 0.23667
UrbanYes 3.243896 3.387197 0.958 0.33881
USYes -5.243940 4.495835 -1.166 0.24417
Advertising:Income 0.018512 0.008370 2.212 0.02756 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 30.53 on 388 degrees of freedom
Multiple R-squared: 0.7525, Adjusted R-squared: 0.7454
F-statistic: 107.2 on 11 and 388 DF, p-value: < 2.2e-16
Method 1: Compare the Adjusted R Squared
(summary(fitp4))$adj.r.squared
[1] 0.7428985
(summary(fitp5))$adj.r.squared
[1] 0.7454454
Method 2: Compare the AIC
AIC(fitp4)
[1] 3887.034
AIC(fitp5)
[1] 3884.022
Method 3: ANOVA (since Model 4 is nested within Model 5)
anova(fitp4, fitp5)
Analysis of Variance Table
Model 1: Profit ~ Advertising + ShelveLoc + CompPrice + Population + Income +
Age + Education + Urban + US
Model 2: Profit ~ Advertising * Income + ShelveLoc + CompPrice + Population +
Age + Education + Urban + US
Res.Df RSS Df Sum of Sq F Pr(>F)
1 389 366287
2 388 361726 1 4560.8 4.8921 0.02756 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1