In this project, a multiple linear regression model is developed to test which variables influence the consumption of ice cream.
1. Data selection:
The dataset was selected from “100+ Interesting Data Sets for Statistics”: http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html
Variables:
Independent variables (three):
Price: price of ice cream per pint in dollars;
Income: weekly family income in dollars;
Temp: mean temperature in degrees F.
Dependent variable: IC: ice cream consumption in pints per capita.
The data were gathered over 30 four-week periods, from 03/18/51 to 07/11/53. The models below are fitted to 29 observations, as implied by their residual degrees of freedom.
Objective:
Identify the factors that influence the consumption of ice cream.
Null hypotheses (each states no effect; rejecting it supports the variable as an explanatory factor):
H0(1): price has no effect on the consumption of ice cream;
H0(2): income has no effect on the consumption of ice cream;
H0(3): temperature has no effect on the consumption of ice cream.
2. Read the dataset
data
data.table<-read.csv("C:\\Users\\Echo\\Desktop\\2015 Spring\\Regression\\Project 2\\ice cream.csv", header=TRUE)
head(data.table)
## IC price income temp
## 1 0.386 0.270 78 41
## 2 0.374 0.282 79 56
## 3 0.393 0.277 81 63
## 4 0.425 0.280 80 68
## 5 0.406 0.272 76 69
## 6 0.344 0.262 78 65
summary(data.table)
## IC price income temp
## Min. :0.2560 Min. :0.2620 Min. :76.00 Min. :24.00
## 1st Qu.:0.3090 1st Qu.:0.2700 1st Qu.:79.00 1st Qu.:32.00
## Median :0.3440 Median :0.2770 Median :83.00 Median :47.00
## Mean :0.3529 Mean :0.2758 Mean :84.41 Mean :48.34
## 3rd Qu.:0.3860 3rd Qu.:0.2820 3rd Qu.:87.00 3rd Qu.:63.00
## Max. :0.4700 Max. :0.2920 Max. :96.00 Max. :72.00
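The read step above depends on an absolute Windows path. A minimal sketch of the same step with a portable path and a basic sanity check follows; it uses only the six rows printed by head(data.table), and the file name and the object name ice are assumptions made for illustration.

```r
# Illustrative only: the six rows shown by head(data.table) above,
# written to a temporary file so the example runs on any machine.
rows <- data.frame(
  IC     = c(0.386, 0.374, 0.393, 0.425, 0.406, 0.344),
  price  = c(0.270, 0.282, 0.277, 0.280, 0.272, 0.262),
  income = c(78, 79, 81, 80, 76, 78),
  temp   = c(41, 56, 63, 68, 69, 65)
)
csv_path <- file.path(tempdir(), "ice_cream_head.csv")  # hypothetical file name
write.csv(rows, csv_path, row.names = FALSE)

# Read the file back and confirm the expected columns arrived intact.
ice <- read.csv(csv_path, header = TRUE)
str(ice)
stopifnot(all(c("IC", "price", "income", "temp") %in% names(ice)))
stopifnot(!anyNA(ice))
```

Checking the column names and missing values right after reading makes it easier to notice a dropped or misparsed row before any model is fitted.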
plot
plot(data.table)
attach(data.table)
correlation
cor(data.table)
## IC price income temp
## IC 1.00000000 -0.09138949 -0.04878008 0.78564133
## price -0.09138949 1.00000000 -0.05501268 -0.02308821
## income -0.04878008 -0.05501268 1.00000000 -0.38317129
## temp 0.78564133 -0.02308821 -0.38317129 1.00000000
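The correlation matrix gives the strength of each pairwise relationship but not its significance. As a rough sketch, the t statistic for each correlation with IC can be recovered from the reported coefficients, assuming n = 29 observations (taken from the residual degrees of freedom of the models fitted in section 3, not from a count of the raw file).

```r
# Correlations with IC copied from the cor(data.table) output above.
r <- c(price = -0.09138949, income = -0.04878008, temp = 0.78564133)
n <- 29  # assumption: inferred from the residual degrees of freedom

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

print(round(cbind(r, t_stat, p_val), 4))
```

For a single predictor this t statistic is the same as the slope t value in a simple regression, so the price row can be checked against the summary of lm(IC ~ price) in section 3.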
3. Variable entry techniques
(1) Entry-wise
All variables are entered simultaneously.
ice.lme<-lm(IC~price+income+temp)
summary(ice.lme)
##
## Call:
## lm(formula = IC ~ price + income + temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.059405 -0.015665 0.005229 0.017157 0.070515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0877445 0.2447400 0.359 0.7230
## price -0.3863577 0.7830856 -0.493 0.6261
## income 0.0026176 0.0010765 2.432 0.0225 *
## temp 0.0031191 0.0004168 7.483 7.78e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03291 on 25 degrees of freedom
## Multiple R-squared: 0.6948, Adjusted R-squared: 0.6582
## F-statistic: 18.97 on 3 and 25 DF, p-value: 1.256e-06
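Each row of the coefficient table tests the null hypothesis that the coefficient is zero. The sketch below recomputes the t values and two-sided p-values from the reported estimates and standard errors, using the 25 residual degrees of freedom shown in the output; the vector names est and se are introduced here for illustration.

```r
# Estimates and standard errors copied from summary(ice.lme) above.
est <- c(price = -0.3863577, income = 0.0026176, temp = 0.0031191)
se  <- c(price =  0.7830856, income = 0.0010765, temp = 0.0004168)
res_df <- 25  # residual degrees of freedom from the output

# t value = estimate / standard error; p-value from the t distribution.
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = res_df, lower.tail = FALSE)

print(cbind(t_val = round(t_val, 3), p_val = signif(p_val, 3)))
```

At the 0.05 level this rejects the null hypothesis for income and temp but not for price.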
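(2) Hierarchical
Variables are entered one block at a time (price, then income, then temp), and the nested models are compared with anova().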
ice.lm2<-lm(IC~price)
summary(ice.lm2)
##
## Call:
## lm(formula = IC ~ price)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.09617 -0.03871 -0.01017 0.03529 0.11976
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5311 0.3738 1.421 0.167
## price -0.6460 1.3546 -0.477 0.637
##
## Residual standard error: 0.05709 on 27 degrees of freedom
## Multiple R-squared: 0.008352, Adjusted R-squared: -0.02838
## F-statistic: 0.2274 on 1 and 27 DF, p-value: 0.6373
ice.lm3<-lm(IC~price+income)
summary(ice.lm3)
##
## Call:
## lm(formula = IC ~ price + income)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.098772 -0.034685 -0.009381 0.034351 0.117713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5777990 0.4161919 1.388 0.177
## price -0.6669640 1.3804937 -0.483 0.633
## income -0.0004845 0.0017534 -0.276 0.784
##
## Residual standard error: 0.05809 on 26 degrees of freedom
## Multiple R-squared: 0.01126, Adjusted R-squared: -0.0648
## F-statistic: 0.148 on 2 and 26 DF, p-value: 0.8632
ice.lm4<-lm(IC~price+income+temp)
summary(ice.lm4)
##
## Call:
## lm(formula = IC ~ price + income + temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.059405 -0.015665 0.005229 0.017157 0.070515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0877445 0.2447400 0.359 0.7230
## price -0.3863577 0.7830856 -0.493 0.6261
## income 0.0026176 0.0010765 2.432 0.0225 *
## temp 0.0031191 0.0004168 7.483 7.78e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03291 on 25 degrees of freedom
## Multiple R-squared: 0.6948, Adjusted R-squared: 0.6582
## F-statistic: 18.97 on 3 and 25 DF, p-value: 1.256e-06
anova(ice.lm2,ice.lm3,ice.lm4)
## Analysis of Variance Table
##
## Model 1: IC ~ price
## Model 2: IC ~ price + income
## Model 3: IC ~ price + income + temp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 27 0.087999
## 2 26 0.087741 1 0.000258 0.2379 0.63
## 3 25 0.027085 1 0.060656 55.9880 7.778e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
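The F statistics in this table can be reproduced by hand from the residual sums of squares. The sketch below assumes, as anova() does for a sequence of lm fits, that every comparison uses the residual mean square of the largest model as its denominator; the numbers are copied from the table and the object names are illustrative.

```r
# Residual sums of squares and degrees of freedom from the anova table above.
rss    <- c(m1 = 0.087999, m2 = 0.087741, m3 = 0.027085)
res_df <- c(m1 = 27, m2 = 26, m3 = 25)

# Denominator: residual mean square of the largest model (model 3).
mse_full <- rss[["m3"]] / res_df[["m3"]]

# Drop in RSS and in residual df at each step (m1 -> m2, m2 -> m3).
ss_added <- -diff(rss)
df_added <- -diff(res_df)

f_stat <- (ss_added / df_added) / mse_full
p_val  <- pf(f_stat, df_added, res_df[["m3"]], lower.tail = FALSE)

print(cbind(ss_added, f_stat = round(f_stat, 3), p_val = signif(p_val, 3)))
```

Small differences in the last digits relative to the printed table come from the rounding of the RSS values. Adding income to price barely changes the RSS, while adding temp removes most of the remaining variation.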
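(3) Stepwise
Starting from the full model, stepAIC() drops or adds one variable at a time, in both directions, until the AIC no longer decreases.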
library(MASS)
fit<-ice.lme
step<-stepAIC(fit,direction="both")
## Start: AIC=-194.31
## IC ~ price + income + temp
##
## Df Sum of Sq RSS AIC
## - price 1 0.000264 0.027348 -196.03
## <none> 0.027085 -194.31
## - income 1 0.006406 0.033490 -190.15
## - temp 1 0.060656 0.087741 -162.22
##
## Step: AIC=-196.03
## IC ~ income + temp
##
## Df Sum of Sq RSS AIC
## <none> 0.027348 -196.03
## + price 1 0.000264 0.027085 -194.31
## - income 1 0.006618 0.033967 -191.74
## - temp 1 0.061180 0.088529 -163.96
step$anova
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## IC ~ price + income + temp
##
## Final Model:
## IC ~ income + temp
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 25 0.02708455 -194.3065
## 2 - price 1 0.0002637195 26 0.02734827 -196.0255
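The AIC values in the stepwise trace can be checked by hand. For lm fits, stepAIC() relies on extractAIC(), which computes n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients. The sketch below assumes n = 29 (from the residual degrees of freedom) and uses a hypothetical helper aic_lm that is not part of MASS.

```r
# Hypothetical helper mirroring the extractAIC() formula for linear models.
aic_lm <- function(rss, n_coef, n_obs) {
  n_obs * log(rss / n_obs) + 2 * n_coef
}

n <- 29  # assumption: 25 residual df + 4 coefficients in the full model

# Residual deviances copied from the step$anova output above.
aic_full    <- aic_lm(rss = 0.02708455, n_coef = 4, n_obs = n)  # price + income + temp
aic_reduced <- aic_lm(rss = 0.02734827, n_coef = 3, n_obs = n)  # income + temp

print(round(c(full = aic_full, reduced = aic_reduced), 4))
```

Dropping price raises the RSS only slightly while saving one parameter, so the penalty term decreases by more than the fit term increases and the reduced model has the lower AIC. This scale differs from AIC() by an additive constant, so only differences between models are meaningful.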
Based on the three methods, temp and income are retained as independent variables; both have a significant influence on the consumption of ice cream, whereas price does not.
4. Regression
plot(price,IC,main="IC vs price")
ip.lm<-lm(IC~price)
abline(ip.lm,lwd=2)
plot(income,IC,pch=21,main="IC vs income")
ii.lm<-lm(IC~income)
abline(ii.lm,lwd=2)
plot(temp,IC,pch=21,main="IC vs temperature")
it.lm<-lm(IC~temp)
abline(it.lm,lwd=2)
confint(fit, level=0.95)
## 2.5 % 97.5 %
## (Intercept) -0.4163070855 0.591796012
## price -1.9991527273 1.226437341
## income 0.0004005437 0.004834700
## temp 0.0022605618 0.003977594
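These intervals are the estimate plus or minus a t quantile times the standard error. A short sketch that rebuilds them from the full-model coefficient table, with est and se as illustrative names:

```r
# Estimates and standard errors copied from the full-model summary in section 3.
est <- c(price = -0.3863577, income = 0.0026176, temp = 0.0031191)
se  <- c(price =  0.7830856, income = 0.0010765, temp = 0.0004168)
res_df <- 25  # residual degrees of freedom of the full model

# Two-sided 95% interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df = res_df)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(ci)
```

The interval for price contains zero, while those for income and temp do not, which agrees with the t tests in the coefficient table.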
5. Summary
model.lm<-lm(IC~income+temp)
summary(model.lm)
##
## Call:
## lm(formula = IC ~ income + temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.059121 -0.021892 0.003275 0.020605 0.073075
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0224003 0.0988245 -0.227 0.8225
## income 0.0026544 0.0010582 2.508 0.0187 *
## temp 0.0031289 0.0004103 7.627 4.28e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03243 on 26 degrees of freedom
## Multiple R-squared: 0.6918, Adjusted R-squared: 0.6681
## F-statistic: 29.18 on 2 and 26 DF, p-value: 2.262e-07
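To make the final model concrete, the sketch below applies the fitted coefficients to new values. The helper predict_ic is hypothetical and only restates the fitted equation; with the fitted object available, predict(model.lm, newdata) would be the usual route.

```r
# Coefficients copied from the summary(model.lm) output above.
b <- c(intercept = -0.0224003, income = 0.0026544, temp = 0.0031289)

# Hypothetical helper: IC = intercept + b_income * income + b_temp * temp.
predict_ic <- function(income, temp) {
  b[["intercept"]] + b[["income"]] * income + b[["temp"]] * temp
}

# Prediction at the sample means reported by summary(data.table).
ic_at_means <- predict_ic(income = 84.41, temp = 48.34)
print(round(ic_at_means, 4))

# Change in predicted consumption for a 10 degree F rise, income held fixed.
delta_temp10 <- predict_ic(84.41, 58.34) - predict_ic(84.41, 48.34)
print(round(delta_temp10, 4))
```

Because a least-squares fit with an intercept passes through the sample means, the first prediction should land close to the mean IC of 0.3529 reported in section 2.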