In this project, a multiple linear regression model is developed to test which variables influence the consumption of ice cream.

1. Data selection

The dataset was selected from “100+ Interesting Data Sets for Statistics”: http://lib.stat.cmu.edu/DASL/Datafiles/IceCream.html

Variables:

Independent variables (three):

  1. Price: price of ice cream per pint, in dollars;

  2. Income: weekly family income, in dollars;

  3. Temp: mean temperature, in degrees F.

Dependent variable: IC, ice cream consumption in pints per capita.

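The data were gathered over 30 four-week periods, from 03/18/51 to 07/11/53. Note that the regression output in this report is based on 29 observations (for example, the one-predictor model has 27 residual degrees of freedom).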

Objective:

Identify the factors that influence the consumption of ice cream.

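Null hypotheses:

H0(1): price has no effect on the consumption of ice cream (its regression coefficient is zero);

H0(2): income has no effect on the consumption of ice cream (its regression coefficient is zero);

H0(3): temperature has no effect on the consumption of ice cream (its regression coefficient is zero).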

2. Read the dataset

Data

data.table<-read.csv("C:\\Users\\Echo\\Desktop\\2015 Spring\\Regression\\Project 2\\ice cream.csv", header=TRUE)
head(data.table)
##      IC price income temp
## 1 0.386 0.270     78   41
## 2 0.374 0.282     79   56
## 3 0.393 0.277     81   63
## 4 0.425 0.280     80   68
## 5 0.406 0.272     76   69
## 6 0.344 0.262     78   65
summary(data.table)
##        IC             price            income           temp      
##  Min.   :0.2560   Min.   :0.2620   Min.   :76.00   Min.   :24.00  
##  1st Qu.:0.3090   1st Qu.:0.2700   1st Qu.:79.00   1st Qu.:32.00  
##  Median :0.3440   Median :0.2770   Median :83.00   Median :47.00  
##  Mean   :0.3529   Mean   :0.2758   Mean   :84.41   Mean   :48.34  
##  3rd Qu.:0.3860   3rd Qu.:0.2820   3rd Qu.:87.00   3rd Qu.:63.00  
##  Max.   :0.4700   Max.   :0.2920   Max.   :96.00   Max.   :72.00

Scatterplot matrix

plot(data.table)

attach(data.table)
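
As an alternative to attach(), the modelling functions accept a data argument, which keeps column lookups explicit. The following is a minimal sketch that uses only the six rows printed by head() above; the name ice_head is hypothetical, and six rows are far too few for real inference.

```r
# Rebuild the six rows printed by head(data.table) above (illustration only).
ice_head <- data.frame(
  IC     = c(0.386, 0.374, 0.393, 0.425, 0.406, 0.344),
  price  = c(0.270, 0.282, 0.277, 0.280, 0.272, 0.262),
  income = c(78, 79, 81, 80, 76, 78),
  temp   = c(41, 56, 63, 68, 69, 65)
)

# Passing data= lets lm() find the columns without attach().
fit_head <- lm(IC ~ temp, data = ice_head)
coef(fit_head)

# with() gives the same scoped access for one-off expressions.
with(ice_head, cor(IC, temp))
```

With the full dataset, the same pattern would read lm(IC ~ price + income + temp, data = data.table).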

Correlation matrix

cor(data.table)
##                 IC       price      income        temp
## IC      1.00000000 -0.09138949 -0.04878008  0.78564133
## price  -0.09138949  1.00000000 -0.05501268 -0.02308821
## income -0.04878008 -0.05501268  1.00000000 -0.38317129
## temp    0.78564133 -0.02308821 -0.38317129  1.00000000
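
The correlation matrix shows the strength of each pairwise relationship but not its significance. As a rough sketch, the correlations of IC with each predictor can be converted to t statistics by hand. The values are copied from the output above, and n = 29 is an assumption inferred from the residual degrees of freedom reported by the models in this report.

```r
# Correlations of IC with each predictor, copied from the cor() output.
r <- c(price = -0.09138949, income = -0.04878008, temp = 0.78564133)

# Assumed sample size: 27 residual df in the one-predictor model + 2.
n <- 29

# t statistic and two-sided p-value for H0: correlation = 0.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)

round(cbind(r, t_stat, p_val), 4)
```

For price, this t statistic is the same quantity as the t value of the slope in the simple regression of IC on price.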

3. Variable entry techniques

(1) Entry-wise

All variables are entered simultaneously.

ice.lme<-lm(IC~price+income+temp)
summary(ice.lme)
## 
## Call:
## lm(formula = IC ~ price + income + temp)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059405 -0.015665  0.005229  0.017157  0.070515 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0877445  0.2447400   0.359   0.7230    
## price       -0.3863577  0.7830856  -0.493   0.6261    
## income       0.0026176  0.0010765   2.432   0.0225 *  
## temp         0.0031191  0.0004168   7.483 7.78e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03291 on 25 degrees of freedom
## Multiple R-squared:  0.6948, Adjusted R-squared:  0.6582 
## F-statistic: 18.97 on 3 and 25 DF,  p-value: 1.256e-06
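
The overall F statistic and the adjusted R-squared in the summary above both follow from the multiple R-squared. A small sketch of that arithmetic, using the rounded values printed by summary() (so the results agree only up to rounding):

```r
r2 <- 0.6948  # Multiple R-squared from the summary above
k  <- 3       # number of predictors
n  <- 29      # 25 residual df + 4 estimated coefficients

# Overall F test of H0: all slopes are zero.
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_val  <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

# Adjusted R-squared penalises the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

c(F = f_stat, p = p_val, adj_r2 = adj_r2)
```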

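(2) Hierarchical

Variables are entered one at a time, in the order price, income, temp, and the nested models are then compared with anova().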

ice.lm2<-lm(IC~price)
summary(ice.lm2)
## 
## Call:
## lm(formula = IC ~ price)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09617 -0.03871 -0.01017  0.03529  0.11976 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.5311     0.3738   1.421    0.167
## price        -0.6460     1.3546  -0.477    0.637
## 
## Residual standard error: 0.05709 on 27 degrees of freedom
## Multiple R-squared:  0.008352,   Adjusted R-squared:  -0.02838 
## F-statistic: 0.2274 on 1 and 27 DF,  p-value: 0.6373
ice.lm3<-lm(IC~price+income)
summary(ice.lm3)
## 
## Call:
## lm(formula = IC ~ price + income)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.098772 -0.034685 -0.009381  0.034351  0.117713 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.5777990  0.4161919   1.388    0.177
## price       -0.6669640  1.3804937  -0.483    0.633
## income      -0.0004845  0.0017534  -0.276    0.784
## 
## Residual standard error: 0.05809 on 26 degrees of freedom
## Multiple R-squared:  0.01126,    Adjusted R-squared:  -0.0648 
## F-statistic: 0.148 on 2 and 26 DF,  p-value: 0.8632
ice.lm4<-lm(IC~price+income+temp)
summary(ice.lm4)
## 
## Call:
## lm(formula = IC ~ price + income + temp)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059405 -0.015665  0.005229  0.017157  0.070515 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0877445  0.2447400   0.359   0.7230    
## price       -0.3863577  0.7830856  -0.493   0.6261    
## income       0.0026176  0.0010765   2.432   0.0225 *  
## temp         0.0031191  0.0004168   7.483 7.78e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03291 on 25 degrees of freedom
## Multiple R-squared:  0.6948, Adjusted R-squared:  0.6582 
## F-statistic: 18.97 on 3 and 25 DF,  p-value: 1.256e-06
anova(ice.lm2,ice.lm3,ice.lm4)
## Analysis of Variance Table
## 
## Model 1: IC ~ price
## Model 2: IC ~ price + income
## Model 3: IC ~ price + income + temp
##   Res.Df      RSS Df Sum of Sq       F    Pr(>F)    
## 1     27 0.087999                                   
## 2     26 0.087741  1  0.000258  0.2379      0.63    
## 3     25 0.027085  1  0.060656 55.9880 7.778e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
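
The F statistics in the table above can be reproduced from the residual sums of squares. In this sketch, each added term is tested against the residual variance of the largest model (Model 3), which is how the printed values appear to be computed; the inputs are the rounded figures from the table.

```r
# Residual sums of squares and residual df for Models 1 to 3.
rss    <- c(0.087999, 0.087741, 0.027085)
res_df <- c(27, 26, 25)

# Residual variance of the largest model is the common denominator.
sigma2 <- rss[3] / res_df[3]

# Sum of squares and df gained at each step (income, then temp).
ss_added <- -diff(rss)
df_added <- -diff(res_df)

f_stat <- (ss_added / df_added) / sigma2
p_val  <- pf(f_stat, df_added, res_df[3], lower.tail = FALSE)

data.frame(term = c("income", "temp"), ss_added, f_stat, p_val)
```

Adding income after price barely reduces the residual sum of squares, while adding temp removes most of it.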

(3) Stepwise

library(MASS)
fit<-ice.lme
step<-stepAIC(fit,direction="both")
## Start:  AIC=-194.31
## IC ~ price + income + temp
## 
##          Df Sum of Sq      RSS     AIC
## - price   1  0.000264 0.027348 -196.03
## <none>                0.027085 -194.31
## - income  1  0.006406 0.033490 -190.15
## - temp    1  0.060656 0.087741 -162.22
## 
## Step:  AIC=-196.03
## IC ~ income + temp
## 
##          Df Sum of Sq      RSS     AIC
## <none>                0.027348 -196.03
## + price   1  0.000264 0.027085 -194.31
## - income  1  0.006618 0.033967 -191.74
## - temp    1  0.061180 0.088529 -163.96
step$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## IC ~ price + income + temp
## 
## Final Model:
## IC ~ income + temp
## 
## 
##      Step Df     Deviance Resid. Df Resid. Dev       AIC
## 1                                25 0.02708455 -194.3065
## 2 - price  1 0.0002637195        26 0.02734827 -196.0255
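
The AIC values in the stepwise trace can be checked by hand. For linear models, extractAIC() (used by stepAIC) computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The helper name aic_lm below is hypothetical, and n = 29 is inferred from the degrees of freedom in the output.

```r
n <- 29  # 25 residual df + 4 coefficients in the full model

# AIC up to an additive constant, as used for comparing lm fits.
aic_lm <- function(rss, n_coef) n * log(rss / n) + 2 * n_coef

aic_full    <- aic_lm(0.027085, n_coef = 4)  # IC ~ price + income + temp
aic_reduced <- aic_lm(0.027348, n_coef = 3)  # IC ~ income + temp

c(full = aic_full, reduced = aic_reduced)
```

Dropping price raises the RSS only slightly, so the saving of one parameter lowers the AIC and the reduced model is preferred.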

Based on all three methods, temp and income are retained as independent variables; both have a significant effect on the consumption of ice cream, while price does not.

4. Regression

plot(price,IC,main="IC vs price")
ip.lm<-lm(IC~price)
abline(ip.lm,lwd=2)

plot(income,IC,pch=21,main="IC vs income")
ii.lm<-lm(IC~income)
abline(ii.lm,lwd=2)

plot(temp,IC,pch=21,main="IC vs temperature")
it.lm<-lm(IC~temp)
abline(it.lm,lwd=2)

confint(fit, level=0.95)
##                     2.5 %      97.5 %
## (Intercept) -0.4163070855 0.591796012
## price       -1.9991527273 1.226437341
## income       0.0004005437 0.004834700
## temp         0.0022605618 0.003977594
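
Each interval above is the estimate plus or minus a t quantile times the standard error. A short sketch that rebuilds the income and temp intervals from the rounded coefficient table of the full model (25 residual degrees of freedom):

```r
# Estimates and standard errors copied from summary(ice.lme).
est <- c(income = 0.0026176, temp = 0.0031191)
se  <- c(income = 0.0010765, temp = 0.0004168)

# Critical value for a two-sided 95% interval.
t_crit <- qt(0.975, df = 25)

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```

Neither interval contains zero, which agrees with the significant t tests for income and temp.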

5. Summary

The final model keeps income and temp as predictors.

model.lm<-lm(IC~income+temp)
summary(model.lm)
## 
## Call:
## lm(formula = IC ~ income + temp)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059121 -0.021892  0.003275  0.020605  0.073075 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.0224003  0.0988245  -0.227   0.8225    
## income       0.0026544  0.0010582   2.508   0.0187 *  
## temp         0.0031289  0.0004103   7.627 4.28e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03243 on 26 degrees of freedom
## Multiple R-squared:  0.6918, Adjusted R-squared:  0.6681 
## F-statistic: 29.18 on 2 and 26 DF,  p-value: 2.262e-07
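
To illustrate how the fitted model can be read, the sketch below plugs the printed coefficients into the regression equation. The function name predict_ic and the example inputs (income 80, temp 60, both inside the observed ranges) are hypothetical; with the fitted object available, predict(model.lm, newdata = ...) would be the usual route.

```r
# Coefficients copied from summary(model.lm) above.
coef_final <- c(intercept = -0.0224003, income = 0.0026544, temp = 0.0031289)

# Predicted consumption (pints per capita) for a given income and temperature.
predict_ic <- function(income, temp) {
  unname(coef_final["intercept"] +
           coef_final["income"] * income +
           coef_final["temp"] * temp)
}

predict_ic(income = 80, temp = 60)

# Change in predicted consumption for a 10-degree rise, income held fixed.
predict_ic(80, 70) - predict_ic(80, 60)
```

Because the model is linear, the second value is ten times the temp coefficient regardless of the income level chosen.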