This code compares models and checks the relationship between variables.
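# Load the natural gas usage data
dat <- read.csv("http://www.cknudson.com/data/MacNaturalGas.csv")
# Simple linear model: therms as a function of month
mod1 <- lm(therms ~ month, data = dat)
# Scatterplot with the fitted line overlaid
with(dat, plot(therms ~ month))
abline(mod1)
# Add a squared term and fit the quadratic model
dat$monthsquared <- dat$month^2
mod2 <- lm(therms ~ month + monthsquared, data = dat)
# F-test comparing the nested models
anova(mod1, mod2)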
## Analysis of Variance Table
##
## Model 1: therms ~ month
## Model 2: therms ~ month + monthsquared
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)
## 1     97 1709752
## 2     96  444221  1   1265531 273.49 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A low p-value in the ANOVA F-test means the additional term significantly improves the fit, so the more complex model (here mod2) is preferred.
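To see where the F statistic in the table comes from, it can be recomputed by hand. This is a minimal sketch that uses only the RSS and residual degrees of freedom printed above; the variable names are illustrative and not part of the original code.

```r
# Values copied from the ANOVA table above
rss1 <- 1709752  # residual sum of squares, mod1
rss2 <- 444221   # residual sum of squares, mod2
df1 <- 97        # residual degrees of freedom, mod1
df2 <- 96        # residual degrees of freedom, mod2

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability under an F distribution with (df1 - df2, df2) degrees of freedom
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

print(round(f_stat, 2))
print(p_value)
```

The rounded statistic should agree with the F column of the table, and the p-value should fall below the 2.2e-16 threshold that R prints.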
AIC(mod1)
## [1] 1252.867
AIC(mod2)
## [1] 1121.437
BIC(mod1)
## [1] 1260.652
BIC(mod2)
## [1] 1131.817
AIC and BIC are two ways to compare models. The smaller the AIC or BIC, the better the model. As a rule of thumb, a difference of 10 or more in AIC is considered strong evidence in favor of the model with the lower AIC.
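The comparison can be made explicit by computing AIC differences from the values printed in these notes. The sketch below also checks the two criteria against each other: since AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), the gap BIC - AIC should equal k(log(n) - 2). Here n = 99 is inferred from the 97 residual degrees of freedom of mod1 plus its two coefficients, and k counts the coefficients plus the error variance; the variable names are illustrative.

```r
# AIC and BIC values copied from the output in these notes
aic <- c(mod1 = 1252.867, mod2 = 1121.437, mod3 = 1180.611)
bic <- c(mod1 = 1260.652, mod2 = 1131.817)

# Difference from the best (lowest) AIC; values of 10 or more are strong evidence
delta_aic <- aic - min(aic)
print(delta_aic)

# Sanity check: BIC - AIC = k * (log(n) - 2)
n <- 97 + 2                  # residual df of mod1 plus its two coefficients
k <- c(mod1 = 3, mod2 = 4)   # regression coefficients plus the error variance
penalty_gap <- k * (log(n) - 2)
print(round(penalty_gap, 3))
print(bic - aic[names(bic)])
```

Both mod1 and mod3 sit far more than 10 AIC units above mod2, so mod2 is the preferred model under this rule of thumb.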
dat$monthsqrt <- sqrt(dat$month)
mod3 <- lm(therms ~ month + monthsqrt, data = dat)
AIC(mod3)
## [1] 1180.611
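mod1 is nested in both mod2 and mod3, but mod2 and mod3 are not nested in each other. For non-nested models, use AIC or BIC; here the AIC values (1121.437 for mod2 vs. 1180.611 for mod3) favor mod2. For nested models, use ANOVA or a likelihood ratio test (LRT).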
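A likelihood ratio test for two nested linear models can be sketched in base R as follows. To keep the example self-contained it uses R's built-in cars dataset as a stand-in rather than the natural gas data, and the object names (small, big, lrt_stat) are hypothetical.

```r
# Stand-in data: the built-in cars dataset, not the natural gas data
small <- lm(dist ~ speed, data = cars)               # nested (simpler) model
big <- lm(dist ~ speed + I(speed^2), data = cars)    # adds a quadratic term

ll_small <- logLik(small)
ll_big <- logLik(big)

# LRT statistic: twice the gain in log-likelihood from the extra term
lrt_stat <- 2 * (as.numeric(ll_big) - as.numeric(ll_small))

# Degrees of freedom: number of extra parameters in the larger model
lrt_df <- attr(ll_big, "df") - attr(ll_small, "df")

# Compare against a chi-squared distribution (an asymptotic approximation)
lrt_p <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

print(c(statistic = lrt_stat, df = lrt_df, p.value = lrt_p))
```

The chi-squared reference distribution is only approximate in finite samples, so for linear models the F-test from anova() will usually give a slightly different p-value.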