R Markdown

This code fits several regression models of natural gas usage (therms) on month, checks the relationship between the two variables, and compares the models.

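# Load the natural gas data
dat <- read.csv("http://www.cknudson.com/data/MacNaturalGas.csv")

# Simple linear regression of therms on month, with scatterplot and fitted line
mod1 <- lm(therms ~ month, data = dat)
with(dat, plot(therms ~ month))
abline(mod1)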

# Add a quadratic term, then compare the nested models with an F-test
dat$monthsquared <- dat$month^2
mod2 <- lm(therms ~ month + monthsquared, data = dat)
anova(mod1, mod2)
## Analysis of Variance Table
## 
## Model 1: therms ~ month
## Model 2: therms ~ month + monthsquared
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     97 1709752                                  
## 2     96  444221  1   1265531 273.49 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

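A low p-value in the ANOVA F-test means the extra term explains significantly more variation, so the more complex model is preferred. Here the p-value is below 2.2e-16, so mod2 fits better than mod1.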
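
The F statistic in the table can be reproduced by hand from the two residual sums of squares. This is a minimal sketch that copies the RSS and residual degrees of freedom from the output above; the variable names are illustrative only.

```r
# Values copied from the ANOVA table
rss1 <- 1709752; df1 <- 97   # mod1: therms ~ month
rss2 <- 444221;  df2 <- 96   # mod2: therms ~ month + monthsquared

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability under the F(df1 - df2, df2) distribution
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(f_stat, 2)  # 273.49, matching the table
p_value           # far below 2.2e-16
```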

AIC(mod1)
## [1] 1252.867
AIC(mod2)
## [1] 1121.437
BIC(mod1)
## [1] 1260.652
BIC(mod2)
## [1] 1131.817

AIC and BIC are two ways to compare models. The smaller the AIC or BIC, the better the model. As a rule of thumb, a difference of about 10 in AIC is considered strong evidence in favor of the model with the lower value.
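
To see where these numbers come from, the sketch below recomputes AIC and BIC for a Gaussian linear model from its residual sum of squares. It assumes n = 99 observations, inferred from the 97 residual degrees of freedom of mod1 plus its 2 coefficients; `gaussian_ic` is a hypothetical helper, not a function from R.

```r
# Assumed sample size: residual df of mod1 (97) + 2 estimated coefficients
n <- 99

# Hypothetical helper: AIC and BIC of a linear model from its RSS
gaussian_ic <- function(rss, n, n_coef) {
  k <- n_coef + 1  # regression coefficients plus the error variance
  neg2_loglik <- n * log(2 * pi) + n * log(rss / n) + n
  c(AIC = neg2_loglik + 2 * k,
    BIC = neg2_loglik + log(n) * k)
}

ic1 <- gaussian_ic(1709752, n, n_coef = 2)  # mod1
ic2 <- gaussian_ic(444221,  n, n_coef = 3)  # mod2

round(rbind(mod1 = ic1, mod2 = ic2), 3)
```

The results should agree with the AIC() and BIC() output above up to rounding. BIC penalizes each parameter by log(n) rather than 2, so for this sample size it penalizes extra terms more heavily than AIC.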

# Alternative model: use the square root of month instead of its square
dat$monthsqrt <- sqrt(dat$month)
mod3 <- lm(therms ~ month + monthsqrt, data = dat)
AIC(mod3)
## [1] 1180.611
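
The three AIC values reported so far can be put side by side as differences from the best model. This short sketch copies the values from the output above; the vector names are illustrative.

```r
# AIC values copied from the output above
aic <- c(mod1 = 1252.867, mod2 = 1121.437, mod3 = 1180.611)

# Difference of each model from the lowest AIC
delta <- aic - min(aic)
sort(delta)
# mod2 has the lowest AIC; mod3 is about 59 higher and mod1 about 131 higher
```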

mod1 is nested in both mod2 and mod3, but mod2 and mod3 are not nested in each other. For non-nested models, use AIC or BIC. For nested models, use ANOVA or a likelihood ratio test (LRT).
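
For the nested pair mod1 and mod2, a likelihood ratio test can be sketched from the residual sums of squares in the ANOVA table. This assumes Gaussian errors and n = 99 observations (inferred from the residual degrees of freedom, not stated in the original); the variable names are illustrative.

```r
# Values copied from the ANOVA table; n is an inferred assumption
n    <- 99
rss1 <- 1709752  # mod1: therms ~ month
rss2 <- 444221   # mod2: therms ~ month + monthsquared

# For Gaussian linear models, 2 * (logLik2 - logLik1) = n * log(RSS1 / RSS2)
lrt_stat <- n * log(rss1 / rss2)

# mod2 has one more parameter than mod1, so compare to chi-squared with 1 df
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

lrt_stat
p_value
```

With the fitted model objects available, the same statistic can be obtained as `as.numeric(2 * (logLik(mod2) - logLik(mod1)))`. The LRT relies on a large-sample chi-squared approximation, whereas the F-test from anova() is exact under normal errors, so the two p-values will differ slightly while leading to the same conclusion here.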