library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(airpass)
attach(airpass)
plot(pass ~ year, type = "l")

We want the seasonal variation to be more consistent so let’s transform the data.

plot(log(pass)~year, type = "l")

This is much better so we’ll use this for our model.

justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear
mofactor <-factor(round(modecimal*12)) 
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")  

This created a month category for each data point. This can help us detect if the seasonal variation is significant or not. Let’s take a summary just to make sure the code worked right.

summary(mofactor)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##  12  12  12  12  12  12  12  12  12  12  12  12

We have equal observations of every month which is perfect. Now we can add the months as variables into the data set and create our model.

airpass$justyear <- justyear
airpass$mofactor <- mofactor
mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
summary(mod)
## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16

Notice that January isn’t on the list of predictors. This is January is the intercept, meaning that if no other months are present, January is the only option. It also means the t-tests in the summary are done under the assumption that the given month is insignificant in addition to January. With the p-values we got, this means that December, March, and February are all insignificant additions to January. Let’s plot the model to see just how well it fits using each month as a predictor.

plot(log(pass)~ year, type="l" )
lines(airpass$year, mod$fitted.values,col="red")

It’s not perfect but the fit is very good in most spots and fitted almost perfectly in many others.

Also in our class today, we peer reviewed project papers. As always, it is helpful to get a second opinion on your work. My partner had a lot of helpful questions and suggestions that tweaked our paper. For example, she had the great idea of discussing the possible multicollinearity of the data in more detail. Another thing she helped with was structure. She suggested we rearrange a couple paragraphs to make the paper sound more cohesive, which was something I didn’t notice. We also found points in each others papers that seemed a bit excessive and we were both able to offer helpful (I hope) advice for areas that could be trimmed. For future peer reviews in this class, my best suggestion is to include as much as possible before the peer review, as it will always be easier for your partner to know what to cut than what to add.