We learned how to transform seasonal data whose variation is not constant. The seasonal pattern itself can be modeled with dummy variables or with trig transformations, but our model's assumptions do not hold unless the variation is constant, so the response has to be transformed first.

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(airpass)
attach(airpass)
plot(pass~year, type = "l")

The data are seasonal with an increasing trend, but the swings get more dramatic each year. We can try transforming the data to even out the swings.

plot(sqrt(pass)~year, type = "l")

plot(log(pass)~year, type = "l")
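
As a rough numeric check on what the plots suggest, the sketch below compares the within-year standard deviation under each transformation. It assumes the built-in `AirPassengers` series (from R's `datasets` package), which appears to hold the same monthly counts as `airpass`; the names `ap_year`, `sd_by_year`, and `spread` are illustrative only.

```r
# Assumption: built-in AirPassengers (Jan 1949 to Dec 1960) stands in for airpass.
pass_vec <- as.numeric(AirPassengers)
ap_year  <- rep(1949:1960, each = 12)   # calendar year of each monthly value

# Standard deviation within each year, on the raw, sqrt, and log scales.
sd_by_year <- sapply(
  list(raw = pass_vec, sqrt = sqrt(pass_vec), log = log(pass_vec)),
  function(x) tapply(x, ap_year, sd)
)

# Ratio of the largest to the smallest yearly SD: closer to 1 means
# the swings are more nearly constant over time.
spread <- apply(sd_by_year, 2, function(s) max(s) / min(s))
round(spread, 2)
```

A ratio near 1 for a given scale would support the constant-variation assumption on that scale.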

The log transformation looks better: the swings seem to be about the same size from year to year. Now we can build a model, using dummy variables for the months to capture the seasonal pattern.

justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear
mofactor <-factor(round(modecimal*12))
head(cbind(airpass$year, mofactor))
##               mofactor
## [1,] 49.08333        2
## [2,] 49.16667        3
## [3,] 49.25000        4
## [4,] 49.33333        5
## [5,] 49.41667        6
## [6,] 49.50000        7
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")
airpass$justyear <- justyear
airpass$mofactor <- mofactor
head(airpass)
##   pass     year justyear mofactor
## 1  112 49.08333       49      Feb
## 2  118 49.16667       49      Mar
## 3  132 49.25000       49      Apr
## 4  129 49.33333       49      May
## 5  121 49.41667       49      Jun
## 6  135 49.50000       49      Jul
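
One thing worth checking in the output above is which calendar month each label refers to. If `year` marks the end of each month (an assumption: the first six counts match January to June 1949 in the classic airline passenger series), then 49.08333 is January 1949, the labels above run one month late, and each December (for example 50.00000) is assigned to the following `justyear`. A minimal sketch of an alternative mapping under that assumption, using hypothetical names `yr`, `cal_year`, `month_idx`, and `month_lab`:

```r
# Hypothetical input: the first 13 decimal years as they appear in airpass$year.
yr <- 49 + (1:13) / 12

# Assumption: each value marks the END of its month, so 49.08333 is January 1949.
cal_year  <- floor(yr - 1 / 24)        # step back half a month before flooring
month_idx <- round(yr * 12) %% 12      # 1..11 for Jan..Nov, 0 for Dec
month_idx[month_idx == 0] <- 12        # map the wrapped value back to December
month_lab <- factor(month.abb[month_idx], levels = month.abb)

data.frame(year = yr, cal_year, month_lab)
```

Under this mapping the first row would be labeled Jan with calendar year 49, and the twelfth row (50.00000) would be Dec of year 49.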

Now we have the months and years in a more usable form and can build our model.

mod <- lm(log(pass)~justyear + mofactor)
summary(mod)
## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16
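
Because the response is `log(pass)`, the coefficients act multiplicatively on the passenger count. The sketch below back-transforms two of them, using values copied by hand from the summary above; the variable names are illustrative only.

```r
# Values copied from the summary(mod) output above.
b0     <- -1.214998   # (Intercept)
b_year <-  0.120826   # justyear
b_feb  <-  0.031390   # mofactorFeb

# Annual growth implied by the justyear slope, in percent.
growth_pct <- (exp(b_year) - 1) * 100

# Fitted value for the first row (justyear = 49, mofactor = "Feb"),
# back-transformed from the log scale to the passenger scale.
fit_first <- exp(b0 + b_year * 49 + b_feb)

round(c(growth_pct = growth_pct, fit_first = fit_first), 2)
```

This works out to roughly 12.8% growth per year and a fitted value of about 114 for the first row, against an observed 112.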

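We can see that the coefficients for neighbouring months are often similar (for example, Apr to Jun are all around 0.14 to 0.16, and Aug and Sep are both about 0.39), so we could try grouping the months into seasons. We'll plot the fitted values against the data to see how they compare.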

plot(log(pass)~year,  type = "l")
lines(airpass$year, fitted(mod), col = 4)

The model looks pretty good: R-squared is 0.9835 and the residual standard error is 0.0593 on the log scale. It lets us estimate closely how many passengers would have flown in a given month within our time frame.
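
The trig transformations mentioned at the start are a more compact alternative to eleven month dummies. A minimal sketch, assuming the built-in `AirPassengers` series as a stand-in for `airpass` and an arbitrary choice of two harmonics (annual and semi-annual); `ap` and `trig_mod` are illustrative names, not part of the analysis above.

```r
# Assumption: built-in AirPassengers stands in for airpass.
ap <- data.frame(
  pass = as.numeric(AirPassengers),
  t    = as.numeric(time(AirPassengers))   # decimal year, e.g. 1949.083
)

# Linear trend plus one annual and one semi-annual sine/cosine pair.
trig_mod <- lm(
  log(pass) ~ t + sin(2 * pi * t) + cos(2 * pi * t) +
                  sin(4 * pi * t) + cos(4 * pi * t),
  data = ap
)

summary(trig_mod)$r.squared
```

This uses four seasonal terms instead of eleven, at the cost of forcing the seasonal pattern into a smooth shape; adding more harmonics would relax that.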

During our peer review we found that most of our errors were organizational. We were given lots of good feedback on how to place things in our paper efficiently and intuitively, and on how the sections could flow logically. We were also given some feedback on other possible factors to look into, and on transformations that could make our data more intuitive.