We learned how to handle seasonal data whose variation is not constant. Our model's assumptions do not hold when the variation is not constant, so we first transform the data to stabilize it. We can then use dummy variables, or possibly trig transformations, to model the seasonal pattern.
library(faraway)
data(airpass)
attach(airpass)
plot(pass~year, type = "l")
The series trends upward with a seasonal pattern, but the swings get more dramatic each year. We can try transforming the data to even out those swings.
plot(sqrt(pass)~year, type = "l")
plot(log(pass)~year, type = "l")
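As a rough numerical check on what the plots suggest, the within-year standard deviation can be compared on the raw and log scales. This is only a sketch: it uses the built-in AirPassengers series from base R, which is assumed here to hold the same monthly counts as airpass, and the names yr, sd_raw, and sd_log are made up for illustration.

```r
# Assumption: AirPassengers (base R) matches the airpass counts, 1949 to 1960.
pass <- as.numeric(AirPassengers)
yr <- rep(1949:1960, each = 12)      # calendar year for each of the 144 months

# Spread of the monthly counts within each year, on both scales
sd_raw <- tapply(pass, yr, sd)
sd_log <- tapply(log(pass), yr, sd)

# How much the spread grows from the first year to the last
raw_ratio <- sd_raw[["1960"]] / sd_raw[["1949"]]
log_ratio <- sd_log[["1960"]] / sd_log[["1949"]]
print(round(c(raw = raw_ratio, log = log_ratio), 2))
```

If the log scale is doing its job, the second ratio should sit much closer to 1 than the first.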
The log transformation looks better: the swings now seem to be about the same size from year to year. Now we can build a model, using dummy variables to group the observations by month.
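# Integer part of the decimal year
justyear <- floor(airpass$year)
# Fractional part of the decimal year
modecimal <- airpass$year - justyear
# Fractional part in twelfths (0 to 11), stored as a factor
mofactor <- factor(round(modecimal*12))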
head(cbind(airpass$year, mofactor))
## mofactor
## [1,] 49.08333 2
## [2,] 49.16667 3
## [3,] 49.25000 4
## [4,] 49.33333 5
## [5,] 49.41667 6
## [6,] 49.50000 7
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May",
"Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")
airpass$justyear <- justyear
airpass$mofactor <- mofactor
head(airpass)
## pass year justyear mofactor
## 1 112 49.08333 49 Feb
## 2 118 49.16667 49 Mar
## 3 132 49.25000 49 Apr
## 4 129 49.33333 49 May
## 5 121 49.41667 49 Jun
## 6 135 49.50000 49 Jul
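The output above labels the first observation (49.08333) as Feb. If the year column encodes year plus month/12, so that 49.08333 is January 1949 and 50.00000 is December 1949 (an assumption about the data that is worth checking against its help page), then the month labels are shifted by one and justyear is one year ahead for the December rows. A minimal sketch of an alternative indexing under that assumption follows. The names year_demo, idx, calyear, monum, and mo_alt are hypothetical, and the demo vector stands in for airpass$year.

```r
# Stand-in for airpass$year: 144 months coded as year + month/12 (assumed encoding)
year_demo <- 49 + (1:144) / 12

idx <- round(year_demo * 12)          # months counted on one running scale
calyear <- (idx - 1) %/% 12           # 49 for all twelve months of 1949
monum <- (idx - 1) %% 12 + 1          # 1 = Jan, ..., 12 = Dec
mo_alt <- factor(monum, levels = 1:12, labels = month.abb)

head(data.frame(year = year_demo, calyear, mo_alt))
```

A model refit with these variables would not match the output printed in this section exactly, since the month labels and some year values would differ.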
Now we have the months and years in a more usable form and can build our model.
mod <- lm(log(pass)~justyear + mofactor)
summary(mod)
##
## Call:
## lm(formula = log(pass) ~ justyear + mofactor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.156370 -0.041016 0.003677 0.044069 0.132324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.214998 0.081277 -14.949 < 2e-16 ***
## justyear 0.120826 0.001432 84.399 < 2e-16 ***
## mofactorFeb 0.031390 0.024253 1.294 0.198
## mofactorMar 0.019404 0.024253 0.800 0.425
## mofactorApr 0.159700 0.024253 6.585 1.00e-09 ***
## mofactorMay 0.138500 0.024253 5.711 7.19e-08 ***
## mofactorJun 0.146196 0.024253 6.028 1.58e-08 ***
## mofactorJul 0.278411 0.024253 11.480 < 2e-16 ***
## mofactorAug 0.392422 0.024253 16.180 < 2e-16 ***
## mofactorSep 0.393196 0.024253 16.212 < 2e-16 ***
## mofactorOct 0.258630 0.024253 10.664 < 2e-16 ***
## mofactorNov 0.130541 0.024253 5.382 3.28e-07 ***
## mofactorDec -0.003108 0.024253 -0.128 0.898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared: 0.9835, Adjusted R-squared: 0.982
## F-statistic: 649.4 on 12 and 131 DF, p-value: < 2.2e-16
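Because the response is log(pass), each coefficient acts multiplicatively on the passenger count. As a worked illustration, the two estimates below are copied from the summary output and converted to percentage changes. The variable names are made up for this sketch.

```r
b_year <- 0.120826   # justyear estimate from the summary
b_aug  <- 0.392422   # mofactorAug estimate from the summary

# exp(b) is the multiplicative effect on pass; subtract 1 for a percent change
pct_year <- (exp(b_year) - 1) * 100   # growth per one-unit increase in justyear
pct_aug  <- (exp(b_aug) - 1) * 100    # level labelled Aug relative to the baseline level

print(round(c(per_year = pct_year, aug_vs_baseline = pct_aug), 1))
```

This works out to roughly 13% growth per year, with the level labelled Aug sitting roughly 48% above the baseline level (the one labelled Jan, which has no coefficient of its own).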
We can see that the coefficients for some neighboring months are similar (Aug and Sep, for example), so we could try grouping the months together into seasons. We'll plot the fitted values against the data to see how they compare.
plot(log(pass)~year, type = "l")
lines(airpass$year, fitted(mod), col = 4)
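Since the model is fit to log(pass), its fitted values and predictions are on the log scale and need exp() to be read as passenger counts. The sketch below shows one way to do that. It is self-contained rather than a copy of the model above: it assumes the built-in AirPassengers series matches airpass, and it codes the year and month directly (yr, mo, fit, and new_obs are illustrative names).

```r
# Assumption: AirPassengers (base R) holds the same counts as airpass.
pass <- as.numeric(AirPassengers)
yr <- rep(49:60, each = 12)                                   # two-digit calendar year
mo <- factor(rep(month.abb, times = 12), levels = month.abb)  # Jan..Dec within each year

fit <- lm(log(pass) ~ yr + mo)

# Predict one month inside the observed time frame, with a 95% prediction interval
new_obs <- data.frame(yr = 55, mo = factor("Jul", levels = month.abb))
pred_log <- predict(fit, newdata = new_obs, interval = "prediction")

pred_pass <- exp(pred_log)   # back-transform fit, lwr, upr to passenger counts
print(round(pred_pass))
```

The interval endpoints are back-transformed along with the point prediction, so the resulting interval is not symmetric around the predicted count.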
The model looks pretty good: the fitted line tracks the data closely (adjusted R-squared of 0.982), which lets us estimate how many passengers would have flown in any month within our time frame.
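The trig option mentioned at the start could be sketched as follows, replacing the eleven month dummies with a few sine and cosine terms of period one year. This is an illustration only: it assumes the built-in AirPassengers series matches airpass, the names tt and trig_mod are made up, and the choice of two harmonics is arbitrary.

```r
# Assumption: AirPassengers (base R) holds the same counts as airpass.
pass <- as.numeric(AirPassengers)
tt <- as.numeric(time(AirPassengers))   # decimal year, one seasonal cycle per unit

# Linear trend plus annual and semi-annual harmonics (two harmonics chosen arbitrarily)
trig_mod <- lm(log(pass) ~ tt +
                 sin(2 * pi * tt) + cos(2 * pi * tt) +
                 sin(4 * pi * tt) + cos(4 * pi * tt))

print(summary(trig_mod)$r.squared)
```

This uses 6 coefficients instead of 13. Whether the fit is close enough to the dummy-variable model would need to be checked, for instance by comparing residual plots.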
During our peer review, we found that most of our errors were organizational. We were given lots of good feedback on how to arrange things in our paper efficiently and intuitively, and on how the paper could flow logically. We were also given some feedback on other possible factors to look into, and even on some transformations that could make our data more intuitive.