Seasonal Variation

Seasonal variation is the magnitude of swing constant through time. All of the models we talk about in this class are going to have constant seasonal variation. To ensure this, if we encounter nonconstant seasonal variation, we’re going to transform the data to make it constant.

require(faraway)
## Loading required package: faraway
## Warning: package 'faraway' was built under R version 3.3.3
data(airpass)

plot(pass ~ year, airpass, type = "l")

In the plot above, the seasonal variation is increasing. We don’t want this, so let’s look at ways to transform the variable “pass” and make the seasonal variance constant

par(mfrow = c(2,2))
plot((pass^0.75) ~ year, airpass, type = "l")
plot((pass^0.5) ~ year, airpass, type = "l")
plot((pass^0.25) ~ year, airpass, type = "l")
plot(log(pass) ~ year, airpass, type = "l")

par(mfrow = c(1, 2))
plot(pass ~ year, airpass, type = "l")
plot(log(pass) ~ year, airpass, type = "l")

It looks like taking the natural log of the pass variable allows us to work with constant seasonal variance. Yay! Note that really small lambdas also look more constant, but we may as well go with using the log transformation, since they look pretty similar and natural log is easier to interpret.

Modeling Seasonal Variation

Dummy Variables

Let’s say we’re working with a model that is just the trend, the seasonal component, and the error component. We will have a dummy variable for each unit of time. This means that if we’re looking at 12 months, we’ll have 12 dummy’s, if we’re looking at 4 seasons (or quarters) we’ll have 4 dummy’s, etc.

Let’s use airpass data to creat an example again. First we must clean up the data a bit (borrowed from in-class activity):

# Need to get just the month
justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear  #month as a decimal

mofactor <-factor(round(modecimal*12))  #month as a factor ie categorical

# Changes levels of factor to be mo names 
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")  

Model time

model2 <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(model2)
##  (Intercept)     justyear  mofactorFeb  mofactorMar  mofactorApr 
## -1.214997885  0.120825657  0.031389870  0.019403852  0.159699778 
##  mofactorMay  mofactorJun  mofactorJul  mofactorAug  mofactorSep 
##  0.138499729  0.146195892  0.278410898  0.392422029  0.393195995 
##  mofactorOct  mofactorNov  mofactorDec 
##  0.258630198  0.130540762 -0.003108143
summary(model2)
## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16

Note that all the dummy variables are specifically in comparison to January. This means that the difference between February and January, March and January, and December and January are not significantly different (statistically), but the other months are.

Let’s look at seasons instead of months:

levels(mofactor) <- list(Winter = c("Dec", "Jan", "Feb"), Spring = c("Mar", "Apr", "May"), Summer = c("Jun", "Jul", "Aug"), Fall = c("Sep", "Oct", "Nov"))

model3 <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(model3)
##    (Intercept)       justyear mofactorSpring mofactorSummer   mofactorFall 
##    -1.20197254     0.12076004     0.09641867     0.26289382     0.25133987
summary(model3)
## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27061 -0.07119  0.00114  0.06893  0.21364 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.201973   0.132877  -9.046 1.17e-15 ***
## justyear        0.120760   0.002404  50.229  < 2e-16 ***
## mofactorSpring  0.096419   0.023543   4.095 7.12e-05 ***
## mofactorSummer  0.262894   0.023543  11.167  < 2e-16 ***
## mofactorFall    0.251340   0.023543  10.676  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09983 on 139 degrees of freedom
## Multiple R-squared:  0.9503, Adjusted R-squared:  0.9489 
## F-statistic: 664.4 on 4 and 139 DF,  p-value: < 2.2e-16
plot(model3)

Not totally perfect, but now we’ve got a pretty good model with mostly constant seasonal variance and 4 significant dummy variables related to season! But let’s actually check and see if this seasonal model is better than the monthly model:

anova(model2, model3)
## Analysis of Variance Table
## 
## Model 1: log(pass) ~ justyear + mofactor
## Model 2: log(pass) ~ justyear + mofactor
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    131 0.46072                                  
## 2    139 1.38515 -8  -0.92443 32.857 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a statistically significant difference between the two models, meaning we should actually just go with the 12-month model (model2) unless we have a really good reason to examine seasons (we don’t).