Seasonal Variation

Constant (or Additive) Seasonal Variation: magnitude of swing is contant over time - good

Increasing Seasonal Variation: magnitude of swing is changing over time - bad so find a transformation so that you can have constant seasonal variation

library(faraway)

## Warning: package 'faraway' was built under R version 3.4.4

data(airpass)
attach(airpass)

We will look at the number of passengers over time. We want the swings in the line to be relatively equal.

Type = “l” makes the line rather than just the points on the graph.

plot(pass ~ year, type = "l")

This is pretty obviously increasing seasonal variation. We need to use transformations to make it constant.

I tried log first.

plot(log(pass) ~ year, type = "l")

This actually makes it much better. The magnitude looks to be about the same throughout the entire graph.

Modeling Seasonal Variation

A dummy variable is a yes or no variable or something binary. E.g. is it spring or not?

Assume the model has this general form:

\[{y_t} = {TR_t} + {SN_t} + {\epsilon_t}\]

y is the value of the time series at time t TR is the trend at time t SN is the seasonal factor at time t the epsilon is the error at time t

Using dummy variable to model seasonality: create a dummy varaible for each month (11) or season (3), transform if necessary to get constant varaition first

Let’s use dummy variables for the different seasons for the data above.

First we need to get only the year out of each data point.

justyear <- floor(airpass$year)
summary(justyear)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   49.00   52.00   55.00   54.58   58.00   61.00

As you can see, these show only the year and not the month.

Now we can get the months in decimals.

modecimal <- airpass$year - justyear
summary(modecimal)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2292  0.4583  0.4583  0.6875  0.9167

We can make each month a different category by simply factoring modecimal. We can also add labels to each of the levels or months.

mofactor <-factor(round(modecimal*12))
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
summary(mofactor)

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##  12  12  12  12  12  12  12  12  12  12  12  12

Make sure you check using head(mofactor) and tail(mofactor) if the months are correct.

We can now look at the model.

mod <- lm(log(pass) ~ 0+justyear + mofactor, data=airpass)
coef(mod)

##    justyear mofactorJan mofactorFeb mofactorMar mofactorApr mofactorMay 
##   0.1208257  -1.2149979  -1.1836080  -1.1955940  -1.0552981  -1.0764982 
## mofactorJun mofactorJul mofactorAug mofactorSep mofactorOct mofactorNov 
##  -1.0688020  -0.9365870  -0.8225759  -0.8218019  -0.9563677  -1.0844571 
## mofactorDec 
##  -1.2181060

summary(mod)

## 
## Call:
## lm(formula = log(pass) ~ 0 + justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## justyear     0.120826   0.001432   84.40   <2e-16 ***
## mofactorJan -1.214998   0.081277  -14.95   <2e-16 ***
## mofactorFeb -1.183608   0.079878  -14.82   <2e-16 ***
## mofactorMar -1.195594   0.079878  -14.97   <2e-16 ***
## mofactorApr -1.055298   0.079878  -13.21   <2e-16 ***
## mofactorMay -1.076498   0.079878  -13.48   <2e-16 ***
## mofactorJun -1.068802   0.079878  -13.38   <2e-16 ***
## mofactorJul -0.936587   0.079878  -11.72   <2e-16 ***
## mofactorAug -0.822576   0.079878  -10.30   <2e-16 ***
## mofactorSep -0.821802   0.079878  -10.29   <2e-16 ***
## mofactorOct -0.956368   0.079878  -11.97   <2e-16 ***
## mofactorNov -1.084457   0.079878  -13.58   <2e-16 ***
## mofactorDec -1.218106   0.079878  -15.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 9.734e+04 on 13 and 131 DF,  p-value: < 2.2e-16

Now, we can uses seasons instead of months.

I chose to use Dec, Jan and Feb as winter, Mar, Apr and May as spring, etc. You can make the seasons whatever you want.

season <- mofactor
levels(season) <- list(winter=c("Dec", "Jan", "Feb"), spring=c("Mar", "Apr", "May"), summer=c("Jun", "Jul", "Aug"), fall=c("Sep", "Oct", "Nov"))
summary(season)

## winter spring summer   fall 
##     36     36     36     36

We can look at the new model now.

mod2 <- lm(log(pass) ~ 0+justyear + season, data=airpass)
coef(mod2)

##     justyear seasonwinter seasonspring seasonsummer   seasonfall 
##    0.1207600   -1.2019725   -1.1055539   -0.9390787   -0.9506327

summary(mod2)

## 
## Call:
## lm(formula = log(pass) ~ 0 + justyear + season, data = airpass)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27061 -0.07119  0.00114  0.06893  0.21364 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## justyear      0.120760   0.002404  50.229  < 2e-16 ***
## seasonwinter -1.201973   0.132877  -9.046 1.17e-15 ***
## seasonspring -1.105554   0.132082  -8.370 5.52e-14 ***
## seasonsummer -0.939079   0.132082  -7.110 5.58e-11 ***
## seasonfall   -0.950633   0.132082  -7.197 3.50e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09983 on 139 degrees of freedom
## Multiple R-squared:  0.9997, Adjusted R-squared:  0.9997 
## F-statistic: 8.93e+04 on 5 and 139 DF,  p-value: < 2.2e-16

We can also look at graph with both models on it. We can compare the two models that way.

with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")
lines(airpass$year, mod2$fitted.values,col="red")

We can also compare the models with numbers as well. That might be easier to see which one is better.

anova(mod, mod2)

## Analysis of Variance Table
## 
## Model 1: log(pass) ~ 0 + justyear + mofactor
## Model 2: log(pass) ~ 0 + justyear + season
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    131 0.46072                                  
## 2    139 1.38515 -8  -0.92443 32.857 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This suggests the monthly model is better. From the graph, you can also see that the blue line maps the data a little better than the red line.

Learning Log Day 17

Megan Blaschko

April 8, 2018

Seasonal Variation

Modeling Seasonal Variation

Peer Review