Learning Log

Concepts

Today, we began class by discussing increasing seasonal variation and how transformations can make that variation constant. We then discussed how to model seasonal variation with dummy variables and briefly touched on using trigonometric functions for the same purpose, though we did not cover that method in depth.

Increasing seasonal variation means the size of the seasonal swings grows as time increases, so the data points fan out over time. Constant seasonal variation means the swings stay the same size, so there is no fanning out. Using dummy variables in a time series model works much like it does in multiple linear regression: each season (for example, each month) gets an indicator variable, with one season left out as the baseline.
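
To make the fanning-out idea concrete, here is a minimal sketch on simulated data. The series y and all of its parameters are made up for illustration and are not from class. The seasonal swing grows with the level on the raw scale, but it is the same size every year after taking logs.

```r
# Simulated monthly series: exponential trend times a fixed seasonal pattern
# (hypothetical numbers, chosen only for illustration)
months <- 1:120                                  # 10 years of monthly data
trend  <- 100 * exp(0.01 * months)               # level grows over time
season <- 1 + 0.2 * sin(2 * pi * months / 12)    # multiplicative seasonal factor
y <- trend * season

# Seasonal swing per year, measured as max minus min within each year
yr <- rep(1:10, each = 12)
raw_swing <- as.numeric(tapply(y, yr, function(v) max(v) - min(v)))
log_swing <- as.numeric(tapply(log(y), yr, function(v) max(v) - min(v)))

round(raw_swing, 1)   # grows from year to year (fanning out)
round(log_swing, 3)   # essentially the same every year
```

The log works here because the simulated seasonality is multiplicative: taking logs turns trend times season into log(trend) plus log(season), so the seasonal part no longer scales with the level.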

We use dummy variables (or trigonometric terms) whenever a time series shows seasonal variation, and we apply a transformation first whenever that seasonal variation is not constant.

Example

To illustrate these concepts in R, I’ll complete an example using the data set airpass from the R package faraway. This data contains the number of passengers (in thousands) traveling by plane per month from 1949 to 1960 (144 monthly observations).

Increasing or Constant Seasonal Variation

First, we’ll need to plot the data to see whether it has constant or increasing seasonal variation.

library(faraway)
data(airpass)
plot(pass~year, data = airpass, type = "l")

This data set has very obvious increasing seasonal variation. To fix this, we can try either a square root or logarithmic transformation on our passenger variable. Let’s compare them.

plot(sqrt(pass)~year, data = airpass, type = "l")

plot(log(pass)~year, data = airpass, type = "l")

We can see that the log transform removes most of the increasing seasonal variation from our data, so we will use log(pass) as the response in the models that follow.
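
One rough way to back up the visual comparison with numbers is to compare the within-year range (max minus min) in the last year against the first year under each transformation. The sketch below uses R's built-in AirPassengers series as a stand-in, on the assumption that it holds the same monthly counts as airpass. The swing_ratio helper is made up for this illustration.

```r
# Built-in monthly airline passenger series, assumed to match airpass$pass
pass <- as.numeric(AirPassengers)
yr <- rep(1949:1960, each = 12)     # calendar year of each observation

# Hypothetical helper: last year's within-year range divided by the first year's
swing_ratio <- function(x) {
  swing <- tapply(x, yr, function(v) max(v) - min(v))
  unname(swing[length(swing)] / swing[1])
}

ratios <- c(raw  = swing_ratio(pass),
            sqrt = swing_ratio(sqrt(pass)),
            log  = swing_ratio(log(pass)))
round(ratios, 2)   # a ratio near 1 means the seasonal swing stayed roughly constant
```

A ratio well above 1 indicates that the seasonal swing is still growing under that transformation, so the transformation with the ratio closest to 1 is the better candidate.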

Building the Model

Now, we’ll discuss how to add seasonal variation to the trend model. Since we didn’t get to talk much about modeling seasonal variation using trigonometric functions, I will only illustrate how to model seasonal variation using dummy variables.

head(airpass)

##   pass     year
## 1  112 49.08333
## 2  118 49.16667
## 3  132 49.25000
## 4  129 49.33333
## 5  121 49.41667
## 6  135 49.50000

Looking at our data set, we see that the data is monthly, but the month is stored only as the decimal part of year rather than as a factor. To use months as dummy variables, we need to turn them into a factor, which we can do in the following way:

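justyear <- floor(airpass$year)           # whole-number part of the time stamp
modecimal <- airpass$year - justyear      # fractional part: 1/12, ..., 11/12, then 0
mofactor <- factor(round(modecimal*12))   # factor with levels "0" through "11"
head(cbind(airpass$year, mofactor))       # cbind() shows integer codes (1 to 12), not labels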

##               mofactor
## [1,] 49.08333        2
## [2,] 49.16667        3
## [3,] 49.25000        4
## [4,] 49.33333        5
## [5,] 49.41667        6
## [6,] 49.50000        7

levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")
airpass$justyear <- justyear
airpass$mofactor <- mofactor

So, now, if we view the first few entries of our data set, we can see that the months are now factors instead of decimals.

head(airpass)

##   pass     year justyear mofactor
## 1  112 49.08333       49      Feb
## 2  118 49.16667       49      Mar
## 3  132 49.25000       49      Apr
## 4  129 49.33333       49      May
## 5  121 49.41667       49      Jun
## 6  135 49.50000       49      Jul
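
One thing to check in this output: 49.08333 is 49 + 1/12, so if the series starts in January 1949 (my assumption here, not something the output itself shows), the first row is January rather than February, and every twelfth row (decimal part 0) ends up labeled Jan of the following year. A minimal sketch of an index-based alternative, run on a stand-in vector built to look like airpass$year; the names idx, monum, calyear, and mofactor2 are made up for this illustration:

```r
# Stand-in for airpass$year: 144 monthly time stamps 49 + 1/12, 49 + 2/12, ...
year <- 49 + (1:144) / 12

# Running month number since the start of 1949 (1 = first observation)
idx <- round((year - 49) * 12)

# Calendar month (1 to 12) and the year each observation belongs to
monum   <- (idx - 1) %% 12 + 1
calyear <- 49 + (idx - 1) %/% 12

# month.abb is R's built-in vector of month abbreviations, "Jan" to "Dec"
mofactor2 <- factor(month.abb[monum], levels = month.abb)

head(data.frame(year, calyear, mofactor2))
table(mofactor2)    # each month should appear 12 times
```

Under that assumption, the twelfth row (year 50.00000) is labeled Dec of year 49 instead of Jan of year 50.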

Then, let’s make our model. This is similar to making a multiple linear regression model.

monthmod <- lm(log(pass)~justyear + mofactor, data = airpass)
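
Behind the scenes, lm() turns a factor into a set of 0/1 dummy variables, one for every level except the first. A small sketch with a made-up 12-level month factor (not the airpass data) shows the columns the model actually sees:

```r
# Hypothetical factor: one observation per month, levels in calendar order
mo <- factor(month.abb, levels = month.abb)

# Design matrix that lm() would build for a model containing this factor
X <- model.matrix(~ mo)

colnames(X)            # "(Intercept)" plus one dummy per month except Jan
X[c(1, 2, 12), 1:4]    # Jan row: all dummies 0; Feb row: moFeb is 1
```

The first level has no column of its own, which is why it acts as the baseline in the fitted model.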

Looking at the summary of the model, every month coefficient is measured relative to the baseline level, January, which is absorbed into the intercept. Because the response is log(pass), each coefficient is the estimated difference in log passengers between that month and January of the same year.

summary(monthmod)

## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16
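
Because the response is log(pass), exponentiating a month coefficient turns it into a multiplicative effect relative to the baseline month. A short sketch using a few estimates copied from the summary above (the vector names est and pct are just for illustration):

```r
# Selected month estimates copied from the summary output above
est <- c(Feb = 0.031390, Jul = 0.278411, Aug = 0.392422,
         Sep = 0.393196, Dec = -0.003108)

# Percent difference from the baseline month in the same year
pct <- 100 * (exp(est) - 1)
round(pct, 1)
```

For example, the Aug estimate works out to roughly 48% more passengers than the baseline month in the same year.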

Instead of using months as our dummy variables, we could also have collapsed the factor down to seasons. However, we will leave our model at this stage, since monthly dummies keep more detail about the seasonal pattern than a handful of seasonal categories would.
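
For reference, collapsing months into seasons can be sketched by assigning a named list to levels(). The season groupings below are my own assumption (December to February as winter, and so on), and mo and season are illustrative names:

```r
# Hypothetical month factor covering two years, levels in calendar order
mo <- factor(rep(month.abb, times = 2), levels = month.abb)

# Merge the 12 month levels into 4 season levels (groupings are an assumption)
season <- mo
levels(season) <- list(Winter = c("Dec", "Jan", "Feb"),
                       Spring = c("Mar", "Apr", "May"),
                       Summer = c("Jun", "Jul", "Aug"),
                       Fall   = c("Sep", "Oct", "Nov"))

table(mo, season)   # shows which months landed in which season
```

A model that uses a season factor like this in place of the month factor would estimate 3 seasonal coefficients instead of 11 monthly ones.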

Finally, we can plot our model to see how it compares to the actual trend of the data.

plot(log(pass)~year, data = airpass, type = "l")
lines(airpass$year, monthmod$fitted.values, type = "l", col = "red")

The red line is our model and the black line indicates the actual data. The model seems to fit our data pretty well, which agrees with the multiple R-squared of 0.9835 in the summary above.
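
Since the model is fit on the log scale, the fitted values can also be exponentiated to compare them against the raw passenger counts. A minimal sketch, using R's built-in AirPassengers series as a stand-in for airpass (assuming the two hold the same counts) and building the year and month columns directly instead of from the decimal year; the names ap, mod, and mape are illustrative:

```r
# Stand-in data frame: passenger counts with explicit year and month columns
ap <- data.frame(pass = as.numeric(AirPassengers),
                 yr   = rep(49:60, each = 12),
                 mo   = factor(rep(month.abb, times = 12), levels = month.abb))

# Same model form as monthmod: linear trend in year plus monthly dummies
mod <- lm(log(pass) ~ yr + mo, data = ap)

# Back-transform the fitted values to thousands of passengers
ap$fit <- exp(fitted(mod))
head(ap[, c("pass", "fit")])

# Mean absolute percentage error on the original scale
mape <- mean(abs(ap$pass - ap$fit) / ap$pass) * 100
mape
```

Looking at the fit on the original scale is a useful complement to the log-scale plot, because errors that look small in logs can be large in passenger counts at the high end of the series.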

Comparison to Topics and Course

Since seasonal variation is a core feature of many time series, this topic fits naturally into our course. These concepts also build on what we have learned about time series modeling: they let us add complexity to the trend model so that it represents the patterns in our data more accurately.

Learning Log Day 17

Sydney Benson