6.3 Seasonal Variation

In today’s class, we began section 6.3, which is about seasonal variation. When it comes to seasonal variation, we want it to be constant, meaning the magnitude of the swing is constant over time. We do not want increasing variation because this means the magnitude of the swing changes over time.

If we see evidence of increasing variation, we can try to use transformations to make it more constant.

Example 1

In this first example we will use data from the alr3 package. We plot the data with a line to see the trend.

library(alr3)

## Warning: package 'alr3' was built under R version 3.4.3

## Loading required package: car

## Warning: package 'car' was built under R version 3.4.3

data(Mitchell)
attach(Mitchell)

plot(Temp ~ Month, type = "l")

In this example, we see pretty constant variation so we do not perform any transformations.

Example 2

Our next example will be from the faraway package.

library(faraway)

## Warning: package 'faraway' was built under R version 3.4.3

## 
## Attaching package: 'faraway'

## The following objects are masked from 'package:alr3':
## 
##     cathedral, pipeline, twins

## The following objects are masked from 'package:car':
## 
##     logit, vif

data(airpass)
attach(airpass)

plot(pass ~ year, type = "l")

As we can see from the graph, we have evidence of increasing variation. We will perform transformations to try to fix this.

First we’ll plot the square of the function:

plot(pass^2 ~ year, type = "l")

This does not fix the variation, so we will try the log of the function:

plot(log(pass) ~ year, type = "l")

This looks a lot better. Our variation is more close to constant.

6.4 Modeling Seasonal Variation

We’ll now use a dummy variable to model seasonal variation.

library(faraway)
data(airpass)

The airpass data shows months as decimals, but we want them to be a factor we can work with. We change them from decimals to factors in the following way:

justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear 
mofactor <-factor(round(modecimal*12))
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")  

summary(mofactor)

## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##  12  12  12  12  12  12  12  12  12  12  12  12

airpass$justyear <- justyear
airpass$mofactor <- mofactor
head(airpass)

##   pass     year justyear mofactor
## 1  112 49.08333       49      Feb
## 2  118 49.16667       49      Mar
## 3  132 49.25000       49      Apr
## 4  129 49.33333       49      May
## 5  121 49.41667       49      Jun
## 6  135 49.50000       49      Jul

Now we have our months in factors. From here, we can create a model.

mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(mod)

##  (Intercept)     justyear  mofactorFeb  mofactorMar  mofactorApr 
## -1.214997885  0.120825657  0.031389870  0.019403852  0.159699778 
##  mofactorMay  mofactorJun  mofactorJul  mofactorAug  mofactorSep 
##  0.138499729  0.146195892  0.278410898  0.392422029  0.393195995 
##  mofactorOct  mofactorNov  mofactorDec 
##  0.258630198  0.130540762 -0.003108143

summary(mod)

## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16

In our summary, all of our variables are compared to January. Under the estimate column, we can tell that summer months have more passengers than January because they have larger values. The winter months have passengers close to the amount that January does because their estimates are closer to 0.

We will now plot this model. We make our plotted data blue and the actual data black to more easily compare the two.

with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")

From the graph, our model fits the original data pretty well.

Peer Review

After having our paper peer reviewed, I am realizing that our paper needs to sound a bit more like a paper and less like an R guide. I think we put a bit too much time into explaining the workings behind the tests instead of actually interpretting the results in a way that makes sense to a general audience. Moving forward, I am going to look at each section we currently have and try to read it through a more general lense instead of the lense of someone who is in a stats class. I think it would be helpful to read it to a roommate or friend who is not in the class and see if they can understand where I am coming from. I also think we need to include more graphs to help the audience visualize our findings.

Learning Log Day 17 - 6.3

Cara Christianson

April 5, 2018

6.3 Seasonal Variation

Example 1

Example 2

6.4 Modeling Seasonal Variation

Peer Review