In today’s class, we began section 6.3, which is about seasonal variation. When it comes to seasonal variation, we want it to be constant, meaning the magnitude of the swing is constant over time. We do not want increasing variation because this means the magnitude of the swing changes over time.
If we see evidence of increasing variation, we can try to use transformations to make it more constant.
In this first example we will use data from the alr3 package. We plot the data with a line to see the trend.
library(alr3)
## Warning: package 'alr3' was built under R version 3.4.3
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data(Mitchell)
attach(Mitchell)
plot(Temp ~ Month, type = "l")
In this example, we see pretty constant variation so we do not perform any transformations.
Our next example will be from the faraway package.
library(faraway)
## Warning: package 'faraway' was built under R version 3.4.3
##
## Attaching package: 'faraway'
## The following objects are masked from 'package:alr3':
##
## cathedral, pipeline, twins
## The following objects are masked from 'package:car':
##
## logit, vif
data(airpass)
attach(airpass)
plot(pass ~ year, type = "l")
As we can see from the graph, we have evidence of increasing variation. We will perform transformations to try to fix this.
First we’ll plot the square of the function:
plot(pass^2 ~ year, type = "l")
This does not fix the variation, so we will try the log of the function:
plot(log(pass) ~ year, type = "l")
This looks a lot better. Our variation is more close to constant.
We’ll now use a dummy variable to model seasonal variation.
library(faraway)
data(airpass)
The airpass data shows months as decimals, but we want them to be a factor we can work with. We change them from decimals to factors in the following way:
justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear
mofactor <-factor(round(modecimal*12))
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May",
"Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")
summary(mofactor)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 12 12 12 12 12 12 12 12 12 12 12 12
airpass$justyear <- justyear
airpass$mofactor <- mofactor
head(airpass)
## pass year justyear mofactor
## 1 112 49.08333 49 Feb
## 2 118 49.16667 49 Mar
## 3 132 49.25000 49 Apr
## 4 129 49.33333 49 May
## 5 121 49.41667 49 Jun
## 6 135 49.50000 49 Jul
Now we have our months in factors. From here, we can create a model.
mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(mod)
## (Intercept) justyear mofactorFeb mofactorMar mofactorApr
## -1.214997885 0.120825657 0.031389870 0.019403852 0.159699778
## mofactorMay mofactorJun mofactorJul mofactorAug mofactorSep
## 0.138499729 0.146195892 0.278410898 0.392422029 0.393195995
## mofactorOct mofactorNov mofactorDec
## 0.258630198 0.130540762 -0.003108143
summary(mod)
##
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.156370 -0.041016 0.003677 0.044069 0.132324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.214998 0.081277 -14.949 < 2e-16 ***
## justyear 0.120826 0.001432 84.399 < 2e-16 ***
## mofactorFeb 0.031390 0.024253 1.294 0.198
## mofactorMar 0.019404 0.024253 0.800 0.425
## mofactorApr 0.159700 0.024253 6.585 1.00e-09 ***
## mofactorMay 0.138500 0.024253 5.711 7.19e-08 ***
## mofactorJun 0.146196 0.024253 6.028 1.58e-08 ***
## mofactorJul 0.278411 0.024253 11.480 < 2e-16 ***
## mofactorAug 0.392422 0.024253 16.180 < 2e-16 ***
## mofactorSep 0.393196 0.024253 16.212 < 2e-16 ***
## mofactorOct 0.258630 0.024253 10.664 < 2e-16 ***
## mofactorNov 0.130541 0.024253 5.382 3.28e-07 ***
## mofactorDec -0.003108 0.024253 -0.128 0.898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared: 0.9835, Adjusted R-squared: 0.982
## F-statistic: 649.4 on 12 and 131 DF, p-value: < 2.2e-16
In our summary, all of our variables are compared to January. Under the estimate column, we can tell that summer months have more passengers than January because they have larger values. The winter months have passengers close to the amount that January does because their estimates are closer to 0.
We will now plot this model. We make our plotted data blue and the actual data black to more easily compare the two.
with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")
From the graph, our model fits the original data pretty well.
After having our paper peer reviewed, I am realizing that our paper needs to sound a bit more like a paper and less like an R guide. I think we put a bit too much time into explaining the workings behind the tests instead of actually interpretting the results in a way that makes sense to a general audience. Moving forward, I am going to look at each section we currently have and try to read it through a more general lense instead of the lense of someone who is in a stats class. I think it would be helpful to read it to a roommate or friend who is not in the class and see if they can understand where I am coming from. I also think we need to include more graphs to help the audience visualize our findings.