Today our main focus was on sections 6.3 and 6.4. Section 6.3 covered seasonal variation, specifically constant seasonal variation and increasing seasonal variation. Constant seasonal variation means that the magnitude of the upward and downward swings stays the same over time. It shows up in a lot of data and is normal to find; it isn't a problem and doesn't need to be corrected. Increasing seasonal variation, on the other hand, is not something we want to see in our data. It means that the magnitude of the swings grows over time, and this needs to be corrected before modeling. If we find increasing seasonal variation, we can transform the response so that it looks like constant seasonal variation. The most common transformations we use are the log and the square root.
In section 6.4, we learned about using dummy variables to model seasonal variation, and started to look at using trig functions as well. Using dummy variables for seasonal variation is similar to using indicator variables in multiple linear regression: every season (here, every month) other than a baseline gets its own 0/1 variable. These dummy variables are a great tool for modeling seasonal variation.
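To see what the dummy variables look like, here is a minimal sketch using a made-up three-month factor (the object names `month` and `X` are just for illustration and are not part of the class example). `model.matrix()` shows the columns that `lm()` would build from a factor.

```r
# Toy month factor with three levels (illustrative, not the airpass data)
month <- factor(c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar"),
                levels = c("Jan", "Feb", "Mar"))

# model.matrix() shows the design columns lm() would create from the factor
X <- model.matrix(~ month)
X
```

The first level (Jan here) gets no column of its own. It is absorbed into the intercept, so the other coefficients are read as differences from that baseline month.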
We can work through an example using the air passengers data (airpass) from the faraway package. We start by plotting the data without any transformation.
data(airpass, package = "faraway")
plot(pass~year, data = airpass, type = "l")
This plot clearly displays seasonal variation, but the swings are not consistent: their magnitude increases over time as the number of passengers grows. This means we have increasing seasonal variation and should try a transformation to correct for it. One transformation we can try is taking the square root of the passengers.
plot(pass^.5~year, data = airpass, type = "l")
This looks slightly better, but the swings are still larger on the right side of the graph. Another transformation we can try is the log of passengers.
plot(log(pass)~year, data = airpass, type = "l")
This is more like what we’re looking for. There is still a clear upward trend over time, but the size of the seasonal swings is now roughly the same from year to year, so it resembles constant seasonal variation.
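One way to see why the log can work better than the square root is a small simulation. This is only a sketch on made-up data, assuming the seasonal effect multiplies the trend; the series `y` and every other name here are hypothetical and have nothing to do with the actual airpass values.

```r
# Synthetic monthly series with multiplicative seasonality (not the airpass data)
t      <- 1:120                               # 10 years of monthly observations
trend  <- exp(4.5 + 0.01 * t)                 # exponential growth
season <- 1 + 0.2 * sin(2 * pi * t / 12)      # seasonal multiplier around 1
y      <- trend * season
yr     <- rep(1:10, each = 12)                # year index for each month

# Size of the within-year swing (max minus min) on each scale
swing <- function(x) as.numeric(tapply(x, yr, function(v) diff(range(v))))
swing_raw  <- swing(y)
swing_sqrt <- swing(sqrt(y))
swing_log  <- swing(log(y))

# Ratio of the last year's swing to the first year's swing
ratios <- c(raw  = swing_raw[10]  / swing_raw[1],
            sqrt = swing_sqrt[10] / swing_sqrt[1],
            log  = swing_log[10]  / swing_log[1])
round(ratios, 2)
```

In this setup the swing nearly triples on the raw scale, still grows on the square root scale, and stays the same on the log scale, because the log turns the multiplicative seasonal effect into an additive one.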
After learning how to correct for increasing seasonal variation, we also covered using dummy variables to model our data. This was a more complex process in R, so we used an example produced by Dr. Knudson to explain it. First we split the year column into a whole-number year and a month factor.
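# Split the decimal year into a whole year and a month fraction
justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear
# Convert the fraction to an integer code (0 to 11) and store it as a factor
mofactor <- factor(round(modecimal * 12))
# Relabel the twelve sorted codes with month abbreviations
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May",
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")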
summary(mofactor)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 12 12 12 12 12 12 12 12 12 12 12 12
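Then we added these to the existing data for use in model building. One thing to watch in the output below: the first row, which is January 1949, is labeled Feb. The year column stores January as 49.08333 and December as the next whole number, so rounding the decimal part gives 0 for December and every label ends up one month late. Each month label in what follows therefore refers to the previous calendar month.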
airpass$justyear <- justyear
airpass$mofactor <- mofactor
head(airpass)
## pass year justyear mofactor
## 1 112 49.08333 49 Feb
## 2 118 49.16667 49 Mar
## 3 132 49.25000 49 Apr
## 4 129 49.33333 49 May
## 5 121 49.41667 49 Jun
## 6 135 49.50000 49 Jul
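The month labels can be checked against the calendar with a small stand-alone sketch. It assumes, based on the head() output, that month k of a year is stored as the year plus k/12 (so January 1949 is 49.08333 and December 1949 is 50). The vector `year` below is a made-up stand-in for `airpass$year`, and `mo_count`, `idx`, `calyear`, and `month` are illustrative names, not part of Dr. Knudson's example.

```r
# Hypothetical stand-in for airpass$year: two years of monthly values,
# assuming month k of year Y is stored as Y + k/12 (December of 49 is 50.0)
year <- 49 + (1:24) / 12

mo_count <- round(year * 12)           # whole months counted from year 0
idx      <- (mo_count - 1) %% 12 + 1   # 1 = Jan, ..., 12 = Dec
calyear  <- (mo_count - 1) %/% 12      # calendar year, December stays in its own year
month    <- factor(month.abb[idx], levels = month.abb)

head(data.frame(year, calyear, month), 3)
```

Under that assumption, the first row comes out as Jan and the twelfth as Dec of year 49. We did not refit the class model with these labels, so the output in this post still uses the original labeling.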
Once these had been added, we used them to fit our model.
mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
summary(mod)
##
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.156370 -0.041016 0.003677 0.044069 0.132324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.214998 0.081277 -14.949 < 2e-16 ***
## justyear 0.120826 0.001432 84.399 < 2e-16 ***
## mofactorFeb 0.031390 0.024253 1.294 0.198
## mofactorMar 0.019404 0.024253 0.800 0.425
## mofactorApr 0.159700 0.024253 6.585 1.00e-09 ***
## mofactorMay 0.138500 0.024253 5.711 7.19e-08 ***
## mofactorJun 0.146196 0.024253 6.028 1.58e-08 ***
## mofactorJul 0.278411 0.024253 11.480 < 2e-16 ***
## mofactorAug 0.392422 0.024253 16.180 < 2e-16 ***
## mofactorSep 0.393196 0.024253 16.212 < 2e-16 ***
## mofactorOct 0.258630 0.024253 10.664 < 2e-16 ***
## mofactorNov 0.130541 0.024253 5.382 3.28e-07 ***
## mofactorDec -0.003108 0.024253 -0.128 0.898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared: 0.9835, Adjusted R-squared: 0.982
## F-statistic: 649.4 on 12 and 131 DF, p-value: < 2.2e-16
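Because the response is log(pass), the coefficients are easier to read after exponentiating them. The sketch below just does that arithmetic on two estimates copied from the summary above; the names `b_year` and `b_aug` are made up for illustration.

```r
# Estimates copied from the summary output above
b_year <- 0.120826   # justyear
b_aug  <- 0.392422   # level labeled Aug in the output

# On the log scale, exp(coefficient) is a multiplicative effect on passengers
growth_per_year <- exp(b_year)   # roughly 1.13: about 13% more passengers per year
aug_vs_baseline <- exp(b_aug)    # roughly 1.48: about 48% above the baseline level

round(c(growth_per_year, aug_vs_baseline), 2)
```

So, holding the month fixed, the model estimates that passenger counts grow by roughly 13% per year, and the level labeled Aug sits roughly 48% above the baseline level of the factor.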
Finally, we plotted the fitted values from our model over the log-transformed data to see how well the model captured the trend and the seasonal pattern.
# Plot the log-transformed series, then overlay the model's fitted values in blue
with(airpass, plot(log(pass) ~ year, type = "l"))
lines(airpass$year, fitted(mod), col = "blue")
After lecture today, we also had peer review and partnered up with someone to look at each other's papers. I received lots of great feedback from Jimmy, and my group definitely has some things to go back and change. A lot of Jimmy’s comments were about organization and how we could clean up the paper to make things clearer to the reader. One thing he mentioned was that we talked about transformations in our paper, but never went into detail about what we tried and why we didn’t use any sort of transformation. We had some plots of the residuals for both our transformed and non-transformed models, but they were relatively small, so his advice was to make those larger and go into more detail about what to look for and why we did what we did.
Another place Jimmy thought there was room for improvement was the discussion. While we talked about some of our limitations and the potential for future research, we could have done a better job of relating our results back to our lit review. I plan to correct this by tying our findings more closely to the prior research that we found. I will also explain how our research expanded on those studies and advanced research on this topic.