Learning Log 17

Seasonal Variation

Today in class we covered the topic of seasonal variation. We learned that we want our time series model to have constant (or “additive”) seasonal variation, where the magnitude of swing is constant throughout time. In contrast, a model could have increasing seasonal variation where the magnitude of swing increases as time increases. Increasing seasonal variation is bad though, so we would use transformations to crete a better model if ours model had increasing seasonal variation.

We can imagine our time series model to be in the following form:

\[y_t = TR_t + SN_t + {\epsilon_t}\] where:

\(y_t\) is our measurement at time t

\(TR_t\) si our trend at time t

\(SN_t\) is our seasonal factor at time t

\({\epsilon_t}\) is our error factor at time t.

We can write the approximation for this equation in the following form: \[\hat{y_t} = TR_t + SN_t\]

where \(\hat{y_t}\) is the prediction at time t.

Examples

Constant Seasonal Variation

Below is an example of a model with constant seasonal variation. We will use the Mitchell dataset, and we will be comparing our response, temperature (Temp) to our predictor, month (Month)

First, call and attach the data.

library(alr3)
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data(Mitchell)
attach(Mitchell)

Next, let’s plot our data in the fomr of a line graph.

plot(Temp ~ Month, type = "l")

The variation of this data already looks constant, so no transformation is needed.

Increasing Seasonal Variation

Now we will look at an example where increasing seasonal variation is present.

Let’s look at the seasonal variability for the airpass data in the faraway package. Our response is the number of passaengers in thousands (pass), and our predictor is the date (year).

library(faraway)
## 
## Attaching package: 'faraway'
## The following objects are masked from 'package:alr3':
## 
##     cathedral, pipeline, twins
## The following objects are masked from 'package:car':
## 
##     logit, vif
data(airpass)
attach(airpass)
plot(pass ~ year, type= "l")

From the plot we see that this data has increasing seasonal variation. Therefore, we should use some transofrmations to obtain constant seasonal variation.

First, let’s raise our response to varying powers.

par(mfrow = c(2,2))
plot((pass)^.75 ~ year, type= "l")
plot((pass)^.5 ~ year, type= "l")
plot((pass)^.25 ~ year, type= "l")
plot((pass)^.10 ~ year, type= "l")

The graph with pass raised to the smallest power seems to have the most constant seasonalvariation. To compare, let’s plot a model with the natural log of our response.

plot(log(pass) ~ year, type= "l")

This model seems to have pretty constant seasonal variation. Just for comparison, lets try taking the log of our response and predictor.

plot(log(pass) ~ log(year), type= "l")

This looks about the same as the as the graph for the model with the log function only on the response, so, for ease in interpretation, we will use the model with only the log function applied to the response.

Seasonal Variation with Dummy Variables

Now let’s create a model that has year as the trend factor and month (eventually season) as the seasonal factor.

First, we need to separate the year and month in our data.

justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear

Now, let’s convert month to a factor (category) instead of a decimal.

mofactor <-factor(round(modecimal*12))

Check your work to see if each month is assigned a number.

cbind(airpass$year, mofactor)
##                 mofactor
##   [1,] 49.08333        2
##   [2,] 49.16667        3
##   [3,] 49.25000        4
##   [4,] 49.33333        5
##   [5,] 49.41667        6
##   [6,] 49.50000        7
##   [7,] 49.58333        8
##   [8,] 49.66667        9
##   [9,] 49.75000       10
##  [10,] 49.83333       11
##  [11,] 49.91667       12
##  [12,] 50.00000        1
##  [13,] 50.08333        2
##  [14,] 50.16667        3
##  [15,] 50.25000        4
##  [16,] 50.33333        5
##  [17,] 50.41667        6
##  [18,] 50.50000        7
##  [19,] 50.58333        8
##  [20,] 50.66667        9
##  [21,] 50.75000       10
##  [22,] 50.83333       11
##  [23,] 50.91667       12
##  [24,] 51.00000        1
##  [25,] 51.08333        2
##  [26,] 51.16667        3
##  [27,] 51.25000        4
##  [28,] 51.33333        5
##  [29,] 51.41667        6
##  [30,] 51.50000        7
##  [31,] 51.58333        8
##  [32,] 51.66667        9
##  [33,] 51.75000       10
##  [34,] 51.83333       11
##  [35,] 51.91667       12
##  [36,] 52.00000        1
##  [37,] 52.08333        2
##  [38,] 52.16667        3
##  [39,] 52.25000        4
##  [40,] 52.33333        5
##  [41,] 52.41667        6
##  [42,] 52.50000        7
##  [43,] 52.58333        8
##  [44,] 52.66667        9
##  [45,] 52.75000       10
##  [46,] 52.83333       11
##  [47,] 52.91667       12
##  [48,] 53.00000        1
##  [49,] 53.08333        2
##  [50,] 53.16667        3
##  [51,] 53.25000        4
##  [52,] 53.33333        5
##  [53,] 53.41667        6
##  [54,] 53.50000        7
##  [55,] 53.58333        8
##  [56,] 53.66667        9
##  [57,] 53.75000       10
##  [58,] 53.83333       11
##  [59,] 53.91667       12
##  [60,] 54.00000        1
##  [61,] 54.08333        2
##  [62,] 54.16667        3
##  [63,] 54.25000        4
##  [64,] 54.33333        5
##  [65,] 54.41667        6
##  [66,] 54.50000        7
##  [67,] 54.58333        8
##  [68,] 54.66667        9
##  [69,] 54.75000       10
##  [70,] 54.83333       11
##  [71,] 54.91667       12
##  [72,] 55.00000        1
##  [73,] 55.08333        2
##  [74,] 55.16667        3
##  [75,] 55.25000        4
##  [76,] 55.33333        5
##  [77,] 55.41667        6
##  [78,] 55.50000        7
##  [79,] 55.58333        8
##  [80,] 55.66667        9
##  [81,] 55.75000       10
##  [82,] 55.83333       11
##  [83,] 55.91667       12
##  [84,] 56.00000        1
##  [85,] 56.08333        2
##  [86,] 56.16667        3
##  [87,] 56.25000        4
##  [88,] 56.33333        5
##  [89,] 56.41667        6
##  [90,] 56.50000        7
##  [91,] 56.58333        8
##  [92,] 56.66667        9
##  [93,] 56.75000       10
##  [94,] 56.83333       11
##  [95,] 56.91667       12
##  [96,] 57.00000        1
##  [97,] 57.08333        2
##  [98,] 57.16667        3
##  [99,] 57.25000        4
## [100,] 57.33333        5
## [101,] 57.41667        6
## [102,] 57.50000        7
## [103,] 57.58333        8
## [104,] 57.66667        9
## [105,] 57.75000       10
## [106,] 57.83333       11
## [107,] 57.91667       12
## [108,] 58.00000        1
## [109,] 58.08333        2
## [110,] 58.16667        3
## [111,] 58.25000        4
## [112,] 58.33333        5
## [113,] 58.41667        6
## [114,] 58.50000        7
## [115,] 58.58333        8
## [116,] 58.66667        9
## [117,] 58.75000       10
## [118,] 58.83333       11
## [119,] 58.91667       12
## [120,] 59.00000        1
## [121,] 59.08333        2
## [122,] 59.16667        3
## [123,] 59.25000        4
## [124,] 59.33333        5
## [125,] 59.41667        6
## [126,] 59.50000        7
## [127,] 59.58333        8
## [128,] 59.66667        9
## [129,] 59.75000       10
## [130,] 59.83333       11
## [131,] 59.91667       12
## [132,] 60.00000        1
## [133,] 60.08333        2
## [134,] 60.16667        3
## [135,] 60.25000        4
## [136,] 60.33333        5
## [137,] 60.41667        6
## [138,] 60.50000        7
## [139,] 60.58333        8
## [140,] 60.66667        9
## [141,] 60.75000       10
## [142,] 60.83333       11
## [143,] 60.91667       12
## [144,] 61.00000        1

Yep, so far so good.

Now, let’s change to the month numbers to the month names. Let’s also check to make sure that there’s 12 obersvations for each month.

levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")  
summary(mofactor)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##  12  12  12  12  12  12  12  12  12  12  12  12

The months are named correctly.

Again, let’s check our work. We’ll check the first few, the last few, and one in the middle of our data to make sure that it matches the original.

# Compare the beginning few
head(mofactor)      
## [1] Feb Mar Apr May Jun Jul
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
head(airpass$year)
## [1] 49.08333 49.16667 49.25000 49.33333 49.41667 49.50000
# Compare the end few
tail(mofactor)
## [1] Aug Sep Oct Nov Dec Jan
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
tail(airpass$year)
## [1] 60.58333 60.66667 60.75000 60.83333 60.91667 61.00000
# Randomly select some to check (because middle could be wrong)
randoms <- sample(1:nrow(airpass), 10, replace=FALSE)
mofactor[randoms]
##  [1] May May Apr Jan May Oct Feb Dec Mar Jan
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
airpass[randoms,"year"]
##  [1] 58.33333 50.33333 56.25000 51.00000 53.33333 57.75000 55.08333
##  [8] 57.91667 52.16667 56.00000

Again, we’re good.

Next, let’s add new variable names to create our new model.

airpass$justyear <- justyear
airpass$mofactor <- mofactor

Now let’s createe our model with month as our dummy variable.

mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(mod)
##  (Intercept)     justyear  mofactorFeb  mofactorMar  mofactorApr 
## -1.214997885  0.120825657  0.031389870  0.019403852  0.159699778 
##  mofactorMay  mofactorJun  mofactorJul  mofactorAug  mofactorSep 
##  0.138499729  0.146195892  0.278410898  0.392422029  0.393195995 
##  mofactorOct  mofactorNov  mofactorDec 
##  0.258630198  0.130540762 -0.003108143
summary(mod)
## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16

Our summary tells us that the summer months have more passengers than the winter months because the summer months have low p-values and high coefficients, so the number of pssengers in the summer is are significantly different than the number of passengers in the winter.

We now that winter months are similar to January becasue they have high p-values and are insignificant.

Now, we can plots the data agains our new model.

with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")

Since we see that there seems to be similarity in the number of passengers among the different seasons, let’s create a new factor: season.

season <- airpass$mofactor
levels(season) <- list(Winter = c("Dec", "Jan", "Feb"), 
                       Spring = c("Mar", "Apr", "May"),
                       Summer = c("Jun", "Jul", "Aug"),
                       Fall = c("Sep", "Oct", "Nov"))

Now let’s use the summary command to make sure that we have 36 observations in each season. Let’s also look at the first few entries in our dataset to make sure that the season is included.

summary(season)
## Winter Spring Summer   Fall 
##     36     36     36     36
airpass$season <- season
head(airpass)
##   pass     year justyear mofactor season
## 1  112 49.08333       49      Feb Winter
## 2  118 49.16667       49      Mar Spring
## 3  132 49.25000       49      Apr Spring
## 4  129 49.33333       49      May Spring
## 5  121 49.41667       49      Jun Summer
## 6  135 49.50000       49      Jul Summer

Now we can create a model with season as the seasonal component.

modseason <- lm(log(pass) ~ justyear + season, data=airpass)

Finally, let’s plot the date, along with both of our models.

with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")
lines(airpass$year, modseason$fitted.values,col="red")

We see that all three models have constant seasonal variation.

Paper Peer Review

During the in-class peer review session I received a few pieces of advice that will help my group and I revise our paper. Overall, the feedback that I received was very positive, and there aren’t any fundamental issues that my group and I will need to change. My peer review partner said that our paper made sense and was well written. My partner did mention that the paper was very dense with a lot of numbers, so my group and I will try to format our paper in a way that makes it easier for the reader to read. Also, my group and I created and described SLR models between our response and each predictor, but we didn’t include this coding in the appendix. Thus, we will add that. Third, we we used smooth scatter plots in our paper, so we will explain how these plots should be interpretted. Lastly, my group and I didn’t explain how we created our model, so we will explain that we used the process of backward elimination.