On class last Thursday we reviewed topics in 6.3 and 6.4. 6.3 was specifically focused on seasonal variation. We compared data sets to observe two types of seasonal variation, constant and increasing. Ideally we would like to see data sets that show constant variation, which means that the magnitude of the observed variation stays constant over time. The increasing variation, as you could imagine, would show an increasing variation in magnitude over time. We can build models for both of these types, but the constant variation is much easier and more accurate to model. A common technique is to perform transformations that can make the data appear like a constant variation model.
Below is an example of a data set with an increasing seasonal variatin
library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
library(alr3)
data(airpass)
attach(airpass)
head(airpass)
## pass year
## 1 112 49.08333
## 2 118 49.16667
## 3 132 49.25000
## 4 129 49.33333
## 5 121 49.41667
## 6 135 49.50000
plot(year,pass, type ="l")
##Notice the increasing variance as time increases. We can classify it as a increasing seasonal variance. We perform a transformation to get a more constant plot
plot(log(year),log(pass),type="l")
##The log transformation appears to give it a better constant variance appearance
Creating seasonal models
year1 <- floor(airpass$year)
monthvar <- airpass$year - year1
monthfactor <-factor(round(monthvar*12))
cbind(airpass$year, monthvar)
## monthvar
## [1,] 49.08333 0.08333333
## [2,] 49.16667 0.16666667
## [3,] 49.25000 0.25000000
## [4,] 49.33333 0.33333333
## [5,] 49.41667 0.41666667
## [6,] 49.50000 0.50000000
## [7,] 49.58333 0.58333333
## [8,] 49.66667 0.66666667
## [9,] 49.75000 0.75000000
## [10,] 49.83333 0.83333333
## [11,] 49.91667 0.91666667
## [12,] 50.00000 0.00000000
## [13,] 50.08333 0.08333333
## [14,] 50.16667 0.16666667
## [15,] 50.25000 0.25000000
## [16,] 50.33333 0.33333333
## [17,] 50.41667 0.41666667
## [18,] 50.50000 0.50000000
## [19,] 50.58333 0.58333333
## [20,] 50.66667 0.66666667
## [21,] 50.75000 0.75000000
## [22,] 50.83333 0.83333333
## [23,] 50.91667 0.91666667
## [24,] 51.00000 0.00000000
## [25,] 51.08333 0.08333333
## [26,] 51.16667 0.16666667
## [27,] 51.25000 0.25000000
## [28,] 51.33333 0.33333333
## [29,] 51.41667 0.41666667
## [30,] 51.50000 0.50000000
## [31,] 51.58333 0.58333333
## [32,] 51.66667 0.66666667
## [33,] 51.75000 0.75000000
## [34,] 51.83333 0.83333333
## [35,] 51.91667 0.91666667
## [36,] 52.00000 0.00000000
## [37,] 52.08333 0.08333333
## [38,] 52.16667 0.16666667
## [39,] 52.25000 0.25000000
## [40,] 52.33333 0.33333333
## [41,] 52.41667 0.41666667
## [42,] 52.50000 0.50000000
## [43,] 52.58333 0.58333333
## [44,] 52.66667 0.66666667
## [45,] 52.75000 0.75000000
## [46,] 52.83333 0.83333333
## [47,] 52.91667 0.91666667
## [48,] 53.00000 0.00000000
## [49,] 53.08333 0.08333333
## [50,] 53.16667 0.16666667
## [51,] 53.25000 0.25000000
## [52,] 53.33333 0.33333333
## [53,] 53.41667 0.41666667
## [54,] 53.50000 0.50000000
## [55,] 53.58333 0.58333333
## [56,] 53.66667 0.66666667
## [57,] 53.75000 0.75000000
## [58,] 53.83333 0.83333333
## [59,] 53.91667 0.91666667
## [60,] 54.00000 0.00000000
## [61,] 54.08333 0.08333333
## [62,] 54.16667 0.16666667
## [63,] 54.25000 0.25000000
## [64,] 54.33333 0.33333333
## [65,] 54.41667 0.41666667
## [66,] 54.50000 0.50000000
## [67,] 54.58333 0.58333333
## [68,] 54.66667 0.66666667
## [69,] 54.75000 0.75000000
## [70,] 54.83333 0.83333333
## [71,] 54.91667 0.91666667
## [72,] 55.00000 0.00000000
## [73,] 55.08333 0.08333333
## [74,] 55.16667 0.16666667
## [75,] 55.25000 0.25000000
## [76,] 55.33333 0.33333333
## [77,] 55.41667 0.41666667
## [78,] 55.50000 0.50000000
## [79,] 55.58333 0.58333333
## [80,] 55.66667 0.66666667
## [81,] 55.75000 0.75000000
## [82,] 55.83333 0.83333333
## [83,] 55.91667 0.91666667
## [84,] 56.00000 0.00000000
## [85,] 56.08333 0.08333333
## [86,] 56.16667 0.16666667
## [87,] 56.25000 0.25000000
## [88,] 56.33333 0.33333333
## [89,] 56.41667 0.41666667
## [90,] 56.50000 0.50000000
## [91,] 56.58333 0.58333333
## [92,] 56.66667 0.66666667
## [93,] 56.75000 0.75000000
## [94,] 56.83333 0.83333333
## [95,] 56.91667 0.91666667
## [96,] 57.00000 0.00000000
## [97,] 57.08333 0.08333333
## [98,] 57.16667 0.16666667
## [99,] 57.25000 0.25000000
## [100,] 57.33333 0.33333333
## [101,] 57.41667 0.41666667
## [102,] 57.50000 0.50000000
## [103,] 57.58333 0.58333333
## [104,] 57.66667 0.66666667
## [105,] 57.75000 0.75000000
## [106,] 57.83333 0.83333333
## [107,] 57.91667 0.91666667
## [108,] 58.00000 0.00000000
## [109,] 58.08333 0.08333333
## [110,] 58.16667 0.16666667
## [111,] 58.25000 0.25000000
## [112,] 58.33333 0.33333333
## [113,] 58.41667 0.41666667
## [114,] 58.50000 0.50000000
## [115,] 58.58333 0.58333333
## [116,] 58.66667 0.66666667
## [117,] 58.75000 0.75000000
## [118,] 58.83333 0.83333333
## [119,] 58.91667 0.91666667
## [120,] 59.00000 0.00000000
## [121,] 59.08333 0.08333333
## [122,] 59.16667 0.16666667
## [123,] 59.25000 0.25000000
## [124,] 59.33333 0.33333333
## [125,] 59.41667 0.41666667
## [126,] 59.50000 0.50000000
## [127,] 59.58333 0.58333333
## [128,] 59.66667 0.66666667
## [129,] 59.75000 0.75000000
## [130,] 59.83333 0.83333333
## [131,] 59.91667 0.91666667
## [132,] 60.00000 0.00000000
## [133,] 60.08333 0.08333333
## [134,] 60.16667 0.16666667
## [135,] 60.25000 0.25000000
## [136,] 60.33333 0.33333333
## [137,] 60.41667 0.41666667
## [138,] 60.50000 0.50000000
## [139,] 60.58333 0.58333333
## [140,] 60.66667 0.66666667
## [141,] 60.75000 0.75000000
## [142,] 60.83333 0.83333333
## [143,] 60.91667 0.91666667
## [144,] 61.00000 0.00000000
# Changes levels of factor to be month names
levels(monthvar) <- c("Jan", "Feb", "Mar", "Apr", "May",
"Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")
summary(monthvar)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2292 0.4583 0.4583 0.6875 0.9167
mod <- lm(log(pass) ~ year1+monthfactor)
coef(mod)
## (Intercept) year1 monthfactor1 monthfactor2 monthfactor3
## -1.214997885 0.120825657 0.031389870 0.019403852 0.159699778
## monthfactor4 monthfactor5 monthfactor6 monthfactor7 monthfactor8
## 0.138499729 0.146195892 0.278410898 0.392422029 0.393195995
## monthfactor9 monthfactor10 monthfactor11
## 0.258630198 0.130540762 -0.003108143
summary(mod)
##
## Call:
## lm(formula = log(pass) ~ year1 + monthfactor)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.156370 -0.041016 0.003677 0.044069 0.132324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.214998 0.081277 -14.949 < 2e-16 ***
## year1 0.120826 0.001432 84.399 < 2e-16 ***
## monthfactor1 0.031390 0.024253 1.294 0.198
## monthfactor2 0.019404 0.024253 0.800 0.425
## monthfactor3 0.159700 0.024253 6.585 1.00e-09 ***
## monthfactor4 0.138500 0.024253 5.711 7.19e-08 ***
## monthfactor5 0.146196 0.024253 6.028 1.58e-08 ***
## monthfactor6 0.278411 0.024253 11.480 < 2e-16 ***
## monthfactor7 0.392422 0.024253 16.180 < 2e-16 ***
## monthfactor8 0.393196 0.024253 16.212 < 2e-16 ***
## monthfactor9 0.258630 0.024253 10.664 < 2e-16 ***
## monthfactor10 0.130541 0.024253 5.382 3.28e-07 ***
## monthfactor11 -0.003108 0.024253 -0.128 0.898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared: 0.9835, Adjusted R-squared: 0.982
## F-statistic: 649.4 on 12 and 131 DF, p-value: < 2.2e-16
plot(log(pass)~year,type="l")
lines(airpass$year, mod$fitted.values, col= "blue")
We can see that this new transformed model does a fairly well job at matching our data.