In class today we looked at seasonal variation and how to model seasonal variation. See below for comments and summaries. #Working with Seasonal Variation
library(alr3)
## Warning: package 'alr3' was built under R version 3.4.4
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
##
## Attaching package: 'faraway'
## The following objects are masked from 'package:alr3':
##
## cathedral, pipeline, twins
## The following objects are masked from 'package:car':
##
## logit, vif
#library(airpass)
data(Mitchell)
names(Mitchell)
## [1] "Month" "Temp"
head(Mitchell)
## Month Temp
## 1 0 -5.18333
## 2 1 -1.65000
## 3 2 2.49444
## 4 3 10.40000
## 5 4 14.99440
## 6 5 21.71670
with(Mitchell, plot(Temp ~ Month, type = "l"))
with(airpass, plot(pass~year, type = "l"))
with(Mitchell, plot(log(Temp) ~ Month, type = "l"))
## Warning in log(Temp): NaNs produced
with(Mitchell, plot(Temp^-1 ~ Month, type = "l"))
with(airpass, plot(log(pass)~year, type = "l"))#This is base e
with(airpass, plot(log(pass,10)~year, type = "l"))
with(airpass, plot(pass^-1~year, type = "l"))
with(airpass, plot(pass^.75~year, type = "l"))
with(airpass, plot(pass^.5~year, type = "l"))
When you have a really small value for the exponent, it gets hard to interpret. So if you have a model that looks similar, but one is easier to interpret, use the model that is easier to interpret even if it isn’t 100% the best.
with(airpass, plot(log(pass)~year, type = "l"))#This is base e
head(airpass)
## pass year
## 1 112 49.08333
## 2 118 49.16667
## 3 132 49.25000
## 4 129 49.33333
## 5 121 49.41667
## 6 135 49.50000
Date is expressed as a decimal: 49.08333 means the 1949, the second month (february)
Dr. Knudson made a program that converts these numbers to actual months and days. When we run the program, we get a measure of all the months and how they relate to january. (January is the default because it is the first one, but this could be set)
The winter months are not super different from each other. None of the winter months are significantly different than January. So it would make sense to split data by season rather than by month to make the model more simple.
There is a way to collapse variables together. For example, if you have “yes” “Y” “y” and “yea”, we could collapse all of these together to clean them up.
####################################################################
# Need to get just the month
justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear #month as a decimal
mofactor <-factor(round(modecimal*12)) #month as a factor ie categorical
# Check your work so far
cbind(airpass$year, mofactor)
## mofactor
## [1,] 49.08333 2
## [2,] 49.16667 3
## [3,] 49.25000 4
## [4,] 49.33333 5
## [5,] 49.41667 6
## [6,] 49.50000 7
## [7,] 49.58333 8
## [8,] 49.66667 9
## [9,] 49.75000 10
## [10,] 49.83333 11
## [11,] 49.91667 12
## [12,] 50.00000 1
## [13,] 50.08333 2
## [14,] 50.16667 3
## [15,] 50.25000 4
## [16,] 50.33333 5
## [17,] 50.41667 6
## [18,] 50.50000 7
## [19,] 50.58333 8
## [20,] 50.66667 9
## [21,] 50.75000 10
## [22,] 50.83333 11
## [23,] 50.91667 12
## [24,] 51.00000 1
## [25,] 51.08333 2
## [26,] 51.16667 3
## [27,] 51.25000 4
## [28,] 51.33333 5
## [29,] 51.41667 6
## [30,] 51.50000 7
## [31,] 51.58333 8
## [32,] 51.66667 9
## [33,] 51.75000 10
## [34,] 51.83333 11
## [35,] 51.91667 12
## [36,] 52.00000 1
## [37,] 52.08333 2
## [38,] 52.16667 3
## [39,] 52.25000 4
## [40,] 52.33333 5
## [41,] 52.41667 6
## [42,] 52.50000 7
## [43,] 52.58333 8
## [44,] 52.66667 9
## [45,] 52.75000 10
## [46,] 52.83333 11
## [47,] 52.91667 12
## [48,] 53.00000 1
## [49,] 53.08333 2
## [50,] 53.16667 3
## [51,] 53.25000 4
## [52,] 53.33333 5
## [53,] 53.41667 6
## [54,] 53.50000 7
## [55,] 53.58333 8
## [56,] 53.66667 9
## [57,] 53.75000 10
## [58,] 53.83333 11
## [59,] 53.91667 12
## [60,] 54.00000 1
## [61,] 54.08333 2
## [62,] 54.16667 3
## [63,] 54.25000 4
## [64,] 54.33333 5
## [65,] 54.41667 6
## [66,] 54.50000 7
## [67,] 54.58333 8
## [68,] 54.66667 9
## [69,] 54.75000 10
## [70,] 54.83333 11
## [71,] 54.91667 12
## [72,] 55.00000 1
## [73,] 55.08333 2
## [74,] 55.16667 3
## [75,] 55.25000 4
## [76,] 55.33333 5
## [77,] 55.41667 6
## [78,] 55.50000 7
## [79,] 55.58333 8
## [80,] 55.66667 9
## [81,] 55.75000 10
## [82,] 55.83333 11
## [83,] 55.91667 12
## [84,] 56.00000 1
## [85,] 56.08333 2
## [86,] 56.16667 3
## [87,] 56.25000 4
## [88,] 56.33333 5
## [89,] 56.41667 6
## [90,] 56.50000 7
## [91,] 56.58333 8
## [92,] 56.66667 9
## [93,] 56.75000 10
## [94,] 56.83333 11
## [95,] 56.91667 12
## [96,] 57.00000 1
## [97,] 57.08333 2
## [98,] 57.16667 3
## [99,] 57.25000 4
## [100,] 57.33333 5
## [101,] 57.41667 6
## [102,] 57.50000 7
## [103,] 57.58333 8
## [104,] 57.66667 9
## [105,] 57.75000 10
## [106,] 57.83333 11
## [107,] 57.91667 12
## [108,] 58.00000 1
## [109,] 58.08333 2
## [110,] 58.16667 3
## [111,] 58.25000 4
## [112,] 58.33333 5
## [113,] 58.41667 6
## [114,] 58.50000 7
## [115,] 58.58333 8
## [116,] 58.66667 9
## [117,] 58.75000 10
## [118,] 58.83333 11
## [119,] 58.91667 12
## [120,] 59.00000 1
## [121,] 59.08333 2
## [122,] 59.16667 3
## [123,] 59.25000 4
## [124,] 59.33333 5
## [125,] 59.41667 6
## [126,] 59.50000 7
## [127,] 59.58333 8
## [128,] 59.66667 9
## [129,] 59.75000 10
## [130,] 59.83333 11
## [131,] 59.91667 12
## [132,] 60.00000 1
## [133,] 60.08333 2
## [134,] 60.16667 3
## [135,] 60.25000 4
## [136,] 60.33333 5
## [137,] 60.41667 6
## [138,] 60.50000 7
## [139,] 60.58333 8
## [140,] 60.66667 9
## [141,] 60.75000 10
## [142,] 60.83333 11
## [143,] 60.91667 12
## [144,] 61.00000 1
# Changes levels of factor to be mo names
levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May",
"Jun", "Jul", "Aug", "Sep", "Oct",
"Nov", "Dec")
summary(mofactor)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 12 12 12 12 12 12 12 12 12 12 12 12
##################################
# Again, check work
# Compare the beginning few
head(mofactor)
## [1] Feb Mar Apr May Jun Jul
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
head(airpass$year)
## [1] 49.08333 49.16667 49.25000 49.33333 49.41667 49.50000
# Compare the end few
tail(mofactor)
## [1] Aug Sep Oct Nov Dec Jan
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
tail(airpass$year)
## [1] 60.58333 60.66667 60.75000 60.83333 60.91667 61.00000
# Randomly select some to check (because middle could be wrong)
randoms <- sample(1:nrow(airpass), 10, replace=FALSE)
mofactor[randoms]
## [1] Jun Mar Sep Apr Jun Feb Oct Mar Jan May
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
airpass[randoms,"year"]
## [1] 56.41667 57.16667 59.66667 58.25000 55.41667 59.08333 52.75000
## [8] 51.16667 60.00000 59.33333
##################################
# Add new variables into airpass (optional)
airpass$justyear <- justyear
airpass$mofactor <- mofactor
##################################
# Create model using dummy for each month
mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(mod)
## (Intercept) justyear mofactorFeb mofactorMar mofactorApr
## -1.214997885 0.120825657 0.031389870 0.019403852 0.159699778
## mofactorMay mofactorJun mofactorJul mofactorAug mofactorSep
## 0.138499729 0.146195892 0.278410898 0.392422029 0.393195995
## mofactorOct mofactorNov mofactorDec
## 0.258630198 0.130540762 -0.003108143
summary(mod)
##
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.156370 -0.041016 0.003677 0.044069 0.132324
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.214998 0.081277 -14.949 < 2e-16 ***
## justyear 0.120826 0.001432 84.399 < 2e-16 ***
## mofactorFeb 0.031390 0.024253 1.294 0.198
## mofactorMar 0.019404 0.024253 0.800 0.425
## mofactorApr 0.159700 0.024253 6.585 1.00e-09 ***
## mofactorMay 0.138500 0.024253 5.711 7.19e-08 ***
## mofactorJun 0.146196 0.024253 6.028 1.58e-08 ***
## mofactorJul 0.278411 0.024253 11.480 < 2e-16 ***
## mofactorAug 0.392422 0.024253 16.180 < 2e-16 ***
## mofactorSep 0.393196 0.024253 16.212 < 2e-16 ***
## mofactorOct 0.258630 0.024253 10.664 < 2e-16 ***
## mofactorNov 0.130541 0.024253 5.382 3.28e-07 ***
## mofactorDec -0.003108 0.024253 -0.128 0.898
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared: 0.9835, Adjusted R-squared: 0.982
## F-statistic: 649.4 on 12 and 131 DF, p-value: < 2.2e-16
# How can we tell the summer months have more passengers than January?
# How can we tell the winter months are similar to January wrt number of passengers?
with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")
We can do a similar thing for seasons. Pick a season and chose which months you want to collapse into that season.
Winter:December, January, February Spring:September, October, November Summer:June, July, August Fall:March, April, May
# Create a new factor: season (winter, spring, summer, fall)
# Then create model with these 4 dummies (rather than the 12)
# Hint!! You can collapse several categories into one. Use the example below.
# We start with "messy" data. We know that Y and Yes are the same.
x <- c("Y", "Y", "Yes", "N", "No")
x <- factor(x)
levels(x) <- list(Yes=c("Y", "Yes"), No=c("N", "No"))
x
## [1] Yes Yes Yes No No
## Levels: Yes No
Here is the seasons variable that splits the months into seasons.
seasons <- airpass$mofactor
levels(seasons) <- list(winter=c("Dec", "Jan", "Feb"), spring = c("Mar", "Apr","May"), summer = c("Jun","Jul","Aug" ), fall = c("Sep","Oct", "Nov"))
seasons
## [1] winter spring spring spring summer summer summer fall fall fall
## [11] winter winter winter spring spring spring summer summer summer fall
## [21] fall fall winter winter winter spring spring spring summer summer
## [31] summer fall fall fall winter winter winter spring spring spring
## [41] summer summer summer fall fall fall winter winter winter spring
## [51] spring spring summer summer summer fall fall fall winter winter
## [61] winter spring spring spring summer summer summer fall fall fall
## [71] winter winter winter spring spring spring summer summer summer fall
## [81] fall fall winter winter winter spring spring spring summer summer
## [91] summer fall fall fall winter winter winter spring spring spring
## [101] summer summer summer fall fall fall winter winter winter spring
## [111] spring spring summer summer summer fall fall fall winter winter
## [121] winter spring spring spring summer summer summer fall fall fall
## [131] winter winter winter spring spring spring summer summer summer fall
## [141] fall fall winter winter
## Levels: winter spring summer fall
summary(seasons)
## winter spring summer fall
## 36 36 36 36
The anova test is run and the p-value is small, so we should have the model with all the months not with the seasons.