In class on Friday, we learned how to model seasonal variation and conducted peer review for our group projects.

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.4
data(airpass)

Graphing the airpass time series data, we can see that the seasonal variation increases in magnitude as time goes on. That means that we are dealing with Increasing Seasonal variation. To fix this, we must transform the data.

with(airpass, plot(pass~year, type = "l"))

with(airpass, plot(log(pass)~year, type = "l"))

Doing a natural log of the passenger count makes our seasonal variation constant.

justyear <- floor(airpass$year)
modecimal <- airpass$year - justyear
mofactor <-factor(round(modecimal*12))

Follow the above commands to turn decimal years into years and months.

cbind(airpass$year, mofactor)
##                 mofactor
##   [1,] 49.08333        2
##   [2,] 49.16667        3
##   [3,] 49.25000        4
##   [4,] 49.33333        5
##   [5,] 49.41667        6
##   [6,] 49.50000        7
##   [7,] 49.58333        8
##   [8,] 49.66667        9
##   [9,] 49.75000       10
##  [10,] 49.83333       11
##  [11,] 49.91667       12
##  [12,] 50.00000        1
##  [13,] 50.08333        2
##  [14,] 50.16667        3
##  [15,] 50.25000        4
##  [16,] 50.33333        5
##  [17,] 50.41667        6
##  [18,] 50.50000        7
##  [19,] 50.58333        8
##  [20,] 50.66667        9
##  [21,] 50.75000       10
##  [22,] 50.83333       11
##  [23,] 50.91667       12
##  [24,] 51.00000        1
##  [25,] 51.08333        2
##  [26,] 51.16667        3
##  [27,] 51.25000        4
##  [28,] 51.33333        5
##  [29,] 51.41667        6
##  [30,] 51.50000        7
##  [31,] 51.58333        8
##  [32,] 51.66667        9
##  [33,] 51.75000       10
##  [34,] 51.83333       11
##  [35,] 51.91667       12
##  [36,] 52.00000        1
##  [37,] 52.08333        2
##  [38,] 52.16667        3
##  [39,] 52.25000        4
##  [40,] 52.33333        5
##  [41,] 52.41667        6
##  [42,] 52.50000        7
##  [43,] 52.58333        8
##  [44,] 52.66667        9
##  [45,] 52.75000       10
##  [46,] 52.83333       11
##  [47,] 52.91667       12
##  [48,] 53.00000        1
##  [49,] 53.08333        2
##  [50,] 53.16667        3
##  [51,] 53.25000        4
##  [52,] 53.33333        5
##  [53,] 53.41667        6
##  [54,] 53.50000        7
##  [55,] 53.58333        8
##  [56,] 53.66667        9
##  [57,] 53.75000       10
##  [58,] 53.83333       11
##  [59,] 53.91667       12
##  [60,] 54.00000        1
##  [61,] 54.08333        2
##  [62,] 54.16667        3
##  [63,] 54.25000        4
##  [64,] 54.33333        5
##  [65,] 54.41667        6
##  [66,] 54.50000        7
##  [67,] 54.58333        8
##  [68,] 54.66667        9
##  [69,] 54.75000       10
##  [70,] 54.83333       11
##  [71,] 54.91667       12
##  [72,] 55.00000        1
##  [73,] 55.08333        2
##  [74,] 55.16667        3
##  [75,] 55.25000        4
##  [76,] 55.33333        5
##  [77,] 55.41667        6
##  [78,] 55.50000        7
##  [79,] 55.58333        8
##  [80,] 55.66667        9
##  [81,] 55.75000       10
##  [82,] 55.83333       11
##  [83,] 55.91667       12
##  [84,] 56.00000        1
##  [85,] 56.08333        2
##  [86,] 56.16667        3
##  [87,] 56.25000        4
##  [88,] 56.33333        5
##  [89,] 56.41667        6
##  [90,] 56.50000        7
##  [91,] 56.58333        8
##  [92,] 56.66667        9
##  [93,] 56.75000       10
##  [94,] 56.83333       11
##  [95,] 56.91667       12
##  [96,] 57.00000        1
##  [97,] 57.08333        2
##  [98,] 57.16667        3
##  [99,] 57.25000        4
## [100,] 57.33333        5
## [101,] 57.41667        6
## [102,] 57.50000        7
## [103,] 57.58333        8
## [104,] 57.66667        9
## [105,] 57.75000       10
## [106,] 57.83333       11
## [107,] 57.91667       12
## [108,] 58.00000        1
## [109,] 58.08333        2
## [110,] 58.16667        3
## [111,] 58.25000        4
## [112,] 58.33333        5
## [113,] 58.41667        6
## [114,] 58.50000        7
## [115,] 58.58333        8
## [116,] 58.66667        9
## [117,] 58.75000       10
## [118,] 58.83333       11
## [119,] 58.91667       12
## [120,] 59.00000        1
## [121,] 59.08333        2
## [122,] 59.16667        3
## [123,] 59.25000        4
## [124,] 59.33333        5
## [125,] 59.41667        6
## [126,] 59.50000        7
## [127,] 59.58333        8
## [128,] 59.66667        9
## [129,] 59.75000       10
## [130,] 59.83333       11
## [131,] 59.91667       12
## [132,] 60.00000        1
## [133,] 60.08333        2
## [134,] 60.16667        3
## [135,] 60.25000        4
## [136,] 60.33333        5
## [137,] 60.41667        6
## [138,] 60.50000        7
## [139,] 60.58333        8
## [140,] 60.66667        9
## [141,] 60.75000       10
## [142,] 60.83333       11
## [143,] 60.91667       12
## [144,] 61.00000        1

Now, we want to assign categorical labels to each month number.

levels(mofactor) <- c("Jan", "Feb", "Mar", "Apr", "May", 
                      "Jun", "Jul", "Aug", "Sep", "Oct",
                      "Nov", "Dec")  
summary(mofactor)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
##  12  12  12  12  12  12  12  12  12  12  12  12

Great, there are exactly 12 data points for each month.

To check your work, select 10 months at random and see if they are correct.

randoms <- sample(1:nrow(airpass), 10, replace=FALSE)
mofactor[randoms]
##  [1] Jan Nov Apr Mar Apr Sep Dec May May Apr
## Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
airpass[randoms,"year"]
##  [1] 55.00000 51.83333 51.25000 60.16667 57.25000 57.66667 59.91667
##  [8] 54.33333 59.33333 54.25000

To add the variables into the data:

airpass$justyear <- justyear
airpass$mofactor <- mofactor

Now, just flow the new variables thru as predictors.

mod <- lm(log(pass) ~ justyear + mofactor, data=airpass)
coef(mod)
##  (Intercept)     justyear  mofactorFeb  mofactorMar  mofactorApr 
## -1.214997885  0.120825657  0.031389870  0.019403852  0.159699778 
##  mofactorMay  mofactorJun  mofactorJul  mofactorAug  mofactorSep 
##  0.138499729  0.146195892  0.278410898  0.392422029  0.393195995 
##  mofactorOct  mofactorNov  mofactorDec 
##  0.258630198  0.130540762 -0.003108143
summary(mod)
## 
## Call:
## lm(formula = log(pass) ~ justyear + mofactor, data = airpass)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.156370 -0.041016  0.003677  0.044069  0.132324 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.214998   0.081277 -14.949  < 2e-16 ***
## justyear     0.120826   0.001432  84.399  < 2e-16 ***
## mofactorFeb  0.031390   0.024253   1.294    0.198    
## mofactorMar  0.019404   0.024253   0.800    0.425    
## mofactorApr  0.159700   0.024253   6.585 1.00e-09 ***
## mofactorMay  0.138500   0.024253   5.711 7.19e-08 ***
## mofactorJun  0.146196   0.024253   6.028 1.58e-08 ***
## mofactorJul  0.278411   0.024253  11.480  < 2e-16 ***
## mofactorAug  0.392422   0.024253  16.180  < 2e-16 ***
## mofactorSep  0.393196   0.024253  16.212  < 2e-16 ***
## mofactorOct  0.258630   0.024253  10.664  < 2e-16 ***
## mofactorNov  0.130541   0.024253   5.382 3.28e-07 ***
## mofactorDec -0.003108   0.024253  -0.128    0.898    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0593 on 131 degrees of freedom
## Multiple R-squared:  0.9835, Adjusted R-squared:  0.982 
## F-statistic: 649.4 on 12 and 131 DF,  p-value: < 2.2e-16

How can we tell the summer months have more passengers than January? The point estimate for the slope for summer month predictors is higher than those of winter months.

How can we tell the winter months are similar to January wrt number of passengers? They are less significant to the model because the intercept already includes January, so by adding February, March, and December, we are not contributing any useful information to the model.

with(airpass, plot(log(pass)~ year, type="l" ))
lines(airpass$year, mod$fitted.values,col="blue")

We are predicting very well.

Peer Review of Group Project Paper

We received a lot of feedback in our peer review session. I do not think that we had as complete of a draft as we should have prior to a peer review session, so much of the information was obvious to me, but we did come away with a few things to try in our project. Our lit review was not as related to our project mission as it could have been. We had a long paragraph on how movie studios do “cold openings” of movies to avoid being ripped apart by film critics, but our project was more about predicting critic behavior rather than movie performance. Our methods section included all of our predictor variables, but left out the response, which made our reviewers confused as to what we were predicting. We had too many plots cluttering up the page and not enough interpretation of the graphs. Something we definitely need to add is the checking of the regression assumptions. We received positive feedback on the discussion section of our paper, but I fear that it may be too long and wordy. When we have all of the changes made in our paper, that will be the first section that we will trim down. Our next step is to create a good presentation on this topic and I think that through preparing for a presentation, we will also get a sense of what else needs to be included in our paper. Some other next steps include giving our paper a table of contents and more readability.