This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
# Q1. What variables will likely affect/predict the total number of bicycles used (cnt)? Describe examples of at least 3 that may have an effect and explain why.
setwd("/Users/Fisher/Desktop/BANA 288 Predictive Analytics/HW4 Time Series Regression and Decomposition")
bike <- read.csv("hw4_bike_share_day.csv")
str(bike)
## 'data.frame': 731 obs. of 14 variables:
## $ dayID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ dteday : Factor w/ 731 levels "1/1/2011","1/1/2012",..: 1 23 45 51 53 55 57 59 61 3 ...
## $ season : int 1 1 1 1 1 1 1 1 1 1 ...
## $ yr : int 0 0 0 0 0 0 0 0 0 0 ...
## $ mnth : int 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : int 0 0 0 0 0 0 0 0 0 0 ...
## $ weekday : int 6 0 1 2 3 4 5 6 0 1 ...
## $ workday : int 0 0 1 1 1 1 1 0 0 1 ...
## $ weathersit: int 2 2 1 1 1 1 2 2 1 1 ...
## $ temp : num 0.344 0.363 0.196 0.2 0.227 ...
## $ atemp : num 0.364 0.354 0.189 0.212 0.229 ...
## $ hum : num 0.806 0.696 0.437 0.59 0.437 ...
## $ windspeed : num 0.16 0.249 0.248 0.16 0.187 ...
## $ cnt : int 985 801 1349 1562 1600 1606 1510 959 822 1321 ...
# Q3. Which variables could be considered as candidates for seasonal variables in this problem? Explain you answer clearly.
# Q4. Run a regression model with time, time-squared and the monthly indicator variables. Comment on the quality of the model. Repeat this modeling step again for “season” indicators. Which of these two models is preferred? Is there a significant trend in the data set? Is there significant seasonality?
reg1 <- lm(cnt ~ trend + trend_sqd + mnth.1 + mnth.2 + mnth.3 + mnth.4 + mnth.5 + mnth.6 + mnth.7 + mnth.8 + mnth.9 + mnth.10 + mnth.11 , data = dbike2)
summary(reg1)
##
## Call:
## lm(formula = cnt ~ trend + trend_sqd + mnth.1 + mnth.2 + mnth.3 +
## mnth.4 + mnth.5 + mnth.6 + mnth.7 + mnth.8 + mnth.9 + mnth.10 +
## mnth.11, data = dbike2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6244.2 -446.1 143.8 648.5 3445.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.101e+01 1.888e+02 0.270 0.78714
## trend 7.645e+00 7.635e-01 10.012 < 2e-16 ***
## trend_sqd -2.272e-03 1.003e-03 -2.266 0.02374 *
## mnth.1 7.733e+02 2.009e+02 3.848 0.00013 ***
## mnth.2 1.033e+03 2.027e+02 5.094 4.49e-07 ***
## mnth.3 1.896e+03 1.975e+02 9.600 < 2e-16 ***
## mnth.4 2.494e+03 1.978e+02 12.608 < 2e-16 ***
## mnth.5 3.168e+03 1.952e+02 16.231 < 2e-16 ***
## mnth.6 3.403e+03 1.956e+02 17.404 < 2e-16 ***
## mnth.7 3.012e+03 1.929e+02 15.618 < 2e-16 ***
## mnth.8 2.932e+03 1.916e+02 15.300 < 2e-16 ***
## mnth.9 2.860e+03 1.919e+02 14.899 < 2e-16 ***
## mnth.10 2.122e+03 1.893e+02 11.214 < 2e-16 ***
## mnth.11 1.005e+03 1.900e+02 5.289 1.64e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1047 on 717 degrees of freedom
## Multiple R-squared: 0.713, Adjusted R-squared: 0.7078
## F-statistic: 137 on 13 and 717 DF, p-value: < 2.2e-16
reg2 <- lm(cnt ~ trend + trend_sqd + season.1 + season.2 + season.3, data = dbike2)
summary(reg2)
##
## Call:
## lm(formula = cnt ~ trend + trend_sqd + season.1 + season.2 +
## season.3, data = dbike2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5492.2 -579.2 138.1 687.4 4075.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.751e+03 1.702e+02 10.284 < 2e-16 ***
## trend 8.196e+00 8.474e-01 9.671 < 2e-16 ***
## trend_sqd -3.835e-03 1.112e-03 -3.447 0.000599 ***
## season.1 -8.632e+02 1.342e+02 -6.431 2.3e-10 ***
## season.2 1.209e+03 1.307e+02 9.255 < 2e-16 ***
## season.3 1.352e+03 1.252e+02 10.800 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1171 on 725 degrees of freedom
## Multiple R-squared: 0.6372, Adjusted R-squared: 0.6347
## F-statistic: 254.7 on 5 and 725 DF, p-value: < 2.2e-16
# Q5. Choose a preferred linear regression model using any set of variables. What are the key observations from this model for decision makers? Note the R-squared and RMSE from this model.
# Note: The RMSE in a linear regression is also known as the “standard error of regression” or just the “standard error”.
set.seed(49204366)
split = sample.split(dbike2$cnt, SplitRatio = 0.5)
training_set = subset(dbike2, split == TRUE)
test_set = subset(dbike2, split == FALSE)
reg_all <- lm(cnt ~., data = training_set)
summary(reg_all)
##
## Call:
## lm(formula = cnt ~ ., data = training_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2549.26 -440.70 80.95 452.30 2837.94
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.782e+03 1.663e+03 1.673 0.095293 .
## trend -3.532e+00 4.675e+00 -0.756 0.450421
## dteday NA NA NA NA
## yr 3.403e+03 1.692e+03 2.011 0.045126 *
## holiday -1.161e+03 3.564e+02 -3.258 0.001236 **
## workday -1.677e+02 1.489e+02 -1.126 0.260860
## temp 3.096e+03 1.576e+03 1.965 0.050252 .
## atemp 7.880e+02 1.583e+03 0.498 0.618875
## hum -1.170e+03 3.993e+02 -2.930 0.003618 **
## windspeed -2.246e+03 5.716e+02 -3.930 0.000103 ***
## season.1 -1.501e+03 2.605e+02 -5.762 1.89e-08 ***
## season.2 -4.534e+02 3.000e+02 -1.511 0.131730
## season.3 -4.364e+02 2.500e+02 -1.745 0.081830 .
## mnth.1 -1.144e+03 1.560e+03 -0.733 0.464025
## mnth.2 -9.361e+02 1.433e+03 -0.653 0.513970
## mnth.3 -3.390e+02 1.298e+03 -0.261 0.794150
## mnth.4 -4.923e+02 1.178e+03 -0.418 0.676292
## mnth.5 9.514e+01 1.069e+03 0.089 0.929135
## mnth.6 -3.461e+02 9.255e+02 -0.374 0.708666
## mnth.7 -5.258e+02 8.026e+02 -0.655 0.512847
## mnth.8 8.052e+01 6.852e+02 0.118 0.906517
## mnth.9 8.808e+02 5.164e+02 1.706 0.088986 .
## mnth.10 7.649e+02 3.788e+02 2.020 0.044228 *
## mnth.11 -5.707e+01 2.597e+02 -0.220 0.826200
## weekday.0 -6.848e+02 1.515e+02 -4.519 8.65e-06 ***
## weekday.1 -1.162e+02 1.571e+02 -0.739 0.460138
## weekday.2 -3.596e+01 1.534e+02 -0.234 0.814775
## weekday.3 2.543e+01 1.409e+02 0.180 0.856878
## weekday.4 -6.715e+00 1.439e+02 -0.047 0.962794
## weekday.5 NA NA NA NA
## weekday.6 NA NA NA NA
## weathersit.1 1.818e+03 2.493e+02 7.293 2.21e-12 ***
## weathersit.2 1.392e+03 2.322e+02 5.995 5.28e-09 ***
## trend_sqd -1.728e-04 1.058e-03 -0.163 0.870361
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 754.1 on 334 degrees of freedom
## Multiple R-squared: 0.8568, Adjusted R-squared: 0.844
## F-statistic: 66.64 on 30 and 334 DF, p-value: < 2.2e-16
regfit.full <- regsubsets(cnt~., dbike2, nvmax = 34, method = "exhaustive")
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 3 linear dependencies found
## Reordering variables and trying again:
summary(regfit.full)
## Subset selection object
## Call: regsubsets.formula(cnt ~ ., dbike2, nvmax = 34, method = "exhaustive")
## 33 Variables (and intercept)
## Forced in Forced out
## trend FALSE FALSE
## yr FALSE FALSE
## holiday FALSE FALSE
## workday FALSE FALSE
## temp FALSE FALSE
## atemp FALSE FALSE
## hum FALSE FALSE
## windspeed FALSE FALSE
## season.1 FALSE FALSE
## season.2 FALSE FALSE
## season.3 FALSE FALSE
## mnth.1 FALSE FALSE
## mnth.2 FALSE FALSE
## mnth.3 FALSE FALSE
## mnth.4 FALSE FALSE
## mnth.5 FALSE FALSE
## mnth.6 FALSE FALSE
## mnth.7 FALSE FALSE
## mnth.8 FALSE FALSE
## mnth.9 FALSE FALSE
## mnth.10 FALSE FALSE
## mnth.11 FALSE FALSE
## weekday.0 FALSE FALSE
## weekday.1 FALSE FALSE
## weekday.2 FALSE FALSE
## weekday.3 FALSE FALSE
## weekday.4 FALSE FALSE
## weathersit.1 FALSE FALSE
## weathersit.2 FALSE FALSE
## trend_sqd FALSE FALSE
## dteday FALSE FALSE
## weekday.5 FALSE FALSE
## weekday.6 FALSE FALSE
## 1 subsets of each size up to 30
## Selection Algorithm: exhaustive
## trend dteday yr holiday workday temp atemp hum windspeed season.1
## 1 ( 1 ) " " " " " " " " " " " " "*" " " " " " "
## 2 ( 1 ) " " "*" " " " " " " " " "*" " " " " " "
## 3 ( 1 ) " " " " "*" " " " " " " "*" " " " " "*"
## 4 ( 1 ) " " " " "*" " " " " " " "*" " " " " "*"
## 5 ( 1 ) " " " " "*" " " " " " " "*" " " " " "*"
## 6 ( 1 ) " " " " "*" " " " " "*" " " " " "*" "*"
## 7 ( 1 ) " " " " "*" " " " " " " "*" " " " " "*"
## 8 ( 1 ) " " " " "*" " " " " "*" " " " " "*" "*"
## 9 ( 1 ) " " " " "*" " " " " "*" " " "*" "*" "*"
## 10 ( 1 ) " " " " "*" " " " " "*" " " "*" "*" "*"
## 11 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 12 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 13 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 14 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 15 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 16 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 17 ( 1 ) " " " " "*" "*" " " "*" " " "*" "*" "*"
## 18 ( 1 ) " " "*" "*" "*" " " "*" " " "*" "*" "*"
## 19 ( 1 ) " " "*" "*" "*" " " "*" " " "*" "*" "*"
## 20 ( 1 ) " " "*" "*" "*" " " "*" " " "*" "*" "*"
## 21 ( 1 ) " " "*" "*" "*" " " "*" " " "*" "*" "*"
## 22 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" "*" "*"
## 23 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" "*" "*"
## 24 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" "*" "*"
## 25 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" "*" "*"
## 26 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" "*" "*"
## 27 ( 1 ) "*" " " "*" "*" " " "*" "*" "*" "*" "*"
## 28 ( 1 ) " " "*" "*" "*" " " "*" "*" "*" "*" "*"
## 29 ( 1 ) "*" " " "*" "*" " " "*" "*" "*" "*" "*"
## 30 ( 1 ) "*" " " "*" "*" "*" "*" "*" "*" "*" "*"
## season.2 season.3 mnth.1 mnth.2 mnth.3 mnth.4 mnth.5 mnth.6 mnth.7
## 1 ( 1 ) " " " " " " " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " " " " " " " " " "
## 10 ( 1 ) " " "*" " " " " " " " " " " " " " "
## 11 ( 1 ) " " "*" " " " " " " " " " " " " " "
## 12 ( 1 ) " " "*" " " " " " " " " " " " " " "
## 13 ( 1 ) " " "*" " " " " " " " " " " " " "*"
## 14 ( 1 ) " " "*" " " " " "*" " " " " " " "*"
## 15 ( 1 ) " " "*" " " " " "*" " " " " " " "*"
## 16 ( 1 ) "*" "*" " " " " "*" " " "*" " " "*"
## 17 ( 1 ) "*" "*" " " " " "*" " " "*" " " "*"
## 18 ( 1 ) "*" "*" "*" "*" " " " " "*" " " "*"
## 19 ( 1 ) "*" "*" "*" "*" " " " " "*" " " "*"
## 20 ( 1 ) "*" "*" "*" "*" " " " " "*" " " "*"
## 21 ( 1 ) "*" "*" "*" "*" "*" "*" " " " " "*"
## 22 ( 1 ) "*" "*" "*" "*" " " " " "*" "*" " "
## 23 ( 1 ) "*" "*" "*" "*" "*" "*" " " " " "*"
## 24 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" "*"
## 25 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 26 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 27 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 28 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 29 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
## 30 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*"
## mnth.8 mnth.9 mnth.10 mnth.11 weekday.0 weekday.1 weekday.2 weekday.3
## 1 ( 1 ) " " " " " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " " " " " " " "
## 7 ( 1 ) " " "*" "*" " " " " " " " " " "
## 8 ( 1 ) " " "*" "*" " " " " " " " " " "
## 9 ( 1 ) " " "*" "*" " " " " " " " " " "
## 10 ( 1 ) " " "*" "*" " " " " " " " " " "
## 11 ( 1 ) " " "*" "*" " " " " " " " " " "
## 12 ( 1 ) " " "*" "*" " " "*" " " " " " "
## 13 ( 1 ) " " "*" "*" " " "*" " " " " " "
## 14 ( 1 ) " " "*" "*" " " "*" " " " " " "
## 15 ( 1 ) " " "*" "*" " " "*" "*" " " " "
## 16 ( 1 ) " " "*" "*" " " "*" " " " " " "
## 17 ( 1 ) " " "*" "*" " " "*" "*" " " " "
## 18 ( 1 ) " " "*" "*" " " "*" " " " " " "
## 19 ( 1 ) " " "*" "*" " " "*" "*" " " " "
## 20 ( 1 ) " " "*" "*" " " "*" "*" "*" " "
## 21 ( 1 ) " " "*" "*" " " "*" "*" "*" " "
## 22 ( 1 ) "*" "*" "*" " " "*" "*" "*" " "
## 23 ( 1 ) " " "*" "*" "*" "*" "*" "*" " "
## 24 ( 1 ) " " "*" "*" "*" "*" "*" "*" " "
## 25 ( 1 ) " " "*" "*" "*" "*" "*" "*" " "
## 26 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
## 27 ( 1 ) " " "*" "*" "*" "*" "*" "*" "*"
## 28 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
## 29 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
## 30 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
## weekday.4 weekday.5 weekday.6 weathersit.1 weathersit.2 trend_sqd
## 1 ( 1 ) " " " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " " " "
## 4 ( 1 ) " " " " " " "*" " " " "
## 5 ( 1 ) " " " " " " "*" "*" " "
## 6 ( 1 ) " " " " " " "*" "*" " "
## 7 ( 1 ) " " " " " " "*" "*" " "
## 8 ( 1 ) " " " " " " "*" "*" " "
## 9 ( 1 ) " " " " " " "*" "*" " "
## 10 ( 1 ) " " " " " " "*" "*" " "
## 11 ( 1 ) " " " " " " "*" "*" " "
## 12 ( 1 ) " " " " " " "*" "*" " "
## 13 ( 1 ) " " " " " " "*" "*" " "
## 14 ( 1 ) " " " " " " "*" "*" " "
## 15 ( 1 ) " " " " " " "*" "*" " "
## 16 ( 1 ) " " " " " " "*" "*" " "
## 17 ( 1 ) " " " " " " "*" "*" " "
## 18 ( 1 ) " " " " " " "*" "*" " "
## 19 ( 1 ) " " " " " " "*" "*" " "
## 20 ( 1 ) " " " " " " "*" "*" " "
## 21 ( 1 ) " " " " " " "*" "*" " "
## 22 ( 1 ) " " " " " " "*" "*" " "
## 23 ( 1 ) " " " " " " "*" "*" " "
## 24 ( 1 ) " " " " " " "*" "*" " "
## 25 ( 1 ) " " " " " " "*" "*" " "
## 26 ( 1 ) " " " " " " "*" "*" " "
## 27 ( 1 ) "*" " " " " "*" "*" " "
## 28 ( 1 ) "*" " " " " "*" "*" " "
## 29 ( 1 ) "*" " " " " "*" "*" "*"
## 30 ( 1 ) "*" " " " " "*" "*" "*"
summary(regfit.full)$which
## (Intercept) trend dteday yr holiday workday temp atemp hum windspeed
## 1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## 2 TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## 3 TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## 4 TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## 5 TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## 6 TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
## 7 TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
## 8 TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
## 9 TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE
## 10 TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE
## 11 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 12 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 13 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 14 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 15 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 16 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 17 TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 18 TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 19 TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 20 TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 21 TRUE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
## 22 TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 23 TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 24 TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 25 TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 26 TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 27 TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 28 TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 29 TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## 30 TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## season.1 season.2 season.3 mnth.1 mnth.2 mnth.3 mnth.4 mnth.5 mnth.6 mnth.7
## 1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 7 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 8 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 9 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 10 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 11 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 12 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 13 TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## 14 TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 15 TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
## 16 TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
## 17 TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
## 18 TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
## 19 TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
## 20 TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
## 21 TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
## 22 TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
## 23 TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
## 24 TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
## 25 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 26 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 27 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 28 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 29 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 30 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## mnth.8 mnth.9 mnth.10 mnth.11 weekday.0 weekday.1 weekday.2 weekday.3
## 1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 7 FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## 8 FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## 9 FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## 10 FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## 11 FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## 12 FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
## 13 FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
## 14 FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
## 15 FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
## 16 FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
## 17 FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
## 18 FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
## 19 FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
## 20 FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## 21 FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## 22 TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
## 23 FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## 24 FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## 25 FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## 26 TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## 27 FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 28 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 29 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 30 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## weekday.4 weekday.5 weekday.6 weathersit.1 weathersit.2 trend_sqd
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE TRUE FALSE FALSE
## 5 FALSE FALSE FALSE TRUE TRUE FALSE
## 6 FALSE FALSE FALSE TRUE TRUE FALSE
## 7 FALSE FALSE FALSE TRUE TRUE FALSE
## 8 FALSE FALSE FALSE TRUE TRUE FALSE
## 9 FALSE FALSE FALSE TRUE TRUE FALSE
## 10 FALSE FALSE FALSE TRUE TRUE FALSE
## 11 FALSE FALSE FALSE TRUE TRUE FALSE
## 12 FALSE FALSE FALSE TRUE TRUE FALSE
## 13 FALSE FALSE FALSE TRUE TRUE FALSE
## 14 FALSE FALSE FALSE TRUE TRUE FALSE
## 15 FALSE FALSE FALSE TRUE TRUE FALSE
## 16 FALSE FALSE FALSE TRUE TRUE FALSE
## 17 FALSE FALSE FALSE TRUE TRUE FALSE
## 18 FALSE FALSE FALSE TRUE TRUE FALSE
## 19 FALSE FALSE FALSE TRUE TRUE FALSE
## 20 FALSE FALSE FALSE TRUE TRUE FALSE
## 21 FALSE FALSE FALSE TRUE TRUE FALSE
## 22 FALSE FALSE FALSE TRUE TRUE FALSE
## 23 FALSE FALSE FALSE TRUE TRUE FALSE
## 24 FALSE FALSE FALSE TRUE TRUE FALSE
## 25 FALSE FALSE FALSE TRUE TRUE FALSE
## 26 FALSE FALSE FALSE TRUE TRUE FALSE
## 27 TRUE FALSE FALSE TRUE TRUE FALSE
## 28 TRUE FALSE FALSE TRUE TRUE FALSE
## 29 TRUE FALSE FALSE TRUE TRUE TRUE
## 30 TRUE FALSE FALSE TRUE TRUE TRUE
summary(regfit.full)$rsq
## [1] 0.3982439 0.6885702 0.7444240 0.7780255 0.7989754 0.8065122 0.8140877
## [8] 0.8202591 0.8262868 0.8319624 0.8358448 0.8402068 0.8421054 0.8434844
## [15] 0.8443512 0.8453537 0.8461603 0.8468853 0.8477040 0.8480081 0.8482437
## [22] 0.8485297 0.8487193 0.8489391 0.8490172 0.8490756 0.8491299 0.8491920
## [29] 0.8492160 0.8492162
plot(summary(regfit.full)$rsq)
par(mfrow=c(2,2))
max.rsq <- which.max(summary(regfit.full)$rsq)
max.rsq
## [1] 30
plot(summary(regfit.full)$rsq, xlab = "Number of Variables",
ylab ="R-squared")
plot(summary(regfit.full)$adjr2, xlab = "Number of Variables",
ylab ="Adj R-squared")
max.adjr2 <- which.max(summary(regfit.full)$adjr2)
max.adjr2
## [1] 22
points(max.adjr2,summary(regfit.full)$adjr2[22], col = "red", cex = 2, pch = 20)
plot(summary(regfit.full)$cp, xlab = "Number of Variables",
ylab ="Mallows Cp")
min.cp <- which.min(summary(regfit.full)$cp)
min.cp
## [1] 19
points(min.cp,summary(regfit.full)$cp[19], col = "blue", cex = 2, pch = 20)
plot(summary(regfit.full)$bic, xlab = "Number of Variables",
ylab ="Bayesian Info Crit")
min.bic <- which.min(summary(regfit.full)$bic)
min.bic
## [1] 13
points(min.bic,summary(regfit.full)$bic[13], col = "green", cex = 2, pch = 20)
coef.adjr2 <- coef(regfit.full, 22)
coef.adjr2
## (Intercept) yr holiday workday atemp hum
## 36637.571572 2789.467337 -424.550511 105.395192 5990.038465 -3126.511256
## windspeed season.1 season.2 season.3 mnth.1 mnth.2
## -3659.621376 -1521.550506 -624.881497 -671.707727 -606.746137 -477.006335
## mnth.3 mnth.6 mnth.7 mnth.9 mnth.10 mnth.11
## -98.975685 -95.983578 -582.230043 683.274899 305.242336 -128.137925
## weekday.1 weekday.2 weekday.3 dteday weekday.5
## -128.618665 -61.174352 -14.623655 -2.149054 101.637859
coef.cp <- coef(regfit.full, 19)
coef.cp
## (Intercept) yr holiday workday atemp windspeed
## 49946.917254 3266.640398 -312.849263 107.045774 4835.700001 -2547.687860
## season.1 season.2 season.3 mnth.1 mnth.2 mnth.3
## -1667.963440 -814.426171 -644.581371 -888.698532 -615.238565 -166.700685
## mnth.6 mnth.8 mnth.10 mnth.11 weekday.1 weekday.2
## 258.137232 134.995494 112.174632 -212.022411 -175.753095 -103.809627
## dteday weekday.5
## -3.131315 140.602179
coef.bic <- coef(regfit.full, 13)
coef.bic
## (Intercept) holiday workday atemp windspeed
## -77613.619882 -500.451264 78.822676 5705.090395 -2184.295457
## season.1 season.2 mnth.1 mnth.8 mnth.10
## -265.129784 693.367538 341.421695 2.776361 276.151121
## mnth.11 weekday.1 dteday weekday.5
## -207.942645 -134.999108 5.191573 172.291712
c(names(coef.adjr2))
## [1] "(Intercept)" "yr" "holiday" "workday" "atemp"
## [6] "hum" "windspeed" "season.1" "season.2" "season.3"
## [11] "mnth.1" "mnth.2" "mnth.3" "mnth.6" "mnth.7"
## [16] "mnth.9" "mnth.10" "mnth.11" "weekday.1" "weekday.2"
## [21] "weekday.3" "dteday" "weekday.5"
c(names(coef.cp))
## [1] "(Intercept)" "yr" "holiday" "workday" "atemp"
## [6] "windspeed" "season.1" "season.2" "season.3" "mnth.1"
## [11] "mnth.2" "mnth.3" "mnth.6" "mnth.8" "mnth.10"
## [16] "mnth.11" "weekday.1" "weekday.2" "dteday" "weekday.5"
c(names(coef.bic))
## [1] "(Intercept)" "holiday" "workday" "atemp" "windspeed"
## [6] "season.1" "season.2" "mnth.1" "mnth.8" "mnth.10"
## [11] "mnth.11" "weekday.1" "dteday" "weekday.5"
reg_adjr2 <- lm(cnt ~ yr + holiday + workday + atemp + hum + windspeed + season.1 + season.2 + season.3 + mnth.1 + mnth.2 + mnth.3 + mnth.6 + mnth.7 + mnth.9 + mnth.10 + mnth.11 + weekday.1 + weekday.2 + weekday.3 + dteday + weekday.5, data = training_set)
summary(reg_adjr2)
##
## Call:
## lm(formula = cnt ~ yr + holiday + workday + atemp + hum + windspeed +
## season.1 + season.2 + season.3 + mnth.1 + mnth.2 + mnth.3 +
## mnth.6 + mnth.7 + mnth.9 + mnth.10 + mnth.11 + weekday.1 +
## weekday.2 + weekday.3 + dteday + weekday.5, data = training_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3297.0 -374.6 120.2 488.4 3078.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30436.037 22847.396 1.332 0.183700
## yr 2727.986 559.193 4.878 1.64e-06 ***
## holiday -756.408 378.432 -1.999 0.046422 *
## workday 165.213 143.052 1.155 0.248931
## atemp 5099.931 578.483 8.816 < 2e-16 ***
## hum -2650.044 338.871 -7.820 6.61e-14 ***
## windspeed -3026.263 605.400 -4.999 9.23e-07 ***
## season.1 -1423.028 283.898 -5.012 8.64e-07 ***
## season.2 -380.332 316.354 -1.202 0.230104
## season.3 -328.547 238.356 -1.378 0.168984
## mnth.1 -636.212 473.813 -1.343 0.180244
## mnth.2 -451.333 431.573 -1.046 0.296399
## mnth.3 -95.024 314.490 -0.302 0.762720
## mnth.6 -354.513 205.091 -1.729 0.084790 .
## mnth.7 -665.076 204.896 -3.246 0.001286 **
## mnth.9 798.667 183.801 4.345 1.84e-05 ***
## mnth.10 805.694 240.462 3.351 0.000896 ***
## mnth.11 -198.288 226.881 -0.874 0.382747
## weekday.1 -91.514 173.639 -0.527 0.598511
## weekday.2 -9.438 171.378 -0.055 0.956113
## weekday.3 44.369 157.567 0.282 0.778428
## dteday -1.760 1.492 -1.179 0.239120
## weekday.5 75.594 158.609 0.477 0.633946
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 836.7 on 342 degrees of freedom
## Multiple R-squared: 0.8196, Adjusted R-squared: 0.808
## F-statistic: 70.61 on 22 and 342 DF, p-value: < 2.2e-16
reg_cp <- lm(cnt ~ yr + holiday + workday + atemp + hum + windspeed + season.1 + season.2 + season.3 + mnth.1 + mnth.2 + mnth.3 + mnth.6 + mnth.8 + mnth.10 + mnth.11 + weekday.1 + weekday.2 + dteday + weekday.5, data = training_set)
summary(reg_cp)
##
## Call:
## lm(formula = cnt ~ yr + holiday + workday + atemp + hum + windspeed +
## season.1 + season.2 + season.3 + mnth.1 + mnth.2 + mnth.3 +
## mnth.6 + mnth.8 + mnth.10 + mnth.11 + weekday.1 + weekday.2 +
## dteday + weekday.5, data = training_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3344.5 -477.2 110.5 512.4 3160.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8937.836 23996.436 0.372 0.709776
## yr 2193.198 587.509 3.733 0.000221 ***
## holiday -668.995 397.476 -1.683 0.093261 .
## workday 208.255 123.595 1.685 0.092900 .
## atemp 4627.668 599.376 7.721 1.27e-13 ***
## hum -2315.316 355.902 -6.505 2.73e-10 ***
## windspeed -3061.180 642.893 -4.762 2.84e-06 ***
## season.1 -1594.002 297.321 -5.361 1.52e-07 ***
## season.2 -334.980 329.672 -1.016 0.310295
## season.3 -384.849 260.659 -1.476 0.140740
## mnth.1 -321.884 501.170 -0.642 0.521130
## mnth.2 -162.931 455.560 -0.358 0.720825
## mnth.3 115.988 331.993 0.349 0.727026
## mnth.6 -249.030 209.775 -1.187 0.235994
## mnth.8 108.189 196.887 0.549 0.583019
## mnth.10 623.625 246.635 2.529 0.011901 *
## mnth.11 -462.423 233.735 -1.978 0.048680 *
## weekday.1 -183.611 163.761 -1.121 0.262978
## weekday.2 -60.620 159.024 -0.381 0.703292
## dteday -0.339 1.567 -0.216 0.828830
## weekday.5 63.298 144.141 0.439 0.660838
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 888.7 on 344 degrees of freedom
## Multiple R-squared: 0.7952, Adjusted R-squared: 0.7833
## F-statistic: 66.79 on 20 and 344 DF, p-value: < 2.2e-16
reg_bic <- lm(cnt ~ holiday + workday + temp + atemp + windspeed + season.1 + season.2 + mnth.1 + mnth.8 + mnth.10 + mnth.11 + weekday.1 + dteday + weekday.5, data = training_set)
summary(reg_bic)
##
## Call:
## lm(formula = cnt ~ holiday + workday + temp + atemp + windspeed +
## season.1 + season.2 + mnth.1 + mnth.8 + mnth.10 + mnth.11 +
## weekday.1 + dteday + weekday.5, data = training_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3980.2 -526.7 187.8 601.4 3039.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.099e+04 4.269e+03 -18.971 < 2e-16 ***
## holiday -5.531e+02 4.432e+02 -1.248 0.21291
## workday 1.213e+02 1.280e+02 0.947 0.34420
## temp 4.172e+03 1.856e+03 2.248 0.02519 *
## atemp 8.452e+02 2.044e+03 0.414 0.67945
## windspeed -2.113e+03 6.995e+02 -3.021 0.00271 **
## season.1 -1.127e+02 2.188e+02 -0.515 0.60672
## season.2 7.299e+02 1.593e+02 4.581 6.45e-06 ***
## mnth.1 2.490e+02 2.232e+02 1.115 0.26547
## mnth.8 4.941e+01 2.135e+02 0.231 0.81708
## mnth.10 7.545e+02 2.309e+02 3.268 0.00119 **
## mnth.11 -3.384e+02 2.371e+02 -1.427 0.15440
## weekday.1 -2.292e+02 1.771e+02 -1.295 0.19634
## dteday 5.421e+00 2.756e-01 19.667 < 2e-16 ***
## weekday.5 1.104e+02 1.530e+02 0.722 0.47083
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 993.7 on 350 degrees of freedom
## Multiple R-squared: 0.7395, Adjusted R-squared: 0.7291
## F-statistic: 70.98 on 14 and 350 DF, p-value: < 2.2e-16
anova(reg_adjr2, reg_all)
## Analysis of Variance Table
##
## Model 1: cnt ~ yr + holiday + workday + atemp + hum + windspeed + season.1 +
## season.2 + season.3 + mnth.1 + mnth.2 + mnth.3 + mnth.6 +
## mnth.7 + mnth.9 + mnth.10 + mnth.11 + weekday.1 + weekday.2 +
## weekday.3 + dteday + weekday.5
## Model 2: cnt ~ trend + dteday + yr + holiday + workday + temp + atemp +
## hum + windspeed + season.1 + season.2 + season.3 + mnth.1 +
## mnth.2 + mnth.3 + mnth.4 + mnth.5 + mnth.6 + mnth.7 + mnth.8 +
## mnth.9 + mnth.10 + mnth.11 + weekday.0 + weekday.1 + weekday.2 +
## weekday.3 + weekday.4 + weekday.5 + weekday.6 + weathersit.1 +
## weathersit.2 + trend_sqd
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 342 239407549
## 2 334 189936339 8 49471210 10.874 1.238e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(reg_cp, reg_all)
## Analysis of Variance Table
##
## Model 1: cnt ~ yr + holiday + workday + atemp + hum + windspeed + season.1 +
## season.2 + season.3 + mnth.1 + mnth.2 + mnth.3 + mnth.6 +
## mnth.8 + mnth.10 + mnth.11 + weekday.1 + weekday.2 + dteday +
## weekday.5
## Model 2: cnt ~ trend + dteday + yr + holiday + workday + temp + atemp +
## hum + windspeed + season.1 + season.2 + season.3 + mnth.1 +
## mnth.2 + mnth.3 + mnth.4 + mnth.5 + mnth.6 + mnth.7 + mnth.8 +
## mnth.9 + mnth.10 + mnth.11 + weekday.0 + weekday.1 + weekday.2 +
## weekday.3 + weekday.4 + weekday.5 + weekday.6 + weathersit.1 +
## weathersit.2 + trend_sqd
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 344 271703071
## 2 334 189936339 10 81766733 14.379 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(reg_bic, reg_all)
## Analysis of Variance Table
##
## Model 1: cnt ~ holiday + workday + temp + atemp + windspeed + season.1 +
## season.2 + mnth.1 + mnth.8 + mnth.10 + mnth.11 + weekday.1 +
## dteday + weekday.5
## Model 2: cnt ~ trend + dteday + yr + holiday + workday + temp + atemp +
## hum + windspeed + season.1 + season.2 + season.3 + mnth.1 +
## mnth.2 + mnth.3 + mnth.4 + mnth.5 + mnth.6 + mnth.7 + mnth.8 +
## mnth.9 + mnth.10 + mnth.11 + weekday.0 + weekday.1 + weekday.2 +
## weekday.3 + weekday.4 + weekday.5 + weekday.6 + weathersit.1 +
## weathersit.2 + trend_sqd
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 350 345578168
## 2 334 189936339 16 155641830 17.106 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
reg_adjr2_pred <- predict(reg_adjr2, test_set)
RSS_Test <- sum((test_set$cnt - reg_adjr2_pred)^2)
MSE_Test <- RSS_Test/366
MSE_Test
## [1] 782758.1
RMSE_Test <- MSE_Test^0.5
RMSE_Test
## [1] 884.7362
# Q6. Create a (new) time series object from the count (cnt) variable with frequency 7. Use this new series for questions 6-9 below. Plot the data in time. Is there significant autocorrelation in this series? Explain the results found.
# Hint: See Hyndman, Sections 2.1, 2.2, 2.3 and 2.8.
bikets <- ts(dbike2$cnt, start = 1, frequency = 7)
bikets[1:104]
## [1] 985 801 1349 1562 1600 1606 1510 959 822 1321 1263 1162 1406 1421 1248
## [16] 1204 1000 683 1650 1927 1543 981 986 1416 1985 506 431 1167 1098 1096
## [31] 1501 1360 1526 1550 1708 1005 1623 1712 1530 1605 1538 1746 1472 1589 1913
## [46] 1815 2115 2475 2927 1635 1812 1107 1450 1917 1807 1461 1969 2402 1446 1851
## [61] 2134 1685 1944 2077 605 1872 2133 1891 623 1977 2132 2417 2046 2056 2192
## [76] 2744 3239 3117 2471 2077 2703 2121 1865 2210 2496 1693 2028 2425 1536 1685
## [91] 2227 2252 3249 3115 1795 2808 3141 1471 2455 2895 3348 2034 2162 3267
autoplot(bikets) +
ggtitle("Daily Bike Share Amount 2011 - 2012") +
ylab("Bike Share") +
xlab("Week")
gglagplot(bikets)
# A6: The plot shows some period to period correlation (autocorrelation) of the residuals.
# Q7. Estimate a 3-period (centered) moving average model for the count data, that is a moving average model with order equal to 3. What is the RMSE (root mean squared error) of this model? Try other odd values for the number periods to see if a better model can be found. Compare the RMSE of the best model found here to that of question 5.
# Hint: See section 6.2. To compute root mean squared error (RMSE) use the standard process. Subtract forecasts from actuals to compute errors. Square the errors and take the average. Finally, compute the square root of the mean squared error.
ma3 <- ma(bikets, 3, centre = TRUE)
ma3[1:20]
## [1] NA 1045.0000 1237.3333 1503.6667 1589.3333 1572.0000 1358.3333
## [8] 1097.0000 1034.0000 1135.3333 1248.6667 1277.0000 1329.6667 1358.3333
## [15] 1291.0000 1150.6667 962.3333 1111.0000 1420.0000 1706.6667
mean(bikets[1:3])
## [1] 1045
ma3[2]
## [1] 1045
#
autoplot(ma(bikets, 3)) +
ylab("Bike Share Amount") +
xlab("Year")
autoplot(bikets, series = "Actual Amount") +
autolayer(ma3, series = "MA(3)") +
xlab("Year") + ylab("Bike Share Amount") +
ggtitle("Bike Share Amount 2011 - 2012") +
scale_colour_manual(values =
c("Actual Amount"="grey50","MA(3)"="red"))
## Warning: Removed 2 rows containing missing values (geom_path).
ma7 <- ma(bikets, 7)
ma7[1:20]
## [1] NA NA NA 1344.714 1341.000 1344.000 1340.000 1297.286
## [9] 1234.714 1206.143 1193.429 1234.714 1289.286 1243.429 1160.571 1230.286
## [17] 1304.714 1322.143 1284.000 1252.857
mean(bikets[1:7])
## [1] 1344.714
ma7[4]
## [1] 1344.714
#
autoplot(ma(bikets, 7)) +
ylab("Bike Share Amount") +
xlab("Year")
autoplot(bikets, series = "Actual Amount") +
autolayer(ma7, series = "MA(7)") +
xlab("Year") + ylab("Bike Share Amount") +
ggtitle("Bike Share Amount 2011 - 2012") +
scale_colour_manual(values =
c("Actual Amount"="grey50","MA(7)"="red"))
## Warning: Removed 6 rows containing missing values (geom_path).
MSE_3 <- sum((ma3[2:730] - bikets[2:730])^2)/length(bikets[2:730])
MSE_3
## [1] 325407.5
RMSE_3 <- MSE_3^0.5
RMSE_3
## [1] 570.445
MSE_7 <- sum((ma7[4:728] - bikets[4:728])^2)/length(bikets[4:728])
MSE_7
## [1] 631982.9
RMSE_7 <- MSE_7^0.5
RMSE_7
## [1] 794.9735
# Q8. Execute the classical additive and multiplicative decompositions on the “cnt” series. Based on the two decompositions, what is observed regarding trend and seasonal effect? Is/Are the remainder components showing autocorrelation? Compute the RMSE from the remainder series.
# Hint: Use the “remainder” command to obtain the remainder component of a decomposition. See Hyndman, Section 6.3.
# First the classical decomposition
# This is additive classical decomp
#
dc1_classical_add <- decompose(bikets, type = "additive")
autoplot(dc1_classical_add) +
ggtitle("Bike Share Classical Additive Decomposition") +
xlab("Year")
#
# Here is the multiplicative version
#
dc1_classical_mult <- decompose(bikets, type = "multiplicative")
autoplot(dc1_classical_mult) +
ggtitle("Bike Share Classical Multiplicative Decomposition") +
xlab("Year")
# To get the remainder part in the additive case:
#
dc1_classical_add$random
## Time Series:
## Start = c(1, 1)
## End = c(105, 3)
## Frequency = 7
## [1] NA NA NA 222.2696690 226.5979657
## [6] 112.2435701 -0.8896717 -409.4760635 -154.9198915 276.3172791
## [11] 74.5553832 -105.1163201 -33.0421442 6.6817569 16.2382222
## [16] 231.5086799 -143.2541495 -634.1589025 333.5979657 524.3864272
## [21] 59.8246140 -588.4760635 -91.0627486 456.3172791 922.5553832
## [26] -610.5448915 -818.6135728 -115.8896717 4.0953651 185.3658228
## [31] 334.1744220 -40.5874739 101.3122514 -67.3278585 39.3960426
## [36] -588.1903492 347.5086799 341.8887077 -2.0160453 -31.1163201
## [41] -210.6135728 -52.4611003 -267.4760635 105.6515371 199.4601363
## [46] -223.7303310 15.5979657 226.3864272 772.3960426 -367.7617778
## [51] 166.5086799 -539.3970066 -143.4446168 238.4551085 -73.1850014
## [56] -488.7468146 61.6667937 792.6515371 -242.2541495 -62.7303310
## [61] 167.4551085 -142.1850014 34.8246140 227.2382222 -881.0627486
## [66] 441.3172791 541.1268118 253.8836799 -1390.3278585 -82.3182431
## [71] 183.3810794 754.3658228 -15.9684352 -342.7303310 -384.8306058
## [76] 42.1007129 511.5388997 396.8096508 89.9372514 -274.8255780
## [81] 341.6982404 -188.9734629 -451.1850014 -120.3182431 305.0953651
## [86] -85.3484629 179.0315648 417.1268118 -474.4020343 -665.0421442
## [91] -299.4611003 -84.7617778 1059.5086799 621.1744220 -747.3017596
## [96] 199.3122514 465.5292843 -1258.8896717 -209.3332063 651.9372514
## [101] 990.6029934 -716.3017596 -388.5448915 477.8149986 304.1103283
## [106] -2094.3332063 929.0801085 386.0315648 210.6982404 450.3122514
## [111] 514.1007129 -2105.0325288 176.8096508 671.0801085 475.4601363
## [116] 229.9839547 -517.6877486 -329.0421442 139.9674712 949.3810794
## [121] -505.6341772 394.4601363 286.1268118 -1483.8306058 58.5292843
## [126] 217.9674712 373.3810794 100.0801085 -28.8255780 327.5553832
## [131] -144.4020343 388.8149986 -333.6039574 -832.7617778 686.9372514
## [136] 36.8887077 -70.5874739 -718.2591772 -130.8992871 144.8246140
## [141] 1079.8096508 103.3658228 -393.5398637 -298.0160453 300.1693942
## [146] -136.4707157 -130.4611003 121.0953651 623.5086799 -204.3970066
## [151] -567.3017596 -696.1163201 163.6721415 422.2531854 430.3810794
## [156] 262.3658228 -41.5398637 190.6982404 -224.9734629 -764.6135728
## [161] -182.1753860 289.2382222 0.9372514 485.7458505 163.4125261
## [166] 393.1693942 -1177.7564299 22.3960426 405.0953651 455.2229657
## [171] -521.2541495 126.2696690 -250.9734629 -165.4707157 -85.3182431
## [176] 252.0953651 581.5086799 -215.3970066 -484.8731882 66.5979657
## [181] 332.9578558 -31.8896717 -177.6189206 -233.4913201 1196.0315648
## [186] -149.5874739 -253.9734629 -441.4707157 -735.0325288 718.8096508
## [191] 633.7943942 -327.8255780 -526.3017596 -563.5448915 0.9578558
## [196] 380.6817569 824.9525080 534.3658228 -220.2541495 13.5553832
## [201] 144.0265371 -279.0421442 -608.8896717 -618.1903492 -14.4913201
## [206] 36.6029934 564.5553832 423.1693942 -59.6135728 -685.6039574
## [211] 6.6667937 317.2229657 158.3172791 435.1268118 -847.4020343
## [216] 111.1007129 371.3960426 -66.1903492 -418.4913201 -4.6827209
## [221] 109.2696690 270.4551085 160.1007129 250.2531854 -422.6189206
## [226] -411.3484629 151.3172791 489.2696690 272.1693942 -741.7564299
## [231] -474.8896717 495.6667937 -555.6341772 270.6029934 1178.5553832
## [236] 958.4551085 -812.7564299 302.8246140 -3044.7617778 513.5086799
## [241] 492.4601363 896.5553832 231.8836799 84.9578558 -140.8896717
## [246] 72.0953651 1294.5086799 76.7458505 -551.7303310 -1426.1163201
## [251] -1712.6135728 -226.3182431 1381.0953651 1012.6515371 323.7458505
## [256] 43.5553832 147.3122514 -985.7564299 118.9674712 129.9525080
## [261] 283.7943942 290.1744220 -426.4446168 116.8836799 337.3864272
## [266] -2096.7468146 962.5239365 942.0801085 459.4601363 -608.0160453
## [271] -430.6877486 682.8149986 1176.1103283 -1545.1903492 -858.4913201
## [276] -292.2541495 468.2696690 375.1693942 -173.6135728 -195.7468146
## [281] 312.6667937 1087.9372514 862.1744220 343.2696690 -1813.6877486
## [286] -1366.8992871 -578.8896717 1067.3810794 1219.2229657 468.7458505
## [291] 395.9839547 -1835.5448915 -87.6135728 54.9674712 167.3810794
## [296] 359.3658228 288.4601363 711.5553832 407.0265371 -795.3278585
## [301] 345.5388997 -2586.3332063 404.9372514 458.7458505 658.5553832
## [306] 267.8836799 -106.8992871 -108.3182431 -148.1903492 -85.2056058
## [311] 353.1744220 463.5553832 310.0265371 -993.0421442 -643.6039574
## [316] 156.5239365 462.9372514 1118.4601363 667.5553832 -1690.1163201
## [321] -543.3278585 20.3960426 760.8096508 839.7943942 211.0315648
## [326] -1017.7303310 -11.1163201 -1135.3278585 -16.8896717 172.0953651
## [331] 354.5086799 735.3172791 -538.1589025 45.4551085 -17.0421442
## [336] 182.8246140 2.2382222 617.6515371 905.1744220 -422.5874739
## [341] -2288.4020343 317.2435701 665.6817569 202.6667937 -348.9198915
## [346] 66.4601363 129.1268118 373.1693942 269.3864272 102.9674712
## [351] -667.7617778 -492.4913201 474.7458505 860.6982404 -19.8306058
## [356] 510.3864272 -71.7468146 -800.3332063 -677.2056058 -118.3970066
## [361] -542.7303310 349.3122514 132.9578558 597.2531854 29.5239365
## [366] 158.0801085 -402.5398637 -431.0160453 -627.2591772 -2.1850014
## [371] 741.9674712 1070.0953651 330.3658228 -932.8255780 258.9839547
## [376] -909.6877486 1052.1007129 159.1103283 -367.4760635 -391.7770343
## [381] -386.1112923 101.6982404 675.5979657 521.9578558 352.6817569
## [386] -1610.1903492 -732.9198915 -486.1112923 1222.5553832 727.3122514
## [391] 234.1007129 -576.3182431 66.0953651 -429.0627486 -99.5398637
## [396] 529.6982404 732.4551085 -160.6135728 185.3960426 -1014.7617778
## [401] -316.9198915 413.8887077 894.1268118 -621.5448915 491.6721415
## [406] 523.2531854 -974.3332063 -1480.6341772 433.8887077 731.2696690
## [411] 633.8836799 -813.1850014 356.5388997 640.9525080 -745.3484629
## [416] -695.5398637 -108.7303310 1076.4551085 1148.1007129 -618.4611003
## [421] -1357.4760635 48.3658228 895.3172791 821.6982404 -1935.2591772
## [426] 1098.5292843 -577.3182431 452.5239365 -301.7770343 -544.1112923
## [431] -274.0160453 641.1693942 777.2435701 -337.6039574 -959.0474920
## [436] -36.4913201 138.4601363 558.2696690 454.7408228 77.2435701
## [441] -1880.0325288 1642.5239365 39.2229657 106.8887077 -678.7303310
## [446] 58.5979657 710.2435701 2265.1103283 -2483.6189206 -454.6341772
## [451] 116.4601363 -81.3017596 68.3122514 236.6721415 -512.4611003
## [456] 124.6667937 154.2229657 -93.3970066 443.1268118 -19.1163201
## [461] 9.1007129 41.1103283 659.8096508 -474.3484629 -4.9684352
## [466] 180.4125261 -999.1163201 -849.8992871 5.8246140 1057.0953651
## [471] 1128.7943942 105.3172791 142.4125261 -2099.5448915 853.2435701
## [476] 2007.9674712 1592.8096508 -3936.4913201 -1625.9684352 787.5553832
## [481] 1656.5979657 -384.6135728 464.3960426 -1464.1903492 952.6515371
## [486] -74.9684352 -72.4446168 -61.2591772 65.5292843 -180.7468146
## [491] 507.6667937 520.0801085 316.1744220 -490.1589025 -1616.5448915
## [496] 155.5292843 1082.3960426 1668.6667937 299.9372514 -3187.3970066
## [501] -1158.8731882 989.1693942 687.3864272 704.6817569 1322.5239365
## [506] 795.6515371 -1982.9684352 -296.1589025 -895.4020343 574.1007129
## [511] 276.3960426 225.2382222 381.3658228 -344.1112923 -428.1589025
## [516] 420.1693942 635.8149986 -2732.7468146 1180.2382222 1001.6515371
## [521] 240.0315648 -429.0160453 -323.5448915 147.1007129 415.6817569
## [526] 567.2382222 -56.0627486 -67.6827209 -1906.0160453 476.4551085
## [531] 246.8149986 751.2531854 623.2382222 401.0801085 -1365.9684352
## [536] 466.6982404 -149.8306058 -560.7564299 -903.8896717 742.6667937
## [541] 344.0801085 -3.3970066 554.5553832 663.1693942 284.1007129
## [546] -1074.1753860 -638.7617778 -475.4913201 215.3172791 385.5553832
## [551] 1212.1693942 55.5292843 -48.4611003 -1262.9046349 -1082.0627486
## [556] 546.4601363 -73.5874739 558.8836799 429.3864272 423.9674712
## [561] -77.1903492 -464.6341772 360.1744220 392.4125261 -359.4020343
## [566] 204.2435701 -557.3182431 -1983.7617778 944.7943942 365.8887077
## [571] 687.6982404 913.3122514 -399.8992871 -397.8896717 -463.4760635
## [576] -137.7770343 216.7458505 132.5553832 439.3122514 164.8149986
## [581] 70.8246140 -188.6189206 -1213.0627486 236.0315648 537.9839547
## [586] 836.5979657 316.9578558 -1185.6039574 -503.0474920 97.6515371
## [591] 294.7458505 -155.3017596 146.5979657 572.2435701 144.5388997
## [596] 929.5239365 -2061.4913201 -199.6827209 57.8410975 648.3122514
## [601] 820.1007129 560.6817569 -873.4760635 -1388.4913201 184.6029934
## [606] 184.2696690 791.4551085 610.8149986 352.8246140 -732.3332063
## [611] -649.7770343 -306.3970066 345.1268118 579.1693942 -792.4707157
## [616] 274.3960426 -1282.9046349 1188.7943942 161.7458505 175.1268118
## [621] -150.4020343 -206.0421442 71.5388997 1403.9525080 391.7943942
## [626] -156.5398637 -3131.5874739 394.5979657 324.2435701 669.1103283
## [631] 501.8096508 322.5086799 -198.1112923 -145.1589025 -10.4020343
## [636] -322.3278585 -227.4611003 1426.3810794 112.3658228 -85.6827209
## [641] -2487.0160453 492.8836799 614.2435701 1606.8246140 1265.0953651
## [646] -2877.9198915 -1040.8255780 -158.4446168 1225.4551085 540.1007129
## [651] 174.2531854 -62.1903492 -170.3484629 -1021.9684352 745.9839547
## [656] 495.4551085 399.6721415 -1875.4611003 899.9525080 -70.2056058
## [661] 88.8887077 51.8410975 275.4551085 161.9578558 1230.9674712
## [666] 2648.6667937 -111.4913201 -4448.6827209 -3303.0160453 1517.3122514
## [671] 1727.3864272 819.1103283 -445.9046349 -72.0627486 79.4601363
## [676] 329.2696690 -558.8306058 -645.4707157 -133.8896717 737.2382222
## [681] 1316.5086799 618.6029934 -1670.8731882 -177.6877486 -33.1850014
## [686] 308.6817569 119.3810794 -461.7770343 703.3172791 937.2696690
## [691] 890.7408228 -1626.8992871 -104.1753860 -1398.1903492 -938.4913201
## [696] 1214.1744220 -321.4446168 525.8836799 153.6721415 313.6817569
## [701] -441.7617778 -721.7770343 759.4601363 1069.2696690 99.0265371
## [706] -169.3278585 -405.4611003 426.0953651 -1540.3484629 282.8887077
## [711] 371.2696690 228.3122514 244.2435701 385.6817569 -86.6189206
## [716] -1011.2056058 -107.9684352 991.5553832 1135.3122514 164.5292843
## [721] 161.9674712 -963.1903492 93.2229657 -582.3970066 -570.4446168
## [726] -1121.5448915 432.8149986 1134.2531854 NA NA
## [731] NA
MSE_dc1_cA <- mean((dc1_classical_add$random[4:728])^2)
RMSE_dc1_cA <- MSE_dc1_cA^0.5
RMSE_dc1_cA
## [1] 781.3811
#
# A litte more work to get the remainder amount in the
# multiplicative case
#
?decompose
fit_cM <- dc1_classical_mult$trend *
dc1_classical_mult$seasonal
fit_cM[1:12]
## [1] NA NA NA 1355.155 1337.808 1388.002 1410.885 1292.412
## [9] 1162.122 1171.800 1202.695 1231.775
#
# Then we will compute the error of the remainder
#
MSE_dc1_cM <- mean((bikets[4:728] - fit_cM[4:728])^2)
RMSE_dc1_cM <- MSE_dc1_cM^0.5
RMSE_dc1_cM
## [1] 782.5306
# Q9. Try applying the STL decomposition from the text. Use s.window = 11 and t.window = 7 for these required parameters intitally. Compute the RMSE from the remainder series. Optional: Try other odd parameters for s.window and t.window to see if the RMSE decreases.
# Hint: Assuming the data is in series “cntts”, the command to run the STL decomposition would be: decomp_STL <- stl(cntts, s.window = 11, t.window = 7, robust = TRUE)
decomp_STL <- stl(bikets, s.window = 11, t.window = 7, robust = TRUE)
autoplot(decomp_STL) +
ggtitle("Bike Share Classical Additive Decomposition") +
xlab("Year")
# The decomposition compnents are
# seasonal, trend-cycle and remander:
#
sa_stl <- decomp_STL$time.series[,1]
tc_stl <- decomp_STL$time.series[,2]
rem_stl <- decomp_STL$time.series[,3]
# This is an additive decomposition
#
chk <- sa_stl + tc_stl + rem_stl
chk[1:20]
## [1] 985 801 1349 1562 1600 1606 1510 959 822 1321 1263 1162 1406 1421 1248
## [16] 1204 1000 683 1650 1927
bikets[1:20]
## [1] 985 801 1349 1562 1600 1606 1510 959 822 1321 1263 1162 1406 1421 1248
## [16] 1204 1000 683 1650 1927
# Again to check the error associated
# with the breakdown
#
MSE_stl <- mean((rem_stl)^2)
RMSE_stl <- MSE_stl^0.5
RMSE_stl
## [1] 750.8186
# RMSE = 750.8186
# Q10. Compare all RMSEs computed. Which model worked best? Explain why. In one sentence what would you want management to know about modeling rider count?
RMSE_Q6 = 884.7362 RMSE_ma3 = 570.445 RMSE_ma7 = 794.9735 Classical Decomp Additive = 781.3811 & Classical Decomp Multiplicative = 782.5306 RMSE_stl = 750.8186 The model with 3-period moving average is the best. I think it is better capturing the trend with 3 day period. Generally speaking, the modeling rider count will increase in the incoming year.