Homework 3

Number 1

library(fpp2)

## Warning: package 'fpp2' was built under R version 3.4.4

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 3.4.3

## Loading required package: forecast

## Warning: package 'forecast' was built under R version 3.4.4

## Loading required package: fma

## Warning: package 'fma' was built under R version 3.4.4

## Loading required package: expsmooth

## Warning: package 'expsmooth' was built under R version 3.4.4

autoplot(mens400)

#There is a a gap for 1916 and another gap for 1940 and 1944 for there was no Olympics these years.

summary(mens400)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   43.03   43.92   45.00   46.02   47.65   54.20       3

line1 <- tslm(mens400 ~ trend)

line1

## 
## Call:
## tslm(formula = mens400 ~ trend)
## 
## Coefficients:
## (Intercept)        trend  
##     50.3078      -0.2583

#The slop of the line is -0.2583 seconds for every 4 years

checkresiduals(line1)

## 
##  Breusch-Godfrey test for serial correlation of order up to 6
## 
## data:  Residuals from Linear regression model
## LM test = 3.6082, df = 6, p-value = 0.7295

#This line is a poor fit and does not capture the distribution. The residuals appear to have a small correlation based on the ACF. They also seem to have a trend when plotted. Not only does the distribution lean more toward negative values, but the poitive residuals tend to be in the earlier years and negative in the later years. This happens because of the logarithmic decrease in the data that isn't captured well by a linear trend.

forecast(line1, h = 1)

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2020       42.04231 40.44975 43.63487 39.55286 44.53176

#These predictions are based on the assumption that the previous trend will continue into the next olympics, but in reality it seems to be leveling off, so this prediction is most likely to be too low.

Number 2

summary(huron)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.960   8.135   9.120   9.004   9.875  11.860

autoplot(huron)

#This data has a wide range of variability with a slight downward trend. This appears to look like randomness.

line2 <- tslm(huron ~ trend)
line2

## 
## Call:
## tslm(formula = huron ~ trend)
## 
## Coefficients:
## (Intercept)        trend  
##     10.2020      -0.0242

forecast(line2, h = 8)

##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 1973       7.806127 6.317648 9.294605 5.516501 10.095752
## 1974       7.781926 6.292536 9.271315 5.490899 10.072952
## 1975       7.757724 6.267406 9.248043 5.465269 10.050179
## 1976       7.733523 6.242259 9.224788 5.439613 10.027434
## 1977       7.709322 6.217094 9.201550 5.413929 10.004715
## 1978       7.685121 6.191912 9.178331 5.388219  9.982024
## 1979       7.660920 6.166712 9.155128 5.362481  9.959359
## 1980       7.636719 6.141494 9.131943 5.336717  9.936721

#This model has a lager interval for this data. The data is only ranging from as low a 6.0 and as high as 11.9. With that being said, the confidence interval is 5.5 to 10.1, therefore this is not a good model. This is cause from the wide range of varience and lack of a trend.

NUumber 3

autoplot(elecdaily)

#There is a positive relationship because as the temperature rises people use more electricity on air conditioning.

line3 <- tslm(Demand ~ Temperature, data = elecdaily)
line3

## 
## Call:
## tslm(formula = Demand ~ Temperature, data = elecdaily)
## 
## Coefficients:
## (Intercept)  Temperature  
##    212.3856       0.4182

checkresiduals(line3)

## 
##  Breusch-Godfrey test for serial correlation of order up to 14
## 
## data:  Residuals from Linear regression model
## LM test = 271.66, df = 14, p-value < 2.2e-16

max(elecdaily)

## [1] 347.6376

#This model is not adaquate to explain the data. There is a clear trend in the data that is not being captured by Temperature alone and we can see this with the ACF graph. There is a large outlier that overstates the max of this data. Typically a number like this will have a negative impact on our forecast since it is not repeated or predictable.

fcast15 <- forecast(line3, newdata = data.frame(Temperature = 15))
fcast15

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 53.14286       218.6592 184.4779 252.8406 166.3039 271.0146

fcast35 <- forecast(line3, newdata = data.frame(Temperature = 35))
fcast35

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 53.14286       227.0242 192.6585 261.3898 174.3866 279.6617

#The forecast for 35 degrees is higher than the prediction for 15 degrees. The more extreme tampatures are going to have a higher demand than more comfortable temps. However, for both forecasts there is a large confidence interval which goes in hand with our subpar forecast using only temperature.

elecdaily %>%
  as.data.frame() %>%
  ggplot(aes(x = Temperature, y = Demand))+
  ylab("Demand") +
  xlab("Temperature") +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

#This graph shows a two-way relationship between temperature and demand. There is a negative relationship with low tempatures and a positive relationship with high tempatures.

Number 4

autoplot(fancy)

#As stated in the problem, there are large increases in sales at Christmas and then at the surfing festival in March. Also it seems that these events are increasing in magnitude each year. You can see a steady increase in sales even between peak times their slow season is also rising over time. This makes it necessary to take the log of our data before forcasting. With out doing this step there would be an increase in residuals over time.

autoplot(log(fancy))

fancy1 <- log(fancy)

line4 <- tslm(fancy1 ~ trend + season)

checkresiduals(line4)

## 
##  Breusch-Godfrey test for serial correlation of order up to 16
## 
## data:  Residuals from Linear regression model
## LM test = 37.152, df = 16, p-value = 0.001996

df <- as.data.frame(fancy1)

df[,"Residuals"] <- as.numeric(residuals(line4))

ggplot(df, aes(x=x, y=Residuals)) +
  geom_point()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

#With looking at the residuals like this, there does not appear to be a major problem in the forecast

line4

## 
## Call:
## tslm(formula = fancy1 ~ trend + season)
## 
## Coefficients:
## (Intercept)        trend      season2      season3      season4  
##     7.60586      0.02239      0.25104      0.69521      0.38293  
##     season5      season6      season7      season8      season9  
##     0.40799      0.44696      0.60822      0.58535      0.66634  
##    season10     season11     season12  
##     0.74403      1.20302      1.95814

#These show the large increases near Christmas, as well as the surfing festival in March

fcastraw <- forecast(line4, h = 36)
str(fcastraw)

## List of 11
##  $ model    :List of 17
##   ..$ coefficients : Named num [1:13] 7.6059 0.0224 0.251 0.6952 0.3829 ...
##   .. ..- attr(*, "names")= chr [1:13] "(Intercept)" "trend" "season2" "season3" ...
##   ..$ residuals    : Time-Series [1:84] from 1987 to 1994: -0.2108 -0.1195 -0.4164 0.0956 0.1045 ...
##   .. ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##   ..$ effects      : Named num [1:84] -84.50268 5.48913 -1.10989 0.00916 -0.86716 ...
##   .. ..- attr(*, "names")= chr [1:84] "(Intercept)" "trend" "season2" "season3" ...
##   ..$ rank         : int 13
##   ..$ fitted.values: Time-Series [1:84] from 1987 to 1994: 7.63 7.9 8.37 8.08 8.13 ...
##   .. ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##   ..$ assign       : int [1:13] 0 1 2 2 2 2 2 2 2 2 ...
##   ..$ qr           :List of 5
##   .. ..$ qr   : num [1:84, 1:13] -9.165 0.109 0.109 0.109 0.109 ...
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:84] "1" "2" "3" "4" ...
##   .. .. .. ..$ : chr [1:13] "(Intercept)" "trend" "season2" "season3" ...
##   .. .. ..- attr(*, "assign")= int [1:13] 0 1 2 2 2 2 2 2 2 2 ...
##   .. .. ..- attr(*, "contrasts")=List of 1
##   .. .. .. ..$ season: chr "contr.treatment"
##   .. ..$ qraux: num [1:13] 1.11 1.16 1.09 1.07 1.06 ...
##   .. ..$ pivot: int [1:13] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ tol  : num 1e-07
##   .. ..$ rank : int 13
##   .. ..- attr(*, "class")= chr "qr"
##   ..$ df.residual  : int 71
##   ..$ contrasts    :List of 1
##   .. ..$ season: chr "contr.treatment"
##   ..$ xlevels      :List of 1
##   .. ..$ season: chr [1:12] "1" "2" "3" "4" ...
##   ..$ call         : language tslm(formula = fancy1 ~ trend + season)
##   ..$ terms        :Classes 'terms', 'formula'  language fancy1 ~ trend + season
##   .. .. ..- attr(*, "variables")= language list(fancy1, trend, season)
##   .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:3] "fancy1" "trend" "season"
##   .. .. .. .. ..$ : chr [1:2] "trend" "season"
##   .. .. ..- attr(*, "term.labels")= chr [1:2] "trend" "season"
##   .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: 0x000000002249e9a0> 
##   .. .. ..- attr(*, "predvars")= language list(fancy1, trend, season)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "factor"
##   .. .. .. ..- attr(*, "names")= chr [1:3] "fancy1" "trend" "season"
##   ..$ model        :'data.frame':    84 obs. of  3 variables:
##   .. ..$ fancy1: num [1:84] 7.42 7.78 7.95 8.17 8.23 ...
##   .. ..$ trend : int [1:84] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ season: Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..- attr(*, "terms")=Classes 'terms', 'formula'  language fancy1 ~ trend + season
##   .. .. .. ..- attr(*, "variables")= language list(fancy1, trend, season)
##   .. .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. ..$ : chr [1:3] "fancy1" "trend" "season"
##   .. .. .. .. .. ..$ : chr [1:2] "trend" "season"
##   .. .. .. ..- attr(*, "term.labels")= chr [1:2] "trend" "season"
##   .. .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. .. ..- attr(*, "intercept")= int 1
##   .. .. .. ..- attr(*, "response")= int 1
##   .. .. .. ..- attr(*, ".Environment")=<environment: 0x000000002249e9a0> 
##   .. .. .. ..- attr(*, "predvars")= language list(fancy1, trend, season)
##   .. .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "factor"
##   .. .. .. .. ..- attr(*, "names")= chr [1:3] "fancy1" "trend" "season"
##   ..$ data         :'data.frame':    84 obs. of  3 variables:
##   .. ..$ fancy1: Time-Series [1:84] from 1987 to 1994: 7.42 7.78 7.95 8.17 8.23 ...
##   .. ..$ trend : int [1:84] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ season: Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ x            : Time-Series [1:84] from 1987 to 1994: 7.42 7.78 7.95 8.17 8.23 ...
##   .. ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##   ..$ method       : chr "Linear regression model"
##   ..$ series       : chr "fancy1"
##   ..- attr(*, "class")= chr [1:2] "tslm" "lm"
##  $ mean     : Time-Series [1:36] from 1994 to 1997: 9.51 9.78 10.25 9.96 10.01 ...
##   ..- attr(*, "names")= chr [1:36] "1" "2" "3" "4" ...
##  $ lower    : Time-Series [1:36, 1:2] from 1994 to 1997: 9.25 9.52 9.99 9.7 9.74 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : NULL
##  $ upper    : Time-Series [1:36, 1:2] from 1994 to 1997: 9.77 10.04 10.51 10.22 10.27 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : NULL
##  $ level    : num [1:2] 80 95
##  $ x        : Time-Series [1:84] from 1987 to 1994: 7.42 7.78 7.95 8.17 8.23 ...
##   ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##  $ series   : chr "fancy1"
##  $ method   : chr "Linear regression model"
##  $ newdata  :'data.frame':   36 obs. of  2 variables:
##   ..$ trend : int [1:36] 85 86 87 88 89 90 91 92 93 94 ...
##   ..$ season: Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ residuals: Time-Series [1:84] from 1987 to 1994: -0.2108 -0.1195 -0.4164 0.0956 0.1045 ...
##   ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##  $ fitted   : Time-Series [1:84] from 1987 to 1994: 7.63 7.9 8.37 8.08 8.13 ...
##   ..- attr(*, "names")= chr [1:84] "1" "2" "3" "4" ...
##  - attr(*, "class")= chr "forecast"

fcastmean <- exp(as.numeric(fcastraw$mean))

fcastup <- exp(as.numeric(fcastraw$upper))

fcastlow <- exp(as.numeric(fcastraw$lower))

fcast1 <- data.frame(Mean = fcastmean, Upper = fcastup, Lower = fcastlow)
fcast1

##         Mean     Upper      Lower
## 1   13484.06  17527.60  10373.351
## 2   17724.45  23039.57  13635.501
## 3   28261.51  36736.44  21741.712
## 4   21149.61  27491.85  16270.491
## 5   22177.42  28827.88  17061.192
## 6   23580.87  30652.19  18140.872
## 7   28334.55  36831.38  21797.899
## 8   28321.23  36814.06  21787.651
## 9   31405.93  40823.80  24160.730
## 10  34711.77  45120.97  26703.924
## 11  56174.03  73019.23  43214.941
## 12 122237.73 158893.80  94038.046
## 13  17640.97  22998.44  13531.520
## 14  23188.60  30230.86  17786.832
## 15  36974.06  48202.90  28360.981
## 16  27669.68  36072.82  21224.045
## 17  29014.35  37825.85  22255.475
## 18  30850.46  40219.58  23663.864
## 19  37069.62  48327.47  28434.274
## 20  37052.19  48304.75  28420.905
## 21  41087.86  53566.03  31516.469
## 22  45412.83  59204.47  34833.938
## 23  73491.54  95810.55  56371.738
## 24 159921.57 208488.92 122667.947
## 25  23079.39  30195.26  17640.456
## 26  30337.26  39690.89  23187.922
## 27  48372.55  63286.85  36972.981
## 28  36199.78  47360.95  27668.868
## 29  37958.98  49662.56  29013.498
## 30  40361.14  52805.34  30849.554
## 31  48497.56  63450.40  37068.530
## 32  48474.75  63420.57  37051.102
## 33  53754.55  70328.24  41086.654
## 34  59412.84  77731.09  45411.495
## 35  96147.75 125792.17  73489.391
## 36 209222.70 273730.57 159916.886
## 37  13484.06  20201.76   9000.202
## 38  17724.45  26554.69  11830.533
## 39  28261.51  42341.27  18863.703
## 40  21149.61  31686.25  14116.722
## 41  22177.42  33226.11  14802.755
## 42  23580.87  35328.75  15739.516
## 43  28334.55  42450.69  18912.452
## 44  28321.23  42430.74  18903.560
## 45  31405.93  47052.23  20962.508
## 46  34711.77  52005.01  23169.053
## 47  56174.03  84159.68  37494.462
## 48 122237.73 183136.01  81589.975
## 49  17640.97  26549.43  11721.681
## 50  23188.60  34898.53  15407.845
## 51  36974.06  55645.47  24567.703
## 52  27669.68  41642.49  18385.332
## 53  29014.35  43666.20  19278.808
## 54  30850.46  46429.53  20498.826
## 55  37069.62  55789.28  24631.193
## 56  37052.19  55763.05  24619.613
## 57  41087.86  61836.67  27301.145
## 58  45412.83  68345.69  30174.903
## 59  73491.54 110603.79  48832.025
## 60 159921.57 240679.82 106261.125
## 61  23079.39  34924.36  15251.766
## 62  30337.26  45907.16  20048.051
## 63  48372.55  73198.66  31966.480
## 64  36199.78  54778.49  23922.233
## 65  37958.98  57440.57  25084.787
## 66  40361.14  61075.57  26672.224
## 67  48497.56  73387.82  32049.090
## 68  48474.75  73353.32  32034.022
## 69  53754.55  81342.86  35523.121
## 70  59412.84  89905.12  39262.336
## 71  96147.75 145493.40  63538.212
## 72 209222.70 316601.49 138262.581

#To improve this forcast further, you could try to find another predictor of the target variable. This could be number of vistors that month to the resort.

Homework 3

Jordan Metz

February 18, 2019