WEEK 2 HOMEWORK

CHAPTER 5 EXERCISE : Time Series Regression Models

1. Daily electricity demand for Victoria, Australia,during 2014 is contained in elecdaily.The data for the first 20 days can be obtained as follows:

library(fpp2)

## Loading required package: ggplot2

## Loading required package: forecast

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: fma

## Loading required package: expsmooth

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

daily20 <- head(elecdaily, 20)
daily20

## Time Series:
## Start = c(1, 1) 
## End = c(3, 6) 
## Frequency = 7 
##            Demand WorkDay Temperature
## 1.000000 174.8963       0        26.0
## 1.142857 188.5909       1        23.0
## 1.285714 188.9169       1        22.2
## 1.428571 173.8142       0        20.3
## 1.571429 169.5152       0        26.1
## 1.714286 195.7288       1        19.6
## 1.857143 199.9029       1        20.0
## 2.000000 205.3375       1        27.4
## 2.142857 228.0782       1        32.4
## 2.285714 258.5984       1        34.0
## 2.428571 201.7970       0        22.4
## 2.571429 187.6298       0        22.5
## 2.714286 254.6636       1        30.0
## 2.857143 322.2323       1        42.4
## 3.000000 343.9934       1        41.5
## 3.142857 347.6376       1        43.2
## 3.285714 332.9455       1        43.1
## 3.428571 219.7517       0        23.7
## 3.571429 186.9816       0        22.3
## 3.714286 228.4876       1        24.0

1a.Plot the data and find the regression model for Demand with temperature as an explanatory variable. Why is there a positive relationship?

library(ggplot2)
autoplot(daily20, main="Electricity Demand for Victoria,Australia, during 2014")

regmodel <-lm(Demand~Temperature, data = elecdaily)
summary(regmodel)

## 
## Call:
## lm(formula = Demand ~ Temperature, data = elecdaily)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.474 -15.719  -0.336  15.767 117.184 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 212.3856     5.0080  42.409   <2e-16 ***
## Temperature   0.4182     0.2263   1.848   0.0654 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.55 on 363 degrees of freedom
## Multiple R-squared:  0.009322,   Adjusted R-squared:  0.006592 
## F-statistic: 3.416 on 1 and 363 DF,  p-value: 0.0654

*regmodel=regression model*
It is observed that there is a small correlation between the variables Demand and Temperature. 

There is a positive relationship because temperature is positively related to demand.

1b. Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

plot(regmodel)

It is observed that a majority of the datasets are outside of the red line. This indicates that the values of the dataset have no influence really, however, some observations are influential to the regression results

1c. Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was  
15∘ and compare it with the forecast if the with maximum temperature was  
35∘. Do you believe these forecasts?

demand<- tslm(Demand ~ Temperature, data = daily20)
Temperature_Range <- data.frame(Temperature=c(15,35))
forecast.Demandtemp <- forecast(demand,Temperature_Range)
forecast.Demandtemp

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 3.857143       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.000000       275.7146 245.2278 306.2014 227.57056 323.8586

I believe these forecasts because the forecasted range are reasonable.

1d. Give prediction intervals for your forecasts. The following R code will get you started:

autoplot(daily20, facets=TRUE)

daily20 %>%
  as.data.frame() %>%
  ggplot(aes(x=Temperature, y=Demand)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

fit <- tslm(Demand ~ Temperature, data=daily20)
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 5
## 
## data:  Residuals from Linear regression model
## LM test = 3.8079, df = 5, p-value = 0.5774

forecast(fit, newdata=data.frame(Temperature=c(15,35)))

##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 3.857143       140.5701 108.6810 172.4591  90.21166 190.9285
## 4.000000       275.7146 245.2278 306.2014 227.57056 323.8586

The prediction intervals are acceptable based on the traning dataset.

1e. Plot Demand vs Temperature for all of the available data in elecdaily. What does this say about your model?

plot(Demand~Temperature, data = elecdaily, main = "Demand vs. Temperature")

The graph above is very similar to the daily20 model graph created earlier on. This goes to show that the daily20 dataframe is a good representation of the full dataset.

2. Data set mens400 contains the winning times (in seconds) for the men’s 400 meters final in each Olympic Games from 1896 to 2016.

2a.Plot the winning time against the year.Describe the main features of the plot.

library(forecast)
autoplot(mens400,main = "Winning Time for the Men'a 400 meter Final in each Olympic Games frim 1896-2016")

There are many missing values in the data set which is contributing to the breaks in the graph above. Besides that, the dataset has a general decreasing trend.

2b. Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?

##Create a time series format
time.mens400 <- time(mens400)
tslm(mens400~time.mens400, data = mens400)

## 
## Call:
## tslm(formula = mens400 ~ time.mens400, data = mens400)
## 
## Coefficients:
##  (Intercept)  time.mens400  
##    172.48148      -0.06457

The winning times decreases by 0.06457 per year

##Adding a Regression Line

autoplot(mens400, main="Winning Time for the Men'a 400 meter Final in each Olympic Games frim 1896-2016", ylab="Time(Seconds)", xlab = "Year")+geom_smooth(method = "lm",se=FALSE)

## `geom_smooth()` using formula 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (stat_smooth).

2c. Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

mens400.res <-lm(mens400~time.mens400,data = mens400)
plot(mens400.res)

The first graph, Residuals vs Fitted graph displays that the slope decreases ar a slowy pace, while the winnign time decreases at a decreasing rate every year. This indicates that the fitted line is moderately acurate in the short term but does not offer any insight into winning times of the future.

2d. Predict the winning time for the men’s 400 meters final in the 2020 Olympics. Give a prediction interval for your forecasts. 
What assumptions have you made in these calculations?

Mens400.time <- time(mens400)
Mens400.lm <- lm(mens400~Mens400.time,data = mens400,na.action = na.exclude) ##<-removing the missing data 
Mens400.2020 <- forecast(Mens400.lm, newdata=data.frame(Mens400.time=2020))
Mens400.2020

##   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1       42.04231 40.44975 43.63487 39.55286 44.53176

Based on the calculations above, it is safe to assume that the model's residuals are normally distributed.

3.Type easter(ausbeer) and interpret what you see.

easter(ausbeer)

##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956 0.67 0.33 0.00 0.00
## 1957 0.00 1.00 0.00 0.00
## 1958 0.00 1.00 0.00 0.00
## 1959 1.00 0.00 0.00 0.00
## 1960 0.00 1.00 0.00 0.00
## 1961 0.33 0.67 0.00 0.00
## 1962 0.00 1.00 0.00 0.00
## 1963 0.00 1.00 0.00 0.00
## 1964 1.00 0.00 0.00 0.00
## 1965 0.00 1.00 0.00 0.00
## 1966 0.00 1.00 0.00 0.00
## 1967 1.00 0.00 0.00 0.00
## 1968 0.00 1.00 0.00 0.00
## 1969 0.00 1.00 0.00 0.00
## 1970 1.00 0.00 0.00 0.00
## 1971 0.00 1.00 0.00 0.00
## 1972 0.33 0.67 0.00 0.00
## 1973 0.00 1.00 0.00 0.00
## 1974 0.00 1.00 0.00 0.00
## 1975 1.00 0.00 0.00 0.00
## 1976 0.00 1.00 0.00 0.00
## 1977 0.00 1.00 0.00 0.00
## 1978 1.00 0.00 0.00 0.00
## 1979 0.00 1.00 0.00 0.00
## 1980 0.00 1.00 0.00 0.00
## 1981 0.00 1.00 0.00 0.00
## 1982 0.00 1.00 0.00 0.00
## 1983 0.00 1.00 0.00 0.00
## 1984 0.00 1.00 0.00 0.00
## 1985 0.00 1.00 0.00 0.00
## 1986 1.00 0.00 0.00 0.00
## 1987 0.00 1.00 0.00 0.00
## 1988 0.00 1.00 0.00 0.00
## 1989 1.00 0.00 0.00 0.00
## 1990 0.00 1.00 0.00 0.00
## 1991 1.00 0.00 0.00 0.00
## 1992 0.00 1.00 0.00 0.00
## 1993 0.00 1.00 0.00 0.00
## 1994 0.00 1.00 0.00 0.00
## 1995 0.00 1.00 0.00 0.00
## 1996 0.00 1.00 0.00 0.00
## 1997 1.00 0.00 0.00 0.00
## 1998 0.00 1.00 0.00 0.00
## 1999 0.00 1.00 0.00 0.00
## 2000 0.00 1.00 0.00 0.00
## 2001 0.00 1.00 0.00 0.00
## 2002 1.00 0.00 0.00 0.00
## 2003 0.00 1.00 0.00 0.00
## 2004 0.00 1.00 0.00 0.00
## 2005 1.00 0.00 0.00 0.00
## 2006 0.00 1.00 0.00 0.00
## 2007 0.00 1.00 0.00 0.00
## 2008 1.00 0.00 0.00 0.00
## 2009 0.00 1.00 0.00 0.00
## 2010 0.00 1.00

str(ausbeer)

##  Time-Series [1:218] from 1956 to 2010: 284 213 227 308 262 228 236 320 272 233 ...

I see a time series table of quarterly values from 1956-2010. Each quater value is represented by 1 or 0, possibly signifying if Easter is observed?

4. An elasticity coefficient is the ratio of the percentage change in the forecast variable ( y) to the percentage change in the predictor variable ( x ). Mathematically, the elasticity is defined as  (dy/dx)×(x/y). Consider the log-log model,logy=β0+β1logx+ε.Express  
y as a function of x and show that the coefficient β1 is the elasticity coefficient.

# log(y)=β0+β1logx+ε
# β1log(x)= log(y)-β0-ε
# β1= (log(y)-β0-ε)/log(x)

The coefficient β1 is proven to be the elasticity coefficinet above. This is because the mathematical proof above shows the percentage change in y to the chnage in x.

5. The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

5a. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

autoplot(fancy, main = "Sale Volumes Overtime",ylab = "Sales Volume")

seasonplot(fancy, main = "Seasonal Sales Volume Overtime", ylab = "Sales Volume")

The illustration above gives indications of seasonal trends. Just as it was meentioned in the prompt, we observe a spike in Sales right before the new year- indicating that the holiday season has an effect on sale volumes.
There is a large increase in sales from 1992-1994.

5b. Explain why it is necessary to take logarithms of these data before fitting a model.

It is necessary to take logarithms of these data before fitting a model due to the seasonal variations we observe,

5c. Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.

##Create a surfing festival dummy variable
fancy.time <- time(fancy)
SurfingFestival <- c()
for(i in 1:length(fancy.time)){
  month <- round(12*(fancy.time[i] - floor(fancy.time[i]))) + 1
  year <- floor(fancy.time[i])
  if(year >= 1988 & month == 3) 
    {
    SurfingFestival[i] <- 1 ##  - 1  if surfing festival is in town
  } else {
    SurfingFestival[i] <- 0 ## 0 if  surfing festival is not in town
  }
}

##Run Regression and Use boxcoc? 
Fancy.reg <- (BoxCox(fancy,0))
Fancy.tslm <- tslm(Fancy.reg~trend+season+SurfingFestival)
summary(Fancy.tslm)

## 
## Call:
## tslm(formula = Fancy.reg ~ trend + season + SurfingFestival)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33673 -0.12757  0.00257  0.10911  0.37671 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     7.6196670  0.0742471 102.626  < 2e-16 ***
## trend           0.0220198  0.0008268  26.634  < 2e-16 ***
## season2         0.2514168  0.0956790   2.628 0.010555 *  
## season3         0.2660828  0.1934044   1.376 0.173275    
## season4         0.3840535  0.0957075   4.013 0.000148 ***
## season5         0.4094870  0.0957325   4.277 5.88e-05 ***
## season6         0.4488283  0.0957647   4.687 1.33e-05 ***
## season7         0.6104545  0.0958039   6.372 1.71e-08 ***
## season8         0.5879644  0.0958503   6.134 4.53e-08 ***
## season9         0.6693299  0.0959037   6.979 1.36e-09 ***
## season10        0.7473919  0.0959643   7.788 4.48e-11 ***
## season11        1.2067479  0.0960319  12.566  < 2e-16 ***
## season12        1.9622412  0.0961066  20.417  < 2e-16 ***
## SurfingFestival 0.5015151  0.1964273   2.553 0.012856 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared:  0.9567, Adjusted R-squared:  0.9487 
## F-statistic:   119 on 13 and 70 DF,  p-value: < 2.2e-16

5d. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

autoplot(Fancy.tslm$residuals, main="Fancy.tslm Residuals plot", ylab="Residuals")

It appears that the plot above still has the seasonal pattern that was observed earlier even after the transformations done. Indicating that the residual distribution is heteroskedastic.

5e. Do boxplots of the residuals for each month. Does this reveal any problems with the model?

cbind.data.frame(
  Month = factor(
    month.abb[round(12*(fancy.time - floor(fancy.time)) + 1)],
    labels = month.abb,
    ordered = TRUE
  ),
  Residuals = Fancy.tslm$residuals
) %>%
  ggplot(aes(x = Month,
             y = Residuals)) +
  geom_boxplot()

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

5f. What do the values of the coefficients tell you about each variable?

Fancy.tslm$coefficients

##     (Intercept)           trend         season2         season3         season4 
##      7.61966701      0.02201983      0.25141682      0.26608280      0.38405351 
##         season5         season6         season7         season8         season9 
##      0.40948697      0.44882828      0.61045453      0.58796443      0.66932985 
##        season10        season11        season12 SurfingFestival 
##      0.74739195      1.20674790      1.96224123      0.50151509

There is a large increase in season 10, this is an indication of the expansion during this time. The surfing festival coefficient shows an increase in sales.The trend coefficient is somewhat low but still indicates a gradual increase in sales volumes.

5g. What does the Breusch-Godfrey test tell you about your model?

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

bgtest(Fancy.tslm)

## 
##  Breusch-Godfrey test for serial correlation of order up to 1
## 
## data:  Fancy.tslm
## LM test = 25.031, df = 1, p-value = 5.642e-07

The Breusch-Godfrey test gives indication of the serail correlation in regards to the regression model. The p-value presented in quite less that 0.05.

5h. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

Fancy.TimeRange <- ts(data = fancy.time, start = 1994, end = c(1996,12), frequency=12)

forecast(Fancy.tslm, Fancy.TimeRange)

## Warning in forecast.lm(Fancy.tslm, Fancy.TimeRange): newdata column names not
## specified, defaulting to first variable required.

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1994       1006.002 501.0539 1510.950 227.5849 1784.419
## Feb 1994       1006.317 501.3479 1511.286 227.8675 1784.767
## Mar 1994       1006.396 501.6231 1511.168 228.2491 1784.542
## Apr 1994       1006.577 501.5658 1511.589 228.0624 1785.092
## May 1994       1006.667 501.6339 1511.699 228.1190 1785.214
## Jun 1994       1006.770 501.7159 1511.824 228.1896 1785.350
## Jul 1994       1006.995 501.9201 1512.070 228.3823 1785.608
## Aug 1994       1007.036 501.9403 1512.133 228.3910 1785.682
## Sep 1994       1007.182 502.0643 1512.299 228.5036 1785.860
## Oct 1994       1007.324 502.1850 1512.462 228.6128 1786.034
## Nov 1994       1007.847 502.6870 1513.006 229.1033 1786.590
## Dec 1994       1008.666 503.4851 1513.847 229.8900 1787.442
## Jan 1995       1006.768 501.5678 1511.967 227.9624 1785.573
## Feb 1995       1007.083 501.8618 1512.304 228.2450 1785.921
## Mar 1995       1007.161 502.1369 1512.186 228.6266 1785.696
## Apr 1995       1007.343 502.0797 1512.606 228.4399 1786.246
## May 1995       1007.432 502.1478 1512.717 228.4965 1786.368
## Jun 1995       1007.535 502.2298 1512.841 228.5670 1786.504
## Jul 1995       1007.761 502.4340 1513.088 228.7598 1786.762
## Aug 1995       1007.802 502.4542 1513.150 228.7685 1786.836
## Sep 1995       1007.947 502.5782 1513.317 228.8810 1787.014
## Oct 1995       1008.089 502.6989 1513.480 228.9903 1787.188
## Nov 1995       1008.612 503.2009 1514.024 229.4808 1787.744
## Dec 1995       1009.432 503.9990 1514.865 230.2674 1788.596
## Jan 1996       1007.533 502.0817 1512.985 228.3399 1786.727
## Feb 1996       1007.849 502.3757 1513.321 228.6225 1787.075
## Mar 1996       1007.927 502.6508 1513.203 229.0041 1786.850
## Apr 1996       1008.109 502.5936 1513.624 228.8174 1787.400
## May 1996       1008.198 502.6617 1513.734 228.8740 1787.522
## Jun 1996       1008.301 502.7437 1513.859 228.9445 1787.658
## Jul 1996       1008.527 502.9479 1514.105 229.1373 1787.916
## Aug 1996       1008.568 502.9681 1514.168 229.1460 1787.990
## Sep 1996       1008.713 503.0921 1514.334 229.2585 1788.168
## Oct 1996       1008.855 503.2128 1514.497 229.3678 1788.342
## Nov 1996       1009.378 503.7148 1515.042 229.8583 1788.898
## Dec 1996       1010.198 504.5129 1515.882 230.6449 1789.750

5i.Transform your predictions and intervals to obtain predictions and intervals for the raw data

Fancy.transform <- (BoxCox(fancy,0))
forecast(Fancy.transform)

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 1994       9.674641  9.464046  9.885235  9.352565  9.996717
## Feb 1994       9.958925  9.733371 10.184479  9.613970 10.303880
## Mar 1994      10.439733 10.200143 10.679323 10.073312 10.806155
## Apr 1994      10.121042  9.868185 10.373898  9.734331 10.507752
## May 1994      10.157291  9.891822 10.422759  9.751292 10.563289
## Jun 1994      10.227196  9.949682 10.504711  9.802774 10.651618
## Jul 1994      10.403289 10.114223 10.692356  9.961200 10.845378
## Aug 1994      10.382661 10.082480 10.682842  9.923574 10.841748
## Sep 1994      10.518849 10.207944 10.829754 10.043361 10.994337
## Oct 1994      10.612257 10.290980 10.933534 10.120906 11.103608
## Nov 1994      11.108401 10.777070 11.439732 10.601674 11.615129
## Dec 1994      11.898748 11.557653 12.239843 11.377088 12.420408
## Jan 1995       9.985303  9.634703 10.335903  9.449107 10.521499
## Feb 1995      10.269587  9.909735 10.629440  9.719240 10.819934
## Mar 1995      10.750396 10.381517 11.119274 10.186244 11.314547
## Apr 1995      10.431704 10.054009 10.809399  9.854070 11.009338
## May 1995      10.467953 10.081638 10.854268  9.877135 11.058770
## Jun 1995      10.537858 10.143107 10.932610  9.934137 11.141579
## Jul 1995      10.713952 10.310934 11.116969 10.097590 11.330314
## Aug 1995      10.693323 10.282201 11.104445 10.064567 11.322080
## Sep 1995      10.829511 10.410437 11.248586 10.188592 11.470430
## Oct 1995      10.922919 10.496035 11.349803 10.270057 11.575781
## Nov 1995      11.419063 10.984506 11.853621 10.754465 12.083661
## Dec 1995      12.209410 11.767308 12.651512 11.533273 12.885547

5j.How could you improve these predictions by modifying the model?

From a little research done, it seems the best way to improve this model is to use the boxcox.lambda funciton. This function is the most effective method for transformign regression models. Or perhaps using the trasnsofrmations to handle to exponential growth in the dataset.

6. The gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.

6a. Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

str(gasoline)

##  Time-Series [1:1355] from 1991 to 2017: 6.62 6.43 6.58 7.22 6.88 ...

head(gasoline)

## Time Series:
## Start = 1991.1 
## End = 1991.19582477755 
## Frequency = 52.1785714285714 
## [1] 6.621 6.433 6.582 7.224 6.875 6.947

Gasoline.2004 <- window(gasoline,end=2005)
autoplot(Gasoline.2004, xlab="Year")

for (num in c(1,2,3,6,9,12)) {
 var_name <- paste("Fit",as.character(num),".gasoline.2004",sep = "") 
 assign(var_name,tslm(Gasoline.2004~trend+ fourier(Gasoline.2004, K= num))
        )
 print(autoplot(Gasoline.2004)+
         autolayer(get(var_name)$fitted.values,series=as.character(num))+ ggtitle(var_name)+
         ylab("Gasoline")+
         guides(colour=guide_legend(title = "Fourier Transform Pairs"))
       )
}

Combine all Fourier Pairs unto 1 graph

autoplot(Gasoline.2004) +
  autolayer(Fit1.gasoline.2004$fitted.values, series = "1") +
  autolayer(Fit6.gasoline.2004$fitted.values, series = "2") +
  autolayer(Fit9.gasoline.2004$fitted.values, series = "3") +
  autolayer(Fit9.gasoline.2004$fitted.values, series = "6") +
  autolayer(Fit12.gasoline.2004$fitted.values, series = "9") +
  autolayer(Fit12.gasoline.2004$fitted.values, series = "12") +
  guides(colour = guide_legend(title = "Fourier Transform pairs")) +
  scale_color_discrete(breaks = c(1, 2, 3, 6, 9, 12))

As displayed above, the more Fourier pairs usedm the more the fitted line seems to look like the original.

6b. Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

for(i in c(1, 2, 3, 6, 9, 12)){
  Fit.gasoline.2004.name <- paste(
    "Fit", as.character(i), ".gasoline.2004",
    sep = ""
  )
  writeLines(
    paste(
      "\n", Fit.gasoline.2004.name, "\n"
    )
  )
  print(CV(get(Fit.gasoline.2004.name)))
}

## 
##  Fit1.gasoline.2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  8.201921e-02 -1.813292e+03 -1.813208e+03 -1.790354e+03  8.250858e-01 
## 
##  Fit2.gasoline.2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  8.136854e-02 -1.819113e+03 -1.818957e+03 -1.787001e+03  8.269569e-01 
## 
##  Fit3.gasoline.2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.658846e-02 -1.863085e+03 -1.862834e+03 -1.821797e+03  8.375702e-01 
## 
##  Fit6.gasoline.2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##     0.0725333 -1902.7534284 -1902.0773721 -1833.9401782     0.8474536 
## 
##  Fit9.gasoline.2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.161451e-02 -1.912292e+03 -1.910980e+03 -1.815954e+03  8.506543e-01 
## 
##  Fit12.gasoline.2004 
## 
##            CV           AIC          AICc           BIC         AdjR2 
##  7.096945e-02 -1.919276e+03 -1.917110e+03 -1.795412e+03  8.532618e-01

The appropriate number of Fourier terms that minimizes CV and AICc is 9

6c.Check the residuals of the final model using the checkresiduals() function. Even though the residuals fail the correlation tests, the results are probably not severe enough to make much difference to the forecasts and prediction intervals. (Note that the correlations are relatively small, even though they are significant.)

checkresiduals(Fit9.gasoline.2004)

## 
##  Breusch-Godfrey test for serial correlation of order up to 104
## 
## data:  Residuals from Linear regression model
## LM test = 156.67, df = 104, p-value = 0.0006505

6d.To forecast using harmonic regression, you will need to generate the future values of the Fourier terms. This can be done as follows.

fc.gasoline <- forecast(
  Fit9.gasoline.2004,
  newdata=data.frame(fourier(
    Gasoline.2004, K = 9, h = 52)
  )
)

6e. Plot the forecasts along with the actual data for 2005. What do you find?

autoplot(fc.gasoline)+autolayer(window(gasoline, start=2004, end=2006)) +scale_x_continuous(limits = c(2004,2006))

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

## Warning: Removed 674 row(s) containing missing values (geom_path).

I found that the model's prediction is restrained in the 80% interval (the darker purple region).

7. Data set huron gives the water level of Lake Huron in feet from 1875 to 1972.

7a. Plot the data and comment on its features.

autoplot(huron, main = "Water level of Lake Huron in feet from 1875 to 1972.", ylab = "Feet", xlab = "Year")

str(huron)

##  Time-Series [1:98] from 1875 to 1972: 10.38 11.86 10.97 10.8 9.79 ...

head(huron)

## Time Series:
## Start = 1875 
## End = 1880 
## Frequency = 1 
## [1] 10.38 11.86 10.97 10.80  9.79 10.39

The illustration above shows a negative trend from 1880-1920. It shows a not so obvious seasonal pattern.

7b. Fit a linear regression and compare this to a piecewise linear trend model with a knot at 1915.

tslm(huron~trend)

## 
## Call:
## tslm(formula = huron ~ trend)
## 
## Coefficients:
## (Intercept)        trend  
##     10.2020      -0.0242

huron.reg <- tslm(huron~trend)

huron.time <- time(huron)
huron.1915 <- 1915

huron.sec <- ts(pmax(0,huron.time- huron.1915), start = 1875)

7c. Generate forecasts from these two models for the period up to 1980 and comment on these.

##First Forecast
huron.reg <- tslm(huron~trend)
forecast(huron.reg,h=8)

##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 1973       7.806127 6.317648 9.294605 5.516501 10.095752
## 1974       7.781926 6.292536 9.271315 5.490899 10.072952
## 1975       7.757724 6.267406 9.248043 5.465269 10.050179
## 1976       7.733523 6.242259 9.224788 5.439613 10.027434
## 1977       7.709322 6.217094 9.201550 5.413929 10.004715
## 1978       7.685121 6.191912 9.178331 5.388219  9.982024
## 1979       7.660920 6.166712 9.155128 5.362481  9.959359
## 1980       7.636719 6.141494 9.131943 5.336717  9.936721

##Second Forecast
huron.reg2 <- tslm(huron~huron.sec)
forecast(huron.sec,h=8)

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1973             58 57.86560 58.13440 57.79446 58.20554
## 1974             59 58.70969 59.29031 58.55602 59.44398
## 1975             60 59.51703 60.48297 59.26136 60.73864
## 1976             61 60.29422 61.70578 59.92061 62.07939
## 1977             62 61.04503 62.95497 60.53950 63.46050
## 1978             63 61.77204 64.22796 61.12199 64.87801
## 1979             64 62.47716 65.52284 61.67102 66.32898
## 1980             65 63.16194 66.83806 62.18893 67.81107

Time<-time(huron)
t2<-ts(pmax(0,Time-1915),start = 1875)
pwhuron<-tslm(huron~Time+t2)
summary(pwhuron)

## 
## Call:
## tslm(formula = huron ~ Time + t2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49626 -0.66240 -0.07139  0.85163  2.39222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 132.90870   19.97687   6.653 1.82e-09 ***
## Time         -0.06498    0.01051  -6.181 1.58e-08 ***
## t2            0.06486    0.01563   4.150 7.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.045 on 95 degrees of freedom
## Multiple R-squared:  0.3841, Adjusted R-squared:  0.3711 
## F-statistic: 29.62 on 2 and 95 DF,  p-value: 1.004e-10

Though the format used for the forecasts above are slightly different, we observe similar increasing trends in th two.We also observe that the is a smaller residual standard error and a ;arger adjusted r-square associated with the piecewise model compared to the linear model.

CHAPTER 6 EXERCISE : Time series decomposition

1. Show that a  3×5 MA is equivalent to a 7-term weighted moving average with weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.

Well 3x5MA is taking the average of the first five terms and then it gets the moving average of all three obeservations. In other words a 3x5 moving average signifiews a three moving average of a  five moving average. 
Therefore since:
 Weights = c(0.067, 0.133, 0.200, 0.200, 0.200, 0.133, 0.067)
 
 and

 3x5 MA = [((Y1 + Y2 + Y3 + Y4 + Y5)/5) + ((Y2 + Y3 + Y4 + Y5 + Y6)/5) + ((Y3 + Y4 + Y5 + Y6 + Y7)/5)] / 3
 
by plugging in these weighted values proves that the 3x5 moving average is equal to a 7-term weighted moving average whih is 0.067.

2.The plastics data set consists of the monthly sales (in thousands) of product A for a plastics manufacturer for five years.

2a.Plot the time series of sales of product A. Can you identify seasonal fluctuations and/or a trend-cycle?

autoplot(plastics, main="Monthly Sales of Product-A", xlab="Year", ylab="Monthly sales(thousands")

str(plastics)

##  Time-Series [1:60] from 1 to 5.92: 742 697 776 898 1030 ...

head(plastics)

##    Jan  Feb  Mar  Apr  May  Jun
## 1  742  697  776  898 1030 1107

The plot shows a seasonal trend of sales. It indicated that sales are most highest at the end of each given year and lowest at the beginning of the year

2b. Use a classical multiplicative decomposition to calculate the trend-cycle and seasonal indices.

plastics.decompose <- decompose(plastics,type = "multiplicative")
autoplot(plastics.decompose, main="Monthly sales of Product-A", xlab="Year")

2c. Do the results support the graphical interpretation from part a?

 The results do support the graphical intepretation from part a. It also shows the positive trend in sales observed in part a.

2d.Compute and plot the seasonally adjusted
data.

  plastics.seasonal <- plastics.decompose$x/plastics.decompose$seasonal
  autoplot(plastics.seasonal, main="Monthly Sales of Product-A", xlab="Year", ylab="Sales(thousands")

2e.Change one observation to be an outlier (e.g., add 500 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

plastics.outlier <- plastics
plastics.outlier[15]<- plastics.outlier[15]+500
plastics.outlier

##    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  742  697  776  898 1030 1107 1165 1216 1208 1131  971  783
## 2  741  700 1274  932 1099 1223 1290 1349 1341 1296 1066  901
## 3  896  793  885 1055 1204 1326 1303 1436 1473 1453 1170 1023
## 4  951  861  938 1109 1274 1422 1486 1555 1604 1600 1403 1209
## 5 1030 1032 1126 1285 1468 1637 1611 1608 1528 1420 1119 1013

plastics.decomp2 <- decompose(plastics.outlier,type="multiplicative")
plastics.adjusted <- plastics.decomp2$x/plastics.decomp2$seasonal
autoplot(plastics.adjusted, main="Monthly Sales of Product-A (with modified value of 15)", xlab="Year", ylab="Sales(thousands)")

The forecasted values stay somewhat stable, however, there is an observed spike at the point in which the outlier of 500 to value 15 was introduced.

2f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?

##Add 500 to the end value 
plastics.end <- plastics
plastics.end[40] <- plastics.end[40]+500
autoplot(plastics.end, main="Monthly Sales of Product-A (with modified value of 40)", xlab="Year", ylab="Sales(thousands)")

Introducing the outlier at the end seems to have shifted the forecasts a bit. However, it doesnt seem to show a large of a spike like we observed in part e. Perhaps if i had done a multiplicative decomposition, I might've seen the change better.

3. Recall your retail time series data (from Exercise 3 in Section 2.10). Decompose the series using X11. Does it reveal any outliers, or unusual features that you had not noticed previously?

##Import retail data 
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
head(retaildata)

## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # … with 184 more variables: A3349336V <dbl>, A3349337W <dbl>, A3349397X <dbl>,
## #   A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>, A3349790V <dbl>,
## #   A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>, A3349873A <dbl>,
## #   A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>, A3349789K <dbl>,
## #   A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>, A3349799R <dbl>,
## #   A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>, A3349416V <dbl>,
## #   A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>, A3349727C <dbl>,
## #   A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>, A3349349F <dbl>,
## #   A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>, A3349566A <dbl>,
## #   A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>, A3349561R <dbl>,
## #   A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>, A3349637X <dbl>,
## #   A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>, A3349719C <dbl>,
## #   A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>, A3349480L <dbl>,
## #   A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>, A3349481R <dbl>,
## #   A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>, A3349654A <dbl>,
## #   A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>, A3349656F <dbl>,
## #   A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>, A3349360V <dbl>,
## #   A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>, A3349575C <dbl>,
## #   A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>, A3349433W <dbl>,
## #   A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>, A3349815C <dbl>,
## #   A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>, A3349742A <dbl>,
## #   A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>, A3349824F <dbl>,
## #   A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>, A3349434X <dbl>,
## #   A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>, A3349908R <dbl>,
## #   A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>, A3349365F <dbl>,
## #   A3349746K <dbl>, …

##Creating a time series 
retail.ts <- ts(retaildata[,"A3349627V"], frequency = 12, start = c(1982,4))

##Plotting the time series
autoplot(retail.ts, main="Monthly Australian Retail Sales", ylab="Sales", xlab="Year")

##Decompose the series using X11
library(seasonal)

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:psych':
## 
##     outlier

retail.x11 <- seas(retail.ts, x11="")
autoplot(retail.x11, main="X11 Decomposition on Monthly Australian Retail Sales", xlab="Year")

There are major outliers around 1989,1994,etc. are revelaed in the remainder plot.

4.Figures 6.16 and 6.17 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

4a.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

## Figures 6.16 and 6.17 show very little seaonality in the civilian labor force in Australia each month from Februaury 1978 to August 1995. Following the period, we see postive trends of the number of workers. We also observe a drop of the overall amoount of workers which may indicate stagnation in growth of the labor force. There seem to be major outliers around 1992- some more research may explain a decrease fo this size; which from the next question seems to be a recession. thus explaining such a decrease. 

4b.Is the recession of 1991/1992 visible in the estimated components?

##Yes, as mentioned above, the recession of 1991/1992 is very visible in the remainder plot.

5.This exercise uses the cangas data (monthly Canadian gas production in billions of cubic metres, January 1960 – February 2005).

5a.Plot the data using autoplot(), ggsubseriesplot() and ggseasonplot() to look at the effect of the changing seasonality over time. What do you think is causing it to change so much?

str(cangas)

##  Time-Series [1:542] from 1960 to 2005: 1.43 1.31 1.4 1.17 1.12 ...

head(cangas)

##         Jan    Feb    Mar    Apr    May    Jun
## 1960 1.4306 1.3059 1.4022 1.1699 1.1161 1.0113

autoplot(cangas, main = "Monthly Canadian Gas Production", ylab = "Gas Production(billions)",xlab = "Year")

ggsubseriesplot(cangas,main = "Monthly Canadian Gas Production", ylab = "Gas Production(billions)")

ggseasonplot(cangas,main = "Monthly Canadian Gas Production", ylab = "Gas Production(billions)")

Oil Productions seems to decrease during the summer monthes. This is most likely due to there being less of a demand for oil powered products such as heaters,etc. The increase in gas production observed is most likely in correlation with the use of electricity powered products overtime as technology grows overtime.

5b.Do an STL decomposition of the data. You will need to choose s.window to allow for the changing shape of the seasonal component.

cangas %>%
stl(t.window = 13, s.window = "periodic", robust = TRUE) %>%
  autoplot()

5c.ompare the results with those obtained using SEATS and X11. How are they different?

cangasts <- ts(cangas,frequency=12, start=c(1960,45))

X11cangas<-seas(cangasts, x11 = "")

autoplot(X11cangas)

The results appear to be somewhat similar, however, some differences are observed.
We observe more variation in seasonality with SEATS and X11 than we do with STL.

6.We will use the bricksq data (Australian quarterly clay brick production. 1956–1994) for this exercise.

6a.Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with having fixed or changing seasonality.)

bricksq.stl <- stl(bricksq, s.window = "periodic",t.window = 26)
autoplot(bricksq.stl)

6b.Compute and plot the seasonally adjusted data.

bricksq.adj <- seasadj(bricksq.stl)
autoplot(bricksq.adj, main="Seasonal Adjusted Brick Production", ylab="Brick Production (Thousands)", xlab="Year")

6c.Use a naïve method to produce forecasts of the seasonally adjusted data.

bricksq.naive <- naive(bricksq.adj)
autoplot(bricksq.naive, main="Brick Production from 1956-1994", ylab="Brick Production", xlab="Year")

6d.Use stlf() to reseasonalise the results, giving forecasts for the original data.

bricksq.fc <- stlf(bricksq)
bricksq.fc

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       465.7326 437.4903 493.9748 422.5398 508.9254
## 1995 Q1       424.6727 384.7119 464.6335 363.5580 485.7874
## 1995 Q2       483.9914 435.0232 532.9595 409.1010 558.8817
## 1995 Q3       493.9988 437.4242 550.5734 407.4754 580.5222
## 1995 Q4       465.7326 402.4454 529.0198 368.9431 562.5220
## 1996 Q1       424.6727 355.3067 494.0387 318.5865 530.7589
## 1996 Q2       483.9914 409.0259 558.9568 369.3416 598.6411
## 1996 Q3       493.9988 413.8128 574.1848 371.3650 616.6326

autoplot(bricksq.fc)

6e.Do the residuals look uncorrelated?

checkresiduals(bricksq.fc)

## Warning in checkresiduals(bricksq.fc): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 41.128, df = 6, p-value = 2.733e-07
## 
## Model df: 2.   Total lags used: 8

The residuals are observed to be normally distributed and show correlation

6f.Repeat with a robust STL decomposition. Does it make much difference?

brickF1 <- stlf(bricksq, robust=TRUE)
brickF1

##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1994 Q4       462.2355 432.9988 491.4723 417.5217 506.9493
## 1995 Q1       423.5662 382.1965 464.9358 360.2967 486.8356
## 1995 Q2       486.4525 435.7558 537.1492 408.9186 563.9864
## 1995 Q3       493.9985 435.4245 552.5726 404.4173 583.5798
## 1995 Q4       462.2355 396.7089 527.7621 362.0212 562.4498
## 1996 Q1       423.5662 351.7427 495.3896 313.7216 533.4107
## 1996 Q2       486.4525 408.8280 564.0770 367.7360 605.1689
## 1996 Q3       493.9985 410.9649 577.0322 367.0096 620.9875

autoplot(brickF1)

checkresiduals(brickF1)

## Warning in checkresiduals(brickF1): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,N,N)
## Q* = 28.163, df = 6, p-value = 8.755e-05
## 
## Model df: 2.   Total lags used: 8

It seems that the results are somewhat similiar despite doing a robust stl decompostion.

6g. Compare forecasts from stlf() with those from snaive(), using a test set comprising the last 2 years of data. Which is better?

##Split data into training and test sets 
bricksq.train <- subset(bricksq, end = length(bricksq-9))

bricksq.test<- subset(bricksq, start = length(bricksq)-8)

bricksq.naive2 <- snaive(bricksq.train)
bricksq.stlf <-  stlf(bricksq.train, robust = TRUE)
##Plot 
autoplot(bricksq.naive2)

The stlf function seems to create less variable forecast(more accurately predicted)and also seems like the best predictor for this Bricksq data.

7.se stlf() to produce forecasts of the writing series with either method="naive" or method="rwdrift", whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(writing)

##Choose drift method 
stlf(writing, method="rwdrift")

##          Point Forecast    Lo 80     Hi 80     Lo 95     Hi 95
## Jan 1978       994.1148 934.1959 1054.0337 902.47669 1085.7529
## Feb 1978      1028.3524 943.2589 1113.4459 898.21318 1158.4917
## Mar 1978      1085.7244 981.0733 1190.3756 925.67431 1245.7745
## Apr 1978      1016.3740 895.0350 1137.7130 830.80195 1201.9460
## May 1978       985.4141 849.1980 1121.6301 777.08964 1193.7385
## Jun 1978      1053.7226 903.9001 1203.5452 824.58886 1282.8564
## Jul 1978       906.0076 743.5296 1068.4857 657.51890 1154.4963
## Aug 1978       498.0555 323.6657  672.4452 231.34939  764.7615
## Sep 1978       941.7754 756.0746 1127.4762 657.77056 1225.7803
## Oct 1978      1030.8382 834.3232 1227.3531 730.29446 1331.3819
## Nov 1978       968.4962 761.5860 1175.4064 652.05442 1284.9380
## Dec 1978      1036.1790 819.2324 1253.1257 704.38778 1367.9703
## Jan 1979      1036.5608 809.8888 1263.2329 689.89586 1383.2258
## Feb 1979      1070.7985 834.6736 1306.9233 709.67672 1431.9202
## Mar 1979      1128.1704 882.8340 1373.5069 752.96072 1503.3802
## Apr 1979      1058.8200 804.4868 1313.1532 669.85099 1447.7891
## May 1979      1027.8601 764.7231 1290.9971 625.42673 1430.2935
## Jun 1979      1096.1687 824.4019 1367.9354 680.53727 1511.8001
## Jul 1979       948.4537 668.2152 1228.6921 519.86593 1377.0414
## Aug 1979       540.5015 251.9355  829.0675  99.17787  981.8251
## Sep 1979       984.2214 687.4599 1280.9830 530.36377 1438.0791
## Oct 1979      1073.2842 768.4484 1378.1200 607.07804 1539.4903
## Nov 1979      1010.9422 698.1441 1323.7404 532.55884 1489.3257
## Dec 1979      1078.6251 757.9683 1399.2818 588.22284 1569.0273

##Box cox
writing.boxcox <- stlf(writing, method="rwdrift", robust = TRUE, lambda = BoxCox.lambda(writing))
autoplot(writing.boxcox)

8.Use stlf() to produce forecasts of the fancy series with either method="naive" or method="rwdrift", whichever is most appropriate. Use the lambda argument if you think a Box-Cox transformation is required.

autoplot(fancy)

##Using the drift method for this dataset as well
stlf(fancy, method="rwdrift")

##          Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## Jan 1994       62016.88 54142.09  69891.67 49973.43  74060.33
## Feb 1994       63653.72 52450.20  74857.24 46519.41  80788.02
## Mar 1994       69171.00 55368.11  82973.89 48061.31  90280.70
## Apr 1994       66186.51 50154.83  82218.19 41668.17  90704.85
## May 1994       66161.22 48133.34  84189.09 38589.96  93732.47
## Jun 1994       67578.60 47716.88  87440.32 37202.72  97954.48
## Jul 1994       70352.01 48777.36  91926.67 37356.42 103347.60
## Aug 1994       71702.28 48508.79  94895.77 36230.89 107173.66
## Sep 1994       73083.34 48346.63  97820.04 35251.81 110914.86
## Oct 1994       74310.72 48093.07 100528.37 34214.28 114407.16
## Nov 1994       83745.93 56099.59 111392.27 41464.50 126027.36
## Dec 1994      113495.30 84464.81 142525.80 69096.99 157893.61
## Jan 1995       70851.51 40475.31 101227.70 24395.13 117307.89
## Feb 1995       72488.35 40800.01 104176.69 24025.21 120951.49
## Mar 1995       78005.63 45034.69 110976.58 27580.93 128430.34
## Apr 1995       75021.14 40793.82 109248.46 22674.97 127367.31
## May 1995       74995.85 39535.58 110456.11 20764.06 129227.64
## Jun 1995       76413.23 39741.10 113085.36 20328.05 132498.42
## Jul 1995       79186.64 41321.70 117051.59 21277.20 137096.09
## Aug 1995       80536.91 41496.45 119577.37 20829.67 140244.15
## Sep 1995       81917.97 41717.77 122118.16 20437.08 143398.86
## Oct 1995       83145.35 41799.89 124490.82 19912.92 146377.79
## Nov 1995       92580.56 50103.11 135058.01 27616.91 157544.22
## Dec 1995      122329.93 78732.75 165927.12 55653.79 189006.07

##Boxcox
Fancy.boxcox <- stlf(fancy, method="rwdrift", robust = TRUE, lambda = BoxCox.lambda(fancy))
autoplot(Fancy.boxcox)

The fancy dataset used here appears to have a  similar trend and seasonally adjusted data as the writing data did, however, this trend if moving in the positive direction.

WEEK 2 HOMEWORK

Dorothy Mensah

7/12/2020