library(fpp3)
library(tsibble)
library(cowplot)

5.1

  For the first data set, global_economy, we use the naïve and drift methods and ignore SNAIVE(), as this data set is recorded in years and has no seasonal period. It would appear that the drift method gives the most accurate prediction of population growth; the holdout check after the plot backs this up.
aus <- global_economy %>% filter(Country == "Australia")



aus %>%
  model(
    `Naïve` = NAIVE(Population),
    Drift = RW(Population ~ drift())
  ) %>%
  forecast(h = 10) %>%
  autoplot(aus, level = NULL)
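  To back up this visual impression, one option (beyond what the exercise asks for) is to hold out the last ten years and score each method with accuracy(). A minimal sketch, with an arbitrary 2007 cut-off; the same split-and-score check can be repeated for the other series in this exercise.
# Fit on data up to 2007, then score the 2008-2017 forecasts against the actuals
aus_train <- aus %>% filter(Year <= 2007)

aus_train %>%
  model(
    `Naïve` = NAIVE(Population),
    Drift = RW(Population ~ drift())
  ) %>%
  forecast(h = 10) %>%
  accuracy(aus) %>%
  select(.model, RMSE, MAE)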

  Here either the drift or the seasonal naïve method could plausibly give the most accurate results.
aus_production %>%
  filter(!is.na(Bricks)) %>%
  model(
    `Naïve` = NAIVE(Bricks),
    `Seasonal naïve` = SNAIVE(Bricks),
    Drift = RW(Bricks ~ drift())
  ) %>%
  forecast(h = 14) %>%
  autoplot(aus_production, level = NULL)

  Here the seasonal naïve method appears to work best, as it reproduces the strong quarterly pattern in the series.
aus_livestock %>%
  filter(Animal == "Lambs", State == "New South Wales") %>%
  model(
    `Naïve` = NAIVE(Count),
    `Seasonal naïve` = SNAIVE(Count),
    Drift = RW(Count ~ drift())
  ) %>%
  forecast(h = 14) %>%
  autoplot(aus_livestock, level = NULL)

  Drift appears to give the most accurate result here, since household wealth is unlikely to level off.
hh_budget %>%
  model(
    `Naïve` = NAIVE(Wealth),
    Drift = RW(Wealth ~ drift())
  ) %>%
  forecast(h = 14) %>%
  autoplot(hh_budget, level = NULL)

  Seasonal naïve is likely the most accurate here, as it follows the established seasonal pattern very well.
aus_adjust <- aus_retail %>%
  filter(Industry == "Takeaway food services") %>%
  summarise(`Average Turnover` = mean(Turnover))
aus_adjust %>%
  model(
    `Naïve` = NAIVE(`Average Turnover`),
    `Seasonal naïve` = SNAIVE(`Average Turnover`),
    Drift = RW(`Average Turnover` ~ drift())
  ) %>%
  forecast(h = 14) %>%
  autoplot(aus_adjust, level = NULL)

5.2

  We will use a similar approach to the textbook example in Section 5.2. As stated there, because stock prices are not observed every day, we first set up a new time index based on trading days rather than calendar days. We will use data from 2016 here.

a.

fb_stock <-  gafa_stock %>%
  filter(Symbol == "FB", year(Date) >= 2016) %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)



fb_stock %>% autoplot(Open)

b.

fb_2016 <- fb_stock %>% filter(year(Date) == 2016)

fb_fit <- fb_2016 %>%
  model(Drift = RW(Open ~ drift()))


fb_fit %>%
  forecast(h = 20) %>%
  autoplot(fb_2016, level = NULL) +
  labs(y = "$US",
       title = "Facebook daily opening stock prices",
       subtitle = "(Jan 2016 - Jan 2017)")

c.

  As we can see, adding a segment from the first observation to the first forecast point produces one completely straight line with the drift forecasts: the drift method simply extends the line drawn between the first and last observations.
fc_drift <- fb_fit %>%
  forecast(h = 20)
  


fc_drift %>%
  autoplot(fb_2016, level = NULL) +
  labs(y = "$US",
       title = "Facebook daily opening stock prices",
       subtitle = "(Jan 2016 - Jan 2017)") +
  geom_segment(aes(x = first(day), y = first(Open),
                   xend = first(fc_drift$day), yend = first(fc_drift$.mean)))
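  We can also verify this numerically: the drift method's per-step increase equals the slope of the line through the first and last observations, (last - first) / (T - 1). A quick check using the objects above:
# Slope of the line joining the first and last observations
slope_data <- (last(fb_2016$Open) - first(fb_2016$Open)) / (nrow(fb_2016) - 1)

# Step between consecutive drift forecast points
slope_fc <- diff(fc_drift$.mean)[1]

c(data = slope_data, forecast = slope_fc)  # the two should agree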

d. 

  The red data below show the actual values that occurred. It would appear that the drift method most closely matches the actual opening prices, while both the mean and naïve methods underpredicted them. That likely stems from the unpredictable nature of stock prices and the sudden growth in the opening price that the mean and naïve methods could not capture.
fb_fit <- fb_2016 %>%
  model(
    Mean = MEAN(Open),
    `Naïve` = NAIVE(Open),
    Drift = RW(Open ~ drift())
  )


fb_jan_2017 <- fb_stock %>%
  filter(yearmonth(Date) == yearmonth("2017 Jan"))


fb_fc <- fb_fit %>%
  forecast(new_data = fb_jan_2017)

fb_fc %>%
  autoplot(fb_2016, level = NULL) +
  autolayer(fb_jan_2017,Open,color="red") +
  labs(y = "$US",
       title = "Facebook daily opening stock prices",
       subtitle = "(Jan 2016 - Jan 2017)") +
  guides(colour = guide_legend(title = "Forecast"))
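  To quantify which benchmark comes closest, we can also score the forecasts against the observed January 2017 prices; if the visual impression holds, drift should show the smallest errors.
# Compare each benchmark's forecasts with the actual January 2017 opening prices
fb_fc %>%
  accuracy(fb_stock) %>%
  select(.model, RMSE, MAE, MAPE)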

5.3

  Our residual diagnostics look reasonably healthy at first glance: the residuals fluctuate around a mean close to zero, and the histogram appears roughly normal. However, further testing using the Ljung-Box and Box-Pierce tests shows that the residuals are distinguishable from white noise, so some autocorrelation remains and the forecasts should be treated with a little caution.
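  For reference, the Ljung-Box statistic jointly tests whether the first $\ell$ residual autocorrelations are consistent with white noise:

$$
Q^* = T(T+2)\sum_{k=1}^{\ell}\frac{r_k^2}{T-k},
$$

where $T$ is the number of observations and $r_k$ is the lag-$k$ autocorrelation of the residuals. The Box-Pierce statistic is the simpler $Q = T\sum_{k=1}^{\ell} r_k^2$. In both cases, a large statistic (small p-value) indicates the residuals are distinguishable from white noise.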
# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()

# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)

fit %>%
  augment() %>%
  features(.innov, box_pierce, lag = 5, dof = 0)

fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 5, dof = 0)

5.4

  With the global_economy data recorded yearly, NAIVE() is the best approach, since there is no seasonal period for SNAIVE() to use. We see roughly the same results as the previous question, with a clean set of residuals and a very low p-value for our tests. It would appear we can assume the residuals are roughly normally distributed, and we can possibly rely on the results of this forecast.
# Extract data of interest
aus_exports <- global_economy  %>% filter(Country=="Australia") 

fit_aus <- aus_exports %>%
  model(`Naïve` = NAIVE(Exports))
# Look at the residuals
fit_aus %>% gg_tsresiduals()

# Look at some forecasts
fit_aus %>% forecast() %>% autoplot(aus_exports)

fit_aus %>%
  augment() %>%
  features(.innov, box_pierce, lag = 10, dof = 0)

fit_aus %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 0)
  For aus_production, we have quarterly data, so we want to capture that seasonality with SNAIVE(). We see some odd patterns in the residuals, and it is clear that the histogram is not normally distributed. Both tests return very small p-values, however, so we conclude that the residuals are distinguishable from white noise.
brick_au <- aus_production %>% filter(!is.na(Bricks))
fit_brick <- brick_au %>% model(SNAIVE(Bricks))
# Look at the residuals
fit_brick %>% gg_tsresiduals()

# Look at some forecasts
fit_brick %>% forecast() %>% autoplot(brick_au)

fit_brick %>%
  augment() %>%
  features(.innov, box_pierce, lag = 5, dof = 0)

fit_brick %>%
  augment() %>%
  features(.innov, ljung_box, lag = 5, dof = 0)
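  One refinement worth noting: fpp3 suggests using lag = 2m for seasonal data, which for quarterly bricks means lag = 8 rather than 5. The conclusion is unlikely to change, but for completeness:
# Re-run the test with lag = 2m = 8, as suggested for quarterly data
fit_brick %>%
  augment() %>%
  features(.innov, ljung_box, lag = 8, dof = 0)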

5.7

a.

set.seed(1221)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

b.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c.

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

d.

  The residuals appear roughly normally distributed and close to uncorrelated; a formal check follows below.
fit %>% gg_tsresiduals()
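  As a formal check on the "uncorrelated" claim, we can apply the same Ljung-Box feature used earlier, with lag = 2m = 24 for monthly data; whether it agrees with the visual impression will depend on the series drawn.
# Formal white-noise check on the training residuals
fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24, dof = 0)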

e.

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
fc %>% autoplot(myseries)

f.

fit %>% accuracy()
fc %>% accuracy(myseries)

g.

  It would appear that the smaller set of more recent data produced a far more accurate model than the sample of data we originally created for the earlier homework. Keeping the training window shorter reduces the noise from earlier years, which likely made the model far more accurate; the quick experiment below varies the training window to illustrate this.
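  To make that sensitivity concrete, one can refit SNAIVE() on training windows of different lengths and compare the test-set errors; a minimal sketch, where the cut-off years are arbitrary choices:
# Refit on progressively shorter training windows and score each on the 2011+ holdout
lapply(c(1990, 2000, 2005), function(start_year) {
  myseries %>%
    filter(year(Month) >= start_year, year(Month) < 2011) %>%
    model(SNAIVE(Turnover)) %>%
    forecast(new_data = anti_join(myseries, myseries_train)) %>%
    accuracy(myseries) %>%
    mutate(train_start = start_year)
}) %>%
  bind_rows() %>%
  select(train_start, .model, RMSE, MAE)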