DATA 624 HOMEWORK3

Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.

5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy) Bricks (aus_production) NSW Lambs (aus_livestock) Household wealth (hh_budget). Australian takeaway food turnover (aus_retail)

ANSWER SECTION 5.1

For naïve forecasts, we set all forecasts to be the value of the last observation.

In this case snaive, we set each forecast to be equal to the last observed value from the same season so we have used a year lag below.

The drift method allows us to draw a line between the first and last observations, and extrapolate into the future.

#Question 1
auspop <- global_economy %>% filter(Country=='Australia') %>% mutate(Pim = Population / 10^6) %>% select(Pim)
fit <-auspop %>% model(NAIVE(Pim)) 

fit %>% forecast(h="10 years")

## # A fable: 10 x 4 [1Y]
## # Key:     .model [1]
##    .model      Year          Pim .mean
##    <chr>      <dbl>       <dist> <dbl>
##  1 NAIVE(Pim)  2018 N(25, 0.069)  24.6
##  2 NAIVE(Pim)  2019  N(25, 0.14)  24.6
##  3 NAIVE(Pim)  2020  N(25, 0.21)  24.6
##  4 NAIVE(Pim)  2021  N(25, 0.28)  24.6
##  5 NAIVE(Pim)  2022  N(25, 0.35)  24.6
##  6 NAIVE(Pim)  2023  N(25, 0.41)  24.6
##  7 NAIVE(Pim)  2024  N(25, 0.48)  24.6
##  8 NAIVE(Pim)  2025  N(25, 0.55)  24.6
##  9 NAIVE(Pim)  2026  N(25, 0.62)  24.6
## 10 NAIVE(Pim)  2027  N(25, 0.69)  24.6

fit %>% forecast(h="10 years") %>% autoplot(auspop) + labs(y = "Population in millions", title = "Population of Australia")

Aus<- global_economy %>%
  filter(Country=="Australia")

ausfit <- Aus %>%
  model(RW(Population ~ drift())) %>%
  forecast(h=5) %>% autoplot()


#Question 2

bricks <- aus_production %>%
  filter(!is.na(Bricks))

bricksfit <- bricks %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h=12)

bricksfit %>% autoplot(bricks)

bricksfit1 <- bricks %>%
  model(RW(Bricks ~ drift())) %>%
  forecast(h=24)

bricksfit1 %>% autoplot(bricks)

#Question 3
nslambs <- aus_livestock %>% filter(Animal=='Lambs',State=='New South Wales') %>% select(Count)
nslambs %>% model(NAIVE(Count)) %>% forecast(h=24) %>% autoplot(nslambs)

nslambs %>% model(SNAIVE(Count ~ lag("year"))) %>% forecast(h=120) %>% autoplot(nslambs)

#Question 4

housew <- hh_budget
house1 <- housew %>% model(RW(Wealth ~ drift())) %>% forecast(h=20)
autoplot(house1)

#Question 5
austake <- aus_retail %>% filter(Industry=='Takeaway food services') %>% select(Turnover)
austake1 <- austake %>% model(NAIVE(Turnover)) %>% forecast(h=15)
autoplot(austake) + autolayer(austake1)

## Plot variable not specified, automatically selected `.vars = Turnover`

5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.

#Plot First
fbstock <- gafa_stock %>% filter(Symbol=='FB') %>% mutate(Day = row_number()) %>% update_tsibble(index=Day, regular=TRUE)
fbstock %>% autoplot(Close)

Produce forecasts using the drift method and plot them

#Produce forecasts using the drift method and plot them.
# Re-index based on trading days
fbstock <- gafa_stock |>
  filter(Symbol == "FB", year(Date) >= 2015) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)
# Filter the year of interest
FB_2015 <- fbstock |> filter(year(Date) == 2015)
# Fit the models
FB_fit <- FB_2015 |>
  model(
    Drift = NAIVE(Close ~ drift())
  )
# Produce forecasts for the trading days in January 2016
FB_jan_2016 <- fbstock |>
  filter(yearmonth(Date) == yearmonth("2016 Jan"))
FB_fc <- FB_fit |>
  forecast(new_data = FB_jan_2016)
# Plot the forecasts
FB_fc |>
  autoplot(FB_2015, level = NULL) +
  autolayer(FB_jan_2016, Close, colour = "black") +
  labs(y = "$US",
       title = "FB daily closing stock prices",
       subtitle = "(Jan 2015 - Jan 2016)") +
  guides(colour = guide_legend(title = "Forecast"))

Show that the forecasts are identical to extending the line drawn between the first and last observations.

###ANSWER

Here we have updated the example code from the book and then I used values from the adj close column and Day column to identify the x, y xend and yend values to show the line between the 1st value and the last value of the forecasted series.

#Join First & Last Observation
#Values from day and Adj.Close values x = 1, y = 78.45, xend = 252, yend = 104.66"

FB_fc %>%
  autoplot(FB_2015, level = NULL) +
  autolayer(FB_jan_2016, Close, colour = "black") +
  geom_segment(aes(x=1, y = 78.45, xend=252, yend = 104.66, colour = "Join First & Last Observation"), data = FB_2015) +
  labs(y = "$US",
       title = "FB daily closing stock prices",
       subtitle = "(Jan 2015 - Jan 2016)") +
  guides(colour = guide_legend(title = "Forecast"))

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

ANSWER

In the book, the explanation was that for the non-seasonal methods are applied to FB daily closing stock price in 2015, and we used to forecast one month ahead. Because stock prices are not observed every day, we first set up a new time index based on the trading days rather than calendar days. This method seems to have forcasted better since stocks data is not seasonal and doesn’t have any pattern to it so other methods are not that optimal.The SNAIVE is not applicable since it is for projecting seasonal data. NAIVE simple is setting the value to the last value of the data. Mean is just the average of the historical data so te line is seen at the average value of the stock.

# Fit benchmrk models
# Re-index based on trading days
fbstock <- gafa_stock |>
  filter(Symbol == "FB", year(Date) >= 2015) |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)
# Filter the year of interest
FB_2015 <- fbstock |> filter(year(Date) == 2015)
# Fit the models
FB_fit2 <- FB_2015 |>
  model(
    Mean = MEAN(Close),
    Naive = NAIVE(Close),
    sNaive = SNAIVE(Close),
    Drift = NAIVE(Close ~ drift())
  )
# Produce forecasts for the trading days in January 2016
FB_jan_2016 <- fbstock |>
  filter(yearmonth(Date) == yearmonth("2016 Jan"))
FB_fc <- FB_fit2 |>
  forecast(new_data = FB_jan_2016)
# Plot the forecasts
FB_fc |>
  autoplot(FB_2015, level = NULL) +
  autolayer(FB_jan_2016, Close, colour = "black") +
  labs(y = "$US",
       title = "FB daily closing stock prices",
       subtitle = "(Jan 2015 - Jan 2016)") +
  guides(colour = guide_legend(title = "Forecast"))

5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

What do you conclude?

ANSWER SECTION 5.3

When we review residuals below, we need to check for 4 key points to understand if the forecast covers these checks.

The mean of the residuals is close to zero and there is no significant correlation in the residuals series. Below there seems to be no correlation on the residual innovation plot, however, there seems to be an outlier on the residual plot but only 1.

The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier (NEGATIVE POINT, PRIOR TO 1995 Q1, beer production dipped below the 400 ), and therefore the residual variance can be treated as constant.

The outlier can also be seen on the histogram of the residuals. The histogram suggests that the residuals while seems to be normal and doesn’t have prominant right or left skewed tails. Consequently, forecasts from SNAIVE model here seems to be better fit ,however looking at the plot the 95% confidence interval calculated may not be as good (this maybe potentially due to the outlier).

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

# Look aT some forecasts
fit |> forecast() |> autoplot(recent_production)

autoplot(recent_production)

## Plot variable not specified, automatically selected `.vars = Beer`

5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

ANSWER SECTION 5.4a

For the Export series, the NAIVE seems to be a better fit since the histogram is normally distributed without significant outlier. Additionally the ACF plot shows not as much of a patter.

The SNAIVE may not be a good fit, there seems to be 2 main outliers on the innovation residual plot and on the ACF plot we can see the outlier. Additional the histogram doesn’t show as normal of a data with a left tailed skew.

auspop <- global_economy %>% filter(Country=='Australia')

# Define and estimate a model
ausexpfit <- auspop %>% model(SNAIVE(Exports~lag(4)))
# Look at the residuals
ausexpfit %>% gg_tsresiduals()

# Look at some forecasts
ausexpfit %>% forecast() %>% autoplot(auspop)

# Define and estimate a model
ausexpfit2 <- auspop %>% model(NAIVE(Exports))
# Look at the residuals
ausexpfit2 %>% gg_tsresiduals()

# Look at some forecasts
ausexpfit2 %>% forecast() %>% autoplot(auspop)

ANSWER SECTION 5.4b

In the case of SNAIVE below, the time plot shows some time of wave like pattern which indicates the variance might not be constance. The histogram also shows non-normal with left skewed data. Thefore in the case of the Bricks dataset, we maybe better off with the NAVIVE method, the ACF plot shows not as much of of a pattern or outliers for this method and the histogram also seems better distribution without left or right skewed data.

# Extract data of interest
bricks2 <- aus_production %>%
  filter(!is.na(Bricks))
# Define and estimate a model
bricksfit2 <- bricks2 %>% model(SNAIVE(Bricks ~ lag("year")))
# Look at the residuals
bricksfit2 %>% gg_tsresiduals()

# Look at some forecasts
bricksfit2 %>% forecast() %>% autoplot(bricks2)

#NAIVE
bricksfit3 <- bricks2 %>% model(NAIVE(Bricks))
# Look at the residuals
bricksfit3 %>% gg_tsresiduals()

# Look at some forecasts
bricksfit3 %>% forecast() %>% autoplot(bricks2)

5.7

ANSWER SECTION 5.7

For your retail time series (from Exercise 7 in Section 2.10):

Create a training dataset consisting of observations before 2011 using

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

C Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

D. Check the residuals.Do the residuals appear to be uncorrelated and normally distributed?

ANSWER

Yes the residuals appear to be uncorrelated and the residuals seem to have normal distribution with few outliers.

fit %>% gg_tsresiduals()

E. Produce forecasts for the test data

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

G. Compare the accuracy of your forecasts against the actual values.

Answer

Comparing across the values, it seems that every score like certain score values are smaller on the test data than the Training data. The MAE is slightly higher on the TEST data.

From the book. We can use the MAE when we use a single time series in this case, however, we do show MAE score slightly higher on the test data, therefore there is room for improvement here.Since there are few outliers that are large that will impact the mean of the forecasted data.

“When comparing forecast methods applied to a single time series, or to several time series with the same units, the MAE is popular as it is easy to both understand and compute. A forecast method that minimises the MAE will lead to forecasts of the median, while minimising the RMSE will lead to forecasts of the mean. Consequently, the RMSE is also widely used, despite being more difficult to interpret.”

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New Sou… Furnitu… SNAIV… Trai…  8.41  18.4  13.3  4.83  8.10     1     1 0.667

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… New … Furnitu… Test   66.9  80.9  67.5  16.9  17.1  5.07  4.40 0.902

G. How sensitive are the accuracy measures to the amount of training data used?

ANSWER

In the book it has been stated that accuracy measures are better when we use 4 step forecasts data sampled from prior historical data which can be used in training sets and therefore a rolling origin forecasts can be obtained and average of forecast errors can be minimized. Therefore larger training data and prior historical data can help in minimizing and averaging forcast data and we can use the stretch_tsibble function to take the 4 step forecasts of the test data in that case.

DATA 624 HOMEWORK3

Banu Boopalan

2024-09-22

5.1

ANSWER SECTION 5.1

5.2

ANSWER

5.3

ANSWER SECTION 5.3

5.4

ANSWER SECTION 5.4a

ANSWER SECTION 5.4b

5.7

ANSWER SECTION 5.7

ANSWER

Answer

ANSWER