Assignment3

Homework3

Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book. Please submit your Rpubs link as well as your .pdf file showing your run code.

Exercise 5.1

Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

First, we look at the data we will use.

glimpse(global_economy)

## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…

Next, we plot the data to see whether it is seasonal.

global_economy |>
  filter(Country == "Australia") |>
  autoplot(Population) +
  labs(title = "Australian Population Over Time",
       y = "Population",
       x = "Year")

The series is annual and shows no seasonal pattern, so SNAIVE(y) can be ruled out. That leaves NAIVE(y) and RW(y ~ drift()). To choose between them we check for a trend: the population increases steadily over the whole period, so we use RW(y ~ drift()). The drift term lets the forecasts continue the average upward or downward change seen in the historical data.
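To make the difference between the two remaining candidates concrete, here is a minimal base-R sketch of the point forecasts each method produces. The vector y is made-up toy data, not the Australian series, and the names naive_fc and drift_fc are only illustrative.

```r
# Toy annual series (made-up values, for illustration only)
y <- c(10, 12, 13, 15, 18)
h <- 3  # forecast horizon

# NAIVE: every forecast equals the last observation
naive_fc <- rep(tail(y, 1), h)

# Drift: last observation plus h times the average historical change
n <- length(y)
slope <- (y[n] - y[1]) / (n - 1)
drift_fc <- y[n] + seq_len(h) * slope

naive_fc  # 18 18 18
drift_fc  # 20 22 24
```

For a series with a steady trend, the flat NAIVE forecasts fall further behind with every step, while the drift forecasts keep pace with the average change.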

We create a new dataset, aust_population, with only the columns we need (Year and Population).
We then call as_tsibble() with Year as the index. global_economy is already a tsibble, so this step simply makes explicit that Year is the time index.
We fit the model using the RW(y ~ drift()) method.
We generate forecasts for the next 10 years and save the result in the object fc.
Finally, we plot the original data together with the forecasts using autoplot().

aust_population <- global_economy |>
  filter(Country == "Australia") |>
  select(Year, Population)

aust_population <- aust_population |>
  as_tsibble(index = Year)

fit <- aust_population |>
  model(RW = RW(Population ~ drift()))

fc <- fit |>
  forecast(h = "10 years")

fc |>
  autoplot(aust_population) +
  labs(title = "Australian Population Forecast",
       y = "Population",
       x = "Year")

In the plot above, the black line shows the historical population data for Australia, the blue line shows the point forecasts for the next 10 years, and the shaded bands show the prediction intervals around them. The forecast suggests a continued upward trend in Australia’s population.

Bricks (aus_production)

First, we look at the data we will use.

glimpse(aus_production)

## Rows: 218
## Columns: 7
## $ Quarter     <qtr> 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957…
## $ Beer        <dbl> 284, 213, 227, 308, 262, 228, 236, 320, 272, 233, 237, 313…
## $ Tobacco     <dbl> 5225, 5178, 5297, 5681, 5577, 5651, 5317, 6152, 5758, 5641…
## $ Bricks      <dbl> 189, 204, 208, 197, 187, 214, 227, 222, 199, 229, 249, 234…
## $ Cement      <dbl> 465, 532, 561, 570, 529, 604, 603, 582, 554, 620, 646, 637…
## $ Electricity <dbl> 3923, 4436, 4806, 4418, 4339, 4811, 5259, 4735, 4608, 5196…
## $ Gas         <dbl> 5, 6, 7, 6, 5, 7, 7, 6, 5, 7, 8, 6, 5, 7, 8, 6, 6, 8, 8, 7…

Next, we plot the data to see whether it is seasonal.

aus_production |>
  autoplot(Bricks) +
  labs(title = "Australian Bricks Production Over Time",
       y = "Bricks Production",
       x = "Year")

The data is seasonal, so NAIVE(y) and RW(y ~ drift()) can be ruled out, leaving SNAIVE(y). The S in SNAIVE stands for “seasonal”: the method is designed for series with a seasonal pattern, and each forecast equals the last observed value from the same season.
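As a rough illustration of what the seasonal naive method does, the base-R sketch below forecasts a short quarterly series by repeating its last full year. The values are the first eight Bricks observations from the glimpse() output above; the names last_season and snaive_fc are illustrative only.

```r
# First two years of the Bricks series (1956 Q1 to 1957 Q4)
y <- c(189, 204, 208, 197, 187, 214, 227, 222)
m <- 4  # seasonal period for quarterly data
h <- 6  # forecast six quarters ahead

# Take the last full season and recycle it across the horizon
last_season <- tail(y, m)
snaive_fc <- rep(last_season, length.out = h)

snaive_fc  # 187 214 227 222 187 214
```

Each forecast quarter simply copies the most recent value observed for that same quarter, which is why the method reproduces the seasonal shape but no trend.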

We create a new dataset, bricks_population, with only the columns we need (Quarter and Bricks), and drop the quarters where Bricks is missing.
aus_production is already a tsibble with Quarter as its index, so no conversion is needed.
We fit the model using the SNAIVE(y) method and generate forecasts for the next 2 years (8 quarters); the result is saved in the object fc_bricks.
Finally, we plot the original data together with the forecasts using autoplot().

library(fpp3)

# Select the Bricks series and drop missing values
bricks_population <- aus_production |>
  select(Quarter, Bricks) |>
  filter(!is.na(Bricks))

# Fit a seasonal naive model and forecast 8 quarters (2 years) ahead
fc_bricks <- bricks_population |>
  model(snaive = SNAIVE(Bricks)) |>
  forecast(h = 8)

autoplot(fc_bricks, bricks_population) +
  labs(title = "Australian Bricks Production Forecast",
       y = "Bricks Production",
       x = "Year")

In the plot above, the black line shows the historical bricks production data for Australia, the blue line shows the point forecasts for the next 2 years, and the shaded bands show the prediction intervals. The forecast repeats the seasonal pattern of the last observed year.

NSW Lambs (aus_livestock)

First, we look at the data we will use.

glimpse(aus_livestock)

## Rows: 29,364
## Columns: 4
## Key: Animal, State [54]
## $ Month  <mth> 1976 Jul, 1976 Aug, 1976 Sep, 1976 Oct, 1976 Nov, 1976 Dec, 197…
## $ Animal <fct> "Bulls, bullocks and steers", "Bulls, bullocks and steers", "Bu…
## $ State  <fct> Australian Capital Territory, Australian Capital Territory, Aus…
## $ Count  <dbl> 2300, 2100, 2100, 1900, 2100, 1800, 1800, 1900, 2700, 2300, 250…

Next, we plot the data to see whether it is seasonal.

aus_livestock |>
  filter(State == "New South Wales", Animal == "Lambs")|>
  autoplot(Count) +
  labs(title = "NSW Lambs Over Time",
       y = "Count",
       x = "Year")

The data is seasonal, so NAIVE(y) and RW(y ~ drift()) can be ruled out, leaving SNAIVE(y).

We create a new dataset, NSW_lambs, with only the data we need.
We fit the model using the SNAIVE(y) method and generate forecasts for the next 10 years; the result is saved in the object fc_lambs.
Finally, we plot the original data together with the forecasts using autoplot().

NSW_lambs <- aus_livestock |>
  filter(State == "New South Wales", Animal == "Lambs")|>
  select(Count  , Animal)

fit_lambs <- NSW_lambs |>
  model(SNAIVE = SNAIVE(Count))

fc_lambs <- fit_lambs |>
  forecast(h = "10 years")

fc_lambs |>
  autoplot(NSW_lambs) +
  labs(title = "NSW Lambs Forecast",
       y = "Count",
       x = "Year")

In the plot above, the black line shows the historical count data for NSW lambs, the blue line shows the point forecasts for the next 10 years, and the shaded bands show the prediction intervals. The forecast repeats the seasonal pattern of the last observed year.

Household wealth (hh_budget)

First, we look at the data we will use.

glimpse(hh_budget)

## Rows: 88
## Columns: 8
## Key: Country [4]
## $ Country      <chr> "Australia", "Australia", "Australia", "Australia", "Aust…
## $ Year         <dbl> 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 200…
## $ Debt         <dbl> 95.68999, 99.53078, 107.54020, 114.63320, 121.09980, 126.…
## $ DI           <dbl> 3.7195453, 3.9844784, 2.5163448, 4.0237543, 3.8401975, 3.…
## $ Expenditure  <dbl> 3.4043113, 2.9717413, 4.9491246, 5.7315408, 4.2578288, 3.…
## $ Savings      <dbl> 5.2389216, 6.4716693, 3.7399359, 1.2875994, 0.6377422, 1.…
## $ Wealth       <dbl> 314.9344, 314.5559, 323.2357, 339.3139, 354.4382, 350.279…
## $ Unemployment <dbl> 8.472281, 8.506114, 8.362488, 7.677429, 6.873791, 6.28554…

Next, we plot the data to see whether it is seasonal.

hh_budget |>
  autoplot(Wealth) +
  labs(title = "Household Wealth Over Time",
       y = "Wealth",
       x = "Year")

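The data is annual, so there is no seasonal pattern and SNAIVE(y) can be ruled out. Wealth shows an overall upward trend, so we use RW(y ~ drift()) rather than NAIVE(y).

We create a new dataset, hh_wealth, with only the columns we need (Year and Wealth).
We fit the model using the RW(y ~ drift()) method and generate forecasts for the next 10 years; the result is saved in the object fc_wealth.
Finally, we plot the original data together with the forecasts using autoplot().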

hh_wealth <- hh_budget |>
  select(Year, Wealth)

fit_wealth <- hh_wealth |>
  model(RW = RW(Wealth ~ drift()))

fc_wealth <- fit_wealth |>
  forecast(h = "10 years")

fc_wealth |>
  autoplot(hh_wealth) +
  labs(title = "Household Wealth Forecast",
       y = "Wealth",
       x = "Year")

The plot above shows the historical household wealth for each of the four countries in hh_budget, followed by the drift forecasts for the next 10 years with shaded prediction intervals. The forecasts suggest a continued upward trend in household wealth.

Australian takeaway food turnover (aus_retail).

First, we look at the data we will use.

glimpse(aus_retail)

## Rows: 64,532
## Columns: 5
## Key: State, Industry [152]
## $ State       <chr> "Australian Capital Territory", "Australian Capital Territ…
## $ Industry    <chr> "Cafes, restaurants and catering services", "Cafes, restau…
## $ `Series ID` <chr> "A3349849A", "A3349849A", "A3349849A", "A3349849A", "A3349…
## $ Month       <mth> 1982 Apr, 1982 May, 1982 Jun, 1982 Jul, 1982 Aug, 1982 Sep…
## $ Turnover    <dbl> 4.4, 3.4, 3.6, 4.0, 3.6, 4.2, 4.8, 5.4, 6.9, 3.8, 4.2, 4.0…

Next, we plot the data to see whether it is seasonal.

aus_retail |>
  filter(Industry == "Takeaway food services" ,State == "Australian Capital Territory") |>
  autoplot(Turnover) +
  labs(title = "Australian Takeaway Food Turnover Over Time",
       y = "Turnover",
       x = "Year")

The data is seasonal, so NAIVE(y) and RW(y ~ drift()) can be ruled out, leaving SNAIVE(y).

We create a new dataset, takeaway, with only the series we need (takeaway food services in the Australian Capital Territory).
We fit the model using the SNAIVE(y) method and generate forecasts for the next 12 months; the result is saved in the object fc_takeaway.
Finally, we plot the original data together with the forecasts using autoplot().

takeaway <- aus_retail |>
  filter(Industry == "Takeaway food services",State == "Australian Capital Territory")


fc_takeaway <- takeaway |>
  model(snaive = SNAIVE(Turnover)) |>
  forecast(h = 12)

autoplot(fc_takeaway, takeaway) +
  labs(title = "Australian Takeaway Food Turnover Forecast",
       y = "Turnover",
       x = "Year")

The plot shows 80% and 95% prediction intervals for the forecasted turnover values. The shaded areas represent the uncertainty around the forecasts, with the darker area indicating the 80% prediction interval.
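The sketch below shows, in base R, how such intervals can be computed under the usual assumption of normally distributed forecast errors. The point forecast and residual standard deviation are made-up illustrative values, not estimates from aus_retail, and the seasonal naive formula for the forecast standard deviation is taken from the textbook's treatment of benchmark methods.

```r
# Assumed inputs (illustrative values, not estimated from aus_retail)
point_fc <- 20    # a point forecast
sigma    <- 2     # residual standard deviation
m        <- 12    # monthly seasonality
h        <- 1:12  # forecast horizons

# Seasonal naive forecast standard deviation: sigma * sqrt(k + 1),
# where k is the number of complete seasons before horizon h
k <- (h - 1) %/% m
sigma_h <- sigma * sqrt(k + 1)

# Normal-theory multipliers for 80% and 95% intervals
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

intervals <- data.frame(
  h    = h,
  lo95 = point_fc - z95 * sigma_h,
  lo80 = point_fc - z80 * sigma_h,
  hi80 = point_fc + z80 * sigma_h,
  hi95 = point_fc + z95 * sigma_h
)
head(intervals, 3)
```

Within the first year k is zero, so the interval width stays constant over the 12-month horizon; it would only widen from month 13 onwards.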

Exercise 5.2

Use the Facebook stock price (data set gafa_stock) to do the following:

First, we look at the data we will use.

glimpse(gafa_stock)

## Rows: 5,032
## Columns: 8
## Key: Symbol [4]
## $ Symbol    <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAP…
## $ Date      <date> 2014-01-02, 2014-01-03, 2014-01-06, 2014-01-07, 2014-01-08,…
## $ Open      <dbl> 79.38286, 78.98000, 76.77857, 77.76000, 76.97285, 78.11429, …
## $ High      <dbl> 79.57571, 79.10000, 78.11429, 77.99429, 77.93714, 78.12286, …
## $ Low       <dbl> 78.86000, 77.20428, 76.22857, 76.84571, 76.95571, 76.47857, …
## $ Close     <dbl> 79.01857, 77.28286, 77.70428, 77.14857, 77.63715, 76.64571, …
## $ Adj_Close <dbl> 66.96433, 65.49342, 65.85053, 65.37959, 65.79363, 64.95345, …
## $ Volume    <dbl> 58671200, 98116900, 103152700, 79302300, 64632400, 69787200,…

Produce a time plot of the series.

gafa_stock |>
  filter(Symbol == "FB") |>
  autoplot(Close) +
  labs(title = "Facebook Stock Price Over Time",
       y = "Closing Price",
       x = "Date")

From the plot we can see that the Facebook stock price has an upward trend over time, with some fluctuations. It is not seasonal: the fluctuations around the trend are random rather than a repeating annual pattern.

Produce forecasts using the drift method and plot them.

The daily series has gaps on weekends and holidays, when the market is closed, so we first aggregate it to a regular monthly series by taking the last closing price in each month. The drift model is then fitted to these month-end prices.

fb_stock_df <- gafa_stock |>
  filter(Symbol == "FB") |>
  select(Date, Close) |>
  index_by(Month = yearmonth(Date)) |>
  summarise(Close = last(Close)) 

fc_fb_stock <- fb_stock_df |>
  model(drift = RW(Close ~ drift())) |>
  forecast(h = 12)   

autoplot(fc_fb_stock, fb_stock_df) +
  labs(title = "Facebook Stock Price Forecast",
       y = "Price", x = "Month")

The forecast suggests that the Facebook stock price will continue to increase over the next 12 months, following the established trend.

Show that the forecasts are identical to extending the line drawn between the first and last observations.

In order to extend the line drawn between the first and last observations for closing price, we will create a new data frame fb_line that contains the fitted line values. We will then plot the original data along with the fitted line to visualize the comparison.

fb_line <- fb_stock_df |>
  mutate(
    t = row_number(),
    fitted_line = first(Close) + (t - 1) * (last(Close) - first(Close)) / (n() - 1)
  )

autoplot(fb_stock_df, Close) +
  autolayer(fb_line, fitted_line, linewidth = 0.8) +
  labs(title = "Observed series with first to last straight line")

The plot shows the monthly Facebook closing prices together with the straight line joining the first and last observations. The slope of this line, (last − first) / (n − 1), is exactly the drift estimate, so the drift forecasts, which start at the last observation and increase by that slope each month, lie on the extension of the line.
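The equivalence can also be shown algebraically. The following is a short derivation in the textbook's notation, where $y_1, \dots, y_T$ are the observations and $h$ is the forecast horizon; the symbol $\ell(t)$ for the straight line is introduced here only for illustration.

```latex
% Drift forecast: last value plus h times the average one-step change
\hat{y}_{T+h|T} = y_T + \frac{h}{T-1} \sum_{t=2}^{T} (y_t - y_{t-1})
                = y_T + h \, \frac{y_T - y_1}{T-1}

% Straight line through (1, y_1) and (T, y_T)
\ell(t) = y_1 + (t-1) \, \frac{y_T - y_1}{T-1}

% Extending the line to time T + h
\ell(T+h) = y_1 + (T-1) \, \frac{y_T - y_1}{T-1} + h \, \frac{y_T - y_1}{T-1}
          = y_T + h \, \frac{y_T - y_1}{T-1}
          = \hat{y}_{T+h|T}
```

The sum of one-step changes telescopes to $y_T - y_1$, which is why only the first and last observations matter for the drift forecasts.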

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fits <- fb_stock_df |>
  model(
    naive  = NAIVE(Close),
    drift  = RW(Close ~ drift()),
    snaive = SNAIVE(Close)   
  )

fc <- fits |> forecast(h = 12)

autoplot(fc, fb_stock_df) +
  labs(title = "Benchmark forecasts — Facebook Stock Close Price")

The plot shows the forecasted Facebook stock prices using three different benchmark methods: NAIVE, RW with drift, and SNAIVE. Each method produces a different forecast trajectory for the next 12 months.

The NAIVE method produces a flat forecast that does not account for the trend, and the SNAIVE method is not suitable because the data does not exhibit a seasonal pattern. Therefore, RW with drift is preferred, as it aligns better with the characteristics of the data.

Exercise 5.3

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)

What do you conclude?

The residuals look like white noise: they are scattered randomly around zero, with no discernible patterns or trends. The ACF plot shows that the autocorrelations are within the confidence bounds, indicating that there is no significant correlation between the residuals at different lags. The histogram of the residuals appears to be approximately normally distributed, which supports the normality assumption behind the prediction intervals.
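A visual check of the ACF can be backed up with a formal portmanteau test. The base-R sketch below applies the Ljung-Box test from the stats package to two simulated residual series, one white noise and one autocorrelated; the simulated data and object names are illustrative and are not the beer residuals. In the fpp3 workflow a comparable check should be available through augment(fit) |> features(.innov, ljung_box, lag = 8).

```r
set.seed(1)

# Simulated stand-ins for model residuals (not the beer residuals)
resid_wn <- rnorm(80)                                      # white noise
resid_ar <- as.numeric(arima.sim(list(ar = 0.8), n = 80))  # autocorrelated

# Ljung-Box test over the first 8 lags (two years of quarterly data)
lb_wn <- Box.test(resid_wn, lag = 8, type = "Ljung-Box")
lb_ar <- Box.test(resid_ar, lag = 8, type = "Ljung-Box")

c(white_noise = lb_wn$p.value, autocorrelated = lb_ar$p.value)
```

A large p-value is consistent with white noise, while a small one (as expected for the autocorrelated series) indicates that the residuals still contain structure the model has not captured.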

The forecast captures the seasonal pattern in the data, with peaks and troughs occurring at regular intervals. The shaded area represents the uncertainty around the forecasts, with wider intervals indicating greater uncertainty.

Exercise 5.4

Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports series from global_economy

exports_data <- global_economy |>
  filter(Country == "Australia") |>
  select(Year, Exports) |>
  as_tsibble(index = Year)

exports_fit <- exports_data |> model(NAIVE(Exports))
exports_fit |> gg_tsresiduals()

exports_fit |> forecast() |> autoplot(exports_data) +
  labs(title = "Australian Exports Forecast",
       y = "Exports",
       x = "Year")

The residuals look like white noise: they are scattered randomly around zero, with no discernible patterns or trends. The ACF plot shows that the autocorrelations are within the confidence bounds, indicating that there is no significant correlation between the residuals at different lags. The histogram of the residuals appears to be approximately normally distributed, which supports the normality assumption behind the prediction intervals.

Bricks series from aus_production

bricks_data <- aus_production |>
  select(Quarter, Bricks) |>
  filter(!is.na(Bricks))

bricks_fit <- bricks_data |> model(SNAIVE(Bricks))

bricks_fit |> gg_tsresiduals()

bricks_fit |> forecast() |> autoplot(bricks_data) +
  labs(title = "Australian Bricks Production Forecast",
       y = "Bricks Production",
       x = "Year")

The residuals show a clear pattern, indicating they are not white noise. The ACF has significant spikes at seasonal lags and the time plot reveals persistent runs of positive and negative errors. This suggests the seasonal naive model does not fully capture the trend and seasonality in bricks production, so a more advanced model would be needed.

Exercise 5.7

For your retail time series (from Exercise 7 in Section 2.10):

a. Create a training dataset consisting of observations before 2011 using:

myseries <- aus_retail |>
  filter(Industry == "Cafes, restaurants and catering services",State == "Australian Capital Territory")

myseries_train <- myseries |>
  filter(year(Month) < 2011) |>
  filter(!is.na(Turnover))

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

In the plot, the red line represents the training data (before 2011), while the black line represents the entire dataset and remains visible only from 2011 onwards. This visual check confirms that the training data has been correctly separated from the test data.

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train |>
  model(SNAIVE(Turnover))

We supplied the missing argument, Turnover, to the SNAIVE() function; it specifies the variable we want to model.

d. Check the residuals.

fit |> gg_tsresiduals()

Do the residuals appear to be uncorrelated and normally distributed?

The residuals do not appear to be uncorrelated. The training-set accuracy output in part f reports a lag-1 autocorrelation (ACF1) of 0.826 for the residuals, which is far from zero, so neighbouring residuals are strongly related and the seasonal naive model leaves information in the data unused. The mean error of 0.985 also shows that the residuals are not centred on zero. The histogram of the residuals appears to be roughly bell-shaped, so approximate normality is more plausible than independence.

e. Produce forecasts for the test data.

fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

The forecasts successfully reproduce the seasonal pattern in the series, with regular peaks and troughs aligned with past observations. The shaded prediction intervals illustrate forecast uncertainty, which widens as the horizon extends, reflecting increasing variability further into the future.

f. Compare the accuracy of your forecasts against the actual values.

fit |> accuracy()

## # A tibble: 1 × 12
##   State    Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral… Cafes, … SNAIV… Trai… 0.985  3.37  2.53  5.05  16.1     1     1 0.826

fc |> accuracy(myseries)

## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… Aust… Cafes, … Test   5.86  8.42  7.38  13.4  19.0  2.92  2.50 0.847

For the training set, ME = 0.99 and RMSE = 3.37; the errors are relatively small because they are computed on the same data the model was fitted to.
For the test set, ME = 5.86, RMSE = 8.42 and MAE = 7.38; the forecast errors are clearly larger once we move beyond the training window, and the MASE of 2.92 means the test errors are almost three times the in-sample seasonal naive errors.
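To show where these numbers come from, the base-R sketch below computes ME, RMSE, MAE and MASE by hand for a seasonal naive forecast. The data are made-up toy values with a seasonal period of 4, not the retail series, and the variable names are illustrative.

```r
# Toy data with seasonal period m = 4 (made-up values)
train  <- c(10, 14, 12, 16, 11, 15, 13, 17)
actual <- c(13, 17, 14, 19)   # held-out test values
m <- 4

# Seasonal naive forecasts: repeat the last season of the training data
fc <- tail(train, m)          # 11 15 13 17

err  <- actual - fc           # forecast errors: 2 2 1 2
me   <- mean(err)             # 1.75
rmse <- sqrt(mean(err^2))     # sqrt(3.25)
mae  <- mean(abs(err))        # 1.75

# MASE scales the test MAE by the in-sample MAE of seasonal naive forecasts
scale_mae <- mean(abs(diff(train, lag = m)))  # 1
mase <- mae / scale_mae                       # 1.75
```

Because the scaling factor is the in-sample seasonal naive MAE itself, a seasonal naive model always has a training-set MASE of 1, which matches the training output above.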

How sensitive are the accuracy measures to the amount of training data used?

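Forecast accuracy is sensitive to the training data used, but for this model mainly to where the training period ends rather than to how many observations it contains. More training data generally helps methods that estimate trend or seasonal parameters, while too little data leads to less reliable forecasts.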
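For the seasonal naive method in particular, the point forecasts are just the last season of the training data, so it is worth checking what actually changes when the training window changes. The base-R sketch below does this on a simulated monthly series with trend and seasonality (not the retail data); the helper functions snaive_fc and rmse are hypothetical names defined here.

```r
set.seed(42)
m   <- 12                    # monthly seasonality
idx <- seq_len(120)          # ten years of simulated months
y   <- 0.2 * idx + 5 * sin(2 * pi * idx / m) + rnorm(120)

test <- y[109:120]           # final year held out as test data

# Seasonal naive forecasts for the test year: the last m training values
snaive_fc <- function(train) tail(train, m)
rmse <- function(fc, actual) sqrt(mean((actual - fc)^2))

rmse_long  <- rmse(snaive_fc(y[1:108]), test)    # nine years of training data
rmse_short <- rmse(snaive_fc(y[73:108]), test)   # last three years only
rmse_stale <- rmse(snaive_fc(y[1:96]), test)     # training data ends a year earlier

c(long = rmse_long, short = rmse_short, stale = rmse_stale)
```

The long and short training sets give identical test RMSE because they share the same final year, while the training set that ends a year earlier gives a different value. Scaled measures such as MASE and RMSSE would still change with the window length, since their scaling factor is computed from all in-sample errors.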

Alinzon Simon

2025-09-21
