Data 624 HW 3

5.1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

- Australian Population (global_economy)

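library(fpp3)  # loads tsibble, fable, feasts and the example data sets used below

global_economy %>% 
  filter(Country == "Australia") %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = 5) %>%  # annual data end in 2017, so h = 5 reaches 2022 as stated in the subtitle
  autoplot(global_economy) +
  labs(title = "Australia Population",
       subtitle = "1960 - 2017, Forecasted until 2022")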

The random walk with drift is the most appropriate method for the Australian population because the series has a strong, steady upward trend and, being annual, no seasonality. The forecast continues this trend, reflecting ongoing population growth from natural increase and immigration. The prediction intervals widen with the horizon, indicating increasing uncertainty further into the future.

- Bricks (aus_production)

aus_production %>% 
  filter(!is.na(Bricks)) %>%
  model(SNAIVE(Bricks)) %>%
  forecast(h = 14) %>%
  autoplot(aus_production) +
  labs(title = "Australian Bricks Production",
       subtitle = "1956 - 2005 Q2, Forecasted until 2008 Q4")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

For the Australian bricks production data, the seasonal naïve (SNAIVE) method is the most appropriate because quarterly brick production shows clear seasonal patterns in its regular within-year fluctuations.

The forecast repeats the most recent year of quarterly values, assuming that past seasonality continues. The prediction intervals widen slightly over the horizon, indicating increasing uncertainty, while the point forecasts keep the seasonal shape of the production data.
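The mechanics of the seasonal naïve method can be sketched in base R without any forecasting package. This is only an illustration under stated assumptions: `snaive_forecast` is a hypothetical helper, not a fable function, and the series is made-up quarterly data rather than the Bricks series.

```r
# Seasonal naive forecast: each future value repeats the last observed
# value from the same season. `snaive_forecast` is a hypothetical helper.
snaive_forecast <- function(y, m, h) {
  stopifnot(length(y) >= m, h >= 1)
  last_season <- tail(y, m)                  # most recent full seasonal cycle
  last_season[((seq_len(h) - 1) %% m) + 1]   # cycle through it for h steps
}

# Toy quarterly series (made-up numbers, not the Bricks data)
y <- c(10, 14, 12, 8,
       11, 15, 13, 9)

fc <- snaive_forecast(y, m = 4, h = 6)
print(fc)
```

The six forecasts simply cycle through the final four observations, which is why the forecast plot looks like a copy of the last year of data.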

- NSW Lambs (aus_livestock)

aus_livestock %>%
  filter(State == "New South Wales", 
         Animal == "Lambs") %>%
  model(SNAIVE(Count)) %>%  # seasonal naive, matching the method discussed below
  forecast(h = 24) %>%      # monthly data end in Dec 2018, so h = 24 reaches Dec 2020
  autoplot(aus_livestock) +
  labs(title = "Lambs in New South Wales",
       subtitle = "July 1976 - Dec 2018, Forecasted until Dec 2020")

For the NSW lambs data, the seasonal naïve (SNAIVE) method is appropriate because the monthly counts show clear seasonal patterns. The method carries the most recent year of monthly values forward, capturing the cyclical nature of lamb production driven by farming cycles and market demand. The prediction intervals widen over time, reflecting increasing uncertainty around the repeated seasonal pattern.

- Household wealth (hh_budget)

hh_budget %>%
  model(RW(Wealth ~ drift())) %>%
  forecast(h = 5) %>%  # annual data end in 2016, so h = 5 reaches 2021
  autoplot(hh_budget) +
  labs(title = "Household Wealth",
       subtitle = "1995 - 2016, Forecasted until 2021")

For the household wealth data, the random walk with drift method is the most suitable choice. It captures the overall upward trend in household wealth observed in all four countries in the data set: Australia, Canada, Japan, and the USA.

The forecast extends this trend into the future, assuming that the factors driving wealth growth, such as income and investments, will continue. The widening prediction intervals show increasing uncertainty, reflecting the possibility of economic changes that could alter future wealth levels.
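How quickly the intervals widen depends on the method. The sketch below evaluates the h-step forecast standard deviations usually quoted for the four benchmark methods in the FPP3 textbook (mean, naïve, seasonal naïve, drift). The helper `benchmark_se` and its inputs (`sigma`, `T_len`, `m`) are hypothetical names chosen for illustration, and the numbers are not estimated from the household wealth data.

```r
# h-step forecast standard deviations for the benchmark methods, given the
# residual standard deviation `sigma`, the series length `T_len`, and the
# seasonal period `m`. `benchmark_se` is a hypothetical helper.
benchmark_se <- function(sigma, h, T_len, m = 1) {
  k <- (h - 1) %/% m                       # completed seasonal cycles before step h
  data.frame(
    h      = h,
    mean   = sigma * sqrt(1 + 1 / T_len),  # constant in h
    naive  = sigma * sqrt(h),
    snaive = sigma * sqrt(k + 1),          # grows once per seasonal cycle
    drift  = sigma * sqrt(h * (1 + h / (T_len - 1)))
  )
}

se <- benchmark_se(sigma = 1, h = 1:8, T_len = 50, m = 4)
print(round(se, 3))
```

Under these formulas the drift intervals are at least as wide as the naïve ones at every horizon, because the estimated slope adds its own uncertainty.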

- Australian takeaway food turnover (aus_retail)

aus_retail %>%
  filter(Industry == "Cafes, restaurants and takeaway food services") %>%
  model(SNAIVE(Turnover)) %>%  # seasonal naive, matching the method discussed below
  forecast(h = 36) %>%         # monthly data end in Dec 2018, so h = 36 reaches Dec 2021
  autoplot(aus_retail) +
  labs(title = "Australian Takeaway Food Turnover",
       subtitle = "Apr 1982 - Dec 2018, Forecasted until Dec 2021") +
  facet_wrap(~State, scales = "free")

For the Australian takeaway food turnover data, the seasonal naïve (SNAIVE) method is the most suitable because of the clear seasonal patterns observed across the states and territories. The data show recurring patterns in turnover, reflecting consumer behavior influenced by holidays, seasons, and other periodic events.

The forecast extends these seasonal patterns into the future, assuming that past seasonal cycles will repeat. The prediction intervals widen slightly over the horizon, indicating increased uncertainty, while the point forecasts keep the consistent seasonal shape of turnover in each region.

5.2. Use the Facebook stock price (data set gafa_stock) to do the following:

a. Produce a time plot of the series.

fb_stock <- gafa_stock %>%
  filter(Symbol == "FB") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

fb_stock%>%
  autoplot(Open) +
  labs(title= "Daily Open Price of Facebook", y = "USD")

This shows the daily open price of Facebook stock over time. The plot highlights a clear upward trend in the stock price, followed by a decline. The trend indicates that the stock price has experienced both periods of growth and declines, which are typical for financial markets driven by investor behavior, market conditions, and external events.

b. Produce forecasts using the drift method and plot them.

fb_stock %>%
  model(RW(Open ~ drift())) %>%
  forecast(h = 14) %>%
  autoplot(fb_stock) +
  labs(title = "Daily Open Price of Facebook", y = "USD")

The plot shows the forecast for the Facebook stock price from the random walk with drift method. The method projects future values by adding the average historical change per trading day to the last observed price. Because the series rose overall, the forecast slopes upward despite the recent decline. The 80% and 95% prediction intervals widen as the horizon grows, reflecting increasing uncertainty about the future price.

c. Show that the forecasts are identical to extending the line drawn between the first and last observations.

fb_stock %>%
  model(RW(Open ~ drift())) %>%
  forecast(h = 14) %>%
  autoplot(fb_stock) +
  labs(title = "Daily Open Price of Facebook", y = "USD") +
  # annotate() draws a single segment and avoids the geom_segment() warning
  # about length-1 aesthetics being recycled over all 1258 rows
  annotate("segment", x = 1, y = 54.83, xend = 1258, yend = 134.45,
           colour = "blue", linetype = "dashed")

The plot shows the drift forecast together with a dashed line joining the first and last observations. The drift method sets the slope of its forecasts to the average change over the historical period, which equals (last observation - first observation) / (T - 1), so the point forecasts fall exactly on the extension of the dashed line. The prediction intervals around that line widen with the horizon, reflecting the uncertainty in future prices.
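The same equivalence can be checked numerically. The following base R sketch uses a short made-up series rather than the Facebook prices, and both helper functions (`drift_forecast`, `endpoint_line`) are hypothetical names introduced only for this check.

```r
# Drift forecast: last value plus h times the average historical change.
drift_forecast <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)   # identical to mean(diff(y))
  y[n] + slope * seq_len(h)
}

# Straight line through the first and last observations, at time index t.
endpoint_line <- function(y, t) {
  n <- length(y)
  y[1] + (y[n] - y[1]) / (n - 1) * (t - 1)
}

y <- c(50, 53, 49, 58, 61, 60)       # toy values, not the real series
h <- 3

fc   <- drift_forecast(y, h)
line <- endpoint_line(y, length(y) + seq_len(h))

print(rbind(drift = fc, line = line))
print(all.equal(fc, line))
```

The average of the one-step changes telescopes to (last - first) / (T - 1), which is why the intermediate observations have no effect on the drift point forecasts.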

d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb_stock %>%
  model(Mean = MEAN(Open),
        `Naïve` = NAIVE(Open),
        Drift = NAIVE(Open ~ drift())) %>%
  forecast(h = 63) %>%
  autoplot(fb_stock, level = NULL) +
  labs(title = "Daily Open Price of Facebook", y = "USD")

Of the three benchmark methods (Mean, Naïve, and Drift), the drift method is the best fit for the Facebook stock price because it carries the overall direction of the series forward. The mean and naïve methods produce flat forecasts, at the historical average and at the last observed price respectively, so neither reflects the long-run change in the series.
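To make the contrast between the three benchmarks concrete, here is a small base R sketch on made-up data. `benchmark_forecasts` is a hypothetical helper, not part of fable, and it returns point forecasts only.

```r
# Point forecasts from the mean, naive and drift methods for a numeric vector.
benchmark_forecasts <- function(y, h) {
  n <- length(y)
  data.frame(
    h     = seq_len(h),
    mean  = rep(mean(y), h),                              # flat at the average
    naive = rep(y[n], h),                                 # flat at the last value
    drift = y[n] + seq_len(h) * (y[n] - y[1]) / (n - 1)   # last value plus trend
  )
}

y  <- c(20, 22, 21, 25, 27, 30)   # toy upward-trending series
fc <- benchmark_forecasts(y, h = 3)
print(fc)
```

Which benchmark actually forecasts best is an empirical question; comparing their errors on a held-out test set would settle it for a given series.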

5.3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts.

# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)

# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))

# Look at the residuals
fit %>% gg_tsresiduals() +
  ggtitle("Residual Plots for Australian Beer Production")

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)+
  ggtitle("Australian Beer Production")

#Box-Pierce test, ℓ=2m for seasonal data, m=4
fit %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 8, dof = 0)

.model <chr>	bp_stat <dbl>	bp_pvalue <dbl>
SNAIVE(Beer)	29.74759	0.000234228

#Ljung-Box test
fit %>%
  augment()%>% features(.innov, ljung_box, lag = 8, dof = 0)

.model <chr>	lb_stat <dbl>	lb_pvalue <dbl>
SNAIVE(Beer)	32.26893	8.335611e-05

The seasonal naïve (SNAIVE) method applied to the quarterly Australian beer production data from 1992 captures the seasonal pattern, but the residuals do not resemble white noise. The residual plots show remaining periodic structure, and both the Box-Pierce test (p ≈ 0.0002) and the Ljung-Box test (p ≈ 0.00008) reject the hypothesis of no autocorrelation up to lag 8. This suggests that a more complex model might provide a better fit.
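Both portmanteau tests are also available in base R through `Box.test()` from the stats package. The sketch below is illustrative only: it runs them on an artificial residual series with an obvious alternating pattern, not on the beer residuals, to show what a clear rejection looks like.

```r
# Artificial "residuals" with an obvious pattern (alternating sign),
# standing in for innovation residuals that are clearly not white noise.
res <- rep(c(1, -1), times = 30)

# lag = 8 mirrors the choice of 2m for quarterly data; fitdf = 0 because
# no parameters are estimated by the seasonal naive method.
bp <- Box.test(res, lag = 8, type = "Box-Pierce", fitdf = 0)
lb <- Box.test(res, lag = 8, type = "Ljung-Box",  fitdf = 0)

print(c(bp_stat = unname(bp$statistic), bp_pvalue = bp$p.value))
print(c(lb_stat = unname(lb$statistic), lb_pvalue = lb$p.value))
```

The Ljung-Box statistic is never smaller than the Box-Pierce statistic on the same residuals, since it weights each squared autocorrelation by a factor greater than one; the two tests usually lead to the same conclusion, as they do in the output above.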

5.4. Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

# Extract data of interest
aus_exports <- global_economy %>%
  filter(Country == "Australia")

# Define and estimate a model
fit <- aus_exports %>% model(NAIVE(Exports))

# Look at the residuals
fit %>% gg_tsresiduals() +
  ggtitle("Residual Plots for Australian Exports")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).

# Look at some forecasts
fit %>% forecast() %>% autoplot(aus_exports) +
  ggtitle("Annual Australian Exports")

#Box-Pierce test, ℓ=10 for non-seasonal data
fit %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 10, dof = 0)

Country <fct>	.model <chr>	bp_stat <dbl>	bp_pvalue <dbl>
Australia	NAIVE(Exports)	14.58068	0.1481135

#Ljung-Box test
fit %>%
  augment()%>% features(.innov, ljung_box, lag = 10, dof = 0)

Country <fct>	.model <chr>	lb_stat <dbl>	lb_pvalue <dbl>
Australia	NAIVE(Exports)	16.3655	0.08963678

For the Australian Exports series, the NAIVE method was used because the data are annual and have no seasonality. The Box-Pierce (p ≈ 0.15) and Ljung-Box (p ≈ 0.09) tests at lag 10 do not reject the white-noise hypothesis at the 5% level, indicating that the method is adequate. For the Bricks series, analysed next, the SNAIVE method was chosen due to clear seasonality. Although the model captures the seasonal patterns, residual analysis suggests that there might be room for improvement with a more complex model.

# Define and estimate a model
fit <- aus_production %>% 
  filter(!is.na(Bricks)) %>% 
  model(SNAIVE(Bricks))

# Look at the residuals
fit %>% gg_tsresiduals() +
  ggtitle("Residual Plots for Australian Production of Bricks")

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).

# Look at some forecasts
fit %>% forecast() %>% autoplot(aus_production) +
  ggtitle("Australian Production of Bricks")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

#Box-Pierce test, ℓ=2m for seasonal data, m=4
fit %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 8, dof = 0)

.model <chr>	bp_stat <dbl>	bp_pvalue <dbl>
SNAIVE(Bricks)	267.0146	0

#Ljung-Box test
fit %>%
  augment()%>% features(.innov, ljung_box, lag = 8, dof = 0)

.model <chr>	lb_stat <dbl>	lb_pvalue <dbl>
SNAIVE(Bricks)	274.1714	0

The SNAIVE method was used for the Australian bricks production data to capture its seasonal patterns. While the model reflects the seasonality, the Box-Pierce and Ljung-Box tests (both with p-values reported as 0) show significant autocorrelation in the residuals, and the residual plots suggest heteroskedasticity. A more complex model may therefore be needed for better accuracy.

5.7. For your retail time series (from Exercise 7 in Section 2.10):

a. Create a training dataset consisting of observations before 2011 using:

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

b. Check that your data have been split appropriately by producing the following plot.

autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

The plot shows that the data has been split appropriately into training and test sets. The training data (in red) covers the period before 2011, while the test data (in black) represents observations from 2011 onwards. This division enables an effective evaluation of the forecasting model’s performance on unseen data, ensuring the results are reliable and not biased by the training process. This analysis was completed independently.

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover))

d. Check the residuals.

fit %>% gg_tsresiduals()  +
  ggtitle("Residual Plots for Australian Retail Turnover")

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

Do the residuals appear to be uncorrelated and normally distributed?

No, the residuals do not appear to be uncorrelated and normally distributed. The residual plot shows visible patterns, and the ACF plot indicates significant autocorrelation at various lags. The histogram is roughly symmetric but slightly skewed. These observations suggest that the residuals are not purely random, indicating that the model has not fully captured all systematic variation in the data.

e. Produce forecasts for the test data.

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc %>% autoplot(myseries)

f. Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()

State <chr>	Industry <chr>	.model <chr>	.type <chr>
Tasmania	Cafes, restaurants and takeaway food services	SNAIVE(Turnover)	Training

fc %>% accuracy(myseries)

.model <chr>	State <chr>	Industry <chr>	.type <chr>	ME <dbl>
SNAIVE(Turnover)	Tasmania	Cafes, restaurants and takeaway food services	Test	7.11875

g. How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures appear to be sensitive to the amount of training data used. The residual analysis showed autocorrelation and remaining patterns, indicating that the model has not fully captured the underlying structure in the data. The forecast plot also shows wide prediction intervals, and using less training data can increase forecast variability and reduce overall accuracy. In general, more training data helps a model capture trend and seasonality, leading to more reliable forecasts. However, very old observations can introduce outdated patterns that are no longer relevant, so the training window should balance the two.
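One way to probe this sensitivity is to refit with different training cutoffs and compare test-set errors. The base R sketch below does this for a seasonal naïve forecast on a made-up trending quarterly series, not the retail data, and `snaive_test_rmse` is a hypothetical helper. For the seasonal naïve method in particular, the point forecasts depend only on the last m training observations, so what changes with the cutoff is which season is repeated and how far ahead the test set extends.

```r
# Test-set RMSE of a seasonal naive forecast when the first `n_train`
# observations are used for training and the rest for testing.
# `snaive_test_rmse` is a hypothetical helper; y is a plain numeric vector.
snaive_test_rmse <- function(y, n_train, m) {
  stopifnot(n_train >= m, n_train < length(y))
  train <- y[seq_len(n_train)]
  test  <- y[-seq_len(n_train)]
  h     <- length(test)
  fc    <- tail(train, m)[((seq_len(h) - 1) %% m) + 1]  # repeat last season
  sqrt(mean((test - fc)^2))
}

# Toy quarterly series: linear trend plus a fixed seasonal pattern
t      <- 1:24
season <- rep(c(3, -1, 2, -4), times = 6)
y      <- t + season

cutoffs <- c(12, 16, 20)
rmse <- sapply(cutoffs, function(n_train) snaive_test_rmse(y, n_train, m = 4))
print(data.frame(n_train = cutoffs, rmse = round(rmse, 3)))
```

On this toy series the RMSE falls as the cutoff moves later, because the seasonal naïve forecast ignores the trend and its error grows with the forecast horizon.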

Shamecca Marshall

2/20/25
