Do exercises 5.1, 5.2, 5.3, 5.4 and 5.7 in the Hyndman book.
Please submit your Rpubs link as well as your .pdf file showing your run
code.
library("fpp3")
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Question 5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy)
data("global_economy")
global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # ℹ 15,140 more rows
help("global_economy")
aus = global_economy %>%
filter(Country == "Australia")
aus |>
autoplot(Population)
labs(y = "Population ", title = "Population for Australia ")
## $y
## [1] "Population "
##
## $title
## [1] "Population for Australia "
##
## attr(,"class")
## [1] "labels"
aus %>%
model(RW(Population ~ drift())) %>%
forecast(h = "3 years") %>%
autoplot(global_economy) + labs(y = "Population ", title = "Population for Australia ")
This data shows the population growth in Australia from 1960 to 2017. Using the time series function autoplot, we can observe a increase in the population growth overtime. There are no indication of seasonality or cycle patterns in the data, we can observe a linear growth. The random walk drift model will be used to forecast the future increase and decrease overtime.
Bricks (aus_production)
data("aus_production")
aus_production
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
help("aus_production")
bricks_fit = aus_production %>%
filter(!is.na(Bricks)) %>%
model(SNAIVE(Bricks ~ lag("year")))
bricks_forecast = bricks_fit %>%
forecast( h = "5 years")
bricks_forecast %>%
autoplot(aus_production, level = NULL) + labs(
title = "Clay Bricks Production in Australia", y = "Millions of Bricks"
) + guides(colour = guide_legend(title = "Forecast"))
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
This data shows the quarterly production of bricks in Australia. using the time series function we can obserse seasonality in the production of bricks. A seasonal naive forecasting method, we be useful in forecasting the quarterly bricks production.
NSW Lambs (aus_livestock)
data("aus_livestock")
aus_livestock
## # A tsibble: 29,364 x 4 [1M]
## # Key: Animal, State [54]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
## 7 1977 Jan Bulls, bullocks and steers Australian Capital Territory 1800
## 8 1977 Feb Bulls, bullocks and steers Australian Capital Territory 1900
## 9 1977 Mar Bulls, bullocks and steers Australian Capital Territory 2700
## 10 1977 Apr Bulls, bullocks and steers Australian Capital Territory 2300
## # ℹ 29,354 more rows
help("aus_livestock")
lamb_meat = aus_livestock %>%
filter(State == "New South Wales", Animal == "Lambs")
lamb_meat %>%
autoplot(Count) +
labs(title = "New South Wales Lambs", y = "Count of Lambs")
lamb_fit <- lamb_meat %>%
model(
`Naïve` = NAIVE(Count)
)
lamb_fc <- lamb_fit %>%
forecast(h = 35)
lamb_fc %>%
autoplot(lamb_meat) +
labs(title = "Lamb Forecast", y = "Count of Lambs")
This data shows the meat production of lambs in Australia. Using the time series functions, there is no indication of cycle or seasonality through the data. A naive forecast, we help use the recent the recent observation to predict any future values.
Household wealth (hh_budget)
data("hh_budget")
hh_budget
## # A tsibble: 88 x 8 [1Y]
## # Key: Country [4]
## Country Year Debt DI Expenditure Savings Wealth Unemployment
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia 1995 95.7 3.72 3.40 5.24 315. 8.47
## 2 Australia 1996 99.5 3.98 2.97 6.47 315. 8.51
## 3 Australia 1997 108. 2.52 4.95 3.74 323. 8.36
## 4 Australia 1998 115. 4.02 5.73 1.29 339. 7.68
## 5 Australia 1999 121. 3.84 4.26 0.638 354. 6.87
## 6 Australia 2000 126. 3.77 3.18 1.99 350. 6.29
## 7 Australia 2001 132. 4.36 3.10 3.24 348. 6.74
## 8 Australia 2002 149. 0.0218 4.03 -1.15 349. 6.37
## 9 Australia 2003 159. 6.06 5.04 -0.413 360. 5.93
## 10 Australia 2004 170. 5.53 4.54 0.657 379. 5.40
## # ℹ 78 more rows
help("hh_budget")
hh_budget %>%
autoplot(Wealth)
hh_budget %>%
model(RW(Wealth ~ drift())) %>%
forecast(h = 24) %>%
autoplot(hh_budget) +
labs(title = "Household Wealth")
This data shows the annual wealth of the following countries for
Australia, Japan, Canada and USA from 1995-2016. The time series graph,
shows us there is increase and decrease of wealth overtime within these
countries. The drift method would be used to forecast the future wealth
based on the average wealth observed in these countries.
Australian takeaway food turnover (aus_retail)
data("aus_retail")
aus_retail
## # A tsibble: 64,532 x 5 [1M]
## # Key: State, Industry [152]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurant… A3349849A 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Sep 4.2
## 7 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Oct 4.8
## 8 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Nov 5.4
## 9 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Dec 6.9
## 10 Australian Capital Territory Cafes, restaurant… A3349849A 1983 Jan 3.8
## # ℹ 64,522 more rows
help("aus_retail")
take_away = aus_retail %>%
filter( Industry == "Takeaway food services")
take_away %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
take_away %>%
model(
`Drift` = RW(Turnover ~ drift())) %>%
forecast(h = 20) %>%
autoplot(aus_retail) + labs(
title = "Australian takeaway food turnover"
) + facet_wrap(~State, scales = "free")
This data shows the Australian Retail Trade Turnover in the Takeaway
Food Industry. There was increase and decrease in takeaway industry
overtime. The drift method os great method for forecasting the increase
and decrease overtime
Question 5.2 Use the Facebook stock price (data set gafa_stock) to do the following
data("gafa_stock")
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ℹ 5,022 more rows
facebook_stock = gafa_stock %>%
filter(Symbol == "FB", year(Date) >= 2015) %>%
mutate(day = row_number()) %>%
update_tsibble(index = day, regular = TRUE)
facebook_stock %>%
autoplot(Close) + labs(title = " Daily Clsoing Price of Facebook Stock", y = "USD$ Closing Price")
facebook_stock %>%
model( `Drift` = RW(Close ~ drift())) %>%
forecast(h = 45) %>%
autoplot(facebook_stock) + labs(title = " Daily Clsoing Price of Facebook Stock", y = "USD$ Closing Price")
facebook_stock %>%
model( `Drift` = RW(Close ~ drift())) %>%
forecast(h = 45) %>%
autoplot(facebook_stock) + labs(title = " Daily Clsoing Price of Facebook Stock", y = "USD$ Closing Price") +
geom_segment(aes(x = 1, y = 74.05, xend = 1000, yend = 134.45),
colour = "orange", linetype = "dashed")
## Warning in geom_segment(aes(x = 1, y = 74.05, xend = 1000, yend = 134.45), : All aesthetics have length 1, but the data has 1006 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
facebook_stock %>%
model(
`Mean` = MEAN(Close),
`Naïve` = NAIVE(Close),
`Seasonal naïve` = SNAIVE(Close),
`Drift` = RW(Close ~ drift())
) %>%
forecast(h = 350) %>%
autoplot(facebook_stock, level = NULL) + labs(title = " Daily Clsoing Price of Facebook Stock", y = "USD$ Closing Price")
## Warning: 1 error encountered for Seasonal naïve
## [1] Non-seasonal model specification provided, use RW() or provide a different lag specification.
## Warning: Removed 350 rows containing missing values or values outside the scale range
## (`geom_line()`).
Due to the pattern observed in the facebook daily closing price, a drift
method would be used to forecast any future values. The time series
fuction first shows a a steady increase over time with decrease starting
in day 875. There is a lack of seasonality or cycle observed in the
data, so a drift method would best capture any future values Question
5.3
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production |>
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals() + ggtitle("Beer Production Residuals")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Look a some forecasts
fit |> forecast() |> autoplot(recent_production)
# Ljung Test
fit %>%
augment() %>%
features(.innov, ljung_box, lag = 8)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Beer) 32.3 0.0000834
Question 5.4
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
Australian Exports series from global_economy
australiia_exes = global_economy %>%
filter( Country == "Australia")
aussie_fit = australiia_exes %>% model(NAIVE(Exports))
aussie_fit %>% gg_tsresiduals() + ggtitle("Austrialia Export Residuals")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_bin()`).
aussie_fit %>% forecast() |> autoplot(australiia_exes)
aussie_fit %>%
augment() %>%
features(.innov, ljung_box, lag = 8)
## # A tibble: 1 × 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 Australia NAIVE(Exports) 15.5 0.0508
In residuals over time plot we can observe the residuals begin scattered and fluctuating around the zero, indicating a that an accurate data is being captured. In our ACF plot, most the bars stay inside of the blue dotted line, meaning there is little autocorrelation in the model. This future supported by the Ljung-Box test, where our p value being equal to 0.0501. Our distribution plot of residuals has bell shaped graph, showing the errors in the modelto be balanced.
Bricks series from aus_production
bricks_series = aus_production %>%
filter(!is.na(Bricks))
bricks_series_fit = bricks_series %>% model(SNAIVE(Bricks))
bricks_series_fit %>% gg_tsresiduals() + ggtitle("Austrialia Residuals Bricks Production")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing non-finite outside the scale range
## (`stat_bin()`).
bricks_series_fit %>% forecast() |>
autoplot(aus_production)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
bricks_series_fit %>%
augment() %>%
features(.innov ,box_pierce, lag = 8)
## # A tibble: 1 × 3
## .model bp_stat bp_pvalue
## <chr> <dbl> <dbl>
## 1 SNAIVE(Bricks) 267. 0
The residual plot have points that fluctuate up and down failing to be centered around zero. This indicating the model will not capture accurate patterns in the seasons.In the ACF plot a few lines cross the dotted blue lines, meaning the there are values that are correlated. In the histogram of the residuals are skewed to the right, and has a normal distribution. We will reject the probability of white noise since p value is equal to 0, showing that the values are highly related.
Question 5.7
For your retail time series (from Exercise 7 in Section 2.10)
set.seed(48462896)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries |>
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
c. Fit a seasonal naïve model using SNAIVE() applied to your training
data (myseries_train).
fit <- myseries_train |>
model(SNAIVE(Turnover))
fit |> gg_tsresiduals() + ggtitle("Aussie Retail Residuals Plot")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).
fc <- fit |>
forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc |> autoplot(myseries)
fit |> accuracy()
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Western… Hardwar… SNAIV… Trai… 3.39 11.5 8.77 6.65 13.9 1 1 0.823
fc |> accuracy(myseries)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… West… Hardwar… Test 59.2 64.2 59.2 34.7 34.7 6.76 5.60 0.923
The amount of training data we use in the forecast will have an impact on the accuracy.If we have more training data we can make better predictions. If we dont have enough training data, it makes it difficult to observe the trends and create a reliable forecast with our data. But, if we have too much data it will affect our recent data, because there will be old data included in the new trends we observe.