library(fpp3)
This document works through exercises 2.1, 2.2, 2.3, 2.4, 2.5, and 2.8 from Forecasting: Principles and Practice (3rd ed).
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
#help("aus_production")
aus_production |>
select(Bricks) |>
head()
## # A tsibble: 6 x 2 [1Q]
## Bricks Quarter
## <dbl> <qtr>
## 1 189 1956 Q1
## 2 204 1956 Q2
## 3 208 1956 Q3
## 4 197 1956 Q4
## 5 187 1957 Q1
## 6 214 1957 Q2
The aus_production dataset contains quarterly estimates of selected indicators of manufacturing production in Australia. Bricks records clay brick production in millions of bricks. The time interval for this series is quarterly.
Below is a time plot of the series:
aus_production |>
select(Bricks) |>
autoplot(Bricks, na.rm = TRUE)
#help("pelt")
pelt |>
select(Lynx) |>
head()
## # A tsibble: 6 x 2 [1Y]
## Lynx Year
## <dbl> <dbl>
## 1 30090 1845
## 2 45150 1846
## 3 49150 1847
## 4 39520 1848
## 5 21230 1849
## 6 8420 1850
The pelt dataset contains Hudson Bay Company trading records for the number of Snowshoe Hare and Canadian Lynx furs traded from 1845 to 1935. Lynx is the number of Canadian Lynx pelts traded. The interval for the series is annual.
Below is a time plot of the series:
pelt |>
select(Lynx) |>
autoplot(Lynx)
#help("gafa_stock")
gafa_stock |>
select(Close) |>
head()
## # A tsibble: 6 x 3 [!]
## # Key: Symbol [1]
## Close Date Symbol
## <dbl> <date> <chr>
## 1 79.0 2014-01-02 AAPL
## 2 77.3 2014-01-03 AAPL
## 3 77.7 2014-01-06 AAPL
## 4 77.1 2014-01-07 AAPL
## 5 77.6 2014-01-08 AAPL
## 6 76.6 2014-01-09 AAPL
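The gafa_stock dataset contains historical stock prices from 2014 to 2018 for Google, Amazon, Facebook and Apple. Close is the closing price of each stock. Prices are recorded once per trading day, so weekends and market holidays are absent and the tsibble reports an irregular interval ([!]) rather than a regular daily one.
Below is a time plot including each series: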
gafa_stock |>
select(Close) |>
autoplot(Close)
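The `[!]` in the tsibble header above marks the index as irregular. As a minimal sketch, assuming the fpp3 packages are installed, this can be confirmed with `is_regular()` from tsibble, and the number of recorded trading days per stock can be counted after dropping the tsibble class:

```r
library(fpp3)

# FALSE: the Date index has no fixed step because non-trading days are absent
is_regular(gafa_stock)

# Number of trading days recorded for each stock
gafa_stock |>
  as_tibble() |>
  count(Symbol, name = "trading_days")
```

Converting with `as_tibble()` here is only a convenience for counting rows; the plot above works directly on the irregular tsibble.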
#help("vic_elec")
vic_elec |>
select(Demand) |>
head()
## # A tsibble: 6 x 2 [30m] <Australia/Melbourne>
## Demand Time
## <dbl> <dttm>
## 1 4383. 2012-01-01 00:00:00
## 2 4263. 2012-01-01 00:30:00
## 3 4049. 2012-01-01 01:00:00
## 4 3878. 2012-01-01 01:30:00
## 5 4036. 2012-01-01 02:00:00
## 6 3866. 2012-01-01 02:30:00
The vic_elec dataset contains operational electricity demand for Victoria, Australia. Demand is the total electricity demand in MWh. The interval for the series is half-hourly.
Below is a time plot of the series:
vic_elec |>
select(Demand) |>
autoplot(Demand) + labs(title = "Half-Hourly Electricity Demand for Victoria, Australia", y = "Electricity Demand (MWh)")
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
gafa_stock |>
select(Close) |>
group_by(Symbol) |>
filter(Close == max(Close))
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Close Date Symbol
## <dbl> <date> <chr>
## 1 232. 2018-10-03 AAPL
## 2 2040. 2018-09-04 AMZN
## 3 218. 2018-07-25 FB
## 4 1268. 2018-07-26 GOOG
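The same peaks can be found with `slice_max()` instead of `filter()`. The sketch below is an alternative, not the approach the exercise asks for. It converts to a plain tibble first, since the time structure is not needed for this lookup, and the name `peaks` is introduced only for illustration:

```r
library(fpp3)

peaks <- gafa_stock |>
  as_tibble() |>               # drop the tsibble class for a plain grouped lookup
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>   # keeps ties by default, like filter(Close == max(Close))
  ungroup() |>
  select(Symbol, Date, Close)

peaks
```

Both versions return every tied row, so a stock that reached the same maximum on two different days would appear twice.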
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv("https://bit.ly/fpptute1", show_col_types = FALSE)
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
The USgas package contains data on the demand for natural gas in the US.
# install.packages("USgas")  # run once if the package is not yet installed
mytimeseries <- USgas::us_total |>
as_tsibble(index = year, key = state)
new_england <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
mytimeseries |>
filter(state %in% new_england) |>
autoplot(y) + labs(title = "Annual Natural Gas Consumption For The New England Area", y = "Gas Consumption (million cu. ft)")
mytimeseries <- openxlsx::read.xlsx("https://bit.ly/fpptourism")
head(mytimeseries)
## Quarter Region State Purpose Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126
head(tsibble::tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
mytimeseries |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter, key = c(Region, State, Purpose)) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
suppressWarnings(
mytimeseries |>
group_by(Region, Purpose) |>
summarise(Trips = mean(Trips)) |>
arrange(desc(Trips)) |>
head(5)
)
## # A tsibble: 5 x 4 [1Q]
## # Key: Region, Purpose [3]
## # Groups: Region [2]
## Region Purpose Quarter Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
## 2 Sydney Business 2001 Q4 948.
## 3 Sydney Visiting 2016 Q4 921.
## 4 Sydney Visiting 2017 Q4 920.
## 5 Sydney Visiting 2017 Q1 916.
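Because summarise() on a tsibble keeps the Quarter index, the mean here is taken within each quarter, so every row is a single quarterly observation rather than an average over time. On that basis, the combination of region and purpose with the most overnight trips is Melbourne and Visiting in 2017 Q4, with about 985,280 trips (Trips is recorded in thousands).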
mytimeseries |>
index_by(year = 0) |>
group_by(Region, Purpose) |>
summarise(Trips = mean(Trips)) |>
arrange(desc(Trips)) |>
head(5)
## # A tsibble: 5 x 4 [?]
## # Key: Region, Purpose [5]
## # Groups: Region [3]
## Region Purpose year Trips
## <chr> <chr> <dbl> <dbl>
## 1 Sydney Visiting 0 747.
## 2 Melbourne Visiting 0 619.
## 3 Sydney Business 0 602.
## 4 North Coast NSW Holiday 0 588.
## 5 Sydney Holiday 0 550.
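Averaging over all quarters instead (index_by(year = 0) collapses the index to a single value), the combination of region and purpose with the most overnight trips on average is Sydney and Visiting, with about 747,270 trips per quarter (Trips is recorded in thousands).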
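A more conventional way to average over all quarters is to drop the time index with `as_tibble()` before grouping. The sketch below uses the packaged `tsibble::tourism` data, which the comparison above shows has the same leading rows as the downloaded file; `top_combo` is a name chosen only for this example:

```r
library(fpp3)

top_combo <- tsibble::tourism |>
  as_tibble() |>                                        # remove the Quarter index
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>   # mean over all quarters
  slice_max(Trips, n = 1)                               # highest average

top_combo
```

Without the index, `summarise()` returns one row per Region and Purpose pair, so no placeholder index column is needed.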
mytimeseries |>
group_by(State) |>
summarise(Trips = sum(Trips)) |>
head()
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
us_employment |>
filter(Title == "Total Private") -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 4 [1M]
## # Key: Series_ID [1]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
suppressMessages(
mytimeseries |>
autoplot()
)
suppressMessages(
mytimeseries |>
gg_season()
)
suppressMessages(
mytimeseries |>
gg_subseries()
)
suppressMessages(
mytimeseries |>
gg_lag()
)
suppressMessages(
mytimeseries |>
ACF(Employed) |>
autoplot()
)
There is a clear upward trend in total private employment. What initially appears to be seasonality in the time series plot turns out to be multi-year cyclical booms and busts.
Despite being consistently on the rise, employment is always fighting against the cyclical nature of the economy.
The very slight increase in the average over the seasons is the only consistent pattern which I would attribute to the general upward trend.
Any years which have a recession, like 2008, see a sharp cut in private employment numbers.
aus_production |>
select(Bricks) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 2 [1Q]
## Bricks Quarter
## <dbl> <qtr>
## 1 189 1956 Q1
## 2 204 1956 Q2
## 3 208 1956 Q3
## 4 197 1956 Q4
## 5 187 1957 Q1
## 6 214 1957 Q2
suppressMessages(
mytimeseries |>
autoplot(na.rm = TRUE)
)
suppressMessages(
mytimeseries |>
gg_season(na.rm = TRUE)
)
suppressMessages(
mytimeseries |>
gg_subseries(na.rm = TRUE)
)
suppressMessages(
mytimeseries |>
gg_lag()
)
## Warning: Removed 20 rows containing missing values (gg_lag).
suppressMessages(
mytimeseries |>
ACF(Bricks) |>
autoplot(na.rm = TRUE)
)
The trend was increasing production until the 1980s and then it seemed to have stabilized. There is seasonality with quarter 2 and 3 having higher production than 1 and 4. There do seem to be boom and bust cycles as well.
Brick production has tapered off since the 1980s, which is surprising given that new housing is almost always being built wherever land is habitable.
The more moderate temperatures in quarters 2 and 3 likely lead to increased demand for bricks, as more construction work takes place.
1992 in particular seemed to have a bad crash in production.
pelt |>
select(Hare) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 2 [1Y]
## Hare Year
## <dbl> <dbl>
## 1 19580 1845
## 2 19600 1846
## 3 19610 1847
## 4 11990 1848
## 5 28040 1849
## 6 58000 1850
suppressMessages(
mytimeseries |>
autoplot()
)
suppressMessages(
mytimeseries |>
gg_season()
)
Annual data have no within-year seasons, so a seasonal plot cannot be drawn for this series.
suppressMessages(
mytimeseries |>
gg_subseries()
)
suppressMessages(
mytimeseries |>
gg_lag()
)
suppressMessages(
mytimeseries |>
ACF(Hare) |>
autoplot()
)
There is no seasonality with this series being annual. There are boom and bust cycles apparent. There does not seem to be a trend either.
The harvesting of hare pelts is almost certainly tied to a population cycle, which can be seen in the correlogram. Hares seem to swing between their lowest and highest numbers roughly every 5 years.
There is no seasonality with this series being annual.
1863 seems to be a particularly standout year with a large harvest of hare pelts.
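The roughly five-year swing noted above can be checked numerically. The sketch below is a rough heuristic, not a formal cycle estimate. It assumes fpp3 is installed, uses base R's `acf()` rather than `ACF()` so the values are a plain numeric vector, and introduces the names `hare_acf`, `half_cycle` and `full_cycle` only for illustration:

```r
library(fpp3)

# Sample autocorrelations at lags 1..20 (lag 0, which is always 1, is dropped)
hare_acf <- acf(pelt$Hare, lag.max = 20, plot = FALSE)$acf[-1]

# Lag with the most negative autocorrelation: roughly half a cycle
half_cycle <- which.min(hare_acf)

# Lag of the highest autocorrelation beyond that trough: roughly one full cycle
full_cycle <- half_cycle + which.max(tail(hare_acf, -half_cycle))

c(half_cycle = half_cycle, full_cycle = full_cycle)
```

If the cycle reading is right, `half_cycle` should land near 5 and `full_cycle` near twice that, though the exact lags depend on the sample.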
PBS |>
select(Cost) |>
filter(ATC2 == "H02") |>
group_by(ATC2) |>
summarise(Cost = sum(Cost)) -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 3 [1M]
## # Key: ATC2 [1]
## ATC2 Month Cost
## <chr> <mth> <dbl>
## 1 H02 1991 Jul 429795
## 2 H02 1991 Aug 400906
## 3 H02 1991 Sep 432159
## 4 H02 1991 Oct 492543
## 5 H02 1991 Nov 502369
## 6 H02 1991 Dec 602652
suppressMessages(
mytimeseries |>
autoplot()
)
suppressMessages(
mytimeseries |>
gg_season()
)
suppressMessages(
mytimeseries |>
gg_subseries()
)
suppressMessages(
mytimeseries |>
gg_lag()
)
suppressMessages(
mytimeseries |>
ACF(Cost) |>
autoplot()
)
There is seasonality and upward trend apparent. However, cyclicity does not seem to be obvious.
The costs for H02 class prescriptions seem to drop sharply in February and then build back up to a peak in January.
Around 2007 there seems to be a separation from the norm with the last quarter of the year declining in cost rather than increasing.
us_gasoline -> mytimeseries
head(mytimeseries)
## # A tsibble: 6 x 2 [1W]
## Week Barrels
## <week> <dbl>
## 1 1991 W06 6.62
## 2 1991 W07 6.43
## 3 1991 W08 6.58
## 4 1991 W09 7.22
## 5 1991 W10 6.88
## 6 1991 W11 6.95
suppressMessages(
mytimeseries |>
autoplot()
)
suppressMessages(
mytimeseries |>
gg_season()
)
suppressMessages(
mytimeseries |>
gg_subseries()
)
suppressMessages(
mytimeseries |>
gg_lag()
)
suppressMessages(
mytimeseries |>
ACF(Barrels) |>
autoplot()
)
There is a general upward trend but no obvious seasonality in the time plot. There does seem to be cyclicity.
Barrels measures the volume of gasoline supplied, not its price. Like stock prices, however, consecutive observations are highly correlated with each other and the short-term movements are hard to predict.
It is surprising that supply does not seem to fluctuate more visibly during periods such as summer, when people would be using their cars to travel more.
The unusual period is around 2007, an inflection point where the upward trend temporarily reverses.