library(tsibble)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(fpp3)
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ ggplot2 4.0.0.9000
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## Warning: package 'fable' was built under R version 4.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ lubridate::interval() masks tsibble::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(USgas)
library(dplyr)
library(ggplot2)
library(readxl)
“Quarterly production of selected commodities in Australia”
#aus_production help
?aus_production
The dataset is titled as “Quarterly production of selected commodities in Australia”, and is described as containing quarterly estimated of selected indicators of manufacturing production in Australia. It is of the tsibble time series class, and is a quarterly tsibble consisting of six values: Beer, Tobacco, Bricks, Cement, Electricity, and Gas. The source of the data is the Australian Bureau of Statistics.
Beer - Beer production in megalitres Tobacco - Tobacco and cigarette production in tonnes. Bricks - Clay brick production in millions of bricks. Cement - Portland cement production in thousands of tonnes. Electricity - Electricity production in gigawatt hours. Gas - Gas production in petajoules.
#interval of aus_production dataset
interval(aus_production)
## Warning: tz(): Don't know how to compute timezone for object of class
## tbl_ts/tbl_df/tbl/data.frame; returning "UTC".
## <Interval[0]>
The aus_production dataset contains 1 tsibble object with 6 different time series variables. It is a single dataset tracking 6 different production metrics over the same time period. The data represents a quarterly time interval, reporting 4 observations per year.
#Time series plot of Bricks from aus_production
aus_production |>
autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
“Pelt trading records”
#pelt help
?pelt
This pelt dataset is titled “Pelt trading records and is described as the Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. The data houses company trade records. It is of the tsibble time series class, and is an annual tsibble consisting of two values: Hare, and Lynx. The source of the data is the Hudson Bay Company.
Hare - The number of Snowshoe Hare pelts traded. Lynx - The number of Canadian Lynx pelts traded.
#interval of pelt dataset
interval(pelt)
## Warning: tz(): Don't know how to compute timezone for object of class
## tbl_ts/tbl_df/tbl/data.frame; returning "UTC".
## <Interval[0]>
The pelt dataset contains 1 tsibble object with two time series variables. It is a single dataset tracking 2 different commodities traded over the same time period of 1 year. There is one observations for each year ranging from 1845 to 1935
#time series plot of pelt Lynx
pelt |>
autoplot(Lynx)
“GAFA stock prices”
#gafa_stock help
?gafa_stock
The gafa_stock dataset is titled “GAFA stock prices” and is described as containing historical stock prices from 2014 to 2018 for Google, Amazon, Facebook, and Apple in $USD. The dataset is a tsibble with irrigular trading days. The variables are Open, High, Low, Close, Adj_Close, and Volume. The source of the data is Yahoo Finance historical data.
Open - The opening price for the stock. High - The stock’s highest trading price. Low - The stock’s lowest trading price. Close - The closing price for the stock. Adj_Close - The adjusted closing price for the stock. Volume - The amount of stock traded.
#interval of gafa_stock
interval(gafa_stock)
## Warning: tz(): Don't know how to compute timezone for object of class
## tbl_ts/tbl_df/tbl/data.frame; returning "UTC".
## <Interval[0]>
The gafa_stock dataset contains 1 tsibble and 6 variables representing attributes of a stock. The interval of the dataset is considered to be irregular.
#gafa_stock time series plot of Close
gafa_stock |>
autoplot(Close)
“Half-hourly electricity demand for Victoria, Australia”
#vic_elec help
?vic_elec
The dataset vic_elec is described as a half-hourly tsibble containing three variables: Demand, Temperature, and Holiday. The data is for operation demand of electricity generating units. The source of the data is the Australian Energy Market Operator.
#interval of vic_elec dataset
interval(vic_elec)
## Warning: tz(): Don't know how to compute timezone for object of class
## tbl_ts/tbl_df/tbl/data.frame; returning "UTC".
## <Interval[0]>
The vic_elec dataset contains 1 tsibble and 3 variables: Demand, Temperature, and Holiday. The interval is half-hourly.
Demand - Total electricity demand in MWh. Temperature - Temperature of Melbourne (BOM site 086071). Holiday - Indicator for if that day is a public holiday.
#time series plot of Demand
vic_elec |>
autoplot(Demand)
After filtering the data we can see that AAPL, had a max closing data of 2018-10-03, AMZN 2018-09-04, FB 2018-07-25, and GOOG 2018-07-26. An interesting observation is that FB and GOOG had max closing prices within a day of each other.
#filter max Close
gafa_stock |>
group_by(Symbol) |>
filter(Close == max(Close)) |>
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
#importing data
tute1 <- readr::read_csv("https://bit.ly/fpptute1")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
#creating custom tsibble
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
Below are two plots of the quarterly time series for labelled Sales, AdBudget, and GDP. What you will notice is that the first plot is missing the facet_grid, which is properly placed in the second plot. By excluding the facet_grid we observe that the individual time series do now become separated into their separate windows, and instead are plotted together along the same scale. When facet-grid is included, each series has its own plot, and each plot has the appropriate scale for the data represented.
#time series data without facet_grid
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
#time series data with facet_grid
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
head(us_total)
## year state y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
us_total_tsibble <- us_total |>
as_tsibble(index = year, key = state)
us_total_tsibble
## # A tsibble: 1,266 x 3 [1Y]
## # Key: state [53]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## 7 2003 Alabama 350345
## 8 2004 Alabama 382367
## 9 2005 Alabama 353156
## 10 2006 Alabama 391093
## # ℹ 1,256 more rows
new_england <- c("Maine", "Vermont", "New Hampshire",
"Massachusetts", "Connecticut", "Rhode Island")
us_total_tsibble |>
filter(state %in% new_england) |>
autoplot(y) +
labs(
title = "Annual Natural Gas Consumption in New England",
y = "Natural Gas Consumption",
x = "Year"
)
download.file("https://bit.ly/fpptourism",
destfile = "tourism.xlsx",
mode = "wb")
tourism2 <- read_excel("tourism.xlsx")
head(tourism2)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
#Tourism Tsibble
tourism_tsibble <- tourism2 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(
index = Quarter,
key = c(Region, State, Purpose)
)
tourism_tsibble
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
tourism_tsibble |>
group_by(Region, Purpose) |>
summarise(avg_trips = mean(Trips, na.rm = TRUE)) |>
ungroup() |>
filter(avg_trips == max(avg_trips))
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter avg_trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
#Tourism State
tourism_state <- tourism_tsibble |>
index_by(Quarter) |>
group_by(State) |>
summarise(Trips = sum(Trips)) |>
ungroup() |>
as_tsibble(index = Quarter, key = State)
tourism_state
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
The visualization below tells us that there are different scales across states. New South Wales and Victoria have the highest trip volumes, and Queensland shows strong positive growth. ACT, Northern Territory, and Tasmania have lower volume.
Most states show upward trends, especially Victoria and New South Wales. Western Australia shows some dip occurring in the middle before recovering.
Quarterly fluctuations are visible in all states. You can clearly see the peaks and troughs in the visualizations which suggest seasonal patterns in tourism, with peaks likely occurring during the summer and holidays.
#Tourism State Plot with Facet Wrapping
tourism_state |>
autoplot(Trips) +
facet_wrap(~ State, scale = "free_y") +
labs(
title = "Total Overnight Trips by State",
y = "Trips (thousands)",
x = "Quarter"
)
Total Private Employment
# Filter for Total Private employment
total_private <- us_employment |>
filter(Title == "Total Private")
# Autoplot - overall trend
total_private |> autoplot(Employed) +
labs(title = "Total Private Employment in US")
# Seasonal plot
total_private |> gg_season(Employed) +
labs(title = "Seasonal Plot: Total Private Employment")
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Subseries plot
total_private |> gg_subseries(Employed) +
labs(title = "Subseries Plot: Total Private Employment")
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Lag plot
total_private |> gg_lag(Employed, geom = "point") +
labs(title = "Lag Plot: Total Private Employment")
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# ACF plot
total_private |> ACF(Employed) |> autoplot() +
labs(title = "ACF: Total Private Employment")
The autocorrelation function plot for Total Private employment shows strong upward trend. This is obvious due to the ACF slow decay across all lags. The autocorrelations are all high (near 1.0) and well above the significant bounds. There is no clear sign of seasonality, as there are no cyclical patterns. The first few values being very close to 1 means employment in one month is correlated with the next month.
Bricks
# Autoplot
aus_production |> autoplot(Bricks) +
labs(title = "Australian Brick Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Seasonal plot
aus_production |> gg_season(Bricks) +
labs(title = "Seasonal Plot: Brick Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Subseries plot
aus_production |> gg_subseries(Bricks) +
labs(title = "Subseries Plot: Brick Production")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Lag plot
aus_production |> gg_lag(Bricks, geom = "point") +
labs(title = "Lag Plot: Brick Production")
## Warning: Removed 20 rows containing missing values (gg_lag).
# ACF plot
aus_production |> ACF(Bricks) |> autoplot() +
labs(title = "ACF: Brick Production")
The Bricks production ACF plot shows signs of strong seasonality as seen in the scalloped pattern. The peaks are apparent at lags 4, 8, 12, 16, and 20, which is every 4 quarters annually. What this tells us is that there is a strong quarterly seasonality and that brick production follows a clear seasonal cycle. All values remain positive above the blue line.
Regarding the correlation, lag 4 had a high correlation while lag 8 was more moderate and lag 12 even lower, signifying seasonal demand in construction. Brick production in one quarter is less correlated with quarters 2 seasons away.
Some unusal peaks consist of the mid 1970s to early 1980s where peak production was experienced. In contrast, the 1960s showed the lowest production period. And then the 2000s had another boost in production.
Pelt
# 1. Autoplot - shows the famous lynx-hare cycle
pelt |> autoplot(Hare) +
labs(title = "Hare Pelts Traded (Annual Data)")
# 2. Lag plot - will show the cyclical pattern
pelt |> gg_lag(Hare, geom = "point", lags = 1:9) +
labs(title = "Lag Plot: Hare Pelts")
# 3. ACF plot - will show the ~10 year cycle
pelt |> ACF(Hare) |> autoplot() +
labs(title = "ACF: Hare Pelts")
# Skip gg_season() and gg_subseries() - they don't work with annual data
In contrast to Bricks, pelt show a cyclical pattern, but not much seasonal. This makes sense as the data is annual and not seasonal. Lags 9-10 show a positive correlation, as opposed to lags 4-5 showing negative correlations and lag 1-2 nearing zero. The cycles are driven by natural ecological patterns.
# Filter for H02
h02 <- PBS |>
filter(ATC2 == "H02")
# Need to aggregate if multiple series
h02_total <- h02 |>
summarise(Cost = sum(Cost))
# Autoplot
h02_total |> autoplot(Cost) +
labs(title = "H02 Drug Cost (PBS)")
# Seasonal plot
h02_total |> gg_season(Cost) +
labs(title = "Seasonal Plot: H02 Cost")
# Subseries plot
h02_total |> gg_subseries(Cost) +
labs(title = "Subseries Plot: H02 Cost")
# Lag plot
h02_total |> gg_lag(Cost, geom = "point") +
labs(title = "Lag Plot: H02 Cost")
# ACF plot
h02_total |> ACF(Cost) |> autoplot() +
labs(title = "ACF: H02 Cost")
The H02 pharmaceutical cost data shows both a strong upward trend and a pronounced monthly seasonality. Costs approximately doubled over the 17-year period. The most distinctive seasonal feature is a dramatic peak in Januar. After that January peak, costs drop sharply in February and remain relatively stable through November before spiking again in December. The ACF plot shows a pattern of slow decay and oscillating peaks at 12 month intervals. The lag plots show much tighter linear relationships. H02 costs represent a clear example of a time series with both strong trend and stable seasonality.
# Autoplot
us_gasoline |> autoplot(Barrels) +
labs(title = "US Gasoline Production")
# Seasonal plot
us_gasoline |> gg_season(Barrels) +
labs(title = "Seasonal Plot: Gasoline Production")
# Subseries plot
us_gasoline |> gg_subseries(Barrels) +
labs(title = "Subseries Plot: Gasoline Production")
# Lag plot
us_gasoline |> gg_lag(Barrels, geom = "point") +
labs(title = "Lag Plot: Gasoline Production")
# ACF plot
us_gasoline |> ACF(Barrels) |> autoplot() +
labs(title = "ACF: Gasoline Production")
US gasoline production shows a strong upward trend with production increasing from approximately 7 to 9.5 million barrels per week. It also shows not very consistent seasonal pattern. The ACF plot reveals extremely high persistence with slow linear decay, indicating being non-stationary dominated by trend, while the seasonal plot appears chaotic with lines crossing erratically and no repeatable annual shape. The lag plots show remarkably tight linear relationships with minimal scatter, suggesting highly predictable smooth changes despite the week to week volatility visible in the seasonal plot. This combination of strong trend, high short-term volatility, and absence of seasonality reflects continuous industrial production influenced primarily by long-term economic growth and irregular supply disruptions rather than predictable calendar effects.