library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 3.5.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tsibble)
Explore the following four time series: Bricks
from
aus_production
, Lynx
from pelt
,
Close
from gafa_stock
, Demand
from vic_elec
.
?
(or help()
) to find out about the
data in each series.autoplot()
to produce a time plot of each
series.This data contains quarterly production of beer, tobacco, bricks, cement, electricty, and gas in Austrailia from 1956 to 2010, sourced from the Australian Bureau of Statistics.
?aus_production
head(aus_production)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
interval(aus_production)
## <interval[1]>
## [1] 1Q
aus_production %>%
autoplot(Bricks) +
labs(title = "Bricks Quarterly Production in Australia",
x = "Quarter",
y = "Bricks (in millions)")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
The chart shows a clear upward trend in brick production from late 1950s, reaching its peak in late 1980s. Following the peak, there is downward trend in brick production. There is evidence of seasonal pattern.
This data contains Hudson Company annual trading records for Snowshoe Hare and Canadian furs from 1845 to 1935.
?pelt
head(pelt)
## # A tsibble: 6 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
interval(pelt)
## <interval[1]>
## [1] 1Y
pelt %>%
autoplot(Lynx) +
labs(title = "Canadian Lynx Pelts Annual Trading Records",
x = "Year",
y = "Number of Lynx Pelts")
There is no long term upward or downward trend. The Lynx trading data shows a strong cyclinal pattern with peaks about every 10 years.
This data contains historical daily stock prices in $USD from 2014 to 2018 for Google, Amazon, Facebook, and Apples.
?gafa_stock
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
gafa_stock %>%
autoplot(Close) +
labs(title = "Daily Stock Closing Prices from 2014 - 2018",
x = "Time",
y = "Closing Stock Prices")
All four stocks shows an upward trend, especially in the late 2010s.There is some dips, showing the volatility of the stocks.
This data contains half-hourly electricity demand in Victoria, Australia.
?vic_elec
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
interval(vic_elec)
## <interval[1]>
## [1] 30m
vic_elec %>%
autoplot(Demand) +
labs(title = "Electricity Demand in Victoria, Australia",
x = "Time",
y = "Demand (in MWh)")
The chart is more “noisy” compared to the last 3 charts. Electricity demand in Australia shows a strong seasonal pattern with peaks mid year (summer) and between end of the year and beginning of the next year (winter).There is no clear no term trends over the three year data.
Use filter()
to find what days corresponded to the peak
closing price for each of the four stocks in
gafa_stock
.
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Download the file tute1.csv
from the book website, open
it in Excel (or some other spreadsheet application), and review its
contents. You should find four columns of information. Columns B through
D each contain a quarterly series, labelled Sales, AdBudget and GDP.
Sales contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
You can read the data into R with the following script:
tute1 <- readr::read_csv("https://raw.githubusercontent.com/suswong/DATA-624/refs/heads/main/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
Convert the data to time series
# Code provided by text
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
Construct time series plots of each of the three series.
# Code provided by text
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include
facet_grid()
When you do not include the facet_grid()
, the three time
series will be plotted on a single graph, instead of on three separate
panels. This makes it harder to interpret trends as all three time
series will be sharing the same y-axis. Different time series usually
have different scales. Using the facet_grid()
function
allows a fair comparison between the time series as it allows the series
to keep its scale.
# modifying the code provided by text
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
The USgas
package contains data on the demand for
natural gas in the US.
Install the USgas
package.
#install.packages("USgas")
library(USgas)
Create a tsibble from us_total
with year as the index
and state as the key.
This dataset contains information on US annual total natual gas consumption by state between 1997 and 2019.
#?us_total
#head(us_total)
gas_total <- us_total %>%
as_tsibble(index = year, key = state)
head(gas_total)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
gas_total %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
autoplot() +
labs(title = "Annual Total Natural Gas Consumption from 1997-2019",
x = "Year",
y = "Consumption (Million Cubic Feet)")
## Plot variable not specified, automatically selected `.vars = y`
When the annual natural gas consumption by state for the New England area are plot on one graph, we can see Massachusetts shows the highest overall gas consumption among the New England states and Vermont has the lowest consumption.However, it was difficult to read trends in gas consumption for Vermont as it looks like a straight line with no trend
When we plot each state individually, we can see Vermont’s gas consumption has been very gradually and steadily increasing over time. There are no clear trends in Rhode Island consumption but its gas consumption is consistently lower than Connecticut and Massachusetts and higher than Maine, Vermont, and New Hampshire.
gas_total %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
autoplot() +
labs(title = "Annual Total Natural Gas Consumption from 1997-2019",
x = "Year",
y = "Consumption (Million Cubic Feet)") +
facet_wrap(~state, scales = "free_y", ncol = 2 ) +
theme(legend.position = "none")
## Plot variable not specified, automatically selected `.vars = y`
Download tourism.xlsx
from the book website and read it
into R using readxl::read_excel()
.
#install.packages("readxl")
#library(readxl)
# I was unable to get the xlsx raw file link from my GitHub to work so I converted it to csv file.
tourism_data <- read.csv("https://raw.githubusercontent.com/suswong/DATA-624/refs/heads/main/tourism.csv")
head(tourism_data)
## Quarter Region State Purpose Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126
Create a tsibble which is identical to the tourism
tsibble from the tsibble
package.
# Note
tourism_tsibble <- tourism_data %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism_tsibble)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
Find what combination of Region
and Purpose
had the maximum number of overnight trips on average.
Tourists visiting Sydney has the highest number of overnight trips on average.
tourism_data %>%
group_by(Region, Purpose) %>%
summarise(avg_trips = mean(Trips)) %>%
arrange(-avg_trips)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 304 × 3
## # Groups: Region [76]
## Region Purpose avg_trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
## 6 Gold Coast Holiday 528.
## 7 Melbourne Holiday 507.
## 8 South Coast Holiday 495.
## 9 Brisbane Visiting 493.
## 10 Melbourne Business 478.
## # ℹ 294 more rows
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
state_tourism <- tourism_data %>%
group_by(Quarter, State) %>%
summarise(total_trips = sum(Trips)) %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c(State))
## `summarise()` has grouped output by 'Quarter'. You can override using the
## `.groups` argument.
head(state_tourism)
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## # Groups: @ Quarter [6]
## Quarter State total_trips
## <qtr> <chr> <dbl>
## 1 1998 Q1 ACT 551.
## 2 1998 Q2 ACT 416.
## 3 1998 Q3 ACT 436.
## 4 1998 Q4 ACT 450.
## 5 1999 Q1 ACT 379.
## 6 1999 Q2 ACT 558.
Use the following graphics functions: autoplot()
,
gg_season()
, gg_subseries()
,
gg_lag()
, ACF()
and explore features from the
following time series: Total Private Employed
from
us_employment
, Bricks
from
aus_production
, Hare
from pelt
,
“H02” Cost
from PBS
, and Barrels
from us_gasoline
.
There is a strong upward trend in employment over time. This is not
surprising as population increase over time, employment will increase
too. The gg_season()
and gg_subseries()
plots
shows seasonal pattern of employment increasing from January and peak
during the summer months. There is a large dip in employment in
2008-2010 (financial crisis).
?us_employment
total_employed <- us_employment %>%
filter(Title == "Total Private")
autoplot(total_employed, Employed) +
labs(title = "Autoplot of Total Private Employed",
x = "Month",
y = "Total Employed")
gg_season(total_employed, Employed) +
labs(title = "Seasonal Plot of Total Private Employed",
x = "Month",
y = "Total Employed")
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
gg_subseries(total_employed, Employed) +
labs(title = "Seasonal Subseries Plot of Total Private Employed")
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
gg_lag(total_employed, Employed) +
labs(title = "Lag Plot of Total Private Employed",
y = "Total Employed")
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
total_employed %>%
ACF(Employed) %>%
autoplot()+
labs(title = "Autocorrection Plot of Total Private Employed")
Over the years there is a clear overall upward trend in brick production from late 1950s, reaching its peak in late 1980s. Following the peak, there is overall downward trend in brick production.
At the same time, there is also evidence of seasonal pattern. The
gg_season()
and gg_subseries()
plots shows
brick production increases from quarter 1 to quarter 3 and decreases
after. There are notable dips in mid 1970s and mid 1980s.
aus_production %>%
autoplot(Bricks) +
labs(title = "Bricks Quarterly Production in Australia",
x = "Quarter",
y = "Bricks (in millions)")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries(aus_production, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks) %>%
autoplot()+
labs(title = "Autocorrection Plot of Total Private Employed")
There is no long term upward or downward trend. The hare trading data shows a strong cyclinal pattern with peaks about every 10 years. Some notable unusual years are high peaks aorund 1865 and 1885.
pelt %>%
autoplot(Hare) +
labs(title = "Canadian Hare Pelts Annual Trading Records",
x = "Year",
y = "Number of Lynx Pelts")
# This dataset contains yearly data so we cannot use this function here
#gg_season(pelt, Hare)
# This dataset contains yearly data so we cannot use this function here
gg_subseries(pelt, Hare)
gg_lag(pelt, Hare)
pelt %>%
ACF(Hare) %>%
autoplot()
There is an overall upward trend of the cost of the drug over the
years. Seasonal pattern is evident in the gg_season()
plot
where the cost of the drug peak at the end of the year to the begginnig
of the following year. There is sharp drop cost in February and the cost
gradually increases to the end of the year. There is an unusual number
of sales (dip) in March 2008 where in most cases there is an increase
number of sales from February to March.
cost <- PBS %>%
filter(ATC2 == "H02") %>%
select(Month, Concession, Type, Cost) %>%
summarise(TotalC = sum(Cost))
autoplot(cost) +
labs(title = "Autoplot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `.vars = TotalC`
gg_season(cost) +
labs(title = "Seasonal Plot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `y = TotalC`
gg_subseries(cost) +
labs(title = "Seasonal Subseries Plot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `y = TotalC`
gg_lag(cost) +
labs(title = "Lag Plot of Total Cost of Drug H02 (Monthly)")
## Plot variable not specified, automatically selected `y = TotalC`
cost %>%
ACF(TotalC) %>%
autoplot() +
labs(title = "Autcorrection Plot of Total Cost of Drug H02 (Monthly)")
This time series shows a overall trend upward from 1990s to around
2006, trend downward until mid 2010s, and finally an upward trends. The
gg_season()
and gg_subseries()
plots shows
seasonal pattern of increases of gasoline supplied in the summer months
and decreases as winter arrives. There are notable unusual dips from
2008 to 2010s (financial crisis).
us_gasoline %>%
autoplot(Barrels)
gg_season(us_gasoline, Barrels)
gg_subseries(us_gasoline, Barrels)
gg_lag(us_gasoline, Barrels)
us_gasoline %>%
ACF(Barrels) %>%
autoplot()