#chunks
knitr::opts_chunk$set(eval=TRUE, message=FALSE, warning=FALSE, fig.height=5, fig.align='center')
#libraries
library(tidyverse)
library(fpp3)
library(readxl)
#random seed
set.seed(42)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
We loaded the relevant datasets in R to examine the time series data, and used help() or the ? operator to obtain details about each series.
#Check the first 6 rows of the data
head(aus_production)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
head(pelt)
## # A tsibble: 6 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
aus_production: a quarterly tsibble with 218 observations of seven variables (the Quarter index plus six production measures). It contains quarterly estimates of selected indicators of manufacturing production in Australia.
pelt: an annual tsibble with 91 observations of three variables. Trading data for Canadian lynx furs and snowshoe hares from 1845 to 1935 kept by the Hudson Bay Company.
gafa_stock: a tsibble containing data on irregular trading days with 5032 observations of 8 variables. Historical Google, Amazon, Facebook, and Apple stock values from 2014 to 2018 (pricing is in USD).
vic_elec: a half-hourly tsibble with 52608 observations of five variables. This data relates to operational demand (the demand met by generation imports into the area as well as by locally scheduled, semi-scheduled, and non-scheduled intermittent generating units with an aggregate capacity of more than 30 MWh).
#Learn details about the data
help(aus_production)
help(pelt)
?gafa_stock
?vic_elec
Bricks, aus_production (clay brick production in millions of bricks): every quarter (every 3 months).
Lynx, pelt (the number of Canadian Lynx pelts traded): annually.
Close, gafa_stock (the closing price for the stock): daily, on trading days only (irregular interval).
Demand, vic_elec (total electricity demand in MWh): half-hourly.
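The observation intervals listed above can also be read directly from each tsibble. The following is a minimal sketch, assuming the fpp3 package is installed; is_regular() and interval() come from the tsibble package, and the list name series is only illustrative.

```r
library(fpp3)

# Collect the four datasets in a named list (the list itself is illustrative)
series <- list(
  aus_production = aus_production,
  pelt = pelt,
  gafa_stock = gafa_stock,
  vic_elec = vic_elec
)

# TRUE when the observations are equally spaced in time
regular <- sapply(series, is_regular)
print(regular)

# The interval object is what appears in square brackets in each tsibble header
intervals <- lapply(series, tsibble::interval)
print(intervals)
```

gafa_stock is the only irregular series here, because stock prices are recorded on trading days only.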
The quarterly brick production plot shows clay brick output since 1956. It shows a within-year seasonal pattern together with longer cyclical swings driven by economic factors that influence construction demand, such as booms and recessions.
The annual lynx plot shows the number of lynx pelts traded each year from 1845 to 1935. The plot reveals recurring ecological cycles.
The daily closing price plot depicts closing prices over time for each of the four stocks. Sharp spikes or drops could be linked to market events, product launches, or financial reports.
The half-hourly electricity demand plot covers Victoria from 2012 to 2014. At this scale the annual pattern dominates; the daily peaks, which typically fall in the morning and late afternoon, become visible only when the plot is restricted to a few days.
#Time plot for Bricks, aus_production, quarterly
autoplot(aus_production, Bricks) +
ggtitle("Brick Production Over Time") #+
# xlab("Quarter") +
# ylab("Bricks Produced (Millions)")
#Time plot for Lynx, pelt, annually
autoplot(pelt, Lynx) +
ggtitle("Lynx Pelts Over Time") #+
# xlab("Year") +
# ylab("Number of Lynx Pelts")
#Time plot for Close, gafa_stock, daily
autoplot(gafa_stock, Close) +
ggtitle("GAFA stock prices daily") #+
# xlab("Date") +
# ylab("Closing Price")
#Time plot for Demand, vic_elec, half-hourly
autoplot(vic_elec, Demand)
Each plot accurately represents the time intervals, and for the final plot, we customized the axis labels and title.
#Time plot for Demand, vic_elec, half-hourly
autoplot(vic_elec, Demand) +
ggtitle("Half-hourly electricity demand for Victoria") +
xlab("Time") +
ylab("Electricity Demand (MWh)")
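To look at the within-day pattern of electricity demand, the series can be restricted to a short window before plotting. This is a minimal sketch; the week chosen (2 to 8 July 2012) and the object name one_week are arbitrary assumptions made for illustration.

```r
library(fpp3)

# Keep one arbitrary week of half-hourly observations (7 days x 48 readings)
one_week <- vic_elec %>%
  filter(Date >= as.Date("2012-07-02"), Date <= as.Date("2012-07-08"))

# Time plot of the restricted series
autoplot(one_week, Demand) +
  ggtitle("Half-hourly electricity demand for Victoria, one week") +
  xlab("Time") +
  ylab("Electricity Demand (MWh)")
```

Restricting the window keeps the half-hourly resolution but removes the annual pattern that dominates the full three-year plot.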
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
The dates when each of the four stocks in the GAFA dataset (Google, Apple, Facebook, and Amazon) reached their peak closing prices:
Google (GOOG): July 26, 2018, peak closing price is $1268.33.
Amazon (AMZN): September 4, 2018, peak closing price is $2039.51.
Facebook (FB): July 25, 2018, peak closing price is $217.50.
Apple (AAPL): October 3, 2018, peak closing price is $232.07.
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
ungroup()
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
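The same peak dates can be obtained with dplyr's slice_max(), which keeps the row with the largest Close within each group. This is an alternative sketch rather than the approach the exercise asks for; converting to a tibble first is a choice made here so that only the three columns of interest are printed.

```r
library(fpp3)

# One row per stock: the trading day with the highest closing price
peaks <- gafa_stock %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  ungroup() %>%
  select(Symbol, Date, Close)

print(peaks)
```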
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
There are four columns in the file tute1.csv. It covers the years 1981 to 2005 and provides economic and company-specific financial metrics every quarter:
Quarter - the quarterly period for each entry, beginning in March 1981.
Sales - a small company’s quarterly sales adjusted for inflation.
AdBudget - the advertising budget for the same quarters, adjusted for inflation.
GDP - the gross domestic product for each quarter, adjusted for inflation.
#Read data from Github
tute1 <- read.csv("https://raw.githubusercontent.com/ex-pr/DATA624/main/timeseries_graphics/tute1.csv")
View(tute1)
#Convert the data to time series
mytimeseries <- tute1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter)
#Construct time series plots of each of the three series
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
facet_grid() helps to generate a multi-panel plot. In our case, the AdBudget, GDP, and Sales series were plotted in different panels. This division facilitates pattern comparison within each series independently of the scales of the other series.
Once we remove facet_grid(), all series are plotted in the same graph region, overlaying them on a single coordinate system. This arrangement may help to compare the series directly on the same axes. However, if the series have different scales or units, it can also make the plot more cluttered and more difficult to read.
#Remove facet_grid() from the previous plot
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
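The effect of pivot_longer() and facet_grid() can be seen on a small example. The sketch below uses made-up numbers, not values from tute1.csv; the object names toy, long, p_facet and p_single are hypothetical.

```r
library(fpp3)

# Made-up quarterly data with two series on very different scales
toy <- tibble(
  Quarter = yearquarter(c("1981 Q1", "1981 Q2", "1981 Q3", "1981 Q4")),
  Sales = c(1000, 900, 800, 1100),
  GDP = c(250, 260, 270, 280)
) %>%
  as_tsibble(index = Quarter)

# Long format: one row per quarter and series, with columns name and value
long <- toy %>% pivot_longer(-Quarter)
print(long)

# One panel per series, each with its own y-axis
p_facet <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

# All series on one shared y-axis
p_single <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line()
```

With scales = "free_y" each panel gets its own vertical range, so a series with small values is not flattened by a series with large ones.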
The USgas package contains data on the demand for natural gas in the US.
The dataset offers data on the annual natural gas consumption in the United States from 1949 to 2019 at both the aggregate and state levels. There are three variables:
year - an integer representing the observation year;
state - a character, the US state indicator;
y - an integer representing the total annual natural gas consumption for the given state, or for the US in aggregate (in million cubic feet).
#Install and use USgas library
#install.packages('USgas')
library(USgas)
The as_tsibble() function creates a time series tibble (tsibble), which helps to manage and evaluate time series data in an organized manner.
index is the column in the dataset that represents the time index.
key specifies the column or columns that identify each individual series, enabling the handling of multiple time series at once.
#Transform to tsibble
usgas_series <- as_tsibble(us_total, index = year, key = state)
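The roles of index and key can be checked on a small example. This is a minimal sketch with made-up numbers; the data frame toy and its values are hypothetical, while index_var(), key_vars() and n_keys() are tsibble helpers.

```r
library(fpp3)

# Made-up data: two states observed over three years
toy <- tibble(
  year = rep(2017:2019, times = 2),
  state = rep(c("Maine", "Vermont"), each = 3),
  y = c(50, 52, 55, 12, 13, 14)
)

# year identifies the time point, state identifies the series
toy_ts <- as_tsibble(toy, index = year, key = state)

index_var(toy_ts)  # name of the index column
key_vars(toy_ts)   # name(s) of the key column(s)
n_keys(toy_ts)     # number of distinct series
```

Each combination of key and index value must be unique; as_tsibble() raises an error if a state has two rows for the same year.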
The graph shows the trends in natural gas consumption in the New England region by state up to 2019.
The two states with the highest natural gas consumption are Connecticut and Massachusetts; over time, Massachusetts has seen a notable increase in consumption, which may be related to increased industrial activity or population growth.
Maine and New Hampshire show lower and more stable levels of consumption, which may indicate weaker demand or greater use of alternative energy sources.
Vermont and Rhode Island use less natural gas than the other states, which may be due to their smaller populations or higher reliance on renewable energy sources.
#Filter New England states
new_england <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
new_england_data <- usgas_series %>%
filter(state %in% new_england)
#Plot filtered data for the New England area
ggplot(new_england_data, aes(x = year, y = y, color = state)) +
geom_line() +
labs(title = "Natural Gas Consumption by State in New England area",
x = "Year",
y = "Natural Gas Consumption (million cubic feet)") +
facet_grid(state ~., scales = "free_y")
Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
The tourism tsibble from the tsibble package contains the quarterly overnight trips from 1998 Q1 to 2017 Q4 across Australia.
There are five variables and 24,320 rows:
Quarter: Quarter of the year (index);
Region: The tourism regions are formed through the aggregation of Statistical Local Areas (SLAs) which are defined by the various State and Territory tourism authorities according to their research and marketing needs;
State: Australian states and territories;
Purpose: The purpose of the stopover visit (Holiday, Visiting friends and relatives, Business, Other reason);
Trips: Thousands of overnight trips.
#Load xlsx file
tourism_data <- readxl::read_excel("C:/Users/daria/Downloads/tourism.xlsx")
#Create tsibble
tourism_tsibble <- tourism_data %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
#Check the data
head(tourism_tsibble)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
#Compare tsibbles
all.equal(tourism_tsibble, tourism)
## [1] TRUE
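The code below groups by Region and Purpose, but because summarise() on a tsibble keeps the Quarter index, the mean is taken within each quarter rather than across the whole period. The result, Melbourne and Visiting in 2017 Q4 (about 985 thousand trips), is therefore the largest single-quarter value, not the highest long-run average. Converting the data to a tibble with as_tibble() before grouping would give the overall average for each combination of Region and Purpose.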
#The average number of overnight trips for each combination of Region and Purpose
tourism_tsibble %>%
group_by(Region, Purpose) %>%
summarise(AverageTrips = mean(Trips), .groups = "drop") %>%
filter(AverageTrips == max(AverageTrips)) #Region and Purpose with the maximum number of overnight trips on average
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter AverageTrips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
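The Quarter column in the output above shows that the index was kept during the summary. A minimal sketch of averaging across all quarters instead is given below; it uses the built-in tourism tsibble (shown earlier to be identical to tourism_tsibble), and the name avg_trips is hypothetical.

```r
library(fpp3)

# Drop the tsibble structure so that Quarter is no longer an implicit group
avg_trips <- tourism %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(AverageTrips = mean(Trips), .groups = "drop")

# Combination of Region and Purpose with the highest average over all quarters
avg_trips %>%
  filter(AverageTrips == max(AverageTrips))
```

After as_tibble(), the result has one row per combination of Region and Purpose, so the filter compares long-run averages rather than individual quarters.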
This will generate a new tsibble in which each row gives the total trips for one state in one quarter, combining all purposes and regions.
tourism_tsibble_by_state <- tourism_tsibble %>%
group_by(State) %>%
summarise(TotalTrips = sum(Trips), .groups = "drop")
head(tourism_tsibble_by_state)
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter TotalTrips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
#“Total Private” Employed from us_employment
total_private <- us_employment %>%
filter(Title == "Total Private")
autoplot(total_private, Employed) +
ggtitle("“Total Private” Employed in the US: Trend and Seasonality")
gg_season(total_private, Employed) +
ggtitle("Seasonality in Private Employment in the US")
gg_subseries(total_private, Employed) +
ggtitle("Subseries Plot for Private Employment in the US")
gg_lag(total_private, Employed) +
ggtitle("Lag Plot for Private Employment in the US")
ACF(total_private, Employed) %>%
autoplot() +
ggtitle("ACF for Private Employment in the US")
ACF(total_private, Employed)
## # A tsibble: 29 x 3 [1M]
## # Key: Series_ID [1]
## Series_ID lag acf
## <chr> <cf_lag> <dbl>
## 1 CEU0500000001 1M 0.997
## 2 CEU0500000001 2M 0.993
## 3 CEU0500000001 3M 0.990
## 4 CEU0500000001 4M 0.986
## 5 CEU0500000001 5M 0.983
## 6 CEU0500000001 6M 0.980
## 7 CEU0500000001 7M 0.977
## 8 CEU0500000001 8M 0.974
## 9 CEU0500000001 9M 0.971
## 10 CEU0500000001 10M 0.968
## # ℹ 19 more rows
#Bricks from aus_production
autoplot(aus_production, Bricks) +
ggtitle("Bricks Production in Australia: Trend and Seasonality")
gg_season(aus_production, Bricks) +
ggtitle("Seasonality in Bricks Production")
gg_subseries(aus_production, Bricks) +
ggtitle("Subseries Plot for Bricks Production")
gg_lag(aus_production, Bricks) +
ggtitle("Lag Plot for Bricks Production")
ACF(aus_production, Bricks) %>%
autoplot() +
ggtitle("ACF for Bricks Production")
ACF(aus_production, Bricks)
## # A tsibble: 22 x 2 [1Q]
## lag acf
## <cf_lag> <dbl>
## 1 1Q 0.900
## 2 2Q 0.815
## 3 3Q 0.813
## 4 4Q 0.828
## 5 5Q 0.720
## 6 6Q 0.642
## 7 7Q 0.655
## 8 8Q 0.692
## 9 9Q 0.609
## 10 10Q 0.556
## # ℹ 12 more rows
#Hare from pelt
autoplot(pelt, Hare) +
ggtitle("Hare Pelts: Trend and Seasonality")
#gg_season(pelt, Hare) +
#ggtitle("Seasonality in Hare Pelts")
gg_subseries(pelt, Hare) +
ggtitle("Subseries Plot for Hare Pelts")
gg_lag(pelt, Hare) +
ggtitle("Lag Plot for Hare Pelts")
ACF(pelt, Hare) %>%
autoplot() +
ggtitle("ACF for Hare Pelts")
ACF(pelt, Hare)
## # A tsibble: 19 x 2 [1Y]
## lag acf
## <cf_lag> <dbl>
## 1 1Y 0.658
## 2 2Y 0.214
## 3 3Y -0.155
## 4 4Y -0.401
## 5 5Y -0.493
## 6 6Y -0.401
## 7 7Y -0.168
## 8 8Y 0.113
## 9 9Y 0.307
## 10 10Y 0.340
## 11 11Y 0.296
## 12 12Y 0.206
## 13 13Y 0.0372
## 14 14Y -0.153
## 15 15Y -0.285
## 16 16Y -0.295
## 17 17Y -0.202
## 18 18Y -0.0676
## 19 19Y 0.0956
#Filter H02 cost
h02_cost <- PBS %>% filter(ATC2 == "H02")
#H02 Cost from PBS
autoplot(h02_cost, Cost) +
ggtitle("H02 Costs: Trend and Seasonality")
gg_season(h02_cost, Cost) +
ggtitle("Seasonality in H02 Costs")
gg_subseries(h02_cost, Cost) +
ggtitle("Subseries Plot for H02 Costs")
#gg_lag(h02_cost, Cost) +
#ggtitle("Lag Plot for H02 Costs")
ACF(h02_cost, Cost) %>%
autoplot() +
ggtitle("ACF for H02 Costs")
ACF(h02_cost, Cost)
## # A tsibble: 92 x 6 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Concession Type ATC1 ATC2 lag acf
## <chr> <chr> <chr> <chr> <cf_lag> <dbl>
## 1 Concessional Co-payments H H02 1M 0.834
## 2 Concessional Co-payments H H02 2M 0.679
## 3 Concessional Co-payments H H02 3M 0.514
## 4 Concessional Co-payments H H02 4M 0.352
## 5 Concessional Co-payments H H02 5M 0.264
## 6 Concessional Co-payments H H02 6M 0.219
## 7 Concessional Co-payments H H02 7M 0.253
## 8 Concessional Co-payments H H02 8M 0.337
## 9 Concessional Co-payments H H02 9M 0.464
## 10 Concessional Co-payments H H02 10M 0.574
## # ℹ 82 more rows
#Barrels from us_gasoline
autoplot(us_gasoline, Barrels) +
ggtitle("Gasoline Barrels in the US: Trend and Seasonality")
gg_season(us_gasoline, Barrels) +
ggtitle("Seasonality in Gasoline Barrels")
gg_subseries(us_gasoline, Barrels) +
ggtitle("Subseries Plot for Gasoline Barrels")
gg_lag(us_gasoline, Barrels) +
ggtitle("Lag Plot for Gasoline Barrels")
ACF(us_gasoline, Barrels) %>%
autoplot() +
ggtitle("ACF for Gasoline Barrels")
ACF(us_gasoline, Barrels)
## # A tsibble: 31 x 2 [1W]
## lag acf
## <cf_lag> <dbl>
## 1 1W 0.893
## 2 2W 0.882
## 3 3W 0.873
## 4 4W 0.866
## 5 5W 0.847
## 6 6W 0.844
## 7 7W 0.832
## 8 8W 0.831
## 9 9W 0.822
## 10 10W 0.808
## # ℹ 21 more rows
For private employment, there is a discernible seasonal pattern, with higher employment in the summer months, especially June and July, and slight decreases toward the end of the year.
Cyclical behavior is observable over extended periods of time, with discernible declines during economic downturns or recessions. There has been a noticeable upward trend over the years, which is a result of long-term growth in employment in the private sector.
The declines during the early 1980s recession and the 2008 financial crisis stand out as the years in which employment did not follow the normal trend.
For brick production, there are discernible cyclical peaks and troughs, probably as a result of demand from the construction sector.
In general, the trend indicates growth until the 1980s, followed by a period of instability and decline. Production then levels off and stabilizes.
The notable decline between 1980 and 1982 might point to a business-related or economic event that lowered output.
For hare pelts, strong cyclical patterns with abrupt increases and decreases over several-year timeframes reflect the hare population dynamics; the ACF peaks again at a lag of about 10 years.
There isn’t a clear long-term trend, although the cycles seem to become less pronounced with time.
Unusual population fluctuations are indicated by the sharp peaks and valleys around the 1860s and 1880s.
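The approximate length of the hare cycle can be read off the autocorrelation function as the lag at which the ACF returns to a positive peak. The sketch below uses base R's acf() instead of feasts::ACF() so that the lags are plain integers; the search range of 6 to 15 years is an assumption chosen to skip the short-lag correlations.

```r
library(fpp3)

# Autocorrelations of the annual Hare series up to lag 19
hare_acf <- acf(pelt$Hare, lag.max = 19, plot = FALSE)

# Drop lag 0, so position i holds the autocorrelation at lag i
acf_values <- as.numeric(hare_acf$acf)[-1]
lags <- seq_along(acf_values)

# Find the largest autocorrelation inside the assumed search range
in_range <- lags >= 6 & lags <= 15
peak_lag <- lags[in_range][which.max(acf_values[in_range])]
print(peak_lag)
```

For the Hare series this gives a lag of 10 years, matching the ACF table shown earlier, where the value at 10Y (0.340) is the highest after the trough at 5Y.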
For H02 costs, there are some minor cycles, but seasonal behavior rather than longer-term cycles is what primarily drives the data.
Costs exhibit a steady upward trend, especially between 1995 and 2005.
There are no dramatic outliers, but the seasonal peaks become more noticeable over time, particularly after 2000.
For gasoline, the cyclical component is faint. Over time, there have been variations in gasoline demand that may be related to economic factors; in particular, demand has increased during times of economic expansion.
There has been a discernible upward trend in gasoline barrels over time, which suggests increasing demand. The trend flattens somewhat in later years, which could be an indication of market saturation or changes in energy consumption.
There is a noticeable drop in gasoline barrels in 2008-2009, which is most likely due to the global financial crisis, which reduced travel and gasoline demand.
Most of the series have a long-term trend (either upward or downward), and the sub-annual series also show regular seasonal patterns; the annual hare series instead shows multi-year cycles. We also observed volatility and unusual years that reflect major events such as economic downturns and crises.