Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book. Please submit both your Rpubs link as well as attach the .pdf file with your code.
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Before working with the time series datasets, ensure you have the necessary packages installed and loaded.
# Install packages if not already installed
# install.packages("fpp3") # For time series data and visualization
# Load required libraries
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.2
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## Warning: package 'tsibble' was built under R version 4.4.2
## Warning: package 'tsibbledata' was built under R version 4.4.2
## Warning: package 'feasts' was built under R version 4.4.2
## Warning: package 'fabletools' was built under R version 4.4.2
## Warning: package 'fable' was built under R version 4.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
We will use ? or help() to find information about each dataset.
# Check details about each dataset
?aus_production # Information about "Bricks" series
## starting httpd help server ... done
?pelt # Information about "Lynx" series
?gafa_stock # Information about "Close" series
?vic_elec # Information about "Demand" series
Now, let’s visualize each time series.
# Plot Bricks Production
autoplot(aus_production, Bricks) +
ggtitle("Australian Brick Production") +
ylab("Millions of Units") +
xlab("Year")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
Analysis: * The trend shows variations in brick production over time. *
Some seasonal patterns may be visible, especially cyclic
fluctuations.
# Plot Lynx Pelt Data
autoplot(pelt, Lynx) +
ggtitle("Lynx Pelt Trading in Canada") +
ylab("Number of Pelts") +
xlab("Year")
Analysis: * The data exhibits a clear cyclic pattern, likely
corresponding to the predator-prey cycle. * There are periodic spikes in
lynx populations, followed by declines.
# Plot Closing Stock Prices
autoplot(gafa_stock, Close) +
ggtitle("Closing Stock Prices of GAFA Companies") +
ylab("Stock Price (USD)") +
xlab("Date")
Analysis: * The stock prices generally show an increasing trend over
time. * There might be periods of volatility, reflecting market
fluctuations.
# Customizing plot for electricity demand
autoplot(vic_elec, Demand) +
ggtitle("Electricity Demand in Victoria (Australia)") +
ylab("Electricity Demand (MW)") +
xlab("Year") +
theme_minimal()
Analysis: * Electricity demand fluctuates significantly, possibly
reflecting daily and seasonal consumption patterns. * Spikes may
correspond to extreme weather events (hot summers, cold winters).
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
Ensure you have the necessary packages installed and loaded.
# Install packages if not already installed
# install.packages("tidyverse")
# install.packages("fpp3") # Includes gafa_stock dataset
# Load required libraries
library(fpp3)
library(dplyr)
We will use the group_by() function to group the dataset by stock (Symbol) and then use filter() to extract the days corresponding to the peak closing price.
# Find the peak closing price for each stock
peak_prices <- gafa_stock %>%
group_by(Symbol) %>% # Group by stock symbol (Google, Apple, Facebook, Amazon)
filter(Close == max(Close)) %>% # Filter for the max closing price per stock
arrange(Symbol) # Arrange alphabetically for better readability
# Display the result
print(peak_prices)
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
#a. You can read the data into R with the following script:
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
#Convert the data to time series
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
#Construct time series plots of each of the three series
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Now, let’s remove facet_grid() and see the effect.
# Plot without faceting
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
labs(title = "Time Series of Sales, AdBudget, and GDP",
x = "Year",
y = "Value",
colour = "Variable") +
theme_minimal()
Comparison of Plots * With facet_grid(name ~ ., scales = “free_y”): *
Each variable is plotted in its own panel, making it easier to observe
individual trends. * The scales = “free_y” ensures each panel has an
appropriate y-axis scale. * Without facet_grid(): * All three series
appear on the same plot, sharing a common y-axis. * If the scales differ
greatly (e.g., GDP vs. AdBudget), some lines might appear flat and
unreadable.
The USgas package contains data on the demand for natural gas in the US.
# Install necessary packages if not installed
# install.packages("USgas")
# install.packages("tidyverse")
# install.packages("fpp3")
# Load libraries
library(USgas)
## Warning: package 'USgas' was built under R version 4.4.3
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(fpp3)
# Convert us_total to tsibble with correct column names
us_gas_tsibble <- us_total %>%
as_tsibble(index = year, key = state)
# Check unique states to confirm New England states exist
unique(us_gas_tsibble$state)
## [1] "Alabama" "Alaska"
## [3] "Arizona" "Arkansas"
## [5] "California" "Colorado"
## [7] "Connecticut" "Delaware"
## [9] "District of Columbia" "Federal Offshore -- Gulf of Mexico"
## [11] "Florida" "Georgia"
## [13] "Hawaii" "Idaho"
## [15] "Illinois" "Indiana"
## [17] "Iowa" "Kansas"
## [19] "Kentucky" "Louisiana"
## [21] "Maine" "Maryland"
## [23] "Massachusetts" "Michigan"
## [25] "Minnesota" "Mississippi"
## [27] "Missouri" "Montana"
## [29] "Nebraska" "Nevada"
## [31] "New Hampshire" "New Jersey"
## [33] "New Mexico" "New York"
## [35] "North Carolina" "North Dakota"
## [37] "Ohio" "Oklahoma"
## [39] "Oregon" "Pennsylvania"
## [41] "Rhode Island" "South Carolina"
## [43] "South Dakota" "Tennessee"
## [45] "Texas" "U.S."
## [47] "Utah" "Vermont"
## [49] "Virginia" "Washington"
## [51] "West Virginia" "Wisconsin"
## [53] "Wyoming"
# Filter for New England states
new_england_gas <- us_gas_tsibble %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire",
"Massachusetts", "Connecticut", "Rhode Island"))
# Ensure data exists
print(new_england_gas)
## # A tsibble: 138 x 3 [1Y]
## # Key: state [6]
## year state y
## <int> <chr> <int>
## 1 1997 Connecticut 144708
## 2 1998 Connecticut 131497
## 3 1999 Connecticut 152237
## 4 2000 Connecticut 159712
## 5 2001 Connecticut 146278
## 6 2002 Connecticut 177587
## 7 2003 Connecticut 154075
## 8 2004 Connecticut 162642
## 9 2005 Connecticut 168067
## 10 2006 Connecticut 172682
## # ℹ 128 more rows
# Plot with correct y-axis mapping
new_england_gas %>%
ggplot(aes(x = year, y = y, colour = state, group = state)) + # Added group = state
geom_line() +
labs(title = "Annual Natural Gas Consumption in New England States",
x = "Year",
y = "Consumption (Billion Cubic Feet)",
colour = "State") +
theme_minimal()
Analysis & Interpretation * This plot will show how natural gas
consumption has changed over time in each of the six New England states.
* Trends & Patterns: * Some states might show increasing usage over
time due to population growth and industrial demand. * Others might have
stable or declining trends due to energy efficiency improvements or a
shift to alternative energy sources.
Ensure that the required packages are installed and loaded.
# Load required libraries
library(readxl)
## Warning: package 'readxl' was built under R version 4.4.3
library(tidyverse)
library(fpp3)
Download tourism.xlsx from the book website and place it in your working directory.
# Read the Excel file
tourism_data <- read_excel("tourism.xlsx")
# View the first few rows
head(tourism_data)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
We need to structure the dataset into a tsibble similar to the tourism tsibble in the fpp3 package.
# Convert to a tsibble with Quarter as the index
tourism_tsibble <- tourism_data %>%
mutate(Quarter = yearquarter(Quarter)) %>% # Ensure Quarter is in yearquarter format
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
# Check the structure
glimpse(tourism_tsibble)
## Rows: 24,320
## Columns: 5
## Key: Region, State, Purpose [304]
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…
We calculate the average number of trips for each combination of Region and Purpose, then find the maximum.
# Find the combination with the highest average trips
max_trips <- tourism_tsibble %>%
group_by(Region, Purpose) %>% # Group by Region and Purpose
summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>% # Compute average trips
arrange(desc(avg_trips)) %>% # Sort in descending order
slice(1) # Select the top combination
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
# Display result
print(max_trips)
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter avg_trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
Now, we aggregate the total trips by State while combining all Regions and Purposes.
# Aggregate trips by State without grouping by Quarter
state_trips_tsibble <- tourism_tsibble %>%
index_by(Quarter) %>% # Ensure Quarter is treated correctly
summarise(Total_Trips = sum(Trips, na.rm = TRUE), .groups = "drop") %>%
as_tsibble(index = Quarter)
# View the new tsibble
glimpse(state_trips_tsibble)
## Rows: 80
## Columns: 2
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999…
## $ Total_Trips <dbl> 23182.20, 20323.38, 19826.64, 20830.13, 22087.35, 21458.37…
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Ensure the necessary libraries are installed and loaded.
# Install packages if not already installed
# install.packages("fpp3")
# install.packages("ggplot2")
# Load required libraries
library(fpp3)
library(ggplot2)
2.1 “Total Private” Employed from us_employment
# Load the dataset
data("us_employment")
# Filter for "Total Private" employment
total_private <- us_employment %>%
filter(Title == "Total Private")
# Plot using autoplot()
autoplot(total_private, Employed) +
ggtitle("Total Private Employment in the US") +
ylab("Number of Employees") +
xlab("Year")
# Seasonal plot
gg_season(total_private, Employed) +
ggtitle("Seasonal Plot of Total Private Employment")
# Subseries plot
gg_subseries(total_private, Employed) +
ggtitle("Subseries Plot of Total Private Employment")
# Lag plot
gg_lag(total_private, Employed, geom = "point") +
ggtitle("Lag Plot of Total Private Employment")
# ACF plot
total_private %>%
ACF(Employed) %>%
autoplot() +
ggtitle("ACF of Total Private Employment")
Observations: * Trend: The series shows a clear upward trend, indicating
growth in private employment over time. * Seasonality: The seasonal plot
reveals consistent patterns, likely tied to annual cycles (e.g., hiring
during certain months). * Cyclicity: No strong evidence of cyclical
behavior beyond seasonal effects. * Unusual Years: Check for dips during
economic recessions (e.g., 2008 financial crisis).
2.2 Bricks from aus_production
# Load the dataset
data("aus_production")
# Plot using autoplot()
autoplot(aus_production, Bricks) +
ggtitle("Australian Brick Production") +
ylab("Millions of Units") +
xlab("Year")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Seasonal plot
gg_season(aus_production, Bricks) +
ggtitle("Seasonal Plot of Australian Brick Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Subseries plot
gg_subseries(aus_production, Bricks) +
ggtitle("Subseries Plot of Australian Brick Production")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
# Lag plot
gg_lag(aus_production, Bricks, geom = "point") +
ggtitle("Lag Plot of Australian Brick Production")
## Warning: Removed 20 rows containing missing values (gg_lag).
# ACF plot
aus_production %>%
ACF(Bricks) %>%
autoplot() +
ggtitle("ACF of Australian Brick Production")
Observations:
2.3 Hare from pelt
# Load the dataset
data("pelt")
# Plot using autoplot()
autoplot(pelt, Hare) +
ggtitle("Hare Pelts Traded in Canada") +
ylab("Number of Pelts") +
xlab("Year")
# Seasonal plot (not applicable for yearly data)
# gg_season(pelt, Hare) # Not meaningful for yearly data
# Subseries plot (not applicable for yearly data)
# gg_subseries(pelt, Hare) # Not meaningful for yearly data
# Lag plot
gg_lag(pelt, Hare, geom = "point") +
ggtitle("Lag Plot of Hare Pelts")
# ACF plot
pelt %>%
ACF(Hare) %>%
autoplot() +
ggtitle("ACF of Hare Pelts")
Observations: * Trend: The series shows cyclical behavior, likely tied
to predator-prey dynamics. * Seasonality: Not applicable (yearly data).
* Cyclicity: Strong cyclical patterns, consistent with ecological
cycles. * Unusual Years: Look for years with extreme values, possibly
due to environmental factors.
2.4 “H02” Cost from PBS
# Load the dataset
data("PBS")
# Filter for "H02" cost
h02_cost <- PBS %>%
filter(ATC2 == "H02")
# Inspect the structure of h02_cost to identify column names
glimpse(h02_cost)
## Rows: 816
## Columns: 9
## Key: Concession, Type, ATC1, ATC2 [4]
## $ Month <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991 Nov, 1991 Dec,…
## $ Concession <chr> "Concessional", "Concessional", "Concessional", "Concession…
## $ Type <chr> "Co-payments", "Co-payments", "Co-payments", "Co-payments",…
## $ ATC1 <chr> "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H",…
## $ ATC1_desc <chr> "Systemic hormonal preparations, excl. sex hormones and ins…
## $ ATC2 <chr> "H02", "H02", "H02", "H02", "H02", "H02", "H02", "H02", "H0…
## $ ATC2_desc <chr> "CORTICOSTEROIDS FOR SYSTEMIC USE", "CORTICOSTEROIDS FOR SY…
## $ Scripts <dbl> 63261, 53528, 52822, 54016, 49281, 51798, 42436, 52913, 629…
## $ Cost <dbl> 317384.0, 269891.0, 269703.0, 280418.0, 268070.0, 277139.0,…
# Plot using autoplot()
autoplot(h02_cost, Cost) +
ggtitle("H02 Cost in Australia") +
ylab("Cost") +
xlab("Year")
# Seasonal plot
gg_season(h02_cost, Cost) +
ggtitle("Seasonal Plot of H02 Cost")
# Subseries plot
gg_subseries(h02_cost, Cost) +
ggtitle("Subseries Plot of H02 Cost")
# Filter for a single time series
# Based on the structure, let's filter by Concession and Type
h02_single_series <- h02_cost %>%
filter(Concession == "Concessional", Type == "Co-payments")
# Check the filtered data
print(h02_single_series)
## # A tsibble: 204 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-paym… H Systemic… H02 CORTICOS… 63261 317384
## 2 1991 Aug Concessional Co-paym… H Systemic… H02 CORTICOS… 53528 269891
## 3 1991 Sep Concessional Co-paym… H Systemic… H02 CORTICOS… 52822 269703
## 4 1991 Oct Concessional Co-paym… H Systemic… H02 CORTICOS… 54016 280418
## 5 1991 Nov Concessional Co-paym… H Systemic… H02 CORTICOS… 49281 268070
## 6 1991 Dec Concessional Co-paym… H Systemic… H02 CORTICOS… 51798 277139
## 7 1992 Jan Concessional Co-paym… H Systemic… H02 CORTICOS… 42436 221772
## 8 1992 Feb Concessional Co-paym… H Systemic… H02 CORTICOS… 52913 272345
## 9 1992 Mar Concessional Co-paym… H Systemic… H02 CORTICOS… 62908 325700
## 10 1992 Apr Concessional Co-paym… H Systemic… H02 CORTICOS… 68499 349271
## # ℹ 194 more rows
# Lag plot for the filtered series
gg_lag(h02_single_series, Cost, geom = "point") +
ggtitle("Lag Plot of H02 Cost (Filtered Series)") +
xlab("Lag") +
ylab("Cost")
# ACF plot
h02_cost %>%
ACF(Cost) %>%
autoplot() +
ggtitle("ACF of H02 Cost")
Observations: * Trend: The series may show an increasing trend, possibly
due to inflation or increased usage. * Seasonality: Strong seasonal
patterns, likely tied to healthcare usage (e.g., higher costs in
winter). * Cyclicity: No strong evidence of cyclical behavior beyond
seasonal effects. * Unusual Years: Look for years with significant
changes, possibly due to policy changes.
2.5 Barrels from us_gasoline
# Load the dataset
data("us_gasoline")
# Plot using autoplot()
autoplot(us_gasoline, Barrels) +
ggtitle("US Gasoline Production") +
ylab("Barrels") +
xlab("Year")
# Seasonal plot
gg_season(us_gasoline, Barrels) +
ggtitle("Seasonal Plot of US Gasoline Production")
# Subseries plot
gg_subseries(us_gasoline, Barrels) +
ggtitle("Subseries Plot of US Gasoline Production")
# Lag plot
gg_lag(us_gasoline, Barrels, geom = "point") +
ggtitle("Lag Plot of US Gasoline Production")
# ACF plot
us_gasoline %>%
ACF(Barrels) %>%
autoplot() +
ggtitle("ACF of US Gasoline Production")
Observations: * Trend: The series shows an increasing trend, reflecting growing gasoline production. * Seasonality: Strong seasonal patterns, likely tied to driving seasons (e.g., higher production in summer). * Cyclicity: No strong evidence of cyclical behavior beyond seasonal effects. * Unusual Years: Look for years with significant drops, possibly due to economic downturns or supply disruptions.