Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book.
Explore the following four time series: Bricks from
aus_production, Lynx from pelt,
Close from gafa_stock, Demand
from vic_elec.
suppressWarnings(library(fpp3))
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
suppressWarnings(library(ggplot2))
suppressWarnings(library(forecast))
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
Use ? (or help()) to find out about the data in each series.
?aus_production
?pelt
?gafa_stock
?vic_elec
What is the time interval of each series?
The aus_production data is reported on a quarterly
basis, focusing on specific indicators within Australia.
The pelt data spans an annual timeframe from 1845 to
1935.
The gafa_stock data is recorded on a daily basis,
covering trading days from 2014 to 2018.
The vic_elec data provides measurements of electricity
demand at half-hourly intervals.
Use autoplot() to produce a time plot of each
series.
aus_production_plot <- autoplot(aus_production) +
ggtitle("Australian Production of Bricks") +
xlab("Year") +
ylab("Production")
## Plot variable not specified, automatically selected `.vars = Beer`
pelt_plot <- autoplot(pelt) +
ggtitle("Lynx Trappings in Canada") +
xlab("Year") +
ylab("Trappings")
## Plot variable not specified, automatically selected `.vars = Hare`
gafa_stock_plot <- autoplot(gafa_stock) +
ggtitle("GAFA Stock Prices") +
xlab("Date") +
ylab("Stock Price")
## Plot variable not specified, automatically selected `.vars = Open`
vic_elec_plot <- autoplot(vic_elec) +
ggtitle("Electricity Demand in Victoria") +
xlab("Time (in hours)") +
ylab("Demand (in MW)")
## Plot variable not specified, automatically selected `.vars = Demand`
print(aus_production_plot)
print(pelt_plot)
print(gafa_stock_plot)
print(vic_elec_plot)
Use filter() to find what days corresponded to
the peak closing price for each of the four stocks in
gafa_stock.
library(dplyr)
peak_prices <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Date, Symbol, Close)
peak_prices
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Date Symbol Close
## <date> <chr> <dbl>
## 1 2018-10-03 AAPL 232.
## 2 2018-09-04 AMZN 2040.
## 3 2018-07-25 FB 218.
## 4 2018-07-26 GOOG 1268.
Download the file tute1.csv from the book website, open
it in Excel (or some other spreadsheet application), and review its
contents. You should find four columns of information. Columns B through
D each contain a quarterly series, labelled Sales, AdBudget and GDP.
Sales contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
a. You can read the data into R with the following script:
tute1 <- read.csv('https://raw.githubusercontent.com/uplotnik/DATA-624/refs/heads/main/tute1.csv')
head(tute1)
## X Sales AdBudget GDP
## 1 Mar-81 1020.2 659.2 251.8
## 2 Jun-81 889.2 589.0 290.9
## 3 Sep-81 795.0 512.5 290.8
## 4 Dec-81 1003.9 614.1 292.4
## 5 Mar-82 1057.7 647.2 279.1
## 6 Jun-82 944.4 602.0 254.0
b. Convert the data to time series
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
c. Construct time series plots of each of the three series
autoplot(mytimeseries, facets=TRUE)
Check what happens when you don’t include facet_grid().
autoplot(mytimeseries)
When facet grid() is not utilized, all variables are
displayed on a single grid with a uniform scale. This combined plotting
does not enhance our analysis, and we can more clearly observe the
distinct trends of each variable when they are presented
individually.
The USgas package contains data on the demand for
natural gas in the US.
a. Install the USgas package.
#install.packages("USgas")
library(USgas)
## Warning: package 'USgas' was built under R version 4.3.3
library(tsibble)
b. Create a tsibble from us_total with year as
the index and state as the key.
us_tsibble <- as_tsibble(us_total, index = year, key = state)
c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
new_england_data <- us_tsibble %>% filter(state %in% new_england_states)
head(new_england_data)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Connecticut 144708
## 2 1998 Connecticut 131497
## 3 1999 Connecticut 152237
## 4 2000 Connecticut 159712
## 5 2001 Connecticut 146278
## 6 2002 Connecticut 177587
ggplot(new_england_data, aes(x = year, y = y, color = state)) +
geom_line() +
labs(title = "Annual Natural Gas Consumption by State in New England",
x = "Year",
y = "Natural Gas Consumption") +
theme_minimal()
a. Download tourism.xlsx from the book website
and read it into R using readxl::read_excel().
tourism_data <- readxl::read_excel('/Users/ulianaplotnikova/Desktop/DATA624/tourism.xlsx')
head(tourism_data)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
b. Create a tsibble which is identical to the
tourism tsibble from the tsibble package.
library(lubridate)
tourism_data$Quarter <- yearquarter(tourism_data$Quarter)
tourism_tsibble <- tourism_data %>%
as_tsibble(index = Quarter, key = c(Region, Purpose))
# Check for uniqueness in the new tsibble
any(duplicated(tourism_tsibble$Quarter))
## [1] TRUE
c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
# average overnight trips
average_trips <- tourism_tsibble %>%
group_by(Region, Purpose) %>%
summarise(average_overnight_trips = mean(Trips, na.rm = TRUE))
# maximum average trips
max_trips <- average_trips %>%
filter(average_overnight_trips == max(average_overnight_trips))
max_trips<-max_trips %>%
arrange(desc(average_overnight_trips))
head(max_trips,10)
## # A tsibble: 10 x 4 [1Q]
## # Key: Region, Purpose [10]
## # Groups: Region [10]
## Region Purpose Quarter average_overnight_trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
## 2 Sydney Business 2001 Q4 948.
## 3 South Coast Holiday 1998 Q1 915.
## 4 North Coast NSW Holiday 2016 Q1 906.
## 5 Brisbane Visiting 2016 Q4 796.
## 6 Gold Coast Holiday 2002 Q1 711.
## 7 Sunshine Coast Holiday 2005 Q1 617.
## 8 Australia's South West Holiday 2016 Q1 612.
## 9 Great Ocean Road Holiday 1998 Q1 548.
## 10 Experience Perth Visiting 2016 Q1 538.
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
# total trips by state
total_trips_by_state <- tourism_tsibble %>%
group_by(State, Purpose) %>%
summarise(total_trips = sum(Trips, na.rm = TRUE)) %>%
as_tsibble()
ggplot(total_trips_by_state, aes(x = State, y = total_trips, fill = State)) +
geom_bar(stat = "identity", show.legend = FALSE) + # Disable legend if state color is unique
scale_fill_brewer(palette = "Set3") + # Using RColorBrewer for a nice color palette
theme_minimal() +
labs(title = "Total Trips by State", x = "State", y = "Total Trips") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Adjust x-axis text for better readability
Use the following graphics functions: autoplot(),
gg_season(), gg_subseries(),
gg_lag(), ACF() and explore features from the
following time series: “Total Private” Employed from us_employment,
Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and
Barrels from us_gasoline.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
“Total Private” Employed from us_employment
us_employment %>%filter(Title == 'Total Private') %>% autoplot(Employed) + labs(title = 'Total Private Employment')
us_employment %>%filter(Title == 'Total Private') %>% gg_season(Employed) + labs(title = "Seasonal Plot of Total Private Employment")
us_employment %>%filter(Title == 'Total Private') %>% gg_subseries(Employed) + labs(title = 'Subseries Plot of Total Private Employment')
us_employment %>%filter(Title == 'Total Private') %>% gg_lag(Employed) + labs(title = 'Lag Plot of Total Private Employment')
us_employment %>% filter(Title == "Total Private") %>%ACF(Employed) %>% autoplot()
The employment data reflects a predominantly positive trend throughout the analyzed time series period. There is no indication of seasonal variation within the data, as it displays similar characteristics across each month. A review of the lag and autocorrelation plots suggests that the overall data is largely driven by the prevailing macro trend, with each data point exhibiting a strong correlation to the previous points in the time series.
Bricks from aus_production
head(aus_production,4)
## # A tsibble: 4 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
autoplot(aus_production, Bricks) + ggtitle("Bricks Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season(aus_production, Bricks) + ggtitle("Seasonal Plot of Bricks Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>% gg_subseries(Bricks) + labs(title = 'Subseries Plot of Bricks')
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>% gg_lag(Bricks, geom = "point") + labs(title = 'Lag Plot of Bricks Production')
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>% ACF(Bricks) %>%autoplot()
The data has demonstrated a strong upward trend until reaching its peak in 1980, after which it has been experiencing a monthly downward trend. In general, the data has shown seasonal fluctuations, with production peaking in the third quarter and reaching its lowest level in the first quarter. Moreover, the data appears to suggest the presence of cyclical trends.
Hare from pelt
head(pelt,4)
## # A tsibble: 4 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
autoplot(pelt, Hare) + ggtitle("Hare Pelts")
gg_subseries(pelt, Hare) + ggtitle("Subseries Plot of Hare Pelts")
pelt%>%gg_lag( Hare, geom = "point") + ggtitle("Lag Plot of Hare Pelts")
pelt %>%ACF(Hare) %>%autoplot()+labs(title = "ACF Plot of Hare Pelts")
While analyzing Hare from Pelt data, I encountered challenges in detecting seasonal trends because the data lacked segmentation into intra-year seasonal intervals (for instance, days, weeks, months, or quarters). However, despite the absence of clear overall trends, the data clearly demonstrates cyclical behavior. The ACF plot indicates that this cyclicality manifests over a ten-year timeframe, with peaks occurring every decade and troughs emerging five years thereafter, also on a ten-year cycle.
“H02” Cost from PBS
head(PBS,4)
## # A tsibble: 4 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-paymen… A Alimenta… A01 STOMATOL… 18228 67877
## 2 1991 Aug Concessional Co-paymen… A Alimenta… A01 STOMATOL… 15327 57011
## 3 1991 Sep Concessional Co-paymen… A Alimenta… A01 STOMATOL… 14775 55020
## 4 1991 Oct Concessional Co-paymen… A Alimenta… A01 STOMATOL… 15380 57222
PBS %>%filter(ATC2 == 'H02') %>% autoplot(Cost) + labs(title = "H02 Cost")
PBS %>%filter(ATC2 == 'H02') %>% gg_season(Cost) + labs(title = "Seasonal Plot of H02 Cost")
PBS %>%filter(ATC2 == 'H02') %>% gg_subseries(Cost) + labs(title = "Subseries Plot of H02 Cost")
PBS %>% filter(ATC2 == "H02") %>% ACF(Cost) %>% autoplot() +labs(title = "ACF Plot of H02 Cost")
Barrels from us_gasoline
head(us_gasoline,4)
## # A tsibble: 4 x 2 [1W]
## Week Barrels
## <week> <dbl>
## 1 1991 W06 6.62
## 2 1991 W07 6.43
## 3 1991 W08 6.58
## 4 1991 W09 7.22
autoplot(us_gasoline, Barrels) + ggtitle("Gasoline Barrels")
gg_season(us_gasoline, Barrels) + ggtitle("Seasonal Plot of Gasoline Barrels")
gg_subseries(us_gasoline, Barrels) + ggtitle("Subseries Plot Gasoline Barrels")
us_gasoline %>%gg_lag(Barrels, geom = "point")+labs(title = 'Lag Plot of Gasoline Barrels')
us_gasoline %>%ACF(Barrels) %>%autoplot()+labs(title = 'ACF Plot of Gasoline Barrels')
While analyzing gasoline barrels, it looks like the prevailing trend showed positive growth until the early 2000s, subsequently declining and then experiencing an upward shift. However, indications suggest that we are currently at the onset of a new decline.