Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book.

Exercise 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

suppressWarnings(library(fpp3))
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
suppressWarnings(library(ggplot2))
suppressWarnings(library(forecast))
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Use ? (or help()) to find out about the data in each series.

?aus_production
?pelt
?gafa_stock
?vic_elec

What is the time interval of each series?

The aus_production data is reported on a quarterly basis, focusing on specific indicators within Australia.

The pelt data spans an annual timeframe from 1845 to 1935.

The gafa_stock data is recorded on a daily basis, covering trading days from 2014 to 2018.

The vic_elec data provides measurements of electricity demand at half-hourly intervals.

Use autoplot() to produce a time plot of each series.

aus_production_plot <- autoplot(aus_production) + 
  ggtitle("Australian Production of Bricks") + 
  xlab("Year") + 
  ylab("Production")
## Plot variable not specified, automatically selected `.vars = Beer`
pelt_plot <- autoplot(pelt) + 
  ggtitle("Lynx Trappings in Canada") + 
  xlab("Year") + 
  ylab("Trappings")
## Plot variable not specified, automatically selected `.vars = Hare`
gafa_stock_plot <- autoplot(gafa_stock) + 
  ggtitle("GAFA Stock Prices") + 
  xlab("Date") + 
  ylab("Stock Price")
## Plot variable not specified, automatically selected `.vars = Open`
vic_elec_plot <- autoplot(vic_elec) + 
  ggtitle("Electricity Demand in Victoria") + 
  xlab("Time (in hours)") + 
  ylab("Demand (in MW)")
## Plot variable not specified, automatically selected `.vars = Demand`
print(aus_production_plot)

print(pelt_plot)

print(gafa_stock_plot)

print(vic_elec_plot)

Exercise 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

library(dplyr)
peak_prices <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Date, Symbol, Close)
peak_prices
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Date       Symbol Close
##   <date>     <chr>  <dbl>
## 1 2018-10-03 AAPL    232.
## 2 2018-09-04 AMZN   2040.
## 3 2018-07-25 FB      218.
## 4 2018-07-26 GOOG   1268.

Exercise 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. You can read the data into R with the following script:

tute1 <- read.csv('https://raw.githubusercontent.com/uplotnik/DATA-624/refs/heads/main/tute1.csv')
head(tute1)
##        X  Sales AdBudget   GDP
## 1 Mar-81 1020.2    659.2 251.8
## 2 Jun-81  889.2    589.0 290.9
## 3 Sep-81  795.0    512.5 290.8
## 4 Dec-81 1003.9    614.1 292.4
## 5 Mar-82 1057.7    647.2 279.1
## 6 Jun-82  944.4    602.0 254.0

b. Convert the data to time series

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)

c. Construct time series plots of each of the three series

autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facet_grid().

autoplot(mytimeseries)

When facet grid() is not utilized, all variables are displayed on a single grid with a uniform scale. This combined plotting does not enhance our analysis, and we can more clearly observe the distinct trends of each variable when they are presented individually.

Exercise 2.4

The USgas package contains data on the demand for natural gas in the US.

a. Install the USgas package.

#install.packages("USgas")
library(USgas)
## Warning: package 'USgas' was built under R version 4.3.3
library(tsibble)

b. Create a tsibble from us_total with year as the index and state as the key.

us_tsibble <- as_tsibble(us_total, index = year, key = state)

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
new_england_data <- us_tsibble %>% filter(state %in% new_england_states)
head(new_england_data)
## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state            y
##   <int> <chr>        <int>
## 1  1997 Connecticut 144708
## 2  1998 Connecticut 131497
## 3  1999 Connecticut 152237
## 4  2000 Connecticut 159712
## 5  2001 Connecticut 146278
## 6  2002 Connecticut 177587
ggplot(new_england_data, aes(x = year, y = y, color = state)) +
  geom_line() +
  labs(title = "Annual Natural Gas Consumption by State in New England",
       x = "Year",
       y = "Natural Gas Consumption") +
  theme_minimal()

Exercise 2.5

a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

tourism_data <- readxl::read_excel('/Users/ulianaplotnikova/Desktop/DATA624/tourism.xlsx')
head(tourism_data)
## # A tibble: 6 × 5
##   Quarter    Region   State           Purpose  Trips
##   <chr>      <chr>    <chr>           <chr>    <dbl>
## 1 1998-01-01 Adelaide South Australia Business  135.
## 2 1998-04-01 Adelaide South Australia Business  110.
## 3 1998-07-01 Adelaide South Australia Business  166.
## 4 1998-10-01 Adelaide South Australia Business  127.
## 5 1999-01-01 Adelaide South Australia Business  137.
## 6 1999-04-01 Adelaide South Australia Business  200.

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

library(lubridate)

tourism_data$Quarter <- yearquarter(tourism_data$Quarter)
tourism_tsibble <- tourism_data %>%
  as_tsibble(index = Quarter, key = c(Region, Purpose))
# Check for uniqueness in the new tsibble
any(duplicated(tourism_tsibble$Quarter))
## [1] TRUE

c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

# average overnight trips
average_trips <- tourism_tsibble %>%
  group_by(Region, Purpose) %>%
  summarise(average_overnight_trips = mean(Trips, na.rm = TRUE))

# maximum average trips
max_trips <- average_trips %>%
  filter(average_overnight_trips == max(average_overnight_trips))

max_trips<-max_trips %>%
  arrange(desc(average_overnight_trips)) 
head(max_trips,10)
## # A tsibble: 10 x 4 [1Q]
## # Key:       Region, Purpose [10]
## # Groups:    Region [10]
##    Region                 Purpose  Quarter average_overnight_trips
##    <chr>                  <chr>      <qtr>                   <dbl>
##  1 Melbourne              Visiting 2017 Q4                    985.
##  2 Sydney                 Business 2001 Q4                    948.
##  3 South Coast            Holiday  1998 Q1                    915.
##  4 North Coast NSW        Holiday  2016 Q1                    906.
##  5 Brisbane               Visiting 2016 Q4                    796.
##  6 Gold Coast             Holiday  2002 Q1                    711.
##  7 Sunshine Coast         Holiday  2005 Q1                    617.
##  8 Australia's South West Holiday  2016 Q1                    612.
##  9 Great Ocean Road       Holiday  1998 Q1                    548.
## 10 Experience Perth       Visiting 2016 Q1                    538.

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

# total trips by state
total_trips_by_state <- tourism_tsibble %>%
  group_by(State, Purpose) %>%
  summarise(total_trips = sum(Trips, na.rm = TRUE)) %>%
  as_tsibble()
ggplot(total_trips_by_state, aes(x = State, y = total_trips, fill = State)) +
  geom_bar(stat = "identity", show.legend = FALSE) +  # Disable legend if state color is unique
  scale_fill_brewer(palette = "Set3") +  # Using RColorBrewer for a nice color palette
  theme_minimal() +
  labs(title = "Total Trips by State", x = "State", y = "Total Trips") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Adjust x-axis text for better readability

Exercise 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?

“Total Private” Employed from us_employment

us_employment %>%filter(Title == 'Total Private') %>% autoplot(Employed) + labs(title = 'Total Private Employment')

us_employment %>%filter(Title == 'Total Private') %>% gg_season(Employed) + labs(title = "Seasonal Plot of Total Private Employment")

us_employment %>%filter(Title == 'Total Private') %>% gg_subseries(Employed) + labs(title = 'Subseries Plot of Total Private Employment')

us_employment %>%filter(Title == 'Total Private') %>% gg_lag(Employed) + labs(title = 'Lag Plot of Total Private Employment')

us_employment %>% filter(Title == "Total Private") %>%ACF(Employed) %>% autoplot()

The employment data reflects a predominantly positive trend throughout the analyzed time series period. There is no indication of seasonal variation within the data, as it displays similar characteristics across each month. A review of the lag and autocorrelation plots suggests that the overall data is largely driven by the prevailing macro trend, with each data point exhibiting a strong correlation to the previous points in the time series.

Bricks from aus_production

head(aus_production,4)
## # A tsibble: 4 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
autoplot(aus_production, Bricks) + ggtitle("Bricks Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks) + ggtitle("Seasonal Plot of Bricks Production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production  %>% gg_subseries(Bricks) + labs(title = 'Subseries Plot of Bricks')
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production  %>% gg_lag(Bricks, geom = "point") + labs(title = 'Lag Plot of Bricks Production')
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>% ACF(Bricks) %>%autoplot()

The data has demonstrated a strong upward trend until reaching its peak in 1980, after which it has been experiencing a monthly downward trend. In general, the data has shown seasonal fluctuations, with production peaking in the third quarter and reaching its lowest level in the first quarter. Moreover, the data appears to suggest the presence of cyclical trends.

Hare from pelt

head(pelt,4)
## # A tsibble: 4 x 3 [1Y]
##    Year  Hare  Lynx
##   <dbl> <dbl> <dbl>
## 1  1845 19580 30090
## 2  1846 19600 45150
## 3  1847 19610 49150
## 4  1848 11990 39520
autoplot(pelt, Hare) + ggtitle("Hare Pelts")

gg_subseries(pelt, Hare) + ggtitle("Subseries Plot of Hare Pelts")

pelt%>%gg_lag( Hare, geom = "point") + ggtitle("Lag Plot of Hare Pelts")

pelt %>%ACF(Hare) %>%autoplot()+labs(title = "ACF Plot of Hare Pelts")

While analyzing Hare from Pelt data, I encountered challenges in detecting seasonal trends because the data lacked segmentation into intra-year seasonal intervals (for instance, days, weeks, months, or quarters). However, despite the absence of clear overall trends, the data clearly demonstrates cyclical behavior. The ACF plot indicates that this cyclicality manifests over a ten-year timeframe, with peaks occurring every decade and troughs emerging five years thereafter, also on a ten-year cycle.

“H02” Cost from PBS

head(PBS,4)
## # A tsibble: 4 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [1]
##      Month Concession   Type       ATC1  ATC1_desc ATC2  ATC2_desc Scripts  Cost
##      <mth> <chr>        <chr>      <chr> <chr>     <chr> <chr>       <dbl> <dbl>
## 1 1991 Jul Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   18228 67877
## 2 1991 Aug Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   15327 57011
## 3 1991 Sep Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   14775 55020
## 4 1991 Oct Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   15380 57222
PBS  %>%filter(ATC2 == 'H02')  %>% autoplot(Cost) + labs(title = "H02 Cost")

PBS  %>%filter(ATC2 == 'H02')  %>% gg_season(Cost) + labs(title = "Seasonal Plot of H02 Cost")

PBS  %>%filter(ATC2 == 'H02')  %>% gg_subseries(Cost) + labs(title = "Subseries Plot of H02 Cost")

PBS %>% filter(ATC2 == "H02") %>% ACF(Cost) %>% autoplot() +labs(title = "ACF Plot of H02 Cost")

Barrels from us_gasoline

head(us_gasoline,4)
## # A tsibble: 4 x 2 [1W]
##       Week Barrels
##     <week>   <dbl>
## 1 1991 W06    6.62
## 2 1991 W07    6.43
## 3 1991 W08    6.58
## 4 1991 W09    7.22
autoplot(us_gasoline, Barrels) + ggtitle("Gasoline Barrels")

gg_season(us_gasoline, Barrels) + ggtitle("Seasonal Plot of Gasoline Barrels")

gg_subseries(us_gasoline, Barrels) + ggtitle("Subseries Plot Gasoline Barrels")

us_gasoline %>%gg_lag(Barrels, geom = "point")+labs(title = 'Lag Plot of Gasoline Barrels')

us_gasoline %>%ACF(Barrels) %>%autoplot()+labs(title = 'ACF Plot of Gasoline Barrels')

While analyzing gasoline barrels, it looks like the prevailing trend showed positive growth until the early 2000s, subsequently declining and then experiencing an upward shift. However, indications suggest that we are currently at the onset of a new decline.