DATA624

Import Libraries

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tsibble)

## Warning: package 'tsibble' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## 
## Attaching package: 'tsibble'

## The following object is masked from 'package:lubridate':
## 
##     interval

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1     ✔ tibble  3.2.1
## ✔ purrr   1.0.2     ✔ tidyr   1.3.1
## ✔ readr   2.1.5

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tsibbledata 0.4.1     ✔ fable       0.3.4
## ✔ feasts      0.3.2     ✔ fabletools  0.4.2

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(forecast)

## Warning: package 'forecast' was built under R version 4.3.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.

data("aus_production")
?aus_production

## starting httpd help server ... done

data("pelt")
?pelt

data("gafa_stock")
?gafa_stock

data("vic_elec")
?vic_elec

aus_production %>% 
  autoplot(Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

pelt %>% 
  autoplot(Lynx)

gafa_stock %>% 
  autoplot(Close)

vic_elec %>% 
  autoplot(Demand)

2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>% group_by(Symbol) %>%
  filter(Close==max(Close)) %>%
  select(Symbol,Date,Close)

2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(tute1)

Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

2.4

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

#install.packages('USgas')
library(USgas)

## Warning: package 'USgas' was built under R version 4.3.3

data("us_total")
str(us_total)

## 'data.frame':    1266 obs. of  3 variables:
##  $ year : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ state: chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ y    : int  324158 329134 337270 353614 332693 379343 350345 382367 353156 391093 ...

us_total <- us_total %>%
  rename(natural_gas_consumption_mcf = y)
us_total_tsibble <- us_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  as_tsibble(key = state, index = year)
us_total_tsibble

# Plot of annual natural gas consumption
us_total_tsibble %>% autoplot(natural_gas_consumption_mcf)

2.5

Download tourism.xlsx from the book website and read it into R using readxl::read_excel(). Create a tsibble which is identical to the tourism tsibble from the tsibble package. Find what combination of Region and Purpose had the maximum number of overnight trips on average. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism_data <- readxl::read_excel("tourism.xlsx")
View(tourism_data)

# Converting data type from "Quarter" to Date and "Trips" to numeric
tourism_data <- tourism_data %>%
  mutate(Quarter = as.Date(Quarter),
         Trips = as.numeric(Trips))

# Creating a tsibble identical to the tourism one
tsib_df <- as_tsibble(tourism_data, key = c(Region, State, Purpose), index = Quarter)


# Combination of Region and Purpose with the maximum number of overnight trips on average
max_avg_trips <- tourism_data %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips)) %>%
  arrange(desc(avg_trips))

## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.

head(max_avg_trips)

# Tsibble for Total Trips by State
total_trips_by_state <- tourism_data %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips)) %>%
  arrange(desc(total_trips))

head(total_trips_by_state)

2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?

data("PBS")
data("us_employment")
data("us_gasoline")

Employed

us_employment %>% 
  filter(Title == "Total Private") %>% 
  autoplot(Employed) + 
  ggtitle("Autoplot")

us_employment %>%filter(Title == "Total Private") %>% gg_season(Employed) +
  ggtitle("Seasonal Decomposition")

us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_subseries(Employed) +
  ggtitle("Subseries Plot")

us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_lag(Employed) +
  ggtitle("Lag Plot")

us_employment %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot() +
  ggtitle("Autocorrelation Function")

The US Employment dataset reveals a steady increase in Total Private employment over the years, with a significant dip around 2008, likely coinciding with the housing market crash. A clear seasonal pattern emerges, with employment rising in the first half of the year, followed by a dip, and then a subsequent increase. The lag plot highlights a strong positive correlation across all lags. I think the seasonal decomposition could be improved by adjusting for population growth, which would offer a more nuanced view of employment trends.

Bricks

aus_production %>% 
  autoplot(Bricks) +
  ggtitle("Autoplot")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>% 
  gg_season(Bricks) +
  ggtitle("Seasonal Decomposition")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>% 
  gg_subseries(Bricks) +
  ggtitle("Subseries Plot")

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>% 
  gg_lag(Bricks) +
  ggtitle("Lag Plot")

## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>% 
  ACF(Bricks) %>% 
  autoplot() + 
  ggtitle("Autocorrelation Function")

In the AUS Production dataset, brick production does not exhibit a clear long-term trend, but it displays strong seasonality, following an annual cyclical pattern. A notable drop in brick production is evident in the early 1980s. The seasonal plot shows a moderate increase in production during the first and third quarters, with a decline in the fourth quarter. The lag plot reveals a stable positive correlation from season to season.

Pelt

pelt %>% 
  autoplot(Hare) +
  ggtitle("Autoplot")

pelt %>% 
  gg_subseries(Hare)+
  ggtitle("Subseries Plot")

pelt %>% 
  gg_lag(Hare) +
  ggtitle("Lag Plot")

pelt %>% 
  ACF(Hare) %>% 
  autoplot() + 
  ggtitle("Autocorrelation Function")

In the Pelts dataset for Hare, there is no clear trend, but a possible seasonal pattern with some cyclical fluctuations can be observed. The number of traded Hare pelts shows sharp variations over several years, with a general decline toward the end of the decade. The lag plot indicates a moderate positive correlation, especially at lag 1.

PBS

PBS %>% 
  filter(ATC2 == "H02")  %>% 
  autoplot(Cost) + 
  ggtitle("Autoplot")

In the PBS dataset, I had some trouble generating plots using functions other than autoplot. When examining each concession for H02, the autoplot suggests a clear seasonal pattern, with significant fluctuations throughout the year, repeating annually. This seasonality is evident across all concession types, suggesting it might be a common business trend for the company. The peaks toward the end of the year could indicate a possible pattern of fiscal year-end loading.

Gasoline

us_gasoline %>% 
  autoplot() + 
  ggtitle("Autoplot")

## Plot variable not specified, automatically selected `.vars = Barrels`

us_gasoline %>% 
  gg_season() +
  ggtitle("Seasonal Decomposition")

## Plot variable not specified, automatically selected `y = Barrels`

us_gasoline %>% 
  gg_subseries()+
  ggtitle("Subseries Plot")

## Plot variable not specified, automatically selected `y = Barrels`

us_gasoline %>% 
  gg_lag() +
  ggtitle("Lag Plot")

## Plot variable not specified, automatically selected `y = Barrels`

us_gasoline %>% 
  ACF() %>% 
  autoplot() + 
  ggtitle("Autocorrelation Function")

## Response variable not specified, automatically selected `var = Barrels`

The Gasoline Barrels series shows an overall upward trend over time, accompanied by some seasonality, though the data appears quite noisy. Despite the noise, there are noticeable peaks and declines at specific times each month. The lag plot reveals a positive correlation, although there is some overplotting. Overall, no particularly unusual years stand out in the data, likely due to the overplotting effect.

DATA624_HW1

2024-09-08

Import Libraries

2.1

2.2

2.3

2.4

2.5

2.8

Employed

Bricks

Pelt

PBS

Gasoline