D624 Homework 1: Time series graphics

Exercise 2.10.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series.
What is the time interval of each series?

aus_production: quarterly
pelt: yearly
gafa_stock: daily (trading days only, so the interval is irregular)
vic_elec: every thirty minutes
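
The intervals listed above can also be checked programmatically rather than read from the help pages. This is a minimal sketch, assuming the fpp3 package is installed; interval() is called with the tsibble:: prefix so it is not confused with the lubridate function of the same name.

```r
library(fpp3)  # attaches tsibble and the tsibbledata data sets

# interval() reports the spacing that tsibble has recorded for the index
tsibble::interval(aus_production)  # quarterly
tsibble::interval(pelt)            # yearly
tsibble::interval(vic_elec)        # 30 minutes
tsibble::interval(gafa_stock)      # irregular, printed as `!`

# is_regular() is FALSE for gafa_stock because non-trading days are absent
regular_flags <- c(
  vic_elec   = is_regular(vic_elec),
  gafa_stock = is_regular(gafa_stock)
)
regular_flags
```

The irregular interval is also why the gafa_stock output in Exercise 2.10.2 is labelled [!] in its header.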

Use autoplot() to produce a time plot of each series.
For the last plot, modify the axis labels and title.

Exercise 2.10.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

Section 2.1, ‘tsibble objects’, provides a model for how to approach this, but following it I could only get the single symbol with the highest value overall. I learned about the group_by() function, which gives the peak for each stock, from Shana Green’s work referenced below.
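
To illustrate why the grouping step matters, here is a small sketch on toy data. The symbols and prices are made up for the example and are not taken from gafa_stock.

```r
library(dplyr)

# Toy data (hypothetical values, not from gafa_stock)
prices <- tibble(
  Symbol = c("A", "A", "B", "B"),
  Date   = as.Date(c("2018-01-02", "2018-01-03", "2018-01-02", "2018-01-03")),
  Close  = c(10, 12, 100, 90)
)

# Without grouping, max(Close) is the overall maximum, so one row comes back
overall_peak <- prices |>
  filter(Close == max(Close))

# With grouping, max(Close) is evaluated within each Symbol: one row per stock
peak_by_symbol <- prices |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  ungroup()
```

Here overall_peak holds only the row for symbol B, while peak_by_symbol holds one row for each symbol.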

Exercise 2.10.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

Here is what the graphic looks like when we don’t include facet_grid(): all three series are drawn in one panel and share a single y-axis.

Exercise 2.10.4

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package.
Create a tsibble from us_total with year as the index and state as the key.
Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

Exercise 2.10.5

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
Create a tsibble which is identical to the tourism tsibble from the tsibble package.
Find what combination of Region and Purpose had the maximum number of overnight trips on average.
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

The data read from Excel and the tsibble “tourism” look identical except for the format of the quarters, but I think that gets taken care of by converting the Excel data into a tsibble with yearquarter(), as we did in Exercise 3.
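
As a small sketch of what that conversion does, the dates below are hypothetical stand-ins for the Quarter column of the spreadsheet; the actual column type returned by read_excel() is an assumption here.

```r
library(tsibble)

# Hypothetical values standing in for the Quarter column of the spreadsheet
raw_quarter <- as.Date(c("1998-01-01", "1998-04-01", "1998-07-01"))

# yearquarter() maps each date to the quarter that contains it
quarter_index <- yearquarter(raw_quarter)
format(quarter_index)
```

Once the column is a yearquarter, as_tsibble() can use it as a regular quarterly index.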

The Region and Purpose with the highest number of overnight trips on average was Sydney for Visiting. I learned how to do this problem after reviewing the work of Orli Khaimova referenced below.

## # A tibble: 1 × 2
##   Region Purpose 
##   <chr>  <chr>   
## 1 Sydney Visiting
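
An alternative way to get the same answer, sketched here on the tourism tsibble that ships with the tsibble package rather than on the Excel file, is to drop the time structure first and summarise. This is not the approach used in the code appendix, just a shorter variant.

```r
library(dplyr)
library(tsibble)

# as_tibble() drops the index, so summarise() averages over all quarters
top_combo <- tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Avg_Trips = mean(Trips), .groups = "drop") |>
  slice_max(Avg_Trips, n = 1)

top_combo
```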

I similarly had to borrow from Orli to create a new tsibble of total trips by state.

## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
## # Groups:    State @ Quarter [640]
##    Quarter State Total_Trips
##      <qtr> <chr>       <dbl>
##  1 1998 Q1 ACT          551.
##  2 1998 Q2 ACT          416.
##  3 1998 Q3 ACT          436.
##  4 1998 Q4 ACT          450.
##  5 1999 Q1 ACT          379.
##  6 1999 Q2 ACT          558.
##  7 1999 Q3 ACT          449.
##  8 1999 Q4 ACT          595.
##  9 2000 Q1 ACT          600.
## 10 2000 Q2 ACT          557.
## # ℹ 630 more rows
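
A shorter route to the same totals, again sketched on the packaged tourism tsibble instead of the Excel file, relies on the fact that summarise() on a tsibble keeps the time index and only collapses the keys that are not in group_by().

```r
library(dplyr)
library(tsibble)

# Grouping by State and summarising sums over Region and Purpose
# within each quarter; Quarter stays as the index automatically
state_totals <- tourism |>
  group_by(State) |>
  summarise(Total_Trips = sum(Trips))

state_totals
```

The result has 8 states by 80 quarters, the same 640 rows as the output above, but without the leftover grouping.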

Exercise 2.10.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?

It looks as if brick production has yearly seasonality and that there was a long trend of increasing production from the 60s to the 80s, after which there was a series of cycles of small booms and busts. It looks as though 1983 may have coincided with a financial crisis in Australia. The development of new building materials, or shifts in popularity among existing ones, may also have influenced brick production independently of macroeconomic building patterns.

gg_lag(aus_production,Bricks)

## Warning: Removed 20 rows containing missing values (gg_lag).

In the gg_lag plot it doesn’t look like we’re seeing strong seasonality in the data. We’ll continue to compare this across the examples.

ACF(aus_production,Bricks) |>
  autoplot()

It looks like we have a slow decay pattern, which could indicate a trend in the data. The dashed blue lines mark the approximate 95% bounds for white noise (±1.96/√T); spikes outside them are significantly different from zero. We can also be on the lookout for sharp cutoffs, which could indicate a good time series model with that number of lags. A sharp cutoff in the ACF is usually read as a sign of an MA model, whereas for an AR model the order is more commonly read from spikes in the first few lags of the PACF.
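
To make the significance bounds concrete, here is a minimal base R sketch on a synthetic trending series. The series is made up for illustration and is not the Bricks data.

```r
# Synthetic series with a pure linear trend (hypothetical, for illustration)
n <- 200
x <- 0.5 * seq_len(n)

# Sample autocorrelations at lags 1 to 20 (lag 0 is dropped)
r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]

# Approximate 95% bounds for a white noise series of length n
bound <- 1.96 / sqrt(n)

# TRUE where a spike falls outside the bounds
significant <- abs(r) > bound
```

For a trending series like this one, the autocorrelations start close to 1 and shrink slowly as the lag grows, which is the slow decay described above.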

From the pelt time series data, we’re looking at the number of snowshoe hare pelts traded per year by the Hudson Bay Company in Canada, and what we primarily notice are large swings between many and few pelts traded. I’m guessing these swings followed booms and busts in the hare population due to hunting: if many pelts were harvested in one year, there weren’t as many hares the next. If there had been hunting caps to manage the hare population, we might have seen more level trading records. However, it could also have been due to the price of other pelts, such as rabbit, leading hunters to focus on hares.

gg_lag(pelt,Hare)

The gg lag plot is fascinating but inscrutable.

ACF(pelt,Hare) |>
  autoplot()

Again, fascinating but inscrutable.

# gg_season and us_gasoline
data("us_gasoline")
gg_season(us_gasoline)

## Plot variable not specified, automatically selected `y = Barrels`

This was a perfect pairing. We can clearly see a year-over-year increase in millions of barrels of gasoline supplied per day. In addition, we can see a recurring annual seasonality where supply increases from January to early summer, stays high during summer, dips after summer and comes back up around the holidays.

gg_lag(us_gasoline)

## Plot variable not specified, automatically selected `y = Barrels`

Again, it’s hard to see the seasonality in the lag plots. Perhaps we’d have to reduce the amount of information to look at segments of the total time series data at a time.
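
One way to try that is sketched below, assuming fpp3 is installed. The 2010 cutoff and the chosen lags (roughly a quarter, half a year and a full year in weeks) are arbitrary choices for illustration, not values from the exercise.

```r
library(fpp3)

# Keep only the later part of the series (cutoff year is an arbitrary choice)
recent_gas <- us_gasoline |>
  filter(year(Week) >= 2010)

# Plot a few selected lags as points instead of the default nine lags
lag_plot <- recent_gas |>
  gg_lag(Barrels, lags = c(1, 13, 26, 52), geom = "point")

lag_plot
```

If there is annual seasonality, the points at lag 52 should sit closer to the diagonal than those at lag 26.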

ACF(us_gasoline) |>
  autoplot()

## Response variable not specified, automatically selected `var = Barrels`

Looks like we have a strong trend with the slow decay in the ACF plot.

References

These exercises come from ‘Forecasting: Principles and Practice’ by Rob J Hyndman and George Athanasopoulos, 3rd Ed
https://otexts.com/fpp3/

I learned about using group_by() for exercise 2 from Shana Green’s work:
https://rpubs.com/sagreen131/1080464

I borrowed heavily from the work of Orli Khaimova to complete the end of the fifth problem: https://rpubs.com/OrliKhaim/DATA624_HW1

Code Appendix

### Code for Exercise 1

# Load Libraries
#install.packages('fpp3')
library(fpp3)
library(gridExtra)

# Load Data
data("aus_production")
data("pelt")
data("gafa_stock")
data("vic_elec")

# Find out about the data in each series
help(aus_production)
help(pelt)
help(gafa_stock)
help(vic_elec)

# Plot the four time series, with a custom title and axis labels on the last one
plot1 <- autoplot(aus_production,Bricks)
plot2 <- autoplot(pelt,Lynx)
plot3 <- autoplot(gafa_stock,Close)
plot4 <- autoplot(vic_elec,Demand) +
  ggtitle("for Victoria, Australia") + 
  ylab("Electricity Demand in MWh") +
  xlab("Half-hourly")
grid.arrange(plot1, plot2, plot3, plot4, ncol=2)


### Code for Exercise 2

# Load Libraries
library(dplyr)

# Displaying the Symbol Date and Close after
# Filtering out, by Symbol,
# the date with the highest closing price
gafa_stock |>
  group_by(Symbol) |>
  dplyr::filter(Close == max(Close)) |>
  select(Symbol, Date, Close)


### Code for Exercise 3

# Reading the data directly into R
tute1 <- read.csv(url("https://bit.ly/fpptute1"))

# Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

# Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")


# Reprinting the graph above without facet_grid()
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()


### Code for Exercise 4

# Load libraries and data
#install.packages("USgas")
library(USgas)

# Turn it into a tsibble
mytimeseries <- us_total |>
  as_tsibble(key = state, index = year)

# Produce the Graphs
mytimeseries |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) |>
  ggplot(aes(x = year, y = y, colour = state)) +
  geom_line() +
  facet_grid(state ~ ., scales = "free_y") +
  ylab("") +
  xlab("") +
  theme(legend.position="none")


### Code for Exercise 5

# Loading libraries and data
library(readxl)
library(tsibble)

# Reading the data from the downloaded excel
tourismus <- read_excel("tourism.xlsx")

# Pulling the tourism tsibble for comparison
data("tourism")

# Convert the data to time series
tourismus <- tourismus |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)


# finding the region and purpose with the highest number of overnight trips on average
tourismus |>
  group_by(Region, Purpose) |>
  mutate(Avg_Trips = mean(Trips)) |>
  ungroup() |>
  filter(Avg_Trips == max(Avg_Trips)) |>
  distinct(Region, Purpose)


# Creating a new tsibble of total trips by state
tourismus <- read_excel("tourism.xlsx")

tourismus |>
  group_by(Quarter, State) |>
  mutate(Quarter = yearquarter(Quarter),
         Total_Trips = sum(Trips)) |>
  select(Quarter, State, Total_Trips) |>
  distinct() |>
  as_tsibble(index = Quarter,
             key = State)


### Code for Exercise 8

# autoplot and Bricks
autoplot(aus_production,Bricks) +
  ggtitle("autoplot - Quarterly Production of Brick in Australia")


gg_lag(aus_production,Bricks)


ACF(aus_production,Bricks) |>
  autoplot()


# gg_subseries and Hare
gg_subseries(pelt,Hare) +
  ggtitle("subseries - Hare pelt trading records")


gg_lag(pelt,Hare)
ACF(pelt,Hare) |>
  autoplot()


# gg_season and us_gasoline
data("us_gasoline")
gg_season(us_gasoline)


gg_lag(us_gasoline)
ACF(us_gasoline) |>
  autoplot()