library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.2     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.2
## ✔ lubridate   1.9.2     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2

## Warning: package 'tsibble' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(USgas)

1.

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series.

?aus_production
?pelt
?gafa_stock
?vic_elec

What is the time interval of each series?

aus_production -> quarters pelt -> years gafa_stock -> days vic_elec -> half-hours

Use autoplot() to produce a time plot of each series.

autoplot(aus_production, Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

autoplot(pelt, Lynx)

autoplot(gafa_stock, Close)

For the last plot, modify the axis labels and title.

autoplot(vic_elec, Demand) +
  labs(title = "Electricity demand for Victoria, Australia",
       x = "Time (recorded half-hourly)",
       y = "Electricity demand (MWh)")

2.

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

day_max_aapl_close <- gafa_stock %>% 
  filter(Symbol == "AAPL") %>%
  filter(Close == max(Close)) %>%
  pull(Date)
print(paste("The day on which AAPL had its peak closing price was", day_max_aapl_close))

## [1] "The day on which AAPL had its peak closing price was 2018-10-03"

day_max_amazon_close <- gafa_stock %>% 
  filter(Symbol == "AMZN") %>%
  filter(Close == max(Close)) %>%
  pull(Date)
print(paste("The day on which Amazon had its peak closing price was", day_max_amazon_close))

## [1] "The day on which Amazon had its peak closing price was 2018-09-04"

day_max_fb_close <- gafa_stock %>% 
  filter(Symbol == "FB") %>%
  filter(Close == max(Close)) %>%
  pull(Date)
print(paste("The day on which Facebook had its peak closing price was", day_max_fb_close))

## [1] "The day on which Facebook had its peak closing price was 2018-07-25"

day_max_google_close <- gafa_stock %>% 
  filter(Symbol == "GOOG") %>%
  filter(Close == max(Close)) %>%
  pull(Date)
print(paste("The day on which Google had its peak closing price was", day_max_google_close))

## [1] "The day on which Google had its peak closing price was 2018-07-26"

3.

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(tute1)

Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

When not using facet_grid(), the three time series are positioned on one plot with one above the other on the same y-axis, but with it they are three different plots with separate y-axes (but the same x-axis) stacked on top of each other.

4.

The USgas package contains data on the demand for natural gas in the US.

a.) Install the USgas package. b.) Create a tsibble from us_total with year as the index and state as the key.

us_total <- us_total %>%
  as_tsibble(key = state,
            index = year)

c.) Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

new_england = c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")

us_total %>%
  filter(state %in% new_england) %>%
  autoplot(y) +
    labs(title = "New England Annual Gas Consumption",
       x = "Year",
       y = "Natural Gas Consumption (million cubic feet)")

5.

a.) Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

tourism_df <- readxl::read_excel("tourism.xlsx")

b.) Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_tsibble <- tourism_df %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)

c.) Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism_tsibble %>%
  as_tibble() %>%
  select(-Quarter) %>%
  group_by(Region, Purpose) %>%
  summarise(average_trips = mean(Trips)) %>%
  arrange(desc(average_trips))

## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.

## # A tibble: 304 × 3
## # Groups:   Region [76]
##    Region          Purpose  average_trips
##    <chr>           <chr>            <dbl>
##  1 Sydney          Visiting          747.
##  2 Melbourne       Visiting          619.
##  3 Sydney          Business          602.
##  4 North Coast NSW Holiday           588.
##  5 Sydney          Holiday           550.
##  6 Gold Coast      Holiday           528.
##  7 Melbourne       Holiday           507.
##  8 South Coast     Holiday           495.
##  9 Brisbane        Visiting          493.
## 10 Melbourne       Business          478.
## # ℹ 294 more rows

Visiting Sydney had the greatest number of overnight trips on average.

d.) Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

total_trips_by_state <- tourism_tsibble %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips))

head(total_trips_by_state)

## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter total_trips
##   <chr>   <qtr>       <dbl>
## 1 ACT   1998 Q1        551.
## 2 ACT   1998 Q2        416.
## 3 ACT   1998 Q3        436.
## 4 ACT   1998 Q4        450.
## 5 ACT   1999 Q1        379.
## 6 ACT   1999 Q2        558.

8.

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?

Total Private Employed from us_employment

total_private <- us_employment %>%
  filter(Title == 'Total Private')

autoplot(total_private, Employed)

gg_season(total_private, Employed, labels = "both")

gg_subseries(total_private, Employed)

ACF(total_private, Employed) %>%
  autoplot()

The total employed in the private sector shows an increasing trend over time, which we can see clearly in the time series. The seasonal plot reveals that there is no seasonality, because each each year shows roughly constant employment throughout the months. The subseries likewise shows both a lack of seasonality, because the plot for each month looks roughly the same, and an upward trend. The correlogram is decreasing as the lag increases because of the trend: data points closer together in time tend to have more similar values when there is a clear trend, and data points further away tend to have less similar values.

Bricks from aus_production

autoplot(aus_production, Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(aus_production, Bricks, labels = "both")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).

## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).

gg_subseries(aus_production, Bricks)

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

ACF(aus_production, Bricks) %>%
  autoplot()

Clay brick production showed an increasing trend from 1960-1980, and a decreasing trend from 1980-2000. The seasonal plot reveals that production had a peak in Q3 and a trough in Q1. The subseries also shows that Q3 had the most production and Q1 had the least. Additionally, the subseries plot also shows the increasing trend from 1960-1980 and decreasing trend from 1980-2000. The correlogram shows peaks whenever the lag is a multiple of 4 because of the seasonality of the time series; data from the same quarter are more correlated. The correlogram is decreasing as the lag increases because of the trend.

Hare from pelt

autoplot(pelt, Hare)

#gg_season(pelt, Hare, period = year, labels = "both")

#gg_subseries(pelt, Hare)

ACF(pelt, Hare) %>%
  autoplot()

The time series for the number of Snowshoe Hare pelts traded is cyclic in nature. It does not show a trend. We cannot create seasonal plots for the time series because the time interval of the data is a year. The time series is likely cyclic because trading reflects the business cycle, and may also reflect fashion trends, which are known to repeat. The correlogram likewise shows that the data is cyclic because as the lag increases, the correlation increases and decreases, forming clear peaks and troughs (which could almost be approximated by a cosine function.) The data is not merely white noise, because the autocorrelation coefficients are significant roughly every five years.

H02 Cost from PBS

h02_atc2 <- PBS %>%
  filter(ATC2 == 'H02')

autoplot(h02_atc2, Cost)

gg_season(h02_atc2, Cost, labels = "both")

gg_subseries(h02_atc2, Cost)

ACF(h02_atc2, Cost) %>%
  autoplot()

From the plots we can see that there is strong seasonality in the payments for H02 prescriptions. For individuals receiving concessional scripts, co-payments peak around April, then decrease as more and more individuals hit their payment threshold, causing safety net expenditures to increase. The high number of safety net expenditures in January is likely from purchases made in late December, where the data was recorded in January. Safety net expenditures show this seasonality for both the general public and those receiving concessional scripts, but the co-payments made by the general public do not show this seasonality. This is likely because their safety net threshold is higher, and fewer exceed it, so more co-payments are made consistently throughout the year.

Barrels from us_gasoline

autoplot(us_gasoline, Barrels)

gg_season(us_gasoline, Barrels, labels = "both")

gg_subseries(us_gasoline, Barrels)

ACF(us_gasoline, Barrels, lag_max = 52) %>%
  autoplot()

US motor gasoline supplied shows seasonality as well as an upward trend from 1991 on. Gasoline in million barrels per day peaks in the summer and reaches its low point around January. The correlogram also shows both seasonality and the upward trend. It shows that there is a trend because the closest data points have the strongest correlation. It also shows the seasonality because we can see that the correlation is weakest at 26 weeks (half a year), and rises again until 52 weeks (one full year.) The correlation at 52 weeks doesn’t reach the strength of the correlation at 1 week because of the existence of the upward trend, but the values in the same season again show a relatively greater correlation.

DATA 624 Homework 1: Time Series Graphics

Molly Siebecker

2024-09-08

1.

2.

3.

4.

5.

8.

Total Private Employed from us_employment

Bricks from aus_production

Hare from pelt

H02 Cost from PBS

Barrels from us_gasoline