Exercises: 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8

Exercise 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series.

help(aus_production)
help(pelt)
help(gafa_stock)
help(vic_elec)
aus_production.desc <- "Quarterly production of selected commodities in Australia." 
pelt.desc <- "Pelt trading records"
gafa_stock.desc <- "GAFA (Google, Amazon, Facebook, Apple) stock prices"
vic_elec.desc <- "Half-hourly electricity demand for Victoria, Australia"

What is the time interval of each series?

The time interval for aus_production is quarterly. The time interval for pelt is annual. The time interval for gafa_stock is daily (on irregular trading days). The time interval for vic_elec is half-hourly.
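As a quick confirmation, tsibble's interval() reports each series' interval directly (a minimal sketch):

interval(aus_production)  # quarterly
interval(pelt)            # annual
interval(gafa_stock)      # irregular (!), indexed by trading day
interval(vic_elec)        # 30 minutes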

Use autoplot() to produce a time plot of each series.

autoplot(aus_production, Bricks) + labs(title = aus_production.desc)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

autoplot(pelt, Lynx) + labs(title = pelt.desc)

autoplot(gafa_stock, Close) + labs(title = gafa_stock.desc)

autoplot(vic_elec, Demand) + labs(title = vic_elec.desc)

For the last plot, modify the axis labels and title.

autoplot(vic_elec, Demand) +
  labs(
    title = vic_elec.desc, 
    subtitle = "Plotting total electrical demand in MWh") +
  xlab("Date and Time (in 30 min intervals)") + 
  ylab("Electricity Demand (MWh)")

Exercise 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>% 
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

Exercise 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. You can read the data into R

tute1 <- readr::read_csv(file.path(projPath, "tute1.csv"))
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)

b. Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

c. Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

When I don’t include facet_grid(), all of the series are drawn on a single panel with a shared y-axis, which makes series with different scales harder to compare.

Exercise 2.4

The USgas package contains data on the demand for natural gas in the US.

a. Install the USgas package.

I installed USgas in my setup block at the top.
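For completeness, the setup roughly amounts to the following (a sketch; the install line is commented out since it only needs to run once):

# install.packages("USgas")
library(USgas)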

b. Create a tsibble from us_total with year as the index and state as the key.

us_total_ts <- us_total %>%
  as_tsibble(index = year, key = state) 

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

us_total_ts_NE <- us_total_ts %>%
  filter(state %in% c( "Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island"))

ggplot(data = us_total_ts_NE,
    aes(x = year, y = y, colour = state)
  ) + 
  geom_line() + 
  labs(title = "Annual Natural Gas Consumption by State for the New England area") +
  ylab("Gas Consumption")

Exercise 2.5

a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

tourism_xlsx <- readxl::read_excel(file.path(projPath, "tourism.xlsx"))

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

glimpse(tsibble::tourism)
## Rows: 24,320
## Columns: 5
## Key: Region, State, Purpose [304]
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region  <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State   <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips   <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…
tourism_ts <- tourism_xlsx %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)

glimpse(tourism_ts)
## Rows: 24,320
## Columns: 5
## Key: Region, State, Purpose [304]
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region  <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State   <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips   <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…

c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism_ts %>%
  group_by(Region, Purpose) %>%
  summarize(avg_trips = mean(Trips),
    .groups = "drop") %>%
  select(Region, Purpose, avg_trips) %>%
  arrange(desc(avg_trips))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
## # A tsibble: 24,320 x 4 [1Q]
## # Key:       Region, Purpose [304]
##    Region          Purpose  avg_trips Quarter
##    <chr>           <chr>        <dbl>   <qtr>
##  1 Melbourne       Visiting      985. 2017 Q4
##  2 Sydney          Business      948. 2001 Q4
##  3 Sydney          Visiting      921. 2016 Q4
##  4 Sydney          Visiting      920. 2017 Q4
##  5 Sydney          Visiting      916. 2017 Q1
##  6 South Coast     Holiday       915. 1998 Q1
##  7 North Coast NSW Holiday       906. 2016 Q1
##  8 Sydney          Business      892. 2017 Q3
##  9 Sydney          Business      884. 2017 Q2
## 10 Sydney          Visiting      882. 2013 Q4
## # ℹ 24,310 more rows

Summarising a tsibble always keeps its index (Quarter), so that grouping can’t simply be dropped. The simplest workaround is to strip the temporal structure with as_tibble() first and work with the non-tsibble data.

tourism_ts %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarize(avg_trips = mean(Trips),
    .groups = "drop") %>%
  select(Region, Purpose, avg_trips) %>%
  arrange(desc(avg_trips))
## # A tibble: 304 × 3
##    Region          Purpose  avg_trips
##    <chr>           <chr>        <dbl>
##  1 Sydney          Visiting      747.
##  2 Melbourne       Visiting      619.
##  3 Sydney          Business      602.
##  4 North Coast NSW Holiday       588.
##  5 Sydney          Holiday       550.
##  6 Gold Coast      Holiday       528.
##  7 Melbourne       Holiday       507.
##  8 South Coast     Holiday       495.
##  9 Brisbane        Visiting      493.
## 10 Melbourne       Business      478.
## # ℹ 294 more rows

Now I can see that, across the full range of time, the average number of trips was highest for Sydney with the Visiting purpose, at roughly 747 trips per quarter.
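As a cross-check, dplyr's slice_max() pulls just the top row directly (a sketch using the same as_tibble() approach):

tourism_ts %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarize(avg_trips = mean(Trips), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)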

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

Since the result should still be a tsibble, the implied structure has Quarter as the index, State as the key, and a column of total trips.

tourism_ts_d <- tourism_ts %>% 
  group_by(State) %>%
  summarize(total_trips = sum(Trips)) 
head(tourism_ts_d)
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter total_trips
##   <chr>   <qtr>       <dbl>
## 1 ACT   1998 Q1        551.
## 2 ACT   1998 Q2        416.
## 3 ACT   1998 Q3        436.
## 4 ACT   1998 Q4        450.
## 5 ACT   1999 Q1        379.
## 6 ACT   1999 Q2        558.
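To confirm the summarised object is still a proper quarterly tsibble keyed by State, a quick sketch using tsibble helpers:

is_tsibble(tourism_ts_d)   # TRUE
key_vars(tourism_ts_d)     # "State"
interval(tourism_ts_d)     # quarterly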

Exercise 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Exploratory Analysis Below:

Employed from us_employment

#help(us_employment)
#help(aus_production)
#help(pelt)
#help(PBS)
#help(us_gasoline)
us_employment.desc <- "US monthly employment data"

# Exploring
head(us_employment)
## # A tsibble: 6 x 4 [1M]
## # Key:       Series_ID [1]
##      Month Series_ID     Title         Employed
##      <mth> <chr>         <chr>            <dbl>
## 1 1939 Jan CEU0500000001 Total Private    25338
## 2 1939 Feb CEU0500000001 Total Private    25447
## 3 1939 Mar CEU0500000001 Total Private    25833
## 4 1939 Apr CEU0500000001 Total Private    25801
## 5 1939 May CEU0500000001 Total Private    26113
## 6 1939 Jun CEU0500000001 Total Private    26485
# Plotting
autoplot(us_employment %>% filter(Title == "Total Private"), Employed)

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_season(Employed)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_subseries( Employed)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_lag(Employed, lags = 1:12)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

us_employment %>% 
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot()

Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?

It looks like there is a positive trend in employment. For seasonality, there seems to be a subtle increase in roughly the first half of the year, after which it appears flatter. The lag plots imply strong autocorrelation. There are some unusual dips in the mid 1970s, early 1980s, early 1990s, early 2000s, and 2008.
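To look more closely at the most recent of those dips, filter_index() can restrict the plot to a window around 2008 (a sketch; the window bounds are my choice):

us_employment %>%
  filter(Title == "Total Private") %>%
  filter_index("2006" ~ "2012") %>%
  autoplot(Employed) +
  labs(title = "Total Private employment around the 2008 downturn")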

Bricks from aus_production

#aus_production.desc <- "Quarterly production of selected commodities in Australia." # already done earlier


# Plotting
aus_production %>%
  autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_season(y = Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_subseries(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%  
  gg_lag(Bricks, lags = 1:4)
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>%  
  ACF(Bricks) %>%
  autoplot()

Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?

Australian clay brick production shows signs of seasonality, cyclicity and trend. Seasonally, production is lowest in Q1 and highest in Q2 and Q3. Cyclically, there are drops in production followed by recoveries roughly every 4-6 years. The lag and autocorrelation plots imply moderate autocorrelation. Production trended upward from 1956 to about 1980 and has trended downward since. (I’m a little unsure whether the proper way to look at this would be to divide the data into pre- and post-1980 periods.)
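One way to examine that split is to plot the two periods separately with filter_index() (a sketch; the 1980 cut-off is just my eyeballed breakpoint):

aus_production %>%
  filter_index(. ~ "1980 Q4") %>%
  autoplot(Bricks) +
  labs(title = "Brick production up to 1980")

aus_production %>%
  filter_index("1981 Q1" ~ .) %>%
  autoplot(Bricks) +
  labs(title = "Brick production from 1981 onwards")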

Hare from pelt

#pelt.desc <- "Pelt trading records" # already done earlier

# Plotting
pelt %>%
  autoplot(Hare)

pelt %>%
  gg_subseries(Hare) 

pelt %>%  
  gg_lag(Hare)

pelt %>%  
  ACF(Hare) %>%
  autoplot()

Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?

There is definite cyclicity, with spikes and valleys roughly every ten years. It’s hard to assess seasonality with an annual data set.
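Extending the ACF beyond the default number of lags makes the roughly ten-year cycle easier to see (a sketch):

pelt %>%
  ACF(Hare, lag_max = 40) %>%
  autoplot()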

“H02” Cost from PBS

PBS.desc <- "Monthly Medicare Australia prescription data"

# Exploring
unique(PBS$ATC2)
##  [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A09" "A10" "A11" "A12" "A14"
## [13] "A15" "B01" "B02" "B03" "B05" "C01" "C02" "C03" "C04" "C05" "C07" "C08"
## [25] "C09" "C10" "D"   "D01" "D02" "D04" "D05" "D06" "D07" "D08" "D10" "D11"
## [37] "G01" "G02" "G03" "G04" "H01" "H02" "H03" "H04" "H05" "J01" "J02" "J04"
## [49] "J05" "J06" "J07" "L01" "L02" "L03" "L04" "M01" "M02" "M03" "M04" "M05"
## [61] "N02" "N03" "N04" "N05" "N06" "N07" "P01" "P02" "P03" "R"   "R01" "R03"
## [73] "R05" "R06" "S"   "S01" "S02" "S03" "V01" "V03" "V04" "V06" "V07" "Z"
# Plotting
PBS %>% 
  filter(ATC2 == "H02") %>%
  autoplot(Cost) + 
  facet_grid(Concession ~ Type, scales = "free_y")

PBS %>% 
  filter(ATC2 == "H02") %>%
  gg_season(Cost)

PBS %>% 
  filter(ATC2 == "H02") %>%
  gg_subseries(Cost)

#PBS %>% 
#  filter(ATC2 == "H02") %>%
#  gg_lag(Cost, lags = 1:12)
# The above code caused an error because gg_lag() needs a single time series, so I'm identifying the unique series first

PBS %>%
  filter(ATC2 == "H02") %>%
  as_tibble() %>% # because without this, it kept using the time groupings too                          
  group_by(Concession, Type, ATC1, ATC2) %>%    
  summarise(
    Total_Cost = sum(Cost),                    
    Total_Scripts = sum(Scripts)
  )
## `summarise()` has grouped output by 'Concession', 'Type', 'ATC1'. You can
## override using the `.groups` argument.
## # A tibble: 4 × 6
## # Groups:   Concession, Type, ATC1 [4]
##   Concession   Type        ATC1  ATC2  Total_Cost Total_Scripts
##   <chr>        <chr>       <chr> <chr>      <dbl>         <dbl>
## 1 Concessional Co-payments H     H02   100043176.      16884470
## 2 Concessional Safety net  H     H02    45609431.       5305140
## 3 General      Co-payments H     H02     3799151.        508692
## 4 General      Safety net  H     H02     7251449.       1301852
PBS %>%
  filter(ATC2 == "H02", Concession == "Concessional", Type == "Co-payments") %>%
  gg_lag(Cost, lags = 1:12) +
  labs(title = "Lag Plot: Concessional Co-payments") 

PBS %>%
  filter(ATC2 == "H02", Concession == "Concessional", Type == "Safety net") %>%
  gg_lag(Cost, lags = 1:12) +
  labs(title = "Lag Plot: Concessional Safety Net")

PBS %>%
  filter(ATC2 == "H02", Concession == "General", Type == "Co-payments") %>%
  gg_lag(Cost, lags = 1:12) +
  labs(title = "Lag Plot: General Co-payments")

PBS %>%
  filter(ATC2 == "H02", Concession == "General", Type == "Safety net") %>%
  gg_lag(Cost, lags = 1:12) +
  labs(title = "Lag Plot: General Safety Net")

PBS %>% 
  filter(ATC2 == "H02") %>%
  ACF(Cost) %>%
  autoplot()

Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?

There are four different series here: Concessional Co-payments, General Co-payments, Concessional Safety Net and General Safety Net. We see seasonality in all but General Co-payments. The safety-net costs dip in February and trend upward through the year until the following February, while the concessional co-payments show a gentle peak around April. Over the years, the Concessional costs have trended upward. An interesting period is 1994-1997, when General Safety Net and General Co-payments costs dropped and then stayed lower.
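It can also help to collapse the four series into a single monthly total; summarising a tsibble keeps the Month index, so the result is still a valid time series (a sketch):

PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(TotalCost = sum(Cost)) %>%
  autoplot(TotalCost) +
  labs(title = "Total monthly H02 cost, all concession/payment types combined")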

Barrels from us_gasoline.

us_gasoline.desc <- "US finished motor gasoline product supplied"

# Plotting
us_gasoline %>%
  autoplot(Barrels)

us_gasoline %>%
  gg_season(Barrels)

us_gasoline %>%
  gg_subseries(Barrels)

us_gasoline %>%
  gg_lag(Barrels)

us_gasoline %>%
  ACF(Barrels) %>%
  autoplot()

Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?

Barrels of gasoline supplied trended steadily upward until around 2006, trended down from 2006 to 2012, then trended up again. The lag and ACF plots show strong autocorrelation.
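To inspect the 2006-2012 decline more closely, filter_index() can restrict the weekly series to that window (a sketch; the window bounds are my choice):

us_gasoline %>%
  filter_index("2005" ~ "2013") %>%
  autoplot(Barrels) +
  labs(title = "Weekly US gasoline supplied, 2005-2013")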