library(tidyverse)
library(fpp3)

EXERCISE 2.1

Explore the following four time series: - Bricks from aus_production - Lynx from pelt - Close from gafa_stock - Demand from vic_elec.

a. Use ? (or help()) to find out about the data in each series.

# read in the time series 
?aus_production
?pelt
?gafa_stock
?vic_elec

b. What is the time interval of each series?

aus_production

head(aus_production)
## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7
# Time period 
min(aus_production$Quarter)
## <yearquarter[1]>
## [1] "1956 Q1"
## # Year starts on: January
max(aus_production$Quarter)
## <yearquarter[1]>
## [1] "2010 Q2"
## # Year starts on: January
# Interval of each observation 
interval(aus_production)
## <interval[1]>
## [1] 1Q

The interval for the aus_production series is quarterly from 1956 to the second quarter (Q2) of 2010

pelt

head(pelt)
## # A tsibble: 6 x 3 [1Y]
##    Year  Hare  Lynx
##   <dbl> <dbl> <dbl>
## 1  1845 19580 30090
## 2  1846 19600 45150
## 3  1847 19610 49150
## 4  1848 11990 39520
## 5  1849 28040 21230
## 6  1850 58000  8420
# Time period 
min(pelt$Year)
## [1] 1845
max(pelt$Year)
## [1] 1935
# Interval
interval(pelt)
## <interval[1]>
## [1] 1Y

For the pelt time series data was collected over one year intervals from 1845 to 1935

gafa_stock

head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close    Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
## 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
## 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
## 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
## 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
## 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
## 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
# Time period 
min(gafa_stock$Date)
## [1] "2014-01-02"
max(gafa_stock$Date)
## [1] "2018-12-31"
# Interval
interval(gafa_stock)
## <interval[1]>
## [1] !

The interval of the stock data is 1 business day from January 2nd, 2014 to December 31st, 2018.

head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE
# Time period 
min(vic_elec$Time)
## [1] "2012-01-01 AEDT"
max(vic_elec$Time)
## [1] "2014-12-31 23:30:00 AEDT"
# Interval
interval(vic_elec)
## <interval[1]>
## [1] 30m

The interval for the vic_elec time series is every 30 minutes from January 1st, 2012 at midnight to December 31st, 2014 at 11:30 PM

c. Use autoplot() to produce a time plot of each series.

# aus_production - Bricks

aus_production %>% autoplot(Bricks) + labs(y = " Number of Bricks (millions)", title = "Australian Clay Brick Production")

# pelt - Lynx

pelt %>% autoplot(Lynx) + labs(y = "Number of pelts", title = "Canadian Lynx Traded")

# gafa_stock

gafa_stock %>% autoplot(Close) + labs(ylabel = "US Dollars", title = "Historical Closing Stock Prices")

# vic_elec - Demand

vic_elec %>% autoplot(Demand)

d. For the last plot, modify the axis labels and title.

vic_elec %>% autoplot(Demand) + labs(x = "Year", y = "Demand (MWh)", title = "Electricity Demand for Victoria, Australia")

# EXERCISE 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock

head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close    Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
## 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
## 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
## 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
## 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
## 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
## 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
# Get the unique Stocks in the data set 
unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB"   "GOOG"
gafa_stock %>% autoplot(Close)

# Peak close for AAPL
gafa_stock %>% filter(Symbol == "AAPL") %>% arrange(desc(Close)) %>% head(1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Symbol`, `Date` first.
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800

The peak closing price for AAPL - Apple Inc was $232.07 on 10/03/2018

# Peak close for AMZN
gafa_stock %>% filter(Symbol == "AMZN") %>% arrange(desc(Close)) %>% head(1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Symbol`, `Date` first.
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close  Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 AMZN   2018-09-04 2026. 2050.  2013 2040.     2040. 5721100

The peak closing price for AMZN - Amazon.com Inc was $2039.51 on 09/04/2018

# Peak close for FB
gafa_stock %>% filter(Symbol == "FB") %>% arrange(desc(Close)) %>% head(1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Symbol`, `Date` first.
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 FB     2018-07-25  216.  219.  214.  218.      218. 58954200

The peak closing price for FB - Meta Platforms Inc was $217.50 on 07/25/2018

# Peak close for GOOG
gafa_stock %>% filter(Symbol == "GOOG") %>% arrange(desc(Close)) %>% head(1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Symbol`, `Date` first.
## # A tsibble: 1 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close  Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
## 1 GOOG   2018-07-26  1251 1270. 1249. 1268.     1268. 2405600

The peak closing price for GOOG - Alphabet Inc was $1268.33 on 07/26/2018

EXERCISE 2.3

a. You can read the data into R with the following script:

tute1 <- read_csv("https://raw.githubusercontent.com/D-hartog/DATA624/refs/heads/main/HW1/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
##   Quarter    Sales AdBudget   GDP
##   <date>     <dbl>    <dbl> <dbl>
## 1 1981-03-01 1020.     659.  252.
## 2 1981-06-01  889.     589   291.
## 3 1981-09-01  795      512.  291.
## 4 1981-12-01 1004.     614.  292.
## 5 1982-03-01 1058.     647.  279.
## 6 1982-06-01  944.     602   254

b. Convert the data to time series

mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter, fiscal_start = 1)) %>%
  as_tsibble(index = Quarter)

head(mytimeseries)
## # A tsibble: 6 x 4 [1Q]
##   Quarter Sales AdBudget   GDP
##     <qtr> <dbl>    <dbl> <dbl>
## 1 1981 Q1 1020.     659.  252.
## 2 1981 Q2  889.     589   291.
## 3 1981 Q3  795      512.  291.
## 4 1981 Q4 1004.     614.  292.
## 5 1982 Q1 1058.     647.  279.
## 6 1982 Q2  944.     602   254

c. Construct time series plots of each of the three series

mytimeseries %>% 
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid()

mytimeseries %>% 
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

When the facet grid function is not added, each variable is plotted on the same graph. Since they all share the y - axis the detail of the trends gets lost, especially for GDP.

EXERCISE 2.4

The USgas package contains data on the demand for natural gas in the US.

a. Install the USgas package.

# install.packages("USgas")

library(USgas)
## Warning: package 'USgas' was built under R version 4.3.3
head(us_total)
##   year   state      y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343

b. Create a tsibble from us_total with year as the index and state as the key.

# 
ustotal_tsibble <- us_total %>% as_tsibble(index = year, key = state)

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

ustotal_tsibble %>% filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>% autoplot(y) + labs(x = "Year", y= "Consumption", title = "Natural Gas Consumption Among New England States")

EXERCISE 2.5

a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

Download file from here

tourism <- readxl::read_excel("tourism.xlsx")

head(tourism)
## # A tibble: 6 × 5
##   Quarter    Region   State           Purpose  Trips
##   <chr>      <chr>    <chr>           <chr>    <dbl>
## 1 1998-01-01 Adelaide South Australia Business  135.
## 2 1998-04-01 Adelaide South Australia Business  110.
## 3 1998-07-01 Adelaide South Australia Business  166.
## 4 1998-10-01 Adelaide South Australia Business  127.
## 5 1999-01-01 Adelaide South Australia Business  137.
## 6 1999-04-01 Adelaide South Australia Business  200.

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_ts <- tourism %>% mutate(Quarter = yearquarter(Quarter, fiscal_start = 1)) %>% as_tsibble(index = Quarter, key = c(Region, State, Purpose))

head(tourism_ts)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.

c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism %>% group_by(Region, Purpose) %>% summarise(Avg_overnight_trips = mean(Trips)) %>% arrange(desc(Avg_overnight_trips)) %>% head(1)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 1 × 3
## # Groups:   Region [1]
##   Region Purpose  Avg_overnight_trips
##   <chr>  <chr>                  <dbl>
## 1 Sydney Visiting                747.

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

state_total_trips_ts <- tourism_ts %>% group_by(State) %>% summarise(Total_trips = sum(Trips))

head(state_total_trips_ts)
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter Total_trips
##   <chr>   <qtr>       <dbl>
## 1 ACT   1998 Q1        551.
## 2 ACT   1998 Q2        416.
## 3 ACT   1998 Q3        436.
## 4 ACT   1998 Q4        450.
## 5 ACT   1999 Q1        379.
## 6 ACT   1999 Q2        558.

EXCERCISE 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series:

a. “Total Private” Employed from us_employment

head(us_employment)
## # A tsibble: 6 x 4 [1M]
## # Key:       Series_ID [1]
##      Month Series_ID     Title         Employed
##      <mth> <chr>         <chr>            <dbl>
## 1 1939 Jan CEU0500000001 Total Private    25338
## 2 1939 Feb CEU0500000001 Total Private    25447
## 3 1939 Mar CEU0500000001 Total Private    25833
## 4 1939 Apr CEU0500000001 Total Private    25801
## 5 1939 May CEU0500000001 Total Private    26113
## 6 1939 Jun CEU0500000001 Total Private    26485
# Filter data frame for Total private employed 

total_private <- us_employment %>% filter(Title == "Total Private")

Can you spot any seasonality, cyclicity and trend?

total_private %>% autoplot(Employed) + labs(title = "US Employment: Total Private")

total_private %>% filter(year(Month) > 1960 & year(Month) < 1965) %>% autoplot(Employed)

Using the autoplot() function we can see a clear positive trend from 1939 to 2020 and the trend appears to be linear. There also appears to be some cyclic behavior. In the first graph above, the cycles are not consistently spaced, occurring between approximately 5-10 years. When we zoom in to the years 1961-1965 there are what might appear to be seasonal trends.

What do you learn about the series? The series contains observations from 1939 to 2020 in one month intervals. Despite there being some fluctuations with in a year and over several years, the trend in the number of employed persons continues to rise.

What can you say about the seasonal patterns?

total_private %>% gg_season(Employed, period = "year") + labs(title = "Seasonal Trends in US Employment: Total Private ")

Using gg_season to plot the trends across every year, we see small linear trends from the beginning to the end of each year but not a clear seasonal pattern as was initally visualized in the zoomed in autoplot.

total_private %>% gg_lag(period = "month", lags = 1:24)
## Plot variable not specified, automatically selected `y = Employed`

total_private %>% ACF(Employed, lag_max = 100) %>% autoplot()

There are strong autocorrelations between the lags and the autocorrelation plot demonstrates a slow decrrease in the strength of the correlations as the lags get further out. This aligns with the linear trend we visualized when plotting the whole time series.

Can you identify any unusual years?

total_private %>% filter(year(Month) > 2005 & year(Month) < 2011) %>% autoplot(Employed)

The most obvious unusual year that broke from yearly trend was in 2008. This was an unusual year as there was a larger drop in employment starting in the middle of the year that continued downward in the start of 2009. This is in contrast to other years where we might see a slight drop towards the end of the year followed by a steady incline at the start of the next year. This can be explained by the 2008 financial crisis and recession.

b. Bricks from aus_production

bricks <- aus_production %>% select(Bricks)

head(bricks)
## # A tsibble: 6 x 2 [1Q]
##   Bricks Quarter
##    <dbl>   <qtr>
## 1    189 1956 Q1
## 2    204 1956 Q2
## 3    208 1956 Q3
## 4    197 1956 Q4
## 5    187 1957 Q1
## 6    214 1957 Q2
min(bricks$Quarter)
## <yearquarter[1]>
## [1] "1956 Q1"
## # Year starts on: January
max(bricks$Quarter)
## <yearquarter[1]>
## [1] "2010 Q2"
## # Year starts on: January

Can you spot any seasonality, cyclic and trend?

bricks %>% autoplot(Bricks) + labs(title = "Australian Clay Brick Production")

bricks %>% filter(year(Quarter) <= 1965) %>% autoplot(Bricks)

Plotting the time series does reveal an initial upward trend in the number of clay bricks being produced, as well as seasonality and cyclic trends. At about 1975 there is abig dorp in the production followed by a period of increased production before another drop off in the early 80’s. The production picks up again but after this, the production of clay bricks follows a downward trend. The season trends still persist as well as the cycles with a large drop in production every 5-7 years.

What do you learn about the series? The series spans the years from 1956 to the second quarter of 2010. The production of clay bricks seems to be fairly predictable up until the mid 1970’s. After this there is visually more variation in the clay brick production.

What can you say about the seasonal patterns?

bricks %>% gg_season(Bricks) + labs(title = "Seasonal Trends")

When visualizing the trends between each quarter in a given year, for most early years there is an increase in the amount of bricks being produced from Q1 to Q2, production steadies during Q2 to Q3 or slightly increases, and from Q3 to Q4 there is a slight decrease. It is harder to identify a consistent seasonal trend in later years.

bricks %>% gg_lag(Bricks, lags = 1:9, geom = "point")
## Warning: Removed 20 rows containing missing values (gg_lag).

bricks %>% ACF(Bricks, lag_max = 36) %>% autoplot()

The lag plots above show strong positive correlations over 9 lags and when plotting the correlation coefficients we also see the typical pattern where the slow decrease in correlations indicates the overall linear trend and the seasonality can be seen spikes in the correlation coefficients at every 4th lag.

Can you identify any unusual years? As the trend was continuing to increase, in the early 1980’s there was a big drop in the early 1980’s.

c. Hare from pelt

hare <- pelt %>% select(Hare)

head(hare)
## # A tsibble: 6 x 2 [1Y]
##    Hare  Year
##   <dbl> <dbl>
## 1 19580  1845
## 2 19600  1846
## 3 19610  1847
## 4 11990  1848
## 5 28040  1849
## 6 58000  1850

Can you spot any seasonality, cyclicity and trend?

hare %>% autoplot(Hare) + labs(title = "Number of Snowshoe Hare pelts Traded")

There does not appear to be any linear trend in the amount of hare pelts traded in the time frame data was collect. There does appear to be cyclic patterns as the peaks and valleys span 3-5 years.

What do you learn about the series? The data in the series is collected every year from 1845 to 1935. Over the span of 90 years the amount of pelts traded follows a cyclical behavior.

What can you say about the seasonal patterns? Since there is only data for each year, it is not possible to get an accurate assessment on seasonality.

hare %>% gg_lag(Hare, lags = 1:10, geom = "point")

hare %>% ACF(Hare, lag_max = 25) %>% autoplot()

The autocorrelation plot shows repeating periods of positive and negative spikes in a sinusodal wave pattern. The autocorrelations are also decaying towards 0, getting weaker the further the lags are away from each other.

Can you identify any unusual years?

hare %>% arrange(desc(Hare))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by ``, `Year` first.
## # A tsibble: 91 x 2 [1Y]
##      Hare  Year
##     <dbl> <dbl>
##  1 152650  1863
##  2 148360  1864
##  3 134860  1886
##  4 134850  1885
##  5 103790  1887
##  6 101250  1875
##  7  97120  1876
##  8  89760  1933
##  9  88480  1853
## 10  88060  1856
## # ℹ 81 more rows

In the autolot we can clearly see two spikes in the mid to late 1800’s when the amount of Hare pelts were being traded. When we sort the data we can see that between 1863-1864 and 1885 - 1886 there were periods noted in the graph.

d. “H02” Cost from PBS

H02 <- PBS %>% filter(ATC2 == "H02")

H02
## # A tsibble: 816 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [4]
##       Month Concession   Type     ATC1  ATC1_desc ATC2  ATC2_desc Scripts   Cost
##       <mth> <chr>        <chr>    <chr> <chr>     <chr> <chr>       <dbl>  <dbl>
##  1 1991 Jul Concessional Co-paym… H     Systemic… H02   CORTICOS…   63261 317384
##  2 1991 Aug Concessional Co-paym… H     Systemic… H02   CORTICOS…   53528 269891
##  3 1991 Sep Concessional Co-paym… H     Systemic… H02   CORTICOS…   52822 269703
##  4 1991 Oct Concessional Co-paym… H     Systemic… H02   CORTICOS…   54016 280418
##  5 1991 Nov Concessional Co-paym… H     Systemic… H02   CORTICOS…   49281 268070
##  6 1991 Dec Concessional Co-paym… H     Systemic… H02   CORTICOS…   51798 277139
##  7 1992 Jan Concessional Co-paym… H     Systemic… H02   CORTICOS…   42436 221772
##  8 1992 Feb Concessional Co-paym… H     Systemic… H02   CORTICOS…   52913 272345
##  9 1992 Mar Concessional Co-paym… H     Systemic… H02   CORTICOS…   62908 325700
## 10 1992 Apr Concessional Co-paym… H     Systemic… H02   CORTICOS…   68499 349271
## # ℹ 806 more rows

Can you spot any seasonality, cyclicity and trend?

H02 %>% autoplot(Cost) +
  facet_grid(Type ~ Concession, scales = "free_y")

There are four combinations that we can look at regarding the cost of scripts. - Concessional scripts and co-payments: The cost from concessional scripts and co-payments appears to increase over time with seasonal features. - Concessional scripts and safety net costs appear to have cyclical patterns. The peak safety net/concessional costs appear to increase over time. - General/co payment costs do not appear to have any trends but potentially have seasonality. - General/safety net costs also follow a cyclical pattern with two clear spikes in the beginning of the time series

What do you learn about the series? Monthly data about the total number of scripts and the cost of the scripts are collected. The four different groups appear to have individual patterns of trend, seasaonality and cycles.

What can you say about the seasonal patterns?

1. Concessional/safety net costs

H02 %>% filter((Concession == "Concessional") & (Type == "Safety net")) %>% gg_season(Cost) + labs(title = "Concessional/safety net costs")

# Lag plot 
H02 %>% filter((Concession == "Concessional") & (Type == "Safety net")) %>% gg_lag(Cost, geom = "point") + labs(title = "Concessional/safety net costs")

# ACF plot 
H02 %>% filter((Concession == "Concessional") & (Type == "Safety net")) %>% ACF(Cost, lag_max = 72) %>% autoplot() + labs(title = "Concessional/safety net costs")

The seasonal, lag and autocorrelation plots for concessional/safety net costs indicate a cyclica pattern and the decaying autocorrelations also point to some trend over time.

2. Concessional/co-payment costs

H02 %>% filter((Concession == "Concessional") & (Type == "Co-payments")) %>% gg_season(Cost) + labs(title = "Concessional/co-payment costs")

# Lag plot 
H02 %>% filter((Concession == "Concessional") & (Type == "Co-payments")) %>% gg_lag(Cost, geom = "point") + labs(title = "Concessional/co-payment costs")

# ACF plot 
H02 %>% filter((Concession == "Concessional") & (Type == "Co-payments")) %>% ACF(Cost, lag_max = 72) %>% autoplot() + labs(title = "Concessional/co-payment costs")

The seasonal plot shows that for most years costs of scripts are higher from March to July and then start to drop towards the end of the year. There are strong autocorrelations in the data that indicate previous months are good indicators of the cost of the current year. The autocorrelation plot provides a lot of information, the peaks at the multiples of 12 being highly correlated indicates seasonality, the decay towards 0 indicates a linear trend and the sinusoidal wave form indicates a cyclical pattern in the data.

3. General/safety net costs

# Seasonal plot
H02 %>% filter((Concession == "General") & (Type == "Safety net")) %>% gg_season(Cost) + labs(title = "General/safety net costs")

# Lag plot 
H02 %>% filter((Concession == "General") & (Type == "Safety net")) %>% gg_lag(Cost, geom = "point") + labs(title = "General/safety net costs")

# ACF plot 
H02 %>% filter((Concession == "General") & (Type == "Safety net")) %>% ACF(Cost, lag_max = 72) %>% autoplot() + labs(title = "General/safety net costs")

The seasonal plot of the cost of the General type/safety net shows higher costs at the beginning and at then end of the year. The lowest costs are in February to May, where April marks the beginning of the rise in costs. In the lag plot there appears to be two clusters but moderate correlations are calculated and plotted. The autocorrelation plot shows consistent periods of positive and negative correlations confirming seasonality.

4. General/co-payments costs

# Time series plot
H02 %>% filter((Concession == "General") & (Type == "Co-payments")) %>% autoplot(Cost) + labs(title = "General/co-payment costs")

# Seasonal plot
H02 %>% filter((Concession == "General") & (Type == "Co-payments")) %>% gg_season(Cost) + labs(title = "General/co-payment costs")

# Lag plot 
H02 %>% filter((Concession == "General") & (Type == "Co-payments")) %>% gg_lag(Cost, geom = "point") + labs(title = "General/co-payment costs")

# ACF plot 
H02 %>% filter((Concession == "General") & (Type == "Co-payments")) %>% ACF(Cost, lag_max = 72) %>% autoplot() + labs(title = "General/co-payment costs")

I re-plotted the time series to visualize it more clearly. The seasonal plots of general/co payments costs is difficult to see a clear pattern between the years. The lag plots over 9 lags demonstrate positive correlations but the autocorrelation plots confirm weak correlations. Beyond 21 lags the correlations are negative except at 23 (however it is close to 0). Visually inspecting the autocorrelation plot there is no pattern in the peaks and valleys, indicating weak seasonality in the plot.

Can you identify any unusual years? There does not appear to be any unusual years.

e. Barrels from us_gasoline

Can you spot any seasonality, cyclicity and trend?/What do you learn about the series?

?us_gasoline

head(us_gasoline)
## # A tsibble: 6 x 2 [1W]
##       Week Barrels
##     <week>   <dbl>
## 1 1991 W06    6.62
## 2 1991 W07    6.43
## 3 1991 W08    6.58
## 4 1991 W09    7.22
## 5 1991 W10    6.88
## 6 1991 W11    6.95
us_gasoline %>% autoplot(Barrels) + labs(y = "Barrels (million/day)", title = "US Finished Motor Gasoline Product Supplied")

The series contains information in weekly intervals from the 6th week of 1991 to the 3rd week of 2017.. Visualizing the numbers of gasoline barrels supplied overtime we can clearly see a linear trend up until the mid 2000’s. The trend reverses and then again looks to trend upwards. The plot also reveals a seasonality component as well. Zooming in on a few early years gives a better visual on trends within one year, where we can see the seasonal trends.

us_gasoline %>% filter(year(Week) <= 1993) %>% autoplot(Barrels)

What can you say about the seasonal patterns?

us_gasoline %>% gg_season(Barrels)

Looking at seasonality using gg_seasons, we see some consistent patterns year to year. There appears to be an upward trend up until the middle of the year and the number of barrels produced seem to level out or slightly drop.

us_gasoline %>% gg_lag(Barrels, lags = 1:10, geom = "point") + labs(title = "Lag plots")

Inspecting the lag plots, there are strong positive correlations over all lags up to 10. The data points appear to spread out as the lags get further spaced. An autocorrelation plot will help to provide further insight into the presence of seasonality.

# Lags over 4 years 
us_gasoline %>% ACF(Barrels, lag_max = 208) %>% autoplot()

Inspecting the lag plot autocorrelation plot for a max lag of 208 (equivalent to 4 years) we can see peaks and valleys at regular intervals indicating. I decided to plot more than what was visualized in the lag plots to get a better idea of the long term trend. The slow decrease in the ACF as the lags increases is due to the trend, while the “scalloped” shape is due to the seasonality.

Can you identify any unusual years?

Visually inspecting the first plot of the time series, the first area that draws attention is the change from an upward trend to a downward trend during the years 2008 - 2010. This may reflect the stock market crisis and recession that hit the US and the reduced demand. Another period that sticks out to me is around 2015/2016. The seasonal pattern appears to have abnormal shape compared to other time periods in the series.