library(fpp3)

HA HW1

Exercise 1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

  • Use ? (or help()) to find out about the data in each series.

  • What is the time interval of each series?

bricks <- aus_production |> 
  select(Bricks)

lynx <- pelt |> 
  select(Lynx)

close <- gafa_stock |> 
  select(Close)

demand <- vic_elec |> 
  select(Demand)

The time intervals of each series:

  • Bricks is quarterly
head(bricks)
## # A tsibble: 6 x 2 [1Q]
##   Bricks Quarter
##    <dbl>   <qtr>
## 1    189 1956 Q1
## 2    204 1956 Q2
## 3    208 1956 Q3
## 4    197 1956 Q4
## 5    187 1957 Q1
## 6    214 1957 Q2
  • Lynx is yearly
head(lynx)
## # A tsibble: 6 x 2 [1Y]
##    Lynx  Year
##   <dbl> <dbl>
## 1 30090  1845
## 2 45150  1846
## 3 49150  1847
## 4 39520  1848
## 5 21230  1849
## 6  8420  1850
  • Close is daily
close <- close |> mutate(Date = as_date(Date)) |> 
  as_tsibble(index = Date)

head(close)
## # A tsibble: 6 x 3 [1D]
## # Key:       Symbol [1]
##   Close Date       Symbol
##   <dbl> <date>     <chr> 
## 1  79.0 2014-01-02 AAPL  
## 2  77.3 2014-01-03 AAPL  
## 3  77.7 2014-01-06 AAPL  
## 4  77.1 2014-01-07 AAPL  
## 5  77.6 2014-01-08 AAPL  
## 6  76.6 2014-01-09 AAPL
  • Demand is every 30 mins
head(demand)
## # A tsibble: 6 x 2 [30m] <Australia/Melbourne>
##   Demand Time               
##    <dbl> <dttm>             
## 1  4383. 2012-01-01 00:00:00
## 2  4263. 2012-01-01 00:30:00
## 3  4049. 2012-01-01 01:00:00
## 4  3878. 2012-01-01 01:30:00
## 5  4036. 2012-01-01 02:00:00
## 6  3866. 2012-01-01 02:30:00
  • Use autoplot() to produce a time plot of each series.

  • For the last plot, modify the axis labels and title.

Bricks, Lynx, Close, and Demand

Bricks

autoplot(bricks, Bricks)

autoplot(bricks, Bricks) + 
  labs(title="Quarterly Bricks Production (AUS)",
       x= "Quarter",
       y= "# of Bricks Produced (Millions)") +
  theme_minimal()

Lynx

autoplot(lynx, Lynx)

autoplot(lynx, Lynx) +
  labs(title = "Lynx Data", 
       x= "Year (Annual)", 
       y = "Number of Lynx Pelt Traded") +
  theme_minimal()

Close

autoplot(close, Close)

autoplot(close, Close) +
  labs(title = "Daily Closing Stock Price",
       x="Daily", 
       y = "Closing Stock Price ($)") + 
  theme_minimal()

Demand

autoplot(demand, Demand)

autoplot(demand,Demand) + 
  labs(title = "Electricity Demand in Victoria, Australia ", 
       x = "Every Half Hour",
       y = "Electricity Demand (Mhw)") + 
  theme_minimal()

Exercise 2

First, we need to group the data by Symbol so then we can filter Close to have the maximum or peak closing price for each stock. We then select the columns we want to see which are Date, Symbol, and Close.

close |> 
  group_by(Symbol) |> 
  filter(Close == max(Close)) |> 
  select(Date, Symbol, Close)
## # A tsibble: 4 x 3 [1D]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Date       Symbol Close
##   <date>     <chr>  <dbl>
## 1 2018-10-03 AAPL    232.
## 2 2018-09-04 AMZN   2040.
## 3 2018-07-25 FB      218.
## 4 2018-07-26 GOOG   1268.
# fidn the days where the closing price was at its peak i.e. max

Exercise 3

Code provided by textbook. We downloaded the tute1.csv file and used view() to examine the data. Using the as_tsibble() function, we convert the data to a time series where the time interval is quarterly setting the index to Quarter. Plotting the timeseries a line plot using ggplot() and geom_line() and utilized facet_grid() to create a subgrid of the timeseries.

Parts

a.

You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")
#View(tute1)

b.

Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

c. 

Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  labs(title = "No facet_grid()") +
  geom_line() 

Check what happens when you don’t include facet_grid().

Exercise 4

First, we install the package USgas and converted Usgas to a timeseries using as_tsibble() and setting the index to year and key to state. Afterwards, we wanted to see the gas consumption for the following states; Maine, Vermont, New Hampshire, Massachussetts, Connecticut and Rhode Island. To do that using the filter() function and used autoplot() to visualize the timeseries.

Install the USgas package.

#install.packages("USgas")
library(USgas)

Create a tsibble from us_total with year as the index and state as the key.

head(us_total)
##   year   state      y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
us_total <- us_total |> 
  as_tsibble(index = year, key = state)
head(us_total)
## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state        y
##   <int> <chr>    <int>
## 1  1997 Alabama 324158
## 2  1998 Alabama 329134
## 3  1999 Alabama 337270
## 4  2000 Alabama 353614
## 5  2001 Alabama 332693
## 6  2002 Alabama 379343

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

filtered_us_total <- us_total |> 
  filter(state %in% c("Maine", "Vermont", "New Hamposhire", "Massachusetss", "Connecticut", "Rhode Island")) 

autoplot(filtered_us_total, y ) +
  labs(title='US Annual Total Natural Gas Consumption', 
       x = "Year", 
       y = "Total Natural Gas Consumption (MMcf") +
  theme_minimal()

Exercise 5

Using read_excel() from readxl package to load in the data, in order to create a tsibble object that is identical to the tourism tsibble from the tsibble package. Created the tourism tsibble object with setting the index to Quarter, a timeseries with quarterly time interval, and setting the key to Region, State, and Purpose to mimic the original tourism tsibble. Then, we wanted find which combination of Region and Purpose had the highest average of overnight trips. We did that by first grouping the timeseries by the Region and Purpose then using summarize() to calculate the mean() of Trips. Next, filtered the timeseries for the max overnight trips of each possible combination. FInally, arranged() the output in descending order to show the the combination with the highest average of overnight trips.

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

library(readxl)
data <- read_excel('tourism.xlsx')

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

head(tourism) # tourism from the tsibble package
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.
tourism_2 <- data |> 
  mutate(Quarter = yearquarter(Quarter)) |> 
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

head(tourism_2)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism_2 |> 
  group_by(Region, Purpose) |> 
  summarize(AverageTrips = mean(Trips, na.rm = TRUE)) |> 
  filter(AverageTrips == max(AverageTrips)) |> 
  arrange(desc(AverageTrips)) |> head()
## # A tsibble: 6 x 4 [1Q]
## # Key:       Region, Purpose [6]
## # Groups:    Region [6]
##   Region          Purpose  Quarter AverageTrips
##   <chr>           <chr>      <qtr>        <dbl>
## 1 Melbourne       Visiting 2017 Q4         985.
## 2 Sydney          Business 2001 Q4         948.
## 3 South Coast     Holiday  1998 Q1         915.
## 4 North Coast NSW Holiday  2016 Q1         906.
## 5 Brisbane        Visiting 2016 Q4         796.
## 6 Gold Coast      Holiday  2002 Q1         711.

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism_2 |> 
  group_by(State) |> 
  summarize(TotalTrips = sum(Trips)) |> 
  as_tsibble(index = Quarter) |> 
  head()
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter TotalTrips
##   <chr>   <qtr>      <dbl>
## 1 ACT   1998 Q1       551.
## 2 ACT   1998 Q2       416.
## 3 ACT   1998 Q3       436.
## 4 ACT   1998 Q4       450.
## 5 ACT   1999 Q1       379.
## 6 ACT   1999 Q2       558.

Exercise 8

Created a function that will generate the following plots each timeseries input, autoplot(), gg_season(), gg_subseries(), gg_lag(), and ACF(), since we will be performing similar operation on multiple timeseries. While answering the following question:

  • Can you spot any seasonality, cyclicity and trend?

  • What do you learn about the series?

  • What can you say about the seasonal patterns?

  • Can you identify any unusual years?

plot_time_series <- function(data) {
  
  p1 <- autoplot(data) + theme_minimal()
  
  p2 <- gg_season(data) + theme_minimal()
  
  p3 <- gg_subseries(data) + theme_minimal()
  
  p4 <- gg_lag(data, geom = "point") + theme_minimal()
  
  p5 <- ACF(data) |> 
    autoplot() + theme_minimal()
  
  print(p1)
  print(p2)
  print(p3)
  print(p4)
  print(p5)
}
plot_time_series_without_gg_season <- function(data) {
  
  p1 <- autoplot(data) + theme_minimal()
  
  p3 <- gg_subseries(data) + theme_minimal()
  
  p4 <- gg_lag(data, geom = "point") + theme_minimal()
  
  p5 <- ACF(data) |> 
    autoplot() + theme_minimal()
  
  print(p1)
  print(p3)
  print(p4)
  print(p5)
}

Time Series :

us_employment

  • Can you spot any seasonality, cyclicity and trend?

Overall, this time series has only linearly increasing trend. Some may argue that there may be a seasonal aspect to it when taking a look at the peaks and valleys generated from autoplot() but looking at gg_season() plot it is almost perfect parallel line indicating little to no seasonality. Look the plots, we can say that job growth in private sector in the US has been linearly increasing over the years. It seems that the years 2001 and 2008 had impacts on job growth where we a decrease in the number of employed in the private sector. Moreover, there is a strongt positive correlation between lag as n is from 1 to 9 due the general increasing trend of the timeseries.

total_private <- us_employment |> 
  filter(Title == "Total Private") |> select(Month, Employed)
print(plot_time_series(total_private))

aus_production

From 1960 - 1970s, there is an increased trend of bricks production in Australia. Then, that trend starts decreasing in the 1980s. The plots do not suggest any cyclic behavior but I suspect there exists some seasonality within each quarter. The plots supports my suspicion of a strong seasonality in the time series with its strong positive correlation from lag 1 to 9.

print(plot_time_series(bricks))

pelt

This timeseries demonstrates neither an increasing nor decreasing trend but rather a cyclic pattern. Now, taking a look the lag plots there is no obvious correlation from prior time intervals. Moreover, the ACF() plots supports the observation that this time series exhibits cyclical behavior.

hare <- pelt |> select(-Lynx) |> as_tsibble(index = Year)
print(plot_time_series_without_gg_season(hare))

gg_season() doesnt work on this particular time series since a season suggest sub-yearly data.

PBS

PBS time series exhibits a somewhat increasing trend along with a seasonal patterns where cost is usually at its peak during December. From this time series, we learned that Australian presciption cost is at its lowest in the month of February, repeating for each consecutive years.

h02 <- PBS |> 
  filter(ATC2 == "H02") |> 
  as_tibble() |> 
  select(Month, Cost) |> 
  group_by(Month) |> 
  summarize(Cost = sum(Cost)) |> 
  as_tsibble(index = Month)
print(plot_time_series(h02))

us_gasoline

Us_gasoline time series increasing trend with a seasonal pattern. When inspecting the lag plot, all 52 weeks are closely grouped together along the positive correlation line indicating that us gas price generally increase every year. Then, this is supported by AFC() plot where the values are all above 0.50 correlation.

print(plot_time_series(us_gasoline))