Homework 1

Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from Hyndman and Athanasopoulos's online book Forecasting: Principles and Practice.

1. Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series.

What is the time interval of each series?

The time interval is quarterly for aus_production (1956 Q1 to 2010 Q2), annual for pelt (1845 to 1935), daily on trading days for gafa_stock (2014 to 2018, so the spacing is irregular), and half-hourly for vic_elec (2012 to 2014).

library(fpp3)  # loads tsibble, tsibbledata, feasts, dplyr and ggplot2

data("aus_production")
?aus_production

data("pelt")
?pelt

data("gafa_stock")
?gafa_stock

data("vic_elec")
?vic_elec
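
As a cross-check on the help pages, the interval can also be read off each tsibble directly. The following is a minimal sketch, assuming the fpp3 packages are installed; interval() and is_regular() are tsibble functions, and the comments describe what the help pages say rather than captured output.

```r
library(fpp3)

# interval() reports the spacing between consecutive observations
interval(aus_production)  # quarterly series
interval(pelt)            # annual series
interval(vic_elec)        # half-hourly series

# gafa_stock only covers trading days, so it is stored as an irregular tsibble
is_regular(gafa_stock)

# The span of a series is a separate question from its interval
range(pelt$Year)
```

Keeping the two ideas apart helps here: the interval is the step between observations, while the range is the first and last time point.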

Use autoplot() to produce a time plot of each series.

For the last plot, modify the axis labels and title.

aus_production %>%
  autoplot(Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

pelt %>%
  autoplot(Lynx)

gafa_stock %>%
  autoplot(Close)

vic_elec %>%
  autoplot(Demand) +
  labs(x = "Date", y = "Demand") +
  ggtitle("Electricity Demand Over Time")

2. Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date, Close)

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.
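
The same result can be reached without comparing against max() explicitly. This is a sketch of one alternative, assuming it is acceptable to drop the tsibble structure first; peak_close is a name introduced here for illustration.

```r
library(fpp3)

# Convert to a plain tibble so ordinary dplyr verbs apply,
# then keep the row with the largest Close within each stock
peak_close <- gafa_stock %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  ungroup() %>%
  select(Symbol, Date, Close)

peak_close
```

By default slice_max() keeps ties, so a stock with two equal peak closes would return both days, just as the filter() version does.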

3. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

A. You can read the data into R with the following script:

tute1 <- readr::read_csv("tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

view(tute1)

B. Convert the data to time series

# The series are quarterly, so the index is a yearquarter
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
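
The choice of index class determines the interval the tsibble reports. The sketch below compares two index classes on a small made-up data frame; the toy dates and values are hypothetical and are not taken from tute1.csv.

```r
library(tsibble)
library(dplyr)

# Hypothetical quarterly data, invented for illustration only
toy <- tibble(
  Quarter = as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01")),
  Sales = c(10, 12, 11, 13)
)

# Quarterly index: one observation per quarter
as_quarterly <- toy %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

# Monthly index: the same rows, now three months apart
as_monthly <- toy %>%
  mutate(Quarter = yearmonth(Quarter)) %>%
  as_tsibble(index = Quarter)

format(interval(as_quarterly))
format(interval(as_monthly))
```

With a quarterly index the interval is one quarter; with a monthly index the same data is treated as a monthly series observed every third month, which affects seasonal plots and models later on.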

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  ggtitle("facet_grid")

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  ggtitle("No facet_grid")

C. Construct time series plots of each of the three series

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

4. The USgas package contains data on the demand for natural gas in the US.

Install the USgas package.

Create a tsibble from us_total with year as the index and state as the key.

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

library(USgas)
data("us_total")
str(us_total)

## 'data.frame':    1266 obs. of  3 variables:
##  $ year : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ state: chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ y    : int  324158 329134 337270 353614 332693 379343 350345 382367 353156 391093 ...

us_total <- us_total %>%
  rename(natural_gas_consumption_mcf = y)
us_total_tsibble <- us_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  as_tsibble(key = state, index = year)
us_total_tsibble

## # A tsibble: 138 x 3 [1Y]
## # Key:       state [6]
##     year state       natural_gas_consumption_mcf
##    <int> <chr>                             <int>
##  1  1997 Connecticut                      144708
##  2  1998 Connecticut                      131497
##  3  1999 Connecticut                      152237
##  4  2000 Connecticut                      159712
##  5  2001 Connecticut                      146278
##  6  2002 Connecticut                      177587
##  7  2003 Connecticut                      154075
##  8  2004 Connecticut                      162642
##  9  2005 Connecticut                      168067
## 10  2006 Connecticut                      172682
## # ℹ 128 more rows

us_total_tsibble %>% autoplot(natural_gas_consumption_mcf)

5. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism <- readxl::read_excel("tourism.xlsx")

tourism_ts <- tourism %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)

tourism_ts %>%
  group_by(Region, Purpose) %>%
  mutate(Avg_Trips = mean(Trips)) %>%
  ungroup() %>%
  filter(Avg_Trips == max(Avg_Trips)) %>%
  distinct(Region, Purpose)

## # A tibble: 1 × 2
##   Region Purpose 
##   <chr>  <chr>   
## 1 Sydney Visiting
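
A more direct route to the same answer is to collapse each Region and Purpose pair to a single average with summarise(). This sketch uses the built-in tsibble::tourism data rather than the spreadsheet, on the assumption that the two hold the same values; avg_trips is a name introduced here.

```r
library(fpp3)

# Work on a plain tibble so summarise() averages over all quarters
avg_trips <- tsibble::tourism %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(Avg_Trips = mean(Trips), .groups = "drop") %>%
  slice_max(Avg_Trips, n = 1)

avg_trips
```

Unlike the mutate() approach, this also returns the average itself alongside the winning Region and Purpose.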

tourism %>%
  group_by(Quarter, State) %>%
  mutate(Quarter = yearquarter(Quarter),
         Total_Trips = sum(Trips)) %>%
  select(Quarter, State, Total_Trips) %>%
  distinct() %>%
  as_tsibble(index = Quarter,
             key = State)

## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
## # Groups:    State @ Quarter [640]
##    Quarter State Total_Trips
##      <qtr> <chr>       <dbl>
##  1 1998 Q1 ACT          551.
##  2 1998 Q2 ACT          416.
##  3 1998 Q3 ACT          436.
##  4 1998 Q4 ACT          450.
##  5 1999 Q1 ACT          379.
##  6 1999 Q2 ACT          558.
##  7 1999 Q3 ACT          449.
##  8 1999 Q4 ACT          595.
##  9 2000 Q1 ACT          600.
## 10 2000 Q2 ACT          557.
## # ℹ 630 more rows
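
The leftover "Groups" line in the output above comes from calling group_by() before the conversion. As a sketch of an alternative, a tsibble can be aggregated directly: summarise() on a tsibble keeps the index, so grouping by State alone gives totals per State and Quarter. The example again assumes the built-in tsibble::tourism matches the spreadsheet; state_trips is a name introduced here.

```r
library(fpp3)

# Grouping by State and summarising sums over Region and Purpose
# while keeping Quarter as the index
state_trips <- tsibble::tourism %>%
  group_by(State) %>%
  summarise(Total_Trips = sum(Trips))

state_trips
```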

8. Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend?

What do you learn about the series?

What can you say about the seasonal patterns?

Can you identify any unusual years?

set.seed(21)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries, Turnover)

myseries %>% gg_season(Turnover)

myseries %>% gg_subseries(Turnover)

myseries %>% gg_lag(Turnover, geom = "point")

myseries %>% ACF(Turnover) %>% autoplot()
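
The code above explores a randomly selected aus_retail series. For the five series named in the exercise, the extraction step could look like the following sketch, assuming the fpp3 data sets are available; total_private and h02 are names introduced here, and the filter columns (Title, ATC2) are those used by the us_employment and PBS data sets.

```r
library(fpp3)

# Two of the series need a filter() to pick out the named subset
total_private <- us_employment %>% filter(Title == "Total Private")
h02 <- PBS %>% filter(ATC2 == "H02")

# Time plots of each series
autoplot(total_private, Employed)
autoplot(aus_production, Bricks)
autoplot(pelt, Hare)
autoplot(h02, Cost)
autoplot(us_gasoline, Barrels)

# The remaining functions follow the same pattern, shown here for one series
gg_season(total_private, Employed)
gg_subseries(total_private, Employed)
gg_lag(total_private, Employed, geom = "point")
total_private %>% ACF(Employed) %>% autoplot()
```

Note that pelt is annual, so gg_season() and gg_subseries() are not meaningful for Hare; autoplot(), gg_lag() and ACF() are the relevant tools there.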