Homework 1

Exercise 2.1

Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.

Use autoplot() to plot some of the series in these data sets.
What is the time interval of each series?

Exercise 2.1 Answer

gafa_stock: A tsibble indexed by irregular trading days for historical stock prices from 2014 to 2018 for Google, Amazon, Facebook and Apple.
PBS: A monthly tsibble containing the following:
- Scripts: The total number of scripts.
- Cost: The cost of the scripts in $AUD.
vic_elec: A half-hourly tsibble that contains the following:
- Demand: Total electricity demand for Victoria, Australia in MWh.
- Temperature: Temperature of Melbourne (BOM site 086071).
- Holiday: Indicator for if that day is a public holiday.
pelt: An annual tsibble that contains trade record data for the Hudson Bay Company for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935 (for all areas of the company). Contains the following:
- Hare: The number of Snowshoe Hare pelts traded.
- Lynx: The number of Canadian Lynx pelts traded.

Exercise 2.1a Answer

For gafa_stock, let’s create a time series plot that shows the adjusted close price of each of the four companies as a separate line.

# Symbol is already the key of gafa_stock, so autoplot() draws one line
# per company without group_by() (which only triggers a mutate_if() warning)
gafa_stock %>%
  autoplot(Adj_Close) +
  labs(y = "Adjusted Close",
       title = "Adjusted Close Stock Prices")

For PBS, we can use autoplot() to create a time series plot. Let’s consider just the total cost of scripts where ATC2 == "A01" and Concession == "Concessional".

PBS %>%
  filter(ATC2 == "A01" & Concession == "Concessional") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(Cost = sum(Cost)) %>%
  autoplot(Cost)

For vic_elec, let’s consider creating a plot where we plot all of the temperature readings contained within the tsibble.

vic_elec %>%
  autoplot(Temperature) +
  labs(y = "Degrees Celsius",
       title = "Half-hourly temperatures: Melbourne, Australia")

The plot above shows that the temperature data has a yearly seasonal pattern, which is to be expected.

For pelt, let’s use autoplot() to generate time series plots for the Hare and Lynx variables.

pelt_hare_plot <- pelt %>%
  autoplot(Hare)

pelt_lynx_plot <- pelt %>%
  autoplot(Lynx)

# Combining two ggplot objects with `+` requires the patchwork package
library(patchwork)
pelt_hare_plot + pelt_lynx_plot

Exercise 2.1b Answer

gafa_stock: A tsibble indexed by irregular trading days.
PBS: A monthly tsibble.
vic_elec: A half-hourly tsibble.
pelt: An annual tsibble.
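
The intervals listed above can also be read from the objects themselves. The following is a minimal sketch, assuming the tsibble and tsibbledata packages are installed (both are attached by fpp3); `interval()` and `is_regular()` are tsibble helpers.

```r
library(tsibble)
library(tsibbledata)

# interval() returns the spacing that tsibble inferred from the index
interval(PBS)       # monthly
interval(vic_elec)  # half-hourly
interval(pelt)      # annual

# gafa_stock is stored as an irregular series because trading days
# skip weekends and holidays, so it has no fixed interval
is_regular(gafa_stock)
```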

Question 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

Question 2.2 Answer

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Date)

## Adding missing grouping variables: `Symbol`

## # A tsibble: 4 x 2 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date      
##   <chr>  <date>    
## 1 AAPL   2018-10-03
## 2 AMZN   2018-09-04
## 3 FB     2018-07-25
## 4 GOOG   2018-07-26
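
The "Adding missing grouping variables" message appears because select(Date) drops the grouping column and dplyr puts it back. One possible variant, sketched here under the assumption that dplyr, tsibble and tsibbledata are installed, ungroups first and names the wanted columns explicitly. The object name `peaks` is only illustrative.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)

peaks <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%   # keep the row(s) at each stock's peak close
  ungroup() %>%
  select(Symbol, Date, Close)       # name the key column explicitly

peaks
```

Keeping Close in the result also makes it easy to check the peak value next to its date.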

Question 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("https://bit.ly/fpptute1")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(tute1)

Convert the data to time series

mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
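
The yearquarter() step matters because read_csv() parses Quarter as a date, and a date index makes tsibble treat the data as daily. The sketch below illustrates this on a small toy table; the Sales values are invented for illustration and are not taken from tute1.csv.

```r
library(dplyr)
library(tsibble)

# Toy data (invented values): quarterly observations stored as dates
toy <- tibble(
  Quarter = as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01")),
  Sales   = c(10, 12, 11, 13)
)

# With a Date index, the interval is inferred in days
daily <- as_tsibble(toy, index = Quarter)

# Converting to yearquarter first gives a quarterly index
quarterly <- toy %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

interval(daily)
interval(quarterly)

# The daily version has implicit gaps between observations; the quarterly one does not
has_gaps(daily)
has_gaps(quarterly)
```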

Construct time series plots of each of the three series

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

Question 2.3c Answer

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

The exclusion of facet_grid() results in three time series plotted on one graph with common x and y axes, instead of each time series plotted on a separate graph with its own y axis.
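
The `name` and `value` columns used in aes() are the default column names produced by pivot_longer(). A small sketch on a toy data frame (the numbers are invented for illustration, not taken from tute1.csv) shows the reshaped layout and why a single shared y axis has to span all three series at once.

```r
library(tidyr)

# Toy wide table with invented values, one column per series
wide <- data.frame(
  Quarter  = c("1981 Q1", "1981 Q2"),
  Sales    = c(1000, 900),
  AdBudget = c(500, 450),
  GDP      = c(280, 285)
)

# Every column except Quarter is stacked into name/value pairs
long <- pivot_longer(wide, -Quarter)
long

# Without faceting, one y axis must cover this combined range
range(long$value)
```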

Question 2.4

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package.
Create a tsibble from us_total with year as the index and state as the key.
Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

Question 2.4b Answer

us_total %>%
  as_tsibble(
    index = year,
    key = state
  )

## # A tsibble: 1,266 x 3 [1Y]
## # Key:       state [53]
##     year state        y
##    <int> <chr>    <int>
##  1  1997 Alabama 324158
##  2  1998 Alabama 329134
##  3  1999 Alabama 337270
##  4  2000 Alabama 353614
##  5  2001 Alabama 332693
##  6  2002 Alabama 379343
##  7  2003 Alabama 350345
##  8  2004 Alabama 382367
##  9  2005 Alabama 353156
## 10  2006 Alabama 391093
## # … with 1,256 more rows
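
as_tsibble() requires each key and index combination to be unique, which is why state is needed as the key here. The sketch below checks that condition on a toy data frame shaped like us_total; the values are invented and the USgas package is not required to run it.

```r
library(tsibble)

# Toy data with the same columns as us_total (invented values)
toy <- data.frame(
  year  = c(1997L, 1998L, 1997L, 1998L),
  state = c("Maine", "Maine", "Vermont", "Vermont"),
  y     = c(1, 2, 3, 4)
)

# TRUE would mean some state/year pair appears more than once
is_duplicated(toy, key = state, index = year)

toy_ts <- as_tsibble(toy, index = year, key = state)
n_keys(toy_ts)  # number of distinct series, one per state
```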

Question 2.4c Answer

us_total %>%
  as_tsibble(index = year, key = state) %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  group_by(state) %>%
  ggplot(aes(x = year, y = y)) +
  geom_line() +
  facet_grid(vars(state), scales = "free_y") +
  labs(title = "Natural Gas Consumption by State for the New England Area",
       y = "Yearly Natural Gas Consumption (million cubic feet)")

Question 2.5

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
Create a tsibble which is identical to the tourism tsibble from the tsibble package.
Find what combination of Region and Purpose had the maximum number of overnight trips on average.
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

Question 2.5a Answer

To distinguish the tourism tsibble in the tsibble package from the imported .xlsx file, we will call the imported data tourism_imported.

GET("https://bit.ly/fpptourism", write_disk(TF <- tempfile(fileext = ".xlsx")))

## Response [https://otexts.com/fpp3/extrafiles/tourism.xlsx]
##   Date: 2023-02-06 00:40
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 679 kB
## <ON DISK>  /tmp/Rtmp8YFhC4/file57811db3bc54.xlsx

tourism_imported <- readxl::read_excel(TF)

Question 2.5b Answer

tourism_imported_tsibble <- tourism_imported %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)

tourism_imported_tsibble

## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # … with 24,310 more rows

Question 2.5c Answer

tourism %>%
  as.data.frame() %>%
  group_by(Region, Purpose) %>%
  summarise(Avg_trips = mean(Trips)) %>%
  arrange(desc(Avg_trips)) %>%
  head(5)

## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.

## # A tibble: 5 × 3
## # Groups:   Region [3]
##   Region          Purpose  Avg_trips
##   <chr>           <chr>        <dbl>
## 1 Sydney          Visiting      747.
## 2 Melbourne       Visiting      619.
## 3 Sydney          Business      602.
## 4 North Coast NSW Holiday       588.
## 5 Sydney          Holiday       550.

The output above shows the five combinations of Region and Purpose with the highest average number of overnight trips. The maximum is Sydney for Region and Visiting for Purpose.
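
If only the single maximum is wanted, the grouping message can be silenced and the top row extracted directly. This is a sketch on toy data with invented values; `.groups = "drop"` and `slice_max()` are standard dplyr features, and the object name `top` is only illustrative.

```r
library(dplyr)

# Toy trips table (invented values)
toy <- tibble(
  Region  = c("A", "A", "A", "A", "B", "B"),
  Purpose = c("Holiday", "Holiday", "Business", "Business", "Holiday", "Holiday"),
  Trips   = c(10, 20, 5, 7, 30, 40)
)

top <- toy %>%
  group_by(Region, Purpose) %>%
  summarise(Avg_trips = mean(Trips), .groups = "drop") %>%  # no grouping message
  slice_max(Avg_trips, n = 1)                               # keep the largest average

top
```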

Question 2.5d Answer

By converting to a data frame, we can group by Region, Purpose and State, and then sum the Trips column for each group using summarise() to get the output shown below. The first five rows are shown.

tourism %>%
  as.data.frame() %>%
  group_by(Region, Purpose, State) %>%
  summarise(Total_Trips_By_State = sum(Trips)) %>%
  head(5)

## `summarise()` has grouped output by 'Region', 'Purpose'. You can override using
## the `.groups` argument.

## # A tibble: 5 × 4
## # Groups:   Region, Purpose [5]
##   Region         Purpose  State           Total_Trips_By_State
##   <chr>          <chr>    <chr>                          <dbl>
## 1 Adelaide       Business South Australia               12442.
## 2 Adelaide       Holiday  South Australia               12523.
## 3 Adelaide       Other    South Australia                4525.
## 4 Adelaide       Visiting South Australia               16415.
## 5 Adelaide Hills Business South Australia                 213.

However, the output above is a data frame, not a tsibble, and it cannot be converted to one because the Quarter index was dropped during aggregation. I think the result above has the right content, just in the wrong format. The tsibble below instead sums Trips over all Regions and Purposes for each State and Quarter (first five rows shown). I am not entirely sure this is the correct answer, since I believe the data frame makes more sense with respect to what the question is asking.

tourism %>%
  group_by(State) %>%
  summarise(Total_Trips_By_State = sum(Trips)) %>%
  head(5)

## # A tsibble: 5 x 3 [1Q]
## # Key:       State [1]
##   State Quarter Total_Trips_By_State
##   <chr>   <qtr>                <dbl>
## 1 ACT   1998 Q1                 551.
## 2 ACT   1998 Q2                 416.
## 3 ACT   1998 Q3                 436.
## 4 ACT   1998 Q4                 450.
## 5 ACT   1999 Q1                 379.
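
The tsibble result keeps one row per State and Quarter because summarise() on a tsibble always retains the index, so grouping by State collapses only the other key columns. A minimal sketch on toy data (region names, state names and values are all invented) illustrates this behaviour.

```r
library(dplyr)
library(tsibble)

# Toy tsibble: three regions in two states, two quarters each (invented values)
toy <- tibble(
  Quarter = rep(yearquarter(c("2000 Q1", "2000 Q2")), times = 3),
  Region  = rep(c("R1", "R2", "R3"), each = 2),
  State   = rep(c("S1", "S1", "S2"), each = 2),
  Trips   = c(1, 2, 3, 4, 5, 6)
) %>%
  as_tsibble(key = c(Region, State), index = Quarter)

# Region is summed away; Quarter stays because it is the index
by_state <- toy %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

by_state
key_vars(by_state)
```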

Question 2.8

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(23987)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Explore your chosen retail time series using the following functions:

autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Question 2.8 Answer

The plot below shows a clear increasing trend in Turnover. There is also a strong seasonal pattern whose size increases as the level of the series increases. Also note that there is a small decrease in trend in 2011.

myseries %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

The seasonal plot below shows a large jump in turnover in December in every year. It also shows that in recent years turnover dips in February and then rises back up the next month.

myseries %>%
  gg_season(labels = "both")

## Plot variable not specified, automatically selected `y = Turnover`

We can draw the same conclusions from this plot as from the previous one. Also of note, every monthly subseries shows turnover increasing over time, which ties in with the autoplot() plot.

myseries %>%
  gg_subseries()

## Plot variable not specified, automatically selected `y = Turnover`

The lag plot below shows us that the first 12 lags have a strong positive relationship, with lag 12 having the strongest positive relationship, which is to be expected since the autoplot shows us a seasonality pattern which repeats every 12 months.

myseries %>%
  gg_lag(geom = "point", lags = 1:12)

## Plot variable not specified, automatically selected `y = Turnover`

For the plot below, I set lag_max = 36 because the previous plots revealed a seasonal pattern that repeats every 12 months, so lag_max = 36 covers 3 years of lags. The ACF plot shows both trend and seasonality: trend because the autocorrelations decrease slowly as the lag increases, and seasonality because of the “scalloped” shape of the ACF plot.

myseries %>%
  ACF(Turnover, lag_max = 36) %>%
  autoplot()
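
The link between trend, seasonality and the shape of the ACF can be reproduced on synthetic data. The sketch below uses base R's acf() rather than feasts::ACF(), and the series is simulated (a linear trend plus a 12-month sine wave plus noise), so it only illustrates the pattern and says nothing about the retail series itself.

```r
set.seed(1)

# Simulated monthly series: linear trend + 12-month seasonality + noise
t <- 1:240
x <- 0.5 * t + 30 * sin(2 * pi * t / 12) + rnorm(240)

# Autocorrelations up to 36 lags, computed without plotting
a <- acf(x, lag.max = 36, plot = FALSE)
r <- drop(a$acf)[-1]   # drop lag 0 so that r[k] is the autocorrelation at lag k

# Seasonal lags stand out above the half-season lags (the "scallops"),
# while the seasonal peaks themselves decay slowly because of the trend
round(r[c(6, 12, 18, 24, 36)], 2)
```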

Based on all of the plots that were generated, there is seasonality and trend but there is no cyclicity.

DATA 624 - Homework 1

Peter Phung

2023-02-05