Homework 1

Instructions: Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book. Please submit your Rpubs link and attach the .pdf file with your code.

2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.

Aus_production dataset

help(aus_production)

## starting httpd help server ... done

aus_production

The time interval for aus_production is quarterly.

aus_production %>% 
  autoplot(Bricks) +
  labs(
    y = "Bricks (millions)",
    title = "Quarterly Brick Production in Australia"
  )

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

Gafa_stock dataset

help(gafa_stock)
gafa_stock

The time interval for gafa_stock is daily, but only trading days are included, so tsibble treats the interval as irregular.

gafa_stock %>% 
  autoplot(Close) +
  labs(
    y = "Closing Price",
    title = "GAFA Stocks: Daily Closing Prices"
  )

Pelt dataset

help(pelt)
pelt

The time interval for pelt is yearly.

pelt %>% 
  autoplot(Lynx) +
  labs(
    y = "Lynx Pelts",
    title = "Lynx Pelts Collected Over Time"
  )

Vic_elec dataset

help(vic_elec)
vic_elec

The time interval for vic_elec is 30 minutes (half-hourly).

vic_elec %>% 
  autoplot(Demand) +
  labs(
    x = "Time (30-minute intervals)",
    y = "Demand (MWh)",
    title = "Half-Hourly Electricity Demand: Victoria, Australia"
  )
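The intervals stated above can also be checked directly from the tsibble objects. The following is a minimal sketch, assuming the fpp3 meta-package is installed. `interval()` and `is_regular()` come from the tsibble package; gafa_stock skips weekends and holidays, so it should report an irregular interval.

```r
library(fpp3)

# interval() reports the spacing of each tsibble's index
interval(aus_production)  # quarterly series
interval(pelt)            # yearly series
interval(gafa_stock)      # trading days only, so no fixed spacing
interval(vic_elec)        # half-hourly series

# is_regular() is FALSE when the index has no fixed spacing
is_regular(gafa_stock)
```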

2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

peak_days <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  ungroup()

# Plot the closing prices and highlight the peak prices
gafa_stock %>%
  autoplot(Close) +
  geom_point(data = peak_days, aes(x = Date, y = Close), color = "red", size = 3) +
  labs(y = "Closing Price", title = "GAFA Closing Prices with Peak Days Highlighted") +
  theme_minimal()
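The plot marks the peaks, but the question asks for the days. This sketch prints them as a small table (assuming fpp3 is loaded). It uses a hypothetical name, `peak_table`, so it does not overwrite `peak_days`, which the plot uses. It drops the tsibble structure so that only Symbol, Date and Close remain.

```r
library(fpp3)

# One row per stock: the day(s) on which Close hit its maximum
peak_table <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  ungroup() %>%
  as_tibble() %>%             # plain tibble for a simple summary table
  select(Symbol, Date, Close)

peak_table
```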

2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

You can read the data into R with the following script:

tute1 <- readr::read_csv("https://raw.githubusercontent.com/Mikhail-Broomes/Data-624/main/Homeworks/Homework%201/tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(tute1)

Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

What happens when you don't include facet_grid()? All three series are drawn in a single panel on a shared y-axis, instead of in separate panels (one per series).
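For comparison, here is a sketch of the faceted version (assuming fpp3 is installed and the CSV URL above is reachable). `facet_grid(name ~ .)` gives each series its own row. `scales = "free_y"` lets each panel use its own y-axis range.

```r
library(fpp3)

tute1 <- readr::read_csv(
  "https://raw.githubusercontent.com/Mikhail-Broomes/Data-624/main/Homeworks/Homework%201/tute1.csv",
  show_col_types = FALSE
)

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

# One panel per series (Sales, AdBudget, GDP), each with its own y-axis
p_facet <- mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

p_facet
```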

2.4

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package. Create a tsibble from us_total with year as the index and state as the key.

library(USgas)

## Warning: package 'USgas' was built under R version 4.3.3

us_tibble <- us_total %>%
  as_tsibble(index = year, key = state)
us_tibble

Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

new_england_tibble <- us_tibble %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot(y) +
  labs(title = "Annual Natural Gas Consumption by State in New England",
       y = "Consumption (million cubic feet)")

new_england_tibble
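filter() with `%in%` silently drops any name that does not match, so a misspelled state simply disappears from the plot. The following sketch checks the six names against the values in us_total before plotting (assuming the USgas package is installed). `new_england` and `missing_states` are illustrative names.

```r
library(USgas)

new_england <- c("Maine", "Vermont", "New Hampshire",
                 "Massachusetts", "Connecticut", "Rhode Island")

# Names in new_england that do not appear in us_total$state
missing_states <- setdiff(new_england, unique(us_total$state))
missing_states  # character(0) when every name matches
```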

2.5

Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

library(readxl)

## Warning: package 'readxl' was built under R version 4.3.3

url <- "https://raw.githubusercontent.com/Mikhail-Broomes/Data-624/main/Homeworks/Homework%201/tourism.xlsx"


temp_file <- tempfile(fileext = ".xlsx")
download.file(url, temp_file, mode = "wb")

df_tourism <- read_excel(temp_file)

print(df_tourism)

## # A tibble: 24,320 × 5
##    Quarter    Region   State           Purpose  Trips
##    <chr>      <chr>    <chr>           <chr>    <dbl>
##  1 1998-01-01 Adelaide South Australia Business  135.
##  2 1998-04-01 Adelaide South Australia Business  110.
##  3 1998-07-01 Adelaide South Australia Business  166.
##  4 1998-10-01 Adelaide South Australia Business  127.
##  5 1999-01-01 Adelaide South Australia Business  137.
##  6 1999-04-01 Adelaide South Australia Business  200.
##  7 1999-07-01 Adelaide South Australia Business  169.
##  8 1999-10-01 Adelaide South Australia Business  134.
##  9 2000-01-01 Adelaide South Australia Business  154.
## 10 2000-04-01 Adelaide South Australia Business  169.
## # ℹ 24,310 more rows

Create a tsibble which is identical to the tourism tsibble from the tsibble package.

??tourism
tsibble::tourism

tourism_tsibble <- df_tourism %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter,  key = c("Region", "State", "Purpose"))

tourism_tsibble
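To check the claim that the rebuilt tsibble matches the packaged one, the following sketch compares the two directly. It assumes fpp3 and readxl are installed and that the URL above is reachable. `all.equal()` returns TRUE or a description of the differences, such as attribute or row-order mismatches. The row count and key comparison do not depend on row order.

```r
library(fpp3)
library(readxl)

url <- "https://raw.githubusercontent.com/Mikhail-Broomes/Data-624/main/Homeworks/Homework%201/tourism.xlsx"
temp_file <- tempfile(fileext = ".xlsx")
download.file(url, temp_file, mode = "wb")

tourism_tsibble <- read_excel(temp_file) %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

# Structural checks
nrow(tourism_tsibble) == nrow(tsibble::tourism)
identical(key_vars(tourism_tsibble), key_vars(tsibble::tourism))

# Full comparison (TRUE, or a list of differences)
all.equal(tourism_tsibble, tsibble::tourism)
```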

Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism_tsibble %>%
  as_tibble() %>%   # drop the Quarter index so the mean is taken over all quarters
  group_by(Region, Purpose) %>%
  summarise(mean_trips = mean(Trips), .groups = "drop") %>%
  filter(mean_trips == max(mean_trips))

Averaged over all quarters, the combination of Sydney and Visiting had the maximum number of overnight trips.

Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism_tsibble %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  as_tsibble(index = Quarter, key = State)
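Here is a standalone sketch of the same aggregation that stores the result. It uses `tsibble::tourism` in place of `tourism_tsibble` so it runs without the download (assuming fpp3 is installed); `state_tourism` is an illustrative name. Calling summarise() on a grouped tsibble keeps Quarter as the index. The result is therefore already a tsibble keyed by State, and the trailing as_tsibble() call is not strictly needed.

```r
library(fpp3)

# Sum Trips over Region and Purpose within each State and Quarter
state_tourism <- tsibble::tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

key_vars(state_tourism)  # the only key left is State
n_keys(state_tourism)    # one series per state
state_tourism
```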

2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend?

us_employment

us_employment %>% 
  filter(Title == "Total Private") %>%
  autoplot(Employed) +
  labs(title = "Time Series of US Employment",
       subtitle = "Total Private")

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_season(Employed) +
  labs(title = "Seasonal Plot of US Employment",
       subtitle = "Total Private")

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_subseries(Employed) +
  labs(title = "Subseries Plot of US Employment",
       subtitle = "Total Private")

us_employment %>% 
  filter(Title == "Total Private") %>%
  gg_lag(Employed) +
  labs(title = "Lag Plot of US Employment",
       subtitle = "Total Private")

us_employment %>% 
  filter(Title == "Total Private") %>%
  ACF(Employed) %>% 
  autoplot()

The employment data show a generally upward trend over the period. There is no obvious seasonal pattern: the seasonal and subseries plots look much the same for every month. Both the lag plot and the ACF suggest that the series is dominated by the trend. Each value is closely related to the ones before it, and the autocorrelations decay only slowly.

Over the full series there are several periods in which employment drops. However, these rises and falls occur at irregular times, which makes it difficult to identify regular cycles or their lengths.

Bricks from aus_production

aus_production %>% 
  autoplot(Bricks) +
  labs(title = "Time Series of Bricks Produced",
       subtitle = "Australia")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>% 
  gg_season(Bricks) +
  labs(title = "Seasonal Plot of Bricks Produced",
       subtitle = "Australia")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>% 
  gg_subseries(Bricks) +
  labs(title = "Subseries Plot of Bricks Produced",
       subtitle = "Australia")

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>% 
  gg_lag(Bricks) +
  labs(title = "Lag Plot of Bricks Produced",
       subtitle = "Australia")

## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>% 
  ACF(Bricks) %>% 
  autoplot()

The bricks data show clear quarterly seasonality. Production trends upward until around 1980 and declines after that, so the trend is not upward over the whole period. The lag plot shows a strong positive relationship between each value and the one before it. This suggests that the data are influenced by both the trend and the seasonal pattern.

Hare from pelt

pelt %>% 
  autoplot(Hare) +
  labs(title = "Time Series of Hare Pelts",
       subtitle = "Canada")

pelt %>% 
  gg_subseries(Hare) +
  labs(title = "Subseries Plot of Hare Pelts",
       subtitle = "Canada")

pelt %>% 
  gg_lag(Hare) +
  labs(title = "Lag Plot of Hare Pelts",
       subtitle = "Canada")

pelt %>% 
  ACF(Hare) %>% 
  autoplot()

The hare pelt data show a clear cyclical pattern: the number of pelts rises and falls in cycles of roughly 10 years. There is no clear long-term trend, and because the data are annual there is no seasonality to detect. The lag plot shows a strong positive relationship at lag 1, and the ACF swings between positive and negative values in step with the cycle.

“H02” Cost from PBS

PBS %>% 
  filter(ATC2 == "H02") %>%
  autoplot(Cost) +
  labs(title = "Time Series of H02 Cost",
       subtitle = "PBS")

PBS %>% 
  filter(ATC2 == "H02") %>%
  gg_season(Cost)

PBS %>% 
  filter(ATC2 == "H02") %>%
  gg_subseries(Cost) +
  labs(title = "Subseries Plot of H02 Cost",
       subtitle = "PBS")

PBS %>% 
  filter(ATC2 == "H02") %>%
  ACF(Cost) %>% 
  autoplot()
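No lag plot is drawn for H02 above. Filtering PBS on ATC2 == "H02" also leaves several series, one per Concession/Type combination. The following sketch (assuming fpp3 is installed) first sums Cost across those series to get one monthly series, then calls gg_lag(). `h02_total` and `p_h02_lag` are illustrative names. Lags 1 to 12 cover one year of monthly data.

```r
library(fpp3)

# Collapse the H02 series into one monthly total (no key remains)
h02_total <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))

# Points are easier to read than connected paths for a long series
p_h02_lag <- h02_total %>%
  gg_lag(Cost, lags = 1:12, geom = "point") +
  labs(title = "Lag Plot of Total H02 Cost", subtitle = "PBS")

p_h02_lag
```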

The H02 cost data show clear seasonality within each year and a general upward trend, with cost increasing over time. The autocorrelation plot shows strong positive correlation between each value and the ones before it. This suggests that the data are influenced by both the trend and the seasonal pattern.

Barrels from us_gasoline

us_gasoline %>% 
  autoplot(Barrels) +
  labs(title = "Time Series of Barrels of Gasoline",
       subtitle = "US")

us_gasoline %>% 
  gg_season(Barrels) +
  labs(title = "Seasonal Plot of Barrels of Gasoline",
       subtitle = "US")

us_gasoline %>% 
  gg_subseries(Barrels) +
  labs(title = "Subseries Plot of Barrels of Gasoline",
       subtitle = "US")

us_gasoline %>% 
  gg_lag(Barrels) +
  labs(title = "Lag Plot of Barrels of Gasoline",
       subtitle = "US")

us_gasoline %>% 
  ACF(Barrels) %>% 
  autoplot()

The gasoline data show a seasonal pattern: the number of barrels rises and falls in a regular way within each year. There is also a general upward trend, with consumption increasing over time. The lag plot shows a strong positive correlation between each value and the one before it. This suggests that the data are influenced by both the trend and the seasonal pattern.

Data 624

Mikhail Broomes

2024-09-08