624-HW1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.

library(fpp3)  # attaches tsibble, tsibbledata, feasts, ggplot2, dplyr, tidyr, lubridate, ...

help(aus_production)
help(pelt)
help(gafa_stock)
help(vic_elec)

#glimpse(aus_production)
autoplot(aus_production, Bricks) + ggtitle("Bricks Production Per Quarter")

Time interval: quarterly.

#glimpse(pelt)
autoplot(pelt, Lynx) + ggtitle("Lynx Pelts Traded Per Year")

Time interval: annual (one observation per year).

#glimpse(gafa_stock)
autoplot(gafa_stock, Close) + ggtitle("GAFA Stock Close Prices Daily")

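Time interval: daily, but trading days only. Weekends and market holidays are missing, so the tsibble marks the interval as irregular ([!] in the output below).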

#glimpse(vic_elec)
autoplot(vic_elec, Demand) +
  ggtitle("Electricity Demand Per Half Hour") +
  labs(x = "Time (half hour)", y = "Elec Demand (Megawatts)")

Time interval: half-hourly (every 30 minutes).
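The time interval is the regular gap between consecutive index values; tsibble prints it in the header, for example [1Q] for quarterly data and [!] for an irregular index. A minimal base R sketch of checking that gap directly, using hypothetical timestamps rather than the real vic_elec and gafa_stock indexes:

```r
# Hypothetical half-hourly timestamps (not the real vic_elec index)
stamps <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
              by = "30 min", length.out = 6)

# Gaps between consecutive observations, in minutes
gaps <- as.numeric(diff(stamps), units = "mins")
print(gaps)

# A regular series has exactly one distinct gap
interval_minutes <- unique(gaps)
print(interval_minutes)   # 30

# Hypothetical trading days: Thu, Fri, then Mon, so the gaps are not constant
trading_days <- as.Date(c("2018-01-04", "2018-01-05", "2018-01-08"))
print(unique(as.numeric(diff(trading_days))))   # 1 3
```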

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

peak_closing_prices <- gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close, na.rm = TRUE)) %>%
  select(Symbol, Date, Close)

peak_closing_prices

## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.
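Inside each Symbol group, filter(Close == max(Close)) keeps the row(s) whose closing price equals that group's maximum, so a tie would return more than one day. The same idea in base R, as a sketch on hypothetical toy prices (AAA and BBB are made-up symbols):

```r
# Hypothetical toy prices (AAA and BBB are made-up symbols, not gafa_stock data)
prices <- data.frame(
  Symbol = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  Date   = as.Date(c("2018-01-02", "2018-01-03", "2018-01-04",
                     "2018-01-02", "2018-01-03", "2018-01-04")),
  Close  = c(10, 12, 11, 50, 48, 55),
  stringsAsFactors = FALSE
)

# ave() returns each row's group maximum, aligned with the original rows
group_max <- ave(prices$Close, prices$Symbol,
                 FUN = function(x) max(x, na.rm = TRUE))

# Keep the row(s) whose Close equals the maximum of their own Symbol group
peaks <- prices[prices$Close == group_max, c("Symbol", "Date", "Close")]
print(peaks)   # AAA on 2018-01-03 (12) and BBB on 2018-01-04 (55)
```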


Download and explore tute1

tute1 <- readr::read_csv("/Users/aelsaeyed/Downloads/tute1.csv")
View(tute1)

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

facet_grid() draws each series in its own panel; without it, all three series would be drawn as three lines on a single plot. scales = "free_y" lets each panel use its own y-axis range.

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

library(USgas)  # provides us_total

us_gas_tsibble <- as_tsibble(us_total, key = state, index = year)

new_england_states <- c("Maine", "Vermont", "New Hampshire", 
                        "Massachusetts", "Connecticut", "Rhode Island")

new_england_gas <- us_gas_tsibble %>%
  filter(state %in% new_england_states)

autoplot(new_england_gas, y) +
  facet_grid(state ~ ., scales = "free_y") +  # each state gets its own y-axis range
  ggtitle("Annual Natural Gas Consumption in New England States") +
  labs(x = "Year", y = "Total Consumption (Million Cubic Feet)")
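as_tsibble() only succeeds when each key and index pair (here state and year) identifies exactly one row. A minimal base R sketch of that uniqueness check, on hypothetical made-up rows shaped like us_total (state, year, y):

```r
# Hypothetical rows shaped like us_total; the values are made up
gas <- data.frame(
  state = c("Maine", "Maine", "Vermont", "Vermont"),
  year  = c(2000, 2001, 2000, 2000),   # Vermont 2000 appears twice
  y     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# TRUE for any row whose (state, year) pair already appeared earlier
dup_rows <- duplicated(gas[, c("state", "year")])
print(gas[dup_rows, ])

# A valid key/index combination has no duplicated pairs
is_valid_key_index <- !any(dup_rows)
print(is_valid_key_index)   # FALSE
```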

Download tourism.xlsx from the book website and read it into R using readxl::read_excel(). Create a tsibble which is identical to the tourism tsibble from the tsibble package. Find what combination of Region and Purpose had the maximum number of overnight trips on average. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

tourism <- readxl::read_excel("/Users/aelsaeyed/Downloads/tourism.xlsx")
View(tourism)

help(tourism)

tourism <- tourism %>%
  mutate(Quarter = yearquarter(Quarter))  # Quarter is read in as text; convert it to a yearquarter index

tourism_tsibble <- tourism %>%
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)  # same key as tsibble::tourism
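The yearquarter() conversion maps each quarter's date onto a quarterly index value. Assuming the Quarter column holds quarter start dates such as "1998-01-01" (an assumption about the file's format), a base R sketch of the labels this corresponds to:

```r
# Assumed input format: the first day of each quarter as "YYYY-MM-DD"
quarter_strings <- c("1998-01-01", "1998-04-01", "1998-07-01", "1998-10-01")

dates <- as.Date(quarter_strings)

# quarters() gives "Q1".."Q4"; format(dates, "%Y") gives the year
quarter_labels <- paste(format(dates, "%Y"), quarters(dates))
print(quarter_labels)   # "1998 Q1" "1998 Q2" "1998 Q3" "1998 Q4"
```

These are only labels; yearquarter() additionally stores the value as an ordered time index that as_tsibble() can use.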

max_average_trips <- tourism_tsibble %>%
  as_tibble() %>%                     # drop the time index so mean() runs across all quarters
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)         # keep the single highest average

max_average_trips

Converting to a tibble first matters: summarise() on a tsibble keeps the Quarter index, so the mean would be taken within each quarter rather than across all of them. Averaged over all quarters, the Region/Purpose combination with the most overnight trips is Sydney / Visiting.
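Averaging within each Region/Purpose group over all quarters and then taking the maximum can give a different answer from picking the single busiest quarter. A base R sketch on hypothetical toy data (regions "A" and "B" are made up, not taken from tourism):

```r
# Hypothetical toy data: two regions x two purposes, three quarters each
trips <- data.frame(
  Region  = rep(c("A", "B"), each = 6),
  Purpose = rep(rep(c("Holiday", "Visiting"), each = 3), times = 2),
  Trips   = c(10, 20, 30,   5,  5,  5,    # A: Holiday, Visiting
              40,  0,  0,  12, 14, 16),   # B: Holiday, Visiting
  stringsAsFactors = FALSE
)

# Mean over ALL quarters for each Region/Purpose pair
avg <- aggregate(Trips ~ Region + Purpose, data = trips, FUN = mean)

# Pair with the largest average
best <- avg[which.max(avg$Trips), ]
print(best)   # Region A, Purpose Holiday, Trips 20

# Contrast: the single busiest quarter belongs to a different pair
print(trips[which.max(trips$Trips), c("Region", "Purpose")])   # B, Holiday
```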

state_total_trips_tsibble <- tourism_tsibble %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips, na.rm = TRUE)) %>%
  as_tsibble(key = State, index = Quarter)

state_total_trips_tsibble

## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    State Quarter total_trips
##    <chr>   <qtr>       <dbl>
##  1 ACT   1998 Q1        551.
##  2 ACT   1998 Q2        416.
##  3 ACT   1998 Q3        436.
##  4 ACT   1998 Q4        450.
##  5 ACT   1999 Q1        379.
##  6 ACT   1999 Q2        558.
##  7 ACT   1999 Q3        449.
##  8 ACT   1999 Q4        595.
##  9 ACT   2000 Q1        600.
## 10 ACT   2000 Q2        557.
## # ℹ 630 more rows
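In the tsibble version, summarise() keeps the Quarter index automatically, which is why grouping by State alone still yields one row per State per quarter. Outside a tsibble the index has to be named explicitly; a base R sketch with hypothetical toy rows:

```r
# Hypothetical toy rows: two states, several regions, two quarters
trips <- data.frame(
  State   = c("S1", "S1", "S1", "S1", "S2", "S2"),
  Region  = c("R1", "R2", "R1", "R2", "R3", "R3"),
  Quarter = c("1998 Q1", "1998 Q1", "1998 Q2", "1998 Q2", "1998 Q1", "1998 Q2"),
  Trips   = c(1, 2, 3, 4, 10, 20),
  stringsAsFactors = FALSE
)

# Sum over regions (and purposes) within each State and Quarter
state_totals <- aggregate(Trips ~ State + Quarter, data = trips, FUN = sum)
state_totals <- state_totals[order(state_totals$State, state_totals$Quarter), ]
print(state_totals)   # S1: 3, 7; S2: 10, 20
```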

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?

#glimpse(us_employment)

US Employment

total_private_employment <- us_employment %>%
  filter(Title == "Total Private")

autoplot(total_private_employment, Employed) +
  ggtitle("Total Monthly Private Employment") +
  labs(x = "Month", y = "Number Employed")

gg_season(total_private_employment, Employed) +
  ggtitle("Seasonal Plot of Total Private Employment")

gg_subseries(total_private_employment, Employed) +
  ggtitle("Subseries Plot of Total Private Employment")

gg_lag(total_private_employment, Employed) +
  ggtitle("Lag Plot of Total Private Employment")

ACF(total_private_employment, Employed) %>%
  autoplot() +
  ggtitle("ACF of Total Private Employment")

These graphs show a strong upward trend in total private employment. Employment generally grows from year to year, but there are dips in certain years. Looking decade by decade, there seems to be a large dip around the middle of the 1940s, 1950s, 1970s, 1980s, 1990s and 2000s, while the remaining decades do not follow this pattern. Because these multi-year dips recur at irregular intervals, they are cyclic behavior rather than seasonality. Within each year there is also a regular rise and fall in employment, which is a seasonal pattern with a 12-month period.
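The ACF measures the correlation between a series and its own lagged values; for a strongly trending series, the autocorrelations stay large and positive and shrink slowly as the lag grows. A base R sketch of the sample autocorrelation computed by hand and checked against stats::acf(), using a hypothetical trending monthly series:

```r
# Sample autocorrelation at lag k:
#   r_k = sum_{t=k+1}^{T} (y_t - ybar)(y_{t-k} - ybar) / sum_{t=1}^{T} (y_t - ybar)^2
acf_by_hand <- function(y, k) {
  n <- length(y)
  d <- y - mean(y)
  sum(d[(k + 1):n] * d[1:(n - k)]) / sum(d^2)
}

# Hypothetical monthly series: strong upward trend plus a small 12-month wave
idx <- 1:120
y <- 100 + 0.5 * idx + 3 * sin(2 * pi * idx / 12)

r <- sapply(1:24, function(k) acf_by_hand(y, k))
print(round(r, 2))   # large positive values that shrink slowly with the lag

# Compare with stats::acf (its first element is lag 0, so drop it)
r_stats <- as.numeric(acf(y, lag.max = 24, plot = FALSE)$acf)[-1]
print(all.equal(r, r_stats))   # TRUE
```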

Australian Brick Production

# Autoplot
autoplot(aus_production, Bricks) + ggtitle("Bricks Production Over Time")

# Seasonality
gg_season(aus_production, Bricks)

# Subseries plot
gg_subseries(aus_production, Bricks)

# Lag plot
gg_lag(aus_production, Bricks)

# Autocorrelation
ACF(aus_production, Bricks) %>% autoplot()

These graphs show that brick production rose steadily from the 1960s onward, peaked in the early 1980s, and then started to fall. There are also some mid-decade dips in production, which look cyclic rather than seasonal. Within each year the pattern is seasonal: Q1 has markedly lower production than Q2, Q3 is the peak, and Q4 falls back somewhat.
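The subseries plot summarizes each quarter with its own mean, which is a quick way to read the low-Q1, high-Q3 pattern. A base R sketch of those per-quarter means, on a hypothetical synthetic quarterly series rather than the real Bricks data:

```r
# Hypothetical quarterly series (not the real Bricks data):
# a fixed quarterly pattern plus a slow upward trend
pattern <- c(-20, 0, 15, 5)            # Q1 low, Q3 high
n_years <- 10
y <- ts(300 + rep(pattern, n_years) + seq(0, by = 1, length.out = 4 * n_years),
        start = c(1960, 1), frequency = 4)

# cycle() labels each observation with its quarter (1 to 4)
quarter_means <- tapply(as.numeric(y), as.integer(cycle(y)), mean)
print(quarter_means)   # 298 319 335 326

# Seasonal effect: how far each quarter's mean sits from the overall mean
print(round(quarter_means - mean(y), 2))   # -21.5 -0.5 15.5 6.5
```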

Hare Pelts Traded

# Autoplot
autoplot(pelt, Hare) + ggtitle("Hare Pelts Traded Over Time")

# Subseries plot
gg_subseries(pelt, Hare)

# Lag plot
gg_lag(pelt, Hare)

# Autocorrelation
ACF(pelt, Hare) %>% autoplot()

#gg_season(pelt, Hare)

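When trying to run gg_season(), I got the error: “The data must contain at least one observation per seasonal period.” pelt is annual data, so there is no within-year seasonal period for gg_season() to plot.

Overall, the number of hare pelts traded rises to a spike around the middle of each decade and then falls quickly, with particularly big spikes in the mid-1860s and mid-1880s. Because these rises and falls span several years and are not tied to a fixed calendar period, this is cyclic behavior rather than seasonality.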
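Two points from this section can be checked in base R: annual data has frequency 1, so there is no within-year period for a seasonal plot, and a regular multi-year cycle shows up as an ACF peak near the cycle length. A sketch on a hypothetical annual series with an exact 10-year cycle (real pelt data is far noisier):

```r
# Hypothetical annual series with an exact 10-year cycle (not the real pelt data)
years <- 1845:1935
y <- ts(50 + 40 * sin(2 * pi * (years - 1845) / 10), start = 1845, frequency = 1)

# One observation per year: no within-year seasonal period exists
print(frequency(y))   # 1

# The ACF over lags 1..15 peaks at the cycle length
r <- as.numeric(acf(y, lag.max = 15, plot = FALSE)$acf)[-1]
peak_lag <- which.max(r)
print(peak_lag)   # 10
```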

PBS Prescription Subsidies: H02 Cost

pbs_ho2_only <- PBS %>%
  filter(ATC2 == "H02")

#glimpse(pbs_ho2_only)

# Autoplot
autoplot(pbs_ho2_only, Cost) + ggtitle("H02 Cost Over Time")

# Seasonality
gg_season(pbs_ho2_only, Cost)

# Subseries plot
gg_subseries(pbs_ho2_only, Cost)

# Autocorrelation
ACF(pbs_ho2_only, Cost) %>% autoplot()

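I originally filtered only on ATC2 == "H02", but that still leaves several series: PBS is keyed by Concession and Type (as well as ATC1 and ATC2), so H02 has one series for each Concession/Type combination, as the separate panels in the ACF plot above show. gg_lag() works on a single series, so I also filtered on Concession and Type and tried again.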
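A lag plot pairs each observation with the same series' value k periods earlier, which is only meaningful within one series. A base R sketch of those pairs and their correlation, on a hypothetical monthly cost series with a 12-month pattern (not PBS data):

```r
# Lag-plot pairs: each value y_t next to its own earlier value y_{t-k}
lag_pairs <- function(y, k) {
  n <- length(y)
  data.frame(current = y[(k + 1):n], lagged = y[1:(n - k)])
}

# Hypothetical monthly cost series with an exact 12-month pattern
idx <- 1:48
cost <- 100 + 20 * cos(2 * pi * idx / 12)

p12 <- lag_pairs(cost, 12)
p6  <- lag_pairs(cost, 6)

# At the seasonal lag the points fall on a rising line; half a season away, a falling one
print(round(cor(p12$current, p12$lagged), 3))   # 1
print(round(cor(p6$current,  p6$lagged),  3))   # -1
```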

pbs_ho2_concession_type <- pbs_ho2_only %>%
  filter(Concession == "Concessional") %>%
  filter(Type == "Co-payments")

#glimpse(pbs_ho2_concession_type)

# Lag plot
gg_lag(pbs_ho2_concession_type, Cost)

All four H02 cost series show a general upward trend. The safety-net series are seasonal, with lower costs early in the year and higher costs late in the year. Concessional co-payments show the opposite seasonal pattern, and general co-payments have no easily identifiable seasonal pattern.

Gasoline Barrels Supplied

# Autoplot
autoplot(us_gasoline, Barrels) + ggtitle("Barrels of Gasoline Over Time")

# Seasonality
gg_season(us_gasoline, Barrels)

# Subseries plot
gg_subseries(us_gasoline, Barrels)

# Lag plot
gg_lag(us_gasoline, Barrels)

# Autocorrelation
ACF(us_gasoline, Barrels) %>% autoplot()

The general trend in gasoline supplied is upward, and there is some seasonality: supply rises and falls within every year. The trend also plateaus in the years leading up to 2009, after which supply dips.
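For weekly data the seasonal period is one year of weeks, 365.25 / 7 ≈ 52.18, which is not a whole number. Averaging over about one year of weeks cancels most of the yearly rise and fall and leaves an estimate of the trend. A base R sketch on a hypothetical weekly series (not the real us_gasoline data):

```r
weeks_per_year <- 365.25 / 7
print(round(weeks_per_year, 2))   # 52.18

# Hypothetical weekly series (not us_gasoline): linear trend plus a yearly wave
n <- 52 * 10
idx <- seq_len(n)
trend <- 8 + 0.004 * idx
y <- trend + 0.3 * sin(2 * pi * idx / weeks_per_year)

# 52-week moving average (NA where the window runs off either end);
# averaging over about one year of weeks cancels most of the yearly wave
trend_hat <- stats::filter(y, rep(1 / 52, 52), sides = 2)

# Largest gap between the smoothed series and the true trend
print(max(abs(trend_hat - trend), na.rm = TRUE))
```

The window is 52 weeks rather than exactly 52.18, so a small remnant of the wave survives the smoothing.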