library(fpp3)
library(cowplot)
library(knitr)
library(httr)

Exercise 2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

data(aus_production, pelt, gafa_stock, vic_elec)
# To prevent the rendered document from opening the help pages in a browser, eval is set to FALSE for this chunk.
?aus_production
?pelt
?gafa_stock
?vic_elec

The data in the aus_production data set were measured quarterly (although the help file seems to mistakenly describe the data set as a half-hourly tsibble). The pelt data were measured annually. The gafa_stock data were measured daily, excluding days the stock market wasn’t open, so its index is irregular. The vic_elec data were measured half-hourly.
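
These intervals can also be confirmed programmatically with tsibble::interval(), which fpp3 attaches (a minimal sketch; the comments note the intervals I would expect each call to report):

interval(aus_production) # 1Q (quarterly)
interval(pelt)           # 1Y (annual)
interval(gafa_stock)     # ! (irregular, since trading days aren't evenly spaced)
interval(vic_elec)       # 30m (half-hourly)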

p1 <- autoplot(aus_production, Bricks)
p2 <- autoplot(pelt, Lynx)
p3 <- autoplot(gafa_stock, Close)
p4 <- autoplot(vic_elec, Demand)
plot_grid(p1, p2, p3, p4)

p4 <- p4 +
    labs(title = "Electricity Demand Over Time for Victoria, Australia",
         subtitle = "Source: Australian Energy Market Operator",
         x = "Time (half-hourly observations)",
         y = "Total Demand in MWh")
p4

Exercise 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock_max <- gafa_stock |>
    group_by(Symbol) |>
    filter(Close == max(Close))
kable(gafa_stock_max, format="simple")
Symbol   Date           Open       High       Low     Close   Adj_Close     Volume
-------  -----------  -------  ---------  --------  -------  ----------  ---------
AAPL     2018-10-03    230.05    233.470    229.78   232.07    230.2755   28654800
AMZN     2018-09-04   2026.50   2050.500   2013.00  2039.51   2039.5100    5721100
FB       2018-07-25    215.72    218.620    214.27   217.50    217.5000   58954200
GOOG     2018-07-26   1251.00   1269.771   1249.02  1268.33   1268.3300    2405600
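
A roughly equivalent approach (a sketch): dplyr's slice_max() keeps the row with the largest Close within each group, avoiding the explicit max() comparison. I convert to a tibble first to sidestep tsibble's index handling:

gafa_stock |>
    as_tibble() |>
    group_by(Symbol) |>
    slice_max(Close, n = 1) # one row per stock, at its peak Close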

Exercise 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

fn1 <- "https://raw.githubusercontent.com/geedoubledee/data624_homework1/main/tute1.csv"
tute1 <- readr::read_csv(fn1)
mytimeseries <- tute1 |>
    mutate(Quarter = yearquarter(Quarter)) |>
    as_tsibble(index = Quarter)
mytimeseries |>
    pivot_longer(-Quarter) |>
    ggplot(aes(x = Quarter, y = value, colour = name)) +
    geom_line() +
    facet_grid(name ~ ., scales = "free_y")

mytimeseries |>
    pivot_longer(-Quarter) |>
    ggplot(aes(x = Quarter, y = value, colour = name)) +
    geom_line()

When you don’t include facet_grid(), the AdBudget, GDP, and Sales series are plotted together rather than separately. Since the value ranges of the three series don’t overlap, plotting them on a single axis compresses each one and obscures detail. Plotting them in separate facets reveals more detail for each series, and it doesn’t really take up more room than a single panel, so I think faceting is preferred here.
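
If a single-panel comparison were wanted anyway, one option (a sketch; the indexed column is my own addition) is to rebase each series to 100 at its first observation so all three share a scale:

mytimeseries |>
    pivot_longer(-Quarter) |>
    group_by(name) |>
    mutate(indexed = 100 * value / first(value)) |> # rebase each series to 100
    ggplot(aes(x = Quarter, y = indexed, colour = name)) +
    geom_line()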

Exercise 2.4

The USgas package contains data on the demand for natural gas in the US.

library(USgas)
data(us_total)
us_total_ts <- us_total |>
    as_tsibble(key = state, index = year)
new_england <- c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                 "Connecticut", "Rhode Island")
us_total_ts |>
    filter(state %in% new_england) |>
    mutate(y = y / 1000) |> # convert from million to billion cubic feet
    autoplot(y) -> p5
p5 <- p5 +
    labs(title = "New England Total Natural Gas Consumption",
         subtitle = "Source: https://www.eia.gov/",
         x = "Year",
         y = "Billion Cubic Feet")
p5
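
The New England states consume natural gas at very different levels, so if the smaller states are hard to read on a shared y-axis, facetting with free scales is one option (a sketch):

us_total_ts |>
    filter(state %in% new_england) |>
    mutate(y = y / 1000) |>
    ggplot(aes(x = year, y = y)) +
    geom_line() +
    facet_wrap(~ state, scales = "free_y") # each state gets its own scale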

Exercise 2.5

fn2 <- "https://github.com/geedoubledee/data624_homework1/raw/main/tourism.xlsx"
temp <- tempfile(fileext = ".xlsx")
# Download the workbook, authenticating with a GitHub personal access token
# stored in the GITHUB_PAT environment variable
req <- GET(fn2, authenticate(Sys.getenv("GITHUB_PAT"), ""),
           write_disk(path = temp))
tourism_duplicate <- readxl::read_excel(temp)
unlink(temp)
data(tourism)
keys <- c("Region", "State", "Purpose")
tourism_duplicate <- tourism_duplicate |>
    mutate(Quarter = yearquarter(Quarter)) |>
    as_tsibble(key = all_of(keys), index = Quarter)

After creating the new tsibble object, we check to see if it is identical to the original.

identical(tourism, tourism_duplicate)
## [1] FALSE

The new tsibble object we created is not considered identical to the original, and we can narrow down why.

check <- as.matrix(tourism == tourism_duplicate)
tourism_duplicate[!check]
## [1]   7.545404 450.050853  37.947745
tourism[!check]
## [1]   7.545404 450.050853  37.947745

The same three values in the Trips column of both objects print identically but are not equal at full double precision. We can confirm, however, that the Trips columns of the two objects satisfy all.equal()'s tolerance for double-precision floating-point numbers.

all.equal(tourism$Trips, tourism_duplicate$Trips)
## [1] TRUE

Thus, we have functionally identical objects.
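
To see just how small the discrepancies are, we can print the nonzero absolute differences (a sketch; since all.equal() returned TRUE, these must fall below its default relative tolerance of roughly 1.5e-8):

diffs <- abs(tourism$Trips - tourism_duplicate$Trips)
diffs[diffs > 0] # the three tiny floating-point discrepancies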

tourism_duplicate_max_avg <- as_tibble(tourism_duplicate) |>
    group_by(Region, Purpose) |>
    summarize(Avg_Trips = mean(Trips)) |>
    ungroup() |>
    filter(Avg_Trips == max(Avg_Trips))
kable(tourism_duplicate_max_avg, format="simple")
Region   Purpose     Avg_Trips
-------  ---------  ----------
Sydney   Visiting       747.27

tourism_duplicate_by_state <- as_tibble(tourism_duplicate) |>
    select(-Region, -Purpose) |>
    group_by(Quarter, State) |>
    summarize(Trips = sum(Trips)) |>
    as_tsibble(index = Quarter, key = State)
kable(head(tourism_duplicate_by_state, n = 10), format="simple")
Quarter   State       Trips
--------  ------  ---------
1998 Q1   ACT      551.0019
1998 Q2   ACT      416.0256
1998 Q3   ACT      436.0290
1998 Q4   ACT      449.7984
1999 Q1   ACT      378.5728
1999 Q2   ACT      558.1781
1999 Q3   ACT      448.9012
1999 Q4   ACT      594.8254
2000 Q1   ACT      599.6685
2000 Q2   ACT      557.1351
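
A quick visual check of the new state-level tsibble (a sketch; it plots one line per state):

autoplot(tourism_duplicate_by_state, Trips)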

Exercise 2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), and ACF() to explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

data(us_employment)
data(PBS)
data(us_gasoline)
total_priv <- us_employment |>
    filter(Title == "Total Private") 
p6 <- total_priv |>
    autoplot(Employed)
p7 <- total_priv |>
    gg_season(Employed, period = "year")
p6

p7

The “Total Private” Employed data from the us_employment dataset exhibit a general upward trend. Seasonally, the lowest value of each year is generally recorded in January. Many companies use the calendar year, or a close approximation to it, as their fiscal year, so headcount decisions for the next year often finalize in December; that would explain why fewer people are employed at the beginning of one year than at the end of the previous one.

After the January low point, employment tends to increase until June and then level off until the end of the year. Positions are probably being filled more in the first half of the year than in the second. In my experience, company executives usually feel optimistic about the year in Q1 and Q2 and less so by Q3 and Q4. Hiring pauses are normal in the second half of the year: with little time left to hit sales goals, keeping costs down becomes more important, and that often means waiting until Q1 or Q2 of the following year to start hiring again.

There were roughly five-year cycles between 1940 and 1960 in which values would level off, then dip after a growth period, erasing about half of the growth that had occurred up to that point. The growth/expansion cycles after that vary in both length and impact.

There is one outlier year, 1945, with a huge drop in employment between August and September.
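
We can verify this by ranking the month-over-month changes; the August-to-September 1945 drop should come out on top (a sketch; Change is a column I add for illustration):

total_priv |>
    as_tibble() |>
    mutate(Change = Employed - lag(Employed)) |> # month-over-month change
    slice_min(Change, n = 3) |>                  # three largest drops
    select(Month, Employed, Change)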

p8 <- autoplot(aus_production, Bricks)
p9 <- gg_season(aus_production, Bricks, period = "year")
p10 <- gg_subseries(aus_production, Bricks)
p11 <- gg_lag(aus_production, Bricks, geom = "point")
p8

p9

p10

p11

In the Bricks data from the aus_production dataset, we see that there was an upward trend until the early 1980s, at which point a downward trend begins.

Seasonally, Q3 is generally the busiest period, and Q1 and Q4 are generally the least busy. There are some outlier years in which Q2 is busier than Q3:

remove <- c("Beer", "Tobacco", "Cement", "Electricity", "Gas")
aus_production_tbl <- aus_production |>
    as_tibble() |>
    mutate(Quarter = as.character(Quarter)) |> # separate_wider_delim() expects a character column
    separate_wider_delim(Quarter, delim = " ", names = c("Year", "Quarter")) |>
    select(-all_of(remove)) |>
    pivot_wider(names_from = Quarter, values_from = Bricks) |>
    filter(Q2 > Q3)
kable(aus_production_tbl, format = "simple")
Year     Q1    Q2    Q3    Q4
-----  ----  ----  ----  ----
1974    478   526   518   417
1978    421   487   470   482
1982    501   560   512   412
1989    510   571   556   509
1990    458   510   494   460
1991    372   436   422   423
1995    430   457   417   370
2000    416   447   421   379

Cyclically, beginning in the 1970s, there are big dips roughly every five years that eliminate much of the growth gained over the preceding period. Since the general downward trend began in the early 1980s, these dips have been large enough that the post-dip growth peaks never surpass previous peaks, even when the dips themselves are smaller than earlier ones.
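
The exercise also lists ACF() among the tools, so for completeness here is the Bricks autocorrelation plot (a sketch; the series' trailing missing values are dropped first so the ACF is computed on complete data):

aus_production |>
    filter(!is.na(Bricks)) |>
    ACF(Bricks) |>
    autoplot()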

p12 <- autoplot(pelt, Hare)
p13 <- gg_lag(pelt, Hare, geom = "point")
p14 <- pelt |>
    ACF(Hare) |>
    autoplot()
p12

p13

p14

The Hare data from the pelt dataset do not exhibit an upward or downward trend. There appear to be 10-year cycles of rising to a peak and falling to a valley; 1863/1864 and 1885/1886 were exceptional peaks. The ACF plot makes it clearer that successive peaks are about 10 years apart, as are successive valleys.

ho2 <- PBS |>
    filter(ATC2 == "H02")
p15 <- ho2 |>
    autoplot(Cost)
p16 <- ho2 |>
    gg_season(Cost, period = "year")
p15

p16

In the PBS dataset, I think the concessional and general safety net series both reflect annual payment maximums that patients can hit: costs typically fall to almost nothing between February/March and May, begin rising again from around June to November, climb drastically through January, and then decline drastically. Concessional copayments, on the other hand, are lower at the beginning and end of the year and higher in the middle. I can’t see a pattern in general copayments, but they are much lower than their counterparts. General safety net costs trended downward and have since leveled off, while concessional copayments and concessional safety net costs have trended upward.
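
Summing the four H02 series into one total-cost series can make the overall seasonal shape easier to read (a sketch; summarise() on a tsibble aggregates over the keys within each month, and TotalCost is my own column name):

ho2 |>
    summarise(TotalCost = sum(Cost)) |>
    gg_season(TotalCost, period = "year")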

p17 <- autoplot(us_gasoline, Barrels)
p18 <- gg_season(us_gasoline, Barrels, period = "year")
p17

p18

The us_gasoline dataset exhibits an upward trend. Unfortunately, I can’t discern any other patterns, despite looking at a few different plots to see whether they highlighted anything more interesting (that I could see and understand, anyway).
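
As one last diagnostic (a sketch), the ACF can sometimes surface seasonality that is hard to see in a noisy weekly series; lag_max = 104 covers roughly two years of weekly lags:

us_gasoline |>
    ACF(Barrels, lag_max = 104) |>
    autoplot()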