library(fpp3)
library(cowplot)
library(knitr)
library(httr)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
data(aus_production, pelt, gafa_stock, vic_elec)
#To prevent the markdown file from opening Web pages for the help files, eval is set to false here.
?aus_production
?pelt
?gafa_stock
?vic_elec
The data in the aus_production data set was measured quarterly (although the help file seems to mistakenly call the data set a half-hourly tsibble). The data in the pelt data set was measured annually. The data in the gafa_stock data set was measured daily (excluding days the stock market wasn’t open). The data in the vic_elec data set was measured half-hourly.
p1 <- autoplot(aus_production, Bricks)
p2 <- autoplot(pelt, Lynx)
p3 <- autoplot(gafa_stock, Close)
p4 <- autoplot(vic_elec, Demand)
plot_grid(p1, p2, p3, p4)
p4 <- p4 +
labs(title = "Electricity Demand Over Time for Victoria, Australia",
subtitle = "Source: Australian Energy Market Operator",
x = "Half-Hour",
y = "Total Demand in MWh")
p4
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
gafa_stock_max <- gafa_stock |>
group_by(Symbol) |>
filter(Close == max(Close))
kable(gafa_stock_max, format="simple")
| Symbol | Date | Open | High | Low | Close | Adj_Close | Volume |
|---|---|---|---|---|---|---|---|
| AAPL | 2018-10-03 | 230.05 | 233.470 | 229.78 | 232.07 | 230.2755 | 28654800 |
| AMZN | 2018-09-04 | 2026.50 | 2050.500 | 2013.00 | 2039.51 | 2039.5100 | 5721100 |
| FB | 2018-07-25 | 215.72 | 218.620 | 214.27 | 217.50 | 217.5000 | 58954200 |
| GOOG | 2018-07-26 | 1251.00 | 1269.771 | 1249.02 | 1268.33 | 1268.3300 | 2405600 |
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
fn1 <- "https://raw.githubusercontent.com/geedoubledee/data624_homework1/main/tute1.csv"
tute1 <- readr::read_csv(fn1)
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
When you don’t include facet_grid(), the AdBudget, GDP, and Sales columns of data are plotted together rather than separately. Since none of the value ranges of these columns overlap, it is visually a little weird to plot them together. You can see more detailed information for each category when you plot them separately, and it doesn’t really take up more room than plotting them together anyway, so I think plotting them separately is preferred here.
The USgas package contains data on the demand for natural gas in the US.
library(USgas)
data(us_total)
us_total_ts <- us_total |>
as_tsibble(key = state, index = year)
new_england <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
us_total_ts |>
filter(state %in% new_england) |>
mutate(y = y / 1000) |>
autoplot(y) -> p5
p5 <- p5 +
labs(title = "New England Total Natural Gas Consumption",
subtitle = "Source: https://www.eia.gov/",
x = "Year",
y = "Billion Cubic Feet")
p5
fn2 <- "https://github.com/geedoubledee/data624_homework1/raw/main/tourism.xlsx"
temp <- tempfile(fileext = ".xlsx")
req <- GET(fn2, authenticate(Sys.getenv("GITHUB_PAT"), ""),
write_disk(path = temp))
tourism_duplicate <- readxl::read_excel(temp)
unlink(temp)
data(tourism)
keys <- c("Region", "State", "Purpose")
tourism_duplicate <- tourism_duplicate |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(key = all_of(keys), index = Quarter)
After creating the new tsibble object, we check to see if it is identical to the original.
identical(tourism, tourism_duplicate)
## [1] FALSE
The new tsibble object we created is not considered identical to the original, and we can narrow down why.
check <- as.matrix(tourism == tourism_duplicate)
tourism_duplicate[!check]
## [1] 7.545404 450.050853 37.947745
tourism[!check]
## [1] 7.545404 450.050853 37.947745
The same three values in the Trips column of both objects look identical, but are not considered so computationally. We can confirm the Trips column in both objects meets the all.equal standard for double precision floating numbers, however.
all.equal(tourism$Trips, tourism_duplicate$Trips)
## [1] TRUE
Thus, we have functionally identical objects.
tourism_duplicate_max_avg <- as_tibble(tourism_duplicate) |>
group_by(Region, Purpose) |>
summarize(Avg_Trips = mean(Trips)) |>
ungroup() |>
filter(Avg_Trips == max(Avg_Trips))
kable(tourism_duplicate_max_avg, format="simple")
| Region | Purpose | Avg_Trips |
|---|---|---|
| Sydney | Visiting | 747.27 |
tourism_duplicate_by_state <- as_tibble(tourism_duplicate) |>
select(-Region, -Purpose) |>
group_by(Quarter, State) |>
summarize(Trips = sum(Trips)) |>
as_tsibble(index = Quarter, key = State)
kable(head(tourism_duplicate_by_state, n = 10), format="simple")
| Quarter | State | Trips |
|---|---|---|
| 1998 Q1 | ACT | 551.0019 |
| 1998 Q2 | ACT | 416.0256 |
| 1998 Q3 | ACT | 436.0290 |
| 1998 Q4 | ACT | 449.7984 |
| 1999 Q1 | ACT | 378.5728 |
| 1999 Q2 | ACT | 558.1781 |
| 1999 Q3 | ACT | 448.9012 |
| 1999 Q4 | ACT | 594.8254 |
| 2000 Q1 | ACT | 599.6685 |
| 2000 Q2 | ACT | 557.1351 |
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), and ACF() to explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
data(us_employment)
data(PBS)
data(us_gasoline)
total_priv <- us_employment |>
filter(Title == "Total Private")
p6 <- total_priv |>
autoplot(Employed)
p7 <- total_priv |>
gg_season(Employed, period = "year")
p6
p7
The “Total Private” Employed data from the us_employment dataset exhibits a general upward trend. Seasonally, the lowest value for the year is generally recorded in January. Many companies use the calendar year or a close approximation to it as their fiscal year, so it makes sense that headcount decisions for the next year might finalize in December, explaining why there are always fewer people employed at the beginning of one year than there were at the end of the previous.
After the January low point, the value tends to increase until June. Then the value usually levels off until the end of the year. Positions are probably being filled more in the first half of the year than in the second half. In my experience, company executives usually feel optimistic about the year in Q1 and Q2 and are less optimistic by Q3 and Q4. Hiring pauses are normal in the second half of the year, when there’s not a lot of time left to hit sales goals and so forth, so keeping costs down becomes more important, and that often means waiting until Q1 or Q2 of the following year to start hiring again.
There were 5-year cycles between 1940-1960 where values would level off, then dip down after a growth period, eliminating about half the growth that occurred up until then. The growth/expansion cycles after that are of different lengths and varying impact.
There is an outlier year, 1945, where there is a huge drop in the value between August and September.
p8 <- autoplot(aus_production, Bricks)
p9 <- gg_season(aus_production, Bricks, period = "year")
p10 <- gg_subseries(aus_production, Bricks)
p11 <- gg_lag(aus_production, Bricks, geom = "point")
p8
p9
p10
p11
In the Bricks data from the aus_production dataset, we see that there was an upward trend until the early 1980s, at which point a downward trend begins.
Seasonally, Q3 is generally the busiest period, and Q1 and Q4 are generally the least busy. There are some outlier years in which Q2 is busier than Q3:
remove <- c("Beer", "Tobacco", "Cement", "Electricity", "Gas")
aus_production_tbl <- aus_production |>
as_tibble() |>
separate_wider_delim(Quarter, delim = " ", names = c("Year", "Quarter")) |>
select(-all_of(remove)) |>
pivot_wider(names_from = Quarter, values_from = Bricks) |>
filter(Q2 > Q3)
kable(aus_production_tbl, format = "simple")
| Year | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| 1974 | 478 | 526 | 518 | 417 |
| 1978 | 421 | 487 | 470 | 482 |
| 1982 | 501 | 560 | 512 | 412 |
| 1989 | 510 | 571 | 556 | 509 |
| 1990 | 458 | 510 | 494 | 460 |
| 1991 | 372 | 436 | 422 | 423 |
| 1995 | 430 | 457 | 417 | 370 |
| 2000 | 416 | 447 | 421 | 379 |
Cyclically, beginning in the 1970s, there are big dips around every five years or so that eliminate much of the growth gained in that five-yearish period. The dips are big enough such that since the general downward trend began in the early 1980s, the growth peaks after the dips never surpass previous growth peaks, even when the dips are not as big as previous dips were.
p12 <- autoplot(pelt, Hare)
p13 <- gg_lag(pelt, Hare, geom = "point")
p14 <- pelt |>
ACF(Hare) |>
autoplot()
p12
p13
p14
The Hare data from the pelt dataset does not exhibit an upward or downward trend. There appear to be 10-year cycles of rising to a peak and falling to a valley. 1863/1864 and 1885/1886 were exceptional peaks. Plotting the ACF makes it more clear that peaks are 10 years apart, and valleys are too.
ho2 <- PBS |>
filter(ATC2 == "H02")
p15 <- ho2 |>
autoplot(Cost)
p16 <- ho2 |>
gg_season(Cost, period = "year")
p15
p16
In the PBS dataset, I think we’re seeing the concessional safety net and the general safety net both have payment maximums that patients can hit, typically reducing cost to almost nothing between February/March to May. Costs then begin to rise again around June until November, when they rise drastically through January, after which they decline drastically. Concessional copayments on the other hand are lower at the end and beginning of the year and larger in the middle. I can’t see a pattern with general copayments, but they’re much lower than their counterparts. General safety net costs trended downward and have leveled off. Concessional copayments and safety net costs have trended upward.
p17 <- autoplot(us_gasoline, Barrels)
p18 <- gg_season(us_gasoline, Barrels, period = "year")
p17
p18
The us_gasoline dataset exhibits an upward trend. I can’t discern any other patterns unfortunately despite looking at a few different plots to see if they highlighted anything more interesting (that I could see and understand anyway).