Exercises: 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
help(aus_production)
help(pelt)
help(gafa_stock)
help(vic_elec)
aus_production.desc <- "Quarterly production of selected commodities in Australia."
pelt.desc <- "Pelt trading records"
gafa_stock.desc <- "GAFA (Google, Amazon, Facebook, Apple) stock prices"
vic_elec.desc <- "Half-hourly electricity demand for Victoria, Australia"
The time interval for aus_production is quarterly, for pelt annual, for gafa_stock daily (on irregular trading days), and for vic_elec half-hourly.
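A quick programmatic check of those intervals, assuming the fpp3 packages are loaded:
# tsibble::interval() reports the interval attached to each tsibble
interval(aus_production) # quarterly (1Q)
interval(pelt)           # annual (1Y)
interval(gafa_stock)     # irregular (!), since trading days are not evenly spaced
interval(vic_elec)       # half-hourly (30m)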
autoplot(aus_production, Bricks) + labs(title = aus_production.desc)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
autoplot(pelt, Lynx) + labs(title = pelt.desc)
autoplot(gafa_stock, Close) + labs(title = gafa_stock.desc)
autoplot(vic_elec, Demand) + labs(title = vic_elec.desc)
autoplot(vic_elec, Demand) +
  labs(
    title = vic_elec.desc,
    subtitle = "Plotting total electrical demand in MWh",
    x = "Date and Time (in 30 min intervals)",
    y = "Electricity Demand (MWh)"
  )
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Symbol, Date, Close)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv(file.path(projPath, "tute1.csv"))
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
When I don’t include facet_grid(), all three series are drawn in a single panel with a common y-axis, which makes it harder to compare series on different scales.
I installed USgas in my setup block at the top.
us_total_ts <- us_total %>%
as_tsibble(index = year, key = state)
us_total_ts_NE <- us_total_ts %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island"))
ggplot(data = us_total_ts_NE,
aes(x = year, y = y, colour = state)
) +
geom_line() +
labs(title = "Annual Natural Gas Consumption by State for the New England area") +
ylab("Gas Consumption")
tourism_xlsx <- readxl::read_excel(file.path(projPath, "tourism.xlsx"))
glimpse(tsibble::tourism)
## Rows: 24,320
## Columns: 5
## Key: Region, State, Purpose [304]
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…
tourism_ts <- tourism_xlsx %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(key = c(Region, State, Purpose),
index = Quarter)
glimpse(tourism_ts)
## Rows: 24,320
## Columns: 5
## Key: Region, State, Purpose [304]
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…
tourism_ts %>%
group_by(Region, Purpose) %>%
summarize(avg_trips = mean(Trips),
.groups = "drop") %>%
select(Region, Purpose, avg_trips) %>%
arrange(desc(avg_trips))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
## # A tsibble: 24,320 x 4 [1Q]
## # Key: Region, Purpose [304]
## Region Purpose avg_trips Quarter
## <chr> <chr> <dbl> <qtr>
## 1 Melbourne Visiting 985. 2017 Q4
## 2 Sydney Business 948. 2001 Q4
## 3 Sydney Visiting 921. 2016 Q4
## 4 Sydney Visiting 920. 2017 Q4
## 5 Sydney Visiting 916. 2017 Q1
## 6 South Coast Holiday 915. 1998 Q1
## 7 North Coast NSW Holiday 906. 2016 Q1
## 8 Sydney Business 892. 2017 Q3
## 9 Sydney Business 884. 2017 Q2
## 10 Sydney Visiting 882. 2013 Q4
## # ℹ 24,310 more rows
A tsibble never drops its index, so summarise() on a grouped tsibble still aggregates within each Quarter rather than across the whole time range. To average over all quarters, I first drop the tsibble structure with as_tibble().
tourism_ts %>%
as_tibble() %>%
group_by(Region, Purpose) %>%
summarize(avg_trips = mean(Trips),
.groups = "drop") %>%
select(Region, Purpose, avg_trips) %>%
arrange(desc(avg_trips))
## # A tibble: 304 × 3
## Region Purpose avg_trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
## 6 Gold Coast Holiday 528.
## 7 Melbourne Holiday 507.
## 8 South Coast Holiday 495.
## 9 Brisbane Visiting 493.
## 10 Melbourne Business 478.
## # ℹ 294 more rows
Now I can see that, across the full range of time, the average number of trips was highest for Sydney with the Visiting purpose, at about 747.
I think the implication here is that, because we still want the result to be a tsibble, the columns should be Quarter, State, and total trips.
tourism_ts_d <- tourism_ts %>%
group_by(State) %>%
summarize(total_trips = sum(Trips))
head(tourism_ts_d)
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter total_trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
#help(us_employment)
#help(aus_production)
#help(pelt)
#help(PBS)
#help(us_gasoline)
us_employment.desc <- "US monthly employment data"
# Exploring
head(us_employment)
## # A tsibble: 6 x 4 [1M]
## # Key: Series_ID [1]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
# Plotting
autoplot(us_employment %>% filter(Title == "Total Private"), Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_season(Employed)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
us_employment %>%
filter(Title == "Total Private") %>%
  gg_subseries(Employed)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed, lags = 1:12)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?
It looks like there is a positive trend in employment. For seasonality, there is a subtle rise over roughly the first half of each year, after which the series flattens. The lag plots imply strong autocorrelation. There are unusual dips in the mid-1970s, early 1980s, early 1990s, early 2000s, and 2008, which line up with US recessions.
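To look at one of those dips more closely, the plot can be narrowed to a window with tsibble’s filter_index(); the 2006–2011 window below is my own choice for zooming in on the 2008 downturn:
us_employment %>%
  filter(Title == "Total Private") %>%
  filter_index("2006" ~ "2011") %>% # keep 2006 Jan through 2011 Dec
  autoplot(Employed) +
  labs(title = "Total Private employment around the 2008 dip")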
#aus_production.desc <- "Quarterly production of selected commodities in Australia." # already done earlier
# Plotting
aus_production %>%
autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_season(y = Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_subseries(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_lag(Bricks, lags = 1:4)
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?
Australian clay brick production shows signs of seasonality, cyclicity, and trend. Seasonally, production is lowest in Q1 and highest in Q2 and Q3. Cyclically, we see drops in production and subsequent recoveries roughly every 4-6 years. The lag and autocorrelation plots imply moderate autocorrelation. Production trended up from 1956 to 1980 and has trended down since. (I’m not sure whether the proper way to analyse this would be to divide the data into pre- and post-1980 periods.)
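One way to try that split, sketched with tsibble’s filter_index(); the 1980 Q4 cut point is just my reading of the plot:
# Earlier regime: production trending up
aus_production %>%
  filter_index(~ "1980 Q4") %>% # from the start through 1980 Q4
  autoplot(Bricks) +
  labs(title = "Brick production through 1980 Q4")
# Later regime: production trending down
aus_production %>%
  filter_index("1981 Q1" ~ .) %>% # 1981 Q1 to the end
  autoplot(Bricks) +
  labs(title = "Brick production from 1981 Q1")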
#pelt.desc <- "Pelt trading records" # already done earlier
# Plotting
pelt %>%
autoplot(Hare)
pelt %>%
gg_subseries(Hare)
pelt %>%
gg_lag(Hare)
pelt %>%
ACF(Hare) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?
There is definite cyclicity, with spikes and valleys roughly every ten years. It’s hard to assess seasonality with an annual data set.
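As a rough sanity check on that cycle length (my own addition, using base R’s acf() because its output is simple to index):
# Find the lag, past the initial decay, where the autocorrelation peaks again;
# that lag approximates the cycle length and should land near ten years
hare_acf <- acf(pelt$Hare, lag.max = 20, plot = FALSE)
lags <- 0:20
lags[lags >= 5][which.max(hare_acf$acf[lags >= 5])]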
PBS.desc <- "Monthly Medicare Australia prescription data"
# Exploring
unique(PBS$ATC2)
## [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A09" "A10" "A11" "A12" "A14"
## [13] "A15" "B01" "B02" "B03" "B05" "C01" "C02" "C03" "C04" "C05" "C07" "C08"
## [25] "C09" "C10" "D" "D01" "D02" "D04" "D05" "D06" "D07" "D08" "D10" "D11"
## [37] "G01" "G02" "G03" "G04" "H01" "H02" "H03" "H04" "H05" "J01" "J02" "J04"
## [49] "J05" "J06" "J07" "L01" "L02" "L03" "L04" "M01" "M02" "M03" "M04" "M05"
## [61] "N02" "N03" "N04" "N05" "N06" "N07" "P01" "P02" "P03" "R" "R01" "R03"
## [73] "R05" "R06" "S" "S01" "S02" "S03" "V01" "V03" "V04" "V06" "V07" "Z"
# Plotting
PBS %>%
filter(ATC2 == "H02") %>%
autoplot(Cost) +
facet_grid(Concession ~ Type, scales = "free_y")
PBS %>%
filter(ATC2 == "H02") %>%
gg_season(Cost)
PBS %>%
filter(ATC2 == "H02") %>%
gg_subseries(Cost)
#PBS %>%
# filter(ATC2 == "H02") %>%
# gg_lag(Cost, lags = 1:12)
# The code above caused an error because gg_lag() needs a single time series,
# so I'm finding the unique series first
PBS %>%
filter(ATC2 == "H02") %>%
  as_tibble() %>% # drop the tsibble index so summarise() aggregates over all months, not within each Month
group_by(Concession, Type, ATC1, ATC2) %>%
summarise(
Total_Cost = sum(Cost),
Total_Scripts = sum(Scripts)
)
## `summarise()` has grouped output by 'Concession', 'Type', 'ATC1'. You can
## override using the `.groups` argument.
## # A tibble: 4 × 6
## # Groups: Concession, Type, ATC1 [4]
## Concession Type ATC1 ATC2 Total_Cost Total_Scripts
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Concessional Co-payments H H02 100043176. 16884470
## 2 Concessional Safety net H H02 45609431. 5305140
## 3 General Co-payments H H02 3799151. 508692
## 4 General Safety net H H02 7251449. 1301852
PBS %>%
filter(ATC2 == "H02", Concession == "Concessional", Type == "Co-payments") %>%
gg_lag(Cost, lags = 1:12) +
labs(title = "Lag Plot: Concessional Co-payments")
PBS %>%
filter(ATC2 == "H02", Concession == "Concessional", Type == "Safety net") %>%
gg_lag(Cost, lags = 1:12) +
labs(title = "Lag Plot: Concessional Safety Net")
PBS %>%
filter(ATC2 == "H02", Concession == "General", Type == "Co-payments") %>%
gg_lag(Cost, lags = 1:12) +
labs(title = "Lag Plot: General Co-payments")
PBS %>%
filter(ATC2 == "H02", Concession == "General", Type == "Safety net") %>%
gg_lag(Cost, lags = 1:12) +
labs(title = "Lag Plot: General Safety Net")
PBS %>%
filter(ATC2 == "H02") %>%
ACF(Cost) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?
There are four different series here: Concessional Co-payments, General Co-payments, Concessional Safety Net, and General Safety Net. We see seasonality in all but General Co-payments: the safety-net costs drop to a low in February and climb through the year until the following February, while the Concessional co-payments peak gently around April. Over the years, the Concessional series have trended upwards in cost. Some interesting years: General Safety Net and General Co-payments costs dropped between 1994 and 1997 and have stayed lower since.
us_gasoline.desc <- "US finished motor gasoline product supplied"
# Plotting
us_gasoline %>%
autoplot(Barrels)
us_gasoline %>%
gg_season(Barrels)
us_gasoline %>%
gg_subseries(Barrels)
us_gasoline %>%
gg_lag(Barrels)
us_gasoline %>%
ACF(Barrels) %>%
autoplot()
Can you spot any seasonality, cyclicity and trend? Findings? Unusual years?
Barrels of gasoline supplied trended steadily upwards until around 2006, trended down from 2006 to 2012, and then trended up again. The lag and ACF plots show strong autocorrelation.