1. Explore the following four time series: Bricks from
aus_production, Lynx from pelt, Close from gafa_stock, Demand from
vic_elec.
# Loading packages
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.0 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
# Loading datasets
data("aus_production")
data("pelt")
data("gafa_stock")
data("vic_elec")
Use ? (or help()) to find out about the data in each series.
?aus_production
?pelt
?gafa_stock
?vic_elec
What is the time interval of each series?
aus_production: Quarterly
pelt: Yearly
gafa_stock: Daily
vic_elec: Half-hourly
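These intervals can also be confirmed programmatically with tsibble’s interval() helper (a quick check, not required by the exercise):
interval(aus_production)  # 1Q
interval(pelt)            # 1Y
interval(gafa_stock)      # [!] irregular, since stock prices only exist on trading days
interval(vic_elec)        # 30m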
Use autoplot() to produce a time plot of each series.
autoplot(aus_production, Bricks) +
  labs(title = "Quarterly Production of Bricks in Australia")
# For this visual there is an increasing trend overall, but the more recent years show a gradual decline.
autoplot(pelt, Lynx) +
  labs(title = "Lynx Pelts Traded, 1845-1935")
# There is no clear overall trend, but the series rises and falls in regular cycles of roughly ten years. Because the data are annual, this is cyclicity rather than seasonality.
autoplot(gafa_stock, Close) +
  labs(title = "GAFA Stock Closing Prices")
# Up until mid-2016 Google had the highest closing price; thereafter Amazon's price was the highest.
autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand in Victoria, Australia")
# Because the interval is half-hourly, the plot is very crowded, but there is a clear spike in demand at the beginning of each year (the Australian summer).
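Because the full half-hourly series is so dense, zooming in on a single month makes the daily pattern visible (an illustrative sketch; the month shown is arbitrary):
vic_elec |>
  filter(year(Time) == 2013, month(Time) == 1) |>
  autoplot(Demand) +
  labs(title = "Half-hourly Electricity Demand, January 2013", y = "Demand (MWh)")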
For the last plot, modify the axis labels and title.
autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand: Victoria, Australia",
       x = "Date", y = "Demand (MWh)")

2. Use filter() to find what days corresponded to the peak closing
price for each of the four stocks in gafa_stock.
gafa_stock |>
  group_by(Symbol) |>
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
# Group by Symbol and keep the rows where Close equals its maximum to find each stock's peak closing date.
Apple peaked on 2018-10-03, Amazon on 2018-09-04, Facebook on 2018-07-25,
and Google on 2018-07-26.
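An equivalent approach, after dropping the tsibble structure so plain dplyr verbs apply, uses slice_max() (shown only as an alternative sketch):
gafa_stock |>
  as_tibble() |>                # drop the tsibble index/key
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>    # keep the row with the highest close per stock
  select(Symbol, Date, Close)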
3. Download the file tute1.csv from the book website, open it in
Excel (or some other spreadsheet application), and review its contents.
You should find four columns of information. Columns B through D each
contain a quarterly series, labelled Sales, AdBudget and GDP. Sales
contains the quarterly sales for a small company over the period
1981-2005. AdBudget is the advertising budget and GDP is the gross
domestic product. All series have been adjusted for inflation.
You can read the data into R with the following script:
tute1 <- readr::read_csv("C:/Users/natal/Documents/Masters/Cuny SPS MDS/Fall 2024/Data 624/week 2/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

Without facet_grid(), the three series are plotted on a single set of axes.
Because their value ranges differ so much, it is harder to see the
patterns in the series with smaller fluctuations (e.g. the GDP series).
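If a single panel is wanted despite the scale differences, one common workaround is to rebase each series to its first observation so they share a common scale (a sketch, not part of the exercise):
mytimeseries |>
  pivot_longer(-Quarter) |>
  group_by(name) |>
  mutate(value = value / first(value)) |>  # index each series to its first quarter
  ungroup() |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  labs(y = "Index (first quarter = 1)")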
4. The USgas package contains data on the demand for natural gas in
the US.
Install the USgas package.
# install.packages("USgas")  # run once if the package is not already installed
library(USgas)
data(us_total)
Create a tsibble from us_total with year as the index and state as
the key.
us_total <- us_total |>
  as_tsibble(key = state, index = year)
view(us_total)
Plot the annual natural gas consumption by state for the New England
area (comprising the states of Maine, Vermont, New Hampshire,
Massachusetts, Connecticut and Rhode Island).
new_england_gas <- us_total |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire",
                      "Massachusetts", "Connecticut", "Rhode Island"))
autoplot(new_england_gas, y) +
  labs(title = "New England Annual Gas Consumption",
       y = "Gas Consumption (Million Cubic Feet)")

5. Download tourism.xlsx from the book website and read it into R
using readxl::read_excel().
library(readxl)
tourism <- readxl::read_excel("C:/Users/natal/Documents/Masters/Cuny SPS MDS/Fall 2024/Data 624/week 2/tourism.xlsx")
Create a tsibble which is identical to the tourism tsibble from the
tsibble package.
tourism_tibble <- tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)
# yearquarter() converts the raw dates (1/1, 4/1, 7/1, 10/1) into quarter labels such as 1998 Q1, which are easier to read and match the index of the packaged tourism tsibble.
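To confirm the result really matches the packaged version, it can be compared directly with the tourism object shipped in the tsibble package (attribute and column-order differences aside, the two should agree):
all.equal(tourism_tibble, tsibble::tourism)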
Find what combination of Region and Purpose had the maximum number
of overnight trips on average.
tourism_tibble |>
  group_by(Region, Purpose) |>
  mutate(Trips = mean(Trips)) |>
  ungroup() |>
  filter(Trips == max(Trips)) |>
  distinct(Region, Purpose, Trips)
## # A tibble: 1 × 3
## Region Purpose Trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
# distinct() collapses the repeated quarterly rows so only the winning Region/Purpose combination remains.
Sydney, with the purpose "Visiting", has the maximum average number of
overnight trips (about 747 per quarter).
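A more direct route to the same answer is to aggregate first and then keep the top row (an alternative sketch using slice_max()):
tourism_tibble |>
  as_tibble() |>                                      # drop the index so we can aggregate freely
  group_by(Region, Purpose) |>
  summarise(Trips = mean(Trips), .groups = "drop") |>
  slice_max(Trips, n = 1)                             # Sydney / Visiting, ~747 trips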
Create a new tsibble which combines the Purposes and Regions, and
just has total trips by State.
library(dplyr)
tourism_state <- tourism_tibble |>
  group_by(State) |>
  summarise(Trips = sum(Trips)) |>
  ungroup()
tourism_state
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
# Grouping by State and summing gives total trips by state; summarise() on a tsibble keeps the Quarter index automatically. "Combining the Purposes and Regions" here means aggregating over them, so those columns are summed away rather than concatenated.
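As a quick sanity check, the aggregated tsibble can be plotted directly, one line per state:
autoplot(tourism_state, Trips) +
  labs(title = "Total Overnight Trips by State", y = "Trips")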
8. Use the following graphics functions: autoplot(), gg_season(),
gg_subseries(), gg_lag(), ACF() and explore features from the following
time series: “Total Private” Employed from us_employment, Bricks from
aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from
us_gasoline.
total_private <- us_employment |>
  filter(Title == "Total Private")
total_private |> autoplot(Employed)

total_private |> gg_season(Employed)

total_private |> gg_subseries(Employed)

total_private |> gg_lag(Employed)

total_private |> ACF(Employed)
## # A tsibble: 29 x 3 [1M]
## # Key: Series_ID [1]
## Series_ID lag acf
## <chr> <cf_lag> <dbl>
## 1 CEU0500000001 1M 0.997
## 2 CEU0500000001 2M 0.993
## 3 CEU0500000001 3M 0.990
## 4 CEU0500000001 4M 0.986
## 5 CEU0500000001 5M 0.983
## 6 CEU0500000001 6M 0.980
## 7 CEU0500000001 7M 0.977
## 8 CEU0500000001 8M 0.974
## 9 CEU0500000001 9M 0.971
## 10 CEU0500000001 10M 0.968
## # ℹ 19 more rows
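The ACF table is easier to read as a plot; piping ACF() into autoplot() draws the correlogram:
total_private |> ACF(Employed) |> autoplot()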
aus_production |> autoplot(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_season(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production |> gg_lag(Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production |> ACF(Bricks)
## # A tsibble: 22 x 2 [1Q]
## lag acf
## <cf_lag> <dbl>
## 1 1Q 0.900
## 2 2Q 0.815
## 3 3Q 0.813
## 4 4Q 0.828
## 5 5Q 0.720
## 6 6Q 0.642
## 7 7Q 0.655
## 8 8Q 0.692
## 9 9Q 0.609
## 10 10Q 0.556
## # ℹ 12 more rows
pelt |> autoplot(Hare)
# gg_season() is skipped for pelt: the data are annual, so there is no within-year seasonal period to plot.

pelt |> gg_subseries(Hare)

pelt |> gg_lag(Hare)

pelt |> ACF(Hare)
## # A tsibble: 19 x 2 [1Y]
## lag acf
## <cf_lag> <dbl>
## 1 1Y 0.658
## 2 2Y 0.214
## 3 3Y -0.155
## 4 4Y -0.401
## 5 5Y -0.493
## 6 6Y -0.401
## 7 7Y -0.168
## 8 8Y 0.113
## 9 9Y 0.307
## 10 10Y 0.340
## 11 11Y 0.296
## 12 12Y 0.206
## 13 13Y 0.0372
## 14 14Y -0.153
## 15 15Y -0.285
## 16 16Y -0.295
## 17 17Y -0.202
## 18 18Y -0.0676
## 19 19Y 0.0956
h_02 <- PBS |> filter(ATC2 == "H02")
h_02 |> autoplot(Cost)

h_02 |> gg_season(Cost)

h_02 |> gg_subseries(Cost)

# gg_lag() is skipped here: it requires a single series, and h_02 contains four Concession/Type combinations.
h_02 |> ACF(Cost)
## # A tsibble: 92 x 6 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Concession Type ATC1 ATC2 lag acf
## <chr> <chr> <chr> <chr> <cf_lag> <dbl>
## 1 Concessional Co-payments H H02 1M 0.834
## 2 Concessional Co-payments H H02 2M 0.679
## 3 Concessional Co-payments H H02 3M 0.514
## 4 Concessional Co-payments H H02 4M 0.352
## 5 Concessional Co-payments H H02 5M 0.264
## 6 Concessional Co-payments H H02 6M 0.219
## 7 Concessional Co-payments H H02 7M 0.253
## 8 Concessional Co-payments H H02 8M 0.337
## 9 Concessional Co-payments H H02 9M 0.464
## 10 Concessional Co-payments H H02 10M 0.574
## # ℹ 82 more rows
us_gasoline |> autoplot(Barrels)

us_gasoline |> gg_season(Barrels)

us_gasoline |> gg_subseries(Barrels)

us_gasoline |> gg_lag(Barrels)

us_gasoline |> ACF(Barrels)
## # A tsibble: 31 x 2 [1W]
## lag acf
## <cf_lag> <dbl>
## 1 1W 0.893
## 2 2W 0.882
## 3 3W 0.873
## 4 4W 0.866
## 5 5W 0.847
## 6 6W 0.844
## 7 7W 0.832
## 8 8W 0.831
## 9 9W 0.822
## 10 10W 0.808
## # ℹ 21 more rows
Can you spot any seasonality, cyclicity and trend?
Overall, the Total Private employment series shows a strong upward trend
together with mild seasonality: employment rises during the summer months.
What do you learn about the series?
The series is highly persistent: employment grows steadily over time, and
the pattern repeats predictably from year to year. The slowly decaying ACF,
with values near 1 at every lag, reflects that strong trend.
What can you say about the seasonal patterns?
Employment rises each year from roughly April through August, then falls
back slightly in the months that follow.
Can you identify any unusual years?
From 2008 to 2010 there is a much sharper dip in employment than in other
years, corresponding to the global financial crisis.
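To see that dip more clearly, the plot can be restricted to the surrounding years (a sketch; the window chosen is arbitrary, and the y label assumes the series is reported in thousands of persons, as in the us_employment help page):
total_private |>
  filter(year(Month) >= 2005, year(Month) <= 2012) |>
  autoplot(Employed) +
  labs(title = "Total Private Employment, 2005-2012", y = "Employed (thousands)")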