<- aus_production %>%
plot_bricks select(Quarter, Bricks) %>%
autoplot(Bricks)
<- pelt %>%
plot_lynx select(Year, Lynx) %>%
autoplot(Lynx)
<- gafa_stock %>%
plot_gafa select(Date, Close) %>%
autoplot(Close)
<- vic_elec %>%
plot_demand select(Time, Demand) %>%
autoplot(Demand) +
labs(title = "Electricity Demand", y = "dEmand", x = "[30MIN]")+
theme_minimal()
| plot_lynx) / (plot_gafa | plot_demand) (plot_bricks
Hyndman Chapter 2
2.1
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec. Use ? (or help()) to find out about the data in each series.
What is the time interval of each series? * aus_production -> quarterly * pelt -> annual * gafa_stock -> daily * vic_elec -> every half hourly
Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
2.2
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
%>%
gafa_stock group_by(Symbol) %>%
slice_max(order_by = Close, n = 1)
# A tsibble: 4 x 8 [!]
# Key: Symbol [4]
# Groups: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
It’s interesting that FB and Google had their highest closing prices one day apart.
2.3
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation. You can read and view data with the code below, but check out what happens if you don’t include facet_grid()
<- readr::read_csv("./tute1.csv")
tute1
<- tute1 |>
mytimeseries mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
<- mytimeseries |>
mtplot pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
<- mytimeseries |>
mtplot2 pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
| mtplot2) (mtplot
That’s pretty interesting. It puts them all on the same chart!
2.4
The USgas package contains data on the demand for natural gas in the US. Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
<- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
NE
<- us_total %>%
nef_tsibble filter(state %in% NE) %>%
as_tsibble(index = year, key = state)
ggplot(nef_tsibble, aes(x = year, y = y, color = state, group = state)) +
geom_line() +
theme_minimal() +
labs(title = "Gas consumption by state",
x = "Year",
y = "Value of y",
color = "State") +
scale_color_brewer(palette = "Set1")
2.5
Download tourism.xlsx from the book website and read it into R using read_excel(), and create a tsibble which is identical to the tourism tsibble from the tsibble package.
Comparing xlxs to library
From the tsibble package
data("tourism", package = "tsibble")
glimpse(tourism)
Rows: 24,320
Columns: 5
Key: Region, State, Purpose [304]
$ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
$ Region <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
$ State <chr> "South Australia", "South Australia", "South Australia", "Sout…
$ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
$ Trips <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…
CSV rom the web
## Downloading from the web.
<- "https://bit.ly/fpptourism"
url
<- tempfile(fileext = ".nogit.xlsx")
temp_file
download.file(url, temp_file, mode = "wb")
<- read_excel(temp_file) %>%
t_from_excel mutate(Quarter = yearquarter(ymd(Quarter))) #< this is the hard bit.
unlink(temp_file)
glimpse(t_from_excel)
Rows: 24,320
Columns: 5
$ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
$ Region <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
$ State <chr> "South Australia", "South Australia", "South Australia", "Sout…
$ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
$ Trips <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…
Highest overnight trips
Find what combination of Region and Purpose had the maximum number of overnight trips on average.
data("tourism", package = "tsibble")
<- tourism %>%
mx filter(Trips == max(Trips)) %>%
select("State", "Purpose")
<- mx$State
state_value <- mx$Purpose purpose_value
People mostly go to Victoria for Visiting.
Roll up to state
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
%>%
tourism as_tibble() %>%
group_by(Quarter, State) %>%
summarise(total_trips = sum(Trips), .groups = "drop") %>%
as_tsibble(index = Quarter, key=State) %>%
head(3)
# A tsibble: 3 x 3 [1Q]
# Key: State [1]
Quarter State total_trips
<qtr> <chr> <dbl>
1 1998 Q1 ACT 551.
2 1998 Q2 ACT 416.
3 1998 Q3 ACT 436.
2.8
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: Employed from us_employment, Bricks from aus_production, Hare from pelt, Cost from PBS, and Barrels from us_gasoline.
data(pelt, package="tsibbledata")
data(aus_production, package="tsibbledata")
data(PBS, package="tsibbledata")
data("us_employment")
<- function(df, time_column, value_column, do_plots) {
generate_time_series_plots <- as_tsibble(df, index = {{ time_column }})
df_tsibble <- list()
plots_list
if ("auto" %in% do_plots) {
"Autoplot"]] <- autoplot(df_tsibble) + labs(title = "Autoplot")
plots_list[[
}if ("season" %in% do_plots) {
"Seasonal_Plot"]] <- gg_season(df_tsibble, {{ value_column }}) + labs(title = "Seasonal Plot")
plots_list[[
}if ("subseries" %in% do_plots) {
"Subseries_Plot"]] <- df_tsibble %>% gg_subseries({{ value_column }}) + labs(title = "Subseries Plot")
plots_list[[
}if ("lag" %in% do_plots) {
"Lag_Plot"]] <- df_tsibble %>% gg_lag({{ value_column }}, geom = "point") + labs(title = "Lag Plot")
plots_list[[
}if ("acf" %in% do_plots) {
<- ACF(df_tsibble, {{ value_column }})
acf_result "ACF_Plot"]] <- autoplot(acf_result) + labs(title = "ACF Plot")
plots_list[[
}
return(plots_list)
}
Prescription Costs from PBS
It looks like costs are the highest around December / January, and lowest in February. This is a very seasonal industry.
<- PBS %>%
pre_f as_tibble() %>%
group_by(Month) %>%
summarise(Cost = sum(Cost, na.rm = TRUE)) %>%
as_tsibble(index = Month)
<- generate_time_series_plots(pre_f, time_column = Month, value_column = Cost,
pre_plots do_plots=c("auto", "season", "subseries", "lag", "acf"))
<- (pre_plots$Autoplot | pre_plots$Seasonal) /
plot_layout $Subseries_Plot | pre_plots$ACF_Plot) /
(pre_plots$Lag_Plot
pre_plots
+
plot_layout plot_layout(guides = "collect", ncol = 1, heights = c(1, 1, 2))
Employed from us_employment
Not much seasonality, at least not according the the subseries and seasonal plots. Also, there is an upward trend for people being employed. That makes a lot of sense, as when I was a child for every three people in the US back then, now there are four.
<- us_employment %>%
emp_f as_tibble() %>%
group_by(Month) %>%
summarise(Employed = sum(Employed, na.rm = TRUE)) %>%
as_tsibble(index = Month)
<- generate_time_series_plots(emp_f, time_column = Month, value_column = Employed,
emp_plots do_plots=c("auto", "season", "subseries", "lag", "acf"))
<- (emp_plots$Autoplot | emp_plots$Seasonal) /
plot_layout $Subseries_Plot | emp_plots$ACF_Plot) /
(emp_plots$Lag_Plot
emp_plots
+
plot_layout plot_layout(guides = "collect", ncol = 1, heights = c(1, 1, 2))
Bricks from aus_production
Very strong seasonality in that there is less production during the first few months oft the year. Teh industry itself looks like it has ups and downs, isn’t on a constant growth cycle. I suppose that makes sense as well. In boston, the entire back bay seems to be made out of bricks, and now almost nothing is.
= aus_production %>% select(Quarter, Bricks)
bricks_f <- generate_time_series_plots(bricks_f, time_column = Quarter, value_column = Bricks,
bricks_plots do_plots=c("auto", "season", "subseries", "lag", "acf"))
# Arrange the plots
<- (bricks_plots$Autoplot | bricks_plots$Seasonal) /
plot_layout $Subseries_Plot | bricks_plots$ACF_Plot) /
(bricks_plots$Lag_Plot
bricks_plots
# Apply the layout configuration
+
plot_layout plot_layout(guides = "collect", ncol = 1, heights = c(1, 1, 2))
Pelts from pelt
You can’t tell much about seasonality because the data is in annual increments. It’s almost better to plot this in a bar chart.
= pelt %>% select(Year, Hare)
pelts_f <- generate_time_series_plots(pelts_f, time_column = Year, value_column = Hare,
pelt_plots do_plots=c("auto", "subseries", "acf"))
$Autoplot | pelt_plots$Subseries_Plot | pelt_plots$ACF_Plot) (pelt_plots
Barrels from us_gasoline
This data is actual weeks, it’s almost a little too zoomed, but you do see some higher costs in teh summer months. It also shows usage patterns going up over time, of course we expect that.
<- generate_time_series_plots(us_gasoline, time_column = Week, value_column = Barrels, do_plots=c("auto", "season", "subseries", "lag", "acf"))
b_plots
<- (b_plots$Autoplot | b_plots$Seasonal) /
plot_layout $Subseries_Plot | b_plots$ACF_Plot) /
(b_plots$Lag_Plot
b_plots
+
plot_layout plot_layout(guides = "collect", ncol = 1, heights = c(1, 1, 2))