Use the help function to explore what the series
gafa_stock, PBS, vic_elec and
pelt represent.
Use autoplot() to plot some of the series in these data
sets.
gafa_stock: A tsibble indexed by irregular trading days, containing historical stock prices from 2014 to 2018 for Google, Amazon, Facebook and Apple.
PBS: A monthly tsibble containing the following:
Scripts: The total number of scripts.
Cost: The cost of the scripts in $AUD.
vic_elec: A half-hourly tsibble that contains the following:
Demand: Total electricity demand for Victoria, Australia in MWh.
Temperature: Temperature of Melbourne (BOM site 086071).
Holiday: Indicator for whether that day is a public holiday.
pelt: An annual tsibble that contains trade record data for the Hudson Bay Company for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935 (for all areas of the company). Contains the following:
Hare: The number of Snowshoe Hare pelts traded.
Lynx: The number of Canadian Lynx pelts traded.
For gafa_stock, let’s create a time series plot that
shows the adjusted close price of each of the four companies as a
separate line.
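library(fpp3)  # assumed setup: attaches tsibble, dplyr, ggplot2 and the example data sets
# Symbol is already the key of gafa_stock, so autoplot() draws one line per
# company; dropping group_by() also avoids the `mutate_if()` grouping message.
gafa_stock %>%
  autoplot(Adj_Close) +
  labs(y = "Adjusted Close",
       title = "Adjusted Close Stock Prices")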
For PBS, we can use autoplot() to create a
time series plot. Let’s consider just the total cost of scripts
where ATC2 is "A01" and
Concession is "Concessional".
PBS %>%
filter(ATC2 == "A01" & Concession == "Concessional") %>%
select(Month, Concession, Type, Cost) %>%
summarise(Cost = sum(Cost)) %>%
autoplot(Cost)
For vic_elec, let’s consider creating a plot where we
plot all of the temperature readings contained within the
tsibble.
vic_elec %>%
autoplot(Temperature) +
labs(y = "Degrees Celsius",
title = "Half-hourly temperatures: Melbourne, Australia")
The plot above shows a clear seasonal pattern in the temperature data,
which is to be expected: temperatures rise and fall with the time of year.
For pelt, let’s use autoplot() to generate
time series plots for the Hare and Lynx
variables.
library(patchwork)  # assumed: supplies the `+` that places two ggplots side by side
pelt_hare_plot <- pelt %>%
  autoplot(Hare)
pelt_lynx_plot <- pelt %>%
  autoplot(Lynx)
pelt_hare_plot + pelt_lynx_plot
### Question 2.1b Answer
gafa_stock: A tsibble indexed by
irregular trading days.
PBS: A monthly tsibble.
vic_elec: A half-hourly tsibble.
pelt: An annual tsibble.
Use filter() to find what days corresponded to the peak
closing price for each of the four stocks in
gafa_stock.
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Date)
## Adding missing grouping variables: `Symbol`
## # A tsibble: 4 x 2 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date
## <chr> <date>
## 1 AAPL 2018-10-03
## 2 AMZN 2018-09-04
## 3 FB 2018-07-25
## 4 GOOG 2018-07-26
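As a rough illustration of the group-wise filter used above, here is a minimal sketch on a made-up data frame. The `toy_prices` table and its values are hypothetical (they are not taken from gafa_stock), and `slice_max()` is shown only as one possible alternative to `filter(Close == max(Close))`.

```r
library(dplyr)

# Hypothetical prices for two made-up symbols (not real gafa_stock data).
toy_prices <- tibble(
  Symbol = rep(c("AAA", "BBB"), each = 3),
  Date   = rep(as.Date("2020-01-01") + 0:2, times = 2),
  Close  = c(10, 12, 11, 5, 4, 7)
)

# Approach used in the text: keep rows whose Close equals the group maximum.
peaks_filter <- toy_prices %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  ungroup()

# Alternative: slice_max() keeps the top row per group (ties are kept by default).
peaks_slice <- toy_prices %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  ungroup()

peaks_filter
```

Both versions return one row per symbol here. If a stock hit the same peak on two days, both approaches would return both days.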
Download the file tute1.csv from the book website, open
it in Excel (or some other spreadsheet application), and review its
contents. You should find four columns of information. Columns B through
D each contain a quarterly series, labelled Sales,
AdBudget and GDP. Sales contains
the quarterly sales for a small company over the period 1981-2005.
AdBudget is the advertising budget and GDP is
the gross domestic product. All series have been adjusted for
inflation.
tute1 <- readr::read_csv("https://bit.ly/fpptute1")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
mytimeseries <- tute1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter)
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include facet_grid().
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
Excluding facet_grid() results in three time series
plotted on one graph with common x and y axes, instead of each
time series being plotted on a separate graph with its own y axis.
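Both plots depend on pivot_longer(-Quarter) reshaping the data so that a single `name` column can drive the colour and the facets. The sketch below shows that reshaping on a tiny made-up table; `toy_wide` and its values are hypothetical and only mimic the column layout of tute1.

```r
library(tidyr)

# Hypothetical wide table with the same column layout as tute1 (values made up).
toy_wide <- data.frame(
  Quarter  = c("1981 Q1", "1981 Q2"),
  Sales    = c(100, 110),
  AdBudget = c(50, 55),
  GDP      = c(300, 305)
)

# Every column except Quarter is stacked into name/value pairs.
toy_long <- pivot_longer(toy_wide, -Quarter)

dim(toy_long)    # 6 rows, 3 columns
names(toy_long)  # "Quarter" "name" "value"
```

Each original row becomes three rows, one per series, which is why `colour = name` and `facet_grid(name ~ .)` can tell the series apart.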
The USgas package contains data on the demand for
natural gas in the US.
Install the USgas package.
Create a tsibble from us_total with
year as the index and state as the key.
library(USgas)  # assumed: provides the us_total data set
us_total %>%
as_tsibble(
index = year,
key = state
)
## # A tsibble: 1,266 x 3 [1Y]
## # Key: state [53]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## 7 2003 Alabama 350345
## 8 2004 Alabama 382367
## 9 2005 Alabama 353156
## 10 2006 Alabama 391093
## # … with 1,256 more rows
us_total %>%
as_tsibble(index = year, key = state) %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
group_by(state) %>%
ggplot(aes(x = year, y = y)) +
geom_line() +
facet_grid(vars(state), scales = "free_y") +
labs(title = "Natural Gas Consumption by State for the New England Area",
y = "Yearly Natural Gas Consumption (million cubic feet)")
Download tourism.xlsx from the book website and read it
into R using readxl::read_excel().
Create a tsibble which is identical to the
tourism tsibble from the tsibble
package.
Find what combination of Region and
Purpose had the maximum number of overnight trips on
average.
Create a new tsibble which combines the
Purposes and Regions, and just has total trips
by State.
To distinguish the tourism tsibble in the tsibble package from the
imported .xlsx file, we will call the imported .xlsx data
tourism_imported.
httr::GET("https://bit.ly/fpptourism", httr::write_disk(TF <- tempfile(fileext = ".xlsx")))
## Response [https://otexts.com/fpp3/extrafiles/tourism.xlsx]
## Date: 2023-02-06 00:40
## Status: 200
## Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## Size: 679 kB
## <ON DISK> /tmp/Rtmp8YFhC4/file57811db3bc54.xlsx
tourism_imported <- readxl::read_excel(TF)
tourism_imported_tsibble <- tourism_imported %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(key = c(Region, State, Purpose),
index = Quarter)
tourism_imported_tsibble
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # … with 24,310 more rows
tourism %>%
as.data.frame() %>%
group_by(Region, Purpose) %>%
summarise(Avg_trips = mean(Trips)) %>%
arrange(desc(Avg_trips)) %>%
head(5)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 3
## # Groups: Region [3]
## Region Purpose Avg_trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
The output above shows the five combinations of
Region and Purpose with the highest average number
of overnight trips. The top combination is
Sydney for Region and Visiting
for Purpose.
By converting to a data frame, we can group by Region,
Purpose and State, and then sum the Trips
column for each unique group using summarise() to get the
output shown below. The first five rows are shown.
tourism %>%
as.data.frame() %>%
group_by(Region, Purpose, State) %>%
summarise(Total_Trips_By_State = sum(Trips)) %>%
head(5)
## `summarise()` has grouped output by 'Region', 'Purpose'. You can override using
## the `.groups` argument.
## # A tibble: 5 × 4
## # Groups: Region, Purpose [5]
## Region Purpose State Total_Trips_By_State
## <chr> <chr> <chr> <dbl>
## 1 Adelaide Business South Australia 12442.
## 2 Adelaide Holiday South Australia 12523.
## 3 Adelaide Other South Australia 4525.
## 4 Adelaide Visiting South Australia 16415.
## 5 Adelaide Hills Business South Australia 213.
However, the output above is a data frame and not a
tsibble, and it cannot be converted to one because summing over
every quarter leaves no time index. It also still separates each
Region and Purpose rather than combining them. For the
tsibble object below, the Trips are summed across all
Regions and Purposes within each State for each
Quarter (just the first five rows are shown). I am not
entirely sure which form the question intends, but only this version is a
tsibble with total trips by State.
tourism %>%
group_by(State) %>%
summarise(Total_Trips_By_State = sum(Trips)) %>%
head(5)
## # A tsibble: 5 x 3 [1Q]
## # Key: State [1]
## State Quarter Total_Trips_By_State
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
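To make the behaviour of group_by() and summarise() on a tsibble more concrete, here is a small sketch on made-up data. The `toy_trips` object, its key names and its values are hypothetical. It assumes the tsibble and dplyr packages are installed and only illustrates that the time index survives the aggregation while the dropped key is summed over.

```r
library(dplyr)
library(tsibble)

# Hypothetical data: two states, two regions per state, two quarters each.
toy_trips <- tibble(
  Quarter = yearquarter(rep(c("2020 Q1", "2020 Q2"), times = 4)),
  State   = rep(c("A", "B"), each = 4),
  Region  = rep(c("r1", "r2", "r3", "r4"), each = 2),
  Trips   = 1:8
) %>%
  as_tsibble(index = Quarter, key = c(State, Region))

# Grouping by State keeps Quarter as the index and sums over Region.
by_state <- toy_trips %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

is_tsibble(by_state)  # still a tsibble
key_vars(by_state)    # the key is now State only
by_state
```

The result has one row per State and Quarter, which is the same shape as the tourism output above.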
Monthly Australian retail data is provided in
aus_retail. Select one of the time series as follows (but
choose your own seed value):
set.seed(23987)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Explore your chosen retail time series using the following functions:
autoplot(), gg_season(),
gg_subseries(), gg_lag(),
ACF() %>% autoplot()
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
The plot below shows a clear increasing trend in Turnover.
There is also a strong seasonal pattern, and its size
increases as the level of the series increases. Also note that there
is a small dip in the trend in 2011.
myseries %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
The seasonal plot below shows us that there is a large jump in turnover in December, and we are seeing this large jump across all of the years. The graph below also shows us that in recent years, in February, there seems to be a dip in turnover, and then the turnover rises back up the next month.
myseries %>%
gg_season(labels = "both")
## Plot variable not specified, automatically selected `y = Turnover`
We can come to the same conclusions in this plot as we did for the
previous plot. Also of note, all of the subseries plots show an increase
in turnover for all of the months, which ties in with the
autoplot() plot.
myseries %>%
gg_subseries()
## Plot variable not specified, automatically selected `y = Turnover`
The lag plot below shows us that the first 12 lags have a strong
positive relationship, with lag 12 having the strongest positive
relationship, which is to be expected since the autoplot
shows us a seasonality pattern which repeats every 12 months.
myseries %>%
gg_lag(geom = "point", lags = 1:12)
## Plot variable not specified, automatically selected `y = Turnover`
For the plot below, I decided to set lag_max = 36,
since the previous plots revealed a seasonal pattern
that repeats every 12 months;
lag_max = 36 therefore covers lags of up to 3 years. As we can
see, the ACF plot shows both trend and seasonality: trend because the
autocorrelations decrease slowly as the lag increases, and seasonality because of the
“scalloped” shape of the ACF plot.
myseries %>%
ACF(Turnover, lag_max = 36) %>%
autoplot()
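For readers who want to see what an ACF value measures, the following base-R sketch computes the sample autocorrelation from its usual definition and compares it with stats::acf(). The series `toy` is a made-up deterministic series with a mild trend and a 12-period seasonal wave, and `sample_acf()` is a hypothetical helper written for this illustration; neither comes from aus_retail or from feasts.

```r
# Sample autocorrelation at lag k, written out from its definition:
# r_k = sum_{t=k+1}^{n} (x_t - mean)(x_{t-k} - mean) / sum_{t=1}^{n} (x_t - mean)^2
sample_acf <- function(x, k) {
  n <- length(x)
  x_bar <- mean(x)
  numerator <- sum((x[(k + 1):n] - x_bar) * (x[1:(n - k)] - x_bar))
  denominator <- sum((x - x_bar)^2)
  numerator / denominator
}

# Made-up monthly-style series: mild linear trend plus a 12-period seasonal wave.
t_index <- 1:120
toy <- 0.2 * t_index + 10 * sin(2 * pi * t_index / 12)

# Lags 1 to 36, computed by hand and with the built-in function (lag 0 dropped).
r_manual  <- sapply(1:36, function(k) sample_acf(toy, k))
r_builtin <- as.numeric(acf(toy, lag.max = 36, plot = FALSE)$acf)[-1]

all.equal(r_manual, r_builtin)  # the two calculations agree
r_manual[12] > r_manual[6]      # the seasonal lag is more strongly correlated
```

In this toy series the autocorrelation at lag 12 exceeds the one at lag 6, which is the kind of seasonal peak that produces the scalloped shape in an ACF plot.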
Based on all of the plots that were generated, there is seasonality and trend but there is no cyclicity.