Problems 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8
library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble 3.1.8 ✔ tsibble 1.1.3
## ✔ dplyr 1.1.0 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.0
## ✔ lubridate 1.9.1 ✔ fable 0.3.2
## ✔ ggplot2 3.4.0 ✔ fabletools 0.3.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(readr)
Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.
help(gafa_stock)
The help page gives a detailed description of the gafa_stock data set: historical stock prices from 2014 to 2018 for Google, Amazon, Facebook (renamed Meta in October 2021) and Apple. Google reorganized into Alphabet Inc., whose shares trade under the symbols GOOG and GOOGL; this data set uses the GOOG symbol.
The format of the data set is a time series of class tsibble.
The details state that gafa_stock is a tsibble containing data on irregular trading days, with the fields “Open”, “High”, “Low”, “Close”, “Adj_Close” and “Volume”. Each stock is identified by one key, Symbol, which holds its ticker symbol.
The source of the data is Yahoo Finance historical data.
This plot style does not suit this group of stocks because their prices sit on very different scales. The plot shows far more detail for GOOG and AMZN than for AAPL and FB, and is of little use for reading anything about the latter two. AMZN and GOOG both show a long-term upward trend. The data look cyclical rather than seasonal: from the last quarter of 2015 into the first quarter of 2016, and again from 2017 into 2018, the prices of these two stocks sink and then rise, while at the end of 2014 and of 2016 the movements are much less pronounced. The last half of 2018 shows a very steep drop.
autoplot(gafa_stock,Close)
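Since the four closing-price series sit on very different scales, one possible remedy is to give each stock its own panel with a free y-axis. The following is a minimal sketch using ggplot2 directly; the object name gafa_facets is only illustrative.

```r
library(fpp3)

# One panel per stock, each with its own y-axis scale,
# so the lower-priced AAPL and FB series are not flattened.
gafa_facets <- gafa_stock %>%
  ggplot(aes(x = Date, y = Close)) +
  geom_line() +
  facet_grid(Symbol ~ ., scales = "free_y") +
  labs(title = "Daily closing prices by stock",
       y = "Close")

gafa_facets
```

The trade-off is that price levels can no longer be compared across panels at a glance, only the shape of each series.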
The PBS data set contains monthly Medicare Australia prescription data. It is a tsibble with two measured values: “Scripts” (the number of scripts) and “Cost” (in $AUD).
The format of the data set is a time series of class tsibble.
The data is disaggregated using four keys:
Concession: Concessional scripts are given to pensioners, unemployed, dependents, and other card holders
Type: Co-payments are made until an individual’s script expenditure hits a threshold ($290.00 for concession, $1141.80 otherwise). Safety net subsidies are provided to individuals exceeding this amount.
ATC1: Anatomical Therapeutic Chemical index (level 1)
ATC2: Anatomical Therapeutic Chemical index (level 2)
help(PBS)
Calling autoplot() directly on the PBS data set is not practical, because it contains a separate series for every combination of the four keys. The data first need to be aggregated in some fashion; the quickest and easiest aggregation is total cost by month.
The aggregated series looks seasonal: it rises and falls in a repeating pattern at similar intervals.
PBS %>%
  summarise(TotalC = sum(Cost)) %>%
  autoplot(TotalC) +
  labs(title = "Total Costs of Scripts",
       y = "Total Cost")
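To check the impression of seasonality, a seasonal plot of the same aggregated series can help, since it overlays the years against the months. This is a sketch only; pbs_total is an illustrative name not used elsewhere in this report.

```r
library(fpp3)

# Collapse all keys into a single monthly total-cost series
pbs_total <- PBS %>%
  summarise(TotalC = sum(Cost))

# Seasonal plot: one line per year, months along the x-axis
pbs_season <- pbs_total %>%
  gg_season(TotalC) +
  labs(title = "Seasonal plot of total script costs",
       y = "Total Cost")

pbs_season
```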
vic_elec is a half-hourly tsibble with three fields:
Demand: Total electricity demand in MWh.
Temperature: Temperature in Melbourne, Australia.
Holiday: Indicator for whether that day is a public holiday.
The format of the data is a time series of class tsibble.
This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).
The source of the data is the Australian Energy Market Operator.
help(vic_elec)
This time series appears seasonal, with demand rising and falling at regular intervals each year. There may also be a slight trend forming, with each year’s peaks using more power than the last.
autoplot(vic_elec,Demand)
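Half-hourly data can carry more than one seasonal pattern, so it may be worth looking at the daily and weekly patterns separately as well as the yearly one. The sketch below uses the period argument of gg_season(); the object names p_day and p_week are illustrative.

```r
library(fpp3)

# Daily pattern: demand across the half-hours of each day
p_day <- vic_elec %>%
  gg_season(Demand, period = "day") +
  labs(title = "Daily pattern of electricity demand")

# Weekly pattern: weekdays compared with weekends
p_week <- vic_elec %>%
  gg_season(Demand, period = "week") +
  labs(title = "Weekly pattern of electricity demand")

p_day
p_week
```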
The pelt data set contains the Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935.
The format is a time series of class tsibble.
pelt is an annual tsibble with two values, Hare and Lynx, which record how many pelts of each were traded.
The source of the data is the Hudson Bay Company.
help(pelt)
Both series rise and fall at fairly regular intervals. Because the data are annual, this is a cyclic pattern rather than seasonality. There also appear to be stretches of long-term increase and of long-term decrease.
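A plot of both pelt series makes the regular rises and falls easier to judge. This is a sketch that reshapes the data to long form and draws one line per species; pelt_plot is an illustrative name.

```r
library(fpp3)

# Reshape Hare and Lynx into name/value pairs, then draw one line each
pelt_plot <- pelt %>%
  pivot_longer(-Year) %>%
  ggplot(aes(x = Year, y = value, colour = name)) +
  geom_line() +
  labs(title = "Hudson Bay Company pelt trading records",
       y = "Pelts traded",
       colour = "Species")

pelt_plot
```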
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
Within the 2014 to 2018 time frame, all four stocks peaked in 2018, between late July and early October (the third quarter and the start of the fourth).
Facebook and Google peaked a day apart: Facebook on July 25, 2018 and Google on July 26, 2018.
Amazon peaked on September 4, 2018, and Apple peaked latest of the group, on October 3, 2018.
gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date)
## # A tsibble: 4 x 2 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date
## <chr> <date>
## 1 AAPL 2018-10-03
## 2 AMZN 2018-09-04
## 3 FB 2018-07-25
## 4 GOOG 2018-07-26
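An alternative way to reach the same dates is dplyr's slice_max(), which also makes it easy to keep the peak price alongside the date. This is a sketch, not the method the exercise asks for; gafa_peaks is an illustrative name.

```r
library(fpp3)

# Drop the tsibble structure, then keep the row with the highest Close per stock
gafa_peaks <- gafa_stock %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  ungroup() %>%
  select(Symbol, Date, Close)

gafa_peaks
```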
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
tute_series <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
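The conversion step can be tried without the CSV file. The sketch below builds a small stand-in table with made-up values (toy and toy_series are hypothetical names, and the numbers are not taken from tute1.csv), then converts the date column to a quarterly index in the same way.

```r
library(fpp3)

# Hypothetical stand-in for tute1.csv: four quarters of made-up sales
toy <- tibble(
  Quarter = as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01")),
  Sales   = c(100, 90, 80, 95)
)

# yearquarter() turns each date into its quarter, which becomes the index
toy_series <- toy %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

toy_series
```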
tute_series %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include facet_grid().
facet_grid() creates subplots, drawing each series in its own panel. The panels share the x-axis, and scales = "free_y" gives each panel its own y-axis.
Without faceting, all three series are drawn on a single set of axes.
The faceted version is more informative, because each series is shown on a scale suited to its own range.
tute_series %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
The USgas package contains data on the demand for natural gas in the US.
#install.packages('USgas')
library(USgas)
us_total <- us_total %>%
  as_tsibble(key = state,
             index = year)
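It is easy to confuse as_tibble() with as_tsibble(), so it can be worth confirming that a conversion really produced a tsibble with the intended key and index. The sketch below uses a small made-up table in the same shape as us_total; toy_gas and its values are hypothetical.

```r
library(fpp3)

# Hypothetical two-state, three-year table shaped like us_total
toy_gas <- tibble(
  year  = rep(2017:2019, times = 2),
  state = rep(c("Maine", "Vermont"), each = 3),
  y     = c(50, 48, 47, 12, 13, 14)
)

toy_gas_ts <- toy_gas %>%
  as_tsibble(key = state, index = year)

is_tsibble(toy_gas)     # plain tibble
is_tsibble(toy_gas_ts)  # tsibble
key_vars(toy_gas_ts)    # name of the key column
index_var(toy_gas_ts)   # name of the index column
```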
Connecticut has a strong upward trend, with no apparent repeating pattern. It also uses the second highest amount of natural gas.
Maine had peak consumption in 2002. From then on Maine’s natural gas consumption has been on a downward trend. Maine consumes the second least natural gas of the New England states.
Massachusetts’ gas consumption has a slight upward trend. It uses the most natural gas of all the New England states.
New Hampshire’s gas usage has no real upward or downward trend, but it does seem to have a cyclical pattern in which consumption spikes every three years or so (with annual data this cannot be seasonality). New Hampshire consumes roughly 55,000 units, which puts it in the middle of the New England states.
Rhode Island’s natural gas consumption peaked in 1998. It bottomed out in 2004 and has been increasing steadily since then. Rhode Island uses the third most natural gas, behind Massachusetts and Connecticut.
Vermont has the lowest natural gas consumption, at roughly 14,000 units. From 2012 to 2019, however, its consumption increased rapidly from roughly 8,000 units to roughly 14,000 units, almost doubling in seven years.
us_total %>%
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
  ggplot(aes(x = year, y = y, colour = state)) +
  geom_line() +
  facet_grid(state ~ ., scales = "free_y") +
  labs(title = "Annual Natural Gas Consumption in New England area",
       y = "Consumption")
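Rankings such as "second highest" are hard to read from panels with free y-axes, so a small summary table may help to back them up. The sketch below ranks the six states by mean annual consumption and records each state's peak year; ne_ranking, mean_y and peak_year are illustrative names.

```r
library(fpp3)
library(USgas)

new_england <- c("Maine", "Vermont", "New Hampshire",
                 "Massachusetts", "Connecticut", "Rhode Island")

# Mean annual consumption and the year of peak consumption per state
ne_ranking <- us_total %>%
  as_tibble() %>%
  filter(state %in% new_england) %>%
  group_by(state) %>%
  summarise(mean_y    = mean(y),
            peak_year = year[which.max(y)]) %>%
  arrange(desc(mean_y))

ne_ranking
```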
library(readxl)
tourism <- readxl::read_excel('tourism.xlsx')
tourism_ts <- tourism %>%
  group_by(Region, State) %>%
  summarise(Total_Trips = sum(Trips))
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
tourism_ts
## # A tibble: 76 × 3
## # Groups: Region [76]
## Region State Total_Trips
## <chr> <chr> <dbl>
## 1 Adelaide South Australia 45906.
## 2 Adelaide Hills South Australia 2299.
## 3 Alice Springs Northern Territory 4529.
## 4 Australia's Coral Coast Western Australia 15167.
## 5 Australia's Golden Outback Western Australia 15017.
## 6 Australia's North West Western Australia 13067.
## 7 Australia's South West Western Australia 41825.
## 8 Ballarat Victoria 11017.
## 9 Barkly Northern Territory 1388.
## 10 Barossa South Australia 3850.
## # … with 66 more rows
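The summary above is a plain tibble. If a tsibble is wanted instead, the quarter column has to become the index and the grouping columns the key. The sketch below shows one way this might be done; because tourism.xlsx is not available here, it uses the tourism data set bundled with the tsibble package, flattened to a tibble, as a stand-in. The assumption that the spreadsheet stores Quarter as a date, and the names tourism_raw and tourism_tsbl, are mine.

```r
library(fpp3)

# Stand-in for read_excel("tourism.xlsx"): the bundled tsibble flattened
# to a tibble, with Quarter stored as a plain date (an assumption)
tourism_raw <- tsibble::tourism %>%
  as_tibble() %>%
  mutate(Quarter = as.Date(Quarter))

# Convert the date to a quarterly index and declare the three key columns
tourism_tsbl <- tourism_raw %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose), index = Quarter)

tourism_tsbl
```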
highest_avg_rg_purp <- tourism %>%
  group_by(Region, Purpose) %>%
  mutate(Avg_Trips = mean(Trips)) %>%
  filter(Avg_Trips == max(Avg_Trips)) %>%
  distinct(Region, Purpose, Avg_Trips) %>%
  arrange(desc(Avg_Trips))
highest_avg_rg_purp
## # A tibble: 304 × 3
## # Groups: Region, Purpose [304]
## Region Purpose Avg_Trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
## 6 Gold Coast Holiday 528.
## 7 Melbourne Holiday 507.
## 8 South Coast Holiday 495.
## 9 Brisbane Visiting 493.
## 10 Melbourne Business 478.
## # … with 294 more rows
total_trips <- tourism %>%
  group_by(Region) %>%
  mutate(Total_Trips = sum(Trips)) %>%
  distinct(Region, Total_Trips) %>%
  arrange(desc(Total_Trips))
total_trips
## # A tibble: 76 × 2
## # Groups: Region [76]
## Region Total_Trips
## <chr> <dbl>
## 1 Sydney 161607.
## 2 Melbourne 136170.
## 3 Brisbane 98485.
## 4 North Coast NSW 87675.
## 5 Gold Coast 70404.
## 6 South Coast 65306.
## 7 Experience Perth 62743.
## 8 Sunshine Coast 57848.
## 9 Hunter 56837.
## 10 Adelaide 45906.
## # … with 66 more rows
Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
head(aus_retail,10)
## # A tsibble: 10 x 5 [1M]
## # Key: State, Industry [1]
## State Industry Serie…¹ Month Turno…²
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurants and… A33498… 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Sep 4.2
## 7 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Oct 4.8
## 8 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Nov 5.4
## 9 Australian Capital Territory Cafes, restaurants and… A33498… 1982 Dec 6.9
## 10 Australian Capital Territory Cafes, restaurants and… A33498… 1983 Jan 3.8
## # … with abbreviated variable names ¹`Series ID`, ²Turnover
set.seed(2151)
aus_retail <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
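Assigning the result back to aus_retail hides the packaged data set for the rest of the session. A variant that stores the sampled series under its own name, and then checks that exactly one series was selected, could look like the sketch below; myseries is an illustrative name.

```r
library(fpp3)

set.seed(2151)

# Keep the packaged data intact and store the sampled series separately
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Confirm that a single State/Industry combination was selected
myseries %>%
  as_tibble() %>%
  distinct(State, Industry)
```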
Explore your chosen retail time series using the following functions:
autoplot(aus_retail, Turnover)
aus_retail %>% gg_season(Turnover)
aus_retail %>% gg_subseries(Turnover)
aus_retail %>% gg_lag(Turnover, geom = 'point')
aus_retail %>% ACF(Turnover) %>% autoplot()
The autoplot shows a clear pattern. There was a long-term upward trend from 1982 until 2010, followed by a drop of roughly 30% to 40% from 2010 to 2012. From 2012 until 2018 turnover has been on another long-term upward trend.
Seasonality is also visible in the autoplot, but becomes more apparent in the gg_season() plot. Turnover tends to drop from January to February, then rise in March. From March until June it tends to drop, then rises until August. It then falls until November and December, when it makes sharp gains. Some years buck this seasonal pattern, but it is not easy to tell whether they do so at regular intervals, such as every three years.
I would assume the years that buck the seasonal pattern are due to business cycles, the national economy or possibly weather cycles.
The ACF plot shows strong autocorrelations at every lag displayed, up to lag 26. This is consistent with the strong trend, seasonality and cyclical patterns seen in the other plots.
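The default ACF plot stops at about two years of monthly lags. Extending it with the lag_max argument may make the seasonal component easier to separate from the trend, since seasonal peaks would recur at multiples of 12. The sketch below re-samples the series in a fresh session so that it is self-contained; retail_series and retail_acf are illustrative names, and 48 is an arbitrary choice of four years.

```r
library(fpp3)

set.seed(2151)

# Sample one retail series from the packaged data
retail_series <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Autocorrelations out to four years of monthly lags
retail_acf <- retail_series %>%
  ACF(Turnover, lag_max = 48)

autoplot(retail_acf)
```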