Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book. Please submit both your Rpubs link as well as attach the .pdf file with your code.”
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series. What is the time interval of each series. Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
library(tidyverse)
library(dplyr)
# install.packages("ggplot2")
# install.packages("ggfortify")
library(ggplot2)
library(ggfortify)
library(tsibble)
library(tsibbledata)
#install.packages("feasts")
library(feasts)
# ?aus_production
# ?pelt
# ?gafa_stock
# ?vic_elec
# aus_production
# pelt
# gafa_stock
# vic_elec
#Time Interval for the Bricks from aus_production
aus_production |> autoplot(Bricks) +
labs(y = "million units", title = "Australian Clay Brick production in millions of bricks, Time Interval is Quarter")
#Time Interval for Lynx from pelt
pelt |> autoplot(Lynx) +
labs(y = "Number of Lynx Furs", title = "Lynx Furs Traded from 1845 to 1935, Time Interval is Annual")
#Time Interval for Close from gafa_stock
gafa_stock |> autoplot(Close) +
labs(y = "Closing price of stock", title = "Closing price of stock, Time Interval is Day")
#Time Interval for Demand from vic_elec
vic_elec |> autoplot(Demand) +
labs(y = "Total electricity demand in MWh.", title = "Half-hourly electricity demand for Victoria, Australia, Time Interval is Half-Hourly")
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
Answer: Per the above autoplot, it looks like filter and max(close) applied has provided the peak stock prices on the following dates which align with the chart above
gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents.
You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product.
All series have been adjusted for inflation.
tute1 <- readr::read_csv("tute1.csv")
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
# +facet_grid(name ~ ., scales = "free_y")
The USgas package contains data on the demand for natural gas in the US.
Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
#install.packages("USgas")
library(USgas)
# ?usgas
str(usgas)
## 'data.frame': 92783 obs. of 5 variables:
## $ date : Date, format: "1973-01-01" "1973-01-01" ...
## $ process : chr "Commercial Consumption" "Residential Consumption" "Commercial Consumption" "Residential Consumption" ...
## $ state : chr "U.S." "U.S." "U.S." "U.S." ...
## $ state_abb: chr "U.S." "U.S." "U.S." "U.S." ...
## $ y : int 392315 843900 394281 747331 310799 648504 231943 465867 174258 326313 ...
## - attr(*, "units")= chr "MMCF"
## - attr(*, "product_name")= chr "Natural Gas"
## - attr(*, "source")= chr "EIA API: https://www.eia.gov/opendata/browser/natural-gas"
head(us_total)
## year state y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
us_gas <- us_total |> mutate(year=year) |>
as_tsibble(key = state, index = year)
head(us_gas)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
us_gas |> filter(state =="Maine"| state =="Vermont"| state =="New Hampshire"| state =="Massachusetts"| state =="Connecticut" | state =="Rhode Island") |>
ggplot(aes(x = year, y = y, colour = state)) +
geom_line() +
facet_grid(state ~ ., scales = "free_y") +
labs(title = "annual natural gas consumption by state for the New England area", y = "Gas Consumption, million cubic feet")
Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
Questions
Create a tsibble which is identical to the tourism tsibble from the
tsibble package. Find what combination of Region and Purpose had the
maximum number of overnight trips on average. Answer: Sydney and
Visiting combination of Region and Purpose gives us max overnight trips
on average as 747.2700
Create a new tsibble which combines the Purposes and Regions, and just
has total trips by State.
# ?tourism
#tourism
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
tourism1 <- readxl::read_excel("tourism.xlsx")
#Create similar Tsibble to Tourism
mytimeseries2 <- tourism1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter, key = c(Region, State,Purpose))
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
head(mytimeseries2)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
#Find the max number of overnight trips on average by Region and purpose
Final <- tourism1 %>%
group_by(Region, Purpose) %>% summarise(mean1 = mean(Trips)) %>% filter(mean1 == max(mean1)) %>% arrange(desc(mean1))
head(Final)
## # A tibble: 6 × 3
## # Groups: Region [6]
## Region Purpose mean1
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 North Coast NSW Holiday 588.
## 4 Gold Coast Holiday 528.
## 5 South Coast Holiday 495.
## 6 Brisbane Visiting 493.
#Create a new Tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism2 <- tourism1 %>%
group_by(Purpose, Region, State) %>%
summarise(Total_Trips = sum(Trips)) %>% arrange(desc(Total_Trips))
head(tourism2)
## # A tibble: 6 × 4
## # Groups: Purpose, Region [6]
## Purpose Region State Total_Trips
## <chr> <chr> <chr> <dbl>
## 1 Visiting Sydney New South Wales 59782.
## 2 Visiting Melbourne Victoria 49512.
## 3 Business Sydney New South Wales 48164.
## 4 Holiday North Coast NSW New South Wales 47032.
## 5 Holiday Sydney New South Wales 44026.
## 6 Holiday Gold Coast Queensland 42267.
tourism2 %>%
group_by(State) %>% summarise(Total_Trips = sum(Total_Trips))
## # A tibble: 8 × 2
## State Total_Trips
## <chr> <dbl>
## 1 ACT 41007.
## 2 New South Wales 557367.
## 3 Northern Territory 28614.
## 4 Queensland 386643.
## 5 South Australia 118151.
## 6 Tasmania 54137.
## 7 Victoria 390463.
## 8 Western Australia 147820.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Questions (us_employment)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
Answer : There is a long term increase in the data so there is an increasing trend shown by the blue line below. seasonal plot allows the underlying seasonal pattern to be seen more clearly, and can be useful in identifying years in which the pattern changes.It looks like we see some changes and unusual points in the plot below, there is a dip in employed people from 2008 around the financial crisis. Additionally the subseries plot shows that there was peak and trough on the employment data and then a significant upward trend in numbers until 2020.
library ("fpp3")
us_employment |> filter(Title == "Total Private") %>% autoplot(Employed) + geom_smooth()
head(us_employment)
## # A tsibble: 6 x 4 [1M]
## # Key: Series_ID [1]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
recent5 <- us_employment %>% filter(year(Month)>=2002)
recent5 %>%filter(Title == "Total Private") %>% gg_season(Employed, period = "year", labels="both")
recent5 %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed)
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed) +
ggtitle("Lag Plot")
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot() +
ggtitle("Autocorrelation Function")
Questions (Bricks)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
Answer : There is a long term increase in the data so there is an increasing trend shown by the blue line below. seasonal plot allows the underlying seasonal pattern to be seen more clearly, and can be useful in identifying years in which the pattern changes.It looks like we see some changes and unusual points in the plot below, there is a dip in brick production 1995 and 2000 from Q2 there is downward shift which is different from rest of the pattern.
aus_production |> autoplot(Bricks, interval = 'Quarter') + geom_smooth()
labs(y = "million units", title = "Australian Clay Brick production in millions of bricks, Time Interval is Quarter")
## $y
## [1] "million units"
##
## $title
## [1] "Australian Clay Brick production in millions of bricks, Time Interval is Quarter"
##
## attr(,"class")
## [1] "labels"
recent3 <- aus_production %>% filter(year(Quarter)>=1992)
head(recent3)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1992 Q1 443 5777 383 1289 38332 117
## 2 1992 Q2 410 5853 404 1501 39774 151
## 3 1992 Q3 420 6416 446 1539 42246 175
## 4 1992 Q4 532 5825 420 1568 38498 129
## 5 1993 Q1 433 5724 394 1450 39460 116
## 6 1993 Q2 421 6036 462 1668 41356 149
recent3 %>% gg_season(Bricks, period = "year", labels="both") + ggtitle("Season") + geom_point()
aus_production %>%
gg_subseries(Bricks) + ggtitle("Subseries Plot")
recent4 <- aus_production %>% filter(year(Quarter)>=1992)
head(recent4)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1992 Q1 443 5777 383 1289 38332 117
## 2 1992 Q2 410 5853 404 1501 39774 151
## 3 1992 Q3 420 6416 446 1539 42246 175
## 4 1992 Q4 532 5825 420 1568 38498 129
## 5 1993 Q1 433 5724 394 1450 39460 116
## 6 1993 Q2 421 6036 462 1668 41356 149
recent4 %>%
gg_lag(Bricks) + ggtitle("Lag Plot") + geom_point()
aus_production %>%
ACF(Bricks) %>%
autoplot() + ggtitle("Autocorrelation")
Questions (Hare)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
Annual data does not have a seasonal pattern. Therefore for Hare we don’t have gg_season.Lag1 shows positive correlated data, however, lag 5 and lag6 show negative correlation between the data.
pelt |> autoplot(Hare)
pelt %>%
gg_subseries(Hare) + ggtitle("Subseries Plot")
pelt %>%
gg_lag(Hare) + ggtitle("Lag Plot")
pelt %>%
ACF(Hare) %>%
autoplot() + ggtitle("Autocorrelation")
Questions (PBS)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
PBS
## # A tsibble: 67,596 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [336]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-payme… A Alimenta… A01 STOMATOL… 18228 67877
## 2 1991 Aug Concessional Co-payme… A Alimenta… A01 STOMATOL… 15327 57011
## 3 1991 Sep Concessional Co-payme… A Alimenta… A01 STOMATOL… 14775 55020
## 4 1991 Oct Concessional Co-payme… A Alimenta… A01 STOMATOL… 15380 57222
## 5 1991 Nov Concessional Co-payme… A Alimenta… A01 STOMATOL… 14371 52120
## 6 1991 Dec Concessional Co-payme… A Alimenta… A01 STOMATOL… 15028 54299
## 7 1992 Jan Concessional Co-payme… A Alimenta… A01 STOMATOL… 11040 39753
## 8 1992 Feb Concessional Co-payme… A Alimenta… A01 STOMATOL… 15165 54405
## 9 1992 Mar Concessional Co-payme… A Alimenta… A01 STOMATOL… 16898 61108
## 10 1992 Apr Concessional Co-payme… A Alimenta… A01 STOMATOL… 18141 65356
## # ℹ 67,586 more rows
PBS %>% filter(ATC2 == "H02") %>%
autoplot(Cost) +
ggtitle("Autoplot")
TEST <- PBS %>% filter(ATC2 == "H02")
TEST %>% gg_season(Cost) + ggtitle("Season")
TEST %>%
gg_subseries(Cost) + ggtitle("Subseries Plot")
head(TEST)
## # A tsibble: 6 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-payme… H Systemic… H02 CORTICOS… 63261 317384
## 2 1991 Aug Concessional Co-payme… H Systemic… H02 CORTICOS… 53528 269891
## 3 1991 Sep Concessional Co-payme… H Systemic… H02 CORTICOS… 52822 269703
## 4 1991 Oct Concessional Co-payme… H Systemic… H02 CORTICOS… 54016 280418
## 5 1991 Nov Concessional Co-payme… H Systemic… H02 CORTICOS… 49281 268070
## 6 1991 Dec Concessional Co-payme… H Systemic… H02 CORTICOS… 51798 277139
TEST %>%
ACF(Cost) %>%
autoplot() + ggtitle("Autocorrelation")
Questions (Barrels)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
The seasonal plots show sudden dips between 2000 and 2004.
#install.packages("fpp3")
library(fpp3)
us_gasoline |> autoplot(Barrels)
# ?us_gasoline
recent6 <- us_gasoline %>% filter(year(Week)>=2000)
recent6 %>%
gg_season() + ggtitle("Season")
us_gasoline %>%
gg_subseries() + ggtitle("Subseries Plot")
us_gasoline %>%
gg_lag() + ggtitle("Lag Plot")
us_gasoline %>%
ACF(Barrels) %>%
autoplot() + ggtitle("Autocorrelation")