Load Libraries
library(tidyverse)
library(fpp3)
library(readxl)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series.
What is the time interval of each series?
Use autoplot() to produce a time plot of each series.
For the last plot, modify the axis labels and title.
?aus_production # quarterly time interval
?pelt # annual time interval
?gafa_stock # irregular trading days from 2014-2018
?vic_elec # half-hourly time interval
aus_production has a quarterly time interval, with series for Beer, Tobacco, Bricks, Cement, Electricity and Gas.
pelt has an annual time interval and records the number of Hare and Lynx pelts traded.
gafa_stock has an irregular time interval: one observation per trading day from 2014 to 2018 for Google, Amazon, Facebook and Apple.
vic_elec has a 30-minute time interval and tracks demand, temperature and whether the day is a holiday.
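The intervals read from the help pages can also be checked from the data. The following is a minimal sketch, assuming only that fpp3 is attached as above; `series` and `regular` are names introduced here for illustration. It uses is_regular() and interval() from tsibble.

```r
library(fpp3)

# Collect the four data sets in a named list so each check runs in one pass.
series <- list(
  aus_production = aus_production,
  pelt           = pelt,
  gafa_stock     = gafa_stock,
  vic_elec       = vic_elec
)

# is_regular() reports whether the index is evenly spaced.
regular <- sapply(series, is_regular)
print(regular)

# interval() returns the interval that tsibble stores for the index.
for (nm in names(series)) {
  cat(nm, "\n")
  print(interval(series[[nm]]))
}
```

gafa_stock should come back as not regular, which matches the `[!]` shown in its tsibble header.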
aus_production %>%
  autoplot(Bricks) +
  labs(y = "million units",
       title = "Australian clay brick production")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
pelt %>%
  autoplot(Lynx) +
  labs(y = "Number of pelts traded",
       title = "Annual Canadian Lynx pelts traded")
gafa_stock %>%
  autoplot(Close) +
  labs(y = "$US",
       title = "Closing stock price")
vic_elec %>%
  autoplot(Demand) +
  labs(y = "total electricity demand in MWh",
       title = "Half-hourly electricity demand for Victoria, Australia")
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
# group by stock symbol, then keep the rows where Close equals that stock's maximum
gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date, Close) %>%
  arrange(desc(Close))
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AMZN 2018-09-04 2040.
## 2 GOOG 2018-07-26 1268.
## 3 AAPL 2018-10-03 232.
## 4 FB 2018-07-25 218.
Amazon had the highest closing price, 2039.51 on 2018-09-04, followed by Google, Apple and Facebook.
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read_csv("https://raw.githubusercontent.com/AnnaMoy/Data-624/main/tute1.csv")
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
Check what happens when you don’t include facet_grid().
Without facet_grid(), all three series are drawn on a single chart with a shared y-axis instead of in separate panels with their own scales. facet_grid() is helpful when the series are related but have different ranges, because each one can then be read individually. With everything on one chart, it is hard to see the range of each of Sales, GDP and AdBudget.
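The two versions of the plot can be compared side by side. This is a sketch rather than the original chunk: it rebuilds mytimeseries from the same URL used above, reshapes it with pivot_longer() (which creates the default `name` and `value` columns), and draws the plot with and without facet_grid(). The names `long`, `p_facet` and `p_single` are introduced here for illustration.

```r
library(tidyverse)
library(fpp3)

# Rebuild mytimeseries as in the step above.
tute1 <- read_csv("https://raw.githubusercontent.com/AnnaMoy/Data-624/main/tute1.csv")
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

# Reshape to long form: one row per Quarter and series.
long <- mytimeseries %>%
  pivot_longer(-Quarter)

# With facet_grid(): one panel per series, each with its own y-axis.
p_facet <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

# Without facet_grid(): all series share one panel and one y-axis.
p_single <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line()

print(p_facet)
print(p_single)
```

The `scales = "free_y"` argument is what lets each panel pick its own range; dropping it keeps the panels but forces a common y-axis.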
The USgas package contains data on the demand for natural gas in the US.
Install the USgas package.
Create a tsibble from us_total with year as the index and state as the key.
Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
library(USgas)
# create tsibble
# index = time, key = categorical information
us_total <- us_total %>%
  as_tsibble(key = state,
             index = year)
# filter for particular states
us_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot(y) +
  labs(y = "gas consumption",
       title = "Annual Natural gas consumption by state")
# read the excel file from computer
tour <- read_excel("/Users/zhianna/Downloads/tourism.xlsx")
# create tsibble identical to tourism tsibble
tour1 <- tour %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)
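The code below sorts the summarised values in descending order; the top row is Region: Melbourne and Purpose: Visiting (985 in 2017 Q4). Because tour1 is a tsibble, summarise() keeps the Quarter index, so avg_trips is computed within each quarter rather than across the whole period.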
# Combination of Purpose and Region to find max avg trips
purpose_region <- tour1 %>%
  group_by(Purpose, Region) %>%
  summarise(avg_trips = mean(Trips)) %>%
  arrange(desc(avg_trips))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Purpose`, `Region`, `Quarter` first.
purpose_region
## # A tsibble: 24,320 x 4 [1Q]
## # Key: Purpose, Region [304]
## # Groups: Purpose [4]
## Purpose Region Quarter avg_trips
## <chr> <chr> <qtr> <dbl>
## 1 Visiting Melbourne 2017 Q4 985.
## 2 Business Sydney 2001 Q4 948.
## 3 Visiting Sydney 2016 Q4 921.
## 4 Visiting Sydney 2017 Q4 920.
## 5 Visiting Sydney 2017 Q1 916.
## 6 Holiday South Coast 1998 Q1 915.
## 7 Holiday North Coast NSW 2016 Q1 906.
## 8 Business Sydney 2017 Q3 892.
## 9 Business Sydney 2017 Q2 884.
## 10 Visiting Sydney 2013 Q4 882.
## # ℹ 24,310 more rows
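The 24,320-row result above still has one row per quarter, because summarise() on a tsibble always keeps the time index. To average over all quarters, one option is to drop the tsibble structure with as_tibble() first. The sketch below assumes the built-in `tourism` tsibble can stand in for tour1 (the exercise builds tour1 to be identical to it); `avg_by_series` is a name introduced here.

```r
library(fpp3)

# Assumption: `tourism` (shipped with tsibble) has the same content as tour1.
avg_by_series <- tourism %>%
  as_tibble() %>%                        # drop the Quarter index so time is collapsed
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips), .groups = "drop") %>%
  arrange(desc(avg_trips))

# One row per Region/Purpose combination; the first row has the highest average.
head(avg_by_series, 1)
```

This returns one average per Region and Purpose combination, which is the quantity the question about the maximum average number of overnight trips refers to.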
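Grouping by Purpose, Region and State leaves each original series in its own group, so sum_trips repeats the quarterly values; the largest is 985.2784 trips (Visiting, Melbourne, Victoria, 2017 Q4).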
# New tsibble: Purpose and Region, total trips by State
tour2 <- tour %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, Purpose),
             index = Quarter) %>%
  group_by(Purpose, Region, State) %>%
  summarise(sum_trips = sum(Trips)) %>%
  arrange(desc(sum_trips))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Purpose`, `Region`, `State`, `Quarter` first.
tour2
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Purpose, Region, State [304]
## # Groups: Purpose, Region [304]
## Purpose Region State Quarter sum_trips
## <chr> <chr> <chr> <qtr> <dbl>
## 1 Visiting Melbourne Victoria 2017 Q4 985.
## 2 Business Sydney New South Wales 2001 Q4 948.
## 3 Visiting Sydney New South Wales 2016 Q4 921.
## 4 Visiting Sydney New South Wales 2017 Q4 920.
## 5 Visiting Sydney New South Wales 2017 Q1 916.
## 6 Holiday South Coast New South Wales 1998 Q1 915.
## 7 Holiday North Coast NSW New South Wales 2016 Q1 906.
## 8 Business Sydney New South Wales 2017 Q3 892.
## 9 Business Sydney New South Wales 2017 Q2 884.
## 10 Visiting Sydney New South Wales 2013 Q4 882.
## # ℹ 24,310 more rows
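If the goal is a tsibble that combines Purposes and Regions and keeps only total trips by State, one possible approach is to group by State alone, so that summarise() adds up every Purpose and Region within each State and quarter. This is a sketch under the assumption that the built-in `tourism` tsibble matches tour1; `state_totals` is a name introduced here.

```r
library(fpp3)

# Assumption: `tourism` (shipped with tsibble) has the same content as tour1.
state_totals <- tourism %>%
  group_by(State) %>%                # State becomes the only key
  summarise(Trips = sum(Trips))      # summed over Region and Purpose per Quarter

state_totals
```

The result is still a quarterly tsibble, now with State as its key, so it can be passed straight to autoplot(Trips).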
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
us_employment %>%
  filter(Title == "Total Private") %>%
  autoplot(Employed) +
  ggtitle("Total Private Employed")
us_employment %>%
  filter(Title == "Total Private") %>%
  gg_season(Employed) +
  ggtitle("Total Private Employed")
us_employment %>%
  filter(Title == "Total Private") %>%
  gg_subseries(Employed) +
  ggtitle("Total Private Employed")
us_employment %>%
  filter(Title == "Total Private") %>%
  gg_lag(Employed) +
  ggtitle("Total Private Employed")
us_employment %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot() +
  ggtitle("Total Private Employed")
The employment data shows a strong increasing trend. Little seasonality is visible at this scale: the level stays much the same from month to month within a year. The unusual feature is a larger dip in the data around 2010. The ACF shows strong positive autocorrelation.
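A long, strongly trended series can hide month-to-month movement in a seasonal plot. One way to look more closely, sketched here as an optional extra check, is to restrict the series to a shorter window before calling gg_season(). The 2010 cut-off and the name `recent_private` are choices made for this illustration only.

```r
library(fpp3)

# Keep only the more recent part of the Total Private series.
recent_private <- us_employment %>%
  filter(Title == "Total Private", year(Month) >= 2010)

# With fewer years on the plot, the within-year shape is easier to compare.
recent_private %>%
  gg_season(Employed) +
  ggtitle("Total Private Employed, 2010 onwards")
```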
aus_production %>%
  autoplot(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
  gg_season(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
  gg_subseries(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
  gg_lag(Bricks) +
  ggtitle("Production on Bricks")
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
  ACF(Bricks) %>%
  autoplot() +
  ggtitle("Production on Bricks")
Bricks shows cyclic behavior together with seasonality. Production increases over much of the series but trends downward in more recent years. Production is typically slowest in Q1 and picks up in Q2 and Q3. There is an unusual dip around 1982, where production drops sharply. The autocorrelation decreases slowly as the lag increases.
PBS %>%
  filter(ATC2 == "H02") %>%
  autoplot(Cost) +
  ggtitle("H02 cost")
PBS %>%
  filter(ATC2 == "H02") %>%
  gg_season(Cost) +
  ggtitle("H02 cost")
The cost of H02 shows an upward trend with cyclic behavior and seasonality. The data fluctuates a lot depending on the Concession and Type. For the General and Concessional safety net series, cost is high in January, falls, and then rises again later in the year. Some series show a big decrease in cost in February while others show an increase. Only autoplot() and gg_season() were run for this data.
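Filtering PBS to ATC2 == "H02" leaves several series (one per Concession and Type), which can get in the way of plots that expect a single series. One possible workaround, sketched below, is to sum Cost across those series first and then run the remaining graphics on the total. `h02_total` is a name introduced here, and this is an alternative view of the data rather than the analysis shown above.

```r
library(fpp3)

# Collapse the Concession/Type series into one monthly total.
# summarise() on an ungrouped tsibble aggregates over the keys within each Month.
h02_total <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))

h02_total %>%
  gg_subseries(Cost) +
  ggtitle("H02 cost (total)")

h02_total %>%
  gg_lag(Cost, geom = "point") +
  ggtitle("H02 cost (total)")

h02_total %>%
  ACF(Cost) %>%
  autoplot() +
  ggtitle("H02 cost (total)")
```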
pelt %>%
  autoplot(Hare) +
  ggtitle("Hare Trading Records")
pelt %>%
  gg_subseries(Hare) +
  ggtitle("Hare Trading Records")
pelt %>%
  gg_lag(Hare) +
  ggtitle("Hare Trading Records")
pelt %>%
  ACF(Hare) %>%
  autoplot() +
  ggtitle("Hare Trading Records")
The pelt data rises and falls repeatedly, which is cyclic behavior. The rises and falls recur approximately every 5 years, so the autocorrelation is high at some lags and low at others. There are unusually large peaks around 1863 and 1883, which are higher than the rest of the data. The range is consistent for most cycles, with a few outliers. gg_season() returned an error for pelt, most likely because annual data has no seasonal period to plot. The autocorrelation alternates between positive and negative values.
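The length of the cycle can be read off the ACF more directly by extending the number of lags and looking for the lag with the largest positive autocorrelation beyond the first few. The sketch below assumes a maximum of 20 lags and a cut-off of lag 5, both chosen here for illustration; `hare_acf`, `peak_lag` and `lag_years` are names introduced for the example.

```r
library(fpp3)

# ACF of Hare out to 20 annual lags.
hare_acf <- pelt %>%
  ACF(Hare, lag_max = 20)

hare_acf %>%
  autoplot() +
  ggtitle("Hare ACF, lags 1 to 20")

# The rows are lags 1, 2, ..., 20 in order, so the row number is the lag in years.
peak_lag <- hare_acf %>%
  as_tibble() %>%
  mutate(lag_years = row_number()) %>%
  filter(lag_years > 5) %>%          # skip the short lags next to lag 0
  slice_max(acf, n = 1)

print(peak_lag)
```

The lag reported in `peak_lag` gives a rough estimate of the distance between successive peaks in the Hare series.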
us_gasoline %>%
  autoplot(Barrels) +
  ggtitle("Barrel Gasoline")
us_gasoline %>%
  gg_season(Barrels) +
  ggtitle("Barrel Gasoline")
us_gasoline %>%
  gg_subseries(Barrels) +
  ggtitle("Barrel Gasoline")
us_gasoline %>%
  gg_lag(Barrels) +
  ggtitle("Barrel Gasoline")
us_gasoline %>%
  ACF(Barrels) %>%
  autoplot() +
  ggtitle("Barrel Gasoline")
The gasoline series (Barrels, in million barrels per day) shows an upward trend. It increases through the years, with a dip in 2011, and there is typically a decrease around October. The autocorrelation is strong at short lags and weakens as the lag grows. There does not appear to be consistent seasonality, as the weekly data moves up and down constantly.
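Weekly data is noisy, which can make a seasonal pattern hard to judge by eye. One option, sketched here as an extra check and not part of the original analysis, is to average the weekly values within each month using index_by() and then redraw the seasonal plot. `gas_monthly` is a name introduced for the example, and averaging by month is an assumption about what level of smoothing is useful.

```r
library(fpp3)

# Average the weekly values within each calendar month.
# as.Date() converts the yearweek index to a date before yearmonth() is applied.
gas_monthly <- us_gasoline %>%
  index_by(Month = yearmonth(as.Date(Week))) %>%
  summarise(Barrels = mean(Barrels))

gas_monthly %>%
  gg_season(Barrels) +
  ggtitle("Barrel Gasoline (monthly average)")
```

If the monthly lines follow a similar shape from year to year, that would point to seasonality that the week-to-week noise was hiding.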