Time series graphics notes:
Forecasting is predicting the future from historical data as accurately as possible.
Short-term forecasts: Needed for the scheduling of personnel, production and transportation.
Medium-term forecasts: Needed to determine future resource requirements.
Long-term forecasts: Used in strategic planning. These decisions must take account of market opportunities, environmental factors and internal resources.
Time series forecasting is the process of analyzing time series data using statistics and modeling to make predictions and inform strategic decision-making.
Time series patterns:
Trend: A long-term increase or decrease in the data. It can change direction and does not have to be linear.
Seasonal: A pattern with a fixed and known period, driven by seasonal factors such as the time of year or the day of the week.
Cyclic: Rises and falls that do not have a fixed period, usually due to economic conditions.
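To make the trend and seasonal definitions concrete, here is a minimal sketch in base R that simulates a monthly series. The coefficients, the seed and the names n, t, trend, seasonal, noise and y are all assumptions chosen for illustration; none of them come from the exercises.

```r
# Simulated monthly series: linear trend + 12-month seasonal pattern + noise.
set.seed(1)                              # arbitrary seed, for reproducibility only
n <- 120                                 # 10 years of monthly observations
t <- seq_len(n)
trend    <- 0.5 * t                      # long-term increase
seasonal <- 10 * sin(2 * pi * t / 12)    # fixed, known period of 12 months
noise    <- rnorm(n, sd = 2)

# Combine the components into a monthly ts object and plot it.
y <- ts(trend + seasonal + noise, frequency = 12, start = c(2010, 1))
plot(y, main = "Simulated series with trend and seasonality")
```

The seasonal component repeats exactly every 12 observations, which is what separates it from a cyclic pattern, whose rises and falls have no fixed length.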
A tsibble allows multiple time series to be stored in a single object.
A tsibble requires an index (the time variable). A key, which identifies each series, is needed when the object holds more than one series.
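As a small illustration of the key and index, the sketch below builds a tsibble holding two series. The data (monthly sales for two stores) and the column names Month, Store and Sales are hypothetical and invented for this example.

```r
library(tsibble)

# Hypothetical data: monthly sales for two stores (values invented for illustration).
sales <- tsibble(
  Month = rep(yearmonth("2020 Jan") + 0:2, times = 2),
  Store = rep(c("A", "B"), each = 3),
  Sales = c(10, 12, 11, 20, 19, 23),
  key   = Store,   # identifies each series
  index = Month    # the time variable
)

sales
key_vars(sales)    # names of the key variables
index_var(sales)   # name of the index variable
n_keys(sales)      # number of distinct series stored in the object
```

Each combination of key and index must be unique, so a row is identified by which series it belongs to and when it was observed.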
Autoplot
Seasonal plot
Scatterplots
Lag plot
#Libraries required for the exercises
library(tsibble)
library(dplyr)
library(ggplot2)
library(tsibbledata)
#install.packages("feasts")
library(feasts)
library(zoo)
a. Use autoplot() to plot some of the series in these data sets.
1. gafa_stock
help(gafa_stock)
gafa_stock contains historical stock prices from 2014 to 2018 for Google, Amazon, Facebook and Apple. All prices are in $USD.
Source: Yahoo Finance historical data
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ... with 5,022 more rows
autoplot(gafa_stock, Close)
2. PBS
?PBS
PBS is a monthly tsibble with two values:
Scripts: Total number of scripts
Cost: Cost of the scripts in $AUD
Source: Medicare Australia
PBS
## # A tsibble: 67,596 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [336]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 18228 67877
## 2 1991 Aug Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15327 57011
## 3 1991 Sep Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 14775 55020
## 4 1991 Oct Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15380 57222
## 5 1991 Nov Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 14371 52120
## 6 1991 Dec Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15028 54299
## 7 1992 Jan Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 11040 39753
## 8 1992 Feb Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 15165 54405
## 9 1992 Mar Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 16898 61108
## 10 1992 Apr Concession~ Co-pa~ A Alimentary~ A01 STOMATOLOG~ 18141 65356
## # ... with 67,586 more rows
autoplot(PBS, Cost)
3. vic_elec
help("vic_elec")
vic_elec is a half-hourly tsibble with three values:
Demand: Total electricity demand in MW.
Temperature: Temperature of Melbourne (BOM site 086071).
Holiday: Indicator for whether that day is a public holiday.
Source: Australian Energy Market Operator.
vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # ... with 52,598 more rows
autoplot(vic_elec)
## Plot variable not specified, automatically selected `.vars = Demand`
4. pelt
help(pelt)
Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company.
Source: Hudson Bay Company
pelt
## # A tsibble: 91 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
## 7 1851 74600 5560
## 8 1852 75090 5080
## 9 1853 88480 10170
## 10 1854 61280 19600
## # ... with 81 more rows
autoplot(pelt)
## Plot variable not specified, automatically selected `.vars = Hare`
b. What is the time interval of each series?
For example, the printed header for pelt shows a tsibble with 91 rows and 3 columns, and [1Y] tells us that the observations are one year apart.
gafa_stock = irregular ([!]), one observation per trading day
pelt = 1 year ([1Y])
PBS = 1 month ([1M])
vic_elec = 30 minutes ([30m])
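The interval can also be checked in code rather than read from the printed header. This is a minimal sketch using tsibble's interval() and is_regular(). The object names yearly and trading are hypothetical. The yearly values are invented, and the three closing prices are the first AAPL rows printed above.

```r
library(tsibble)

# Hypothetical yearly series (values invented for illustration).
yearly <- tsibble(Year = 2001:2005, Value = c(3, 5, 4, 6, 7), index = Year)
interval(yearly)      # the interval tsibble inferred from the index
is_regular(yearly)    # TRUE when observations are equally spaced

# Trading-day data skip weekends and holidays, so the series is declared
# irregular with regular = FALSE; this prints as [!] in the header.
trading <- tsibble(
  Date    = as.Date(c("2014-01-02", "2014-01-03", "2014-01-06")),
  Close   = c(79.0, 77.3, 77.7),
  index   = Date,
  regular = FALSE
)
is_regular(trading)
```

The same calls should work on gafa_stock, PBS, vic_elec and pelt once tsibbledata is loaded.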
##Question 2
gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
All four stocks reached their peak closing price in 2018.
a. You can read the data into R with the following script:
tute1 <- readr::read_csv("tute1.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## Quarter = col_date(format = ""),
## Sales = col_double(),
## AdBudget = col_double(),
## GDP = col_double()
## )
tute1
## # A tibble: 100 x 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
## 7 1982-09-01 778. 531. 296.
## 8 1982-12-01 932. 608. 272.
## 9 1983-03-01 996. 638. 260.
## 10 1983-06-01 908. 582. 280.
## # ... with 90 more rows
b. Convert the data to time series
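The conversion code for this step does not appear in these notes. A likely version, sketched below as an assumption, turns the Quarter column into a yearquarter and then calls as_tsibble() with Quarter as the index. So that the example runs on its own, it uses a stand-in tibble named tute1_sample (a hypothetical name) built from the first four rows printed above, with values rounded as displayed.

```r
library(dplyr)
library(tsibble)

# Stand-in for tute1: the first four rows shown above (rounded as printed).
tute1_sample <- tibble(
  Quarter  = as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01")),
  Sales    = c(1020, 889, 795, 1004),
  AdBudget = c(659, 589, 512, 614),
  GDP      = c(252, 291, 291, 292)
)

# Convert the date column to a quarterly index, then to a tsibble.
# There is only one series per column, so no key is needed.
mytimeseries_sample <- tute1_sample %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

mytimeseries_sample
```

Applying the same two steps to the full tute1 data and assigning the result to mytimeseries would give the object used in part c.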
c. Construct time series plots of each of the three series
library(tidyr)
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
d. Check what happens when you don’t include facet_grid().
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
Without facet_grid(), all three quarterly series are drawn in a single panel with a shared y-axis. That works when the series are on similar scales, but when their ranges differ the smaller series are compressed, and lines that overlap can mislead or confuse us. facet_grid(name ~ ., scales = "free_y") gives each variable its own panel and its own y-axis scale, which makes each series easier to read.
Install the USgas package.
Create a tsibble from us_total with year as the index and state as the key.
#install.packages("USgas")
library(USgas)
data <- us_total %>% as_tsibble(key = state, index = year)
data
## # A tsibble: 1,266 x 3 [1Y]
## # Key: state [53]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## 7 2003 Alabama 350345
## 8 2004 Alabama 382367
## 9 2005 Alabama 353156
## 10 2006 Alabama 391093
## # ... with 1,256 more rows
# Filter the data to the New England states.
states <- data %>%
  filter(state %in% c("Connecticut", "Maine", "Massachusetts",
                      "New Hampshire", "Rhode Island", "Vermont"))
# Create a plot
autoplot(states, y) +
  labs(title = "Natural Gas in New England area", y = "Gas Consumption", x = "Year")
Tourism <- readr::read_csv("tourism.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## Quarter = col_date(format = ""),
## Region = col_character(),
## State = col_character(),
## Purpose = col_character(),
## Trips = col_double()
## )
Tourism
## # A tibble: 24,320 x 5
## Quarter Region State Purpose Trips
## <date> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
## 7 1999-07-01 Adelaide South Australia Business 169.
## 8 1999-10-01 Adelaide South Australia Business 134.
## 9 2000-01-01 Adelaide South Australia Business 154.
## 10 2000-04-01 Adelaide South Australia Business 169.
## # ... with 24,310 more rows
Tourism$Quarter <- as.yearqtr(Tourism$Quarter)
Tourism
## # A tibble: 24,320 x 5
## Quarter Region State Purpose Trips
## <yearqtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ... with 24,310 more rows
Tourism %>%
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips, na.rm = TRUE), .groups = "drop_last")
## # A tibble: 304 x 3
## # Groups: Region [76]
## Region Purpose Trips
## <chr> <chr> <dbl>
## 1 Adelaide Business 156.
## 2 Adelaide Holiday 157.
## 3 Adelaide Other 56.6
## 4 Adelaide Visiting 205.
## 5 Adelaide Hills Business 2.66
## 6 Adelaide Hills Holiday 10.5
## 7 Adelaide Hills Other 1.40
## 8 Adelaide Hills Visiting 14.2
## 9 Alice Springs Business 14.6
## 10 Alice Springs Holiday 31.9
## # ... with 294 more rows
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
Tourism %>%
  group_by(Quarter, State) %>%
  summarise(Trips = sum(Trips, na.rm = TRUE), .groups = "drop_last")
## # A tibble: 640 x 3
## # Groups: Quarter [80]
## Quarter State Trips
## <yearqtr> <chr> <dbl>
## 1 1998 Q1 ACT 551.
## 2 1998 Q1 New South Wales 8040.
## 3 1998 Q1 Northern Territory 181.
## 4 1998 Q1 Queensland 4041.
## 5 1998 Q1 South Australia 1735.
## 6 1998 Q1 Tasmania 982.
## 7 1998 Q1 Victoria 6010.
## 8 1998 Q1 Western Australia 1641.
## 9 1998 Q2 ACT 416.
## 10 1998 Q2 New South Wales 7166.
## # ... with 630 more rows
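The result above is a grouped tibble rather than a tsibble. One way to get a tsibble of total trips by State, sketched here under assumptions, is to convert to a tsibble first and then let summarise() aggregate over the remaining keys. The tourism_sample data and its values are hypothetical stand-ins for the Tourism data, and state_totals is an illustrative name.

```r
library(dplyr)
library(tsibble)

# Hypothetical stand-in for Tourism (values invented for illustration).
tourism_sample <- tibble(
  Quarter = rep(as.Date(c("1998-01-01", "1998-04-01")), each = 4),
  Region  = rep(c("Adelaide", "Adelaide", "Sydney", "Sydney"), times = 2),
  State   = rep(c("South Australia", "South Australia",
                  "New South Wales", "New South Wales"), times = 2),
  Purpose = rep(c("Business", "Holiday"), times = 4),
  Trips   = c(135, 224, 525, 800, 110, 130, 480, 650)
)

# Convert to a tsibble, then sum Trips within each State and Quarter.
# On a tsibble, summarise() keeps the index and group_by() sets the new key.
state_totals <- tourism_sample %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose), index = Quarter) %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

state_totals
```

Because the result is still a tsibble keyed by State, it can be passed straight to autoplot() to look for seasonality, cyclicity and trend.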
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
set.seed(12272018)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries
## # A tsibble: 441 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Apr 4.4
## 2 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 May 3.4
## 3 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Jun 3.6
## 4 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Jul 4
## 5 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Aug 3.6
## 6 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Sep 4.2
## 7 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Oct 4.8
## 8 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Nov 5.4
## 9 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1982 Dec 6.9
## 10 Australian Capital~ Cafes, restaurants and cat~ A3349849A 1983 Jan 3.8
## # ... with 431 more rows
Explore your chosen retail time series using the following functions:
autoplot()
autoplot(myseries, Turnover)
autoplot() produces a plot of monthly Turnover from April 1982 through 2018 for cafes, restaurants and catering services in the Australian Capital Territory, the series selected above. Because turnover is drawn as a continuous line against time, the plot shows how its level changes over the years.
gg_season()
gg_season(myseries)
## Plot variable not specified, automatically selected `y = Turnover`
The seasonal plot draws attention to turnover in February, March, July and October. March and December are the two months with the highest turnover.
gg_subseries()
gg_subseries(myseries)
## Plot variable not specified, automatically selected `y = Turnover`
The plot shows the highest Turnover occurs in March and December.
January is the lowest turnover month.
gg_lag()
gg_lag(myseries, Turnover)
Although the lag plot is not very clear, December appears to have the highest turnover.
ACF() %>% autoplot()
myseries %>%
  ACF(Turnover) %>%
  autoplot() + labs(title = "Monthly Australian retail")
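The plot above shows that the autocorrelations at lags 1 to 9 are the highest. Large autocorrelations at small lags that decline slowly are characteristic of a trend. Seasonality in monthly data shows up as peaks at multiples of the seasonal period (lags 12, 24, and so on).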
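To see how trend and seasonality each shape an ACF, the sketch below uses base R's acf() on two simulated series. The series, the seed and the names trend_only, seasonal_only, acf_trend and acf_seasonal are assumptions for illustration and are not taken from the retail data.

```r
# Two simulated monthly series: one with only a trend, one with only seasonality.
set.seed(42)                                  # arbitrary seed for reproducibility
n <- 240
t <- seq_len(n)
trend_only    <- 0.5 * t + rnorm(n)
seasonal_only <- 10 * sin(2 * pi * t / 12) + rnorm(n)

# Sample autocorrelations at lags 1 to 24 (the first element, lag 0, is dropped).
acf_trend    <- acf(trend_only, lag.max = 24, plot = FALSE)$acf[-1]
acf_seasonal <- acf(seasonal_only, lag.max = 24, plot = FALSE)$acf[-1]

round(acf_trend[c(1, 6, 12)], 2)      # trend: large, slowly declining values
round(acf_seasonal[c(6, 12, 24)], 2)  # seasonality: negative at lag 6, peaks at 12 and 24
```

A series with both features, as monthly retail data often has, typically shows the slow decline with seasonal peaks layered on top of it.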
In summary, the Australian retail turnover series shows both seasonality and trend.