Introduction to the TSstudio Package

Greater Cleveland R Group

Rami Krispin

February 28, 2018

Agenda

Introduction

Quick Poll

How many are familiar with:

Time Series Analysis

Time series analysis is the process of extracting meaningful insights from time series data, mainly with statistical, mathematical, and visualization tools. These insights can be used to study past events and to forecast future ones.

Time series data describes events or phenomena that occur over time. Its main characteristic is that the series values are captured at equally spaced time intervals (e.g., minutes, hours, days, months).
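In base R, this equal spacing is exactly what a ts object encodes: it stores the values together with a start time and a frequency, and every individual time stamp is implied. A minimal sketch using the built-in AirPassengers data (monthly, so frequency = 12); nothing here is specific to TSstudio.

```r
# Rebuild AirPassengers from its raw values: only the start and the frequency
# are needed, because the time of every observation follows from equal spacing
values <- as.numeric(AirPassengers)
x <- ts(values, start = c(1949, 1), frequency = 12)

frequency(x)    # 12 observations per cycle (one year)
start(x)        # 1949 1  (year, month)
end(x)          # 1960 12
head(cycle(x))  # position of each observation within its cycle: 1, 2, 3, ...
```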

TSstudio Overview

General Package Architecture

TSstudio Overview

The idea is to take this static plot:

data("AirPassengers")
plot.ts(AirPassengers)

TSstudio Overview

And make it dynamic:

library(TSstudio)  # provides ts_plot()
data("AirPassengers")
ts_plot(AirPassengers)

TSstudio Overview

And, with a few tweaks, add titles and a range slider:

data("AirPassengers")
ts_plot(AirPassengers,
        title = "Monthly Airline Passenger Numbers 1949-1960",
        Ytitle = "Number of Passengers in Thousands",
        slider = TRUE)  # adds an interactive range slider below the plot

Visualize Multiple Time Series Objects

We will load the stock prices of four key technology companies (Apple, Facebook, Google, and Microsoft):

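library(TSstudio)
library(xts)
library(zoo)
library(quantmod)

# Ticker symbols; note that Facebook's ticker changed from FB to META in 2022,
# so re-running this today may require "META" instead of "FB"
tckrs <- c("AAPL",  "FB", "GOOGL", "MSFT")
# Download daily prices from Yahoo Finance; each ticker is assigned to the
# workspace as an xts object (AAPL, FB, GOOGL, MSFT)
getSymbols(tckrs, from = "2013-01-01", src = "yahoo")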
## [1] "AAPL"  "FB"    "GOOGL" "MSFT"
head(GOOGL)
##            GOOGL.Open GOOGL.High GOOGL.Low GOOGL.Close GOOGL.Volume
## 2013-01-02   360.0701   363.8639  358.6336    361.9870      5077500
## 2013-01-03   362.8278   366.3313  360.7207    362.1972      4631700
## 2013-01-04   365.0350   371.1061  364.2042    369.3543      5521400
## 2013-01-07   368.0931   370.0601  365.6557    367.7427      3308000
## 2013-01-08   368.1382   368.5185  362.5776    367.0170      3348800
## 2013-01-09   366.5015   369.5446  364.6647    369.4294      4045300
##            GOOGL.Adjusted
## 2013-01-02       361.9870
## 2013-01-03       362.1972
## 2013-01-04       369.3543
## 2013-01-07       367.7427
## 2013-01-08       367.0170
## 2013-01-09       369.4294

Visualize Multiple Time Series Objects

Then we combine the closing prices of the four stocks into a single xts object:

closing <- cbind(AAPL$AAPL.Close, FB$FB.Close, GOOGL$GOOGL.Close, MSFT$MSFT.Close)
names(closing) <- c("Apple",  "Facebook", "Google", "Microsoft")
class(closing)
## [1] "xts" "zoo"
head(closing)
##               Apple Facebook   Google Microsoft
## 2013-01-02 78.43285    28.00 361.9870     27.62
## 2013-01-03 77.44286    27.77 362.1972     27.25
## 2013-01-04 75.28571    28.76 369.3543     26.74
## 2013-01-07 74.84286    29.42 367.7427     26.69
## 2013-01-08 75.04429    29.06 367.0170     26.55
## 2013-01-09 73.87143    30.59 369.4294     26.70

Visualize Multiple Time Series Objects

The default is the multiple-plot mode, with each series in its own panel:

ts_plot(closing, title = "Top Technology Companies Stocks Prices Since 2013")

Visualize Multiple Time Series Objects

Alternatively, set type = "single" to draw all the series on a single plot:

ts_plot(closing, title = "Top Technology Companies Stocks Prices Since 2013", type = "single")

Seasonality Analysis

data("USgas")
ts_plot(USgas, title = "US Natural Gas Consumption", Xtitle = "Year", Ytitle = "Billion Cubic Feet" )

Seasonality Analysis

The “normal” mode of ts_seasonal splits the series into full cycle periods (years) and plots each one as a separate line:

ts_seasonal(USgas)
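To make the “split by full cycle” idea concrete, here is a base R sketch of the grouping behind this view, not TSstudio's own implementation. It uses AirPassengers as a stand-in for USgas (which ships with TSstudio), and year_index() is a hypothetical helper written for this example.

```r
# Hypothetical helper: the calendar year of each observation, computed with
# integer arithmetic from the series' start and frequency
year_index <- function(x) {
  f <- frequency(x)
  s <- start(x)
  s[1] + (s[2] - 1 + seq_along(x) - 1) %/% f
}

x <- AirPassengers                          # stand-in for USgas
by_year <- split(as.numeric(x), year_index(x))

names(by_year)    # "1949" ... "1960": one group per full cycle
lengths(by_year)  # 12 monthly values per year; each group is one line in the plot
```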

Seasonality Analysis

The “cycle” mode, on the other hand, splits the series by cycle units (e.g., months or quarters):

ts_seasonal(USgas, type = "cycle")
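The “cycle” view groups observations by their position within the cycle rather than by year. A minimal base R sketch of that grouping, assuming a monthly series and again using AirPassengers as a stand-in for USgas; stats::monthplot() draws a static version of this view.

```r
x <- AirPassengers                                      # stand-in for USgas
# cycle() gives each observation's position in the year: 1 = Jan, ..., 12 = Dec
by_month <- split(as.numeric(x), as.integer(cycle(x)))
names(by_month) <- month.abb                            # assumes monthly data

sapply(by_month, mean)  # average level of each month across all years
```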

Seasonality Analysis

Last but not least, the “box” mode shows a box plot of each cycle unit:

ts_seasonal(USgas, type = "box")
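The box view summarizes the distribution of each cycle unit. A base R sketch of the same summary, assuming monthly data and using AirPassengers as a stand-in for USgas; with plot = FALSE, boxplot() only returns the statistics it would draw.

```r
x <- AirPassengers                                       # stand-in for USgas
month <- factor(month.abb[cycle(x)], levels = month.abb) # month label per value

# One box per month; set plot = TRUE (the default) to draw it
b <- boxplot(as.numeric(x) ~ month, plot = FALSE)
b$stats[3, ]  # median of each month (row 3 of the five-number summary)
```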

Seasonality Analysis

The “all” option combines these views to give you the full story:

ts_seasonal(USgas, type = "all")

Seasonality Analysis

Another way to check for seasonality in the series is a heatmap:

ts_heatmap(USgas)
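A seasonal heatmap is essentially a year-by-cycle grid of the series values, colored by magnitude. The sketch below builds that grid in base R; it assumes monthly data, uses AirPassengers as a stand-in for USgas, and is not how ts_heatmap is implemented internally.

```r
x <- AirPassengers                                   # stand-in for USgas
f <- frequency(x)
s <- start(x)
year <- s[1] + (s[2] - 1 + seq_along(x) - 1) %/% f   # calendar year of each value
period <- as.integer(cycle(x))                       # month within the year

# Rows = years, columns = months; months missing from partial years become NA
heat <- tapply(as.numeric(x), list(year, period), sum)
colnames(heat) <- month.abb[seq_len(ncol(heat))]     # assumes monthly data
heat[1:3, 1:4]
# heatmap(heat, Rowv = NA, Colv = NA) would color the cells without reordering
```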

Correlation Analysis

The package provides interactive versions of the acf and pacf plots, as well as lag plots:

ts_acf(USgas, lag.max = 48)
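The values behind these plots come from the standard autocorrelation functions. A base R sketch, with AirPassengers standing in for USgas, that computes them and flags the lags outside the usual approximate 95% white-noise bounds (the dashed lines in a correlogram):

```r
x <- AirPassengers                        # stand-in for USgas
a <- acf(x, lag.max = 48, plot = FALSE)   # lags 0..48
p <- pacf(x, lag.max = 48, plot = FALSE)  # lags 1..48

# Approximate 95% bounds under a white-noise assumption
bound <- qnorm(0.975) / sqrt(length(x))

# acf() reports lags in cycle units (lag 12 is stored as 1), so convert back
sig_lags <- round(a$lag[-1] * frequency(x))[abs(a$acf[-1]) > bound]
head(sig_lags)
```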

Correlation Analysis

A more intuitive way to see the relationship between the series and its lags is a lag plot:

ts_lags(USgas)
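Each panel of a lag plot is a scatter of the series against a shifted copy of itself, so the strength of each relationship can be summarized by a correlation. A sketch with a hypothetical helper lag_cor() and AirPassengers as a stand-in for USgas; stats::lag.plot(x, lags = 12) draws the matching static scatter plots.

```r
# Hypothetical helper: Pearson correlation between y[t] and y[t - k]
lag_cor <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  cor(y[(k + 1):n], y[1:(n - k)])
}

x <- AirPassengers                       # stand-in for USgas
r <- sapply(1:12, function(k) lag_cor(x, k))
round(r, 2)                              # one value per lag, 1 through 12
```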

Correlation Analysis

You can control the number of lags

ts_lags(USgas, lag.max = 24)

Application for Forecasting

The TSstudio package provides a set of supporting tools for training and testing forecasting models, which include:

Let’s train models to forecast the next 12 months of the USgas dataset.

Application for Forecasting

We set the forecast horizon to 12 months:

h <- 12

Then we split the data into training and testing partitions, holding out the last h months for testing:

usgas_split <- ts_split(USgas, sample.out = h)

train <- usgas_split$train
test <- usgas_split$test
train
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2000 2510.5 2330.7 2050.6 1783.3 1632.9 1513.1 1525.6 1653.1 1475.0 1567.8
## 2001 2677.0 2309.5 2246.6 1807.2 1522.4 1444.4 1598.1 1669.2 1494.1 1649.1
## 2002 2487.6 2242.4 2258.4 1881.0 1611.5 1591.4 1748.4 1725.7 1542.2 1645.9
## 2003 2700.5 2500.3 2197.9 1743.5 1514.7 1368.4 1600.5 1651.6 1428.6 1553.2
## 2004 2675.8 2511.1 2100.9 1745.2 1573.0 1483.7 1584.9 1578.0 1482.2 1557.2
## 2005 2561.9 2243.0 2205.8 1724.9 1522.6 1534.1 1686.6 1695.1 1422.5 1428.2
## 2006 2165.3 2144.4 2126.4 1681.0 1526.3 1550.9 1758.7 1751.7 1462.1 1644.2
## 2007 2475.6 2567.0 2128.8 1810.1 1559.1 1555.2 1659.9 1896.1 1590.5 1627.8
## 2008 2734.0 2503.4 2278.2 1823.9 1576.4 1604.2 1708.6 1682.9 1460.9 1635.8
## 2009 2729.7 2332.5 2170.7 1741.3 1504.0 1527.8 1658.0 1736.5 1575.0 1666.5
## 2010 2809.8 2481.0 2142.9 1691.8 1617.3 1649.5 1825.8 1878.9 1637.5 1664.9
## 2011 2888.6 2452.4 2230.5 1825.0 1667.4 1657.3 1890.5 1891.8 1655.6 1744.5
## 2012 2756.2 2500.7 2127.8 1953.1 1873.8 1868.4 2069.8 2008.8 1807.2 1901.1
## 2013 2878.8 2567.2 2521.1 1967.5 1752.5 1742.9 1926.3 1927.4 1767.0 1866.8
## 2014 3204.1 2741.2 2557.9 1961.7 1810.2 1745.4 1881.0 1933.1 1809.3 1912.8
## 2015 3115.0 2925.2 2591.3 2007.9 1858.1 1899.9 2067.7 2052.7 1901.3 1987.3
## 2016 3092.0 2651.4 2357.2 2089.4 1971.4 2005.5 2193.3 2216.1 1951.3 1926.3
##         Nov    Dec
## 2000 1908.5 2587.5
## 2001 1701.0 2120.2
## 2002 1913.6 2378.9
## 2003 1753.6 2263.7
## 2004 1782.8 2327.7
## 2005 1663.4 2326.4
## 2006 1765.4 2122.8
## 2007 1834.5 2399.2
## 2008 1868.9 2399.7
## 2009 1776.2 2491.9
## 2010 1973.3 2714.1
## 2011 2031.9 2541.9
## 2012 2167.8 2503.9
## 2013 2316.9 2920.8
## 2014 2357.5 2679.2
## 2015 2249.1 2588.2
## 2016 2164.4
test
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2016                                                                      
## 2017 2899.4 2326.2 2572.2 1926.7 1898.4 1912.6 2143.2 2139.5 1929.8 2035.3
##         Nov    Dec
## 2016        2867.3
## 2017 2282.7
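For comparison, the same kind of split can be sketched in base R with window(), which keeps the ts attributes so both partitions stay aligned in time. This is an illustration with AirPassengers as a stand-in for USgas, not the code inside ts_split.

```r
x <- AirPassengers          # stand-in for USgas
h <- 12                     # forecast horizon = size of the testing partition
n <- length(x)

# Everything up to n - h goes to training; the last h observations to testing
train <- window(x, end = time(x)[n - h])
test  <- window(x, start = time(x)[n - h + 1])

end(train)    # 1959 12
start(test)   # 1960 1
```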

Application for Forecasting

We will build three time series models with the training set and evaluate their performance on the testing set.

We start with a baseline model using the seasonal naive approach, which repeats the values of the last observed season:

library(forecast)

f1 <- snaive(train, h = h)
test_forecast(actual = USgas, forecast.obj = f1, test = test)
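The seasonal naive baseline is simple enough to write by hand, which also shows what scoring a forecast against the testing partition involves. A base R sketch with AirPassengers standing in for USgas; MAPE is used here as one example error metric and is not necessarily what test_forecast reports.

```r
x <- AirPassengers                     # stand-in for USgas
h <- 12
m <- frequency(x)
n <- length(x)
train <- window(x, end = time(x)[n - h])
test  <- window(x, start = time(x)[n - h + 1])

# Seasonal naive: each forecast repeats the value from the same season of the
# last observed cycle
last_cycle <- tail(as.numeric(train), m)
fc <- ts(rep_len(last_cycle, h), start = start(test), frequency = m)

# Mean absolute percentage error on the held-out months
mape <- mean(abs(test - fc) / abs(test)) * 100
mape
```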

Application for Forecasting

For the second model, we try auto.arima from the forecast package; stepwise = FALSE makes it search a larger set of candidate models, at the cost of a longer run time:

m2 <- auto.arima(train, stepwise = FALSE)
f2 <- forecast(m2, h = h)
test_forecast(actual = USgas, forecast.obj = f2, test = test)
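The core idea of automatic ARIMA selection is to fit several candidate orders and keep the one with the best information criterion. The sketch below is a deliberately simplified stand-in using stats::arima on AirPassengers (in place of USgas): it fixes d = D = 1 and compares AIC over a small grid, whereas auto.arima chooses the differencing with unit-root tests and uses AICc by default.

```r
x <- AirPassengers                     # stand-in for USgas
h <- 12
n <- length(x)
train <- window(x, end = time(x)[n - h])

# Small grid of non-seasonal (p, q) and seasonal (P, Q) orders; d = D = 1 fixed
orders <- expand.grid(p = 0:1, q = 0:1, P = 0:1, Q = 0:1)

best <- NULL
for (i in seq_len(nrow(orders))) {
  o <- orders[i, ]
  fit <- tryCatch(
    arima(train, order = c(o$p, 1, o$q),
          seasonal = list(order = c(o$P, 1, o$Q), period = frequency(x))),
    error = function(e) NULL)          # skip orders that fail to fit
  # Keep the candidate with the lowest AIC
  if (!is.null(fit) && !is.na(fit$aic) &&
      (is.null(best) || fit$aic < best$aic)) best <- fit
}

best$arma                    # chosen orders: p, q, P, Q, period, d, D
fc <- predict(best, n.ahead = h)
fc$pred                      # point forecasts for the 12 held-out months
```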

Application for Forecasting

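Last, we try a neural network autoregression model (nnetar); repeats = 200 fits 200 networks from different random starting weights and averages their forecasts: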

m3 <- nnetar(train, repeats = 200)
f3 <- forecast(m3, h = h)
test_forecast(actual = USgas, forecast.obj = f3, test = test)

Application for Forecasting

The model does not seem well tuned, so let’s set the number of seasonal lags (P) and non-seasonal lags (p) manually:

m4 <- nnetar(train, P = 3, p = 12, repeats = 200)
f4 <- forecast(m4, h = h)
test_forecast(actual = USgas, forecast.obj = f4, test = test)
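To see what p and P control, it helps to build the lagged inputs of an NNAR(p, P)[m] model: the non-seasonal lags 1 to p plus the seasonal lags m, 2m, ..., P·m, and nnetar fits a single-hidden-layer network on inputs of this kind. The sketch only builds the inputs; nnar_inputs() is a hypothetical helper, and AirPassengers stands in for USgas.

```r
# Hypothetical helper: lagged inputs of an NNAR(p, P)[m] model.
# Non-seasonal lags 1..p are combined with seasonal lags m, 2m, ..., P*m
nnar_inputs <- function(y, p, P, m = frequency(y)) {
  lags <- sort(unique(c(seq_len(p), m * seq_len(P))))
  y <- as.numeric(y)
  rows <- (max(lags) + 1):length(y)      # time points where every lag exists
  X <- sapply(lags, function(k) y[rows - k])
  colnames(X) <- paste0("lag", lags)
  list(X = X, target = y[rows])
}

# p = 12 and P = 3 as in the tuned model: lags 1..12 plus 24 and 36
inp <- nnar_inputs(AirPassengers, p = 12, P = 3)
dim(inp$X)          # 108 rows, 14 input columns
colnames(inp$X)
```

With P = 3 and m = 12, the seasonal lag 12 overlaps the non-seasonal ones, so the model sees 14 distinct inputs and loses the first 36 observations to lagging.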

Roadmap

Questions?

Thank You!