Introduction to the TSstudio Package
Greater Cleveland R Group
Rami Krispin
February 28, 2018
How many are familiar with:
Time series analysis - the process of extracting meaningful insights from time series data, mainly with statistical, mathematical and visualization tools. Those insights can be used to study past events and to forecast future ones
Time series data - describes events or phenomena that occur over time. Its main characteristic: the values of the series must be captured at equally spaced time intervals (e.g., minutes, hours, days, months, etc.)
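To make the "equally spaced intervals" requirement concrete, here is a minimal base R sketch that builds a monthly "ts" object. The values and the start date are made up for illustration; only base R functions (ts, frequency, start, end, deltat) are used.

```r
# Illustrative values only (not real data): 24 monthly observations
values <- seq(100, by = 2, length.out = 24)

# frequency = 12 declares 12 equally spaced observations per year
monthly <- ts(values, start = c(2016, 1), frequency = 12)

frequency(monthly)  # observations per cycle (year)
start(monthly)      # first period as c(year, month)
end(monthly)        # last period as c(year, month)
deltat(monthly)     # spacing between observations, in years
```

The object stores only the values plus the start, end and frequency, which is why the spacing has to be regular.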
The TSstudio package provides a set of tools for descriptive and predictive analysis of time series data. It supports “ts”, “mts”, “zoo” and “xts” objects, as well as the outputs of the forecast package.
Started as a supporting package for the time series visualizations of the MLstudio package, which is why Plotly was selected as the visualization engine
Realized it was an opportunity to wrap a set of other time series analysis functions I had accumulated over time
Designed to work with Shiny
The main idea of the package is minimum code to get maximum results
Included in the RStudio list of “Top 40” new packages for January 2018
The idea is to take this static plot:
data("AirPassengers")
plot.ts(AirPassengers)
And make it dynamic:
library(TSstudio)  # provides ts_plot
data("AirPassengers")
ts_plot(AirPassengers)
And with some tweaks:
data("AirPassengers")
ts_plot(AirPassengers,
        title = "Monthly Airline Passenger Numbers 1949-1960",
        Ytitle = "Number of Passengers in Thousands",
        slider = TRUE)
We will load the stock prices of key technology companies (Apple, Google, Facebook, and Microsoft)
library(TSstudio)
library(xts)
library(zoo)
library(quantmod)
tckrs <- c("AAPL", "FB", "GOOGL", "MSFT")
getSymbols(tckrs, from = "2013-01-01", src = "yahoo")
## [1] "AAPL" "FB" "GOOGL" "MSFT"
head(GOOGL)
##            GOOGL.Open GOOGL.High GOOGL.Low GOOGL.Close GOOGL.Volume
## 2013-01-02   360.0701   363.8639  358.6336    361.9870      5077500
## 2013-01-03   362.8278   366.3313  360.7207    362.1972      4631700
## 2013-01-04   365.0350   371.1061  364.2042    369.3543      5521400
## 2013-01-07   368.0931   370.0601  365.6557    367.7427      3308000
## 2013-01-08   368.1382   368.5185  362.5776    367.0170      3348800
## 2013-01-09   366.5015   369.5446  364.6647    369.4294      4045300
##            GOOGL.Adjusted
## 2013-01-02       361.9870
## 2013-01-03       362.1972
## 2013-01-04       369.3543
## 2013-01-07       367.7427
## 2013-01-08       367.0170
## 2013-01-09       369.4294
And capture the closing values of the four in a matrix
closing <- cbind(AAPL$AAPL.Close, FB$FB.Close, GOOGL$GOOGL.Close, MSFT$MSFT.Close)
names(closing) <- c("Apple", "Facebook", "Google", "Microsoft")
class(closing)
## [1] "xts" "zoo"
head(closing)
##               Apple Facebook   Google Microsoft
## 2013-01-02 78.43285    28.00 361.9870     27.62
## 2013-01-03 77.44286    27.77 362.1972     27.25
## 2013-01-04 75.28571    28.76 369.3543     26.74
## 2013-01-07 74.84286    29.42 367.7427     26.69
## 2013-01-08 75.04429    29.06 367.0170     26.55
## 2013-01-09 73.87143    30.59 369.4294     26.70
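One detail worth knowing about the cbind step: on xts objects it aligns the series by their date index rather than by row position. The sketch below illustrates this offline with two tiny made-up series (the dates, values and column names are assumptions for illustration, not market data).

```r
library(xts)

# Two made-up daily series whose date ranges only partly overlap
d <- as.Date("2018-01-01") + 0:3
a <- xts(c(10, 11, 12), order.by = d[1:3])
b <- xts(c(20, 21, 22), order.by = d[2:4])

# cbind on xts objects merges on the index and fills gaps with NA
m <- cbind(a, b)
colnames(m) <- c("a", "b")
m
```

With the four stock series all dates match, so no NA values appear, but the same alignment is what keeps each row tied to a single trading day.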
The default option is multiple plots mode
ts_plot(closing, title = "Top Technology Companies Stocks Prices Since 2013")
You can set it to a single plot mode
ts_plot(closing, title = "Top Technology Companies Stocks Prices Since 2013", type = "single")
data("USgas")
ts_plot(USgas, title = "US Natural Gas Consumption", Xtitle = "Year", Ytitle = "Billion Cubic Feet")
The “normal” mode of ts_seasonal splits the series by full cycle period (year)
ts_seasonal(USgas)
The “cycle” mode, on the other hand, splits the series by cycle units (i.e., months, quarters)
ts_seasonal(USgas, type = "cycle")
Last but not least is the box plot representation of the cycle units
ts_seasonal(USgas, type = "box")
The “all” option will give you the full story
ts_seasonal(USgas, type = "all")
Another approach to checking for seasonality in the series is to use a heatmap
ts_heatmap(USgas)
The package provides interactive versions of the acf and pacf functions and of the lag plot
ts_acf(USgas, lag.max = 48)
A nicer way to see the relationship of the series with its lags is the lag plot
ts_lags(USgas)
You can control the number of lags
ts_lags(USgas, lag.max = 24)
The TSstudio package provides a set of supporting tools for training and testing forecasting models, including the ts_split and test_forecast functions.
Let’s train models for a 12-month forecast of the USgas dataset
We will set the forecast horizon to 12 months
h <- 12
And split the data into training and testing partitions
usgas_split <- ts_split(USgas, sample.out = h)
train <- usgas_split$train
test <- usgas_split$test
train
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2000 2510.5 2330.7 2050.6 1783.3 1632.9 1513.1 1525.6 1653.1 1475.0 1567.8
## 2001 2677.0 2309.5 2246.6 1807.2 1522.4 1444.4 1598.1 1669.2 1494.1 1649.1
## 2002 2487.6 2242.4 2258.4 1881.0 1611.5 1591.4 1748.4 1725.7 1542.2 1645.9
## 2003 2700.5 2500.3 2197.9 1743.5 1514.7 1368.4 1600.5 1651.6 1428.6 1553.2
## 2004 2675.8 2511.1 2100.9 1745.2 1573.0 1483.7 1584.9 1578.0 1482.2 1557.2
## 2005 2561.9 2243.0 2205.8 1724.9 1522.6 1534.1 1686.6 1695.1 1422.5 1428.2
## 2006 2165.3 2144.4 2126.4 1681.0 1526.3 1550.9 1758.7 1751.7 1462.1 1644.2
## 2007 2475.6 2567.0 2128.8 1810.1 1559.1 1555.2 1659.9 1896.1 1590.5 1627.8
## 2008 2734.0 2503.4 2278.2 1823.9 1576.4 1604.2 1708.6 1682.9 1460.9 1635.8
## 2009 2729.7 2332.5 2170.7 1741.3 1504.0 1527.8 1658.0 1736.5 1575.0 1666.5
## 2010 2809.8 2481.0 2142.9 1691.8 1617.3 1649.5 1825.8 1878.9 1637.5 1664.9
## 2011 2888.6 2452.4 2230.5 1825.0 1667.4 1657.3 1890.5 1891.8 1655.6 1744.5
## 2012 2756.2 2500.7 2127.8 1953.1 1873.8 1868.4 2069.8 2008.8 1807.2 1901.1
## 2013 2878.8 2567.2 2521.1 1967.5 1752.5 1742.9 1926.3 1927.4 1767.0 1866.8
## 2014 3204.1 2741.2 2557.9 1961.7 1810.2 1745.4 1881.0 1933.1 1809.3 1912.8
## 2015 3115.0 2925.2 2591.3 2007.9 1858.1 1899.9 2067.7 2052.7 1901.3 1987.3
## 2016 3092.0 2651.4 2357.2 2089.4 1971.4 2005.5 2193.3 2216.1 1951.3 1926.3
##         Nov    Dec
## 2000 1908.5 2587.5
## 2001 1701.0 2120.2
## 2002 1913.6 2378.9
## 2003 1753.6 2263.7
## 2004 1782.8 2327.7
## 2005 1663.4 2326.4
## 2006 1765.4 2122.8
## 2007 1834.5 2399.2
## 2008 1868.9 2399.7
## 2009 1776.2 2491.9
## 2010 1973.3 2714.1
## 2011 2031.9 2541.9
## 2012 2167.8 2503.9
## 2013 2316.9 2920.8
## 2014 2357.5 2679.2
## 2015 2249.1 2588.2
## 2016 2164.4
test
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 2016
## 2017 2899.4 2326.2 2572.2 1926.7 1898.4 1912.6 2143.2 2139.5 1929.8 2035.3
##         Nov    Dec
## 2016        2867.3
## 2017 2282.7
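The split is time-ordered: the last h observations are held out and everything before them is used for training. The same idea can be sketched in base R with window(). The helper name split_last is hypothetical, and this is not the TSstudio implementation, only an illustration on the built-in AirPassengers series.

```r
# Hypothetical helper: hold out the last `h` observations of a ts object.
# Not the TSstudio code, only the same idea expressed with base R.
split_last <- function(x, h) {
  stopifnot(is.ts(x), h >= 1, h < length(x))
  n  <- length(x)
  tt <- time(x)  # time stamp of every observation
  list(
    train = window(x, end = tt[n - h]),        # everything before the hold-out
    test  = window(x, start = tt[n - h + 1])   # the last h observations
  )
}

parts <- split_last(AirPassengers, h = 12)
length(parts$train)  # number of training observations
start(parts$test)    # first period of the testing partition
```

Unlike a random split, this keeps the temporal order intact, so the testing partition always lies in the "future" of the training partition.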
We will build three time series models with the training set and evaluate their performance on the testing set.
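As a rough sketch of what evaluating on the testing set involves, the base R snippet below compares a forecast with the held-out values and computes two common error measures, RMSE and MAPE. It uses the built-in AirPassengers series and a deliberately simple placeholder forecast (the last training value repeated). Both choices are assumptions for illustration and not the models or metrics reported by test_forecast.

```r
h <- 12
x <- AirPassengers
n <- length(x)

# Time-ordered split: hold out the last h observations
train <- window(x, end = time(x)[n - h])
test  <- window(x, start = time(x)[n - h + 1])

# Placeholder forecast: repeat the last training value h times
fc <- rep(as.numeric(train)[length(train)], h)

# Compare the forecast with the held-out actual values
actual <- as.numeric(test)
errors <- actual - fc
rmse <- sqrt(mean(errors^2))              # root mean squared error
mape <- mean(abs(errors / actual)) * 100  # mean absolute percentage error

c(RMSE = rmse, MAPE = mape)
```

Any forecast vector of length h can be dropped in place of fc to score it the same way.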
We will start with a baseline model using the naive approach (seasonal naive)
library(forecast)
f1 <- snaive(train, h = h)
test_forecast(actual = USgas, forecast.obj = f1, test = test)
For the second model we will try the auto.arima model from the forecast package
m2 <- auto.arima(train, stepwise = FALSE)
f2 <- forecast(m2, h = h)
test_forecast(actual = USgas, forecast.obj = f2, test = test)
Last, we will try a neural network model for time series data
m3 <- nnetar(train, repeats = 200)
f3 <- forecast(m3, h = h)
test_forecast(actual = USgas, forecast.obj = f3, test = test)
It seems that the model is not well tuned, so let's set the seasonal (P) and non-seasonal (p) lags of the model
m4 <- nnetar(train, P = 3, p = 12, repeats = 200)
f4 <- forecast(m4, h = h)
test_forecast(actual = USgas, forecast.obj = f4, test = test)
While the main focus in the current and previous releases was on descriptive analysis tools, the next release of the package will focus on tools for predictive analysis, such as cross validation and training methods for time series forecasting, automation of forecasting, and model selection.
Visualize the forecast output
Shiny application for time series modeling