class: center, middle, inverse, title-slide # Time Series Analysis and Forecasting with the TSstudio Package ## Bay Area R Users Group Meetup ### Rami Krispin ### 2018-12-11 --- class: inverse ## Agenda - Introduction - Utility functions - Visualization tools - Seasonal analysis - Correlation analysis - Forecasting applications - Road map --- class: inverse, center, middle ## Any experince with time series analysis? ##forecast and plotly packages? --- class: inverse ## Introduction The **TSstudio** package provides a set of functions for time series analysis and forecasting such as: - Utility functions for pre-processing time series data - Interactive data visualization tools for descriptive analysis, based on the **plotly** package engine - Set of functions for predictive analysis and forecasting automation with the use of models from the **forecast**, **forecastHybrid**, and **bsts** packages The primary goal of the package is to simplify the analysis workflow (or, minimum code - maximum results) --- class: inverse ## Package structure  --- class: inverse # Installation Install from [CRAN](https://cran.r-project.org/web/packages/TSstudio/index.html): ```r install.packages("TSstudio") ``` Or from [Github](https://github.com/RamiKrispin/TSstudio): ```r devtools::install_github("RamiKrispin/TSstudio") ``` --- class: inverse # Utility functions - object info The `ts_info` returns the main characteristics of the series: ```r library(TSstudio) data("USVSales") ts_info(USgas) ``` ``` ## The USgas series is a ts object with 1 variable and 225 observations ## Frequency: 12 ## Start time: 2000 1 ## End time: 2018 9 ``` ```r data("Michigan_CS") ts_info(Michigan_CS) ``` ``` ## The Michigan_CS series is a xts object with 1 variable and 465 observations ## Frequency: monthly ## Start time: Jan 1980 ## End time: Sep 2018 ``` --- class: inverse # Utility functions - creating partitions The `ts_split` splits a time series object to a training and testing partitions: ```r USgas_split <- ts_split(USgas, sample.out = 12) train <- USgas_split$train test <- USgas_split$test ``` --- class: inverse # Utility functions - creating partitions The `ts_split` splits a time series object to a training and testing partitions: ```r ts_info(train) ``` ``` ## The train series is a ts object with 1 variable and 213 observations ## Frequency: 12 ## Start time: 2000 1 ## End time: 2017 9 ``` ```r ts_info(test) ``` ``` ## The test series is a ts object with 1 variable and 12 observations ## Frequency: 12 ## Start time: 2017 10 ## End time: 2018 9 ``` --- class: inverse # Utility functions - prophet object The `ts_to_prophet` convert time series object (e.g., `ts`, `xts`, `zoo`) to `prophet` input structure: ```r USgas_prophet <- ts_to_prophet(USgas) head(USgas_prophet) ``` ``` ## ds y ## 1 2000-01-01 2510.5 ## 2 2000-02-01 2330.7 ## 3 2000-03-01 2050.6 ## 4 2000-04-01 1783.3 ## 5 2000-05-01 1632.9 ## 6 2000-06-01 1513.1 ``` --- class: inverse # Utility functions - more Another useful utility functions: - `xts_to_ts()` for converting `xts` object to `ts` object - `zoo_to_ts()` for converting `zoo` object to `ts` object - `ts_sum()` - summation of multiple time series objects - `ts_reshape()` - transform time series object to a data frame format --- class: inverse # Visualization tools - plotting ts object The `ts_plot` function plot time series objects, supporting multiple formats (i.e., `ts`, `xts`, `zoo`, `data.frame`, `tbl`): ```r ts_plot(USgas) ```
--- class: inverse # Visualization tools - plotting ts object It is fairly simple to customize the plot: ```r ts_plot(USgas, Ytitle = "Billion Cubic Feet", title = "Monthly Natural Gas Consumption in the US", slider = TRUE, color = "green") ```
--- class: inverse # Visualization tools - add plotly layer All the visualization outputs are `plotly` objects: ```r p <- ts_plot(USgas, Ytitle = "Billion Cubic Feet", title = "Monthly Natural Gas Consumption in the US") class(p) ``` ``` ## [1] "plotly" "htmlwidget" ``` --- class: inverse # Visualization tools - add plotly layer Therefore, you can apply any of the plotly functions, and customize the object accordingly: ```r library(plotly) p %>% layout(font = list(color = "black"), plot_bgcolor = "white", paper_bgcolor = "#f2f2f2") ```
--- class: inverse # Seasonal analysis The package provides a set of functions for seasonal analysis, such as: - `ts_seasonal()` - provides a view of the series by its frequency units, applicable for a series with daily frequency and above (e.g., monthly, quarterly) - `ts_heatmap()` - heatmap for time series data supports time series with half hour frequency and above - `ts_surface` - A 3D view of the series, by the frequency units (e.g., the month of the year), the cycle units (e.g. the year), and the series values - `ts_polar` - polar plot of time series data, applicable for monthly or quarterly series - `ts_quantile` - quantile plots of time series data --- class: inverse # Seasonal analysis - seasonal plots ```r ts_seasonal(USgas, type = "all", palette_normal = "inferno") ```
--- class: inverse # Seasonal analysis - heatmap ```r ts_heatmap(USgas, color = "Greens") ```
--- class: inverse # Seasonal analysis - 3D view ```r ts_surface(USgas) ```
--- class: inverse # Correlation analysis - ACF plot ```r ts_acf(USgas) ```
--- class: inverse # Correlation analysis - lags plot ```r ts_lags(USgas) ```
--- class: inverse # Correlation analysis - lags plot ```r ts_lags(USgas, lags = c(12, 24, 36, 48)) ```
--- class: inverse # Forecasting applications - error evaluation ```r library(forecast) USgas_split <- ts_split(USgas, sample.out = 12) train <- USgas_split$train test <- USgas_split$test md <- auto.arima(train, stepwise = FALSE, approximation = FALSE) fc <- forecast(md, h = 12) accuracy(fc, test) ``` ``` ## ME RMSE MAE MPE MAPE MASE ## Training set -1.66563 98.53365 72.04795 -0.2777977 3.497193 0.6743267 ## Test set 174.92468 196.47007 174.92468 7.0405671 7.040567 1.6371928 ## ACF1 Theil's U ## Training set 0.02073135 NA ## Test set -0.15231904 0.5813516 ``` --- class: inverse # Forecasting applications - error evaluation ```r test_forecast(actual = USgas, test = test, forecast.obj = fc) ```
--- class: inverse # Forecasting applications - forecast plot ```r md1 <- auto.arima(USgas, stepwise = FALSE, approximation = FALSE) fc1 <- forecast(md1, h = 60) plot_forecast(fc1) ```
--- class: inverse # Forecasting applications - backtesting <img src="backtesting.png" width="500px" /> [Source - Uber Engineering Blog](https://eng.uber.com/omphalos/) --- class: inverse # Forecasting applications - backtesting ```r md2 <- ts_backtesting(ts.obj = USgas, h = 60, window_size = 12, periods = 6) ``` ```r md2$leaderboard ``` ``` ## Model_Name avgMAPE sdMAPE avgRMSE sdRMSE ## 1 auto.arima 6.148333 0.9886843 184.1150 15.82619 ## 2 hybrid 6.713333 0.9228579 196.2300 21.57911 ## 3 tbats 7.468333 0.7641313 208.5633 17.60290 ## 4 bsts 7.551667 1.9287967 214.9950 44.75606 ## 5 HoltWinters 7.736667 0.6686903 218.5017 19.05705 ## 6 ets 7.778333 1.7040824 202.5533 45.95689 ## 7 nnetar 8.033333 0.6963524 275.0717 11.80927 ``` --- class: inverse # Forecasting applications - backtesting ```r md2$summary_plot ```
--- class: inverse # Forecasting applications - backtesting ```r md3 <- ts_backtesting(ts.obj = USgas, periods = 6, error = "RMSE", window_size = h1, h = h2, a.arg = list(stepwise = FALSE, approximation = FALSE, D = 1), e.arg = list(opt.crit = "mse"), n.arg = list(P = 2, p =1, repeats = 100), h.arg = list(errorMethod = "RMSE", verbos = FALSE)) ``` --- class: inverse # Forecasting applications - backtesting ```r md3$leaderboard ``` ``` ## Model_Name avgMAPE sdMAPE avgRMSE sdRMSE ## 1 auto.arima 6.305000 0.5681813 187.9250 7.128455 ## 2 hybrid 6.848333 0.6713990 197.4033 20.243566 ## 3 ets 7.778333 1.7040824 202.5533 45.956891 ## 4 tbats 7.468333 0.7641313 208.5633 17.602901 ## 5 nnetar 6.716667 0.2035354 212.0533 2.190650 ## 6 bsts 7.393333 2.0701948 212.6817 48.537247 ## 7 HoltWinters 7.736667 0.6686903 218.5017 19.057048 ``` --- class: inverse # Forecasting applications - backtesting ```r md3$summary_plot ```
--- class: inverse # Forecasting applications - backtesting ```r md2$leaderboard ``` ``` ## Model_Name avgMAPE sdMAPE avgRMSE sdRMSE ## 1 auto.arima 6.148333 0.9886843 184.1150 15.82619 ## 2 hybrid 6.713333 0.9228579 196.2300 21.57911 ## 3 tbats 7.468333 0.7641313 208.5633 17.60290 ## 4 bsts 7.551667 1.9287967 214.9950 44.75606 ## 5 HoltWinters 7.736667 0.6686903 218.5017 19.05705 ## 6 ets 7.778333 1.7040824 202.5533 45.95689 ## 7 nnetar 8.033333 0.6963524 275.0717 11.80927 ``` ```r md3$leaderboard ``` ``` ## Model_Name avgMAPE sdMAPE avgRMSE sdRMSE ## 1 auto.arima 6.305000 0.5681813 187.9250 7.128455 ## 2 hybrid 6.848333 0.6713990 197.4033 20.243566 ## 3 ets 7.778333 1.7040824 202.5533 45.956891 ## 4 tbats 7.468333 0.7641313 208.5633 17.602901 ## 5 nnetar 6.716667 0.2035354 212.0533 2.190650 ## 6 bsts 7.393333 2.0701948 212.6817 48.537247 ## 7 HoltWinters 7.736667 0.6686903 218.5017 19.057048 ``` --- class: inverse # Forecasting applications - backtesting ```r check_res(md3$Models_Final$auto.arima) ```
--- class: inverse # Road map - Improve both the functionality and efficiency of the ts_backtesting function: - Add more models (e.g., `prophet`, `tslm`, `stl`, etc.) - Parallel the function - Utilize **purrr** functions - Expand the descriptive function to more granular or high-frequency data --- class: inverse # Road map Work on progress... ```r ts_sim(model = md3$Models_Final$auto.arima, h = 60, n = 100) ```
--- class: inverse # Aditional information - Package [github page](https://github.com/RamiKrispin/TSstudio) - Release notes on my [blog](https://ramikrispin.github.io/) - Great post about backtesting from [Uber Engineering Blog](https://eng.uber.com/omphalos/) - The [forecast](https://github.com/robjhyndman/forecast) and [plotly](https://plot.ly/r/) packages --- class: inverse, center, middle # Questions? --- class: inverse, center, middle # Thanks!