I moved from MATLAB to R for data science, and one of the things I miss most is MATLAB’s excellent interactive plotting. MATLAB lets you zoom and pan with ease, and you can stack several graphs vertically with linked axes to navigate multiple time series simultaneously. Recently I was pleasantly surprised to discover R’s dygraphs package, which lets you do almost everything you can do in MATLAB.
Let’s have a look at Rossmann sales data using dygraphs:
```
## [1] "character"
##
##  FALSE   TRUE
## 858354 199943
##      store          dayOfWeek          date                sales
##  Min.   :   1.0   Min.   :1.000   Min.   :2013-01-01   Min.   :    0
##  1st Qu.: 280.0   1st Qu.:2.000   1st Qu.:2013-08-26   1st Qu.: 3727
##  Median : 558.0   Median :4.000   Median :2014-04-20   Median : 5744
##  Mean   : 558.3   Mean   :3.998   Mean   :2014-04-30   Mean   : 5774
##  3rd Qu.: 837.0   3rd Qu.:6.000   3rd Qu.:2015-01-12   3rd Qu.: 7856
##  Max.   :1115.0   Max.   :7.000   Max.   :2015-09-17   Max.   :41551
##                                                        NA's   :41088
##    Customers         open            promo          stateHoliday
##  Min.   :   0.0   Mode :logical   Mode :logical   Length:1058297
##  1st Qu.: 405.0   FALSE:178801    FALSE:653953    Class :character
##  Median : 609.0   TRUE :879485    TRUE :404344    Mode  :character
##  Mean   : 633.1   NA's :11        NA's :0
##  3rd Qu.: 837.0
##  Max.   :7388.0
##  NA's   :41088
##  schoolHoliday       split                 Id
##  Mode :logical   Length:1058297     Min.   :      1
##  FALSE:858354    Class :character   1st Qu.:  10273
##  TRUE :199943    Mode  :character   Median :  20545
##  NA's :0                            Mean   :  20545
##                                     3rd Qu.:  30816
##                                     Max.   :  41088
##                                     NA's   :1017209
```
This graph initially shows the first six months of 2014, but it contains the entire sales history of store 1081 along with the predictions of a simple model: the median sales for each combination of store, dayOfWeek, and promo. Promo periods are lightly shaded. State holidays are marked with dashed vertical lines, color-coded by the value of stateHoliday.
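Below is a minimal sketch of how a graph with these features could be assembled with the dygraphs package. It is not the code behind the graph above. It uses simulated daily data for a single store instead of the Rossmann file, assumes promo weeks alternate with non-promo weeks, and uses hypothetical holiday dates and type codes (`a`, `b`). Because there is only one store, the median model is computed with `ave()` over promo and weekday. `dyRangeSelector(dateWindow = ...)` sets the initial six-month view, `dyShading()` shades the promo periods, `dyEvent()` draws the dashed holiday lines, and `dyRoller()` adds the moving-average box. The native pipe `|>` requires R 4.1 or later.

```r
library(xts)       # time-series container accepted by dygraph()
library(dygraphs)

# Hypothetical daily data for one store in 2014 (simulated, not Rossmann figures).
# Assumption: promo weeks alternate with non-promo weeks.
dates <- seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "day")
promo <- rep(c(TRUE, FALSE), each = 7, length.out = length(dates))
set.seed(42)
sales <- 5000 + 1500 * promo + rnorm(length(dates), sd = 300)

# Simple median model: median sales per (promo, weekday) group, assigned back to each day
predicted <- ave(sales, promo, weekdays(dates), FUN = median)
series <- xts(cbind(sales = sales, predicted = predicted), order.by = dates)

# First and last day of every promo run, used for shading
runs <- rle(promo)
run_ends <- cumsum(runs$lengths)
run_starts <- run_ends - runs$lengths + 1
promo_starts <- dates[run_starts[runs$values]]
promo_ends <- dates[run_ends[runs$values]]

# Hypothetical holiday dates and type codes, with one color per code
holidays <- data.frame(
  date = as.Date(c("2014-04-18", "2014-04-21", "2014-05-01")),
  type = c("b", "b", "a")
)
holiday_colors <- c(a = "red", b = "darkgreen")

# Open on the first six months of 2014 and show the moving-average box
g <- dygraph(series, main = "Daily sales vs. median-model prediction") |>
  dyRangeSelector(dateWindow = c("2014-01-01", "2014-06-30")) |>
  dyRoller(rollPeriod = 1)

# Lightly shade each promo period
for (i in seq_along(promo_starts)) {
  g <- dyShading(g, from = as.character(promo_starts[i]),
                 to = as.character(promo_ends[i]), color = "#F2F2F2")
}

# One dashed vertical line per holiday, colored by its type code
for (i in seq_len(nrow(holidays))) {
  g <- dyEvent(g, x = as.character(holidays$date[i]),
               color = holiday_colors[[holidays$type[i]]],
               strokePattern = "dashed")
}

g  # printing the widget renders it in the RStudio viewer or a browser
```

With real data, `series` would instead hold one store’s actual sales and model predictions, and the holiday colors would be keyed by the store’s stateHoliday values.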
Here’s how you interact with the graph:
Try setting the moving average to 14 days: this smooths out the alternating promo and non-promo periods and reveals slower trends, as well as the behavior around holidays.
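Why does a 14-day window flatten the promo cycle? Here is a small base-R illustration, assuming promo and non-promo periods alternate week by week, which is what the 14-day choice suggests. A 7-day trailing mean still swings with the cycle. A 14-day window always covers exactly one promo week and one non-promo week, so the alternation averages out. The series is made up, and `stats::filter()` with `sides = 1` (a trailing moving average) stands in for the graph’s moving-average control.

```r
# Hypothetical series: four two-week cycles, promo weeks at 120 and
# non-promo weeks at 80 (made-up numbers, not Rossmann data)
x <- rep(c(120, 80), each = 7, times = 4)

# Trailing moving averages (sides = 1: current value and the previous n - 1)
ma7  <- stats::filter(x, rep(1 / 7, 7), sides = 1)
ma14 <- stats::filter(x, rep(1 / 14, 14), sides = 1)

# The 7-day average still oscillates between 80 and 120 ...
range(ma7, na.rm = TRUE)
# ... while every full 14-day window averages to 100
range(ma14, na.rm = TRUE)
```

The first 13 values of `ma14` are `NA` because a full 14-day window is not yet available. On real data, what remains after smoothing is the slower trend plus any holiday effects.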