dbplot now includes an extension to the ggvis package. This allows a tbl_sql object to be used as the source of the plot without any additional code. Under the hood, dbplot adds the S3 methods that perform the calculations inside the database and return the results in the format that ggvis expects. For example, ggvis::layer_histograms() depends on the compute_bin() function to calculate the bins. dbplot includes a custom compute_bin() that supports a tbl_sql object, so R uses that function to create the bins instead of the one that comes with ggvis. The custom method calls dbplot's underlying db_compute_bins() function, so the results are consistent across visualization approaches. The ggvis plots currently supported by dbplot are histograms (layer_histograms()), box plots (layer_boxplots()), and bar plots (layer_bars()).

A new layer to create raster plots in ggvis is also implemented by dbplot.
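
The S3 dispatch mechanism described above can be illustrated with a small base R sketch. The generic compute_bin_demo() and the class tbl_sql_demo are hypothetical stand-ins invented for this example. They are not part of ggvis or dbplot, and the sketch only shows how R picks a method based on the class of the object it receives.

```r
# Hypothetical generic standing in for ggvis::compute_bin(); not the real API.
compute_bin_demo <- function(x, ...) UseMethod("compute_bin_demo")

# Method for plain data frames: the "local" code path.
compute_bin_demo.data.frame <- function(x, ...) {
  "binned locally in R"
}

# Method for a made-up remote class: the "database" code path.
compute_bin_demo.tbl_sql_demo <- function(x, ...) {
  "binned inside the database"
}

local_tbl <- data.frame(v = 1:3)

# The made-up class is listed first, so its method wins over data.frame's.
remote_tbl <- structure(
  data.frame(v = 1:3),
  class = c("tbl_sql_demo", "data.frame")
)

local_result  <- compute_bin_demo(local_tbl)
remote_result <- compute_bin_demo(remote_tbl)
```

The calling code is identical in both cases. Only the class of the object decides which method runs, which is why no additional plotting code is needed for a remote table.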

Setup

devtools::install_github("edgararuiz/dbplot")

Opening the connection

library(dplyr)
library(dbplot)
library(sparklyr)
library(ggvis)

conf <- spark_config()
sc <- spark_connect(master = "local", version = "2.1.0", config = conf)

spark_flights <- copy_to(sc, nycflights13::flights, "flights")

Histogram

spark_flights %>%
  ggvis(~sched_dep_time) %>%
  layer_histograms()

dbplot supports the width argument, which is passed as bin_width to the db_compute_bins() function.

spark_flights %>%
  ggvis(~sched_dep_time) %>%
  layer_histograms(width = 400)
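
To make the effect of a fixed bin width concrete, here is a minimal sketch on a small local data frame. It is not dbplot's implementation. The sample values are made up, and anchoring the bins at zero is an assumption chosen to keep the arithmetic easy to follow.

```r
library(dplyr)

# Made-up sample standing in for sched_dep_time.
times <- data.frame(
  sched_dep_time = c(500, 615, 830, 905, 1200, 1210, 1745, 2359)
)

width <- 400  # same value as layer_histograms(width = 400)

bins <- times %>%
  # Left edge of the bin each value falls into (assumes bins start at 0).
  mutate(bin = floor(sched_dep_time / width) * width) %>%
  # One row per bin with the number of values it contains.
  count(bin)
```

The calculation uses only floor(), division, multiplication, and a grouped count, which is the kind of expression that dplyr can translate to SQL when the source is a remote table.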

Passing simple formulas as the property value is also supported.

spark_flights %>%
  filter(!is.na(arr_delay)) %>%
  ggvis(~arr_delay - dep_delay) %>%
  layer_histograms()

Boxplot

Box plots are currently only supported for sparklyr and Hive connections.

spark_flights %>%
  filter(!is.na(dep_delay)) %>%
  ggvis(~month, ~dep_delay) %>%
  layer_boxplots(width = 0.5)
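
A box plot needs per-group quartiles, which are harder to compute in a database than counts or means. The sketch below shows the kind of grouped summary involved, using a made-up local data frame and R's quantile(). It illustrates the statistics only. It does not show how dbplot computes them in Spark or Hive.

```r
library(dplyr)

# Made-up delays for two months.
delays <- data.frame(
  month     = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  dep_delay = c(-5, 0, 5, 10, 60, -2, 3, 8, 20, 45)
)

box_stats <- delays %>%
  group_by(month) %>%
  summarise(
    lower  = quantile(dep_delay, 0.25, names = FALSE),  # first quartile
    middle = quantile(dep_delay, 0.50, names = FALSE),  # median
    upper  = quantile(dep_delay, 0.75, names = FALSE)   # third quartile
  )
```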

Bar plots

spark_flights %>%
  ggvis(~month) %>%
  layer_bars()

Raster

dbplot implements a new plot for ggvis called layer_raster(). It works with both local and database sources.

spark_flights %>%
  filter(!is.na(arr_delay)) %>%
  ggvis(~arr_delay, ~dep_delay) %>%
  layer_raster()

layer_raster() supports aggregate formulas passed in the fill argument.

spark_flights %>%
  filter(!is.na(arr_delay)) %>%
  ggvis(~arr_delay, ~dep_delay) %>%
  layer_raster(fill = ~mean(distance), res = 40)

Because it returns a standard ggvis object, the plot can be customized further.

spark_flights %>%
  filter(!is.na(arr_delay)) %>%
  ggvis(~arr_delay, ~dep_delay) %>%
  scale_numeric("fill", range = c("orange","blue")) %>%
  layer_raster(fill = ~mean(distance), res = 40)

The new compute_raster() function uses dplyr, so the same function supports local data frames as well as tbl_sql objects.

spark_flights %>%
  compute_raster(~arr_delay, ~dep_delay) %>%
  head()
##      x1_    y1_   agg_    x2_   y2_
## 1  -4.52 -16.12  74448  22.64 10.76
## 2  22.64 -16.12   8589  49.80 10.76
## 3 -31.68 -16.12 142374  -4.52 10.76
## 4  -4.52  10.76  22096  22.64 37.64
## 5 -58.84 -16.12  17321 -31.68 10.76
## 6  22.64  37.64   8388  49.80 64.52
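
The output above has one row per grid cell: the cell's corners (x1_, y1_, x2_, y2_) and an aggregate (agg_). The sketch below builds a table of the same shape from a local data frame. The helper raster_sketch() is hypothetical and is not dbplot's compute_raster(). It assumes a row count as the aggregate and reads the ranges directly from the data frame, so it works on local data only.

```r
library(dplyr)

# Hypothetical helper: bins x and y into a res-by-res grid and counts the
# rows that fall in each cell.
raster_sketch <- function(df, res = 2) {
  x_min <- min(df$x)
  y_min <- min(df$y)
  x_step <- (max(df$x) - x_min) / res
  y_step <- (max(df$y) - y_min) / res

  df %>%
    mutate(
      # Cell index, clamped so the maximum value lands in the last cell.
      xi = pmin(floor((x - x_min) / x_step), res - 1),
      yi = pmin(floor((y - y_min) / y_step), res - 1)
    ) %>%
    count(xi, yi, name = "agg_") %>%
    transmute(
      x1_ = x_min + xi * x_step,  # lower-left corner of the cell
      y1_ = y_min + yi * y_step,
      agg_,
      x2_ = x1_ + x_step,         # upper-right corner of the cell
      y2_ = y1_ + y_step
    )
}

# Made-up points: three near the origin, two near (10, 10).
points <- data.frame(x = c(0, 1, 2, 9, 10), y = c(0, 1, 2, 9, 10))
cells <- raster_sketch(points, res = 2)
```

With res = 2 the five points fall into two occupied cells, one holding three points and one holding two. Empty cells do not appear in the result.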