dbplot now includes an extension to the ggvis package. It allows a tbl_sql object to be used as the source of a plot without any additional code. Under the hood, dbplot adds S3 methods that perform the calculations inside the database and return the results in the format that ggvis expects. For example, ggvis::layer_histograms() depends on the compute_bin() function to calculate the bins. dbplot includes a custom compute_bin() method that supports tbl_sql objects, so R dispatches to that method to create the bins instead of the one that ships with ggvis. The method relies on the same underlying db_compute_bins() function as the rest of dbplot, so the results are consistent across visualization approaches.
The ggvis plots currently supported by dbplot are:
Histograms
Box plots
Bar plots
A new layer to create raster plots in ggvis is also implemented by dbplot.
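The dispatch mechanism described above can be illustrated with a minimal base R sketch. The generic compute_bin_demo() and its methods are hypothetical stand-ins, not the actual ggvis or dbplot code; the only point is that R picks a method based on the class of the data source.

```r
# Hypothetical generic standing in for ggvis::compute_bin()
compute_bin_demo <- function(x, ...) UseMethod("compute_bin_demo")

# Fallback method, comparable to the one a plotting package ships for local data
compute_bin_demo.default <- function(x, ...) "computed locally in R"

# Method registered for remote tables; the class name mirrors dplyr's tbl_sql
compute_bin_demo.tbl_sql <- function(x, ...) "computed inside the database"

local_data  <- data.frame(a = 1:3)
# A bare object carrying the tbl_sql class, used here only to trigger dispatch
remote_data <- structure(list(), class = c("tbl_sql", "tbl_lazy", "tbl"))

compute_bin_demo(local_data)   # uses the default method
compute_bin_demo(remote_data)  # uses the tbl_sql method
```

Because dispatch depends only on the class of the first argument, the calling code in the plotting layer does not need to change.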
devtools::install_github("edgararuiz/dbplot")
library(dplyr)
library(dbplot)
library(sparklyr)
library(ggvis)
conf <- spark_config()
sc <- spark_connect(master = "local", version = "2.1.0", config = conf)
spark_flights <- copy_to(sc, nycflights13::flights, "flights")
spark_flights %>%
ggvis(~sched_dep_time) %>%
layer_histograms()
layer_histograms() accepts the width argument, which dbplot passes to the db_compute_bins() function as bin_width.
spark_flights %>%
ggvis(~sched_dep_time) %>%
layer_histograms(width = 400)
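To make the effect of a fixed bin width concrete, here is a minimal local sketch of width-based binning in base R. The helper bin_by_width() is hypothetical and does not claim to reproduce the exact formula or SQL that db_compute_bins() generates; it only shows how each value is assigned to a bin of the requested width and then counted.

```r
# Hypothetical helper: assign each value to a fixed-width bin and count
bin_by_width <- function(x, width) {
  x <- x[!is.na(x)]
  origin <- min(x)
  # Lower edge of the bin that each value falls into
  bin_start <- origin + floor((x - origin) / width) * width
  counts <- table(bin_start)
  data.frame(
    bin_start = as.numeric(names(counts)),
    count     = as.vector(counts)
  )
}

# Made-up departure times, for illustration only
times <- c(500, 545, 600, 905, 1300, 1315, 2359)
bin_by_width(times, width = 400)
```

In a database backend the same arithmetic would be expressed in SQL, so only the bin edges and counts travel back to R.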
Passing simple formulas as the property value is also supported.
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay - dep_delay) %>%
layer_histograms()
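As a rough illustration of why a formula can serve as a property value, the base R sketch below shows that a one-sided formula stores its right-hand side as an unevaluated expression. How dbplot processes the formula internally is not shown in this document, so treat this as an assumption-laden sketch; flights_sample is made-up data.

```r
# Made-up local data, for illustration only
flights_sample <- data.frame(
  arr_delay = c(10, -5, 30),
  dep_delay = c(2, -3, 45)
)

prop <- ~arr_delay - dep_delay

# The right-hand side of a one-sided formula is its second element
rhs <- prop[[2]]
class(rhs)                 # "call"

# Locally, the expression can be evaluated against a data frame
eval(rhs, flights_sample)  # 8 -2 -15
```

With a tbl_sql source, an expression like this would presumably be translated to SQL by dplyr rather than evaluated in R.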
Box plots are currently only supported for sparklyr and Hive connections.
spark_flights %>%
filter(!is.na(dep_delay)) %>%
ggvis(~month, ~dep_delay) %>%
layer_boxplots(width = 0.5)
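A box plot is drawn from a handful of summary statistics per group, which is what has to be computed before anything is plotted. The base R sketch below computes them locally, using the common convention of whiskers at 1.5 times the interquartile range. The helper box_stats() and the delays data are hypothetical; this is not dbplot's implementation.

```r
# Hypothetical helper: summary statistics needed to draw one box
box_stats <- function(values) {
  q <- quantile(values, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  data.frame(
    lower  = q[1],
    middle = q[2],
    upper  = q[3],
    # Whiskers extend to the most extreme values inside the fences
    ymin = min(values[values >= lower_fence]),
    ymax = max(values[values <= upper_fence])
  )
}

# Made-up data, for illustration only
delays <- data.frame(
  month     = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  dep_delay = c(-3, 0, 2, 5, 60, -1, 1, 4, 8, 12)
)

# One row of statistics per month
stats <- do.call(rbind, lapply(split(delays, delays$month), function(g) {
  cbind(month = g$month[1], box_stats(g$dep_delay))
}))
stats
```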
spark_flights %>%
ggvis(~month) %>%
layer_bars()
dbplot implements a new plot for ggvis called layer_raster(). It works with both local and database sources.
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay, ~dep_delay) %>%
layer_raster()
layer_raster() supports aggregate formulas passed in the fill argument.
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay, ~dep_delay) %>%
layer_raster(fill = ~mean(distance), res = 40)
Because layer_raster() returns a standard ggvis object, the plot can be customized further.
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay, ~dep_delay) %>%
scale_numeric("fill", range = c("orange","blue")) %>%
layer_raster(fill = ~mean(distance), res = 40)
The new compute_raster() function is built on dplyr, so the same function supports local data frames as well as tbl_sql objects.
spark_flights %>%
compute_raster(~arr_delay, ~dep_delay) %>%
head()
##      x1_    y1_   agg_    x2_   y2_
## 1  -4.52 -16.12  74448  22.64 10.76
## 2  22.64 -16.12   8589  49.80 10.76
## 3 -31.68 -16.12 142374  -4.52 10.76
## 4  -4.52  10.76  22096  22.64 37.64
## 5 -58.84 -16.12  17321 -31.68 10.76
## 6  22.64  37.64   8388  49.80 64.52
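The idea behind a raster summary like the one above can be sketched locally in base R: split both axes into res equal-width intervals, then aggregate the rows that fall into each cell. The function raster_sketch() is a hypothetical illustration that borrows the column names from the output above. It is not the dplyr-based code inside compute_raster(), and its bin edges are not guaranteed to match.

```r
# Hypothetical sketch: count rows per cell of a res-by-res grid
raster_sketch <- function(df, x, y, res = 4) {
  bin_start <- function(v) {
    width <- (max(v) - min(v)) / res
    # Clamp the index so the maximum value lands in the last interval
    idx <- pmin(floor((v - min(v)) / width), res - 1)
    list(start = min(v) + idx * width, width = width)
  }
  xb <- bin_start(df[[x]])
  yb <- bin_start(df[[y]])
  cells <- data.frame(x1_ = xb$start, y1_ = yb$start, agg_ = 1)
  # Sum the per-row counter within each (x1_, y1_) cell
  out <- aggregate(agg_ ~ x1_ + y1_, data = cells, FUN = sum)
  out$x2_ <- out$x1_ + xb$width
  out$y2_ <- out$y1_ + yb$width
  out
}

# mtcars ships with R, so this runs without a database connection
raster <- raster_sketch(mtcars, "wt", "mpg", res = 4)
head(raster)
```

Each row describes one rectangle through its lower-left corner (x1_, y1_), its upper-right corner (x2_, y2_), and the aggregated value, which is all a plotting layer needs to draw the cell.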