dbplot now includes an extension to the ggvis package. This allows a tbl_sql object to be used as the source of the plot without any additional code. Under the hood, dbplot adds the S3 methods that perform the calculations inside the database and return the results in the format that ggvis expects. For example, ggvis::layer_histograms() depends on the compute_bin() function to calculate the bins. dbplot includes a custom compute_bin() that supports a tbl_sql object, so R will use that function to create the bins instead of the one that comes with ggvis. dbplot uses the same underlying db_compute_bins() function, so the results are consistent across visualization approaches. The ggvis plots currently supported by dbplot are:
Histograms
Box plots
Bar plots
A new layer to create raster plots in ggvis is also implemented by dbplot.
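The S3 dispatch described above can be sketched in base R. This is a minimal illustration only: compute_bin_demo() is a hypothetical generic standing in for compute_bin(), and the mock object merely carries the tbl_sql class; it is not how dbplot or ggvis is actually written.

```r
# Hypothetical generic standing in for ggvis's compute_bin().
compute_bin_demo <- function(x, ...) UseMethod("compute_bin_demo")

# Fallback method: what would run for ordinary in-memory data.
compute_bin_demo.default <- function(x, ...) {
  "binned locally"
}

# Method picked when the object's class vector contains "tbl_sql".
# A real method would push the calculation to the database instead.
compute_bin_demo.tbl_sql <- function(x, ...) {
  "binned in the database"
}

local_data  <- c(1, 2, 3)
# Mock remote table: an empty list tagged with the classes a tbl_sql carries.
remote_data <- structure(list(), class = c("tbl_sql", "tbl_lazy", "tbl"))

local_result  <- compute_bin_demo(local_data)
remote_result <- compute_bin_demo(remote_data)
print(local_result)
print(remote_result)
```

Because R chooses the method from the class of the first argument, the calling code stays the same for local and remote sources.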
devtools::install_github("edgararuiz/dbplot")
library(dplyr)
library(dbplot)
library(sparklyr)
library(ggvis)
conf <- spark_config()
sc <- spark_connect(master = "local", version = "2.1.0", config = conf)
spark_flights <- copy_to(sc, nycflights13::flights, "flights")
spark_flights %>%
ggvis(~sched_dep_time) %>%
layer_histograms()
dbplot supports the width argument, which is passed as bin_width to the db_compute_bins() function.
spark_flights %>%
ggvis(~sched_dep_time) %>%
layer_histograms(width = 400)
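To make the effect of a bin width concrete, here is a small base R sketch of fixed-width binning. The helper bin_lower_edge() and the sample values are hypothetical, and the arithmetic is a common floor-based approach rather than the exact expression that db_compute_bins() generates.

```r
# Hypothetical helper: lower edge of the fixed-width bin containing each value.
# Only subtraction, division, floor and multiplication are used, which is the
# kind of arithmetic that can also be expressed in SQL.
bin_lower_edge <- function(x, bin_width, origin = min(x)) {
  origin + floor((x - origin) / bin_width) * bin_width
}

# Made-up departure times in HHMM form, for illustration only.
sched_dep_time <- c(500, 615, 900, 1259, 1300, 2359)

edges  <- bin_lower_edge(sched_dep_time, bin_width = 400)
counts <- table(edges)  # number of observations per bin
print(counts)
```

A larger width produces fewer, wider bins; a smaller one produces more bins with fewer observations in each.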
Passing simple formulas as the property value is also supported:
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay - dep_delay) %>%
layer_histograms()
Box plots are currently only supported for sparklyr and Hive connections.
spark_flights %>%
filter(!is.na(dep_delay)) %>%
ggvis(~month, ~dep_delay) %>%
layer_boxplots(width = 0.5)
spark_flights %>%
ggvis(~month) %>%
layer_bars()
dbplot implements a new plot for ggvis called layer_raster(). It works with both local and database sources.
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay, ~dep_delay) %>%
layer_raster()
layer_raster() supports aggregate formulas passed in the fill argument:
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay, ~dep_delay) %>%
layer_raster(fill = ~mean(distance), res = 40)
Because it returns a standard ggvis object, the plot can be customized further:
spark_flights %>%
filter(!is.na(arr_delay)) %>%
ggvis(~arr_delay, ~dep_delay) %>%
scale_numeric("fill", range = c("orange","blue")) %>%
layer_raster(fill = ~mean(distance), res = 40)
The new compute_raster() function uses dplyr, so the same function supports local data frames as well as tbl_sql objects:
spark_flights %>%
compute_raster(~arr_delay, ~dep_delay) %>%
head
##      x1_    y1_   agg_    x2_   y2_
## 1  -4.52 -16.12  74448  22.64 10.76
## 2  22.64 -16.12   8589  49.80 10.76
## 3 -31.68 -16.12 142374  -4.52 10.76
## 4  -4.52  10.76  22096  22.64 37.64
## 5 -58.84 -16.12  17321 -31.68 10.76
## 6  22.64  37.64   8388  49.80 64.52
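The idea behind a raster aggregation like the one above can be sketched on a local data frame with base R: snap each point to a grid cell, then aggregate per cell. The helper grid_lower(), the sample data, and the res value are assumptions made for illustration; this is not the code inside compute_raster().

```r
# Hypothetical helper: split the range of v into `res` equal-width intervals
# and return the lower edge of the interval each value falls in.
grid_lower <- function(v, res) {
  width <- (max(v) - min(v)) / res
  idx <- pmin(floor((v - min(v)) / width), res - 1)  # keep the maximum in the last cell
  min(v) + idx * width
}

# Made-up delays, for illustration only.
df <- data.frame(
  arr_delay = c(-10, -5, 0, 20, 40, 45),
  dep_delay = c(-5, -2, 1, 15, 30, 50)
)

df$x1_ <- grid_lower(df$arr_delay, res = 2)
df$y1_ <- grid_lower(df$dep_delay, res = 2)
df$n   <- 1

# One row per occupied grid cell, with the number of points in it.
cells <- aggregate(n ~ x1_ + y1_, data = df, FUN = sum)
print(cells)
```

Each output row describes one occupied cell, so the result stays small no matter how many rows the source has, which is what makes the approach practical for database tables.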