I’m excited to announce that plotly’s R package just made its first CRAN update in nearly four months. This update introduces breaking changes, enables new features, fixes some bugs, and takes us from version 3.6.0 to 4.3.5. To see all the changes, I encourage you to read the NEWS file. In this post, I’ll highlight the most important changes, explain why they needed to happen, and provide some tips for fixing errors brought about by this update. As you’ll see, this update is mostly about improving the plot_ly() interface, so ggplotly() users won’t see much of a change. If you’d like to learn more about the package in general, I recently started a “plotly book”, which will provide more narrative surrounding both basic and advanced usage of the R package. It is very much in its beginning stages, but I hope to add more in the coming months.

Formula mappings

In the past, you could use an expression to map (a function of) variable(s) in a data frame to visual attribute(s), but this no longer works. From now on, you’ll need to use a formula instead, which is basically an expression prefixed with a ~. You won’t have to use a formula when referencing objects, but I recommend it, since it helps inform sensible axis/guide title defaults (e.g., compare the output of plot_ly(z = volcano) to plot_ly(z = ~volcano)).

library(plotly)
plot_ly(mtcars, x = mpg, y = sqrt(wt))
#> Error in plot_ly(mtcars, x = mpg, y = sqrt(wt)): object 'wt' not found
plot_ly(mtcars, x = ~mpg, y = ~sqrt(wt))
[Figure: scatterplot of sqrt(wt) versus mpg, with axes titled "mpg" and "sqrt(wt)"]
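To make "basically an expression prefixed with a ~" concrete, here is a small base R illustration of what a formula carries: the unevaluated expression and the environment it was created in. This only shows what the object holds. It is not meant to mirror how plotly evaluates formulas internally.

```r
# A one-sided formula stores an unevaluated expression plus its environment
f <- ~sqrt(wt)

class(f)         # "formula"
f[[2]]           # the captured expression: sqrt(wt)
environment(f)   # where free variables are looked up if the data lacks them

# Evaluate the captured expression against a data frame, falling back to
# the formula's environment for anything not found among the columns
vals <- eval(f[[2]], mtcars, environment(f))
head(vals)
```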

There are a number of technical reasons why imposing this change is a good idea. If you’re interested in the details, I recommend reading Hadley Wickham’s notes on non-standard evaluation, but here’s the gist of the situation:

  1. Since formulas capture the environment in which they are created, we can be confident that evaluation rules are always correct, no matter the context.
  2. Formulas are much easier to program with than symbols. In particular, they make writing custom functions around plot_ly() easier. It’s also fairly easy to convert a string to a formula (e.g., as.formula("~sqrt(wt)")), a trick that can be quite useful when programming in shiny (where a variable mapping depends on an input value).
myPlot <- function(x, y) {
  plot_ly(mtcars, x = x, y = y, color = ~factor(cyl), colors = "Dark2")
}
myPlot(~mpg, ~disp)
[Figure: scatterplot of disp versus mpg, colored by factor(cyl) with legend entries 4, 6, and 8]
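For the shiny case mentioned above, a variant of myPlot() could accept column names as strings and convert them to formulas itself. The helper myPlotStr() below is hypothetical, a sketch assuming the strings are valid column names of mtcars (for example, values coming from a selectInput()).

```r
library(plotly)

# Hypothetical helper: takes column names as strings and converts them
# to one-sided formulas before handing them to plot_ly()
myPlotStr <- function(xvar, yvar) {
  x <- as.formula(paste0("~", xvar))
  y <- as.formula(paste0("~", yvar))
  plot_ly(mtcars, x = x, y = y, color = ~factor(cyl), colors = "Dark2")
}

p <- myPlotStr("mpg", "disp")
p
```

For plain column names, base R's reformulate("mpg") also returns the one-sided formula ~mpg, if you prefer it over pasting strings.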

Smarter defaults

Instead of always defaulting to a “scatter” trace, plot_ly() now infers a sensible trace type (and other attribute defaults) based on the information provided. These defaults are determined by inspecting the vector type (e.g., numeric/character/factor/etc) of positional attributes (e.g., x/y). For example, if we supply a discrete variable to x (or y), we get a vertical (or horizontal) bar chart:

subplot(
  plot_ly(diamonds, x = ~cut, color = ~clarity),
  plot_ly(diamonds, y = ~cut, color = ~clarity),
  margin = 0.07
) %>% hide_legend()
[Figure: side-by-side bar charts of diamond counts by cut, colored by clarity; vertical bars (x = cut) on the left, horizontal bars (y = cut) on the right]

Or, if we supply discrete variables to both x and y, we get a heatmap of counts:

plot_ly(diamonds, x = ~cut, y = ~clarity)
[Figure: heatmap of diamond counts with cut on the x-axis, clarity on the y-axis, and a color bar of counts]
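If you want to check which trace type plot_ly() inferred, one option is to build the plot and read the type of the first trace from the built spec (the "x" element of plotly_build()'s result). The helper inferred_type() is hypothetical, and the example uses mtcars only to keep it small; plotly also prints a message naming the type it chose.

```r
library(plotly)

# Hypothetical helper: build the plot and report the first trace's type
inferred_type <- function(p) plotly_build(p)$x$data[[1]]$type

types <- c(
  x_only  = inferred_type(plot_ly(mtcars, x = ~factor(cyl))),
  both    = inferred_type(plot_ly(mtcars, x = ~factor(cyl), y = ~factor(gear))),
  numeric = inferred_type(plot_ly(mtcars, x = ~wt, y = ~mpg))
)
types
```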

Also, by default, the order of categories on a discrete axis is now either alphabetical (for character strings) or matches the ordering of factor levels. This makes it easier to sort categories according to something meaningful, rather than the order in which the categories appear (the old default). If you prefer the old default, set categoryorder = "trace" on the relevant axis, e.g., layout(xaxis = list(categoryorder = "trace")).

library(dplyr)
# order the clarity levels by their median price
d <- diamonds %>%
  group_by(clarity) %>%
  summarise(m = median(price)) %>%
  arrange(m)
diamonds$clarity <- factor(diamonds$clarity, levels = d[["clarity"]])
plot_ly(diamonds, x = ~price, y = ~clarity, type = "box")
[Figure: horizontal box plots of price by clarity, with clarity levels ordered by median price]
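The same level ordering can be sketched without dplyr using stats::reorder(), which sorts factor levels by a summary of another variable. The second half shows how one might restore the old appearance-order default; since clarity sits on the y-axis here, categoryorder goes under yaxis. The copy d2 is just an illustrative name.

```r
library(plotly)

# Base R alternative to the dplyr pipeline: reorder() sorts the clarity
# levels by the median price within each level
d2 <- diamonds
d2$clarity <- reorder(d2$clarity, d2$price, FUN = median)
levels(d2$clarity)

p <- plot_ly(d2, x = ~price, y = ~clarity, type = "box")

# Restore the old default (categories in order of appearance) on the
# discrete axis, which is the y-axis in this plot
layout(p, yaxis = list(categoryorder = "trace"))
```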

plot_ly() now initializes a plot

Previously, plot_ly() always produced at least one trace, even when using add_trace() to add on more traces (if you’re familiar with ggplot2 lingo, a trace is similar to a layer). From now on, you’ll have to specify the type in plot_ly() if you want it to always produce a trace:

subplot(
  plot_ly(economics, x = ~date, y = ~psavert, type = "scatter") %>% 
    add_trace(y = ~uempmed) %>%
    layout(yaxis = list(title = "Two Traces")),
  plot_ly(economics, x = ~date, y = ~psavert) %>% 
    add_trace(y = ~uempmed) %>% 
    layout(yaxis = list(title = "One Trace")),
  titleY = TRUE, shareX = TRUE, nrows = 2
) %>% hide_legend()
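[Figure: two stacked line charts sharing a date x-axis; the top panel ("Two Traces") shows both psavert and uempmed, while the bottom panel ("One Trace") shows only uempmed]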
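One way to confirm the difference is to count the traces in each built spec. The helper n_traces() is hypothetical; it assumes the built traces live under plotly_build(p)$x$data, as described later in this post.

```r
library(plotly)

# type given in plot_ly(): it creates a trace, and add_trace() adds another
two <- plot_ly(economics, x = ~date, y = ~psavert, type = "scatter") %>%
  add_trace(y = ~uempmed)

# no type in plot_ly(): it only initializes the plot and global attributes
one <- plot_ly(economics, x = ~date, y = ~psavert) %>%
  add_trace(y = ~uempmed)

# Hypothetical helper: number of traces in the built spec
n_traces <- function(p) length(plotly_build(p)$x$data)
c(two = n_traces(two), one = n_traces(one))
```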

Why make this change? Often, when composing a plot with multiple traces, some attributes are shared across traces (i.e., global) and others are not. Allowing plot_ly() to simply initialize the plot and define global attributes makes for a much more natural way to describe such a plot. Consider the next example, where we declare the x/y (longitude/latitude) attributes and alpha transparency globally, but alter trace-specific attributes in add_trace()-like functions. This example also takes advantage of a few other new features:

  1. The group_by() function which defines “groups” within a trace (described in more detail in the next section).
  2. New add_*() functions, which behave like add_trace() but are higher-level: they assume a trace type, might set some attribute values (e.g., add_markers() sets the scatter trace mode to "markers"), and might trigger other data processing (e.g., add_lines() is essentially the same as add_paths(), but guarantees values are sorted along the x-axis). I hope to add more of these high-level functions over the coming months.
  3. Scaling is avoided for “AsIs” values (i.e., values wrapped with I()), which makes it easier to directly specify a constant value for a visual attribute (as opposed to mapping data values to visuals).
  4. More support for R’s graphical parameters such as pch for symbols and lty for linetypes.
map_data("world", "canada") %>%
  group_by(group) %>%
  plot_ly(x = ~long, y = ~lat, alpha = 0.1) %>%
  add_polygons(color = I("black"), hoverinfo = "none") %>%
  add_markers(color = I("red"), symbol = I(17),
              text = ~paste(name, "<br />", pop),
              hoverinfo = "text", data = maps::canada.cities) %>%
  hide_legend()
[Figure: map of Canada drawn as black polygons, with red triangle markers for cities; axes long and lat]
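Two of the features listed above can be checked on a small example: an I() value is used as-is (a single trace with constant color), while a formula maps data (a discrete variable splits into one trace per level), and add_lines() sorts along x whereas add_paths() keeps row order. This sketch inspects the built spec; the names p0, n_traces(), and unsorted are illustrative only.

```r
library(plotly)

p0 <- plot_ly(mtcars, x = ~wt, y = ~mpg)

# I(): constant value, so every marker is red and there is one trace
constant <- add_markers(p0, color = I("red"))
# formula: mapped value, so a discrete variable yields one trace per level
mapped <- add_markers(p0, color = ~factor(cyl))

n_traces <- function(p) length(plotly_build(p)$x$data)
c(constant = n_traces(constant), mapped = n_traces(mapped))

# add_lines() sorts along x before drawing; add_paths() keeps row order
unsorted <- data.frame(x = c(3, 1, 2), y = c(10, 20, 30))
lines_x <- plotly_build(plot_ly(unsorted, x = ~x, y = ~y) %>% add_lines())$x$data[[1]]$x
paths_x <- plotly_build(plot_ly(unsorted, x = ~x, y = ~y) %>% add_paths())$x$data[[1]]$x
lines_x
paths_x
```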

New interpretation of group

The group argument in plot_ly() has been removed in favor of the group_by() function. In the past, the group argument incorrectly created multiple traces. Now, group(s) are used to define “gaps” within a trace. This is more consistent with how ggplot2’s group aesthetic is translated in ggplotly().

txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median, mode = "lines")
[Figure: median house price over time for each Texas city, drawn as lines within a single trace; axes date and median]

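If you hover on the plot above, you’ll notice the hovertext is not very informative, since it doesn’t tell you which city a line belongs to. Mapping city to color instead creates one trace per city (here, every trace is drawn in black), so the city name shows up in the hover label and legend: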

txhousing %>%
  plot_ly(x = ~date, y = ~median, color = ~city, colors = "black") %>%
  add_lines()
[Figure: the same median price lines, one black trace per city, with a legend listing all 46 cities from Abilene to Wichita Falls]
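The difference between the two approaches shows up directly in the number of traces. In this sketch (n_traces() is again a hypothetical helper), group_by() keeps everything in one trace with gaps between cities, while color = ~city produces one trace per city.

```r
library(plotly)

# group_by(): groups define gaps within a single trace
grouped <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines()

# color = ~city: one trace per city, so hover labels can name the city
per_city <- txhousing %>%
  plot_ly(x = ~date, y = ~median, color = ~city) %>%
  add_lines()

n_traces <- function(p) length(plotly_build(p)$x$data)
c(grouped = n_traces(grouped), per_city = n_traces(per_city))
```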

New plotly object representation

Previously, most functions in plotly returned a data frame with special attributes attached (needed for tracking the plot’s attributes). At the time, I thought this was the right way to enable a “data-plot-pipeline” where a plot is described as a sequence of visual mappings and data manipulations. For a number of technical reasons, I’ve changed my mind, and decided the central plotly object should inherit from an htmlwidget object instead. This change doesn’t destroy our ability to implement a “data-plot-pipeline”, but it does constrain the set of manipulations we can perform on a plotly object. As of writing, plotly supports dplyr generics (e.g., mutate()/filter()/etc), but I hope to add support for tidyr (and possibly other) generics very soon.

p <- economics %>%
  plot_ly(x = ~date, y = ~unemploy / pop, showlegend = FALSE) %>%
  add_lines(linetype = I("22")) %>%
  mutate(rate = unemploy / pop) %>% 
  filter(rate == max(rate)) %>%
  add_markers(symbol = I(10), size = I(50))

layout(p, annotations = list(x = ~date, y = ~rate, text = "peak"))
[Figure: dashed line of unemploy/pop over date, with a marker and a "peak" annotation at the maximum]

In this context, I’ve often found it helpful to inspect the (most recent) data associated with a particular plot, which you can do via plotly_data():

plotly_data(p)
#> # A tibble: 1 × 7
#>         date    pce    pop psavert uempmed unemploy       rate
#>       <date>  <dbl>  <int>   <dbl>   <dbl>    <int>      <dbl>
#> 1 1982-12-01 2167.4 233160    10.3    10.2    12051 0.05168554

To keep up to date with currently supported data manipulation verbs, please consult the help(reexports) page, and for more examples, check out the examples section under help(plotly_data).
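A quick way to see the new representation is to check the class of a plotly object and confirm that dplyr verbs applied to it change the data later layers see. This is a small sketch along the lines of the economics example above; the names p and peak are illustrative.

```r
library(plotly)

p <- plot_ly(economics, x = ~date, y = ~unemploy / pop)

# The plotly object now inherits from htmlwidget rather than data.frame
class(p)
is.data.frame(p)

# dplyr verbs on the plot modify the data attached to it
peak <- p %>%
  mutate(rate = unemploy / pop) %>%
  filter(rate == max(rate))
plotly_data(peak)
```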

This change in the representation of a plotly object also has important implications for folks using plotly_build() to “manually” access or modify a plot’s underlying spec. Previously, this function returned the JSON spec as an R list, but it now returns more “meta” information about the htmlwidget, so in order to access that same list, you have to grab the “x” element. The new as_widget() function (different from the now deprecated as.widget() function) is designed to turn a plotly spec into an htmlwidget object.

pl <- plotly_build(qplot(1:10))[["x"]]
pl$data[[1]]$hoverinfo <- "none"
as_widget(pl)
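The example above only touches the first trace. When a plot has several traces, one option is to modify every element of the spec's data list before converting it back with as_widget(). This sketch assumes the built spec layout described above (traces under "x" and then "data"); the plot itself is just an illustration.

```r
library(plotly)

p <- plot_ly(mtcars, x = ~wt, y = ~mpg, color = ~factor(cyl)) %>% add_markers()

# plotly_build() returns the htmlwidget; the spec as an R list lives in "x"
spec <- plotly_build(p)[["x"]]

# Turn off hover text on every trace, not just the first one
spec$data <- lapply(spec$data, function(trace) {
  trace$hoverinfo <- "none"
  trace
})

# Convert the modified spec back into an htmlwidget
w <- as_widget(spec)
w
```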