Plotly Package Review

Jonathan Yu, Jaden Stanford, Joseph Keogh

Package Overview

History of Plotly

Plotly is a company founded by Alex Johnson, Jack Parmer, Chris Parmer, and Matthew Sundquist in 2013. While working for the data science program of a California-based cleantech company, Alex, Jack, Chris, and Matthew faced the seemingly simple problem of finding a way to meaningfully and easily share the data they had gathered. Even after collecting, analyzing, and sorting data, they felt that there were still important questions that had to be answered. These questions included:

  • How do we share what we’ve learned with others in a meaningful way?
  • How do we enable others to explore our data?
  • Can we give others access to the models to explore on their own?

To answer these questions, they decided to create a tool to make scientific and data analysis simple. Plotly was originally a JavaScript graphing library, which was eventually made available as an R package that allows graphs to be created using R data and ggplot2 syntax. Their focuses with plotly were to use the web as a data science platform, power discovery with open source, provide unlimited flexibility, remove language as a barrier, and enable shared goals across the organization. In 2013 and 2014, Jack, Chris, Matthew, and Alex officially founded “plotly” and opened their Montreal headquarters.

Background of Plotly

Plotly is a graphing library that makes interactive, publication-quality graphs. With data visualization, people often struggle to make graphs both interesting and informative. plotly makes this easy: a wide variety of visualizations, ranging from basic bar graphs to maps, 3D charts, and animated charts, can be created with a few lines of code. In addition to being easy to use, plotly is open source and available for numerous languages and platforms, including R, Python, .NET, JS, and Julia.

Version History

Current Version: 4.9.3 (last updated 1/10/2021)

Dependencies

R (≥ 3.2.0), ggplot2 (≥ 3.0.0)

Usage

plotly converts plots to an interactive, web-based version.

Examples of Usage

plotly offers a wide variety of interactive options, which include:

  • zooming in or out of a graph
  • selection of data points upon mouse hover
  • panning through a graph
  • downloading a graph as a PNG
  • box selecting or lasso selecting data points
  • option to compare nearby data points upon hover

Visual formats supported by plotly include:

  • basic charts (scatter and line plots, bar charts, pie charts, bubble charts, and more)
  • statistical charts (2D Histograms, box plots, histograms, error bars, violin plots, and more)
  • scientific charts (log plots, contour plots, heatmaps, network graph, ternary contour plots, and more)
  • financial charts (time series, candlestick charts, OHLC charts, waterfall charts, funnel charts, and more)
  • maps (choropleth maps, scatter plots on maps, mapbox density, lines on maps, mapbox layers, and more)
  • 3D charts (3D scatter plots, 3D line plots, 3D surface plots, 3D mesh plots, 3D cone plots, and more)
  • subplots (multiple axes, map subplots and small multiples, inset plots, subplots, 3D subplots, and more)
  • animated plots

ggplot to plotly

One of the neat things about the plotly package is that it can convert a ggplot2 graph into a plotly graph. Here we have a basic graph made from ggplot2:

To turn this graph into a plotly graph we can use the function ‘ggplotly’.

plotlyggplot <- ggplotly(ggplot1)
plotlyggplot

With the ggplot2 graph converted into a plotly graph, it is now possible to enjoy all of the features that plotly has with a graph made from ggplot2.
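
The definition of ggplot1 is not shown above, so here is a minimal, self-contained sketch of the whole conversion. The dataset (mtcars) and the chosen aesthetics are assumptions made for illustration and are not the original graph; only the ggplotly call mirrors the code above.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for the ggplot2 graph: weight vs. fuel economy in mtcars
ggplot1 <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles Per Gallon", color = "Cylinders")

# ggplotly() takes the ggplot object and returns an interactive plotly object
plotlyggplot <- ggplotly(ggplot1)
plotlyggplot
```

The ggplot2 code itself does not need to change; ggplotly works on the finished ggplot object.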

Bubble Chart

One of the most basic charts that you are able to create in plotly is the bubble chart. A bubble chart is a scatter plot whose markers can vary in color and size. In this example we will be making a bubble plot from scratch using plotly and the built-in mtcars dataset in R.

library(plotly) #loads plotly, including the plot_ly function and the %>% pipe

#plot_ly function allows for creation of any plotly graph
figure <- plot_ly(
  data = mtcars, #specified dataset, in this case we are using mtcars
  x = ~mpg, #variable that corresponds with x-axis
  y = ~hp, #variable that corresponds with y-axis
  text = rownames(mtcars), #specifies text that shows up when you hover over a data point
  type = 'scatter', #specified graph type, in this case we are using the scatter plot
  mode = 'markers', #draws the data as individual markers (bubbles) rather than lines
  marker = list(size = ~cyl, opacity = 0.5, color = 'green')) #marker parameter specifies the size, opacity, and color of each of our bubbles

#the layout function is used to specify the text and formatting of the title, x-axis, and y-axis
figure <- figure %>% #plotly re-exports the pipe operator (%>%) from magrittr
  layout(title = 'Gas Efficiency and Horsepower of Cars in mtcars dataset',
         xaxis = list(title = 'Miles Per Gallon'),
         yaxis = list(title = 'Horsepower'))
figure

You can also control what is shown when hovering over a point by setting the ‘hoverinfo’ parameter to ‘text’ and building a custom string in the ‘text’ parameter.

figure <- plot_ly(
  data = mtcars,
  x = ~mpg,
  y = ~hp,
  hoverinfo = 'text', #hoverinfo parameter added
  text = ~paste('Car Model:', rownames(mtcars), '<br>Displacement:', disp), #text parameter changed
  type = 'scatter', 
  mode = 'markers', 
  marker = list(size = ~cyl, opacity = 0.5, color = 'green')) 


figure <- figure %>%
  layout(title = 'Gas Efficiency and Horsepower of Cars in mtcars dataset', 
         xaxis = list(title = 'Miles Per Gallon'),
         yaxis = list(title = 'Horsepower'))
figure

Heatmaps

Heatmaps are graphs that use a color scale to illustrate relationships between variables. To easily create a heatmap in plotly, first convert your dataset into a matrix, which can be done with the as.matrix function in R. It is then important to normalize the variables: the heatmap uses one color scale for all of them, so variables with much smaller or larger values would otherwise not show their differences clearly on that scale. In this example, we will be using the USArrests dataset built into R.

arrests <- USArrests[, -4] #dropping the fourth column (Rape) without overwriting the built-in dataset
usarrests_matrix <- as.matrix(arrests) #turning the USArrests data into a matrix
usarrests_matrix <- apply(usarrests_matrix, 2, function(x) {x/mean(x)}) #normalizing each variable in usarrests_matrix by its mean
plot <- plot_ly(x = colnames(usarrests_matrix), #specifies variables on x-axis
                y = rownames(usarrests_matrix), #specifies states on y-axis
                z = usarrests_matrix, #specifies data being used
                type = "heatmap") #specifies type of graph

plot
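
To see what the normalization step does, the following sketch compares the column means before and after dividing each column by its mean. It uses only base R and the built-in USArrests data; the variable names are illustrative and not part of the example above.

```r
# Drop the fourth column (Rape) as in the example above, then convert to a matrix
arrests_matrix <- as.matrix(USArrests[, -4])

# Column means on the original scales, which differ widely between variables
raw_means <- colMeans(arrests_matrix)

# Divide each column by its own mean
normalized <- apply(arrests_matrix, 2, function(x) x / mean(x))

# After normalizing, every column has a mean of 1
normalized_means <- colMeans(normalized)
print(round(raw_means, 1))
print(normalized_means)
```

Because every column is rescaled to a mean of 1, a value such as 1.5 reads as "50% above the average state" for any variable, which is what lets a single color scale work.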

3D Graphs

Plotly can also be used to create 3D versions of graphs, such as 3D scatterplots. In the following example from the Clustering Lab, a 3D plotly scatterplot is shown that graphs NBA players based on minutes played, field goals, and points. Hover over each point to see the player name and their salary.

library(tidyverse) #provides read_csv, mutate, distinct, view, and the pipe
library(plotly)

#importing dataset
nba <- read_csv("/cloud/project/nba2020-21.csv")

#a function for pre-processing of the nba data
nba_pre_processing <- function(nba){
  stats <- nba[, c("Player", "Tm", "Pos", "FG", "FT", "PTS", "MP")] %>%
    mutate(Player = as.factor(Player)) %>%
    mutate(Pos = as.factor(Pos)) %>%
    mutate(Tm = as.factor(Tm)) %>%
    distinct(Player, .keep_all = TRUE) #getting rid of duplicate names
  x <- gsub("[^[:alnum:]]", " ", stats$Player) #getting rid of special chars
  mutate(stats, Player = x)
}

#calling the function
nba_stats <- nba_pre_processing(nba)
view(nba_stats)

#pre-processing for salary data
salary = read_csv("/cloud/project/nba_salaries_21.csv") 

#function for pre-processing salary data
salary_pre_processing <- function(salary){
  salary <- salary %>% #saving the result so the de-duplication is not discarded
    mutate(Player = as.factor(Player)) %>%
    distinct(Player, .keep_all = TRUE) #getting rid of duplicate names
  x <- gsub("[^[:alnum:]]", " ", salary$Player) #getting rid of special chars
  mutate(salary, Player = x)
}

#calling salary function
salary_stats <- salary_pre_processing(salary) #processing salary data
view(salary_stats)
#combining nba data with salary data into one chart
nba_combined <- merge(nba_stats, salary_stats, by="Player", all=TRUE)  

#fixing column names 
colnames(nba_combined) <- c("Player", "Tm","Pos", "FG", "FT","PTS", "MP", "Salary")

#creating 3D scatterplot using plotly
fig <- plot_ly(nba_combined, type = "scatter3d", mode = "markers", x = ~MP, y = ~FG, z = ~PTS,
               text = ~paste('Player:', Player, 'Salary:', Salary)) #to show player name and salary when hovering over points
#adding title
fig <- fig %>% layout(title = 'NBA Player Rankings Based on Minutes Played, Field Goals, and Points')
#formatting axes
fig <- fig %>% layout(scene = list(xaxis = list(title = 'Minutes Played'),
                                   yaxis = list(title = 'Field Goals'),
                                   zaxis = list(title = 'Points')))

fig
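
One detail worth noting in the code above is that merge with all=TRUE performs a full outer join, so a player who appears in only one of the two files ends up with NA in the columns from the other file. The sketch below illustrates this with made-up rows (the player names and numbers are hypothetical and are not taken from the NBA data).

```r
# Made-up rows for illustration only; these are not values from the NBA files
stats_demo <- data.frame(Player = c("Player A", "Player B"), PTS = c(10, 20))
salary_demo <- data.frame(Player = c("Player B", "Player C"), Salary = c(100, 200))

# all = TRUE keeps players found in either data frame (a full outer join)
combined_demo <- merge(stats_demo, salary_demo, by = "Player", all = TRUE)
print(combined_demo)

# Rows with a missing value in any column can be dropped before plotting
complete_demo <- na.omit(combined_demo)
print(complete_demo)
```

Whether to drop incomplete rows or keep them is a judgment call. Keeping them means some hover labels will show NA for the salary, and points missing a plotted statistic may not appear at all.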

Maps

Plotly has some features that allow for the creation of maps. There are different types of maps in plotly, like bubble maps, scatter plot maps, choropleth maps, line maps, and layered maps. A choropleth map shades geographic regions by value. The map below is an example of a plotly choropleth map that shows the states ranked based on mental health and access. Hovering over a state shows its postal code and ranking.

#reading in data from the csv
data <- read.csv("/cloud/project/health.csv", header = TRUE, sep = ",")

#creating the choropleth; locationmode = "USA-states" matches locations against state postal codes
plot <- plot_ly(data, type = "choropleth", locations = ~State, z = ~Rank, locationmode = "USA-states")

#limiting map to USA and not global
g <- list(scope = 'usa')

#applying the geo settings and showing plot
plot <- plot %>% layout(geo = g)
plot
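
Since health.csv is not included here, the following is a self-contained sketch of the same technique using only data built into R. It assumes the built-in state.abb vector (postal codes) lines up with the rows of USArrests, which are both in alphabetical order by state name. The variable names are illustrative.

```r
library(plotly)

# state.abb holds the 50 postal codes in the same alphabetical order as the rows of USArrests
arrests <- data.frame(State = state.abb, Murder = USArrests$Murder)

# locations must be postal codes when locationmode is "USA-states"
choropleth_fig <- plot_ly(
  arrests,
  type = "choropleth",
  locations = ~State,
  z = ~Murder,
  locationmode = "USA-states"
)

# scope = "usa" restricts the map to the United States
choropleth_fig <- choropleth_fig %>%
  layout(title = "Murder Arrests per 100,000 Residents (USArrests)",
         geo = list(scope = "usa"))
choropleth_fig
```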

Similar Packages

ggplot2

ggplot2 is probably the most similar package to plotly. You can even convert from ggplot2 to plotly, and they share a lot of the same capabilities for creating data visualizations. However, plotly offers more when it comes to creating fully interactive graphs. Plotly is also available for Python, MATLAB, and React, whereas ggplot2 is made for R.

The ggplot2 graph below shows a simple graph created in ggplot2 that graphs GDP per capita vs. Life Expectancy.

leaflet

Leaflet is a very popular R package used to make interactive maps. It expands greatly on the map visualization tools in plotly.

Some features include:

  • interactive panning and zooming
  • creating maps using combinations of map tiles, map markers, polygons, lines, popups, and more
  • create maps directly in RStudio
  • embed maps into RMarkdown documents and Shiny apps
  • display maps in non-spherical Mercator projections
  • easy rendering of spatial objects from the sp or sf packages, or data frames with latitude/longitude columns
  • can change features in maps using plugins from the leaflet plugins repository

The leaflet graph below shows an interactive graph of U.S. population density.
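
The code behind that figure is not included here. As a rough sketch of what basic leaflet usage looks like (this is not the population-density map itself), the example below draws a tiled base map with one marker. The coordinates are approximate and chosen only for illustration.

```r
library(leaflet)

# Coordinates are approximate and chosen only for illustration
us_map <- leaflet() %>%
  addTiles() %>%                                  # default OpenStreetMap tiles
  setView(lng = -98.5, lat = 39.8, zoom = 4) %>%  # roughly centers on the contiguous U.S.
  addMarkers(lng = -77.04, lat = 38.91, popup = "Washington, D.C.")
us_map
```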

dygraphs

dygraphs is an R package that focuses on interactive time series visualization.

Some features include:

  • automatic plotting of xts time-series objects or other time series objects that are convertible to xts (eXtensible Time Series)
  • range selector interface for interactive panning and zooming
  • interactive series/point highlighting with different visual options
  • highly configurable axis and series display
  • graph overlays including shaded regions, event lines, and annotations
  • make graphs in the R console
  • embed graphs into RMarkdown documents and R Shiny apps

The dygraph below shows an interactive time-series graph showing the discharge of the River Danube.
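
The code and data for that figure are not included here. As a minimal sketch of basic dygraphs usage (not the Danube graph), the example below plots ldeaths, a monthly time series built into R, and adds a range selector.

```r
library(dygraphs)

# ldeaths: monthly deaths from lung diseases in the UK (a built-in ts object)
deaths_graph <- dygraph(ldeaths, main = "Monthly Deaths from Lung Diseases (UK)")

# dyRangeSelector adds the interactive panning and zooming bar under the graph
deaths_graph <- dyRangeSelector(deaths_graph)
deaths_graph
```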

Reflection

Plotly seems to be a comprehensive package for graphic data visualization. Graphics created with plotly are interactive and aesthetically pleasing, and there are many different options for types of graphics (plots, maps, graphs, etc.). It is also useful that plotly can be used with ggplot2, another great package in R for data visualization. Although the two packages are similar, plotly is more widely applicable and allows for higher levels of interaction. Additionally, plotly is available in Python, MATLAB, and React.

Pros

  • graphs are interactive
  • graphs are aesthetically pleasing
  • can be used in conjunction with ggplot2
  • relatively simple to use for simple graphs
  • many options for visualizations (graphs, maps, animations, subplots, 3D charts, financial charts, statistical charts)
  • graphs are highly customizable
  • plotly is available for other languages (such as Python)

Cons

  • Documentation is limited on the specific parameters within plot_ly. While the package documentation is helpful in outlining each function within plotly, it does a fairly poor job of providing examples of what each parameter does within the plot_ly function. I think having extensive documentation on each parameter, accompanied by a picture example of what it does, would be tremendously helpful for people to get a better understanding of the plotly package.
  • When using plotly in R, you need to make sure that information found online is specific to R and not to another language that uses plotly (like Python).

Suggestions

  • more documentation for some of the specific and more complex graph types (like choropleths, 3D charts)
  • maybe including a basic overview of the specific graph types and the input parameters needed