If, after reading this article, you decide to incorporate R into your work, be prepared for a real adventure. R is a fully-fledged programming language, and misperceiving it as an engineering calculator on steroids can lead to a poor experience. In this article, I will demonstrate one of R’s strongest qualities - data visualization. We will explore what R has to offer, from simple “sketches” to versatile frameworks, whose possibilities are limited only by your imagination and time.
In my opinion, the main advantage of basic visualization is the ability to easily create simple graphs. R supports four main types of plots: scatter plot, histogram, bar plot, and box plot.
head(mtcars)
mpg <- mtcars$mpg # Miles/(US) gallon
disp <- mtcars$disp # Displacement (cu.in.)
plot(mpg,disp, main="mpg/disp dependency")
How plot guessed the axis labels without being told is another interesting topic. You can try passing mtcars$mpg directly, but the result will delight you with its straightforwardness: the axis label becomes the literal expression mtcars$mpg. So it’s best to use arguments to control the properties of the axes.
hist(mtcars$disp, xlab="Displacement (cu.in.)",
main="Distribution of Engine Displacement")
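As for how plot came up with its axis labels earlier: to my understanding, base graphics captures the unevaluated argument expression with substitute() and converts it to text with deparse(). The sketch below illustrates that mechanism only; label_of() is a hypothetical helper name, not something that exists in base R.

```r
# label_of() is a hypothetical helper, not part of base R.
# It returns the text of the expression that was passed as its argument.
label_of <- function(x) deparse(substitute(x))

mpg <- mtcars$mpg

label_of(mpg)          # "mpg"
label_of(mtcars$mpg)   # "mtcars$mpg"

# Explicit labels avoid the surprise entirely
plot(mtcars$mpg, mtcars$disp,
     xlab = "Miles/(US) gallon", ylab = "Displacement (cu.in.)")
```

This is why a short variable name yields a tidy label, while an inline expression is printed verbatim.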
Let’s add some colors to our boxplot.
boxplot(mtcars$hp,disp,
main="Distribution of Gross Horsepower and Engine Displacement",
names = c("Hp", "Displacement (cu.in.)"), # the second box shows disp, not mpg
col = c("#f5424b","#4287f5")
)
To create data for a barplot, the dplyr framework is used. If you haven’t actively used Non-Standard Evaluation (NSE), this block of code might raise some questions. You can find a complete explanation here, or wait for the article about it.
if (!require("dplyr")) install.packages("dplyr")
library(dplyr)
mean_mpg_by_cyl <- mtcars %>%
group_by(cyl)%>%
summarise(mean_mpg=mean(mpg))
barplot(mean_mpg_by_cyl$mean_mpg,names.arg = mean_mpg_by_cyl$cyl,
main="Mean mpg by number of cyl",
xlab="Mean mpg",
ylab="Number of cylinders",
horiz = TRUE, # rotate graph
las=1) # las = 1 draws the axis tick labels horizontally (see ?par)
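If the dplyr pipeline feels unfamiliar, the same summary can be sketched in base R. This is only an alternative for comparison, not the approach used in the rest of the article; the variable name mean_mpg is an arbitrary choice.

```r
# Mean mpg per cylinder count, using base R only
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
mean_mpg  # a named vector; the names are the cylinder counts

# barplot() picks up the names automatically, so names.arg is not needed
barplot(mean_mpg,
        main = "Mean mpg by number of cyl",
        xlab = "Mean mpg",
        ylab = "Number of cylinders",
        horiz = TRUE,
        las = 1)
```

The trade-off is mostly readability: tapply() is compact for a single statistic, while the dplyr version scales more comfortably to several grouped summaries.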
So far, it’s relatively simple - we’ve specified what goes on the x and y axes, labels, and basic aesthetics. But it’s worth digging deeper to see that plot supports method dispatch. You can pass a data.frame in its entirety, and not just that. For a deeper understanding of this topic, it is worth exploring the implementation of OOP in R. There’s a lot to read there.
plot(mtcars, main = "Scatterplot Matrix of mtcars Data",
col = "#03bafc", pch = 19)
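To make the idea of method dispatch more concrete, here is a minimal sketch that defines a plot method for a made-up class. The class name "trip" and its fields are hypothetical and exist only for this illustration.

```r
# "trip" is a hypothetical class invented for this illustration
trip <- structure(
  list(km = c(5, 12, 20), minutes = c(30, 65, 120)),
  class = "trip"
)

# A plot method for that class: plot() dispatches here when given a "trip"
plot.trip <- function(x, ...) {
  plot(x$km, x$minutes, type = "b",
       xlab = "Distance (km)", ylab = "Time (min)", ...)
  invisible(x)
}

plot(trip, main = "plot() dispatched to plot.trip()")

# The scatterplot matrix works the same way: mtcars is a data.frame,
# so plot() dispatches to the method registered for that class
class(mtcars)
is.function(getS3method("plot", "data.frame"))
```

Calling methods(plot) in your session lists the classes that already have a dedicated method.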
For dessert, an example with a legend and subplots:
par(mfrow = c(2, 2)) # setting dimension
# Plot 1: mpg vs. wt by cylinders
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl, pch = 19,
main = "Miles per Gallon vs. Weight", xlab = "Weight", ylab = "MPG")
legend("topright", legend = unique(mtcars$cyl), col = unique(mtcars$cyl),
pch = 19, title = "Cylinders")
# Plot 2: mpg vs. hp by transmission
plot(mtcars$hp, mtcars$mpg, col = mtcars$am + 1, pch = 17, # am is 0/1; + 1 maps it to colors 1 and 2, matching the legend
main = "Miles per Gallon vs. Horsepower", xlab = "Horsepower", ylab = "MPG")
legend("topright", legend = c("Automatic", "Manual"), col = c(1, 2),
pch = 17, title = "Transmission")
# Plot 3: mpg vs. qsec by gears
plot(mtcars$qsec, mtcars$mpg, col = mtcars$gear, pch = 16,
main = "Miles per Gallon vs. Quarter Mile Time",
xlab = "Quarter Mile Time", ylab = "MPG")
legend("topright", legend = unique(mtcars$gear), col = unique(mtcars$gear),
pch = 16, title = "Gears")
# Plot 4: mpg vs. disp by engine type
plot(mtcars$disp, mtcars$mpg, col = mtcars$vs + 1, pch = 15,
main = "Miles per Gallon vs. Displacement",
xlab = "Displacement", ylab = "MPG")
legend("topright", legend = c("V-shaped", "Straight"),col = c(1, 2),
pch = 15, title = "Engine Type")
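One detail about par(mfrow = ...) is easy to miss: the setting stays active on the graphics device, so later plots keep landing in the grid. A common idiom, sketched below with a smaller 1 x 2 layout, is to save the previous settings and restore them afterwards. The name old_par is an arbitrary choice.

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))

plot(mtcars$wt, mtcars$mpg, pch = 19,
     main = "MPG vs. Weight", xlab = "Weight", ylab = "MPG")
hist(mtcars$mpg, main = "Distribution of MPG", xlab = "MPG")

# Restore the saved layout so the next plot is not squeezed into the grid
par(old_par)
par("mfrow")
```

If the earlier layout was never saved, par(mfrow = c(1, 1)) returns the device to a single panel.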
That concludes the capabilities of basic visualization. It’s not bad and allows you to achieve acceptable results without additional complexity. However, tackling more complex tasks with its help can be challenging.
The first library I’d like to mention is lattice. It was released
back in 1997 and significantly expanded visualization capabilities in R.
It supports various types of plots and accepts a formula to describe the data; in
its basic form, a plot is constructed using the following schema:
xyplot(y ~ x | group, data). The main advantage of lattice
lies in its ability to quickly create multi-panel plots. Each panel
represents a complete independent plot, allowing layering of elements
within each panel. While ggplot2 has largely replaced lattice in
articles and reports today, it’s still worth mentioning for its
historical significance and unique features in R graphics.
if (!require("lattice")) install.packages("lattice")
library(lattice)
df <- mtcars
df$am <- ifelse(mtcars$am==0, "automatic", "manual")
df$cyl <- paste(mtcars$cyl, "cylinders")
# Creating a plot
lattice_plot<- xyplot(mpg ~ hp | factor(cyl) * factor(am), data = df,
main = "Scatterplot of mpg vs. hp conditioned by cyl and am",
xlab = "Horsepower",
ylab = "Miles per Gallon",
pch = 16,
panel = function(x, y, ...) {
panel.xyplot(x, y, ...) # Scatter plot
panel.lmline(x,y) # regression line
},
auto.key = list(space = "top", columns = 3,
points = TRUE, lines = FALSE,
title = "Legend"))
lattice_plot
What’s so cool about it, besides potentially offering a greater variety of plots and enhanced control over aesthetics? Now, a plot isn’t just a side effect; it’s a full-fledged object with its own attributes, such as dimension.
dim(lattice_plot)
# Which means we can extract a part of it
lattice_plot[3,1]
This is especially useful when there are many grouping variables and you want to show the overall picture first and then specific details.
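The plot-as-object idea can be sketched on a smaller example with a single conditioning variable. The object names p and p_styled are arbitrary, and the snippet is an illustration rather than a complete tour of what trellis objects support.

```r
library(lattice)  # ships with standard R installations

# A smaller trellis object: one conditioning variable, three panels
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            xlab = "Weight", ylab = "Miles per Gallon")

class(p)  # "trellis"
dim(p)    # one entry per conditioning variable

# Nothing is drawn until the object is printed;
# inside functions, loops, or sourced scripts call print() explicitly
print(p)

# update() returns a modified copy instead of rebuilding the plot call
p_styled <- update(p, pch = 16, main = "mpg vs. wt by cylinders")
print(p_styled)

# Overview first, then a detail: keep only the first panel
print(p[1])
```

Because the object is only rendered on print, it can be stored, modified, and subset before anything reaches the graphics device.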
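The name might suggest it’s not the first version, but that’s not the case: ggplot2 was released in 2007, and its name pays homage to the book The Grammar of Graphics. At its core, ggplot2 follows the systematic approach to creating graphics outlined in that book, which involves decomposing plots into components and then combining and modifying them. These components include data, aesthetics, geometries and layers, facets, scales, and themes, most of which are marked in the comments of the example below.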
Each of these components can be finely tuned. Years of active use as the gold standard in visualization allow ggplot2 to deliver high-quality results with minimal configuration. However, delving into the details may require studying substantial documentation, richly illustrated with advanced examples. Besides the official package, the community has developed numerous extensions to address specialized tasks. I won’t overwhelm you with copious code snippets for flashy graphs, but I assure you that in years of using R, from Genome-wide association studies (GWAS) to managerial analytics projects, ggplot2 has never let me down.
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")
library("ggplot2")
library("dplyr")
# Creating a plot
mtcars %>%
mutate(am = ifelse(am == 0, "automatic", "manual")) %>%
ggplot(aes(x = mpg)) + # aesthetics
geom_histogram(binwidth = 2, color = "white", fill = "#69b3a2") + # geometries and layers
facet_wrap(~ factor(am), ncol = 2) + # facets
labs(title = "Distribution of MPG by Transmission Type", # scales
x = "Miles per Gallon (MPG)",
y = "Frequency") +
theme_minimal() + # themes
theme(plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 12),
strip.text = element_text(face = "bold", size = 10))
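The "decompose, then combine" idea can also be shown step by step, since every + returns a new plot object. The following is a minimal sketch on a different view of mtcars; the object names base, scatter, and final are arbitrary.

```r
library(ggplot2)

# Data and aesthetics: no layers yet, so this alone draws an empty canvas
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Geometry: adding a layer returns a new plot object, base stays unchanged
scatter <- base + geom_point(color = "#69b3a2")

# Facets, labels, and theme are further independent components
final <- scatter +
  facet_wrap(~ cyl) +
  labs(title = "MPG vs. Weight by Cylinders",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon (MPG)") +
  theme_minimal()

print(final)
```

Keeping intermediate objects like base around makes it cheap to try a different geometry or theme without repeating the data and aesthetic mapping.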
Further examples are presented as gifs. If you’re interested in playing around in this little sandbox, feel free to check them out here.
So far, we’ve only looked at static graphs. The simplest solution to
create an interactive graph is plotly::ggplotly(). This
function attempts to transform a ggplot2 graph into a plotly graph.
However, this approach doesn’t fully leverage all of plotly’s
capabilities and can be relatively slow. Let’s try making the graph from
the previous example interactive using this approach.
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("dplyr")) install.packages("dplyr")
if (!require("plotly")) install.packages("plotly")
library("ggplot2")
library("dplyr")
library("plotly")
# Creating plot
ggplot_graph<- mtcars %>%
mutate(am = ifelse(am == 0, "automatic", "manual")) %>%
ggplot(aes(x = mpg)) + # data and aesthetics
geom_histogram(binwidth = 2, color = "white", fill = "#69b3a2") + # geometry and layers
facet_wrap(~ factor(am), ncol = 2) + # facets
labs(x = "Miles per Gallon (MPG)",
y = "Frequency") +
theme_minimal() + # themes
theme(plot.title = element_text(hjust = 0.5),
axis.title = element_text(size = 12),
strip.text = element_text(face = "bold", size = 10))
# Magic
plotly_graph <- ggplotly(ggplot_graph)
plotly_graph
If you inspect the plotly_graph object with attributes(plotly_graph),
you can see:
$names
[1] "x" "width" "height" "sizingPolicy" "dependencies" "elementId"
[7] "preRenderHook" "jsHooks"
$class
[1] "plotly" "htmlwidget"
$package
[1] "plotly"
The presence of preRenderHook and jsHooks,
as well as the htmlwidget class, allows you to modify the
graph object using JavaScript. Let’s try adding a missing title to our
graph using these features.
custom_js <- "
function(el, x) {
Plotly.relayout(el.id, {
title: 'Distribution of MPG by Transmission Type'
});
}
"
plotly_graph <- htmlwidgets::onRender(plotly_graph, custom_js) %>%
layout(width=900, height=620)
plotly_graph
But is it necessary to reinvent the wheel? Let’s take an overview of Plotly’s capabilities.
As mentioned earlier, Plotly in R allows you to convert static ggplot2 graphs and also create them from scratch using the same fundamental idea as ggplot2.
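As a rough illustration of building a Plotly graph from scratch, the sketch below follows the same order of steps as a ggplot2 plot: data and mappings first, then a trace, then layout. The object name fig is arbitrary.

```r
library(plotly)

# Data and aesthetic mappings; the formula syntax (~) refers to columns
fig <- plot_ly(mtcars, x = ~wt, y = ~mpg,
               color = ~factor(cyl), colors = "Set1")

# Add a marker trace, much like adding a geom in ggplot2
fig <- add_markers(fig, text = ~paste("Car:", rownames(mtcars)))

# Titles and axes are set through layout()
fig <- layout(fig,
              title = "MPG vs. Weight",
              xaxis = list(title = "Weight (1000 lbs)"),
              yaxis = list(title = "Miles per Gallon (MPG)"))

fig
```

The steps are written as separate assignments for readability; they can equally be chained with %>% as in the other examples.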
One of the most common tasks for Plotly is creating linked interactive graphs. Unfortunately, R Plotly currently only allows shared axes among subplots. While you can achieve linking effects using htmlwidgets, I suggest using a higher-level library like crosstalk (https://rstudio.github.io/crosstalk/). Leveraging the familiar htmlwidget, crosstalk provides an R6 class that mimics a standard data.frame, facilitating linked brushing and filtering across plots.
if (!require("crosstalk")) install.packages("crosstalk")
if (!require("plotly")) install.packages("plotly")
if (!require("dplyr")) install.packages("dplyr")
library(crosstalk)
library(plotly)
library(dplyr)
# Data preparation
df <- mtcars %>%
mutate(am = ifelse(am == 0, "Automatic", "Manual"),
cyl=paste(cyl, "cylinders"))
shared_mtcars <- SharedData$new(df) # R6 class
# Scatter plot
scatter_plot <- plot_ly(shared_mtcars, x = ~mpg, y = ~hp, type = 'scatter',
mode = 'markers',
color = ~factor(cyl),
colors = "Set1",
text = ~paste("Car:", rownames(mtcars)),
marker = list(size = 10)) %>%
layout(title = "Scatter Plot of MPG vs HP",
xaxis = list(title = "Miles per Gallon (MPG)"),
yaxis = list(title = "Horsepower (HP)")
)
# MPG box plot
box_plot_mpg <- plot_ly(shared_mtcars, y = ~mpg, type = 'box',
color = ~factor(cyl), colors = "Set1") %>%
layout(title = "Box Plot of MPG by Cylinders",
yaxis = list(title = "Miles per Gallon (MPG)"))
# HP box plot
box_plot_hp <- plot_ly(shared_mtcars, y = ~hp, type = 'box',
color = ~factor(cyl), colors = "Set1") %>%
layout(title = "Box Plot of HP by Cylinders",
yaxis = list(title = "Horsepower (HP)"))
# Combine plots together
combined_plot <- subplot(scatter_plot, box_plot_mpg, box_plot_hp,
nrows = 2, titleX = TRUE, titleY = TRUE) %>%
layout(title = "Combined Plot of MPG and HP with Box Plots",
width=900, height=620)%>%
hide_legend()
# Show plots
combined_plot
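To make the SharedData wrapper a little less opaque, here is a small sketch that inspects one directly. It assumes the default behavior in which row names serve as keys; the group name "cars_demo" is an arbitrary label chosen for this example.

```r
library(crosstalk)

# Wrap a data frame; with no key given, the row names act as keys
shared <- SharedData$new(mtcars, group = "cars_demo")

class(shared)             # an R6 object, not a data.frame
shared$groupName()        # widgets sharing this group are linked together
head(shared$key())        # the keys used to match rows across widgets
nrow(shared$origData())   # the wrapped data frame is still accessible
```

Every widget built from the same SharedData object, or from objects with the same group name, reacts to selections made in the others.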
To minimize the need for pure JavaScript, Plotly graciously provides built-in widgets:
* Dropdown (dropdown).
We won’t dive deep into when to use these instead of subplots, nor will we discuss when subplots are necessary versus a single plot. Instead, let’s look at a simple example of their use.
if (!require("plotly")) install.packages("plotly")
if (!require("dplyr")) install.packages("dplyr")
library(plotly)
library(dplyr)
# Data preparation
df <- mtcars %>% # work on a copy so the built-in mtcars is not overwritten
mutate(am = ifelse(am == 0, "automatic", "manual"))
# Creating plot with dropdown
plot <- plot_ly(data = df, type = "box",
x = ~am, y = ~mpg,
color = ~am, colors = "Set1",
text = ~paste("Car:", rownames(df))) %>%
layout(title = "MPG by Transmission Type",
xaxis = list(title = "Transmission Type"),
yaxis = list(title = "Miles per Gallon (MPG)"),
width=900, height=620,
updatemenus = list( # dropdown
list(
buttons = list(
list(method = "restyle",
args = list("type", "box"),
label = "Boxplot"),
list(method = "restyle",
args = list("type", "violin"),
label = "Violin Plot")
)
)
))
# Show graph
plot
An experienced reader might point out that animations can be created
in ggplot2 using gganimate.
However, I believe animation is more suited for Plotly, although this
can vary based on personal preference and the specific task at hand.
Unfortunately, I couldn’t come up with an animation for the mtcars
dataset without relatively complex data transformations, so let’s go
with a classic - gapminder. This dataset contains
socio-economic characteristics for various countries over different
years.
if (!require("plotly")) install.packages("plotly")
if (!require("dplyr")) install.packages("dplyr")
if (!require("gapminder")) install.packages("gapminder")
library(plotly)
library(dplyr)
library(gapminder)
# Creating plotly graph
plotly_graph <- gapminder %>%
plot_ly(x = ~gdpPercap, y = ~lifeExp, size = ~pop,
text = ~country, hoverinfo = "text") %>%
layout(xaxis = list(type = "log"), width=900, height=620)
# Adding animation
plotly_graph %>%
# configure markers
add_markers(color = ~continent, frame = ~year, ids = ~country) %>%
# animation options
animation_opts(1000, easing = "elastic", redraw = FALSE) %>%
# configure animation widgets
animation_button(
x = 1, xanchor = "right", y = 0, yanchor = "bottom"
) %>%
animation_slider(
currentvalue = list(prefix = "YEAR ", font = list(color="red"))
)
I don’t quite understand the appeal of a 3-axis graph in static form. However, Plotly allows for much more exciting things than that.
if (!require("plotly")) install.packages("plotly")
library(plotly)
volcan <- plot_ly(z = volcano, type = "surface") %>%
layout(width=900, height=620)
volcan
R has its own wrapper around the JS library Highcharts, known as highcharter. The developers promise beautiful charts, extensive customization options, and seamless integration with the R Shiny Web Framework. However, the JS library itself is distributed under a commercial license. While the charts indeed look great, the package is not as popular as Plotly.
if (!require("highcharter")) install.packages("highcharter")
if (!require("dplyr")) install.packages("dplyr")
library(highcharter)
library(dplyr)
# Creating plot
hc <- highchart() %>%
# add data and aesthetics
hc_add_series(
data = mtcars,
type = "scatter",
hcaes(x = hp, y = mpg, group = as.factor(cyl))
) %>%
# title
hc_title(text = "Miles per Gallon vs Horsepower") %>%
# title x
hc_xAxis(title = list(text = "Horsepower")) %>%
# title y
hc_yAxis(title = list(text = "Miles per Gallon")) %>%
# configure hover effect
hc_tooltip(pointFormat = "mpg: {point.y}, hp: {point.x}, cyl: {point.group}") %>%
# configure legend
hc_legend(title = list(text = "Cylinders")) %>%
hc_size(width = 900, height = 600)
hc
The solutions discussed so far are fantastic but can require substantial effort, especially when creating numerous plots for large datasets. Trelliscope offers an intermediate solution between Plotly and a full-fledged Shiny application. The resulting dashboard allows for handling a large number of plots, enabling filtering and sorting. When I first started using it, there were issues with integrating it into Shiny web applications, but a lot of time has passed since then. It’s definitely worth browsing through galleries and code repositories.
Leaflet is an R wrapper for the powerful Leaflet.js library, which is used for creating interactive maps. The R package integrates seamlessly with Shiny and the R ecosystem, making it an excellent tool for adding interactive mapping capabilities to your applications.
if (!require("leaflet")) install.packages("leaflet")
library(leaflet)
# Creating data for markers
cities <- data.frame(
name = c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix"),
lat = c(40.7128, 34.0522, 41.8781, 29.7604, 33.4484),
lng = c(-74.0060, -118.2437, -87.6298, -95.3698, -112.0740)
)
# Building map
map <- leaflet(width=900, height=620) %>%
addTiles() %>% # Add the base map layer
# Attach markers
addMarkers(data = cities, ~lng, ~lat, popup = ~name)
# Show map
map
Undoubtedly, this is not an exhaustive list of libraries. There are also rbokeh, dygraphs, and many niche libraries. In this article, I intentionally did not compare these with Python solutions. Such comparisons can easily be made in the comments. Generally, the choice between R and Python is made long before the first plot is created. But that is a topic for another article.