Data Science Stream
Topic 4B: Data Visualisation III
Welcome to the fourth computer lab for the Data Science stream of STM1001.
In the second and third Data Science stream computer labs we familiarised ourselves with the plotly
package, and made some informative and interactive plots of data from the palmerpenguins
package (Horst, Hill, and Gorman 2020).
This computer lab is the final lab in our series focusing on data visualisation in RStudio.
Today we will focus on further developing our skills with plotly
(Sievert 2020), and cover adding custom controls and animations to our plotly
plots. The coding in this lab is a little more intense than in previous weeks, but we will take our time, and go through each of the steps slowly.
By the end of this lab, you should be able to create plotly
graphs with customised sliders, plotly
graphs containing multiple plots, plotly
graphs with dropdown buttons to switch between data visualisations, and have a decent idea of how to animate plotly
graphs.
Note: Before you begin this lab, make sure you have read over Section 5 of the Data Visualisation in R supplement on adding buttons to plotly
graphs.
Preparation
🏡 Before we begin our work, we will need to carry out some initial preparations.
To begin, we will need to load all the requisite packages.
By now, you should have the palmerpenguins
and plotly
packages installed on your system.
Open up RStudio and load these packages now.
If for some reason you do not have one or both of these packages installed, please install them before continuing.
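A minimal sketch of this preparation step is shown below. The install.packages() line is left commented out on the assumption that both packages are already installed; uncomment it only if library() reports that a package is missing.

```r
# Install once if needed (assumes an internet connection and a CRAN mirror):
# install.packages(c("palmerpenguins", "plotly"))

library(palmerpenguins)  # provides the penguins data set
library(plotly)          # provides plot_ly(), layout(), subplot() and the %>% pipe

# Quick check that the data set is available
head(penguins)
```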
Recall in the second Data Science computer lab that we created a simple scatter plot using the body_mass_g
and flipper_length_mm
variables from the penguins
data set. We also used different colours to distinguish between male and female penguins.
We will introduce some new plotly
features using this scatter plot as a base. The code for this plot is reproduced below:
penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")
penguins_scatter <- penguins_scatter %>%
  layout(title = "Scatter Plot of Penguin Data",
         legend = list(title = list(text = "Sex")),
         xaxis = list(title = "Penguin Body Mass (grams)"),
         yaxis = list(title = "Penguin Flipper Length (mm)"))
Run this code in RStudio now.
Adding Range Sliders to plotly
Plots
💻 In the third Data Science computer lab, we gained some experience using the pipe operator and the layout
function.
Let’s use those skills now to add some fancier elements to our plotly
graphs. First, we will take a look at adding a range slider to our plot.
💻 Suppose that we would like to add a range slider to the x-axis of our penguins_scatter
scatter plot from 1.2. A range slider can be used to dynamically select a subsection of our plot, in a similar, but more controlled way to left-clicking and dragging a box over our plot to zoom in on a section.
We can add a range slider to the x-axis of a plot using the rangeslider() function.
Use the pipe operator to add a range slider to penguins_scatter.
💻 If your code has worked, your scatter plot should now include a range slider (as shown below).
Try left-clicking and dragging the bars on the slider endpoints.
Hint: You can always check the Code
box below if your code is not working.
penguins_scatter %>% rangeslider()
# Not too difficult so far!
💻 Initially, adding a range slider to your plot might have seemed very difficult.
However, as you can see, despite being an impressive addition it’s actually very straightforward to implement.
Of course, there are various additional adjustments that we could make to our range slider, but for now this one line of code is sufficient for our purposes.
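As one example of such an adjustment, rangeslider() accepts start and end arguments that set the initially selected x-axis range. The sketch below rebuilds the base scatter plot so that it runs on its own; the values 3000 and 5000 and the object name penguins_zoomed are arbitrary choices for illustration.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# start and end give the x-axis range (in grams) that is selected
# when the plot first appears; the slider can still be dragged afterwards
penguins_zoomed <- penguins_scatter %>% rangeslider(start = 3000, end = 5000)
penguins_zoomed
```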
🎧 Online students
💬 Volunteer to share your screen to show and describe your plotly plot. Highlight any issues you have encountered while making the plot.
Creating animated plotly
Plots in RStudio
💻 Another impressive addition we can make to our plot is to turn it into an animation.
You might recall that one of the variables in the penguins
data set which we haven’t really considered so far is the year
variable - namely, we have penguin data recorded for the years 2007, 2008 and 2009.
Suppose we would like to see how the body_mass_g
and flipper_length_mm
values of the male and female penguins change over the years. We already have this information stored away in the penguins
data set, but we haven’t visualised it yet.
Could we somehow modify our scatter plot to show data for each year, and dynamically switch between years on command? Is such a thing even possible? Why, with plotly
, yes it is!
💻 Adding animations to a plotly
plot is surprisingly easy, but we need to ensure that our data and code are set up properly. Fortunately, in this instance the penguins
data set which we are using already contains the information we would like to use for our animation.
The argument we will use to turn our scatter plot into an animated plot is simply frame = ...
.
We need to include this inside our plot_ly()
function, in a similar fashion to how we use x= ...
and y= ...
when assigning the data for our x
and y
variables.
Use the frame = ...
argument to add the year
variable into our penguins_scatter
scatter plot from 1.2. You may want to assign the new plot to a new object - e.g. penguins_scatter_anim
.
Note: You will of course need to replace the ...
’s with the appropriate code.
💻 If your code has worked, your plot should now include an animation option (as shown below):
If you haven’t already, try clicking on the Play
button, to watch the animation unfold.
If it’s a little fast, you can also click and drag the circle in the slider to change between years.
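If the default playback feels too fast, plotly also provides animation_opts() and animation_slider() for adjusting the animation. The following is only a sketch, assuming the animated plot was built with frame = ~year and stored as penguins_scatter_anim; the timing values are arbitrary.

```r
library(palmerpenguins)
library(plotly)

# Scatter plot with one animation frame per year
penguins_scatter_anim <- plot_ly(data = penguins,
                                 x = ~body_mass_g, y = ~flipper_length_mm,
                                 color = ~sex, colors = "Set1",
                                 frame = ~year,
                                 type = "scatter", mode = "markers")

penguins_scatter_anim <- penguins_scatter_anim %>%
  # frame: milliseconds between frames (including the transition);
  # transition: milliseconds spent moving from one frame to the next
  animation_opts(frame = 1500, transition = 500) %>%
  # prefix the slider's current value so it reads "Year: 2007", and so on
  animation_slider(currentvalue = list(prefix = "Year: "))

penguins_scatter_anim
```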
💻 Next, instead of using the year
variable, create a scatter plot animation that cycles through the different species
of penguin in the penguins
data set.
Also, change your hover text from showing the species of penguin, to showing the year the data was recorded.
What do you notice about the different species?
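If you get stuck, one possible way to set this up is sketched below. It assumes the hover text is supplied through the text argument of plot_ly(); the object name penguins_species_anim is only a suggestion.

```r
library(palmerpenguins)
library(plotly)

# One animation frame per species; hovering over a point shows its year
penguins_species_anim <- plot_ly(data = penguins,
                                 x = ~body_mass_g, y = ~flipper_length_mm,
                                 color = ~sex, colors = "Set1",
                                 frame = ~species,
                                 text = ~paste("Year:", year),
                                 type = "scatter", mode = "markers")

penguins_species_anim
```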
🎧 Online students
💬 Comment on any differences you observe between the different species.
Creating Combined plotly
Plots in RStudio
💻 Over the course of the data science data visualisation labs, we have created interactive histograms, scatter plots, box plots and violin plots.
Suppose that we would like to present multiple data visualisations of the penguins
data together in the one graph. One approach we could use for this would be to use the plotly
subplot
function.
💻 To begin, suppose we would like to combine an interactive histogram and an interactive scatter plot focusing on the recorded penguins
body mass values.
Run the code in the code chunk below to prepare the histogram and reset the scatter plot details:
penguin_hist <- plot_ly(data = penguins,
                        x = ~body_mass_g,
                        color = ~island,
                        type = "histogram", alpha = 0.6)
penguin_hist <- penguin_hist %>% layout(barmode = "overlay")
penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")
Next, take a look at the R code below:
penguin_combined_plots <- subplot(penguins_scatter, penguin_hist,
                                  nrows = 2, margin = 0.05)
penguin_combined_plots <- penguin_combined_plots %>%
  layout(title = "Palmer Penguin Data",
         xaxis = list(title = "body_mass_g"),
         yaxis = list(title = "flipper_length_mm"),
         xaxis2 = list(title = "body_mass_g"),
         yaxis2 = list(title = "count"))
Note that here:
- We are using the
subplot
command to plot the penguins_scatter
and penguin_hist
plots together.
- The
nrows = 2
argument tells R to produce these plots in 2 rows.
- The
margin = 0.05
argument tells R to leave a small margin between the two plots.
- The subsequent lines of code are used to add a title to our selection of plots, and add axis labels to the plots - note that we use
xaxis
to define the x-axis label for the first plot, and xaxis2
to define the x-axis label for the second plot (and similarly for the y-axes).
💻 If you now run this object penguin_combined_plots
, you should obtain the set of two plots, in a single view (as shown below):
penguin_combined_plots
Note that the two plots are still completely interactive. The legends have been combined, and can be used to filter the individual plots.
While we have only combined two plots here, the subplot
function can be used to present several plots together, which can be particularly informative when you would like to display multiple aspects of your data simultaneously.
The only major downside of presenting plots together using subplot
is that their axis labels are removed by default, and must be re-specified, as above.
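An alternative to re-specifying the labels afterwards is to set the axis titles on each individual plot and ask subplot() to keep them, using its titleX and titleY arguments. The sketch below rebuilds both plots so that it runs on its own; the object name penguin_combined_titled is an illustrative choice.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers") %>%
  layout(xaxis = list(title = "body_mass_g"),
         yaxis = list(title = "flipper_length_mm"))

penguin_hist <- plot_ly(data = penguins,
                        x = ~body_mass_g,
                        color = ~island,
                        type = "histogram", alpha = 0.6) %>%
  layout(barmode = "overlay",
         xaxis = list(title = "body_mass_g"),
         yaxis = list(title = "count"))

# titleX = TRUE and titleY = TRUE tell subplot() to keep each plot's own axis titles
penguin_combined_titled <- subplot(penguins_scatter, penguin_hist,
                                   nrows = 2, margin = 0.05,
                                   titleX = TRUE, titleY = TRUE)
penguin_combined_titled
```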
💻 Note that the automatically generated legend title of the combined subplot
shown above in 4.2 is not completely accurate. Add a more informative legend title to this subplot
.
Hint: You will have to add an argument to the layout
section of the code.
💻 Using the information from 4.1, try to combine the scatter plot from 1.2 with both the histogram from 4.1, and the box plots you created in section 2.7 of the third Data Science computer lab.
Make sure your combined subplot
has appropriate axes labels and legend.
Hint: You can check the solutions for Data Science Computer Lab 3, or create a new object for the box plots using your plotly
skills.
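If you would like a starting point, the sketch below shows one way the three plots might be combined. The box plot here (body mass by species) is only a hypothetical stand-in, since the Lab 3 box plot code is not reproduced in this lab; the axis labels, legend title and the object names penguin_box and penguin_three_plots are likewise illustrative.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

penguin_hist <- plot_ly(data = penguins,
                        x = ~body_mass_g,
                        color = ~island,
                        type = "histogram", alpha = 0.6) %>%
  layout(barmode = "overlay")

# Hypothetical stand-in for the Lab 3 box plots: body mass by species
penguin_box <- plot_ly(data = penguins,
                       x = ~species, y = ~body_mass_g,
                       type = "box", name = "Body mass by species")

# The third plot's axes are addressed as xaxis3 and yaxis3
penguin_three_plots <- subplot(penguins_scatter, penguin_hist, penguin_box,
                               nrows = 3, margin = 0.05) %>%
  layout(title = "Palmer Penguin Data",
         legend = list(title = list(text = "Sex / Island / Box plot")),
         xaxis = list(title = "Body mass (g)"),
         yaxis = list(title = "Flipper length (mm)"),
         xaxis2 = list(title = "Body mass (g)"),
         yaxis2 = list(title = "Count"),
         xaxis3 = list(title = "Species"),
         yaxis3 = list(title = "Body mass (g)"))

penguin_three_plots
```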
🎧 Online students
💬 Volunteer to share your screen to show and describe your plotly plot. Highlight any issues you have encountered while making the plot.
Extension: Adding Buttons to plotly
Plots
💻 So far, hopefully the coding in this lab has not been too intense. That’s about to change.
While subplots
are useful for combining a couple of different plots, they can become unwieldy when we consider too many plots at once.
In this final section, we’ll look at an alternative approach - buttons
.
We can add buttons
to a plotly
plot, which (when clicked) will allow us to shift between different presentations of our data.
Please note - before you begin this section, make sure you have completed the Data Science Computer Lab 3 and have read through Section 5 of the Data Visualisation in R supplement.
Note: It may also be helpful to have these open in a separate tab, so that you can refer to them as you work through this section.
💻 It is worth noting that plotly
graphs incorporating buttons
can run into difficulties when trying to switch between different data sets.
To keep things at an appropriate level of difficulty, we will focus on presenting data for one variable from the penguins
data set at a time using our buttons
-enabled plots.
💻 Good work! Let’s take a step back, and consider some smaller modifications we can make to our code in 5.4.
To start, perhaps instead of both buttons showing, we would like a dropdown menu. This can be achieved by changing the type = "buttons"
code in our penguins_plots
object code to type = "dropdown"
(which makes sense).
Try making this change now, and check the results.
Note: Despite changing the type = "buttons"
code, we keep the buttons = list(...)
specification; this does not need to change.
💻 You will have noticed that the penguins_plots
’s histogram looks a little different to the one in our combined subplot
from 4.2. This is because we have not specified here that the histograms should overlay each other. Recall that we can do this via the barmode ="overlay"
command.
Normally, we would have to specify this when creating our original plot. However, by using the pipe operator and the layout
function, we can easily add this in to our penguins_plots
plot.
Try inserting the barmode ="overlay"
command into your code for our new penguins_plots
object now.
Hint: It doesn’t need to be added within the updatemenus
argument.
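To illustrate how these pieces fit together, here is a minimal sketch of a plot with a dropdown menu and overlaid histograms. It is not the penguins_plots code from this lab or the supplement: it assumes a histogram of body_mass_g coloured by sex, uses the "restyle" button method to switch the trace type, and stores the result under the illustrative name penguins_plots_sketch.

```r
library(palmerpenguins)
library(plotly)

# Start from a histogram of body mass, coloured by sex
penguins_plots_sketch <- plot_ly(data = penguins,
                                 x = ~body_mass_g,
                                 color = ~sex, colors = "Set1",
                                 type = "histogram", alpha = 0.6)

penguins_plots_sketch <- penguins_plots_sketch %>%
  layout(title = "Penguin Body Mass (grams)",
         updatemenus = list(
           list(type = "dropdown",  # use "buttons" to show each option as its own button
                buttons = list(
                  list(method = "restyle",
                       args = list("type", "histogram"),
                       label = "Histogram"),
                  list(method = "restyle",
                       args = list("type", "box"),
                       label = "Box plot"))))) %>%
  # barmode is an ordinary layout setting, so it sits outside updatemenus
  layout(barmode = "overlay")

penguins_plots_sketch
```

Because barmode only affects bar-type traces such as histograms, it has no effect once the box plot option is selected.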
💻 Next, to really appreciate the benefit of the buttons
approach over the subplot
approach, let’s add a third plot in our 5.2 code.
Using your code from 5.3 as a guide, add code to the layout
specifications of the penguins_plots
object so that the penguins
body mass data can, as a third option, also be presented in violin plots.
Note: To ensure the box plots are shown within the violin plots, you will need to also add the code box = list(visible = TRUE)
in your violin plot specifications.
💻 Now that you feel more comfortable using buttons
, try to complete the following steps:
- Switch the order of
buttons
around so that violin plots are shown first and the histogram is shown last.
- Change the data shown from being body mass data, to bill length data, and adjust the title accordingly.
- Colour the observations by
species
, not sex
.
- Add a rangeslider to your plots.
💻 As you can see, when we are dealing with plots presenting multiple variables, it may be better to use subplots
, while if we are dealing with different plots of the one variable, it may be better to use buttons
. Perhaps a mixture is best.
Do you have a preference for subplots
or buttons
?
🎧 Online students
💬 Volunteer to share your screen to show and describe your plotly plot. Highlight any issues you have encountered while making the plot.
Well done! There was a lot of content today. You have come a long way from that first base R histogram you created back in the second Data Science Computer Lab.
Don’t worry if you weren’t able to finish everything in the one session - there is quite a lot of material to work through in this lab, and it’s not easy.
Hopefully though, you are beginning to feel quite skilled with using plotly
. The techniques and coding skills you are learning should stand you in good stead for the following weeks. Remember, you can always refer back to this material at a later date if you need a quick refresher.
Before you finish up, make sure to save your script file somewhere safe - it might come in handy later on.
References
Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020.
Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data.
https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020.
Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC.
https://plotly-r.com.
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License
BY-NC-ND.
