Data Science Module
Topic 3B: Data Visualisation III
Welcome to the third Data Science computer lab.
Last week, in Data Science computer lab 2, we familiarised ourselves with the plotly package, and made some informative and interactive plots using the palmerpenguins R data set (Horst, Hill, and Gorman 2020).
This week, we will focus on further developing our skills with plotly (Sievert 2020), and cover adding custom controls and animations to our plotly plots. The coding in this lab is a little more intense than in previous weeks, but we will take our time and go through each of the steps slowly.
Before you begin this lab, make sure you have read over Section 4 of the Data Visualisation in R supplement on piping, and Section 5 of the Data Visualisation in R supplement on adding buttons to plotly graphs.
By the end of this lab, you should understand piping, be able to create plotly graphs with customised sliders and dropdown buttons, and have a decent idea of how to animate plotly graphs.
Preparation
Before we begin our work, we will need to carry out some initial preparations.
To begin, we will need to load all the requisite packages.
By now, you should have the palmerpenguins and plotly packages installed on your system.
Open up RStudio and load these packages now.
If for some reason you do not have one or both of these packages installed, run the code below.
install.packages("palmerpenguins")
install.packages("plotly")
If you need a quick refresher on how to load packages in R, the relevant code is shown below.
library(palmerpenguins)
library(plotly)
Piping
Recall from Section 4.1 of the Data Visualisation in R supplement that the pipe operator can be used to chain together a sequence of operations in R, in an intuitive manner which is typically easier to read than alternative methods. Piping can be used to add additional details to existing objects, without the need to define new objects.
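As a quick illustration outside of plotly, the sketch below compares nested function calls with the equivalent piped version. The vector and object names are made up for this illustration; the only assumption is that plotly has been loaded, since loading plotly makes the %>% operator available.

```r
library(plotly)  # loading plotly makes the %>% pipe operator available

values <- c(1, 4, 9)

# Nested calls are read from the inside out
nested_total <- sum(sqrt(values))

# The piped version reads left to right: take values, then sqrt, then sum
piped_total <- values %>% sqrt() %>% sum()

# Both approaches give the same result
identical(nested_total, piped_total)
```

The piped version avoids both the nesting and the need to store intermediate results in new objects.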
Let’s take a look at a simple example.
Recall that in section 2 of Data Science computer lab 2 we created a simple scatter plot using the body_mass_g and flipper_length_mm variables from the penguins data set. We also used different colours to distinguish between male and female penguins. This code is reproduced below:
penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")
When we hover over a point in our scatter plot, we see the flipper length, body mass, and sex details for that point. This is great, but we are missing one important piece of information - the species of penguin! Fortunately, it is straightforward to add this information to the hover text. We can do this by including the argument text = ~species in our code, in a similar way to how we have used color = ~sex to colour the points.
Update your penguins_scatter plot with this addition now, and hover over some points to check that your code has worked as intended.
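If you would like to check your work, one possible version of the updated plot is sketched below. It simply adds text = ~species to the earlier plot_ly call; the position of the argument within the call is a matter of taste rather than a requirement.

```r
library(palmerpenguins)
library(plotly)

# Same scatter plot as before, with the species added to the hover text
penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            text = ~species,
                            type = "scatter", mode = "markers")

# Display the plot, then hover over a few points to check the species appears
penguins_scatter
```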
Suppose that we would like to add a title to the penguins_scatter plot above.
Instead of rewriting our plot_ly call from 2.2 and assigning the output to a new object (e.g. penguins_scatter2), we could use the pipe operator to add this information to our penguins_scatter plot.
The code below does just this:
penguins_scatter %>% layout(title = "Scatter Plot of Penguin Data")
We can also add a title to our legend. This can often help to make our graphs more informative.
Try running the code below, and check the result:
penguins_scatter %>% layout(title = "Scatter Plot of Penguin Data",
                            legend = list(title = list(text = "Sex")))
In the 2.4 code, we were able to add details to multiple components of our plot, via the layout function. Generally, when we make changes to plotly plots via piping, we are making changes to the layout, rather than the core data being visualised.
Within the layout function, we have used the argument title (the function of which is to, rather appropriately, change the title). This is one of many possible arguments - some you will learn as we develop our understanding of plotly, and some you may never use, as they are quite context specific. Typically though, the names of the arguments are clear and easy to remember - for instance, legend allows us to change details in the legend.
To conclude this example, suppose that we would like to rename our plot’s x-axis and y-axis. The default names are ok, but perhaps we would like something a little different. Take a look at the code below, and try filling in the missing details (denoted by the …s) for the yaxis:
penguins_scatter %>% layout(xaxis = list(title = "Penguin Body Mass (grams)"),
...)
)
Make sure to run your code to verify it is working as intended.
Hint: If you would like to check your work, or are not quite sure if you are on the right track, compare your code with the example below.
# An example for the yaxis code is shown below
penguins_scatter %>% layout(xaxis = list(title = "Penguin Body Mass (grams)"),
                            yaxis = list(title = "Penguin Flipper Length (mm)"))
Notice that we have included the list function within our layout call.
The xaxis and yaxis arguments can both take several settings - for example, we could change both the x-axis title and its font size. This is typically the case for layout arguments (the title in 2.3 was an exception).
Therefore, please keep in mind that, generally speaking, when we change layout arguments for our plotly graphs, we need to wrap our desired settings in the list function.
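To illustrate why the list function is needed, here is a minimal sketch in which the xaxis argument receives more than one setting. The font sizes and the object name penguins_axes are arbitrary choices made for this illustration, not values taken from the lab or the supplement.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# xaxis takes a list of settings: here a title (text plus font size)
# and a font size for the tick labels. The sizes are illustrative only.
penguins_axes <- penguins_scatter %>%
  layout(xaxis = list(title = list(text = "Penguin Body Mass (grams)",
                                   font = list(size = 18)),
                      tickfont = list(size = 10)),
         yaxis = list(title = "Penguin Flipper Length (mm)"))

penguins_axes
```

Each nested list groups the settings that belong to one component of the plot, which is why the lists can end up several levels deep.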
As a final note, it’s worth pointing out that our main title from 2.3 has disappeared in our new plot. This is because we did not assign our enhanced plot to a new object. When we use piping, we are not modifying the original object, but rather are carrying out operations on/with it. Therefore any changes we implement are not saved to the original object.
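If we did want to keep the main title, one option is to assign the piped result to a new object and continue piping from that object. The sketch below assumes the object name penguins_titled, which is hypothetical and does not appear elsewhere in the lab.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# Assigning the piped result saves the title in a new object;
# penguins_scatter itself is left unchanged
penguins_titled <- penguins_scatter %>%
  layout(title = "Scatter Plot of Penguin Data")

# Further changes piped from penguins_titled keep the saved title
penguins_titled %>%
  layout(xaxis = list(title = "Penguin Body Mass (grams)"),
         yaxis = list(title = "Penguin Flipper Length (mm)"))
```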
Adding Range Sliders to Plotly plots
Now that we have discussed the pipe operator and the layout function, let’s consider adding some fancier elements to our plotly graphs. First, let’s take a look at adding a range slider to our plot.
Suppose that we would like to add a range slider to our x-axis. A range slider can be used to dynamically select a subsection of our plot, in a similar, but more controlled way to left-clicking and dragging a box over our plot to zoom in on a section.
We can add a range slider to the x-axis of our plot using the function rangeslider().
Use the pipe operator to add a range slider to our scatter plot from 2.2.
If your code has worked, you should obtain the plot shown below - try left-clicking and dragging the bars on the slider endpoints.
Hint: You can always check the code below if your code is not working.
penguins_scatter %>% rangeslider()
# Not too scary so far!
Initially, adding a range slider to your plot might have seemed very difficult.
However, as you can see, despite being an impressive addition it’s actually very straightforward to implement.
Of course, there are various additional adjustments that we could make to our range slider (and we will come back to range sliders later on in the semester), but for now, this one line of code is sufficient for our purposes.
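As a small taste of those adjustments, the sketch below uses the start and end arguments of rangeslider() to set the initially selected range. The values 3000 and 5000 (grams) and the object name penguins_slider are arbitrary choices for illustration.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# start and end set the initially selected range on the x-axis;
# the values used here are illustrative only
penguins_slider <- penguins_scatter %>%
  rangeslider(start = 3000, end = 5000)

penguins_slider
```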
Creating animated Plotly plots
Another (even more) impressive addition we can make to our plot is to turn it into an animation.
You might recall that one of the variables in the penguins data set which we haven’t really considered so far is the year variable - namely, we have penguin data recorded for the years 2007, 2008 and 2009.
Suppose we would like to see how the body_mass_g and flipper_length_mm values of the male and female penguins change over the years. We already have this information stored away in the penguins data set, but we haven’t visualised it yet.
Could we somehow modify our scatter plot to show data for each year, and dynamically switch between years on command? Is such a thing even possible? Why, with plotly, yes it is!
Adding animations to a plotly plot is surprisingly easy, but we need to ensure that our data and code are set up properly. Fortunately, in this instance the penguins data set which we are using already contains the information we would like to use for our animation.
The argument we will use to turn our scatter plot into an animated plot is the frame argument. That’s it.
We need to include this inside our plot_ly() function, in a similar fashion to how we use x = and y = when assigning the data for our x and y variables.
Try using the frame argument to add the year variable into our scatter plot from 2.2. You may want to assign the new plot to a new object - e.g. penguins_scatter_anim.
Your end result should look like the plot below:
If you haven’t already, try clicking on the Play button, to watch the animation unfold.
If it’s a little fast, you can also click and drag the circle in the slider to change between years.
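If you would like to compare your attempt, a minimal sketch of the animated plot is given below. The second step, which uses animation_opts() to slow the animation down, goes beyond what this lab asks for, and the timing values (in milliseconds) and the object name penguins_scatter_slow are arbitrary choices.

```r
library(palmerpenguins)
library(plotly)

# frame = ~year creates one animation frame per year in the data
penguins_scatter_anim <- plot_ly(data = penguins,
                                 x = ~body_mass_g, y = ~flipper_length_mm,
                                 color = ~sex, colors = "Set1",
                                 frame = ~year,
                                 type = "scatter", mode = "markers")

penguins_scatter_anim

# Optional: show each frame for longer, with a shorter transition between frames
penguins_scatter_slow <- penguins_scatter_anim %>%
  animation_opts(frame = 1500, transition = 500)

penguins_scatter_slow
```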
Next, instead of using the year variable, create a scatter plot animation that cycles through the different species of penguin in the penguins data set. Change your hover text from showing the species of penguin, to showing the year the data was recorded.
What do you notice about the different species?
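One way to set up this second animation is sketched below; the object name penguins_species_anim is hypothetical. The only changes from the year animation are the variables passed to frame and text.

```r
library(palmerpenguins)
library(plotly)

# One frame per species, with the recording year shown in the hover text
penguins_species_anim <- plot_ly(data = penguins,
                                 x = ~body_mass_g, y = ~flipper_length_mm,
                                 color = ~sex, colors = "Set1",
                                 text = ~year,
                                 frame = ~species,
                                 type = "scatter", mode = "markers")

penguins_species_anim
```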
Adding buttons to Plotly plots
So far, hopefully the coding in this lab has not been too intense. That’s about to change.
In this final section, we’ll look at how to add buttons to our plotly plot, which (when clicked) will allow us to shift between different presentations of our data. Specifically, we’ll add buttons that allow us to shift between viewing our penguins data as a scatter plot, and as a histogram.
Please note - before you begin this section, make sure you have read through Section 5 of the Data Visualisation in R supplement. It may also be helpful to have this open in a separate tab, so that you can refer to it as you work through this section.
Sometimes, adding buttons to plotly plots that include additional hover text can lead to display errors. Therefore, before continuing, please re-run the R code you used in 2.1 (reproduced in the code chunk below for your convenience):
penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")
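For orientation, here is a minimal sketch of how two buttons can be added through the updatemenus argument of layout. This is an assumption about the general approach, and is not necessarily identical to the code in Section 5 of the supplement: the button labels and the use of the "restyle" method to switch the trace type are choices made for this illustration.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# updatemenus takes a list of menus; each menu is itself a list of settings.
# Each button uses the "restyle" method to change the trace type.
penguins_plots <- penguins_scatter %>%
  layout(updatemenus = list(
    list(y = 0.8, type = "buttons",
         buttons = list(
           list(method = "restyle",
                args = list("type", "scatter"),
                label = "Scatter Plot"),
           list(method = "restyle",
                args = list("type", "histogram"),
                label = "Histogram")))))

penguins_plots
```

Clicking a button changes how the same data is drawn, so no second plot object needs to be created.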
Let’s take a step back, and consider some smaller modifications we can make to our code in 5.2.
To start, perhaps instead of both buttons showing, we would like a dropdown menu. This can be achieved by changing the type = "buttons" code in the line list(y = 0.8, type = "buttons", to type = "dropdown" (which makes sense).
Try making this change now.
You will have noticed that the histogram looks a little different to the one from Data Science computer lab 2. This is because we have not specified that the histograms should overlay each other. Recall that we can do this via the barmode = "overlay" argument.
Normally, we would have to specify this when creating our original plot. However, by using the pipe operator and the layout function, we can easily add this in to our new plot. Try inserting this argument in your code for our new penguins_plots object.
Hint: It doesn’t need to be added within the updatemenus argument.
Finally, let’s add an appropriate title and a range slider to our new penguins_plots.
Hint: Refer back to our previous steps, e.g. 2.3 and 3.1, if you are not quite sure how to proceed.
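If you would like something to compare against, the sketch below combines the modifications from this section: a dropdown menu, overlaid histograms, a title, and a range slider. The title text, button labels, and the "restyle" approach are assumptions made for this illustration and may differ from the code in the supplement.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

penguins_plots <- penguins_scatter %>%
  layout(title = "Penguin Body Mass and Flipper Length",
         barmode = "overlay",  # sits alongside updatemenus, not inside it
         updatemenus = list(
           list(y = 0.8, type = "dropdown",
                buttons = list(
                  list(method = "restyle",
                       args = list("type", "scatter"),
                       label = "Scatter Plot"),
                  list(method = "restyle",
                       args = list("type", "histogram"),
                       label = "Histogram"))))) %>%
  rangeslider()

penguins_plots
```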
Well done! There was a lot of content today.
Don’t worry if you weren’t able to finish everything in the one session - there is quite a lot of material to work through in this lab, and it’s not easy.
Hopefully though, you are beginning to feel quite skilled with using plotly. The techniques and coding skills you are learning should stand you in good stead for the following weeks. Remember, you can always refer back to this material at a later date if you need a quick refresher.
Before you finish up, make sure to save your script file somewhere safe - it might come in handy later on.
References
Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) License.
