Data Science Stream

Topic 4B: Data Visualisation III


Welcome to the fourth computer lab for the Data Science stream of STM1001.

In the second and third Data Science stream computer labs we familiarised ourselves with the plotly package, and made some informative and interactive plots of data from the palmerpenguins package (Horst, Hill, and Gorman 2020).

This computer lab marks the final in our series of labs focusing on data visualisation in RStudio.

Today we will focus on further developing our skills with plotly (Sievert 2020), and cover adding custom controls and animations to our plotly plots. The coding in this lab is a little more intense than in previous weeks, but we will take our time, and go through each of the steps slowly.

By the end of this lab, you should be able to create plotly graphs with range sliders, plotly graphs containing multiple plots, and plotly graphs with buttons or dropdown menus to switch between data visualisations. You should also have a good idea of how to animate plotly graphs.

Note: Before you begin this lab, make sure you have read over Section 5 of the Data Visualisation in R supplement on adding buttons to plotly graphs.


1 Preparation

🏡 Before we begin our work, we will need to carry out some initial preparations.

1.1

To begin, we will need to load all the requisite packages. By now, you should have the palmerpenguins and plotly packages installed on your system. Open up RStudio and load these packages now.

If for some reason you do not have one or both of these packages installed, please install them before continuing.
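A minimal sketch of this step is given below. The install check using requireNamespace() is an optional extra that we have assumed here, not part of the original lab instructions; if both packages are already installed, only the two library() calls are needed.

```r
# Install any missing packages first (this check is an optional extra)
for (pkg in c("palmerpenguins", "plotly")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Load the packages used throughout this lab
library(palmerpenguins)
library(plotly)
```

Once both packages are loaded, the penguins data set and the plot_ly function are available for the rest of the lab.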

1.2

Recall that in the second Data Science computer lab we created a simple scatter plot using the body_mass_g and flipper_length_mm variables from the penguins data set. We also used different colours to distinguish between male and female penguins.

We will introduce some new plotly features using this scatter plot as a base. The code for this plot is reproduced below:

penguins_scatter <- plot_ly(data = penguins, 
                            x = ~body_mass_g, y = ~flipper_length_mm, 
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

penguins_scatter <- penguins_scatter %>% 
                        layout(title = "Scatter Plot of Penguin Data", 
                               legend=list(title=list(text='Sex')),
                               xaxis = list(title = "Penguin Body Mass (grams)"),
                               yaxis = list(title = "Penguin Flipper Length (mm)"))

Run this code in RStudio now.

2 Adding Range Sliders to plotly Plots

💻 In the third Data Science computer lab, we gained some experience using the pipe operator and the layout function.

Let’s use those skills now to add some fancier elements to our plotly graphs. First, we will take a look at adding a range slider to our plot.

2.1

💻 Suppose that we would like to add a range slider to the x-axis of our penguins_scatter scatter plot from 1.2. A range slider can be used to dynamically select a subsection of our plot. This is similar to left-clicking and dragging a box over the plot to zoom in on a section, but gives us more control.

We can add a range slider to the x-axis of a plot using the function rangeslider().

Use the pipe operator to add a range slider to the penguins_scatter plot.

2.2

💻 If your code has worked, your scatter plot should now include a range slider (as shown below).

Try left-clicking and dragging the bars on the slider endpoints.

Hint: You can always check the Code box below if your code is not working.

penguins_scatter %>% rangeslider()
# Not too difficult so far!

2.3

💻 Initially, adding a range slider to your plot might have seemed very difficult. However, as you can see, despite being an impressive addition it’s actually very straightforward to implement.

Of course, there are various additional adjustments that we could make to our range slider, but for now this one line of code is sufficient for our purposes.
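As one example of such an adjustment, the rangeslider() function accepts optional start and end arguments that set the initially selected range. The sketch below is self-contained, so it recreates the scatter plot first. The values 3000 and 5000 and the object name penguins_scatter_zoomed are arbitrary choices for illustration only.

```r
library(palmerpenguins)
library(plotly)

# Recreate the base scatter plot from 1.2
penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# Add a range slider whose initial window covers 3000 to 5000 grams
# (these two values are illustrative, not recommended settings)
penguins_scatter_zoomed <- penguins_scatter %>%
  rangeslider(start = 3000, end = 5000)

# Type penguins_scatter_zoomed in the console to view the plot
```

You can still drag the slider endpoints afterwards; the start and end values only control what is shown when the plot first opens.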


🎧 Online students 💬 Volunteer to share your screen to show and describe your plotly plot. Highlight any issues you have encountered while making the plot.


3 Creating animated plotly Plots in RStudio

💻 Another impressive addition we can make to our plot is to turn it into an animation.

You might recall that one of the variables in the penguins data set which we haven’t really considered so far is the year variable - namely, we have penguin data recorded for the years 2007, 2008 and 2009.

Suppose we would like to see how the body_mass_g and flipper_length_mm values of the male and female penguins change over the years. We already have this information stored away in the penguins data set, but we haven’t visualised it yet.

Could we somehow modify our scatter plot to show data for each year, and dynamically switch between years on command? Is such a thing even possible? Why, with plotly, yes it is!

3.1

💻 Adding animations to a plotly plot is surprisingly easy, but we need to ensure that our data and code are set up properly. Fortunately, in this instance the penguins data set which we are using already contains the information we would like to use for our animation.

The argument we will use to turn our scatter plot into an animated plot is simply frame = .... We need to include this inside our plot_ly() function, in a similar fashion to how we use x = ... and y = ... when assigning the data for our x and y variables.

Use the frame = ... argument to add the year variable into our penguins_scatter scatter plot from 1.2. You may want to assign the new plot to a new object - e.g. penguins_scatter_anim.

Note: You will of course need to replace the ...’s with the appropriate code.

3.2

💻 If your code has worked, your plot should now include an animation option (as shown below):

If you haven’t already, try clicking on the Play button, to watch the animation unfold. If it’s a little fast, you can also click and drag the circle in the slider to change between years.
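If your animation is not appearing, the sketch below shows one way the frame argument might be used. The object name penguins_scatter_anim follows the suggestion in 3.1. The animation_opts() timings and the slider prefix are optional, illustrative choices that we have assumed here; they are not required by the lab.

```r
library(palmerpenguins)
library(plotly)

# Same scatter plot as in 1.2, with one animation frame per year
penguins_scatter_anim <- plot_ly(data = penguins,
                                 x = ~body_mass_g, y = ~flipper_length_mm,
                                 color = ~sex, colors = "Set1",
                                 frame = ~year,
                                 type = "scatter", mode = "markers")

# Optional: slow the animation down (times are in milliseconds)
# and label the slider with the current year
penguins_scatter_anim <- penguins_scatter_anim %>%
  animation_opts(frame = 1500, transition = 500) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))

# Type penguins_scatter_anim in the console to view the animation
```

Because the penguins data set covers three years, the animation should contain three frames.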

3.3

💻 Next, instead of using the year variable, create a scatter plot animation that cycles through the different species of penguin in the penguins data set.

Also, change your hover text from showing the species of penguin, to showing the year the data was recorded.

What do you notice about the different species?


🎧 Online students 💬 Comment on any differences you observe between the different species.


4 Creating Combined plotly Plots in RStudio

💻 Over the course of the data science data visualisation labs, we have created interactive histograms, scatter plots, box plots and violin plots.

Suppose that we would like to present multiple data visualisations of the penguins data together in the one graph. One approach we could use for this would be to use the plotly subplot function.

4.1

💻 To begin, suppose we would like to combine an interactive histogram and an interactive scatter plot focusing on the recorded penguin body mass values.

Run the code in the code chunk below to prepare the histogram and reset the scatter plot details:

penguin_hist <- plot_ly(data = penguins, 
                        x = ~body_mass_g, 
                        color = ~island, 
                        type = "histogram", alpha = 0.6)

penguin_hist <- penguin_hist %>% layout(barmode ="overlay")

penguins_scatter <- plot_ly(data = penguins, 
                            x = ~body_mass_g, y = ~flipper_length_mm, 
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

Next, take a look at the R code below:

penguin_combined_plots <- subplot(penguins_scatter, penguin_hist, 
                                  nrows = 2, margin = 0.05) 
penguin_combined_plots <- penguin_combined_plots %>% 
                            layout(title = "Palmer Penguin Data",
                                   xaxis = list(title = 'body_mass_g'), 
                                   yaxis = list(title = "flipper_length_mm"),
                                   xaxis2 = list(title = 'body_mass_g'), 
                                   yaxis2 = list(title = "count"))

Note that here:

  • We are using the subplot command to plot the penguins_scatter and penguin_hist plots together.
  • The nrows = 2 argument tells R to produce these plots in 2 rows.
  • The margin = 0.05 argument tells R to leave a small margin between the two plots.
  • The subsequent lines of code are used to add a title to our selection of plots, and add axis labels to the plots - note that we use xaxis to define the x-axis label for the first plot, and xaxis2 to define the x-axis label for the second plot (and similarly for the y-axes).

4.2

💻 If you now run this object penguin_combined_plots, you should obtain the set of two plots, in a single view (as shown below):

penguin_combined_plots

Note that the two plots are still completely interactive. The legends have been combined, and can be used to filter the individual plots.

While we have only combined two plots here, the subplot function can be used to present several plots together, which can be particularly informative when you would like to display multiple aspects of your data simultaneously.

The only major downside of presenting plots together using subplot is that their axis labels are removed by default, and must be re-specified, as above.
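As a possible alternative to re-specifying every label, the subplot function has titleX and titleY arguments which, when set to TRUE, keep the axis titles that each individual plot already has. The sketch below assumes the two plots from 4.1 (recreated here so the code is self-contained) and uses the illustrative object name penguin_combined_titles. The titles retained are simply the defaults, i.e. the variable names.

```r
library(palmerpenguins)
library(plotly)

# Recreate the histogram and scatter plot from 4.1
penguin_hist <- plot_ly(data = penguins,
                        x = ~body_mass_g,
                        color = ~island,
                        type = "histogram", alpha = 0.6) %>%
  layout(barmode = "overlay")

penguins_scatter <- plot_ly(data = penguins,
                            x = ~body_mass_g, y = ~flipper_length_mm,
                            color = ~sex, colors = "Set1",
                            type = "scatter", mode = "markers")

# titleX and titleY ask subplot to keep each plot's own axis titles
penguin_combined_titles <- subplot(penguins_scatter, penguin_hist,
                                   nrows = 2, margin = 0.05,
                                   titleX = TRUE, titleY = TRUE)

# Type penguin_combined_titles in the console to view the plots
```

If you want more descriptive labels than the variable names, the layout approach with xaxis, xaxis2, yaxis and yaxis2 from 4.1 is still the way to go.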

4.3

💻 Note that the automatically generated legend title of the combined subplot shown above in 4.2 is not completely accurate. Add a more informative legend to this subplot.

Hint: You will have to add an argument to the layout section of the code.

4.4

💻 Using the information from 4.1, try to combine the scatter plot from 1.2 with both the histogram from 4.1, and the box plots you created in section 2.7 of the third Data Science computer lab.

Make sure your combined subplot has appropriate axis labels and a legend.

Hint: You can check the solutions for Data Science Computer Lab 3, or create a new object for the box plots using your plotly skills.


🎧 Online students 💬 Volunteer to share your screen to show and describe your plotly plot. Highlight any issues you have encountered while making the plot.


5 Extension: Adding Buttons to plotly Plots

💻 So far, hopefully the coding in this lab has not been too intense. That’s about to change.

While subplots are useful for combining a couple of different plots, they can become unwieldy when we consider too many plots at once.

In this final section, we’ll look at an alternative approach - buttons. We can add buttons to a plotly plot, which (when clicked) will allow us to shift between different presentations of our data.

Please note - before you begin this section, make sure you have completed the Data Science Computer Lab 3 and have read through Section 5 of the Data Visualisation in R supplement.

Note: It may also be helpful to have these open in a separate tab, so that you can refer to them as you work through this section.

5.1

💻 It is worth noting that plotly graphs incorporating buttons can run into difficulties when trying to switch between different data sets.

To keep things at an appropriate level of difficulty, we will focus on presenting data for one variable from the penguins data set at a time using our buttons-enabled plots.

5.2

💻 To begin, we will create an initial simple object with a button, and then build upon this with subsequent adjustments.

The code chunk below contains all the code we need for our first plot.

Note that here:

  • We specify the buttons information within the layout arguments.
  • The buttons argument can contain a list of information, with each element of the list corresponding to a different data visualisation.
  • To begin, we have just the one specification, a histogram - the details of the histogram are specified within a list object (within the buttons list).

Run this code now, and then call the penguins_plots object, to see the results. Ignore any red text that appears.

penguins_plots <- plot_ly(data = penguins, 
                          x = ~body_mass_g, 
                          color = ~sex, 
                          colors = "Set1", 
                          opacity = 0.6) %>% 
  layout(
    
    title = "Penguin's Body Mass Data",
    
    updatemenus = list(
      
      list(x = 1.2, y = 0.7, type = "buttons", 
           
         # We really just have to focus on the code below   
         buttons = list(
           
          list(method = "restyle",
               label = "Histogram", # The button label
               args = list(
                list(type = list("histogram")))) # The plot type
    ))))

Note: The spacing here is not strictly necessary, but has been chosen with the aim of making the different arguments clearer.

5.3

💻 Note that at the moment, the Histogram button in your plot won’t do anything, as we just have the one plot.

Using the code above in 5.2 as a guide, add code to the layout specifications of the penguins_plots object so that, as a second option, the penguins body mass data can also be presented in box plots.

Note: You will need to add a comma after list(type = list("histogram")))) since you are adding another argument to the buttons list.

Hint: If you are stuck, check the code chunk below:

# Add a comma at the end of the existing line of code
list(type = list("histogram"))))
# and then paste the following on the next line, before the final four )'s.

          list(method = "restyle",
               label = "Box Plots",
               args = list(
                list(type = list("box"))))
# Check with your lab demonstrator if this is unclear.
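For reference, once the hint above has been applied, the assembled code might look like the sketch below. It is simply the 5.2 code with the second button inserted; the comments marking each button are our own additions.

```r
library(palmerpenguins)
library(plotly)

penguins_plots <- plot_ly(data = penguins,
                          x = ~body_mass_g,
                          color = ~sex,
                          colors = "Set1",
                          opacity = 0.6) %>%
  layout(
    title = "Penguin's Body Mass Data",
    updatemenus = list(
      list(x = 1.2, y = 0.7, type = "buttons",
           buttons = list(
             # First button: histogram
             list(method = "restyle",
                  label = "Histogram",
                  args = list(
                    list(type = list("histogram")))),
             # Second button: box plots
             list(method = "restyle",
                  label = "Box Plots",
                  args = list(
                    list(type = list("box"))))
           ))))

# Type penguins_plots in the console to view the plot
```

Note the comma separating the two button lists: each element of buttons is its own list, and the four closing brackets at the end still close buttons, the menu list, updatemenus and layout in that order.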

5.4

💻 If your code has worked, your plot should now have two buttons, which we can switch between (as shown below):

5.5

💻 Good work! Let’s take a step back, and consider some smaller modifications we can make to our code from 5.3.

To start, perhaps instead of both buttons showing, we would like a dropdown menu. This can be achieved by changing the type = "buttons" code in our penguins_plots object code to type = "dropdown" (which makes sense).

Try making this change now, and check the results.

Note: Despite changing the type = "buttons" code, we keep the buttons = list(...) specification. This doesn’t need to change.

5.6

💻 You will have noticed that the histogram in penguins_plots looks a little different to the one in our combined subplot from 4.2. This is because we have not specified here that the histograms should overlay each other. Recall that we can do this via the barmode = "overlay" argument of layout.

Normally, we would have to specify this when creating our original plot. However, by using the pipe operator and the layout function, we can easily add this in to our penguins_plots plot.

Try inserting the barmode ="overlay" command into your code for our new penguins_plots object now.

Hint: It doesn’t need to be added within the updatemenus argument.
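The idea of adding barmode after a plot has been created can be sketched on a plain histogram, without any buttons. The object names base_hist and overlaid_hist are hypothetical and used only for this illustration; the same layout argument applies to your penguins_plots code.

```r
library(palmerpenguins)
library(plotly)

# A plain histogram of body mass, coloured by sex (no buttons)
base_hist <- plot_ly(data = penguins,
                     x = ~body_mass_g,
                     color = ~sex,
                     colors = "Set1",
                     opacity = 0.6,
                     type = "histogram")

# Pipe the existing plot into layout to overlay the histograms
overlaid_hist <- base_hist %>% layout(barmode = "overlay")

# Type overlaid_hist in the console to compare it with base_hist
```

In the buttons version, barmode sits alongside title and updatemenus as a separate argument to layout.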

5.7

💻 Next, to really appreciate the benefit of the buttons approach over the subplot approach, let’s add a third plot to our code.

Using your code from 5.3 as a guide, add code to the layout specifications of the penguins_plots object so that the penguins body mass data can, as a third option, also be presented in violin plots.

Note: To ensure the box plots are shown within the violin plots, you will also need to add the code box = list(visible = TRUE) in your violin plot specifications.

5.8

💻 Now that you feel more comfortable using buttons, try to complete the following steps:

  • Switch the order of buttons around so that violin plots are shown first and the histogram is shown last.
  • Change the data shown from being body mass data, to bill length data, and adjust the title accordingly.
  • Colour the observations by species, not sex.
  • Add a rangeslider to your plots.

5.9

💻 As you can see, when we are dealing with plots presenting multiple variables, it may be better to use subplots, while if we are dealing with different plots of the one variable, it may be better to use buttons. Perhaps a mixture is best.

Do you have a preference for subplots or buttons?


🎧 Online students 💬 Volunteer to share your screen to show and describe your plotly plot. Highlight any issues you have encountered while making the plot.



Well done! There was a lot of content today. You have come a long way from that first base R histogram you created back in the second Data Science Computer Lab.

Don’t worry if you weren’t able to finish everything in the one session - there is quite a lot of material to work through in this lab, and it’s not easy.

Hopefully though, you are beginning to feel quite skilled with using plotly. The techniques and coding skills you are learning should stand you in good stead for the following weeks. Remember, you can always refer back to this material at a later date if you need a quick refresher.

Before you finish up, make sure to save your script file somewhere safe - it might come in handy later on.


References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B. Gorman. 2020. palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. https://plotly-r.com.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

