Data Science Stream

Topic 3B: Data Visualisation II


Example R code solutions for Data Science Computer Lab 3 are presented below. The lab uses the palmerpenguins data (Horst, Hill, and Gorman 2020) and the plotly R package (Sievert 2020).


1 Preparation

1.1

library(palmerpenguins)
library(plotly)

2 Creating Interactive Box Plots in RStudio

2.1

No answer required.

2.2

penguins_box <- plot_ly(data = penguins, y = ~body_mass_g, type = "box")
penguins_box

2.3

penguins_box <- plot_ly(data = penguins, 
                        y = ~body_mass_g, 
                        type = "box", 
                        x0 = "body mass (g)")
penguins_box

2.4

penguins_box <- plot_ly(data = penguins, 
                        y = ~body_mass_g,
                        color = ~sex, 
                        type = "box")
penguins_box

2.5

We observe that the distributions of body mass for both male and female penguins are positively skewed rather than symmetric: in each box plot, the median lies closer to the first quartile than to the third quartile.
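One way to check this reading of the box plots numerically is to compute the quartiles for each sex. The sketch below is not part of the original lab solution. It uses base R's aggregate() and quantile(), and the names quartiles_by_sex, q, lower_gap and upper_gap are illustrative only. The quartile convention used by quantile() may differ slightly from the one plotly uses, so the values need not match the plot's hover text exactly.

```r
library(palmerpenguins)

# Quartiles of body mass for each sex. The formula interface of aggregate()
# drops rows where body mass or sex is missing.
quartiles_by_sex <- aggregate(body_mass_g ~ sex, data = penguins,
                              FUN = quantile, probs = c(0.25, 0.5, 0.75))

# The quartiles are stored as a three-column matrix: Q1, median, Q3
q <- quartiles_by_sex$body_mass_g
lower_gap <- q[, 2] - q[, 1]  # median minus first quartile
upper_gap <- q[, 3] - q[, 2]  # third quartile minus median

data.frame(sex = quartiles_by_sex$sex, lower_gap, upper_gap)
```

An upper gap that is larger than the lower gap for a group is consistent with a positively skewed distribution.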

2.6

penguins_box <- plot_ly(data = penguins, 
                        x = ~species, y = ~body_mass_g, 
                        color = ~sex, type = "box")
penguins_box

2.7

penguins_box %>% layout(boxmode = "group")

2.8

We observe that, within each species, the male penguins have a much higher median body mass than the female penguins. The difference between the male and female body mass distributions is particularly large for Gentoo penguins, whereas the male and female Chinstrap penguins are relatively close in median body mass. Interestingly, the female Gentoo penguins are generally much heavier than both the female and the male penguins of the other two species.

Another point of interest is that the distributions of body mass, once split by both species and sex, appear much less skewed than before. The male Adelie and Chinstrap penguins have slightly skewed body mass distributions, but the other groups appear to have roughly symmetric distributions.
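The comparison of medians described above can also be checked numerically. The sketch below is not part of the original lab solution. It uses base R's aggregate(), and the name median_mass is illustrative only.

```r
library(palmerpenguins)

# Median body mass for every species and sex combination
# (rows with a missing body mass or sex are dropped)
median_mass <- aggregate(body_mass_g ~ species + sex, data = penguins,
                         FUN = median)

# Sort so that the female and male rows of each species sit together
median_mass[order(median_mass$species, median_mass$sex), ]
```

Reading the output one species at a time makes it easy to compare the female and male medians, and to compare the female Gentoo median with the medians of the other two species.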

3 Piping

3.1

penguins_box %>% layout(title = "Box Plots of Penguin body mass Data", 
                        boxmode = "group")
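The %>% operator passes the object on its left as the first argument of the function on its right, so the piped call in 3.1 can also be written as an ordinary function call. The sketch below is not part of the original lab solution. It rebuilds the plot from 2.6 and shows both forms; the names piped_plot and nested_plot are illustrative only.

```r
library(palmerpenguins)
library(plotly)

# Box plot of body mass by species and sex, as in 2.6
penguins_box <- plot_ly(data = penguins,
                        x = ~species, y = ~body_mass_g,
                        color = ~sex, type = "box")

# Piped form: penguins_box becomes the first argument of layout()
piped_plot <- penguins_box %>%
  layout(title = "Box Plots of Penguin body mass Data", boxmode = "group")

# The same call written without the pipe
nested_plot <- layout(penguins_box,
                      title = "Box Plots of Penguin body mass Data",
                      boxmode = "group")
```

The piped form tends to be easier to read once several layout() or other modifying calls are chained one after another.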

3.2

penguins_box %>% layout(title = "Box Plots of Penguin body mass Data", 
                        boxmode = "group",
                        legend = list(title = list(text = "Sex")))

3.3

No answer required.

3.4

penguins_box %>% layout(xaxis = list(title = "Penguin Species"),
                        yaxis = list(title = "Penguin Body Mass (grams)"),
                        boxmode = "group",
                        legend = list(title = list(text = "Sex")))

3.5

No answer required.

3.6

No answer required.

4 Creating Interactive Violin Plots in RStudio

4.1

penguins_violin <- plot_ly(data = penguins, 
                           y = ~body_mass_g, 
                           type = "violin", 
                           x0 = "body mass (g)",
                           box = list(visible = TRUE))
penguins_violin

4.2

penguins_violin <- plot_ly(data = penguins, 
                           x = ~species,
                           y = ~body_mass_g, 
                           type = "violin",
                           box = list(visible = TRUE))
penguins_violin

4.3

# Note you could replace split = ~sex with color = ~sex here
penguins_violin <- plot_ly(data = penguins, 
                           x = ~species,
                           y = ~body_mass_g, 
                           split = ~sex, 
                           type = "violin",
                           box = list(visible = TRUE))
penguins_violin
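The note at the top of the 4.3 code mentions that color = ~sex could be used in place of split = ~sex. A minimal sketch of that alternative is shown below. It is not part of the original lab solution, the name penguins_violin_color is illustrative only, and the colours and legend it produces may differ slightly from the split version.

```r
library(palmerpenguins)
library(plotly)

# Same plot as in 4.3, but with color = ~sex in place of split = ~sex
penguins_violin_color <- plot_ly(data = penguins,
                                 x = ~species,
                                 y = ~body_mass_g,
                                 color = ~sex,
                                 type = "violin",
                                 box = list(visible = TRUE))

# Place the violins for each sex side by side within a species
penguins_violin_color %>% layout(violinmode = "group")
```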

4.4

penguins_violin %>% layout(violinmode = "group")

4.5

penguins_violin %>% layout(title = "Violin Plots of Penguin body mass Data",
                           violinmode = "group")

5 Extension: Creating your own plotly plots

Example code for the creation of violin plots with all the specified characteristics is shown below:

violin_fig <- plot_ly(data = penguins, 
                      x = ~sex, y = ~bill_length_mm, 
                      type = "violin", 
                      split = ~species, 
                      color = ~island, 
                      text = ~island,
                      box = list(visible = TRUE))

violin_fig %>% layout(title = "Violin Plots of Penguin bill length Data",
                      yaxis = list(title = "bill length (mm)"), 
                      violinmode = "group")


That’s everything for this lab.


References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B. Gorman. 2020. palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. https://plotly-r.com.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

