Data Science Stream

Topic 2B: Data Visualisation I


Example R code solutions for Data Science Computer Lab 2 are presented below. The lab uses the palmerpenguins data (Horst, Hill, and Gorman 2020) and the plotly R package (Sievert 2020).


1 Palmer Penguins Data Set

1.1

# Install package
install.packages("palmerpenguins")

1.2

# Load the `palmerpenguins` package into your current R working environment
library(palmerpenguins)
# Summarise the data in the `palmerpenguins` package
summary(penguins)
##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex           year     
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :197.0     Median :4050   NA's  : 11   Median :2008  
##  Mean   :200.9     Mean   :4202                Mean   :2008  
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :231.0     Max.   :6300                Max.   :2009  
##  NA's   :2         NA's   :2
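
The summary above reports NA's in several columns. As a minimal sketch (the object name `na_counts` is introduced here for illustration and is not part of the lab), the missing values can be counted per column with base R. This is useful to know before plotting, since observations with missing values cannot be drawn.

```r
library(palmerpenguins)

# Count the missing values (NA) in each column of the penguins data
na_counts <- colSums(is.na(penguins))
na_counts

# Number of rows that contain at least one missing value
sum(!complete.cases(penguins))
```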

2 Creating Interactive Histograms in RStudio

hist(penguins$body_mass_g, breaks = 19)

2.1

install.packages("plotly")
library(plotly)

2.2

penguin_hist_base <- plot_ly(data = penguins, 
                             x = ~body_mass_g, 
                             type = "histogram")

penguin_hist_base <- penguin_hist_base %>% layout(yaxis = list(title = 'count'))

A brief explanation of the code is provided in the code chunk below.

# Here, we are creating a plotly object called "penguin_hist_base"
penguin_hist_base <- plot_ly(data = penguins, # We are using the penguins data
                             x = ~body_mass_g, # and plotting the body_mass_g data
                             type = "histogram") # in a histogram format

# The code below is used to modify the layout of the histogram
# to include a label for the y-axis
penguin_hist_base <- penguin_hist_base %>% layout(yaxis = list(title = 'count'))
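
The base R histogram earlier in this section used `breaks = 19` to suggest the number of bins. A rough plotly counterpart, sketched below, is the `nbinsx` attribute of histogram traces. It sets a maximum number of bins, and plotly still chooses convenient bin edges, so the result may not match the base R plot exactly. The object name `penguin_hist_bins`, the value 20 and the x-axis title are illustrative choices, not part of the lab.

```r
library(palmerpenguins)
library(plotly)

# Histogram of body mass with an upper limit on the number of bins
penguin_hist_bins <- plot_ly(data = penguins,
                             x = ~body_mass_g,
                             type = "histogram",
                             nbinsx = 20) # illustrative value

# Label both axes
penguin_hist_bins <- penguin_hist_bins %>%
  layout(xaxis = list(title = "Body mass (g)"),
         yaxis = list(title = "count"))
penguin_hist_bins
```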

2.3

penguin_hist_base

2.4

No answer required.

2.5

penguin_hist <- plot_ly(data = penguins, 
                        x = ~body_mass_g, 
                        color = ~island, 
                        type = "histogram", alpha = 0.6)

penguin_hist <- penguin_hist %>% layout(yaxis = list(title = 'count'), 
                                        barmode ="overlay")

A brief explanation of the code is provided in the code chunk below.

# Here, we are creating a plotly object called "penguin_hist"
penguin_hist <- plot_ly(data = penguins, # We are using the penguins data
                        x = ~body_mass_g, # and plotting the body_mass_g data
                        color = ~island, type = "histogram", alpha = 0.6)
# We are producing a histogram for this data, with bars coloured differently, 
# depending on the island on which the penguin was recorded

# The code below is used to modify the layout of the histogram
# This includes adding a label to the y-axis
# and setting the histograms to be layered over each other
# (hence the alpha = 0.6 above to change the opacity)
penguin_hist <- penguin_hist %>% layout(yaxis = list(title = 'count'), 
                                        barmode ="overlay")
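
For comparison, the sketch below swaps `barmode = "overlay"` for `barmode = "stack"`, which places the bars for each island on top of one another instead of layering them, so no transparency is needed. The object name `penguin_hist_stack` is introduced here for illustration. Whether stacking or overlaying reads better depends on the question being asked of the data.

```r
library(palmerpenguins)
library(plotly)

# Same histogram as above, but without transparency
penguin_hist_stack <- plot_ly(data = penguins,
                              x = ~body_mass_g,
                              color = ~island,
                              type = "histogram")

# Stack the bars for the three islands rather than overlaying them
penguin_hist_stack <- penguin_hist_stack %>%
  layout(yaxis = list(title = "count"),
         barmode = "stack")
penguin_hist_stack
```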

2.6

penguin_hist

2.7

No answer required.

3 Creating Interactive Scatter Plots in RStudio

3.1

No answer required.

3.2

penguins_scatter <- plot_ly(data = penguins, 
                            x = ~body_mass_g, y = ~flipper_length_mm)
penguins_scatter

3.3

penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex)
penguins_scatter2

3.4

An example result for an arbitrary selection of colours is shown below.

penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = c("cyan", "orange"))
penguins_scatter_colours

3.5

For brevity, only the result for the "Set2" colour palette is shown below.

penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = "Set2")
penguins_scatter_colours
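
Palette names such as "Set2" refer to ColorBrewer palettes. As a small sketch, assuming the RColorBrewer package is available (it is installed as a dependency of plotly), the palette names and the individual colours can be inspected directly. The object name `set2_cols` is illustrative.

```r
library(RColorBrewer)

# Names of the available ColorBrewer palettes
rownames(brewer.pal.info)

# The first three colours of the "Set2" palette, as hexadecimal codes
set2_cols <- brewer.pal(3, "Set2")
set2_cols
```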

3.6

# Markers only: each penguin is drawn as a separate point
penguins_scatter2 <- plot_ly(data = penguins, 
                             x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "markers")
penguins_scatter2

# Lines only: consecutive observations are joined by line segments
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "lines")
penguins_scatter2

Note that with mode = "lines", plotly draws a line between consecutive data points - clearly we don’t want this for individual, unordered observations!
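
Lines are better suited to data with a natural ordering. As a hedged illustration (the summary table `yearly_means` and the plot `penguins_lines` are introduced here and are not part of the lab), the sketch below plots the mean flipper length for each study year, where joining the points makes sense.

```r
library(palmerpenguins)
library(plotly)

# Mean flipper length per year (rows with missing values are dropped)
yearly_means <- aggregate(flipper_length_mm ~ year, data = penguins, FUN = mean)

# One point per year, joined by lines in year order
penguins_lines <- plot_ly(data = yearly_means,
                          x = ~year, y = ~flipper_length_mm,
                          type = "scatter", mode = "lines+markers")
penguins_lines
```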

3.7

penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", text = ~species,
                             type = "scatter", mode = "markers")
penguins_scatter2
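
The `text` argument accepts any formula, so more than one variable can be shown when hovering over a point. The sketch below is one possible extension: the label wording and the object name `penguins_scatter_hover` are illustrative, and `hoverinfo = "x+y+text"` keeps the coordinates alongside the custom text.

```r
library(palmerpenguins)
library(plotly)

# Build a two-line hover label from the species and island columns
penguins_scatter_hover <- plot_ly(data = penguins,
                                  x = ~body_mass_g, y = ~flipper_length_mm,
                                  color = ~sex, colors = "Set1",
                                  text = ~paste("Species:", species,
                                                "<br>Island:", island),
                                  hoverinfo = "x+y+text",
                                  type = "scatter", mode = "markers")
penguins_scatter_hover
```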

3.8

penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species, 
                             type = "scatter", mode = "markers")
penguins_scatter3

3.9

Here we have used the symbols cross, diamond and star.

penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers")
penguins_scatter3

3.10

penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers",
                             marker = list(size = 8))
penguins_scatter3
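
The `layout()` function used for the histograms in Section 2 works for scatter plots as well. As a minimal sketch, the code below adds a plot title and readable axis titles to the final scatter plot. The title text and the object name `penguins_scatter_labelled` are illustrative choices.

```r
library(palmerpenguins)
library(plotly)

penguins_scatter_labelled <- plot_ly(data = penguins,
                                     x = ~body_mass_g, y = ~flipper_length_mm,
                                     color = ~sex, colors = "Set1",
                                     symbol = ~species,
                                     symbols = c("cross", "diamond", "star"),
                                     type = "scatter", mode = "markers",
                                     marker = list(size = 8))

# Replace the default variable names on the axes with descriptive titles
penguins_scatter_labelled <- penguins_scatter_labelled %>%
  layout(title = "Flipper length against body mass",
         xaxis = list(title = "Body mass (g)"),
         yaxis = list(title = "Flipper length (mm)"))
penguins_scatter_labelled
```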

4 Creating your own plotly Scatter Plot

4.1

penguins_scatter_new <- plot_ly(data = penguins, 
                                x = ~body_mass_g, y = ~bill_length_mm,
                                type = "scatter", mode = "markers")
penguins_scatter_new

4.2

penguins_scatter_new2 <- plot_ly(data = penguins, 
                                 x = ~body_mass_g, y = ~bill_length_mm,
                                 color = ~island,
                                 type = "scatter", mode = "markers")
penguins_scatter_new2

4.3

penguins_scatter_new3 <- plot_ly(data = penguins, 
                                 x = ~body_mass_g, y = ~bill_length_mm,
                                 color = ~island, symbol = ~species,
                                 type = "scatter", mode = "markers")
penguins_scatter_new3

4.4

penguins_scatter_new4 <- plot_ly(data = penguins, 
                                 x = ~body_mass_g, y = ~bill_length_mm,
                                 color = ~island, symbol = ~species, 
                                 symbols = c("cross", "diamond", "star"),
                                 type = "scatter", mode = "markers",
                                 marker = list(size=8))
penguins_scatter_new4

4.5

It does seem that penguins living on different islands have noticeably different body_mass_g and bill_length_mm measurements, but much of this difference reflects which species live where: Gentoo penguins were recorded only on Biscoe island and Chinstrap penguins only on Dream island, whereas Adelie penguins were recorded on all three islands.

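However, we also note that the Adelie penguins living on Torgersen island appear smaller overall in the plot than Adelie penguins living on other islands. This is a visual impression, and is worth checking against summary statistics for each species and island.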
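
The observations above come from reading the scatter plot by eye. A numerical summary can help confirm or revise them. The sketch below uses base R only, and the object names `species_by_island` and `group_means` are introduced here for illustration. It tabulates which species were recorded on which island, then computes the mean body mass and bill length for each species and island combination.

```r
library(palmerpenguins)

# Cross-tabulate species against island to see which species occur where
species_by_island <- table(penguins$species, penguins$island)
species_by_island

# Mean body mass and bill length for each species-island combination
# (aggregate() drops rows with missing measurements by default)
group_means <- aggregate(cbind(body_mass_g, bill_length_mm) ~ species + island,
                         data = penguins, FUN = mean)
group_means
```

Comparing the Adelie rows of `group_means` across the three islands gives a direct check on how much the island groups differ.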


That’s everything covered.


References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. https://plotly-r.com.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) licence.

