Data Science Module
Palmer Penguins Data Set
# Install package
install.packages("palmerpenguins")
# This code loads the `palmerpenguins` package into your current R working environment.
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.2.2
# This code summarises the data in the `palmerpenguins` package.
summary(penguins)
##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex           year     
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :197.0     Median :4050   NA's  : 11   Median :2008  
##  Mean   :200.9     Mean   :4202                Mean   :2008  
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :231.0     Max.   :6300                Max.   :2009  
##  NA's   :2         NA's   :2
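The summary above reports missing values (NA's) in several columns. As a minimal sketch, the counts can also be obtained directly per column. The object names missing_counts and penguins_complete are made up for this illustration, and dropping incomplete rows is just one possible way of handling them, not something the lab requires.

```r
library(palmerpenguins)

# Count the missing values in each column of the penguins data.
missing_counts <- colSums(is.na(penguins))
missing_counts

# Keep only the rows with no missing values at all
# (a choice made for this sketch only).
penguins_complete <- na.omit(penguins)
nrow(penguins_complete)
```

The counts for the measurement columns and for sex should agree with the NA's rows in the summary output.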
Plotly Scatter Plots
# Install package
install.packages("plotly")
# Load package
library(plotly)
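# Basic scatter plot of flipper length against body mass.
# No trace type is given, so plotly infers a scatter plot with markers.
penguins_scatter <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm)
penguins_scatter
# Colour the points by sex.
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~sex)
penguins_scatter2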
# Choose the colours manually by supplying a vector of colour names.
penguins_scatter_colours <- plot_ly(data = penguins,
                                    x = ~body_mass_g, y = ~flipper_length_mm,
                                    color = ~sex, colors = c("cyan", "orange"))
penguins_scatter_colours
# Alternatively, name a ColorBrewer palette such as "Set2".
penguins_scatter_colours <- plot_ly(data = penguins,
                                    x = ~body_mass_g, y = ~flipper_length_mm,
                                    color = ~sex, colors = "Set2")
penguins_scatter_colours
# Set the trace type and mode explicitly: "markers" draws one point per penguin.
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "markers")
penguins_scatter2
# Changing the mode to "lines" joins the data points with lines instead.
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "lines")
penguins_scatter2
Note that here, with mode = "lines", plotly is drawing a line between the individual data points - clearly we don’t want this for a scatter plot!
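For contrast, lines are useful when the points have a natural order. The sketch below is not part of the lab: it assumes we are interested in the mean body mass per study year, and the names mass_by_year and penguins_yearly are made up for the illustration.

```r
library(palmerpenguins)
library(plotly)

# Mean body mass per study year; the formula interface of aggregate()
# drops rows with a missing body mass by default.
mass_by_year <- aggregate(body_mass_g ~ year, data = penguins, FUN = mean)

# The years have a natural order, so joining the points with lines is meaningful.
penguins_yearly <- plot_ly(data = mass_by_year, x = ~year, y = ~body_mass_g,
                           type = "scatter", mode = "lines+markers")
penguins_yearly
```

The mode "lines+markers" shows both the yearly means and the lines joining them.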
# Map species to the marker symbol, using plotly's default symbols.
penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~sex, colors = "Set1", symbol = ~species,
                             type = "scatter", mode = "markers")
penguins_scatter3
Here we have used the symbols cross, diamond and star.
# Choose the symbols manually, one per species.
penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers")
penguins_scatter3
# Set the marker size through the marker argument.
penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers",
                             marker = list(size = 8))
penguins_scatter3
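The marker list accepts further attributes besides size. The sketch below is an illustration only: the name penguins_scatter_styled and the particular values for opacity and the marker outline are assumptions chosen for the example, not settings used in the lab.

```r
library(palmerpenguins)
library(plotly)

# Semi-transparent markers with a thin black outline can make
# overlapping points easier to tell apart.
penguins_scatter_styled <- plot_ly(data = penguins,
                                   x = ~body_mass_g, y = ~flipper_length_mm,
                                   color = ~sex, colors = "Set1",
                                   type = "scatter", mode = "markers",
                                   marker = list(size = 8, opacity = 0.6,
                                                 line = list(color = "black", width = 1)))
penguins_scatter_styled
```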
Creating your own Plotly Scatter Plot
# Scatter plot of bill length against body mass.
penguins_scatter_new <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                type = "scatter", mode = "markers")
penguins_scatter_new
# Colour the points by island.
penguins_scatter_new2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                 color = ~island,
                                 type = "scatter", mode = "markers")
penguins_scatter_new2
# Add a different marker symbol for each species.
penguins_scatter_new3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                 color = ~island, symbol = ~species,
                                 type = "scatter", mode = "markers")
penguins_scatter_new3
# Choose the symbols manually and set the marker size.
penguins_scatter_new4 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                 color = ~island, symbol = ~species,
                                 symbols = c("cross", "diamond", "star"),
                                 type = "scatter", mode = "markers",
                                 marker = list(size = 8))
penguins_scatter_new4
It does seem that penguins living on different islands have noticeably different body_mass_g and bill_length_mm measurements, but this is partly because some species of penguin live on only one of the three islands: Gentoo penguins are found only on Biscoe island and Chinstrap penguins only on Dream island, whereas Adelie penguins live on all three islands.
However, we also note that the Adelie penguins living on Torgersen island are much smaller overall than Adelie penguins living on other islands.
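One way to check observations like these against the numbers, rather than by eye, is to tabulate the species on each island and compute group means. This is a minimal sketch using base R only; the name group_means is made up for the illustration.

```r
library(palmerpenguins)

# How many penguins of each species were recorded on each island?
table(penguins$species, penguins$island)

# Mean body mass and bill length for each species and island combination.
# Rows with a missing value in either measurement are dropped by default.
group_means <- aggregate(cbind(body_mass_g, bill_length_mm) ~ species + island,
                         data = penguins, FUN = mean)
group_means
```

Comparing the Adelie rows of group_means across the three islands shows how large the island differences within a single species actually are.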
Mixed Subplots
Recall from our Week 1 Data Science Computer Lab how we created some histograms for our palmerpenguins data set. Some of the code used for that lab is reproduced below:
# Overlaid, semi-transparent histograms of body mass, one per island.
penguin_hist <- plot_ly(data = penguins, x = ~body_mass_g,
                        color = ~island, type = "histogram", alpha = 0.6)
penguin_hist <- penguin_hist %>% layout(yaxis = list(title = "count"),
                                        barmode = "overlay")
penguin_hist
Suppose that we would like to present all our palmerpenguins data visualisations together. We can do this using the subplot function. Take a look at the R code below:
# Stack the scatter plot above the histogram.
penguin_combined_plots <- subplot(penguins_scatter3, penguin_hist,
                                  nrows = 2, margin = 0.05)
# xaxis and yaxis label the first plot; xaxis2 and yaxis2 label the second.
penguin_combined_plots <- penguin_combined_plots %>%
  layout(title = "Palmer Penguin Data",
         xaxis = list(title = "body_mass_g"),
         yaxis = list(title = "flipper_length_mm"),
         xaxis2 = list(title = "body_mass_g"),
         yaxis2 = list(title = "count"))
penguin_combined_plots
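Since both plots have body_mass_g on the x axis, another option is to let them share that axis. The sketch below is only an illustration of the shareX and titleY arguments of subplot: the plot names are made up, and the two plots are deliberately simplified versions of those above.

```r
library(palmerpenguins)
library(plotly)

# Two simplified plots that both put body_mass_g on the x axis.
scatter_sketch <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm,
                          type = "scatter", mode = "markers")
hist_sketch <- plot_ly(data = penguins, x = ~body_mass_g, type = "histogram")

# shareX = TRUE gives the stacked plots a common x axis;
# titleY = TRUE keeps each plot's own y axis title.
shared_axis_plots <- subplot(scatter_sketch, hist_sketch,
                             nrows = 2, margin = 0.05,
                             shareX = TRUE, titleY = TRUE)
shared_axis_plots
```

With a shared x axis, zooming in on a range of body masses in one plot applies the same range to the other.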
# Combine the island-coloured scatter plot with the island-coloured histogram.
penguin_combined_plots_new <- subplot(penguins_scatter_new4, penguin_hist,
                                      nrows = 2, margin = 0.05)
penguin_combined_plots_new <- penguin_combined_plots_new %>%
  layout(title = "Palmer Penguin Species Data",
         xaxis = list(title = "body_mass_g"),
         yaxis = list(title = "bill_length_mm"),
         xaxis2 = list(title = "body_mass_g"),
         yaxis2 = list(title = "count"))
penguin_combined_plots_new
Note that this set of graphs is actually more informative than the previous subplots, since colour represents island in both graphs here, so the colours match across the scatter plot and the histogram. It is always important to take such presentation choices into account when developing your subplots.
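As a sketch of how the colour matching can be made explicit, both plots below use the same grouping variable and the same named palette. This is an illustration under assumptions rather than the approach taken in the lab: the plot names are made up, and using legendgroup and showlegend to avoid a duplicated legend is one possible design choice.

```r
library(palmerpenguins)
library(plotly)

# Same grouping variable (island) and same palette ("Set2") in both plots.
island_scatter <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                          color = ~island, colors = "Set2", legendgroup = ~island,
                          type = "scatter", mode = "markers")

# Hide the histogram's legend entries so each island is listed only once.
island_hist <- plot_ly(data = penguins, x = ~body_mass_g,
                       color = ~island, colors = "Set2", legendgroup = ~island,
                       type = "histogram", alpha = 0.6, showlegend = FALSE)
island_hist <- island_hist %>% layout(barmode = "overlay")

island_plots <- subplot(island_scatter, island_hist, nrows = 2, margin = 0.05)
island_plots
```

Because the legend entries are grouped by island, clicking an island in the legend should toggle it in both plots.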
That’s everything covered.
References
Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND).
