Data Science Module

Topic 2B: Data Visualisation II


Example R code solutions for the Data Science Computer Lab 2, which uses data from Horst, Hill, and Gorman (2020), and the plotly (Sievert 2020) R package, are presented below.


1 Palmer Penguins Data Set

# Install package
install.packages("palmerpenguins")
# This code loads the `palmerpenguins` package into your current R working environment.
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.2.2
# This code summarises the data in the `palmerpenguins` package.
summary(penguins)
##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex           year     
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :197.0     Median :4050   NA's  : 11   Median :2008  
##  Mean   :200.9     Mean   :4202                Mean   :2008  
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :231.0     Max.   :6300                Max.   :2009  
##  NA's   :2         NA's   :2

2 Plotly Scatter Plots

2.1

# Install package
install.packages("plotly")
# Load package
library(plotly)

2.2

penguins_scatter <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm)
penguins_scatter

2.3

No answer required.

2.4

penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex)
penguins_scatter2

2.5

penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = c("cyan", "orange"))
penguins_scatter_colours

2.6

penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = "Set2")
penguins_scatter_colours

2.7

penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "markers")
penguins_scatter2
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "lines")
penguins_scatter2

Note that with mode = "lines", plotly joins the data points with line segments, in the order in which they appear in the data. For a scatter plot, this is clearly not what we want!

2.8

penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species, 
                             type = "scatter", mode = "markers")
penguins_scatter3

2.9

Here we have used the symbols cross, diamond and star.

penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers")
penguins_scatter3

2.10

penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers",
                             marker = list(size = 8))
penguins_scatter3

3 Creating your own Plotly Scatter Plot

3.1

penguins_scatter_new <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                type = "scatter", mode = "markers")
penguins_scatter_new

3.2

penguins_scatter_new2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                color = ~island,
                                type = "scatter", mode = "markers")
penguins_scatter_new2

3.3

penguins_scatter_new3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                color = ~island, symbol = ~species,
                                type = "scatter", mode = "markers")
penguins_scatter_new3

3.4

penguins_scatter_new4 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                color = ~island, symbol = ~species, 
                                symbols = c("cross", "diamond", "star"),
                                type = "scatter", mode = "markers",
                                marker = list(size=8))
penguins_scatter_new4

3.5

It does seem that penguins living on different islands have noticeably different body_mass_g and bill_length_mm measurements, but this is partly because some species of penguin live on only one of the three islands: in this data set, Gentoo penguins were only recorded on Biscoe island and Chinstrap penguins only on Dream island, whereas Adelie penguins were recorded on all three islands.

However, we also note that the Adelie penguins living on Torgersen island are much smaller overall than Adelie penguins living on other islands.

4 Mixed Subplots

Recall from our [Week 1 Data Science Computer Lab] how we created some histograms for our palmerpenguins data set. Some of the code used for that lab is reproduced below:

penguin_hist <- plot_ly(data = penguins, x = ~body_mass_g, 
                        color = ~island, type = "histogram", alpha = 0.6)

penguin_hist <- penguin_hist %>% layout(yaxis = list(title = 'count'), 
                                        barmode ="overlay")
penguin_hist

Suppose that we would like to present all our palmerpenguins data visualisations together. We can do this using the subplot function.

4.1

Take a look at the R code below:

penguin_combined_plots <- subplot(penguins_scatter3, penguin_hist, 
                                  nrows = 2, margin = 0.05) 
penguin_combined_plots <- penguin_combined_plots %>% 
                          layout(title = "Palmer Penguin Data",
                                 xaxis = list(title = 'body_mass_g'), 
                                 yaxis = list(title = "flipper_length_mm"),
                                 xaxis2 = list(title = 'body_mass_g'), 
                                 yaxis2 = list(title = "count"))
penguin_combined_plots

4.2

penguin_combined_plots_new <- subplot(penguins_scatter_new4, penguin_hist, 
                                  nrows = 2, margin = 0.05) 
penguin_combined_plots_new <- penguin_combined_plots_new %>% 
                          layout(title = "Palmer Penguin Species Data",
                                 xaxis = list(title = 'body_mass_g'), 
                                 yaxis = list(title = "bill_length_mm"),
                                 xaxis2 = list(title = 'body_mass_g'), 
                                 yaxis2 = list(title = "count"))
penguin_combined_plots_new

Note that this set of graphs is more informative than the previous subplots, since colour represents island in both graphs here, whereas in the previous subplots colour represented sex in the scatter plot but island in the histogram. It is always important to take such presentation choices into account when developing your subplots.


That’s everything covered.


References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

---
title: "STM1001: Computer Lab 2B Solutions"
output:
  bookdown::html_document2: 
    toc: true
    toc_float: true
    code_download: true
    theme: readable
    code_folding: show
bibliography: STM1001_DS_CL_references.bib 
link-citations: yes
---

<style>
#TOC {
  background: url("https://www.latrobe.edu.au/_media/la-trobe-api/v5/img/logo.svg");
  background-size: contain;
  padding-top: 80px !important;
  background-repeat: no-repeat;
}
</style>

### Data Science Module {-}

### Topic 2B: Data Visualisation II {-}

<br>

Example R code solutions for the [Data Science Computer Lab 2](https://rpubs.com/LTU_STM1001/DSMCL2_S), which uses data from @penguins, and the `plotly` [@plotly] R package, are presented below.

<br>

# Palmer Penguins Data Set {#penguins}

```{r, include = F}
for (pkg in c("palmerpenguins", "plotly")) { # Install each package only if it is missing
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg, repos = "http://cran.us.r-project.org")
}
```

```{r class.source = "fold-show", eval = F, echo = T}
# Install package
install.packages("palmerpenguins")
```

```{r class.source = "fold-show", eval = T, echo = T}
# This code loads the `palmerpenguins` package into your current R working environment.
library(palmerpenguins)
# This code summarises the data in the `palmerpenguins` package.
summary(penguins)
```
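
The summary above reports `NA's` for several variables. As a minimal sketch using base R only (the object name `na_counts` is our own choice and is not part of the lab), the missing values in each column can be counted directly:

```r
library(palmerpenguins)

# Count the missing values in each column of the penguins data
na_counts <- colSums(is.na(penguins))
na_counts

# Number of rows with no missing values at all
sum(complete.cases(penguins))
```

Knowing which columns contain missing values is useful later on, since plotly may drop or warn about rows whose plotted variables are missing.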

# Plotly Scatter Plots {#scatter} 

## 


```{r class.source = "fold-show", eval = F, echo = T}
# Install package
install.packages("plotly")
```

```{r class.source = "fold-show", eval = T, include = F}
# Load package
library(plotly)
```

```{r class.source = "fold-show", eval = F, echo = T}
# Load package
library(plotly)
```

## {#simplescatter}

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm)
penguins_scatter
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F}
penguins_scatter <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, type = "scatter", mode = "markers")
suppressMessages(penguins_scatter)
```

##

No answer required.

## {#scattercolour}

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex)
penguins_scatter2
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F, fig.align = "center"}
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, color = ~sex, 
                             type = "scatter", mode = "markers")
suppressMessages(penguins_scatter2)
```

## {#scattercolours}

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = c("cyan", "orange"))
penguins_scatter_colours
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F, fig.align = "center"}
penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = c("cyan", "orange"),
                                    type = "scatter", mode = "markers")
penguins_scatter_colours
```

##

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = "Set2")
penguins_scatter_colours
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F, fig.align = "center"}
penguins_scatter_colours <- plot_ly(data = penguins, 
                                    x = ~body_mass_g, y = ~flipper_length_mm, 
                                    color = ~sex, colors = "Set2",
                                    type = "scatter", mode = "markers")
penguins_scatter_colours
```
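
Palette names such as `"Set2"` refer to ColorBrewer palettes. The sketch below assumes the `RColorBrewer` package is available (it is typically installed alongside `plotly`), and the object names `set2_colours` and `qualitative_palettes` are our own. It shows how to look up the colours behind a palette name:

```r
library(RColorBrewer)

# Hex codes for the first three colours of the "Set2" ColorBrewer palette
set2_colours <- brewer.pal(n = 3, name = "Set2")
set2_colours

# Names of the qualitative palettes, such as "Set1", "Set2" and "Set3"
qualitative_palettes <- rownames(brewer.pal.info[brewer.pal.info$category == "qual", ])
qualitative_palettes
```

Qualitative palettes are usually a sensible choice for categorical variables such as `sex`, `species` or `island`.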

##

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "markers")
penguins_scatter2
```

```{r class.source = "fold-show", eval = T, echo = T}
penguins_scatter2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1",
                             type = "scatter", mode = "lines")
penguins_scatter2
```

Note that with `mode = "lines"`, plotly joins the data points with line segments, in the order in which they appear in the data. For a scatter plot, this is clearly not what we want!
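
To see why the lines look so tangled, it can help to compare with data that has been ordered first. The following is only a sketch, with object names (`penguins_sorted`, `penguins_lines_sorted`) of our own choosing. It assumes that lines are drawn in row order, and so sorts the rows by `body_mass_g` before plotting:

```r
library(palmerpenguins)
library(plotly)

# Drop rows with missing values, then order the rows by the x variable
penguins_sorted <- na.omit(penguins)
penguins_sorted <- penguins_sorted[order(penguins_sorted$body_mass_g), ]

# With the rows ordered by x, each line should now run from left to right
penguins_lines_sorted <- plot_ly(data = penguins_sorted,
                                 x = ~body_mass_g, y = ~flipper_length_mm,
                                 color = ~sex, colors = "Set1",
                                 type = "scatter", mode = "lines")
penguins_lines_sorted
```

Even when sorted, a line plot suggests a sequence between individual penguins that does not exist, so `mode = "markers"` remains the appropriate choice for these data.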

## {#scattersymbol}

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species, 
                             type = "scatter", mode = "markers")
penguins_scatter3
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F, message = F, fig.align = "center"}
penguins_scatter3 <- plot_ly(data = remove_missing(penguins), x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species, 
                             type = "scatter", mode = "markers")

penguins_scatter3
```

##

Here we have used the symbols `cross`, `diamond` and `star`.

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers")
penguins_scatter3
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F, message = F, fig.align = "center"}
penguins_scatter3 <- plot_ly(data = remove_missing(penguins), x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species, 
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers")

penguins_scatter3
```

##

```{r class.source = "fold-show", eval = F, echo = T}
penguins_scatter3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species,
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers",
                             marker = list(size = 8))
penguins_scatter3
```

```{r class.source = "fold-show", eval = T, echo = F, warning = F, message = F, fig.align = "center"}
penguins_scatter3 <- plot_ly(data = remove_missing(penguins), x = ~body_mass_g, y = ~flipper_length_mm, 
                             color = ~sex, colors = "Set1", symbol = ~species, 
                             symbols = c("cross", "diamond", "star"),
                             type = "scatter", mode = "markers",
                             marker = list(size = 8))

penguins_scatter3
```

# Creating your own Plotly Scatter Plot {#scatterpersonal}

##

```{r class.source = "fold-show", eval = T, echo = T, message = F, warning = F, fig.align = "center"}
penguins_scatter_new <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                type = "scatter", mode = "markers")
penguins_scatter_new
```

##

```{r class.source = "fold-show", eval = T, echo = T, message = F, warning = F, fig.align = "center"}
penguins_scatter_new2 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                color = ~island,
                                type = "scatter", mode = "markers")
penguins_scatter_new2
```

##

```{r class.source = "fold-show", eval = T, echo = T, message = F, warning = F, fig.align = "center"}
penguins_scatter_new3 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                color = ~island, symbol = ~species,
                                type = "scatter", mode = "markers")
penguins_scatter_new3
```

##

```{r class.source = "fold-show", eval = T, echo = T, message = F, warning = F, fig.align = "center"}
penguins_scatter_new4 <- plot_ly(data = penguins, x = ~body_mass_g, y = ~bill_length_mm,
                                color = ~island, symbol = ~species, 
                                symbols = c("cross", "diamond", "star"),
                                type = "scatter", mode = "markers",
                                marker = list(size=8))
penguins_scatter_new4
```

##

It does seem that penguins living on different islands have noticeably different `body_mass_g` and `bill_length_mm` measurements, but this is partly because some species of penguin live on only one of the three islands: in this data set, Gentoo penguins were only recorded on Biscoe island and Chinstrap penguins only on Dream island, whereas Adelie penguins were recorded on all three islands.

However, we also note that the Adelie penguins living on Torgersen island are much smaller overall than Adelie penguins living on other islands.
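
Observations read off a plot are worth checking numerically. As a minimal sketch in base R (the object names `species_by_island`, `adelie` and `adelie_means` are our own), we can tabulate species against island and compare the Adelie group means across islands:

```r
library(palmerpenguins)

# Cross-tabulate species against island to see where each species was recorded
species_by_island <- table(penguins$species, penguins$island)
species_by_island

# Mean body mass and bill length of Adelie penguins on each island
# (the formula interface of aggregate() drops rows with missing values)
adelie <- subset(penguins, species == "Adelie")
adelie_means <- aggregate(cbind(body_mass_g, bill_length_mm) ~ island,
                          data = adelie, FUN = mean)
adelie_means
```

Comparing the rows of `adelie_means` shows how large the differences between islands actually are for a single species.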

# Mixed Subplots {#subplots}

Recall from our [Week 1 Data Science Computer Lab] how we created some histograms for our `palmerpenguins` data set.
Some of the code used for that lab is reproduced below:

```{r class.source = "fold-show", eval = T, echo = T, warning = F, message = F, fig.align = "center"}
penguin_hist <- plot_ly(data = penguins, x = ~body_mass_g, 
                        color = ~island, type = "histogram", alpha = 0.6)

penguin_hist <- penguin_hist %>% layout(yaxis = list(title = 'count'), 
                                        barmode ="overlay")
penguin_hist
```

Suppose that we would like to present all our `palmerpenguins` data visualisations together. We can do this using the `subplot` function.

## {#subplotwalkthrough}

Take a look at the R code below:

```{r class.source = "fold-show", eval = T, echo = T, warning = F, message = F}
penguin_combined_plots <- subplot(penguins_scatter3, penguin_hist, 
                                  nrows = 2, margin = 0.05) 
penguin_combined_plots <- penguin_combined_plots %>% 
                          layout(title = "Palmer Penguin Data",
                                 xaxis = list(title = 'body_mass_g'), 
                                 yaxis = list(title = "flipper_length_mm"),
                                 xaxis2 = list(title = 'body_mass_g'), 
                                 yaxis2 = list(title = "count"))
```


```{r class.source = "fold-show", eval = T, echo = T, warning = F, message = F, fig.dim = c(10, 8), fig.align = "center"}
penguin_combined_plots
```
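
Since both plots have `body_mass_g` on the x-axis, one possible variation is to let them share that axis. The sketch below is not part of the lab. It rebuilds two simplified plots under our own names (`scatter_by_island`, `hist_by_island`, `shared_axis_plots`) and uses the `shareX` argument of `subplot`:

```r
library(palmerpenguins)
library(plotly)

# Use complete rows only, so that both plots are based on the same penguins
penguins_complete <- na.omit(penguins)

scatter_by_island <- plot_ly(data = penguins_complete,
                             x = ~body_mass_g, y = ~flipper_length_mm,
                             color = ~island, type = "scatter", mode = "markers")
hist_by_island <- plot_ly(data = penguins_complete, x = ~body_mass_g,
                          color = ~island, type = "histogram", alpha = 0.6)
hist_by_island <- hist_by_island %>% layout(barmode = "overlay")

# shareX = TRUE gives both rows one common x-axis
shared_axis_plots <- subplot(scatter_by_island, hist_by_island,
                             nrows = 2, margin = 0.05, shareX = TRUE)
shared_axis_plots <- shared_axis_plots %>%
  layout(title = "Palmer Penguin Data",
         xaxis = list(title = "body_mass_g"),
         yaxis = list(title = "flipper_length_mm"),
         yaxis2 = list(title = "count"))
shared_axis_plots
```

With a shared x-axis, zooming in on a range of body masses in one plot should zoom the other plot to the same range.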


## 

```{r class.source = "fold-show", eval = T, echo = T, warning = F, message = F}
penguin_combined_plots_new <- subplot(penguins_scatter_new4, penguin_hist, 
                                  nrows = 2, margin = 0.05) 
penguin_combined_plots_new <- penguin_combined_plots_new %>% 
                          layout(title = "Palmer Penguin Species Data",
                                 xaxis = list(title = 'body_mass_g'), 
                                 yaxis = list(title = "bill_length_mm"),
                                 xaxis2 = list(title = 'body_mass_g'), 
                                 yaxis2 = list(title = "count"))
```

```{r class.source = "fold-show", eval = T, echo = T, warning = F, message = F, fig.dim = c(10, 8), fig.align = "center"}
penguin_combined_plots_new
```

Note that this set of graphs is more informative than the previous subplots, since colour represents island in both graphs here, whereas in the previous subplots colour represented sex in the scatter plot but island in the histogram. It is always important to take such presentation choices into account when developing your subplots.
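
When both plots are coloured by the same variable, the legend may list each island twice. One possible way to tidy this up, sketched here under our own object names and not required for the lab, is to link the traces with the `legendgroup` attribute and hide the second set of legend entries with `showlegend = FALSE`:

```r
library(palmerpenguins)
library(plotly)

penguins_complete <- na.omit(penguins)

# legendgroup ties together the traces for the same island across both plots
scatter_grouped <- plot_ly(data = penguins_complete,
                           x = ~body_mass_g, y = ~bill_length_mm,
                           color = ~island, legendgroup = ~island,
                           type = "scatter", mode = "markers")

# showlegend = FALSE hides the duplicate legend entries of the histogram
hist_grouped <- plot_ly(data = penguins_complete, x = ~body_mass_g,
                        color = ~island, legendgroup = ~island,
                        showlegend = FALSE,
                        type = "histogram", alpha = 0.6)
hist_grouped <- hist_grouped %>% layout(barmode = "overlay")

grouped_legend_plots <- subplot(scatter_grouped, hist_grouped,
                                nrows = 2, margin = 0.05)
grouped_legend_plots
```

Clicking an island in the legend should then show or hide that island in both plots at once.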

<br>

#### That's everything covered. #### {-}

<br>

# References {- #Ref}
<div id="refs"></div>

<br>

<font color = "grey">
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License 
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/CC" target="_blank"> BY-NC-ND. </a>
</font>