This Quarto document serves as a practical illustration of the concepts covered in the productive workflow online course. It’s designed primarily for educational purposes, so the focus is on demonstrating Quarto techniques rather than on the rigor of its scientific content. The callout reference page is here.
This case study shows how to build an HTML report with Quarto. You can read more about the penguin dataset here.
Let’s load libraries before we start!
# load the tidyverse
library(tidyverse)
library(patchwork)
library(ggthemes)
The dataset has already been loaded and cleaned in the previous step of this pipeline.
Let’s load the clean version, together with a few functions available in functions.R!
# Source functions
source(file = "functions.R")
# Read the clean dataset
data <- readRDS(file = "../input/clean_data.rds")
Now, let’s run some descriptive analysis, including summary statistics and graphs.
What’s striking is the slightly negative relationship between bill length and bill depth:
data %>%
  ggplot(
    aes(x = bill_length_mm, y = bill_depth_mm)
  ) +
  geom_point(color = "#69b3a2") +
  labs(
    x = "Bill Length (mm)",
    y = "Bill Depth (mm)",
    title = paste("Surprising relationship?")
  ) +
  theme_classic()
It is also interesting to note that bill length and bill depth differ considerably from one species to another. The average of a variable can be computed as follows:
\[Avg = \frac{1}{n} \displaystyle\sum_{i=1}^{n} a_i = \frac{a_1+a_2+\dots+a_n}{n}\]
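As a quick check of the formula above, here is a minimal sketch in base R. The vector `a` holds made-up illustrative values (not taken from the penguin dataset), and the names `a_obs`, `avg_manual`, and `avg_builtin` are introduced only for this example. It also shows why `na.rm = TRUE` matters when a variable contains missing values.

```r
# Illustrative values only, NOT taken from the penguin dataset
a <- c(39.1, 39.5, 40.3, NA)

# Drop missing values first, which is what na.rm = TRUE does internally
a_obs <- a[!is.na(a)]

# Apply the formula directly: sum of the values divided by their count
avg_manual <- sum(a_obs) / length(a_obs)

# Built-in equivalent
avg_builtin <- mean(a, na.rm = TRUE)

# Both approaches agree
all.equal(avg_manual, avg_builtin)

# Without na.rm = TRUE, a single missing value makes the mean NA
mean(a)
```

Computing the mean by hand is rarely needed in practice, but it makes explicit that `n` counts only the non-missing observations once `na.rm = TRUE` is set.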
Bill length and bill depth averages are summarized in the two tables below.
# Calculating mean bill length for each species using dplyr
data %>%
  # filter(species == "Adelie") %>%
  group_by(species) %>%
  summarize(mean_bill_length = round(mean(as.numeric(bill_length_mm), na.rm = TRUE), 2))
# Calculating mean bill depth for each species using dplyr
data %>%
  # filter(species == "Adelie") %>%
  group_by(species) %>%
  summarize(mean_bill_depth = round(mean(as.numeric(bill_depth_mm), na.rm = TRUE), 2))
# Storing the mean bill length for Adelie only, for use in the text
adelie_avg <- data %>%
  filter(species == "Adelie") %>%
  # group_by(species) %>%
  summarize(mean_bill_length = round(mean(as.numeric(bill_length_mm), na.rm = TRUE), 2))
# A tibble: 3 × 2
species mean_bill_length
<chr> <dbl>
1 Adelie 38.8
2 Chinstrap 48.8
3 Gentoo 47.5
# A tibble: 3 × 2
species mean_bill_depth
<chr> <dbl>
1 Adelie 18.3
2 Chinstrap 18.4
3 Gentoo 15.0
For instance, the average bill length for the species Adelie is 38.81 mm.
Now, let’s check the relationship between bill depth and bill length for the species Adelie on the island Torgersen:
# Use the function in functions.R
p1 <- create_scatterplot(data, "Adelie", "#6689c6")
p2 <- create_scatterplot(data, "Chinstrap", "#e85252")
p3 <- create_scatterplot(data, "Gentoo", "#9a6fb0")
p1 + p2 + p3
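The body of create_scatterplot() lives in functions.R and is not shown in this report. For readers who want to reproduce the figure, the following is a hypothetical sketch of what such a helper might look like, assuming it filters the data to one species and draws bill depth against bill length in the given color. The argument names `selected_species` and `color` are assumptions, and the real function may differ.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sketch of create_scatterplot(); the version in functions.R may differ
create_scatterplot <- function(data, selected_species, color) {
  data %>%
    # Keep only the rows for the requested species
    filter(species == selected_species) %>%
    ggplot(aes(x = bill_length_mm, y = bill_depth_mm)) +
    # Draw every point in the single color passed by the caller
    geom_point(color = color) +
    labs(
      x = "Bill Length (mm)",
      y = "Bill Depth (mm)",
      title = selected_species
    ) +
    theme_classic()
}
```

Because the helper returns a ggplot object, the three results can be combined side by side with the `+` operator from patchwork, as done above.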