Extra credit

Author

Heleine Fouda

Research questions

This preliminary analysis explores three possible relationships in the penguins data: between bill depth and bill length (Fig. 1), between sex and body mass (Fig. 2), and between geographic location (island) and body mass (Fig. 3).

Data overview

The present analysis uses the penguins data set from the palmerpenguins package (Horst, Hill, and Gorman 2020). The package presents size measurements, clutch observations, and blood isotope ratios for three species of penguins observed in the Palmer Archipelago near Palmer Station, Antarctica. The data were collected from 2007 to 2009.

# load packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(palmerpenguins)
library(gt)

Data exploration

The penguins data frame is a 344 × 8 table: 344 observations (rows) of 8 variables (columns). Each variable is a factor, a double (decimal number), or an integer.

# glimpse-penguins
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
# str - penguins
str(penguins)
tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
# head -penguins
head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
# names - penguins
names(penguins)
[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"             
# unique rows (all 344 rows are distinct, so nothing is dropped)
unique(penguins)
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
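
The printed rows above include NA entries (row 4, for example), so it may be useful to count the missing values per column before plotting. The following is a minimal sketch using base R only; the names na_counts and n_complete are illustrative and do not appear in the original analysis.

```r
library(palmerpenguins)

# Count the NA entries in each column of the penguins data frame
na_counts <- colSums(is.na(penguins))
print(na_counts)

# Number of rows with no missing value in any column
n_complete <- sum(complete.cases(penguins))
print(n_complete)
```

The per-column counts should agree with the NA's rows reported by summary(penguins).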

Descriptive statistics

summary(penguins)
      species          island    bill_length_mm  bill_depth_mm  
 Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
 Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
 Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
                                 Mean   :43.92   Mean   :17.15  
                                 3rd Qu.:48.50   3rd Qu.:18.70  
                                 Max.   :59.60   Max.   :21.50  
                                 NA's   :2       NA's   :2      
 flipper_length_mm  body_mass_g       sex           year     
 Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
 1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
 Median :197.0     Median :4050   NA's  : 11   Median :2008  
 Mean   :200.9     Mean   :4202                Mean   :2008  
 3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
 Max.   :231.0     Max.   :6300                Max.   :2009  
 NA's   :2         NA's   :2                                 
#ratio male-female
ratio_male_female <- 168/165
print(ratio_male_female)
[1] 1.018182
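
The ratio above is typed in from the summary() output. As a hedged alternative, the same ratio can be derived directly from the data, which avoids copying the counts by hand. The names n_male, n_female, and ratio_from_data are assumptions introduced for this sketch.

```r
library(palmerpenguins)

# Count penguins of each recorded sex; na.rm = TRUE skips penguins of unknown sex
n_male <- sum(penguins$sex == "male", na.rm = TRUE)
n_female <- sum(penguins$sex == "female", na.rm = TRUE)

# Male-to-female ratio computed from the counts rather than typed-in constants
ratio_from_data <- n_male / n_female
print(ratio_from_data)
```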

Data visualization

The figure below is a scatterplot of bill depth against bill length, colored by species (Fig. 1).

#| label: fig-bill-dims-species
#| fig-width: 5
#| fig-asp: 0.618
#| fig-alt: |
#|   A scatterplot of penguins' bill depth and length, colored by species of penguins. Within each species, bill depth tends to increase with bill length.
#| fig-cap: A scatterplot of penguins' bill depth and length, colored by species of penguins.

ggplot(data = penguins,
       aes(
         x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)) +
  geom_point() +
  theme_classic() +
  # labs() must be joined with `+`; otherwise the axis labels are never applied
  labs(x = "Bill length (mm)", y = "Bill depth (mm)")
Warning: Removed 2 rows containing missing values (`geom_point()`).

The figure below plots penguins' body mass against their sex, colored by species (Fig. 2).

ggplot(data = penguins,
       aes(
         x = sex, y = body_mass_g, color = species, shape = species)) +
  geom_point() +
  theme_classic() +
  # labs() must be joined with `+`; otherwise the axis labels are never applied
  labs(x = "Sex", y = "Body mass (g)")
Warning: Removed 2 rows containing missing values (`geom_point()`).
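
Because sex is categorical, many points in the plot above overlap, so a numeric summary may make the comparison easier to read. The following is a minimal sketch, assuming the mean is an acceptable summary of body mass; the name mass_by_sex is illustrative and not part of the original analysis.

```r
library(dplyr)
library(palmerpenguins)

# Mean body mass for each species and sex, ignoring penguins with a
# missing sex or a missing body mass
mass_by_sex <- penguins %>%
  filter(!is.na(sex), !is.na(body_mass_g)) %>%
  group_by(species, sex) %>%
  summarise(
    n = n(),
    mean_mass_g = mean(body_mass_g),
    .groups = "drop"
  )
print(mass_by_sex)
```

Each row of the result corresponds to one species and sex combination, so differences between the sexes can be compared within a species.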

The figure below plots penguins' body mass against the island where they were observed, colored by species (Fig. 3).

ggplot(data = penguins,
       aes(
         x = island, y = body_mass_g, color = species, shape = species)) +
  geom_point() +
  theme_classic() +
  # labs() must be joined with `+`; otherwise the axis labels are never applied
  labs(x = "Island", y = "Body mass (g)")
Warning: Removed 2 rows containing missing values (`geom_point()`).

The figure below plots penguins' body mass against their species, colored by island.

ggplot(data = penguins,
       aes(
         x = species, y = body_mass_g, color = island)) +
  geom_point() +
  theme_classic() +
  # labs() must be joined with `+`; otherwise the axis labels are never applied
  labs(x = "Species", y = "Body mass (g)")
Warning: Removed 2 rows containing missing values (`geom_point()`).
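
The last two plots combine island and species, so it may help to check how the species are distributed across the islands before attributing differences in body mass to location. This sketch uses dplyr::count(); the name species_by_island is an assumption introduced here.

```r
library(dplyr)
library(palmerpenguins)

# Number of penguins observed for each island and species combination;
# combinations that never occur are simply absent from the result
species_by_island <- penguins %>%
  count(island, species)
print(species_by_island)
```

If a species appears on only one island, island and species cannot be separated in these data, and any island effect on body mass should be read with that in mind.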

Data transformation

# Keeping only the rows for male penguins
penguins %>% 
  filter(sex == "male")
# A tibble: 168 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.3          20.6               190        3650
 3 Adelie  Torgersen           39.2          19.6               195        4675
 4 Adelie  Torgersen           38.6          21.2               191        3800
 5 Adelie  Torgersen           34.6          21.1               198        4400
 6 Adelie  Torgersen           42.5          20.7               197        4500
 7 Adelie  Torgersen           46            21.5               194        4200
 8 Adelie  Biscoe              37.7          18.7               180        3600
 9 Adelie  Biscoe              38.2          18.1               185        3950
10 Adelie  Biscoe              38.8          17.2               180        3800
# ℹ 158 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# Keeping only the rows for female penguins
penguins %>% 
  filter(sex == "female")
# A tibble: 165 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.5          17.4               186        3800
 2 Adelie  Torgersen           40.3          18                 195        3250
 3 Adelie  Torgersen           36.7          19.3               193        3450
 4 Adelie  Torgersen           38.9          17.8               181        3625
 5 Adelie  Torgersen           41.1          17.6               182        3200
 6 Adelie  Torgersen           36.6          17.8               185        3700
 7 Adelie  Torgersen           38.7          19                 195        3450
 8 Adelie  Torgersen           34.4          18.4               184        3325
 9 Adelie  Biscoe              37.8          18.3               174        3400
10 Adelie  Biscoe              35.9          19.2               189        3800
# ℹ 155 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# Dropping penguins of unknown sex and adding the male-to-female ratio
# as a new column (the same constant value in every row)
penguins %>% 
  filter(sex == "female" | sex == "male") %>% 
  mutate(ratio_male_to_female = ratio_male_female)
# A tibble: 333 × 9
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 323 more rows
# ℹ 3 more variables: sex <fct>, year <int>, ratio_male_to_female <dbl>
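
The column added above holds one overall ratio repeated in every row. As a hedged variation, the ratio can instead be computed separately for each species, which gives one informative value per group. The name ratio_by_species is illustrative, and the sketch assumes the tidyr package (part of the tidyverse loaded earlier) is available.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Count males and females per species, spread the counts into two columns,
# then divide to obtain a per-species male-to-female ratio
ratio_by_species <- penguins %>%
  filter(!is.na(sex)) %>%
  count(species, sex) %>%
  pivot_wider(names_from = sex, values_from = n) %>%
  mutate(ratio_male_to_female = male / female)
print(ratio_by_species)
```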

Creating a new table

# Adding a new ratio column to the original data set
penguins_new <- penguins %>% 
  filter(sex == "female" | sex == "male") %>% 
  mutate(ratio_male_to_female = ratio_male_female)
view(penguins_new)
# Note: the column names below are quoted, so desc() receives constant strings
# and the row order is left unchanged, as the output shows
penguins_new %>% 
  arrange(desc('bill_length_mm')) %>% 
  arrange(desc('body_mass_g')) %>% 
  arrange(desc('year')) 
# A tibble: 333 × 9
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 323 more rows
# ℹ 3 more variables: sex <fct>, year <int>, ratio_male_to_female <dbl>
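
The output above is in the same order as the unsorted table because the arrange() calls receive quoted strings rather than column names. A minimal sketch of a working sort follows, assuming the intent was to order by year first, then body mass, then bill length, all descending; that priority is an assumption, and penguins_sorted is a name introduced for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Unquoted column names let desc() see the actual column values;
# a single arrange() call sorts by year, breaking ties by body mass,
# then by bill length
penguins_sorted <- penguins %>%
  filter(!is.na(sex)) %>%
  arrange(desc(year), desc(body_mass_g), desc(bill_length_mm))
head(penguins_sorted)
```

Listing all keys in one arrange() call makes the sort priority explicit, whereas chaining separate arrange() calls re-sorts the table each time.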

The table below shows a much smaller penguins data set. The table contains only 15 rows and 6 renamed (i.e., capitalized) variables.

#| label: tbl-penguins-top15
#| tbl-cap: First 15 penguins
# Building a new and smaller penguins table
penguins_new %>% 
  slice_head(n = 15) %>% 
  # Naming the columns to keep is enough; the others are dropped automatically
  select(year, island, species, bill_length_mm, body_mass_g, ratio_male_to_female) %>% 
  rename(Year = year, Island = island, Species = species,
         Bill_length = bill_length_mm, Body_mass = body_mass_g,
         Ratio_male_female = ratio_male_to_female) %>% 
  gt()
Year  Island     Species  Bill_length  Body_mass  Ratio_male_female
2007  Torgersen  Adelie          39.1       3750           1.018182
2007  Torgersen  Adelie          39.5       3800           1.018182
2007  Torgersen  Adelie          40.3       3250           1.018182
2007  Torgersen  Adelie          36.7       3450           1.018182
2007  Torgersen  Adelie          39.3       3650           1.018182
2007  Torgersen  Adelie          38.9       3625           1.018182
2007  Torgersen  Adelie          39.2       4675           1.018182
2007  Torgersen  Adelie          41.1       3200           1.018182
2007  Torgersen  Adelie          38.6       3800           1.018182
2007  Torgersen  Adelie          34.6       4400           1.018182
2007  Torgersen  Adelie          36.6       3700           1.018182
2007  Torgersen  Adelie          38.7       3450           1.018182
2007  Torgersen  Adelie          42.5       4500           1.018182
2007  Torgersen  Adelie          34.4       3325           1.018182
2007  Torgersen  Adelie          46.0       4200           1.018182
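
The table above renames the columns before passing them to gt(). As a hedged alternative, gt can keep the original column names and set display labels, including units, with cols_label(). The label text and the name penguins_table are assumptions made for this sketch, and the constant ratio column is left out for brevity.

```r
library(dplyr)
library(gt)
library(palmerpenguins)

# Build the 15-row table, then label and format columns inside gt
penguins_table <- penguins %>%
  filter(!is.na(sex)) %>%
  slice_head(n = 15) %>%
  select(year, island, species, bill_length_mm, body_mass_g) %>%
  gt() %>%
  cols_label(
    year = "Year",
    island = "Island",
    species = "Species",
    bill_length_mm = "Bill length (mm)",
    body_mass_g = "Body mass (g)"
  ) %>%
  # Show bill length with one decimal place
  fmt_number(columns = bill_length_mm, decimals = 1)
penguins_table
```

Keeping the data frame's column names unchanged means later code can still refer to bill_length_mm and body_mass_g, while readers see the friendlier labels.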

References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. “Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data.” https://doi.org/10.5281/zenodo.3960218.