Fig 1: PalmerPenguins
The goal of this analysis is to generate insights on how the species of penguins differ from one another.
Data source: Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. The palmerpenguins package contains two datasets: penguins and penguins_raw. This analysis focuses on the penguins dataset.
Data Limitation: The data were collected from 2007 to 2009, so they may not reflect current penguin populations. More recent data would give a more well-rounded analysis.
The palmerpenguins package was installed from CRAN with:
install.packages("palmerpenguins")
Setting up my environment. Note: loading the needed packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.6 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(skimr)
library(ggplot2)
library(here)
## here() starts at C:/Users/USER/Documents
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggpubr)
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.2.1
Previewing the dataset using the skim() function
skim(penguins)
| Name | penguins |
| Number of rows | 344 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| factor | 3 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| species | 0 | 1.00 | FALSE | 3 | Ade: 152, Gen: 124, Chi: 68 |
| island | 0 | 1.00 | FALSE | 3 | Bis: 168, Dre: 124, Tor: 52 |
| sex | 11 | 0.97 | FALSE | 2 | mal: 168, fem: 165 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| bill_length_mm | 2 | 0.99 | 43.92 | 5.46 | 32.1 | 39.23 | 44.45 | 48.5 | 59.6 | ▃▇▇▆▁ |
| bill_depth_mm | 2 | 0.99 | 17.15 | 1.97 | 13.1 | 15.60 | 17.30 | 18.7 | 21.5 | ▅▅▇▇▂ |
| flipper_length_mm | 2 | 0.99 | 200.92 | 14.06 | 172.0 | 190.00 | 197.00 | 213.0 | 231.0 | ▂▇▃▅▂ |
| body_mass_g | 2 | 0.99 | 4201.75 | 801.95 | 2700.0 | 3550.00 | 4050.00 | 4750.0 | 6300.0 | ▃▇▆▃▂ |
| year | 0 | 1.00 | 2008.03 | 0.82 | 2007.0 | 2007.00 | 2008.00 | 2009.0 | 2009.0 | ▇▁▇▁▇ |
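The skim output above reports missing values column by column. Since the later steps rely on drop_na(), it may help to see how many rows that would discard. The following is a minimal sketch in base R; the names na_counts and rows_with_na are my own and not part of the original analysis.

```r
library(palmerpenguins)

# Count missing values in each column (should agree with n_missing above).
na_counts <- colSums(is.na(penguins))
na_counts

# Number of rows with at least one missing value; these are the rows
# that drop_na() with no arguments would remove.
rows_with_na <- sum(!complete.cases(penguins))
rows_with_na
```

Because drop_na() removes a row if any column is missing, a penguin with an unrecorded sex is also left out of the measurement averages.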
Using the clean_names() function to clean the column names
penguins <- clean_names(penguins)
Data manipulation: filtering the dataset by species and calculating the mean value of each parameter
penguins %>% arrange(-body_mass_g)
View(penguins %>% arrange(-body_mass_g))
penguins %>% arrange(-body_mass_g) %>% filter(species=="Gentoo")
Gentoo_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(species=="Gentoo")
Gentoo_penguins %>% drop_na() %>% summarise(avg_bill_length_mm = mean(bill_length_mm),
avg_bill_depth_mm = mean(bill_depth_mm), avg_flipper_length_mm=mean(flipper_length_mm),
avg_body_mass_g = mean(body_mass_g))
penguins %>% arrange(-body_mass_g) %>% filter(species=="Adelie")
Adelie_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(species=="Adelie")
Adelie_penguins %>% drop_na() %>% summarise(avg_bill_length_mm = mean(bill_length_mm),
avg_bill_depth_mm = mean(bill_depth_mm), avg_flipper_length_mm=mean(flipper_length_mm),
avg_body_mass_g = mean(body_mass_g))
penguins %>% arrange(-body_mass_g) %>% filter(species=="Chinstrap")
Chinstrap_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(species=="Chinstrap")
Chinstrap_penguins %>% drop_na() %>% summarise(avg_bill_length_mm = mean(bill_length_mm),
avg_bill_depth_mm = mean(bill_depth_mm), avg_flipper_length_mm=mean(flipper_length_mm),
avg_body_mass_g = mean(body_mass_g))
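The three blocks above repeat the same filter and summarise steps once per species. As an alternative sketch (not the approach used in this report), group_by() with across() can produce all three rows in a single call. The object name species_means is an assumption for illustration.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

species_means <- penguins %>%
  drop_na() %>%                 # same complete-case rule as the blocks above
  group_by(species) %>%         # one summary row per species
  summarise(
    across(
      c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
      mean,
      .names = "avg_{.col}"     # e.g. avg_body_mass_g
    ),
    .groups = "drop"
  )

species_means
```

The result has one row per species and the same avg_ column names as the individual summaries, so it can be compared against them directly.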
Creating a data frame from the mean values found for each species
species <- c("Gentoo", "Chinstrap", "Adelie")
Avg_bill_length_mm <- c(47.6, 48.8, 38.8)
Avg_bill_depth_mm <- c(15, 18.4, 18.3)
Avg_flipper_length_mm <- c(217.2, 196, 190)
Avg_body_mass_g <- c(5092.4, 3733, 3706.2)
# Reuse the vectors above instead of retyping the values
penguins_average <- data.frame(species, Avg_bill_length_mm, Avg_bill_depth_mm,
Avg_flipper_length_mm, Avg_body_mass_g)
Fig 3: Penguin features
#Correlation between flipper_length_mm and body_mass_g
ggscatter(penguins, x = "flipper_length_mm", y = "body_mass_g",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",color="species",
title = "Correlation between flipper_length_mm and body_mass_g")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
#Correlation between flipper_length_mm and bill_length_mm
ggscatter(penguins, x = "flipper_length_mm", y = "bill_length_mm",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",color="species",
title = "Correlation between flipper_length_mm and bill_length_mm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
#Correlation between flipper_length_mm and bill_depth_mm
ggscatter(penguins, x = "flipper_length_mm", y = "bill_depth_mm",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",color="species",
title = "Correlation between flipper_length_mm and bill_depth_mm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
# Correlation between bill_length_mm and bill_depth_mm
ggscatter(penguins, x = "bill_length_mm", y = "bill_depth_mm",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",color="species",
title = "Correlation between bill_length_mm and bill_depth_mm" )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
# Correlation between bill_depth_mm and body_mass_g
ggscatter(penguins, x = "bill_depth_mm", y = "body_mass_g",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",color="species",
title = "Correlation between bill_depth_mm and body_mass_g" )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
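The scatter plots above show each relationship visually. To read the coefficients as numbers, the same Pearson correlations can be computed directly. This is a minimal sketch using base R's cor() and cor.test(); the names measurements and cor_matrix are assumptions for illustration. The coefficients here are pooled across all species.

```r
library(dplyr)
library(palmerpenguins)

# Keep only the four numeric body measurements.
measurements <- penguins %>%
  select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g)

# Pairwise Pearson correlations; "complete.obs" drops rows
# that have a missing measurement.
cor_matrix <- cor(measurements, use = "complete.obs", method = "pearson")
round(cor_matrix, 2)

# A single pair, with a confidence interval and p-value.
cor.test(penguins$flipper_length_mm, penguins$body_mass_g, method = "pearson")
```

The matrix is symmetric, so each pair of variables appears twice; reading one triangle is enough.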
# Bar chart comparing average parameter/characteristics of penguin species
# Average body mass of penguin species
ggplot(penguins_average)+geom_col(aes(species,Avg_body_mass_g,color=species,
fill=species)) + coord_flip() + labs(title="Average body mass of penguin species")
# Average flipper length of penguin species
ggplot(penguins_average)+geom_col(aes(species,Avg_flipper_length_mm,color=species,
fill=species)) + coord_flip() + labs(title="Average flipper length of penguin species")
# Average bill length of penguin species
ggplot(penguins_average)+geom_col(aes(species,Avg_bill_length_mm,color=species,
fill=species)) + coord_flip() + labs(title="Average bill length of penguin species")
# Average bill depth of penguin species
ggplot(penguins_average)+geom_col(aes(species,Avg_bill_depth_mm,color=species,
fill=species)) + coord_flip() + labs(title="Average bill depth of penguin species")
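The four bar charts above differ only in the column being plotted. One possible way to draw them together is to reshape the averages into long format and facet by measurement. This is a sketch only: the data frame is rebuilt here from the values above so the block runs on its own, and averages_long and p are names I introduced. It puts species on the y-axis directly rather than using coord_flip(), which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(tidyr)

# Averages copied from the data frame created earlier in this report.
penguins_average <- data.frame(
  species = c("Gentoo", "Chinstrap", "Adelie"),
  Avg_bill_length_mm = c(47.6, 48.8, 38.8),
  Avg_bill_depth_mm = c(15, 18.4, 18.3),
  Avg_flipper_length_mm = c(217.2, 196, 190),
  Avg_body_mass_g = c(5092.4, 3733, 3706.2)
)

# One row per species and measurement.
averages_long <- pivot_longer(
  penguins_average,
  cols = -species,
  names_to = "measurement",
  values_to = "average"
)

# Horizontal bars, one panel per measurement; each panel gets its own
# x-axis because the measurements are on very different scales.
p <- ggplot(averages_long, aes(x = average, y = species, fill = species)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ measurement, scales = "free_x") +
  labs(title = "Average measurements of penguin species", x = NULL, y = NULL)
p
```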
Comparing penguins based on sex
# Comparing male and female penguins across all species
penguins %>% arrange(-body_mass_g) %>% filter(sex =="male")
Male_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(sex =="male")
Male_penguins %>% summarise(avg_body_mass_g = mean(body_mass_g),avg_bill_length_mm = mean(bill_length_mm),
avg_bill_depth_mm = mean(bill_depth_mm), avg_flipper_length_mm=mean(flipper_length_mm))
penguins %>% arrange(-body_mass_g) %>% filter(sex =="female")
Female_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(sex =="female")
Female_penguins %>% summarise(avg_body_mass_g = mean(body_mass_g),avg_bill_length_mm = mean(bill_length_mm),
avg_bill_depth_mm = mean(bill_depth_mm), avg_flipper_length_mm=mean(flipper_length_mm))
penguin_sex <- c("male", "female")
Average_body_mass <- c(4546, 3862)
Average_flipper_length <- c(205, 197)
Average_bill_length <- c(46, 42)
Average_bill_depth <- c(18, 16)
# Reuse the vectors above instead of retyping the values
Sex_summary <- data.frame(penguin_sex, Average_body_mass, Average_flipper_length,
Average_bill_length, Average_bill_depth)
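The summary above pools all three species, and the species differ in size, so it can be useful to check whether the male and female difference holds inside each species. The sketch below groups by both species and sex; sex_by_species is a name I introduced, and this breakdown is an addition rather than part of the original analysis.

```r
library(dplyr)
library(palmerpenguins)

sex_by_species <- penguins %>%
  filter(!is.na(sex)) %>%        # leave out penguins with no recorded sex
  group_by(species, sex) %>%
  summarise(
    n = n(),                     # group size, to judge how stable each mean is
    across(
      c(body_mass_g, flipper_length_mm, bill_length_mm, bill_depth_mm),
      ~ mean(.x, na.rm = TRUE)
    ),
    .groups = "drop"
  )

sex_by_species
```

Each species contributes two rows, one per sex, which makes it easy to compare the means side by side.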
# Bar chart comparing average penguins parameter/characteristics based on sex
#Average body mass of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_body_mass,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average body mass of male and female penguins")
#Average flipper length of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_flipper_length,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average flipper length of male and female penguins")
#Average bill length of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_bill_length,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average bill length of male and female penguins")
#Average bill depth of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_bill_depth,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average bill depth of male and female penguins")
NOTE: When interpreting correlation, it’s important to remember that just because two variables are correlated, it does not mean that one causes the other.
-> The Gentoo penguin is the largest of the three species, while the Adelie and Chinstrap species are similar in body size.
-> Flipper length correlates positively with body mass: body mass tends to increase as flipper length increases.
-> Flipper length correlates positively with bill length.
-> Flipper length correlates negatively with bill depth.
-> Bill length correlates positively with body mass but has a weak negative correlation with bill depth.
-> The Gentoo species has the longest flippers and the shallowest bill.
-> The Chinstrap species has the longest bill, while the Adelie species has the shortest.
-> The Adelie and Chinstrap species have about the same bill depth.
-> Male penguins are larger than female penguins.