ORDER OF ACTIVITY:

OBJECTIVES OF ANALYSIS

DATA PREPARATION

DATA PROCESSING

DATA ANALYSIS

Fig 1: PalmerPenguins

OBJECTIVES OF ANALYSIS

The goal of this analysis is to generate insights on how the species of penguins differ from one another.

DATA PREPARATION

Data source: Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. The palmerpenguins package contains two datasets: penguins and penguins_raw. This analysis focuses on the penguins dataset.

Data Limitation: The data were collected between 2007 and 2009, so the results may not reflect present-day penguin populations. More recent data would give a more well-rounded analysis.

DATA PROCESSING

The palmerpenguins package was installed from CRAN with:

install.packages("palmerpenguins")

Setting up my environment

NOTES: Loading the needed packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.6     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(skimr)
library(ggplot2)
library(here)
## here() starts at C:/Users/USER/Documents
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(ggpubr)
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.2.1

Previewing the dataset using the skim function

skim(penguins)
Data summary
Name penguins
Number of rows 344
Number of columns 8
_______________________
Column type frequency:
factor 3
numeric 5
________________________
Group variables None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
species                0           1.00  FALSE           3  Ade: 152, Gen: 124, Chi: 68
island                 0           1.00  FALSE           3  Bis: 168, Dre: 124, Tor: 52
sex                   11           0.97  FALSE           2  mal: 168, fem: 165

Variable type: numeric

skim_variable      n_missing  complete_rate     mean      sd      p0      p25      p50     p75    p100  hist
bill_length_mm             2           0.99    43.92    5.46    32.1    39.23    44.45    48.5    59.6  ▃▇▇▆▁
bill_depth_mm              2           0.99    17.15    1.97    13.1    15.60    17.30    18.7    21.5  ▅▅▇▇▂
flipper_length_mm          2           0.99   200.92   14.06   172.0   190.00   197.00   213.0   231.0  ▂▇▃▅▂
body_mass_g                2           0.99  4201.75  801.95  2700.0  3550.00  4050.00  4750.0  6300.0  ▃▇▆▃▂
year                       0           1.00  2008.03    0.82  2007.0  2007.00  2008.00  2009.0  2009.0  ▇▁▇▁▇
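The n_missing column above can be cross-checked directly. This is a minimal sketch using only base R functions (is.na, colSums, complete.cases); the object names missing_per_column and rows_with_na are mine and are not part of the original analysis.

```r
library(palmerpenguins)

# Count missing values in each column of the penguins dataset
missing_per_column <- colSums(is.na(penguins))
missing_per_column

# Count rows that have at least one missing value;
# these are the rows that drop_na() would remove
rows_with_na <- sum(!complete.cases(penguins))
rows_with_na
```

Knowing how many rows are incomplete helps when deciding whether to drop them before calculating means.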

Using the clean_names function to clean the column names

penguins <- clean_names(penguins)
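To see what clean_names does, here is a short sketch. It assumes the column names of penguins are already in snake_case, in which case the call should leave them unchanged, and it uses penguins_raw (not otherwise used in this analysis) to show an actual change. The names raw_names and cleaned_names are illustrative only.

```r
library(palmerpenguins)
library(janitor)

# penguins already uses snake_case names, so clean_names() should not change them
names_unchanged <- identical(names(clean_names(penguins)), names(penguins))
names_unchanged

# penguins_raw has names such as "Culmen Length (mm)";
# clean_names() converts them to lower-case names joined by underscores
raw_names <- names(penguins_raw)
cleaned_names <- names(clean_names(penguins_raw))
head(data.frame(raw_names, cleaned_names), 10)
```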

Fig 2: Showing Penguin Species

Data manipulation: filtering and grouping the dataset by species, and calculating mean values for each parameter

# Sort all penguins from heaviest to lightest
penguins %>% arrange(-body_mass_g)
View(penguins %>% arrange(-body_mass_g))

# Gentoo: keep one species, then average each measurement (rows with NA dropped)
Gentoo_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(species == "Gentoo")
Gentoo_penguins
Gentoo_penguins %>% drop_na() %>%
  summarise(avg_bill_length_mm = mean(bill_length_mm),
            avg_bill_depth_mm = mean(bill_depth_mm),
            avg_flipper_length_mm = mean(flipper_length_mm),
            avg_body_mass_g = mean(body_mass_g))

# Adelie
Adelie_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(species == "Adelie")
Adelie_penguins
Adelie_penguins %>% drop_na() %>%
  summarise(avg_bill_length_mm = mean(bill_length_mm),
            avg_bill_depth_mm = mean(bill_depth_mm),
            avg_flipper_length_mm = mean(flipper_length_mm),
            avg_body_mass_g = mean(body_mass_g))

# Chinstrap
Chinstrap_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(species == "Chinstrap")
Chinstrap_penguins
Chinstrap_penguins %>% drop_na() %>%
  summarise(avg_bill_length_mm = mean(bill_length_mm),
            avg_bill_depth_mm = mean(bill_depth_mm),
            avg_flipper_length_mm = mean(flipper_length_mm),
            avg_body_mass_g = mean(body_mass_g))
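The three per-species summaries above could also be produced in one step. The following is a sketch of an alternative using dplyr's group_by with summarise. It is not the approach used in this analysis; species_means and the extra n column are names I introduce for illustration.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

species_means <- penguins %>%
  drop_na() %>%            # drop rows with any missing value, as in the code above
  group_by(species) %>%    # one group per species
  summarise(
    avg_bill_length_mm    = mean(bill_length_mm),
    avg_bill_depth_mm     = mean(bill_depth_mm),
    avg_flipper_length_mm = mean(flipper_length_mm),
    avg_body_mass_g       = mean(body_mass_g),
    n                     = n()   # number of complete rows per species
  )
species_means
```

This returns one row per species, so the averages do not need to be copied by hand from three separate outputs.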

Creating dataframe after finding mean parameters for each species

species <- c("Gentoo", "Chinstrap", "Adelie")
Avg_bill_length_mm <- c(47.6, 48.8, 38.8)
Avg_bill_depth_mm <- c(15, 18.4, 18.3)
Avg_flipper_length_mm <- c(217.2, 196, 190)
Avg_body_mass_g <- c(5092.4, 3733, 3706.2)

penguins_average <- data.frame(species = species,
                               Avg_bill_length_mm = Avg_bill_length_mm,
                               Avg_bill_depth_mm = Avg_bill_depth_mm,
                               Avg_flipper_length_mm = Avg_flipper_length_mm,
                               Avg_body_mass_g = Avg_body_mass_g)

DATA ANALYSIS

Fig 3: Penguin features

# Correlation between flipper_length_mm and body_mass_g

ggscatter(penguins, x = "flipper_length_mm", y = "body_mass_g",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson", color = "species",
          title = "Correlation between flipper_length_mm and body_mass_g")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).

# Correlation between flipper_length_mm and bill_length_mm

ggscatter(penguins, x = "flipper_length_mm", y = "bill_length_mm",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson", color = "species",
          title = "Correlation between flipper_length_mm and bill_length_mm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).

# Correlation between flipper_length_mm and bill_depth_mm

ggscatter(penguins, x = "flipper_length_mm", y = "bill_depth_mm",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson", color = "species",
          title = "Correlation between flipper_length_mm and bill_depth_mm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).

# Correlation between bill_length_mm and bill_depth_mm

ggscatter(penguins, x = "bill_length_mm", y = "bill_depth_mm", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",color="species",
          title = "Correlation between bill_length_mm and bill_depth_mm" )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).

# Correlation between bill_depth_mm and body_mass_g

ggscatter(penguins, x = "bill_depth_mm", y = "body_mass_g", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "pearson",color="species",
          title = "Correlation between bill_depth_mm and body_mass_g" )
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_cor).
## Warning: Removed 2 rows containing missing values (geom_point).
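The correlation coefficients can also be calculated as plain numbers, which makes it easier to compare the pooled value with the value inside each species. This is a sketch using base R's cor with use = "complete.obs" to skip rows with missing values; overall_r and by_species_r are names I introduce here. For bill length and bill depth the two views disagree in sign: the pooled coefficient is negative, while the coefficient within each species is positive.

```r
library(dplyr)
library(palmerpenguins)

# Pearson correlation over all penguins together, ignoring species
overall_r <- cor(penguins$bill_length_mm, penguins$bill_depth_mm,
                 use = "complete.obs", method = "pearson")
overall_r

# The same correlation computed separately within each species
by_species_r <- penguins %>%
  group_by(species) %>%
  summarise(r = cor(bill_length_mm, bill_depth_mm,
                    use = "complete.obs", method = "pearson"))
by_species_r
```

Because the species differ so much in size and bill shape, it is worth checking both views before reading a pooled correlation as a pattern that holds for every species.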

# Bar chart comparing average parameter/characteristics of penguin species

# Average body mass of penguin species
ggplot(penguins_average) +
  geom_col(aes(species, Avg_body_mass_g, color = species, fill = species)) +
  coord_flip() +
  labs(title = "Average body mass of penguin species")

# Average flipper length of penguin species
ggplot(penguins_average) +
  geom_col(aes(species, Avg_flipper_length_mm, color = species, fill = species)) +
  coord_flip() +
  labs(title = "Average flipper length of penguin species")

# Average bill length of penguin species
ggplot(penguins_average) +
  geom_col(aes(species, Avg_bill_length_mm, color = species, fill = species)) +
  coord_flip() +
  labs(title = "Average bill length of penguin species")

# Average bill depth of penguin species
ggplot(penguins_average) +
  geom_col(aes(species, Avg_bill_depth_mm, color = species, fill = species)) +
  coord_flip() +
  labs(title = "Average bill depth of penguin species")
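The four bar charts above could also be drawn as one faceted chart. This is a sketch of that alternative and not part of the original analysis: it reshapes penguins_average to long format with tidyr's pivot_longer, then uses facet_wrap. The data frame is rebuilt here so the example runs on its own; penguins_average_long, parameter and average are names I introduce.

```r
library(ggplot2)
library(tidyr)

# Same averages as in the data frame created earlier
penguins_average <- data.frame(
  species = c("Gentoo", "Chinstrap", "Adelie"),
  Avg_bill_length_mm = c(47.6, 48.8, 38.8),
  Avg_bill_depth_mm = c(15, 18.4, 18.3),
  Avg_flipper_length_mm = c(217.2, 196, 190),
  Avg_body_mass_g = c(5092.4, 3733, 3706.2)
)

# One row per species and parameter
penguins_average_long <- pivot_longer(
  penguins_average,
  cols = -species,
  names_to = "parameter",
  values_to = "average"
)

# Horizontal bars, one panel per parameter, each with its own value axis
p <- ggplot(penguins_average_long) +
  geom_col(aes(x = average, y = species, fill = species)) +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(title = "Average parameters of penguin species")
p
```

The free x scale matters here because body mass is in grams while the other parameters are in millimetres.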

Comparing penguins based on sex

# Comparing male and female penguins across all species

# Male penguins, sorted from heaviest to lightest
Male_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(sex == "male")
Male_penguins
Male_penguins %>%
  summarise(avg_body_mass_g = mean(body_mass_g),
            avg_bill_length_mm = mean(bill_length_mm),
            avg_bill_depth_mm = mean(bill_depth_mm),
            avg_flipper_length_mm = mean(flipper_length_mm))

# Female penguins, sorted from heaviest to lightest
Female_penguins <- penguins %>% arrange(-body_mass_g) %>% filter(sex == "female")
Female_penguins
Female_penguins %>%
  summarise(avg_body_mass_g = mean(body_mass_g),
            avg_bill_length_mm = mean(bill_length_mm),
            avg_bill_depth_mm = mean(bill_depth_mm),
            avg_flipper_length_mm = mean(flipper_length_mm))

# Rounded averages collected into one data frame for plotting
penguin_sex <- c("male", "female")
Average_body_mass <- c(4546, 3862)
Average_flipper_length <- c(205, 197)
Average_bill_length <- c(46, 42)
Average_bill_depth <- c(18, 16)

Sex_summary <- data.frame(penguin_sex = penguin_sex,
                          Average_body_mass = Average_body_mass,
                          Average_flipper_length = Average_flipper_length,
                          Average_bill_length = Average_bill_length,
                          Average_bill_depth = Average_bill_depth)
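As with the species averages, the averages by sex could be calculated in one grouped step. This is a sketch of an alternative, not the method used above. Rows with unknown sex are removed first, and sex_means and species_sex_means are names I introduce. The second summary, which splits by species as well as sex, goes slightly beyond the analysis above.

```r
library(dplyr)
library(palmerpenguins)

# Averages for male and female penguins across all species
sex_means <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(sex) %>%
  summarise(
    avg_body_mass_g       = mean(body_mass_g),
    avg_flipper_length_mm = mean(flipper_length_mm),
    avg_bill_length_mm    = mean(bill_length_mm),
    avg_bill_depth_mm     = mean(bill_depth_mm)
  )
sex_means

# Average body mass for each combination of species and sex
species_sex_means <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarise(avg_body_mass_g = mean(body_mass_g), .groups = "drop")
species_sex_means
```

Splitting by species as well as sex shows whether the difference between males and females holds within each species, or only when all three are pooled.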


# Bar chart comparing average penguins parameter/characteristics based on sex

#Average body mass of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_body_mass,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average body mass of male and female penguins")

#Average flipper length of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_flipper_length,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average flipper length of male and female penguins")

#Average bill length of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_bill_length,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average bill length of male and female penguins")

#Average bill depth of male and female penguins
ggplot(Sex_summary)+geom_col(aes(penguin_sex,Average_bill_depth,color=penguin_sex,
fill=penguin_sex)) + coord_flip() + labs(title="Average bill depth of male and female penguins")

INSIGHTS

NOTE: When interpreting correlation, it’s important to remember that just because two variables are correlated, it does not mean that one causes the other.

-> The Gentoo penguin species is the largest of the three, while the Adelie and Chinstrap species are similar in body size.

-> Flipper length correlates positively with body mass: penguins with longer flippers tend to be heavier.

-> Flipper length correlates positively with bill length.

-> Flipper length correlates negatively with bill depth.

-> Bill length correlates positively with body mass but has a weak negative correlation with bill depth.

-> The Gentoo species has the longest average flipper length and the smallest average bill depth.

-> The Chinstrap species has the longest average bill length, while the Adelie species has the shortest.

-> The Adelie and Chinstrap species have about the same average bill depth.

-> Male penguins are larger than female penguins on average.