Introduction

In this activity, we use the Palmer penguins dataset, which was collected by Dr. Kristen Gorman. The dataset contains information about the different penguin species found in the Palmer Archipelago, Antarctica. First, we glimpse through the dataset and clean it by removing any rows that contain missing values. Then, we draw plots showcasing the features of the dataset.

Adding required libraries

suppressPackageStartupMessages(library(tidyverse))
library(palmerpenguins)

Exploring Dataset

peng_df <- penguins
glimpse(peng_df)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

# Count the missing values in each column
na_count <- data.frame(na_count = colSums(is.na(peng_df)))
print(na_count)
##                   na_count
## species                  0
## island                   0
## bill_length_mm           2
## bill_depth_mm            2
## flipper_length_mm        2
## body_mass_g              2
## sex                     11
## year                     0

As the counts show, five columns have missing values: the four measurement columns have 2 each, and sex has 11. We will now remove the rows that contain NAs.

Cleaning Data

# Drop every row that has an NA in any column
peng_df_clean <- peng_df %>% drop_na()
# Confirm that no missing values remain
na_count1 <- data.frame(na_count = colSums(is.na(peng_df_clean)))
print(na_count1)
##                   na_count
## species                  0
## island                   0
## bill_length_mm           0
## bill_depth_mm            0
## flipper_length_mm        0
## body_mass_g              0
## sex                      0
## year                     0
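
Because drop_na() discards a row as soon as any one of its columns is NA, it can be useful to check how many observations the cleaning step removes. The following is a minimal sketch of such a check; the variable name rows_dropped is introduced here for illustration only.

```r
library(palmerpenguins)

# Recreate both data frames so this block runs on its own
peng_df <- penguins
peng_df_clean <- tidyr::drop_na(peng_df)

# Number of rows removed by the cleaning step (illustrative name)
rows_dropped <- nrow(peng_df) - nrow(peng_df_clean)

cat("Rows before cleaning:", nrow(peng_df), "\n")
cat("Rows after cleaning: ", nrow(peng_df_clean), "\n")
cat("Rows dropped:        ", rows_dropped, "\n")
```

Since the rows with missing measurements are also missing sex, the number of dropped rows should match the NA count of the sex column.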

Now that the data is clean, we can draw plots showing different dimensions of the dataset.

Plot showing correlation between flipper length and body mass
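
A plot of this kind can be drawn with ggplot2. The following is a minimal sketch, assuming the cleaned data frame from the previous section; colouring by species, the linear trend line, and the axis labels are illustrative choices, not necessarily the settings used for the original figure.

```r
library(ggplot2)
library(palmerpenguins)

# Recreate the cleaned data so this block runs on its own
peng_df_clean <- tidyr::drop_na(penguins)

# Scatter plot of flipper length against body mass, one colour per species,
# with a single linear trend line across all penguins (an assumed choice)
flipper_mass_plot <- ggplot(peng_df_clean,
                            aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(colour = species), alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey30") +
  labs(title = "Flipper length vs. body mass",
       x = "Flipper length (mm)",
       y = "Body mass (g)",
       colour = "Species")

print(flipper_mass_plot)
```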

Plot showing differences based on bill length and depth
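
One way to show these differences is a scatter plot of bill length against bill depth, coloured by species. The sketch below assumes the cleaned data frame from the previous section; the plot type and labels are illustrative and may differ from the original figure.

```r
library(ggplot2)
library(palmerpenguins)

# Recreate the cleaned data so this block runs on its own
peng_df_clean <- tidyr::drop_na(penguins)

# Bill length against bill depth; colour separates the species
bill_plot <- ggplot(peng_df_clean,
                    aes(x = bill_length_mm, y = bill_depth_mm,
                        colour = species)) +
  geom_point(alpha = 0.7) +
  labs(title = "Bill length vs. bill depth by species",
       x = "Bill length (mm)",
       y = "Bill depth (mm)",
       colour = "Species")

print(bill_plot)
```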

Conclusion

From the plots, we can conclude that there is a strong positive correlation between flipper length and body mass, and that Gentoo penguins are the largest species and, together with Chinstrap penguins, have the longest bills.
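
These visual impressions can be checked numerically. The following is a minimal sketch that computes the Pearson correlation between flipper length and body mass and summarises body mass and bill length per species; the names flipper_mass_cor and species_summary are introduced here for illustration.

```r
suppressPackageStartupMessages(library(dplyr))
library(palmerpenguins)

# Recreate the cleaned data so this block runs on its own
peng_df_clean <- tidyr::drop_na(penguins)

# Pearson correlation between flipper length and body mass
flipper_mass_cor <- cor(peng_df_clean$flipper_length_mm,
                        peng_df_clean$body_mass_g)

# Per-species averages to compare overall size and bill length
species_summary <- peng_df_clean %>%
  group_by(species) %>%
  summarise(mean_body_mass_g    = mean(body_mass_g),
            mean_bill_length_mm = mean(bill_length_mm),
            max_bill_length_mm  = max(bill_length_mm),
            .groups = "drop")

print(flipper_mass_cor)
print(species_summary)
```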