In this activity, we use the Palmer penguins dataset, which was collected by Dr. Kristen Gorman. The dataset contains information about the different species of penguins found in the Palmer Archipelago, Antarctica. First, we glimpse the dataset and clean it by removing any rows that contain missing values. Then, we draw plots that showcase the features of the dataset.
suppressPackageStartupMessages(library(tidyverse))
library(palmerpenguins)
peng_df <- penguins
glimpse(peng_df)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <fct> male, female, female, NA, female, male, female, male…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
# Count the missing values in each column
na_count <- data.frame(na_count = colSums(is.na(peng_df)))
print(na_count)
## na_count
## species 0
## island 0
## bill_length_mm 2
## bill_depth_mm 2
## flipper_length_mm 2
## body_mass_g 2
## sex 11
## year 0
As we can see, five of the eight columns have missing values: the four body measurements have 2 each, and sex has 11. Now we will remove the rows that contain NAs.
peng_df_clean <- peng_df %>% drop_na()
# Recount the missing values to confirm that none remain
na_count1 <- data.frame(na_count = colSums(is.na(peng_df_clean)))
print(na_count1)
## na_count
## species 0
## island 0
## bill_length_mm 0
## bill_depth_mm 0
## flipper_length_mm 0
## body_mass_g 0
## sex 0
## year 0
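Dropping rows also shrinks the dataset, so it can help to check how many rows the cleaning step removed. The following is a minimal sketch, not part of the original analysis. The names `rows_before`, `rows_after`, `rows_dropped`, and `peng_df_measured` are illustrative, and the second call assumes that only the four measurement columns need to be complete.

```r
library(tidyr)
library(palmerpenguins)

# Same cleaning step as above: drop every row with at least one NA
peng_df_clean <- drop_na(penguins)

# Compare row counts before and after cleaning
rows_before <- nrow(penguins)
rows_after <- nrow(peng_df_clean)
rows_dropped <- rows_before - rows_after
cat("Rows before:", rows_before, "after:", rows_after, "dropped:", rows_dropped, "\n")

# Assumption: if sex is not needed, require only the measurement columns to be present
peng_df_measured <- drop_na(
  penguins,
  bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g
)
cat("Rows with complete measurements:", nrow(peng_df_measured), "\n")
```

Restricting `drop_na()` to specific columns keeps the rows where only sex is missing, which may be preferable when the plots do not use that column.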
Now that the data is clean, we will draw plots depicting its different dimensions.
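The plotting code is not shown in this report, so the following is only a sketch of how such plots could be drawn with ggplot2. The choice of a scatter plot and a box plot, the object names `p_mass` and `p_bill`, and the labels are assumptions rather than the original figures.

```r
library(ggplot2)
library(tidyr)
library(palmerpenguins)

# Recreate the cleaned data so this block runs on its own
peng_df_clean <- drop_na(penguins)

# Scatter plot: flipper length against body mass, colored by species
p_mass <- ggplot(
  peng_df_clean,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point(alpha = 0.7) +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species",
    title = "Flipper length vs. body mass"
  )

# Box plot: distribution of bill length within each species
p_bill <- ggplot(
  peng_df_clean,
  aes(x = species, y = bill_length_mm, fill = species)
) +
  geom_boxplot(show.legend = FALSE) +
  labs(
    x = "Species",
    y = "Bill length (mm)",
    title = "Bill length by species"
  )

print(p_mass)
print(p_bill)
```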
From the plots, we can conclude that there is a strong positive correlation between flipper length and body mass, and that Gentoo penguins are the largest species and have among the longest bills.
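Visual impressions like these can be checked numerically. The sketch below computes the Pearson correlation between flipper length and body mass, plus per-species summaries. It is an illustrative check, not part of the original analysis, and the names `flipper_mass_cor` and `species_means` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Recreate the cleaned data so this block runs on its own
peng_df_clean <- drop_na(penguins)

# Pearson correlation between flipper length and body mass
flipper_mass_cor <- cor(
  peng_df_clean$flipper_length_mm,
  peng_df_clean$body_mass_g
)
print(flipper_mass_cor)

# Mean size measurements and longest bill per species
species_means <- peng_df_clean %>%
  group_by(species) %>%
  summarise(
    mean_body_mass_g = mean(body_mass_g),
    mean_flipper_length_mm = mean(flipper_length_mm),
    mean_bill_length_mm = mean(bill_length_mm),
    max_bill_length_mm = max(bill_length_mm)
  )
print(species_means)
```

Comparing the mean and maximum bill length columns shows whether a statement about the "longest bills" holds on average or only for individual birds.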