Ask

Penguins are flightless seabirds of the order Sphenisciformes, found mostly in the Southern Hemisphere and recognizable by their upright stance, short legs, and stark black-and-white plumage.

The study task

The task is to analyze and show trends in the behavioral patterns, body sizes, and flipper lengths of penguins, with respect to their species and the islands on which the data were collected.

Prepare

The Palmer Archipelago (Antarctica) penguin data were obtained from Kaggle. The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. The data were collected from 2007 to 2009.

  1. Data Source: The data is a public dataset accessible to everyone.
  2. Data Organization: The data are organized into two files, whose columns mainly hold information about species, island, body mass, flipper length, sex, clutch completion, and related measurements.
  3. Data Integrity: Although information about all three known species was collected, the sample size representing the whole population falls short of 30. The palmerpenguins dataset contains only two files, and hence the dataset could be biased. It is also likely that, before the years in which the data were collected, additional penguin species were present on the islands where the information was gathered.

Process

For this analysis, each file was checked in Excel for duplicates and for leading, trailing, and repeated spaces. None were detected.
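The same checks could also be reproduced in R rather than Excel. The following is only a sketch on a small made-up data frame: the `toy` data and the `squish` helper are illustrative assumptions, not part of the original analysis. It counts fully duplicated rows and text cells that contain stray spaces.

```r
# Toy data standing in for one of the penguin files (values are made up)
toy <- data.frame(
  species     = c("Adelie", "Adelie ", "Gentoo", "Gentoo"),
  island      = c("Torgersen", "Torgersen", "Biscoe", "Biscoe"),
  body_mass_g = c(3750, 3800, 5000, 5000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim both ends and collapse repeated spaces
squish <- function(x) gsub(" +", " ", trimws(x))

# Number of rows that exactly repeat an earlier row
n_duplicates <- sum(duplicated(toy))

# Number of text cells that change after squishing, i.e. contain stray spaces
is_text <- vapply(toy, is.character, logical(1))
n_space_issues <- sum(vapply(
  toy[is_text],
  function(x) sum(x != squish(x), na.rm = TRUE),
  integer(1)
))

print(c(duplicates = n_duplicates, space_issues = n_space_issues))
```

On the real files, both counts would be expected to be zero if the Excel check above holds.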

I then set up my environment in R to get summaries of the two dataset files.

This meant loading the ‘tidyverse’ and other packages useful for the analysis and visualizations.

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## Warning: package 'purrr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Rows: 344 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): studyName, Species, Region, Island, Stage, Individual ID, Clutch C...
## dbl  (7): Sample Number, Culmen Length (mm), Culmen Depth (mm), Flipper Leng...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 344 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (4): culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
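The messages above look like what readr's `read_csv()` prints when it loads each file. The code that produced them is not shown, so the following is only a sketch of such a loading step. It assumes the file would normally be read from disk (for example `read_csv("penguins_size.csv")`, a path that is a guess); here a temporary file holding a few rows copied from this report stands in for it.

```r
library(readr)

# Stand-in for the real file: a temporary CSV with a few rows from this report
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex",
  "Adelie,Torgersen,39.1,18.7,181,3750,MALE",
  "Adelie,Torgersen,39.5,17.4,186,3800,FEMALE",
  "Adelie,Torgersen,NA,NA,NA,NA,NA"
), tmp)

# show_col_types = FALSE quiets the column specification message
penguins_size <- read_csv(tmp, show_col_types = FALSE)

# Rows and columns, comparable to the "Rows: ... Columns: ..." line
print(dim(penguins_size))
```

By default `read_csv()` treats the literal text `NA` as missing, which is why the last row loads as missing values rather than as strings.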

Analyze

Getting to know the data

## spec_tbl_df [344 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ studyName          : chr [1:344] "PAL0708" "PAL0708" "PAL0708" "PAL0708" ...
##  $ Sample Number      : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Species            : chr [1:344] "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" ...
##  $ Region             : chr [1:344] "Anvers" "Anvers" "Anvers" "Anvers" ...
##  $ Island             : chr [1:344] "Torgersen" "Torgersen" "Torgersen" "Torgersen" ...
##  $ Stage              : chr [1:344] "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" ...
##  $ Individual ID      : chr [1:344] "N1A1" "N1A2" "N2A1" "N2A2" ...
##  $ Clutch Completion  : chr [1:344] "Yes" "Yes" "Yes" "Yes" ...
##  $ Date Egg           : chr [1:344] "11/11/07" "11/11/07" "11/16/07" "11/16/07" ...
##  $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ Culmen Depth (mm)  : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ Body Mass (g)      : num [1:344] 3750 3800 3250 NA 3450 ...
##  $ Sex                : chr [1:344] "MALE" "FEMALE" "FEMALE" NA ...
##  $ Delta 15 N (o/oo)  : num [1:344] NA 8.95 8.37 NA 8.77 ...
##  $ Delta 13 C (o/oo)  : num [1:344] NA -24.7 -25.3 NA -25.3 ...
##  $ Comments           : chr [1:344] "Not enough blood for isotopes." NA NA "Adult not sampled." ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   studyName = col_character(),
##   ..   `Sample Number` = col_double(),
##   ..   Species = col_character(),
##   ..   Region = col_character(),
##   ..   Island = col_character(),
##   ..   Stage = col_character(),
##   ..   `Individual ID` = col_character(),
##   ..   `Clutch Completion` = col_character(),
##   ..   `Date Egg` = col_character(),
##   ..   `Culmen Length (mm)` = col_double(),
##   ..   `Culmen Depth (mm)` = col_double(),
##   ..   `Flipper Length (mm)` = col_double(),
##   ..   `Body Mass (g)` = col_double(),
##   ..   Sex = col_character(),
##   ..   `Delta 15 N (o/oo)` = col_double(),
##   ..   `Delta 13 C (o/oo)` = col_double(),
##   ..   Comments = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
##  [1] "studyName"           "Sample Number"       "Species"            
##  [4] "Region"              "Island"              "Stage"              
##  [7] "Individual ID"       "Clutch Completion"   "Date Egg"           
## [10] "Culmen Length (mm)"  "Culmen Depth (mm)"   "Flipper Length (mm)"
## [13] "Body Mass (g)"       "Sex"                 "Delta 15 N (o/oo)"  
## [16] "Delta 13 C (o/oo)"   "Comments"

The penguins_iter file has 344 rows and 17 columns, with column names such as studyName, Sample Number, Species, Region, and Island.

## spec_tbl_df [344 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ species          : chr [1:344] "Adelie" "Adelie" "Adelie" "Adelie" ...
##  $ island           : chr [1:344] "Torgersen" "Torgersen" "Torgersen" "Torgersen" ...
##  $ culmen_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ culmen_depth_mm  : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : num [1:344] 3750 3800 3250 NA 3450 ...
##  $ sex              : chr [1:344] "MALE" "FEMALE" "FEMALE" NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   species = col_character(),
##   ..   island = col_character(),
##   ..   culmen_length_mm = col_double(),
##   ..   culmen_depth_mm = col_double(),
##   ..   flipper_length_mm = col_double(),
##   ..   body_mass_g = col_double(),
##   ..   sex = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
## [1] "species"           "island"            "culmen_length_mm" 
## [4] "culmen_depth_mm"   "flipper_length_mm" "body_mass_g"      
## [7] "sex"

The Penguins_size file has 344 rows and 7 columns, with the column names species, island, culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g, and sex.

## # A tibble: 344 × 17
##    studyName Sampl…¹ Species Region Island Stage Indiv…² Clutc…³ Date …⁴ Culme…⁵
##    <chr>       <dbl> <chr>   <chr>  <chr>  <chr> <chr>   <chr>   <chr>     <dbl>
##  1 PAL0708        29 Adelie… Anvers Biscoe Adul… N18A1   No      11/10/…    37.9
##  2 PAL0708        30 Adelie… Anvers Biscoe Adul… N18A2   No      11/10/…    40.5
##  3 PAL0708        21 Adelie… Anvers Biscoe Adul… N11A1   Yes     11/12/…    37.8
##  4 PAL0708        22 Adelie… Anvers Biscoe Adul… N11A2   Yes     11/12/…    37.7
##  5 PAL0708        23 Adelie… Anvers Biscoe Adul… N12A1   Yes     11/12/…    35.9
##  6 PAL0708        24 Adelie… Anvers Biscoe Adul… N12A2   Yes     11/12/…    38.2
##  7 PAL0708        25 Adelie… Anvers Biscoe Adul… N13A1   Yes     11/10/…    38.8
##  8 PAL0708        26 Adelie… Anvers Biscoe Adul… N13A2   Yes     11/10/…    35.3
##  9 PAL0708        27 Adelie… Anvers Biscoe Adul… N17A1   Yes     11/12/…    40.6
## 10 PAL0708        28 Adelie… Anvers Biscoe Adul… N17A2   Yes     11/12/…    40.5
## # … with 334 more rows, 7 more variables: `Culmen Depth (mm)` <dbl>,
## #   `Flipper Length (mm)` <dbl>, `Body Mass (g)` <dbl>, Sex <chr>,
## #   `Delta 15 N (o/oo)` <dbl>, `Delta 13 C (o/oo)` <dbl>, Comments <chr>, and
## #   abbreviated variable names ¹​`Sample Number`, ²​`Individual ID`,
## #   ³​`Clutch Completion`, ⁴​`Date Egg`, ⁵​`Culmen Length (mm)`
## `summarise()` has grouped output by 'Species', 'Island', 'Clutch Completion'.
## You can override using the `.groups` argument.
## # A tibble: 13 × 3
## # Groups:   Species, Island, Clutch Completion [3]
##    Species                             Island    `Clutch Completion`
##    <chr>                               <chr>     <chr>              
##  1 Adelie Penguin (Pygoscelis adeliae) Biscoe    No                 
##  2 Adelie Penguin (Pygoscelis adeliae) Biscoe    No                 
##  3 Adelie Penguin (Pygoscelis adeliae) Dream     No                 
##  4 Adelie Penguin (Pygoscelis adeliae) Dream     No                 
##  5 Adelie Penguin (Pygoscelis adeliae) Dream     No                 
##  6 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
##  7 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
##  8 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
##  9 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
## 10 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
## 11 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
## 12 Adelie Penguin (Pygoscelis adeliae) Torgersen No                 
## 13 Adelie Penguin (Pygoscelis adeliae) Torgersen No

Species vs Island

Here, I want to determine which island has the largest population of penguins.

## # A tibble: 344 × 2
##    Species                             Island   
##    <chr>                               <chr>    
##  1 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  2 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  3 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  4 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  5 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  6 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  7 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  8 Adelie Penguin (Pygoscelis adeliae) Torgersen
##  9 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 10 Adelie Penguin (Pygoscelis adeliae) Torgersen
## # … with 334 more rows
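Selecting the two columns, as above, lists every penguin but does not yet rank the islands. One way to get the ranking is to count rows per island. This is a sketch on made-up rows, not the original code; `toy` and the `n_penguins` column name are assumptions.

```r
library(dplyr)

# Toy rows standing in for the Species and Island columns (values are made up)
toy <- tibble(
  Species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Adelie"),
  Island  = c("Torgersen", "Biscoe", "Biscoe", "Biscoe", "Dream", "Dream")
)

# One row per island, largest count first
island_counts <- toy %>%
  count(Island, sort = TRUE, name = "n_penguins")

print(island_counts)
```

Adding `Species` to the `count()` call would break the same tally down by species within each island.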

Clutching Pattern

A clutch is the total number of eggs a bird lays in a single nesting attempt. Clutch sizes differ, and some birds make more than one nesting attempt per year. Here, we look at the clutch completion pattern of each penguin species.

## # A tibble: 344 × 2
##    Species                             `Clutch Completion`
##    <chr>                               <chr>              
##  1 Adelie Penguin (Pygoscelis adeliae) Yes                
##  2 Adelie Penguin (Pygoscelis adeliae) Yes                
##  3 Adelie Penguin (Pygoscelis adeliae) Yes                
##  4 Adelie Penguin (Pygoscelis adeliae) Yes                
##  5 Adelie Penguin (Pygoscelis adeliae) Yes                
##  6 Adelie Penguin (Pygoscelis adeliae) Yes                
##  7 Adelie Penguin (Pygoscelis adeliae) No                 
##  8 Adelie Penguin (Pygoscelis adeliae) No                 
##  9 Adelie Penguin (Pygoscelis adeliae) Yes                
## 10 Adelie Penguin (Pygoscelis adeliae) Yes                
## # … with 334 more rows
## # A tibble: 344 × 7
##    species island    culmen_length_mm culmen_depth_mm flipper_le…¹ body_…² sex  
##    <chr>   <chr>                <dbl>           <dbl>        <dbl>   <dbl> <chr>
##  1 Adelie  Biscoe                37.9            18.6          172    3150 FEMA…
##  2 Adelie  Biscoe                37.8            18.3          174    3400 FEMA…
##  3 Adelie  Torgersen             40.2            17            176    3450 FEMA…
##  4 Adelie  Dream                 33.1            16.1          178    2900 FEMA…
##  5 Adelie  Dream                 39.5            16.7          178    3250 FEMA…
##  6 Adelie  Dream                 37.2            18.1          178    3900 MALE 
##  7 Adelie  Dream                 37.5            18.9          179    2975 <NA> 
##  8 Adelie  Dream                 42.2            18.5          180    3550 FEMA…
##  9 Adelie  Biscoe                37.7            18.7          180    3600 MALE 
## 10 Adelie  Torgersen             37.8            17.3          180    3700 <NA> 
## # … with 334 more rows, and abbreviated variable names ¹​flipper_length_mm,
## #   ²​body_mass_g
## `summarise()` has grouped output by 'species', 'flipper_length_mm',
## 'body_mass_g', 'sex'. You can override using the `.groups` argument.
## # A tibble: 334 × 4
## # Groups:   species, flipper_length_mm, body_mass_g, sex [313]
##    species flipper_length_mm body_mass_g sex   
##    <chr>               <dbl>       <dbl> <chr> 
##  1 Adelie                172        3150 FEMALE
##  2 Adelie                174        3400 FEMALE
##  3 Adelie                176        3450 FEMALE
##  4 Adelie                178        2900 FEMALE
##  5 Adelie                178        3250 FEMALE
##  6 Adelie                178        3900 MALE  
##  7 Adelie                180        3550 FEMALE
##  8 Adelie                180        3600 MALE  
##  9 Adelie                180        3800 MALE  
## 10 Adelie                180        3950 MALE  
## # … with 324 more rows
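For the clutch completion pattern described in this section, a per-species summary may be easier to read than the row listings. The sketch below uses made-up rows and assumed names (`toy`, `n_penguins`, `share_yes`) and is not the original code. Setting `.groups = "drop"` also avoids the `summarise()` grouping message seen in the output above.

```r
library(dplyr)

# Toy rows standing in for Species and Clutch Completion (values are made up)
toy <- tibble(
  Species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  `Clutch Completion` = c("Yes", "Yes", "No", "Yes", "Yes", "No")
)

# Per species: number of penguins and share with a completed clutch
clutch_summary <- toy %>%
  group_by(Species) %>%
  summarise(
    n_penguins = n(),
    share_yes  = mean(`Clutch Completion` == "Yes"),
    .groups    = "drop"
  )

print(clutch_summary)
```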

Species (Flipper Length and Body Mass) Pattern

Relating each penguin species to its flipper length and body mass.

## # A tibble: 344 × 3
##    species flipper_length_mm body_mass_g
##    <chr>               <dbl>       <dbl>
##  1 Adelie                181        3750
##  2 Adelie                186        3800
##  3 Adelie                195        3250
##  4 Adelie                 NA          NA
##  5 Adelie                193        3450
##  6 Adelie                190        3650
##  7 Adelie                181        3625
##  8 Adelie                195        4675
##  9 Adelie                193        3475
## 10 Adelie                190        4250
## # … with 334 more rows
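If a numeric correlation is wanted alongside the listing above, one option is a Pearson correlation per species. This is a sketch under assumptions: it uses only the first six rows shown above, which are far too few to say anything about the species, and the names `toy` and `r` are illustrative.

```r
library(dplyr)

# First six rows from the listing above (row 4 has missing measurements)
toy <- tibble(
  species           = rep("Adelie", 6),
  flipper_length_mm = c(181, 186, 195, NA, 193, 190),
  body_mass_g       = c(3750, 3800, 3250, NA, 3450, 3650)
)

# Pearson correlation per species, using only rows where both values are present
flipper_mass_cor <- toy %>%
  group_by(species) %>%
  summarise(
    r = cor(flipper_length_mm, body_mass_g, use = "complete.obs"),
    .groups = "drop"
  )

print(flipper_mass_cor)
```

Without `use = "complete.obs"`, a single missing value would make `cor()` return `NA`.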

Before I work with the body mass measured in grams, I need to do a small conversion from grams to kilograms.

## # A tibble: 344 × 8
##    species island    culmen_length_mm culmen_dep…¹ flipp…² body_…³ sex   body_…⁴
##    <chr>   <chr>                <dbl>        <dbl>   <dbl>   <dbl> <chr>   <dbl>
##  1 Adelie  Torgersen             39.1         18.7     181    3750 MALE     3.75
##  2 Adelie  Torgersen             39.5         17.4     186    3800 FEMA…    3.8 
##  3 Adelie  Torgersen             40.3         18       195    3250 FEMA…    3.25
##  4 Adelie  Torgersen             NA           NA        NA      NA <NA>    NA   
##  5 Adelie  Torgersen             36.7         19.3     193    3450 FEMA…    3.45
##  6 Adelie  Torgersen             39.3         20.6     190    3650 MALE     3.65
##  7 Adelie  Torgersen             38.9         17.8     181    3625 FEMA…    3.62
##  8 Adelie  Torgersen             39.2         19.6     195    4675 MALE     4.68
##  9 Adelie  Torgersen             34.1         18.1     193    3475 <NA>     3.48
## 10 Adelie  Torgersen             42           20.2     190    4250 <NA>     4.25
## # … with 334 more rows, and abbreviated variable names ¹​culmen_depth_mm,
## #   ²​flipper_length_mm, ³​body_mass_g, ⁴​body_mass_kg
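The `body_mass_kg` column in the output above is consistent with dividing `body_mass_g` by 1000. A minimal sketch of that step, on three rows taken from the output above (the `toy` name is an assumption, and the original code may have differed):

```r
library(dplyr)

# A few rows from the output above, including the one with missing values
toy <- tibble(
  species     = c("Adelie", "Adelie", "Adelie"),
  body_mass_g = c(3750, 3625, NA)
)

# 1 kg = 1000 g, so divide by 1000; missing values stay missing
toy_kg <- toy %>%
  mutate(body_mass_kg = body_mass_g / 1000)

print(toy_kg)
```

Note that a tibble prints about three significant digits by default, so 3625 g is displayed as 3.62 in the output above while the stored value is 3.625.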

Sharing of data through the art of Visualizations

The plot above shows the clutch completion pattern for each penguin species in the study area. The Adelie species has the highest counts of both “Yes” and “No” clutch completion. The Gentoo species comes second, with the second-highest count of “Yes” and the lowest count of “No”. The Chinstrap species has the lowest count of “Yes” and an intermediate count of “No”.

The plot above shows clutch completion for the three islands where the data were collected. Biscoe Island has the highest count of “Yes” and an intermediate count of “No”. Dream Island has an intermediate count of “Yes” and the highest count of “No”. Torgersen Island has the lowest counts of both “Yes” and “No”.
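Bar charts like the two described above can be drawn with ggplot2. The plotting code is not included in this report, so the following is a sketch on made-up rows; the `toy` data, the `clutch_completion` column name, and the labels are assumptions.

```r
library(ggplot2)

# Toy rows standing in for the real data (values are made up)
toy <- data.frame(
  Species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  Island  = c("Torgersen", "Biscoe", "Dream", "Biscoe", "Biscoe", "Dream"),
  clutch_completion = c("Yes", "No", "Yes", "Yes", "Yes", "No")
)

# Side-by-side bars: number of penguins per island, split by clutch completion
clutch_by_island <- ggplot(toy, aes(x = Island, fill = clutch_completion)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Clutch completion by island",
    x     = "Island",
    y     = "Number of penguins",
    fill  = "Clutch completion"
  )

# Printing the object draws the plot
print(clutch_by_island)
```

Swapping `x = Island` for `x = Species` gives the per-species version of the same chart.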

## Warning: Removed 2 rows containing missing values (geom_point).

The plot above shows the relationship between body mass and flipper length for the three penguin species recorded. The Gentoo species has the largest body mass and the longest flippers. The Chinstrap species has relatively intermediate body mass and flipper length, even though fewer of its individuals appear in the data. The Adelie species has relatively the lowest body mass and the shortest flippers.

## Warning: Removed 2 rows containing missing values (geom_point).

The plot above also shows the relationship between body mass and flipper length, this time by sex across all three species recorded in the data. In every species, male penguins have larger body mass and longer flippers than females. For some penguins, sex was not recorded.
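The warning "Removed 2 rows containing missing values (geom_point)" is ggplot2 reporting that it dropped rows with missing measurements before drawing the points. Filtering those rows first makes the step explicit. The sketch below uses four rows taken from the earlier output; the names `toy`, `complete_rows`, and `mass_vs_flipper` are assumptions, and the original plots may have been built differently.

```r
library(dplyr)
library(ggplot2)

# Four rows from the earlier output; the last has no measurements
toy <- tibble(
  species           = rep("Adelie", 4),
  sex               = c("MALE", "FEMALE", "FEMALE", NA),
  flipper_length_mm = c(181, 186, 195, NA),
  body_mass_kg      = c(3.75, 3.80, 3.25, NA)
)

# Keep only rows where both plotted measurements are present
complete_rows <- toy %>%
  filter(!is.na(flipper_length_mm), !is.na(body_mass_kg))

# Scatter plot: colour separates species, point shape separates sex
mass_vs_flipper <- ggplot(
  complete_rows,
  aes(x = flipper_length_mm, y = body_mass_kg, colour = species, shape = sex)
) +
  geom_point() +
  labs(x = "Flipper length (mm)", y = "Body mass (kg)")

print(mass_vs_flipper)
```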

Act (Recommendation)

Penguins are flightless seabirds of the order Sphenisciformes, found mostly in the Southern Hemisphere and recognizable by their upright stance, short legs, and stark black-and-white plumage. For this analysis, my only recommendation is that more data on the Palmer penguins should be collected and that, if possible, the study should be extended to more than just three islands. Also, to be assured that the data are not biased, the sample size of the data to be collected should be extended to a minimum of 30, as stipulated by the Data Analysts.