Penguins are any of several flightless seabirds of the order Sphenisciformes, found mostly in the Southern Hemisphere and marked by their upright stance, short legs, and generally stark black-and-white plumage.
The task is to analyze and present trends in the behavior, body size, and flipper length of penguins, by species and by the island on which the data were collected.
The Palmer Archipelago (Antarctica) penguin data were obtained from Kaggle. The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network, from 2007 to 2009.
For this analysis, each file was checked in Excel for duplicate rows and for leading, trailing, and repeated spaces. None were found.
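The same checks can also be scripted in R so they are repeated every time the data are reloaded. The sketch below is an assumption, not the author's method (the original check was done in Excel): `check_clean()` is a hypothetical helper that counts duplicate rows with base R's `duplicated()` and flags character columns containing leading, trailing, or repeated whitespace. The toy rows are copied from the `penguins_size` output shown later in this document.

```r
# Hypothetical helper (not from the original analysis): an R counterpart to the
# Excel check for duplicate rows and leading, trailing, or repeated spaces.
check_clean <- function(df) {
  # Only character columns can carry stray whitespace.
  chr_cols <- vapply(df, is.character, logical(1))
  has_bad_space <- vapply(df[chr_cols], function(x) {
    # ^\s = leading space, \s$ = trailing space, \s{2,} = repeated spaces
    any(grepl("^\\s|\\s$|\\s{2,}", x, perl = TRUE))
  }, logical(1))
  list(
    duplicate_rows = sum(duplicated(df)),
    columns_with_bad_spaces = names(has_bad_space)[has_bad_space]
  )
}

# Toy rows copied from the penguins_size output shown later on.
toy <- data.frame(
  species = c("Adelie", "Adelie", "Adelie"),
  island = c("Torgersen", "Torgersen", "Torgersen"),
  body_mass_g = c(3750, 3800, 3250),
  sex = c("MALE", "FEMALE", "FEMALE")
)
check_clean(toy)
```

On clean data the helper reports zero duplicates and no offending columns, which matches the result of the Excel check.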
I set up my R environment by loading the ‘tidyverse’ and other packages useful for the analysis and visualizations.
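The original setup chunk is not shown, so the following is only a sketch. It assumes the two Kaggle files are named `penguins_lter.csv` and `penguins_size.csv`, and it loads `readr` and `dplyr` directly instead of the whole tidyverse. To stay self-contained, it writes three rows copied from the `penguins_size` output to a temporary file. Passing `show_col_types = FALSE` silences the column-specification message that `read_csv()` prints in the output that follows.

```r
library(readr)
library(dplyr)

# Assumed real call (file name is an assumption):
# penguins_size <- read_csv("penguins_size.csv", show_col_types = FALSE)

# Self-contained demo: write a few rows from the penguins_size output to a temp file.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex",
  "Adelie,Torgersen,39.1,18.7,181,3750,MALE",
  "Adelie,Torgersen,39.5,17.4,186,3800,FEMALE",
  "Adelie,Torgersen,NA,NA,NA,NA,NA"
), path)

# "NA" fields are read as missing values by default.
penguins_size <- read_csv(path, show_col_types = FALSE)
glimpse(penguins_size)
```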
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## Warning: package 'purrr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Rows: 344 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): studyName, Species, Region, Island, Stage, Individual ID, Clutch C...
## dbl (7): Sample Number, Culmen Length (mm), Culmen Depth (mm), Flipper Leng...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 344 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (4): culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## spec_tbl_df [344 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ studyName : chr [1:344] "PAL0708" "PAL0708" "PAL0708" "PAL0708" ...
## $ Sample Number : num [1:344] 1 2 3 4 5 6 7 8 9 10 ...
## $ Species : chr [1:344] "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" "Adelie Penguin (Pygoscelis adeliae)" ...
## $ Region : chr [1:344] "Anvers" "Anvers" "Anvers" "Anvers" ...
## $ Island : chr [1:344] "Torgersen" "Torgersen" "Torgersen" "Torgersen" ...
## $ Stage : chr [1:344] "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" "Adult, 1 Egg Stage" ...
## $ Individual ID : chr [1:344] "N1A1" "N1A2" "N2A1" "N2A2" ...
## $ Clutch Completion : chr [1:344] "Yes" "Yes" "Yes" "Yes" ...
## $ Date Egg : chr [1:344] "11/11/07" "11/11/07" "11/16/07" "11/16/07" ...
## $ Culmen Length (mm) : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
## $ Culmen Depth (mm) : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
## $ Flipper Length (mm): num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
## $ Body Mass (g) : num [1:344] 3750 3800 3250 NA 3450 ...
## $ Sex : chr [1:344] "MALE" "FEMALE" "FEMALE" NA ...
## $ Delta 15 N (o/oo) : num [1:344] NA 8.95 8.37 NA 8.77 ...
## $ Delta 13 C (o/oo) : num [1:344] NA -24.7 -25.3 NA -25.3 ...
## $ Comments : chr [1:344] "Not enough blood for isotopes." NA NA "Adult not sampled." ...
## - attr(*, "spec")=
## .. cols(
## .. studyName = col_character(),
## .. `Sample Number` = col_double(),
## .. Species = col_character(),
## .. Region = col_character(),
## .. Island = col_character(),
## .. Stage = col_character(),
## .. `Individual ID` = col_character(),
## .. `Clutch Completion` = col_character(),
## .. `Date Egg` = col_character(),
## .. `Culmen Length (mm)` = col_double(),
## .. `Culmen Depth (mm)` = col_double(),
## .. `Flipper Length (mm)` = col_double(),
## .. `Body Mass (g)` = col_double(),
## .. Sex = col_character(),
## .. `Delta 15 N (o/oo)` = col_double(),
## .. `Delta 13 C (o/oo)` = col_double(),
## .. Comments = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
## [1] "studyName" "Sample Number" "Species"
## [4] "Region" "Island" "Stage"
## [7] "Individual ID" "Clutch Completion" "Date Egg"
## [10] "Culmen Length (mm)" "Culmen Depth (mm)" "Flipper Length (mm)"
## [13] "Body Mass (g)" "Sex" "Delta 15 N (o/oo)"
## [16] "Delta 13 C (o/oo)" "Comments"
The penguins_lter file has 344 rows and 17 columns, including studyName, Sample Number, Species, Region, and Island.
## spec_tbl_df [344 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ species : chr [1:344] "Adelie" "Adelie" "Adelie" "Adelie" ...
## $ island : chr [1:344] "Torgersen" "Torgersen" "Torgersen" "Torgersen" ...
## $ culmen_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
## $ culmen_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
## $ flipper_length_mm: num [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
## $ body_mass_g : num [1:344] 3750 3800 3250 NA 3450 ...
## $ sex : chr [1:344] "MALE" "FEMALE" "FEMALE" NA ...
## - attr(*, "spec")=
## .. cols(
## .. species = col_character(),
## .. island = col_character(),
## .. culmen_length_mm = col_double(),
## .. culmen_depth_mm = col_double(),
## .. flipper_length_mm = col_double(),
## .. body_mass_g = col_double(),
## .. sex = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
## [1] "species" "island" "culmen_length_mm"
## [4] "culmen_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex"
The penguins_size file has 344 rows and 7 columns: species, island, culmen_length_mm, culmen_depth_mm, flipper_length_mm, body_mass_g, and sex.
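The row and column counts above come from `str()` and the column-name listing. The sketch below shows those inspection calls on a four-row stand-in built from the values printed in the `str()` output. With the full file, `dim()` would return 344 and 7. The `colSums(is.na(...))` line is an addition that counts missing values per column; row 4 in the output is entirely `NA`.

```r
library(dplyr)

# Four-row stand-in for penguins_size, using values from the str() output above.
penguins_size <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Adelie"),
  island = "Torgersen",
  culmen_length_mm = c(39.1, 39.5, 40.3, NA),
  culmen_depth_mm = c(18.7, 17.4, 18, NA),
  flipper_length_mm = c(181, 186, 195, NA),
  body_mass_g = c(3750, 3800, 3250, NA),
  sex = c("MALE", "FEMALE", "FEMALE", NA)
)

dim(penguins_size)               # rows, columns (344 and 7 for the full file)
names(penguins_size)             # column names
str(penguins_size)               # type and first values of each column
colSums(is.na(penguins_size))    # missing values per column
```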
## # A tibble: 344 × 17
## studyName Sampl…¹ Species Region Island Stage Indiv…² Clutc…³ Date …⁴ Culme…⁵
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 PAL0708 29 Adelie… Anvers Biscoe Adul… N18A1 No 11/10/… 37.9
## 2 PAL0708 30 Adelie… Anvers Biscoe Adul… N18A2 No 11/10/… 40.5
## 3 PAL0708 21 Adelie… Anvers Biscoe Adul… N11A1 Yes 11/12/… 37.8
## 4 PAL0708 22 Adelie… Anvers Biscoe Adul… N11A2 Yes 11/12/… 37.7
## 5 PAL0708 23 Adelie… Anvers Biscoe Adul… N12A1 Yes 11/12/… 35.9
## 6 PAL0708 24 Adelie… Anvers Biscoe Adul… N12A2 Yes 11/12/… 38.2
## 7 PAL0708 25 Adelie… Anvers Biscoe Adul… N13A1 Yes 11/10/… 38.8
## 8 PAL0708 26 Adelie… Anvers Biscoe Adul… N13A2 Yes 11/10/… 35.3
## 9 PAL0708 27 Adelie… Anvers Biscoe Adul… N17A1 Yes 11/12/… 40.6
## 10 PAL0708 28 Adelie… Anvers Biscoe Adul… N17A2 Yes 11/12/… 40.5
## # … with 334 more rows, 7 more variables: `Culmen Depth (mm)` <dbl>,
## # `Flipper Length (mm)` <dbl>, `Body Mass (g)` <dbl>, Sex <chr>,
## # `Delta 15 N (o/oo)` <dbl>, `Delta 13 C (o/oo)` <dbl>, Comments <chr>, and
## # abbreviated variable names ¹`Sample Number`, ²`Individual ID`,
## # ³`Clutch Completion`, ⁴`Date Egg`, ⁵`Culmen Length (mm)`
## `summarise()` has grouped output by 'Species', 'Island', 'Clutch Completion'.
## You can override using the `.groups` argument.
## # A tibble: 13 × 3
## # Groups: Species, Island, Clutch Completion [3]
## Species Island `Clutch Completion`
## <chr> <chr> <chr>
## 1 Adelie Penguin (Pygoscelis adeliae) Biscoe No
## 2 Adelie Penguin (Pygoscelis adeliae) Biscoe No
## 3 Adelie Penguin (Pygoscelis adeliae) Dream No
## 4 Adelie Penguin (Pygoscelis adeliae) Dream No
## 5 Adelie Penguin (Pygoscelis adeliae) Dream No
## 6 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 7 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 8 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 9 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 10 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 11 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 12 Adelie Penguin (Pygoscelis adeliae) Torgersen No
## 13 Adelie Penguin (Pygoscelis adeliae) Torgersen No
Here, I want to determine which island contributed the most penguins to the sample.
## # A tibble: 344 × 2
## Species Island
## <chr> <chr>
## 1 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 2 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 3 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 4 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 5 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 6 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 7 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 8 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 9 Adelie Penguin (Pygoscelis adeliae) Torgersen
## 10 Adelie Penguin (Pygoscelis adeliae) Torgersen
## # … with 334 more rows
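The table above lists each bird's species and island but does not total them. One direct way to answer the question is `dplyr::count()` with `sort = TRUE`, which puts the island with the most sampled birds first. The rows below are made-up toy data for illustration only and are not counts from the dataset. Note that these counts describe the birds sampled on each island, not the island's full penguin population.

```r
library(dplyr)

# Toy rows for illustration only (not the real counts).
penguins <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  island  = c("Torgersen", "Dream", "Biscoe", "Biscoe", "Biscoe", "Dream")
)

# Tally birds per island; sort = TRUE lists the largest group first.
island_counts <- penguins %>% count(island, sort = TRUE)
island_counts

# Breakdown by island and species
penguins %>% count(island, species)
```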
A clutch is the total number of eggs a bird lays in a single nesting attempt. Clutch sizes differ, and some birds make more than one nesting attempt per year. In this dataset, Clutch Completion records whether the study nest was observed with a full clutch of two eggs. Here, we look at each penguin species and its clutch completion pattern.
## # A tibble: 344 × 2
## Species `Clutch Completion`
## <chr> <chr>
## 1 Adelie Penguin (Pygoscelis adeliae) Yes
## 2 Adelie Penguin (Pygoscelis adeliae) Yes
## 3 Adelie Penguin (Pygoscelis adeliae) Yes
## 4 Adelie Penguin (Pygoscelis adeliae) Yes
## 5 Adelie Penguin (Pygoscelis adeliae) Yes
## 6 Adelie Penguin (Pygoscelis adeliae) Yes
## 7 Adelie Penguin (Pygoscelis adeliae) No
## 8 Adelie Penguin (Pygoscelis adeliae) No
## 9 Adelie Penguin (Pygoscelis adeliae) Yes
## 10 Adelie Penguin (Pygoscelis adeliae) Yes
## # … with 334 more rows
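To turn the Species and `Clutch Completion` columns into a per-species pattern, one option is to summarise the share of "Yes" values per species. The sketch uses made-up toy rows, not the real data. Setting `.groups = "drop"` also avoids the "`summarise()` has grouped output" message that appears earlier in the output.

```r
library(dplyr)

# Toy rows for illustration only, using the column names from the output above.
clutch <- tibble(
  Species = c("Adelie", "Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"),
  `Clutch Completion` = c("Yes", "Yes", "No", "No", "Yes", "Yes")
)

clutch_rates <- clutch %>%
  group_by(Species) %>%
  summarise(
    n_birds = n(),
    completed = sum(`Clutch Completion` == "Yes"),
    share_completed = completed / n_birds,  # proportion with a full clutch
    .groups = "drop"
  )
clutch_rates
```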
## # A tibble: 344 × 7
## species island culmen_length_mm culmen_depth_mm flipper_le…¹ body_…² sex
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Adelie Biscoe 37.9 18.6 172 3150 FEMA…
## 2 Adelie Biscoe 37.8 18.3 174 3400 FEMA…
## 3 Adelie Torgersen 40.2 17 176 3450 FEMA…
## 4 Adelie Dream 33.1 16.1 178 2900 FEMA…
## 5 Adelie Dream 39.5 16.7 178 3250 FEMA…
## 6 Adelie Dream 37.2 18.1 178 3900 MALE
## 7 Adelie Dream 37.5 18.9 179 2975 <NA>
## 8 Adelie Dream 42.2 18.5 180 3550 FEMA…
## 9 Adelie Biscoe 37.7 18.7 180 3600 MALE
## 10 Adelie Torgersen 37.8 17.3 180 3700 <NA>
## # … with 334 more rows, and abbreviated variable names ¹flipper_length_mm,
## # ²body_mass_g
## `summarise()` has grouped output by 'species', 'flipper_length_mm',
## 'body_mass_g', 'sex'. You can override using the `.groups` argument.
## # A tibble: 334 × 4
## # Groups: species, flipper_length_mm, body_mass_g, sex [313]
## species flipper_length_mm body_mass_g sex
## <chr> <dbl> <dbl> <chr>
## 1 Adelie 172 3150 FEMALE
## 2 Adelie 174 3400 FEMALE
## 3 Adelie 176 3450 FEMALE
## 4 Adelie 178 2900 FEMALE
## 5 Adelie 178 3250 FEMALE
## 6 Adelie 178 3900 MALE
## 7 Adelie 180 3550 FEMALE
## 8 Adelie 180 3600 MALE
## 9 Adelie 180 3800 MALE
## 10 Adelie 180 3950 MALE
## # … with 324 more rows
Next, we relate each penguin species to its flipper length and body mass.
## # A tibble: 344 × 3
## species flipper_length_mm body_mass_g
## <chr> <dbl> <dbl>
## 1 Adelie 181 3750
## 2 Adelie 186 3800
## 3 Adelie 195 3250
## 4 Adelie NA NA
## 5 Adelie 193 3450
## 6 Adelie 190 3650
## 7 Adelie 181 3625
## 8 Adelie 195 4675
## 9 Adelie 193 3475
## 10 Adelie 190 4250
## # … with 334 more rows
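One way to quantify this relationship is to compute, for each species, mean flipper length, mean body mass, and the Pearson correlation between the two. The sketch uses the ten Adelie rows printed above, so it produces a single row. With the full data, the same pipeline would return one row per species. Because row 4 is missing, the means use `na.rm = TRUE` and the correlation uses `use = "complete.obs"`.

```r
library(dplyr)

# The ten Adelie rows printed above (row 4 is missing).
size_sample <- tibble(
  species = "Adelie",
  flipper_length_mm = c(181, 186, 195, NA, 193, 190, 181, 195, 193, 190),
  body_mass_g = c(3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250)
)

species_summary <- size_sample %>%
  group_by(species) %>%
  summarise(
    mean_flipper_mm = mean(flipper_length_mm, na.rm = TRUE),
    mean_mass_g = mean(body_mass_g, na.rm = TRUE),
    # Pearson correlation over rows where both values are present
    r_flipper_mass = cor(flipper_length_mm, body_mass_g, use = "complete.obs"),
    .groups = "drop"
  )
species_summary
```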
Before working with body mass, which is measured in grams, I convert it to kilograms by dividing by 1,000.
## # A tibble: 344 × 8
## species island culmen_length_mm culmen_dep…¹ flipp…² body_…³ sex body_…⁴
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 Adelie Torgersen 39.1 18.7 181 3750 MALE 3.75
## 2 Adelie Torgersen 39.5 17.4 186 3800 FEMA… 3.8
## 3 Adelie Torgersen 40.3 18 195 3250 FEMA… 3.25
## 4 Adelie Torgersen NA NA NA NA <NA> NA
## 5 Adelie Torgersen 36.7 19.3 193 3450 FEMA… 3.45
## 6 Adelie Torgersen 39.3 20.6 190 3650 MALE 3.65
## 7 Adelie Torgersen 38.9 17.8 181 3625 FEMA… 3.62
## 8 Adelie Torgersen 39.2 19.6 195 4675 MALE 4.68
## 9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 3.48
## 10 Adelie Torgersen 42 20.2 190 4250 <NA> 4.25
## # … with 334 more rows, and abbreviated variable names ¹culmen_depth_mm,
## # ²flipper_length_mm, ³body_mass_g, ⁴body_mass_kg
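The new body_mass_kg column above can be produced with a single `mutate()`. The sketch below uses a few body_mass_g values from that output. Missing masses remain `NA`. The stored value for 3625 g is 3.625 kg; the tibble prints three significant digits, which is why it appears as 3.62 in the output.

```r
library(dplyr)

# A few body_mass_g values from the output above; the fourth is missing.
penguins_size <- tibble(
  species = "Adelie",
  body_mass_g = c(3750, 3800, 3250, NA, 3625)
)

# 1 kg = 1,000 g; NA / 1000 stays NA.
penguins_size <- penguins_size %>%
  mutate(body_mass_kg = body_mass_g / 1000)
penguins_size
```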
Based on this analysis, my main recommendation is to collect more data on the Palmer penguins and, if possible, to extend the study beyond the three islands sampled here. To reduce the risk of a biased sample, the sample size should also be at least 30, a commonly cited rule of thumb in data analysis.