Dataset Description

  • Description: This dataset covers three penguin species: Adelie, Chinstrap, and Gentoo. It records flipper length, culmen length and depth, and body mass. The culmen is the upper ridge of a bird’s beak.
  • Source: https://www.kaggle.com/datasets/amulyas/penguin-size-dataset

Number of Penguins in each Island

Barplot: I want to start with a stacked bar graph showing the number of penguins in the dataset, grouped by species and by the island they live on.

Discussion: From the graph we see that the dataset covers only three species of penguins (Adelie, Chinstrap, and Gentoo) from three islands (Biscoe, Dream, and Torgersen). Adelie is the only species that lives on more than one island.
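
The slides do not include the plotting code, so the following is only a minimal sketch of how a stacked bar graph like this could be drawn with ggplot2. The tiny `penguin_data` frame is a made-up stand-in (not the real dataset), and putting island on the x-axis is an assumption about the layout.

```r
library(ggplot2)

# Toy stand-in for penguin_data: placeholder rows, NOT the real dataset
penguin_data <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"),
  island  = c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe", "Biscoe")
)

# geom_bar() counts the rows per island; fill = species stacks those counts
island_plot <- ggplot(penguin_data, aes(x = island, fill = species)) +
  geom_bar(position = "stack") +
  labs(x = "Island", y = "Number of penguins", fill = "Species")

island_plot
```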

Adelie Penguin Island Distribution

Pie Chart: Since Adelie is the only species living on more than one island (all three of them), I wanted to show a pie chart specifically for Adelie penguins, showing the percentage of them that live on Biscoe, Dream, or Torgersen.

Discussion: They appear to be distributed fairly evenly throughout the three islands.
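
As a rough sketch of how the percentages behind such a pie chart could be computed and drawn, assuming dplyr and ggplot2 (the slides do not say which functions were used). The data frame below is a made-up stand-in, so its percentages say nothing about the real distribution.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for penguin_data: placeholder rows, NOT the real dataset
penguin_data <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
  island  = c("Biscoe", "Dream", "Dream", "Torgersen", "Dream", "Biscoe")
)

# Keep only Adelie rows, count them per island, convert counts to percentages
adelie_share <- penguin_data %>%
  filter(species == "Adelie") %>%
  count(island) %>%
  mutate(percent = 100 * n / sum(n))

# A pie chart in ggplot2 is a single stacked bar drawn in polar coordinates
adelie_pie <- ggplot(adelie_share, aes(x = "", y = percent, fill = island)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(fill = "Island") +
  theme_void()

adelie_pie
```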

Culmen Length vs Culmen Depth vs Flipper Length

3D Scatter Plot: Next is a 3D scatter plot of culmen length vs culmen depth vs flipper length for all penguins, colored by species. For slide spacing purposes, the graph is on a separate slide. (Apologies for the labels getting cut off; I tried to fix it.)

3D Scatterplot
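
The slides do not say which package produced the 3D plot. One possible approach, sketched here under the assumption that plotly is used, maps the three measurements to the x, y, and z axes and colors the points by species. The data values are placeholders, not the real measurements.

```r
library(plotly)

# Toy stand-in for penguin_data: placeholder values, NOT the real dataset
penguin_data <- data.frame(
  species           = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 2),
  culmen_length_mm  = c(38, 40, 48, 50, 46, 49),
  culmen_depth_mm   = c(18, 19, 18, 19, 14, 16),
  flipper_length_mm = c(188, 192, 194, 198, 214, 220)
)

# One marker per penguin; color = ~species gives each species its own color
scatter_3d <- plot_ly(
  penguin_data,
  x = ~culmen_length_mm,
  y = ~culmen_depth_mm,
  z = ~flipper_length_mm,
  color = ~species,
  type = "scatter3d",
  mode = "markers"
) %>%
  layout(scene = list(
    xaxis = list(title = "Culmen length (mm)"),
    yaxis = list(title = "Culmen depth (mm)"),
    zaxis = list(title = "Flipper length (mm)")
  ))

scatter_3d
```

In plotly, the axis titles of a 3D plot are set under `scene` in `layout()`, so that is one place to experiment if labels are being cut off.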

Statistical Analysis

The next slide shows the start of the code that builds a table of summary statistics for the variables in the 3D scatter plot. The table gives the mean, median, standard deviation, 1st quartile, 3rd quartile, minimum, and maximum for culmen length (CL), culmen depth (CD), and flipper length (FL), computed separately for each species. However, this table is too large to show effectively on a slide. I have therefore created a separate table with only the means of culmen length, culmen depth, and flipper length, which is shown after the code for the bigger table.

Statistical Analysis Code

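library(dplyr)  # provides %>%, group_by(), and summarize()

# penguin_data is assumed to be loaded already
stats <- penguin_data %>%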
  group_by(species) %>%
  summarize(
    mean_CL = mean(culmen_length_mm, na.rm = TRUE),
    SD_CL = sd(culmen_length_mm, na.rm = TRUE),
    min_CL = min(culmen_length_mm, na.rm = TRUE),
    Q1_CL = quantile(culmen_length_mm, 0.25, na.rm = TRUE),
    median_CL = median(culmen_length_mm, na.rm = TRUE),
    Q3_CL = quantile(culmen_length_mm, 0.75, na.rm = TRUE),
    max_CL = max(culmen_length_mm, na.rm = TRUE),
    
    # This chunk inside summarize repeats for CD and FL
    
    mean_CD = mean(culmen_depth_mm, na.rm = TRUE),
    SD_CD = sd(culmen_depth_mm, na.rm = TRUE),
    min_CD = min(culmen_depth_mm, na.rm = TRUE),
    Q1_CD = quantile(culmen_depth_mm, 0.25, na.rm = TRUE),
    median_CD = median(culmen_depth_mm, na.rm = TRUE),
    Q3_CD = quantile(culmen_depth_mm, 0.75, na.rm = TRUE),
    max_CD = max(culmen_depth_mm, na.rm = TRUE),
    
    mean_FL = mean(flipper_length_mm, na.rm = TRUE),
    SD_FL = sd(flipper_length_mm, na.rm = TRUE),
    min_FL = min(flipper_length_mm, na.rm = TRUE),
    Q1_FL = quantile(flipper_length_mm, 0.25, na.rm = TRUE),
    median_FL = median(flipper_length_mm, na.rm = TRUE),
    Q3_FL = quantile(flipper_length_mm, 0.75, na.rm = TRUE),
    max_FL = max(flipper_length_mm, na.rm = TRUE)
  )

stats
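
Because the same seven statistics repeat for each variable, the table could also be built more compactly with dplyr's `across()`. This is only an alternative sketch, not the code used for the slides: the column names it produces (for example `mean_culmen_length_mm`) differ from the ones above, and the data frame is a made-up stand-in.

```r
library(dplyr)

# Toy stand-in for penguin_data: placeholder values, NOT the real dataset
penguin_data <- data.frame(
  species           = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 2),
  culmen_length_mm  = c(38, 40, 48, 50, 46, 49),
  culmen_depth_mm   = c(18, 19, 18, 19, 14, 16),
  flipper_length_mm = c(188, 192, 194, 198, 214, 220)
)

# The seven statistics, written once and applied to every measurement column
stat_funs <- list(
  mean   = ~ mean(.x, na.rm = TRUE),
  SD     = ~ sd(.x, na.rm = TRUE),
  min    = ~ min(.x, na.rm = TRUE),
  Q1     = ~ quantile(.x, 0.25, na.rm = TRUE, names = FALSE),
  median = ~ median(.x, na.rm = TRUE),
  Q3     = ~ quantile(.x, 0.75, na.rm = TRUE, names = FALSE),
  max    = ~ max(.x, na.rm = TRUE)
)

# One row per species; columns are named "<statistic>_<variable>"
stats_all <- penguin_data %>%
  group_by(species) %>%
  summarize(across(
    c(culmen_length_mm, culmen_depth_mm, flipper_length_mm),
    .fns = stat_funs,
    .names = "{.fn}_{.col}"
  ))

stats_all
```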

Mean Only Table

Discussion: On average, Gentoo has the longest flippers, the second-longest culmen, and the shallowest culmen. Chinstrap, second in flipper length, has the largest culmen overall: its culmen is slightly longer than Gentoo's and slightly deeper than Adelie's.

## # A tibble: 3 × 4
##   species   mean_CL mean_CD mean_FL
##   <chr>       <dbl>   <dbl>   <dbl>
## 1 Adelie       38.8    18.3    190.
## 2 Chinstrap    48.8    18.4    196.
## 3 Gentoo       47.5    15.0    217.
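
The code for the mean-only table is not shown on the slides. A minimal sketch that would produce a table with the same column names, assuming it follows the pattern of the larger `summarize()` call, is given below. The input values are placeholders, so the output will not match the numbers in the table above.

```r
library(dplyr)

# Toy stand-in for penguin_data: placeholder values, NOT the real dataset
penguin_data <- data.frame(
  species           = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  culmen_length_mm  = c(39, NA, 47, 48),
  culmen_depth_mm   = c(18, 19, 15, 15),
  flipper_length_mm = c(190, 192, 216, 218)
)

# Keep only the per-species means; na.rm = TRUE skips missing measurements
mean_stats <- penguin_data %>%
  group_by(species) %>%
  summarize(
    mean_CL = mean(culmen_length_mm, na.rm = TRUE),
    mean_CD = mean(culmen_depth_mm, na.rm = TRUE),
    mean_FL = mean(flipper_length_mm, na.rm = TRUE)
  )

mean_stats
```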

Culmen Length Vs Culmen Depth

I also wanted to include a 2D scatter plot showing just culmen length vs culmen depth, grouped by species. I thought it would make the differences in length and depth between species easier to see. In this plot it is clearer that Gentoo penguins have shallower culmens and that Adelies have shorter ones.
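
A scatter plot along these lines could be sketched with ggplot2 as follows. This is an assumption about the approach, since the plotting code is not on the slides, and the data values are placeholders rather than real measurements.

```r
library(ggplot2)

# Toy stand-in for penguin_data: placeholder values, NOT the real dataset
penguin_data <- data.frame(
  species          = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 2),
  culmen_length_mm = c(38, 40, 48, 50, 46, 49),
  culmen_depth_mm  = c(18, 19, 18, 19, 14, 16)
)

# One point per penguin; color separates the three species
culmen_plot <- ggplot(
  penguin_data,
  aes(x = culmen_length_mm, y = culmen_depth_mm, color = species)
) +
  geom_point() +
  labs(x = "Culmen length (mm)", y = "Culmen depth (mm)", color = "Species")

culmen_plot
```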

Distribution of Flipper Length by Species

I also want to show a histogram of flipper length by species, which I think shows the differences between species more clearly. We can see from the histogram that Gentoos typically have the longest flippers and Adelies the shortest. Chinstraps fall in the middle, but closer to Adelies than to Gentoos.
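
One way to draw overlapping per-species histograms is sketched below with ggplot2. The bin width of 5 mm and the transparency setting are illustrative choices, not values taken from the slides, and the data are placeholders.

```r
library(ggplot2)

# Toy stand-in for penguin_data: placeholder values, NOT the real dataset
penguin_data <- data.frame(
  species = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 4),
  flipper_length_mm = c(185, 188, 192, 195,
                        190, 195, 198, 201,
                        210, 215, 219, 224)
)

# position = "identity" overlays the species instead of stacking them;
# alpha makes the overlapping bars see-through
flipper_hist <- ggplot(penguin_data, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(binwidth = 5, alpha = 0.6, position = "identity") +
  labs(x = "Flipper length (mm)", y = "Count", fill = "Species")

flipper_hist
```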

Body Mass Distribution by Species

This boxplot shows the distribution of body mass across the three penguin species. We can see that Gentoos weigh noticeably more than Chinstraps or Adelies. Chinstraps and Adelies seem to be similar in body mass.
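
A boxplot like this could be sketched with ggplot2 as shown below. The column name `body_mass_g` is an assumption about how the body mass variable is named, and the values are placeholders, not real weights.

```r
library(ggplot2)

# Toy stand-in for penguin_data: placeholder values, NOT the real dataset.
# body_mass_g is an assumed column name for body mass in grams.
penguin_data <- data.frame(
  species     = rep(c("Adelie", "Chinstrap", "Gentoo"), each = 3),
  body_mass_g = c(3500, 3700, 3900, 3600, 3750, 3850, 4800, 5100, 5400)
)

# One box per species, summarizing the spread of body mass
mass_boxplot <- ggplot(penguin_data, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body mass (g)")

mass_boxplot
```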

Concluding Thoughts

It appears that, excluding the beak, Gentoos tend to be bigger than the other two species in terms of weight and flipper length. However, they appear to have a smaller culmen depth. I think using the mean to analyze the data numerically was very effective. Along with the scatter plots showing the differences in culmen length, culmen depth, and flipper length, the table of means made it easy to check numerically what the graphs showed.