When studying the penguins of the Palmer Archipelago, how do body mass, species, and sex interact with one another? We will explore these questions by looking at an appropriate data set.
We begin with some exploratory data analysis to learn more about this data set.
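# Load the packages used in this report
library(palmerpenguins) # provides the penguins data set
library(tidyverse) # ggplot2 for graphs; tidyr's drop_na() and the %>% pipe
# Our usual commands to check the structure of the data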
names(penguins)
## [1] "species" "island" "bill_length_mm"
## [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex" "year"
dim(penguins)
## [1] 344 8
head(penguins)
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## # ℹ 2 more variables: sex <fct>, year <int>
# Looking at some of the variables/columns
penguins$species # categorical: Adelie, Chinstrap, Gentoo
## [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [8] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [15] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [22] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [29] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [36] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [43] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [50] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [57] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [64] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [71] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [78] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [85] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [92] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [99] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [106] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [113] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [120] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [127] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [134] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [141] Adelie Adelie Adelie Adelie Adelie Adelie Adelie
## [148] Adelie Adelie Adelie Adelie Adelie Gentoo Gentoo
## [155] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [162] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [169] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [176] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [183] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [190] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [197] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [204] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [211] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [218] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [225] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [232] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [239] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [246] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [253] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [260] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [267] Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo Gentoo
## [274] Gentoo Gentoo Gentoo Chinstrap Chinstrap Chinstrap Chinstrap
## [281] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [288] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [295] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [302] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [309] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [316] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [323] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [330] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [337] Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap Chinstrap
## [344] Chinstrap
## Levels: Adelie Chinstrap Gentoo
penguins$island # categorical: Torgersen, Dream, Biscoe
## [1] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen
## [8] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen
## [15] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Biscoe
## [22] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [29] Biscoe Biscoe Dream Dream Dream Dream Dream
## [36] Dream Dream Dream Dream Dream Dream Dream
## [43] Dream Dream Dream Dream Dream Dream Dream
## [50] Dream Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [57] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [64] Biscoe Biscoe Biscoe Biscoe Biscoe Torgersen Torgersen
## [71] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen
## [78] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen
## [85] Dream Dream Dream Dream Dream Dream Dream
## [92] Dream Dream Dream Dream Dream Dream Dream
## [99] Dream Dream Biscoe Biscoe Biscoe Biscoe Biscoe
## [106] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [113] Biscoe Biscoe Biscoe Biscoe Torgersen Torgersen Torgersen
## [120] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen
## [127] Torgersen Torgersen Torgersen Torgersen Torgersen Torgersen Dream
## [134] Dream Dream Dream Dream Dream Dream Dream
## [141] Dream Dream Dream Dream Dream Dream Dream
## [148] Dream Dream Dream Dream Dream Biscoe Biscoe
## [155] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [162] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [169] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [176] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [183] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [190] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [197] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [204] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [211] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [218] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [225] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [232] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [239] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [246] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [253] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [260] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [267] Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe Biscoe
## [274] Biscoe Biscoe Biscoe Dream Dream Dream Dream
## [281] Dream Dream Dream Dream Dream Dream Dream
## [288] Dream Dream Dream Dream Dream Dream Dream
## [295] Dream Dream Dream Dream Dream Dream Dream
## [302] Dream Dream Dream Dream Dream Dream Dream
## [309] Dream Dream Dream Dream Dream Dream Dream
## [316] Dream Dream Dream Dream Dream Dream Dream
## [323] Dream Dream Dream Dream Dream Dream Dream
## [330] Dream Dream Dream Dream Dream Dream Dream
## [337] Dream Dream Dream Dream Dream Dream Dream
## [344] Dream
## Levels: Biscoe Dream Torgersen
penguins$sex # categorical with some NAs: male, female
## [1] male female female <NA> female male female male <NA> <NA>
## [11] <NA> <NA> female male male female female male female male
## [21] female male female male male female male female female male
## [31] female male female male female male male female female male
## [41] female male female male female male male <NA> female male
## [51] female male female male female male female male female male
## [61] female male female male female male female male female male
## [71] female male female male female male female male female male
## [81] female male female male female male male female male female
## [91] female male female male female male female male female male
## [101] female male female male female male female male female male
## [111] female male female male female male female male female male
## [121] female male female male female male female male female male
## [131] female male female male female male female male female male
## [141] female male female male female male male female female male
## [151] female male female male female male male female female male
## [161] female male female male female male female male female male
## [171] female male male female female male female male <NA> male
## [181] female male male female female male female male female male
## [191] female male female male female male male female female male
## [201] female male female male female male female male female male
## [211] female male female male female male female male <NA> male
## [221] female male female male male female female male female male
## [231] female male female male female male female male female male
## [241] female male female male female male female male male female
## [251] female male female male female male <NA> male female male
## [261] female male female male female male female male <NA> male
## [271] female <NA> female male female male female male male female
## [281] male female female male female male female male female male
## [291] female male male female female male female male female male
## [301] female male female male female male female male female male
## [311] male female female male female male male female male female
## [321] female male female male male female female male female male
## [331] female male female male male female male female female male
## [341] female male male female
## Levels: female male
penguins$body_mass_g # quantitative
## [1] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 3300 3700 3200 3800 4400
## [16] 3700 3450 4500 3325 4200 3400 3600 3800 3950 3800 3800 3550 3200 3150 3950
## [31] 3250 3900 3300 3900 3325 4150 3950 3550 3300 4650 3150 3900 3100 4400 3000
## [46] 4600 3425 2975 3450 4150 3500 4300 3450 4050 2900 3700 3550 3800 2850 3750
## [61] 3150 4400 3600 4050 2850 3950 3350 4100 3050 4450 3600 3900 3550 4150 3700
## [76] 4250 3700 3900 3550 4000 3200 4700 3800 4200 3350 3550 3800 3500 3950 3600
## [91] 3550 4300 3400 4450 3300 4300 3700 4350 2900 4100 3725 4725 3075 4250 2925
## [106] 3550 3750 3900 3175 4775 3825 4600 3200 4275 3900 4075 2900 3775 3350 3325
## [121] 3150 3500 3450 3875 3050 4000 3275 4300 3050 4000 3325 3500 3500 4475 3425
## [136] 3900 3175 3975 3400 4250 3400 3475 3050 3725 3000 3650 4250 3475 3450 3750
## [151] 3700 4000 4500 5700 4450 5700 5400 4550 4800 5200 4400 5150 4650 5550 4650
## [166] 5850 4200 5850 4150 6300 4800 5350 5700 5000 4400 5050 5000 5100 4100 5650
## [181] 4600 5550 5250 4700 5050 6050 5150 5400 4950 5250 4350 5350 3950 5700 4300
## [196] 4750 5550 4900 4200 5400 5100 5300 4850 5300 4400 5000 4900 5050 4300 5000
## [211] 4450 5550 4200 5300 4400 5650 4700 5700 4650 5800 4700 5550 4750 5000 5100
## [226] 5200 4700 5800 4600 6000 4750 5950 4625 5450 4725 5350 4750 5600 4600 5300
## [241] 4875 5550 4950 5400 4750 5650 4850 5200 4925 4875 4625 5250 4850 5600 4975
## [256] 5500 4725 5500 4700 5500 4575 5500 5000 5950 4650 5500 4375 5850 4875 6000
## [271] 4925 NA 4850 5750 5200 5400 3500 3900 3650 3525 3725 3950 3250 3750 4150
## [286] 3700 3800 3775 3700 4050 3575 4050 3300 3700 3450 4400 3600 3400 2900 3800
## [301] 3300 4150 3400 3800 3700 4550 3200 4300 3350 4100 3600 3900 3850 4800 2700
## [316] 4500 3950 3650 3550 3500 3675 4450 3400 4300 3250 3675 3325 3950 3600 4050
## [331] 3350 3450 3250 4050 3800 3525 3950 3650 3650 4000 3400 3775 4100 3775
penguins$flipper_length_mm # quantitative
## [1] 181 186 195 NA 193 190 181 195 193 190 186 180 182 191 198 185 195 197
## [19] 184 194 174 180 189 185 180 187 183 187 172 180 178 178 188 184 195 196
## [37] 190 180 181 184 182 195 186 196 185 190 182 179 190 191 186 188 190 200
## [55] 187 191 186 193 181 194 185 195 185 192 184 192 195 188 190 198 190 190
## [73] 196 197 190 195 191 184 187 195 189 196 187 193 191 194 190 189 189 190
## [91] 202 205 185 186 187 208 190 196 178 192 192 203 183 190 193 184 199 190
## [109] 181 197 198 191 193 197 191 196 188 199 189 189 187 198 176 202 186 199
## [127] 191 195 191 210 190 197 193 199 187 190 191 200 185 193 193 187 188 190
## [145] 192 185 190 184 195 193 187 201 211 230 210 218 215 210 211 219 209 215
## [163] 214 216 214 213 210 217 210 221 209 222 218 215 213 215 215 215 216 215
## [181] 210 220 222 209 207 230 220 220 213 219 208 208 208 225 210 216 222 217
## [199] 210 225 213 215 210 220 210 225 217 220 208 220 208 224 208 221 214 231
## [217] 219 230 214 229 220 223 216 221 221 217 216 230 209 220 215 223 212 221
## [235] 212 224 212 228 218 218 212 230 218 228 212 224 214 226 216 222 203 225
## [253] 219 228 215 228 216 215 210 219 208 209 216 229 213 230 217 230 217 222
## [271] 214 NA 215 222 212 213 192 196 193 188 197 198 178 197 195 198 193 194
## [289] 185 201 190 201 197 181 190 195 181 191 187 193 195 197 200 200 191 205
## [307] 187 201 187 203 195 199 195 210 192 205 210 187 196 196 196 201 190 212
## [325] 187 198 199 201 193 203 187 197 191 203 202 194 206 189 195 207 202 193
## [343] 210 198
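As we can see, the data set has 344 rows (one per penguin) and eight variables: species, island, bill length, bill depth, flipper length, body mass, sex, and year. Species, island, and sex are categorical; bill length, bill depth, flipper length, and body mass are quantitative.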
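To confirm which variables are categorical and which are quantitative, we can ask R for the class of every column. This is a minimal sketch assuming the palmerpenguins package is installed; the name `var_classes` is our own. A class of "factor" marks a categorical variable, while "numeric" or "integer" marks a quantitative one, matching the `<fct>`, `<dbl>`, and `<int>` tags in the `head()` output above.

```r
# Assumes the palmerpenguins package is installed
library(palmerpenguins)

penguins <- palmerpenguins::penguins

# Class of each column: "factor" marks a categorical variable,
# "numeric" or "integer" marks a quantitative one
var_classes <- sapply(penguins, class)
print(var_classes)
```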
We might want to know what the “categories” (levels) of our categorical variables are. We use the following R code to check this:
levels(as.factor(penguins$species))
## [1] "Adelie" "Chinstrap" "Gentoo"
levels(as.factor(penguins$island))
## [1] "Biscoe" "Dream" "Torgersen"
levels(as.factor(penguins$sex))
## [1] "female" "male"
table(penguins$species)
##
## Adelie Chinstrap Gentoo
## 152 68 124
table(penguins$island)
##
## Biscoe Dream Torgersen
## 168 124 52
table(penguins$sex)
##
## female male
## 165 168
# remember, this data set has some NA values.
# let's clean the data by removing these rows.
colSums(is.na(penguins))
## species island bill_length_mm bill_depth_mm
## 0 0 2 2
## flipper_length_mm body_mass_g sex year
## 2 2 11 0
penguins2 <- penguins %>% drop_na()
We see that species has three categories (Adelie, Chinstrap, Gentoo), island has three (Biscoe, Dream, Torgersen), and sex has two (female, male). We also see that this data set has some NA values: two each in bill length, bill depth, flipper length, and body mass, and eleven in sex. We will “clean” the data by removing every row that contains an NA, naming this new data set “penguins2”. We will only work with penguins2 for the rest of this report.
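To see how many rows the cleaning step removes, we can show the missing sex values explicitly and compare row counts before and after `drop_na()`. This is a minimal sketch assuming the palmerpenguins and tidyr packages are installed; `rows_removed` is our own name. Because the two rows with missing measurements are also missing sex, we expect 11 rows to be dropped, leaving 333.

```r
# Assumes the palmerpenguins and tidyr packages are installed
library(palmerpenguins)
library(tidyr)

penguins <- palmerpenguins::penguins

# table() hides NAs by default; useNA = "ifany" lists them as their own count
print(table(penguins$sex, useNA = "ifany"))

# drop_na() with no column arguments drops every row that has an NA anywhere
penguins2 <- penguins %>% drop_na()
rows_removed <- nrow(penguins) - nrow(penguins2)
cat("Rows before:", nrow(penguins), " after:", nrow(penguins2),
    " removed:", rows_removed, "\n")
```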
We are interested in checking whether the categorical variables species and island are independent. We can do this by making a contingency table and by examining appropriate graphs. We include a few of those below:
# Basic bar graph with species
ggplot(data = penguins2, aes(x = species)) +
geom_bar()
# Basic bar graphs, “facet wrapped” for island
ggplot(data = penguins2, aes(x = species)) +
geom_bar() +
facet_wrap(~island)
From the bar graphs and the counts behind them (using penguins2), we see that the distribution of species changes from island to island. Specifically, 100% of Gentoos are on Biscoe Island and 100% of Chinstraps are on Dream Island, while about 30% of Adelies are on Biscoe, 38% on Dream, and 32% on Torgersen. Looking at it by island instead, Torgersen is 100% Adelie, Dream is about 55% Chinstrap and 45% Adelie, and Biscoe is about 73% Gentoo and 27% Adelie. This difference in species across the groups of island suggests that these two variables are NOT independent.
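The contingency table mentioned earlier, and the percentages quoted above, can be computed directly. This is a minimal sketch assuming the palmerpenguins and tidyr packages are installed; it rebuilds penguins2 the same way as before, and `species_island` is our own name. Row proportions (`margin = 1`) show how each species is spread across the islands, and column proportions (`margin = 2`) show the species make-up of each island.

```r
# Assumes the palmerpenguins and tidyr packages are installed
library(palmerpenguins)
library(tidyr)

# Rebuild the cleaned data set used in this report
penguins2 <- palmerpenguins::penguins %>% drop_na()

# Two-way contingency table: rows are species, columns are islands
species_island <- table(penguins2$species, penguins2$island)
print(species_island)

# Row proportions (margin = 1): how each species is spread across islands
print(round(prop.table(species_island, margin = 1), 2))

# Column proportions (margin = 2): the species make-up of each island
print(round(prop.table(species_island, margin = 2), 2))
```

If species and island were independent, every row of the row-proportion table would look roughly alike; the zeros for Gentoo and Chinstrap make it clear that they do not.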
# Bar graphs of species colored by island: stacked, side by side ("dodge"), and segmented
ggplot(data = penguins2, aes(x = species, fill = island)) +
geom_bar()
ggplot(data = penguins2, aes(x = species, fill = island)) +
geom_bar(position = "dodge")
ggplot(data = penguins2, aes(x = species, fill = island)) +
geom_bar(position = "fill") # segmented (100%) bars
We can draw the following conclusion: all of the Gentoos are on Biscoe Island and all of the Chinstraps are on Dream Island, while the Adelies are spread across all three islands.
We might also be interested in comparing body mass (a quantitative variable) across categories of species and of sex. We can do this by calculating summary statistics (mean, standard deviation, and so on) for each species and/or sex category.
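Summary statistics of body mass for each species and sex combination can be computed with dplyr. This is a minimal sketch assuming the palmerpenguins, dplyr, and tidyr packages are installed; `mass_summary` and its column names are our own choices.

```r
# Assumes the palmerpenguins, dplyr, and tidyr packages are installed
library(palmerpenguins)
library(dplyr)
library(tidyr)

penguins2 <- palmerpenguins::penguins %>% drop_na()

# Count, mean, standard deviation, and median of body mass (g)
# for every species/sex combination
mass_summary <- penguins2 %>%
  group_by(species, sex) %>%
  summarise(
    n = n(),
    mean_mass = mean(body_mass_g),
    sd_mass = sd(body_mass_g),
    median_mass = median(body_mass_g),
    .groups = "drop"
  )
print(mass_summary)
```

Grouping by only one of the two variables (for example `group_by(species)`) gives the per-species or per-sex summaries instead.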
To further explore the difference in body mass across categories of sex and/or species, we can create displays of the data! We do so here, creating appropriate histograms and boxplots:
# Body mass histograms
ggplot(data = penguins2, aes(x = body_mass_g)) +
geom_histogram(binwidth = 200)
ggplot(data = penguins2, aes(x = body_mass_g, fill = species)) +
geom_histogram(binwidth = 200)
ggplot(data = penguins2, aes(x = body_mass_g, fill = species)) +
geom_histogram(binwidth = 200) +
facet_wrap(~species)
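As a numeric companion to the histograms, we can count how many penguins fall into each 200 g interval. This is a sketch assuming the palmerpenguins and tidyr packages are installed; the break points from 2600 g to 6400 g are our own choice and may not line up exactly with the bins ggplot2 draws, so these counts can differ slightly from the bar heights.

```r
# Assumes the palmerpenguins and tidyr packages are installed
library(palmerpenguins)
library(tidyr)

penguins2 <- palmerpenguins::penguins %>% drop_na()

# Assign each body mass to a 200 g interval such as (3400,3600]
# (our own break points, chosen to cover the observed range)
mass_bins <- cut(penguins2$body_mass_g, breaks = seq(2600, 6400, by = 200))
bin_counts <- table(mass_bins)
print(bin_counts)

# The interval containing the most penguins (the tallest bar)
print(names(which.max(bin_counts)))
```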
############################################################
### One Quantitative and One Categorical Variable – Graphing Boxplots
############################################################
# A single overall boxplot for body mass
ggplot(data = penguins2, aes(x = body_mass_g)) +
geom_boxplot()
# Boxplots of body mass by species (horizontal)
ggplot(data = penguins2, aes(y = species, x = body_mass_g)) +
geom_boxplot()
# Boxplots of body mass by sex (horizontal)
ggplot(data = penguins2, aes(y = sex, x = body_mass_g)) +
geom_boxplot()
# Boxplots by species and sex together
ggplot(data = penguins2, aes(y = interaction(species, sex), x = body_mass_g, fill = sex)) +
geom_boxplot()
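By looking at the histograms and the boxplots, we see the following: the most common body mass (the tallest bar of the overall histogram) is around 3500 g, with a count of about 35 penguins. Adelie is the species with the most penguins, Gentoos have the largest body masses, and within each species the box for males lies to the right of the box for females.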
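The species-level claims above can be checked numerically. A boxplot's box runs from the first quartile (Q1) to the third quartile (Q3) with a line at the median, and ggplot2 extends the whiskers to the most extreme points within 1.5 times the interquartile range. The sketch below, assuming the palmerpenguins, dplyr, and tidyr packages are installed, computes the count and these quartiles for each species; `box_numbers` and its column names are our own.

```r
# Assumes the palmerpenguins, dplyr, and tidyr packages are installed
library(palmerpenguins)
library(dplyr)
library(tidyr)

penguins2 <- palmerpenguins::penguins %>% drop_na()

# Per-species count plus the five-number summary behind each boxplot
box_numbers <- penguins2 %>%
  group_by(species) %>%
  summarise(
    n = n(),
    min_mass = min(body_mass_g),
    q1 = quantile(body_mass_g, 0.25),
    median_mass = median(body_mass_g),
    q3 = quantile(body_mass_g, 0.75),
    max_mass = max(body_mass_g),
    .groups = "drop"
  )
print(box_numbers)
```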
This data set allows us to explore, for the Palmer penguins, how species relates to island and how species and sex relate to body mass. By using a contingency table and some bar charts, we see that there is a relationship between species and island (with different islands having vastly different distributions of penguin species). By comparing summary statistics and looking at histograms and boxplots, we see that there is a relationship between body mass and species / sex (with males typically having more body mass than females within each species, and Gentoo penguins typically having more body mass than the other two species). There are many other comparisons we could make in this data set, and we may return to it in the future!