Data were collected and made available by Dr. Kristen Gorman
and the Palmer Station, Antarctica LTER, a member of the Long Term
Ecological Research Network. The aim was to study
ecological sexual dimorphism and environmental variability within a
community of Antarctic penguins of the genus Pygoscelis (Gorman, Williams, and Fraser 2014).
The data are available on CRAN in the palmerpenguins package (Horst, Hill, and Gorman 2020).
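library(palmerpenguins) # provides the penguins data frame
library(knitr); library(kableExtra); library(ggplot2) # kable(), kable_styling(), %>%, ggplot()
df = na.omit(penguins) # remove rows with NA values
kable(df) %>% kable_styling(fixed_thead = TRUE, full_width = FALSE) %>%
scroll_box(height = "600px")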
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
| Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female | 2007 |
| Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male | 2007 |
| Adelie | Torgersen | 41.1 | 17.6 | 182 | 3200 | female | 2007 |
| Adelie | Torgersen | 38.6 | 21.2 | 191 | 3800 | male | 2007 |
| Adelie | Torgersen | 34.6 | 21.1 | 198 | 4400 | male | 2007 |
| Adelie | Torgersen | 36.6 | 17.8 | 185 | 3700 | female | 2007 |
| Adelie | Torgersen | 38.7 | 19.0 | 195 | 3450 | female | 2007 |
| Adelie | Torgersen | 42.5 | 20.7 | 197 | 4500 | male | 2007 |
| Adelie | Torgersen | 34.4 | 18.4 | 184 | 3325 | female | 2007 |
| Adelie | Torgersen | 46.0 | 21.5 | 194 | 4200 | male | 2007 |
| Adelie | Biscoe | 37.8 | 18.3 | 174 | 3400 | female | 2007 |
| Adelie | Biscoe | 37.7 | 18.7 | 180 | 3600 | male | 2007 |
| Adelie | Biscoe | 35.9 | 19.2 | 189 | 3800 | female | 2007 |
| Adelie | Biscoe | 38.2 | 18.1 | 185 | 3950 | male | 2007 |
| Adelie | Biscoe | 38.8 | 17.2 | 180 | 3800 | male | 2007 |
| Adelie | Biscoe | 35.3 | 18.9 | 187 | 3800 | female | 2007 |
| Adelie | Biscoe | 40.6 | 18.6 | 183 | 3550 | male | 2007 |
| Adelie | Biscoe | 40.5 | 17.9 | 187 | 3200 | female | 2007 |
| Adelie | Biscoe | 37.9 | 18.6 | 172 | 3150 | female | 2007 |
| Adelie | Biscoe | 40.5 | 18.9 | 180 | 3950 | male | 2007 |
| Adelie | Dream | 39.5 | 16.7 | 178 | 3250 | female | 2007 |
| Adelie | Dream | 37.2 | 18.1 | 178 | 3900 | male | 2007 |
| Adelie | Dream | 39.5 | 17.8 | 188 | 3300 | female | 2007 |
| Adelie | Dream | 40.9 | 18.9 | 184 | 3900 | male | 2007 |
| Adelie | Dream | 36.4 | 17.0 | 195 | 3325 | female | 2007 |
| Adelie | Dream | 39.2 | 21.1 | 196 | 4150 | male | 2007 |
| Adelie | Dream | 38.8 | 20.0 | 190 | 3950 | male | 2007 |
| Adelie | Dream | 42.2 | 18.5 | 180 | 3550 | female | 2007 |
| Adelie | Dream | 37.6 | 19.3 | 181 | 3300 | female | 2007 |
| Adelie | Dream | 39.8 | 19.1 | 184 | 4650 | male | 2007 |
| Adelie | Dream | 36.5 | 18.0 | 182 | 3150 | female | 2007 |
| Adelie | Dream | 40.8 | 18.4 | 195 | 3900 | male | 2007 |
| Adelie | Dream | 36.0 | 18.5 | 186 | 3100 | female | 2007 |
| Adelie | Dream | 44.1 | 19.7 | 196 | 4400 | male | 2007 |
| Adelie | Dream | 37.0 | 16.9 | 185 | 3000 | female | 2007 |
| Adelie | Dream | 39.6 | 18.8 | 190 | 4600 | male | 2007 |
| Adelie | Dream | 41.1 | 19.0 | 182 | 3425 | male | 2007 |
| Adelie | Dream | 36.0 | 17.9 | 190 | 3450 | female | 2007 |
| Adelie | Dream | 42.3 | 21.2 | 191 | 4150 | male | 2007 |
| Adelie | Biscoe | 39.6 | 17.7 | 186 | 3500 | female | 2008 |
| Adelie | Biscoe | 40.1 | 18.9 | 188 | 4300 | male | 2008 |
| Adelie | Biscoe | 35.0 | 17.9 | 190 | 3450 | female | 2008 |
| Adelie | Biscoe | 42.0 | 19.5 | 200 | 4050 | male | 2008 |
| Adelie | Biscoe | 34.5 | 18.1 | 187 | 2900 | female | 2008 |
| Adelie | Biscoe | 41.4 | 18.6 | 191 | 3700 | male | 2008 |
| Adelie | Biscoe | 39.0 | 17.5 | 186 | 3550 | female | 2008 |
| Adelie | Biscoe | 40.6 | 18.8 | 193 | 3800 | male | 2008 |
| Adelie | Biscoe | 36.5 | 16.6 | 181 | 2850 | female | 2008 |
| Adelie | Biscoe | 37.6 | 19.1 | 194 | 3750 | male | 2008 |
| Adelie | Biscoe | 35.7 | 16.9 | 185 | 3150 | female | 2008 |
| Adelie | Biscoe | 41.3 | 21.1 | 195 | 4400 | male | 2008 |
| Adelie | Biscoe | 37.6 | 17.0 | 185 | 3600 | female | 2008 |
| Adelie | Biscoe | 41.1 | 18.2 | 192 | 4050 | male | 2008 |
| Adelie | Biscoe | 36.4 | 17.1 | 184 | 2850 | female | 2008 |
| Adelie | Biscoe | 41.6 | 18.0 | 192 | 3950 | male | 2008 |
| Adelie | Biscoe | 35.5 | 16.2 | 195 | 3350 | female | 2008 |
| Adelie | Biscoe | 41.1 | 19.1 | 188 | 4100 | male | 2008 |
| Adelie | Torgersen | 35.9 | 16.6 | 190 | 3050 | female | 2008 |
| Adelie | Torgersen | 41.8 | 19.4 | 198 | 4450 | male | 2008 |
| Adelie | Torgersen | 33.5 | 19.0 | 190 | 3600 | female | 2008 |
| Adelie | Torgersen | 39.7 | 18.4 | 190 | 3900 | male | 2008 |
| Adelie | Torgersen | 39.6 | 17.2 | 196 | 3550 | female | 2008 |
| Adelie | Torgersen | 45.8 | 18.9 | 197 | 4150 | male | 2008 |
| Adelie | Torgersen | 35.5 | 17.5 | 190 | 3700 | female | 2008 |
| Adelie | Torgersen | 42.8 | 18.5 | 195 | 4250 | male | 2008 |
| Adelie | Torgersen | 40.9 | 16.8 | 191 | 3700 | female | 2008 |
| Adelie | Torgersen | 37.2 | 19.4 | 184 | 3900 | male | 2008 |
| Adelie | Torgersen | 36.2 | 16.1 | 187 | 3550 | female | 2008 |
| Adelie | Torgersen | 42.1 | 19.1 | 195 | 4000 | male | 2008 |
| Adelie | Torgersen | 34.6 | 17.2 | 189 | 3200 | female | 2008 |
| Adelie | Torgersen | 42.9 | 17.6 | 196 | 4700 | male | 2008 |
| Adelie | Torgersen | 36.7 | 18.8 | 187 | 3800 | female | 2008 |
| Adelie | Torgersen | 35.1 | 19.4 | 193 | 4200 | male | 2008 |
| Adelie | Dream | 37.3 | 17.8 | 191 | 3350 | female | 2008 |
| Adelie | Dream | 41.3 | 20.3 | 194 | 3550 | male | 2008 |
| Adelie | Dream | 36.3 | 19.5 | 190 | 3800 | male | 2008 |
| Adelie | Dream | 36.9 | 18.6 | 189 | 3500 | female | 2008 |
| Adelie | Dream | 38.3 | 19.2 | 189 | 3950 | male | 2008 |
| Adelie | Dream | 38.9 | 18.8 | 190 | 3600 | female | 2008 |
| Adelie | Dream | 35.7 | 18.0 | 202 | 3550 | female | 2008 |
| Adelie | Dream | 41.1 | 18.1 | 205 | 4300 | male | 2008 |
| Adelie | Dream | 34.0 | 17.1 | 185 | 3400 | female | 2008 |
| Adelie | Dream | 39.6 | 18.1 | 186 | 4450 | male | 2008 |
| Adelie | Dream | 36.2 | 17.3 | 187 | 3300 | female | 2008 |
| Adelie | Dream | 40.8 | 18.9 | 208 | 4300 | male | 2008 |
| Adelie | Dream | 38.1 | 18.6 | 190 | 3700 | female | 2008 |
| Adelie | Dream | 40.3 | 18.5 | 196 | 4350 | male | 2008 |
| Adelie | Dream | 33.1 | 16.1 | 178 | 2900 | female | 2008 |
| Adelie | Dream | 43.2 | 18.5 | 192 | 4100 | male | 2008 |
| Adelie | Biscoe | 35.0 | 17.9 | 192 | 3725 | female | 2009 |
| Adelie | Biscoe | 41.0 | 20.0 | 203 | 4725 | male | 2009 |
| Adelie | Biscoe | 37.7 | 16.0 | 183 | 3075 | female | 2009 |
| Adelie | Biscoe | 37.8 | 20.0 | 190 | 4250 | male | 2009 |
| Adelie | Biscoe | 37.9 | 18.6 | 193 | 2925 | female | 2009 |
| Adelie | Biscoe | 39.7 | 18.9 | 184 | 3550 | male | 2009 |
| Adelie | Biscoe | 38.6 | 17.2 | 199 | 3750 | female | 2009 |
| Adelie | Biscoe | 38.2 | 20.0 | 190 | 3900 | male | 2009 |
| Adelie | Biscoe | 38.1 | 17.0 | 181 | 3175 | female | 2009 |
| Adelie | Biscoe | 43.2 | 19.0 | 197 | 4775 | male | 2009 |
| Adelie | Biscoe | 38.1 | 16.5 | 198 | 3825 | female | 2009 |
| Adelie | Biscoe | 45.6 | 20.3 | 191 | 4600 | male | 2009 |
| Adelie | Biscoe | 39.7 | 17.7 | 193 | 3200 | female | 2009 |
| Adelie | Biscoe | 42.2 | 19.5 | 197 | 4275 | male | 2009 |
| Adelie | Biscoe | 39.6 | 20.7 | 191 | 3900 | female | 2009 |
| Adelie | Biscoe | 42.7 | 18.3 | 196 | 4075 | male | 2009 |
| Adelie | Torgersen | 38.6 | 17.0 | 188 | 2900 | female | 2009 |
| Adelie | Torgersen | 37.3 | 20.5 | 199 | 3775 | male | 2009 |
| Adelie | Torgersen | 35.7 | 17.0 | 189 | 3350 | female | 2009 |
| Adelie | Torgersen | 41.1 | 18.6 | 189 | 3325 | male | 2009 |
| Adelie | Torgersen | 36.2 | 17.2 | 187 | 3150 | female | 2009 |
| Adelie | Torgersen | 37.7 | 19.8 | 198 | 3500 | male | 2009 |
| Adelie | Torgersen | 40.2 | 17.0 | 176 | 3450 | female | 2009 |
| Adelie | Torgersen | 41.4 | 18.5 | 202 | 3875 | male | 2009 |
| Adelie | Torgersen | 35.2 | 15.9 | 186 | 3050 | female | 2009 |
| Adelie | Torgersen | 40.6 | 19.0 | 199 | 4000 | male | 2009 |
| Adelie | Torgersen | 38.8 | 17.6 | 191 | 3275 | female | 2009 |
| Adelie | Torgersen | 41.5 | 18.3 | 195 | 4300 | male | 2009 |
| Adelie | Torgersen | 39.0 | 17.1 | 191 | 3050 | female | 2009 |
| Adelie | Torgersen | 44.1 | 18.0 | 210 | 4000 | male | 2009 |
| Adelie | Torgersen | 38.5 | 17.9 | 190 | 3325 | female | 2009 |
| Adelie | Torgersen | 43.1 | 19.2 | 197 | 3500 | male | 2009 |
| Adelie | Dream | 36.8 | 18.5 | 193 | 3500 | female | 2009 |
| Adelie | Dream | 37.5 | 18.5 | 199 | 4475 | male | 2009 |
| Adelie | Dream | 38.1 | 17.6 | 187 | 3425 | female | 2009 |
| Adelie | Dream | 41.1 | 17.5 | 190 | 3900 | male | 2009 |
| Adelie | Dream | 35.6 | 17.5 | 191 | 3175 | female | 2009 |
| Adelie | Dream | 40.2 | 20.1 | 200 | 3975 | male | 2009 |
| Adelie | Dream | 37.0 | 16.5 | 185 | 3400 | female | 2009 |
| Adelie | Dream | 39.7 | 17.9 | 193 | 4250 | male | 2009 |
| Adelie | Dream | 40.2 | 17.1 | 193 | 3400 | female | 2009 |
| Adelie | Dream | 40.6 | 17.2 | 187 | 3475 | male | 2009 |
| Adelie | Dream | 32.1 | 15.5 | 188 | 3050 | female | 2009 |
| Adelie | Dream | 40.7 | 17.0 | 190 | 3725 | male | 2009 |
| Adelie | Dream | 37.3 | 16.8 | 192 | 3000 | female | 2009 |
| Adelie | Dream | 39.0 | 18.7 | 185 | 3650 | male | 2009 |
| Adelie | Dream | 39.2 | 18.6 | 190 | 4250 | male | 2009 |
| Adelie | Dream | 36.6 | 18.4 | 184 | 3475 | female | 2009 |
| Adelie | Dream | 36.0 | 17.8 | 195 | 3450 | female | 2009 |
| Adelie | Dream | 37.8 | 18.1 | 193 | 3750 | male | 2009 |
| Adelie | Dream | 36.0 | 17.1 | 187 | 3700 | female | 2009 |
| Adelie | Dream | 41.5 | 18.5 | 201 | 4000 | male | 2009 |
| Gentoo | Biscoe | 46.1 | 13.2 | 211 | 4500 | female | 2007 |
| Gentoo | Biscoe | 50.0 | 16.3 | 230 | 5700 | male | 2007 |
| Gentoo | Biscoe | 48.7 | 14.1 | 210 | 4450 | female | 2007 |
| Gentoo | Biscoe | 50.0 | 15.2 | 218 | 5700 | male | 2007 |
| Gentoo | Biscoe | 47.6 | 14.5 | 215 | 5400 | male | 2007 |
| Gentoo | Biscoe | 46.5 | 13.5 | 210 | 4550 | female | 2007 |
| Gentoo | Biscoe | 45.4 | 14.6 | 211 | 4800 | female | 2007 |
| Gentoo | Biscoe | 46.7 | 15.3 | 219 | 5200 | male | 2007 |
| Gentoo | Biscoe | 43.3 | 13.4 | 209 | 4400 | female | 2007 |
| Gentoo | Biscoe | 46.8 | 15.4 | 215 | 5150 | male | 2007 |
| Gentoo | Biscoe | 40.9 | 13.7 | 214 | 4650 | female | 2007 |
| Gentoo | Biscoe | 49.0 | 16.1 | 216 | 5550 | male | 2007 |
| Gentoo | Biscoe | 45.5 | 13.7 | 214 | 4650 | female | 2007 |
| Gentoo | Biscoe | 48.4 | 14.6 | 213 | 5850 | male | 2007 |
| Gentoo | Biscoe | 45.8 | 14.6 | 210 | 4200 | female | 2007 |
| Gentoo | Biscoe | 49.3 | 15.7 | 217 | 5850 | male | 2007 |
| Gentoo | Biscoe | 42.0 | 13.5 | 210 | 4150 | female | 2007 |
| Gentoo | Biscoe | 49.2 | 15.2 | 221 | 6300 | male | 2007 |
| Gentoo | Biscoe | 46.2 | 14.5 | 209 | 4800 | female | 2007 |
| Gentoo | Biscoe | 48.7 | 15.1 | 222 | 5350 | male | 2007 |
| Gentoo | Biscoe | 50.2 | 14.3 | 218 | 5700 | male | 2007 |
| Gentoo | Biscoe | 45.1 | 14.5 | 215 | 5000 | female | 2007 |
| Gentoo | Biscoe | 46.5 | 14.5 | 213 | 4400 | female | 2007 |
| Gentoo | Biscoe | 46.3 | 15.8 | 215 | 5050 | male | 2007 |
| Gentoo | Biscoe | 42.9 | 13.1 | 215 | 5000 | female | 2007 |
| Gentoo | Biscoe | 46.1 | 15.1 | 215 | 5100 | male | 2007 |
| Gentoo | Biscoe | 47.8 | 15.0 | 215 | 5650 | male | 2007 |
| Gentoo | Biscoe | 48.2 | 14.3 | 210 | 4600 | female | 2007 |
| Gentoo | Biscoe | 50.0 | 15.3 | 220 | 5550 | male | 2007 |
| Gentoo | Biscoe | 47.3 | 15.3 | 222 | 5250 | male | 2007 |
| Gentoo | Biscoe | 42.8 | 14.2 | 209 | 4700 | female | 2007 |
| Gentoo | Biscoe | 45.1 | 14.5 | 207 | 5050 | female | 2007 |
| Gentoo | Biscoe | 59.6 | 17.0 | 230 | 6050 | male | 2007 |
| Gentoo | Biscoe | 49.1 | 14.8 | 220 | 5150 | female | 2008 |
| Gentoo | Biscoe | 48.4 | 16.3 | 220 | 5400 | male | 2008 |
| Gentoo | Biscoe | 42.6 | 13.7 | 213 | 4950 | female | 2008 |
| Gentoo | Biscoe | 44.4 | 17.3 | 219 | 5250 | male | 2008 |
| Gentoo | Biscoe | 44.0 | 13.6 | 208 | 4350 | female | 2008 |
| Gentoo | Biscoe | 48.7 | 15.7 | 208 | 5350 | male | 2008 |
| Gentoo | Biscoe | 42.7 | 13.7 | 208 | 3950 | female | 2008 |
| Gentoo | Biscoe | 49.6 | 16.0 | 225 | 5700 | male | 2008 |
| Gentoo | Biscoe | 45.3 | 13.7 | 210 | 4300 | female | 2008 |
| Gentoo | Biscoe | 49.6 | 15.0 | 216 | 4750 | male | 2008 |
| Gentoo | Biscoe | 50.5 | 15.9 | 222 | 5550 | male | 2008 |
| Gentoo | Biscoe | 43.6 | 13.9 | 217 | 4900 | female | 2008 |
| Gentoo | Biscoe | 45.5 | 13.9 | 210 | 4200 | female | 2008 |
| Gentoo | Biscoe | 50.5 | 15.9 | 225 | 5400 | male | 2008 |
| Gentoo | Biscoe | 44.9 | 13.3 | 213 | 5100 | female | 2008 |
| Gentoo | Biscoe | 45.2 | 15.8 | 215 | 5300 | male | 2008 |
| Gentoo | Biscoe | 46.6 | 14.2 | 210 | 4850 | female | 2008 |
| Gentoo | Biscoe | 48.5 | 14.1 | 220 | 5300 | male | 2008 |
| Gentoo | Biscoe | 45.1 | 14.4 | 210 | 4400 | female | 2008 |
| Gentoo | Biscoe | 50.1 | 15.0 | 225 | 5000 | male | 2008 |
| Gentoo | Biscoe | 46.5 | 14.4 | 217 | 4900 | female | 2008 |
| Gentoo | Biscoe | 45.0 | 15.4 | 220 | 5050 | male | 2008 |
| Gentoo | Biscoe | 43.8 | 13.9 | 208 | 4300 | female | 2008 |
| Gentoo | Biscoe | 45.5 | 15.0 | 220 | 5000 | male | 2008 |
| Gentoo | Biscoe | 43.2 | 14.5 | 208 | 4450 | female | 2008 |
| Gentoo | Biscoe | 50.4 | 15.3 | 224 | 5550 | male | 2008 |
| Gentoo | Biscoe | 45.3 | 13.8 | 208 | 4200 | female | 2008 |
| Gentoo | Biscoe | 46.2 | 14.9 | 221 | 5300 | male | 2008 |
| Gentoo | Biscoe | 45.7 | 13.9 | 214 | 4400 | female | 2008 |
| Gentoo | Biscoe | 54.3 | 15.7 | 231 | 5650 | male | 2008 |
| Gentoo | Biscoe | 45.8 | 14.2 | 219 | 4700 | female | 2008 |
| Gentoo | Biscoe | 49.8 | 16.8 | 230 | 5700 | male | 2008 |
| Gentoo | Biscoe | 49.5 | 16.2 | 229 | 5800 | male | 2008 |
| Gentoo | Biscoe | 43.5 | 14.2 | 220 | 4700 | female | 2008 |
| Gentoo | Biscoe | 50.7 | 15.0 | 223 | 5550 | male | 2008 |
| Gentoo | Biscoe | 47.7 | 15.0 | 216 | 4750 | female | 2008 |
| Gentoo | Biscoe | 46.4 | 15.6 | 221 | 5000 | male | 2008 |
| Gentoo | Biscoe | 48.2 | 15.6 | 221 | 5100 | male | 2008 |
| Gentoo | Biscoe | 46.5 | 14.8 | 217 | 5200 | female | 2008 |
| Gentoo | Biscoe | 46.4 | 15.0 | 216 | 4700 | female | 2008 |
| Gentoo | Biscoe | 48.6 | 16.0 | 230 | 5800 | male | 2008 |
| Gentoo | Biscoe | 47.5 | 14.2 | 209 | 4600 | female | 2008 |
| Gentoo | Biscoe | 51.1 | 16.3 | 220 | 6000 | male | 2008 |
| Gentoo | Biscoe | 45.2 | 13.8 | 215 | 4750 | female | 2008 |
| Gentoo | Biscoe | 45.2 | 16.4 | 223 | 5950 | male | 2008 |
| Gentoo | Biscoe | 49.1 | 14.5 | 212 | 4625 | female | 2009 |
| Gentoo | Biscoe | 52.5 | 15.6 | 221 | 5450 | male | 2009 |
| Gentoo | Biscoe | 47.4 | 14.6 | 212 | 4725 | female | 2009 |
| Gentoo | Biscoe | 50.0 | 15.9 | 224 | 5350 | male | 2009 |
| Gentoo | Biscoe | 44.9 | 13.8 | 212 | 4750 | female | 2009 |
| Gentoo | Biscoe | 50.8 | 17.3 | 228 | 5600 | male | 2009 |
| Gentoo | Biscoe | 43.4 | 14.4 | 218 | 4600 | female | 2009 |
| Gentoo | Biscoe | 51.3 | 14.2 | 218 | 5300 | male | 2009 |
| Gentoo | Biscoe | 47.5 | 14.0 | 212 | 4875 | female | 2009 |
| Gentoo | Biscoe | 52.1 | 17.0 | 230 | 5550 | male | 2009 |
| Gentoo | Biscoe | 47.5 | 15.0 | 218 | 4950 | female | 2009 |
| Gentoo | Biscoe | 52.2 | 17.1 | 228 | 5400 | male | 2009 |
| Gentoo | Biscoe | 45.5 | 14.5 | 212 | 4750 | female | 2009 |
| Gentoo | Biscoe | 49.5 | 16.1 | 224 | 5650 | male | 2009 |
| Gentoo | Biscoe | 44.5 | 14.7 | 214 | 4850 | female | 2009 |
| Gentoo | Biscoe | 50.8 | 15.7 | 226 | 5200 | male | 2009 |
| Gentoo | Biscoe | 49.4 | 15.8 | 216 | 4925 | male | 2009 |
| Gentoo | Biscoe | 46.9 | 14.6 | 222 | 4875 | female | 2009 |
| Gentoo | Biscoe | 48.4 | 14.4 | 203 | 4625 | female | 2009 |
| Gentoo | Biscoe | 51.1 | 16.5 | 225 | 5250 | male | 2009 |
| Gentoo | Biscoe | 48.5 | 15.0 | 219 | 4850 | female | 2009 |
| Gentoo | Biscoe | 55.9 | 17.0 | 228 | 5600 | male | 2009 |
| Gentoo | Biscoe | 47.2 | 15.5 | 215 | 4975 | female | 2009 |
| Gentoo | Biscoe | 49.1 | 15.0 | 228 | 5500 | male | 2009 |
| Gentoo | Biscoe | 46.8 | 16.1 | 215 | 5500 | male | 2009 |
| Gentoo | Biscoe | 41.7 | 14.7 | 210 | 4700 | female | 2009 |
| Gentoo | Biscoe | 53.4 | 15.8 | 219 | 5500 | male | 2009 |
| Gentoo | Biscoe | 43.3 | 14.0 | 208 | 4575 | female | 2009 |
| Gentoo | Biscoe | 48.1 | 15.1 | 209 | 5500 | male | 2009 |
| Gentoo | Biscoe | 50.5 | 15.2 | 216 | 5000 | female | 2009 |
| Gentoo | Biscoe | 49.8 | 15.9 | 229 | 5950 | male | 2009 |
| Gentoo | Biscoe | 43.5 | 15.2 | 213 | 4650 | female | 2009 |
| Gentoo | Biscoe | 51.5 | 16.3 | 230 | 5500 | male | 2009 |
| Gentoo | Biscoe | 46.2 | 14.1 | 217 | 4375 | female | 2009 |
| Gentoo | Biscoe | 55.1 | 16.0 | 230 | 5850 | male | 2009 |
| Gentoo | Biscoe | 48.8 | 16.2 | 222 | 6000 | male | 2009 |
| Gentoo | Biscoe | 47.2 | 13.7 | 214 | 4925 | female | 2009 |
| Gentoo | Biscoe | 46.8 | 14.3 | 215 | 4850 | female | 2009 |
| Gentoo | Biscoe | 50.4 | 15.7 | 222 | 5750 | male | 2009 |
| Gentoo | Biscoe | 45.2 | 14.8 | 212 | 5200 | female | 2009 |
| Gentoo | Biscoe | 49.9 | 16.1 | 213 | 5400 | male | 2009 |
| Chinstrap | Dream | 46.5 | 17.9 | 192 | 3500 | female | 2007 |
| Chinstrap | Dream | 50.0 | 19.5 | 196 | 3900 | male | 2007 |
| Chinstrap | Dream | 51.3 | 19.2 | 193 | 3650 | male | 2007 |
| Chinstrap | Dream | 45.4 | 18.7 | 188 | 3525 | female | 2007 |
| Chinstrap | Dream | 52.7 | 19.8 | 197 | 3725 | male | 2007 |
| Chinstrap | Dream | 45.2 | 17.8 | 198 | 3950 | female | 2007 |
| Chinstrap | Dream | 46.1 | 18.2 | 178 | 3250 | female | 2007 |
| Chinstrap | Dream | 51.3 | 18.2 | 197 | 3750 | male | 2007 |
| Chinstrap | Dream | 46.0 | 18.9 | 195 | 4150 | female | 2007 |
| Chinstrap | Dream | 51.3 | 19.9 | 198 | 3700 | male | 2007 |
| Chinstrap | Dream | 46.6 | 17.8 | 193 | 3800 | female | 2007 |
| Chinstrap | Dream | 51.7 | 20.3 | 194 | 3775 | male | 2007 |
| Chinstrap | Dream | 47.0 | 17.3 | 185 | 3700 | female | 2007 |
| Chinstrap | Dream | 52.0 | 18.1 | 201 | 4050 | male | 2007 |
| Chinstrap | Dream | 45.9 | 17.1 | 190 | 3575 | female | 2007 |
| Chinstrap | Dream | 50.5 | 19.6 | 201 | 4050 | male | 2007 |
| Chinstrap | Dream | 50.3 | 20.0 | 197 | 3300 | male | 2007 |
| Chinstrap | Dream | 58.0 | 17.8 | 181 | 3700 | female | 2007 |
| Chinstrap | Dream | 46.4 | 18.6 | 190 | 3450 | female | 2007 |
| Chinstrap | Dream | 49.2 | 18.2 | 195 | 4400 | male | 2007 |
| Chinstrap | Dream | 42.4 | 17.3 | 181 | 3600 | female | 2007 |
| Chinstrap | Dream | 48.5 | 17.5 | 191 | 3400 | male | 2007 |
| Chinstrap | Dream | 43.2 | 16.6 | 187 | 2900 | female | 2007 |
| Chinstrap | Dream | 50.6 | 19.4 | 193 | 3800 | male | 2007 |
| Chinstrap | Dream | 46.7 | 17.9 | 195 | 3300 | female | 2007 |
| Chinstrap | Dream | 52.0 | 19.0 | 197 | 4150 | male | 2007 |
| Chinstrap | Dream | 50.5 | 18.4 | 200 | 3400 | female | 2008 |
| Chinstrap | Dream | 49.5 | 19.0 | 200 | 3800 | male | 2008 |
| Chinstrap | Dream | 46.4 | 17.8 | 191 | 3700 | female | 2008 |
| Chinstrap | Dream | 52.8 | 20.0 | 205 | 4550 | male | 2008 |
| Chinstrap | Dream | 40.9 | 16.6 | 187 | 3200 | female | 2008 |
| Chinstrap | Dream | 54.2 | 20.8 | 201 | 4300 | male | 2008 |
| Chinstrap | Dream | 42.5 | 16.7 | 187 | 3350 | female | 2008 |
| Chinstrap | Dream | 51.0 | 18.8 | 203 | 4100 | male | 2008 |
| Chinstrap | Dream | 49.7 | 18.6 | 195 | 3600 | male | 2008 |
| Chinstrap | Dream | 47.5 | 16.8 | 199 | 3900 | female | 2008 |
| Chinstrap | Dream | 47.6 | 18.3 | 195 | 3850 | female | 2008 |
| Chinstrap | Dream | 52.0 | 20.7 | 210 | 4800 | male | 2008 |
| Chinstrap | Dream | 46.9 | 16.6 | 192 | 2700 | female | 2008 |
| Chinstrap | Dream | 53.5 | 19.9 | 205 | 4500 | male | 2008 |
| Chinstrap | Dream | 49.0 | 19.5 | 210 | 3950 | male | 2008 |
| Chinstrap | Dream | 46.2 | 17.5 | 187 | 3650 | female | 2008 |
| Chinstrap | Dream | 50.9 | 19.1 | 196 | 3550 | male | 2008 |
| Chinstrap | Dream | 45.5 | 17.0 | 196 | 3500 | female | 2008 |
| Chinstrap | Dream | 50.9 | 17.9 | 196 | 3675 | female | 2009 |
| Chinstrap | Dream | 50.8 | 18.5 | 201 | 4450 | male | 2009 |
| Chinstrap | Dream | 50.1 | 17.9 | 190 | 3400 | female | 2009 |
| Chinstrap | Dream | 49.0 | 19.6 | 212 | 4300 | male | 2009 |
| Chinstrap | Dream | 51.5 | 18.7 | 187 | 3250 | male | 2009 |
| Chinstrap | Dream | 49.8 | 17.3 | 198 | 3675 | female | 2009 |
| Chinstrap | Dream | 48.1 | 16.4 | 199 | 3325 | female | 2009 |
| Chinstrap | Dream | 51.4 | 19.0 | 201 | 3950 | male | 2009 |
| Chinstrap | Dream | 45.7 | 17.3 | 193 | 3600 | female | 2009 |
| Chinstrap | Dream | 50.7 | 19.7 | 203 | 4050 | male | 2009 |
| Chinstrap | Dream | 42.5 | 17.3 | 187 | 3350 | female | 2009 |
| Chinstrap | Dream | 52.2 | 18.8 | 197 | 3450 | male | 2009 |
| Chinstrap | Dream | 45.2 | 16.6 | 191 | 3250 | female | 2009 |
| Chinstrap | Dream | 49.3 | 19.9 | 203 | 4050 | male | 2009 |
| Chinstrap | Dream | 50.2 | 18.8 | 202 | 3800 | male | 2009 |
| Chinstrap | Dream | 45.6 | 19.4 | 194 | 3525 | female | 2009 |
| Chinstrap | Dream | 51.9 | 19.5 | 206 | 3950 | male | 2009 |
| Chinstrap | Dream | 46.8 | 16.5 | 189 | 3650 | female | 2009 |
| Chinstrap | Dream | 45.7 | 17.0 | 195 | 3650 | female | 2009 |
| Chinstrap | Dream | 55.8 | 19.8 | 207 | 4000 | male | 2009 |
| Chinstrap | Dream | 43.5 | 18.1 | 202 | 3400 | female | 2009 |
| Chinstrap | Dream | 49.6 | 18.2 | 193 | 3775 | male | 2009 |
| Chinstrap | Dream | 50.8 | 19.0 | 210 | 4100 | male | 2009 |
| Chinstrap | Dream | 50.2 | 18.7 | 198 | 3775 | female | 2009 |
We can try to understand, for example, whether body mass and flipper
length are correlated across all the penguins, regardless of the
species they belong to.
For instance, we can fit a simple linear regression model between
those two variables.
reg = lm(df$flipper_length_mm ~ df$body_mass_g) %>% summary()
x_coeff = reg$coefficients[2] %>% round(digits= 4)
intercept = reg$coefficients[1] %>% round(digits = 0)
reg_formula = paste('y = ', intercept, '+', x_coeff, 'x', sep = '')
r_squared = reg$r.squared %>% round(digits = 3)
reg
##
## Call:
## lm(formula = df$flipper_length_mm ~ df$body_mass_g)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.698 -4.983 1.056 5.101 13.933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.370e+02 1.999e+00 68.56 <2e-16 ***
## df$body_mass_g 1.520e-02 4.667e-04 32.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.847 on 331 degrees of freedom
## Multiple R-squared: 0.7621, Adjusted R-squared: 0.7614
## F-statistic: 1060 on 1 and 331 DF, p-value: < 2.2e-16
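To make the fitted coefficients concrete, here is a minimal sketch that plugs the rounded estimates from the summary above into the regression line and recovers the correlation coefficient from R². The helper `predict_flipper` and the example body masses are hypothetical, chosen only for illustration.

```r
# Rounded estimates copied from the regression summary above
intercept <- 137.0    # (Intercept), in mm
slope     <- 0.0152   # df$body_mass_g, in mm per gram
r_squared <- 0.7621   # Multiple R-squared

# Hypothetical helper: expected flipper length (mm) for a body mass (g)
predict_flipper <- function(body_mass_g) {
  intercept + slope * body_mass_g
}

# Illustrative body masses (assumed values, not rows from the data)
masses <- c(3000, 4000, 5000)
predicted <- predict_flipper(masses)
print(data.frame(body_mass_g = masses, flipper_length_mm = predicted))

# In simple linear regression, Pearson's r is the square root of
# R-squared carrying the sign of the slope
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```

Read this way, the slope says that each additional 1000 g of body mass goes with a flipper about 15 mm longer on average.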
Take a look at the graph below, which shows the data together with the fitted regression line.
ggplot(df, aes(x=body_mass_g, y=flipper_length_mm))+geom_point(color= 'deepskyblue3')+
geom_smooth(method = 'lm', se=FALSE, color= 'grey')+
annotate('text', x=3200, y=230, label = reg_formula, size= 3.5)+
annotate('text', x=3200, y=226, label = paste('R² =',r_squared), size= 3.5)+
theme_bw()
There’s certainly a correlation between the two variables, but we can
go deeper in our exploration and ask ourselves whether there is any
difference among the species.
ggplot(df, aes(x=body_mass_g, y=flipper_length_mm, colour=species))+
geom_point()+
theme_bw()+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
geom_smooth(method = 'lm', se=FALSE, color= 'grey')+
annotate('text', x=3200, y=230, label = reg_formula, size= 3.5)+
annotate('text', x=3200, y=226, label = paste('R² =',r_squared), size= 3.5)+
stat_ellipse()
We can now add more levels of complexity (and thus more information) to
our plot. We may further group the data according to another categorical
variable, e.g., the sex of the penguins.
ggplot(df, aes(x=body_mass_g, y=flipper_length_mm, colour=species, shape= sex))+
geom_point()+
theme_bw()+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
annotate('text', x=3200, y=230, label = reg_formula, size= 3.5)+
annotate('text', x=3200, y=226, label = paste('R² =',r_squared), size= 3.5)+
stat_ellipse()
We have already collected a lot of information: we saw a statistical
correlation between body mass and flipper length and, based on our
original dataset, that penguins of the Gentoo species have
greater mass and longer flippers. Males, in particular, show
higher values than females.
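The visual impression can be checked numerically with group means. The sketch below is only an illustration of the mechanics: it uses the first two rows of each species copied from the table above, and the names `peng`, `means` and `heaviest` are hypothetical.

```r
# First two rows of each species, copied from the table above
# (a tiny subset for illustration; the full data frame is df)
peng <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"),
  sex = c("male", "female", "female", "male", "female", "male"),
  flipper_length_mm = c(181, 186, 211, 230, 192, 196),
  body_mass_g = c(3750, 3800, 4500, 5700, 3500, 3900)
)

# Mean flipper length and body mass per species
means <- aggregate(cbind(flipper_length_mm, body_mass_g) ~ species,
                   data = peng, FUN = mean)
print(means)

# Species with the largest mean body mass in this subset
heaviest <- means$species[which.max(means$body_mass_g)]
print(heaviest)
```

Running the same `aggregate()` call on the full `df`, with `~ species + sex` on the right-hand side of the formula, would split the means by sex as well.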
Similarly, we may explore the other variables for further information. Let’s say we want to know whether bill size differs among the observed samples.
ggplot(df, aes(x=bill_length_mm, y=bill_depth_mm, colour=species, shape= sex))+
geom_point()+
theme_bw()+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))
In this case, for these particular attributes, differences are more
obvious, even if there’s not a clear correlation between the two
variables.
We can keep looking for correlations, for example by
splitting the graph into panels and separating the samples according to
other categorical variables.
ggplot(df, aes(x=bill_length_mm, y=bill_depth_mm, colour=species, shape= sex))+
geom_point()+
theme_bw()+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
facet_grid(~island)+
stat_ellipse()
We may wonder at this point whether there is a tool
that can analyze all the variables at once and
generate a 2-dimensional plot summarizing the relationships between
the samples and the variables.
Principal component analysis (PCA) is a popular technique for
analyzing large datasets containing a high number of dimensions/features
per observation, increasing the interpretability of data while
preserving the maximum amount of information, and enabling the
visualization of multidimensional data.
Formally, PCA is a statistical technique for reducing the
dimensionality of a dataset. This is accomplished by linearly
transforming the data into a new coordinate system where (most of) the
variation in the data can be described with fewer dimensions than the
initial data. (“Principal Component
Analysis” 2022)
So, we want to perform PCA in order to better understand the
statistical relationships among the variables, in our case all the
penguins’ characteristics.
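As a rough sketch of what such an analysis computes, the block below runs base R's `prcomp()` on six rows copied from the table above (two per species). Using `prcomp()` is an assumption made for illustration, not necessarily the function used in this analysis, and the names `X`, `pca`, `var_explained`, `eig` and `scores` are hypothetical.

```r
# Six rows copied from the table above (two per species)
X <- matrix(c(39.1, 18.7, 181, 3750,
              39.5, 17.4, 186, 3800,
              46.1, 13.2, 211, 4500,
              50.0, 16.3, 230, 5700,
              46.5, 17.9, 192, 3500,
              50.0, 19.5, 196, 3900),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("bill_length_mm", "bill_depth_mm",
                                    "flipper_length_mm", "body_mass_g")))

# PCA on centered and standardized columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of the total variance carried by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# The component variances are the eigenvalues of the correlation matrix
eig <- eigen(cor(X))$values
print(all.equal(pca$sdev^2, eig))

# Coordinates of the samples on the first two components,
# i.e. the values one would draw in a 2-dimensional plot
scores <- pca$x[, 1:2]
print(dim(scores))
```

With only six rows the numbers themselves mean little. The point is the shape of the result: four original variables are re-expressed as components ordered by the variance they explain, and the first two can be plotted.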
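The first step is scaling the data: for each column, subtract the
column mean from every value and then divide by the column standard
deviation
\[ X^*_{i,j} = \frac{X_{i,j} - \overline{X}_j}{S_j} \]
where \(X_{i,j}\) is the value of variable \(j\) for observation \(i\), and \(\overline{X}_j\) and \(S_j\) are the mean and standard deviation of column \(j\).
This operation is called autoscaling, and it brings all the variables to a comparable scale (mean 0, standard deviation 1).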
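The scaling rule can be verified by hand. The following minimal sketch uses four rows of flipper length and body mass copied from the table above and compares a manual computation with base R's `scale()`. The names `X`, `col_means`, `col_sds`, `X_manual` and `X_scaled` are hypothetical.

```r
# Four rows copied from the table above (two Adelie, two Gentoo)
X <- cbind(flipper_length_mm = c(181, 186, 211, 230),
           body_mass_g = c(3750, 3800, 4500, 5700))

# Column means and (sample) standard deviations
col_means <- colMeans(X)
col_sds <- apply(X, 2, sd)

# Subtract each column mean, then divide by each column standard deviation
X_manual <- sweep(sweep(X, 2, col_means, "-"), 2, col_sds, "/")

# scale() performs the same centering and scaling in one call
X_scaled <- scale(X)
print(all.equal(X_manual, X_scaled, check.attributes = FALSE))

# After autoscaling every column has mean 0 and standard deviation 1
print(round(colMeans(X_manual), 10))
print(apply(X_manual, 2, sd))
```

Without this step, body mass (thousands of grams) would dominate variables measured in tens of millimeters simply because of its units.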
df_scaled = df[,3:6] %>% scale() %>% data.frame() # autoscale the four numeric measurements
kable(df_scaled) %>% kable_styling(fixed_thead = TRUE, full_width = FALSE) %>%
scroll_box(height = "500px")
| bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|
| -0.8946955 | 0.7795590 | -1.4246077 | -0.5676206 |
| -0.8215515 | 0.1194043 | -1.0678666 | -0.5055254 |
| -0.6752636 | 0.4240910 | -0.4257325 | -1.1885721 |
| -1.3335592 | 1.0842457 | -0.5684290 | -0.9401915 |
| -0.8581235 | 1.7444004 | -0.7824736 | -0.6918109 |
| -0.9312674 | 0.3225288 | -1.4246077 | -0.7228585 |
| -0.8764095 | 1.2365891 | -0.4257325 | 0.5811398 |
| -0.5289757 | 0.2209665 | -1.3532595 | -1.2506673 |
| -0.9861254 | 2.0490872 | -0.7111254 | -0.5055254 |
| -1.7175649 | 1.9983060 | -0.2116878 | 0.2396164 |
| -1.3518452 | 0.3225288 | -1.1392148 | -0.6297157 |
| -0.9678394 | 0.9319023 | -0.4257325 | -0.9401915 |
| -0.2729719 | 1.7951815 | -0.2830361 | 0.3638067 |
| -1.7541369 | 0.6272156 | -1.2105630 | -1.0954294 |
| 0.3670377 | 2.2014306 | -0.4970807 | -0.0087642 |
| -1.1324133 | 0.5764344 | -1.9240453 | -1.0022867 |
| -1.1506993 | 0.7795590 | -1.4959559 | -0.7539060 |
| -1.4798471 | 1.0334646 | -0.8538219 | -0.5055254 |
| -1.0592694 | 0.4748722 | -1.1392148 | -0.3192400 |
| -0.9495534 | 0.0178420 | -1.4959559 | -0.5055254 |
| -1.5895630 | 0.8811212 | -0.9965183 | -0.5055254 |
| -0.6204057 | 0.7287778 | -1.2819112 | -0.8160012 |
| -0.6386916 | 0.3733099 | -0.9965183 | -1.2506673 |
| -1.1141273 | 0.7287778 | -2.0667417 | -1.3127624 |
| -0.6386916 | 0.8811212 | -1.4959559 | -0.3192400 |
| -0.8215515 | -0.2360636 | -1.6386524 | -1.1885721 |
| -1.2421292 | 0.4748722 | -1.6386524 | -0.3813351 |
| -0.8215515 | 0.3225288 | -0.9251701 | -1.1264770 |
| -0.5655477 | 0.8811212 | -1.2105630 | -0.3813351 |
| -1.3884171 | -0.0837202 | -0.4257325 | -1.0954294 |
| -0.8764095 | 1.9983060 | -0.3543843 | -0.0708593 |
| -0.9495534 | 1.4397136 | -0.7824736 | -0.3192400 |
| -0.3278299 | 0.6779967 | -1.4959559 | -0.8160012 |
| -1.1689853 | 1.0842457 | -1.4246077 | -1.1264770 |
| -0.7666936 | 0.9826835 | -1.2105630 | 0.5500922 |
| -1.3701311 | 0.4240910 | -1.3532595 | -1.3127624 |
| -0.5838337 | 0.6272156 | -0.4257325 | -0.3813351 |
| -1.4615611 | 0.6779967 | -1.0678666 | -1.3748576 |
| 0.0196039 | 1.2873702 | -0.3543843 | 0.2396164 |
| -1.2787012 | -0.1345014 | -1.1392148 | -1.4990479 |
| -0.8032655 | 0.8303401 | -0.7824736 | 0.4879971 |
| -0.5289757 | 0.9319023 | -1.3532595 | -0.9712391 |
| -1.4615611 | 0.3733099 | -0.7824736 | -0.9401915 |
| -0.3095439 | 2.0490872 | -0.7111254 | -0.0708593 |
| -0.8032655 | 0.2717477 | -1.0678666 | -0.8780964 |
| -0.7118356 | 0.8811212 | -0.9251701 | 0.1154261 |
| -1.6444210 | 0.3733099 | -0.7824736 | -0.9401915 |
| -0.3644018 | 1.1858080 | -0.0689914 | -0.1950496 |
| -1.7358509 | 0.4748722 | -0.9965183 | -1.6232382 |
| -0.4741178 | 0.7287778 | -0.7111254 | -0.6297157 |
| -0.9129815 | 0.1701854 | -1.0678666 | -0.8160012 |
| -0.6204057 | 0.8303401 | -0.5684290 | -0.5055254 |
| -1.3701311 | -0.2868448 | -1.4246077 | -1.6853334 |
| -1.1689853 | 0.9826835 | -0.4970807 | -0.5676206 |
| -1.5164190 | -0.1345014 | -1.1392148 | -1.3127624 |
| -0.4924037 | 1.9983060 | -0.4257325 | 0.2396164 |
| -1.1689853 | -0.0837202 | -1.1392148 | -0.7539060 |
| -0.5289757 | 0.5256533 | -0.6397772 | -0.1950496 |
| -1.3884171 | -0.0329391 | -1.2105630 | -1.6853334 |
| -0.4375458 | 0.4240910 | -0.6397772 | -0.3192400 |
| -1.5529910 | -0.4899693 | -0.4257325 | -1.0643818 |
| -0.5289757 | 0.9826835 | -0.9251701 | -0.1329545 |
| -1.4798471 | -0.2868448 | -0.7824736 | -1.4369527 |
| -0.4009738 | 1.1350269 | -0.2116878 | 0.3017116 |
| -1.9187108 | 0.9319023 | -0.7824736 | -0.7539060 |
| -0.7849795 | 0.6272156 | -0.7824736 | -0.3813351 |
| -0.8032655 | 0.0178420 | -0.3543843 | -0.8160012 |
| 0.3304657 | 0.8811212 | -0.2830361 | -0.0708593 |
| -1.5529910 | 0.1701854 | -0.7824736 | -0.6297157 |
| -0.2181139 | 0.6779967 | -0.4257325 | 0.0533310 |
| -0.5655477 | -0.1852825 | -0.7111254 | -0.6297157 |
| -1.2421292 | 1.1350269 | -1.2105630 | -0.3813351 |
| -1.4249891 | -0.5407504 | -0.9965183 | -0.8160012 |
| -0.3461159 | 0.9826835 | -0.4257325 | -0.2571448 |
| -1.7175649 | 0.0178420 | -0.8538219 | -1.2506673 |
| -0.1998280 | 0.2209665 | -0.3543843 | 0.6121874 |
| -1.3335592 | 0.8303401 | -0.9965183 | -0.5055254 |
| -1.6261350 | 1.1350269 | -0.5684290 | -0.0087642 |
| -1.2238432 | 0.3225288 | -0.7111254 | -1.0643818 |
| -0.4924037 | 1.5920570 | -0.4970807 | -0.8160012 |
| -1.4067031 | 1.1858080 | -0.7824736 | -0.5055254 |
| -1.2969872 | 0.7287778 | -0.8538219 | -0.8780964 |
| -1.0409834 | 1.0334646 | -0.8538219 | -0.3192400 |
| -0.9312674 | 0.8303401 | -0.7824736 | -0.7539060 |
| -1.5164190 | 0.4240910 | 0.0737051 | -0.8160012 |
| -0.5289757 | 0.4748722 | 0.2877498 | 0.1154261 |
| -1.8272808 | -0.0329391 | -1.1392148 | -1.0022867 |
| -0.8032655 | 0.4748722 | -1.0678666 | 0.3017116 |
| -1.4249891 | 0.0686231 | -0.9965183 | -1.1264770 |
| -0.5838337 | 0.8811212 | 0.5017944 | 0.1154261 |
| -1.0775553 | 0.7287778 | -0.7824736 | -0.6297157 |
| -0.6752636 | 0.6779967 | -0.3543843 | 0.1775213 |
| -1.9918547 | -0.5407504 | -1.6386524 | -1.6232382 |
| -0.1449700 | 0.6779967 | -0.6397772 | -0.1329545 |
| -1.6444210 | 0.3733099 | -0.6397772 | -0.5986682 |
| -0.5472617 | 1.4397136 | 0.1450533 | 0.6432349 |
| -1.1506993 | -0.5915315 | -1.2819112 | -1.4059052 |
| -1.1324133 | 1.4397136 | -0.7824736 | 0.0533310 |
| -1.1141273 | 0.7287778 | -0.5684290 | -1.5921906 |
| -0.7849795 | 0.8811212 | -1.2105630 | -0.8160012 |
| -0.9861254 | 0.0178420 | -0.1403396 | -0.5676206 |
| -1.0592694 | 1.4397136 | -0.7824736 | -0.3813351 |
| -1.0775553 | -0.0837202 | -1.4246077 | -1.2817149 |
| -0.1449700 | 0.9319023 | -0.2830361 | 0.7053301 |
| -1.0775553 | -0.3376259 | -0.2116878 | -0.4744778 |
| 0.2938937 | 1.5920570 | -0.7111254 | 0.4879971 |
| -0.7849795 | 0.2717477 | -0.5684290 | -1.2506673 |
| -0.3278299 | 1.1858080 | -0.2830361 | 0.0843786 |
| -0.8032655 | 1.7951815 | -0.7111254 | -0.3813351 |
| -0.2363999 | 0.5764344 | -0.3543843 | -0.1640021 |
| -0.9861254 | -0.0837202 | -0.9251701 | -1.6232382 |
| -1.2238432 | 1.6936193 | -0.1403396 | -0.5365730 |
| -1.5164190 | -0.0837202 | -0.8538219 | -1.0643818 |
| -0.5289757 | 0.7287778 | -0.8538219 | -1.0954294 |
| -1.4249891 | 0.0178420 | -0.9965183 | -1.3127624 |
| -1.1506993 | 1.3381514 | -0.2116878 | -0.8780964 |
| -0.6935496 | -0.0837202 | -1.7813488 | -0.9401915 |
| -0.4741178 | 0.6779967 | 0.0737051 | -0.4123827 |
| -1.6078490 | -0.6423127 | -1.0678666 | -1.4369527 |
| -0.6204057 | 0.9319023 | -0.1403396 | -0.2571448 |
| -0.9495534 | 0.2209665 | -0.7111254 | -1.1575245 |
| -0.4558318 | 0.5764344 | -0.4257325 | 0.1154261 |
| -0.9129815 | -0.0329391 | -0.7111254 | -1.4369527 |
| 0.0196039 | 0.4240910 | 0.6444909 | -0.2571448 |
| -1.0044114 | 0.3733099 | -0.7824736 | -1.0954294 |
| -0.1632560 | 1.0334646 | -0.2830361 | -0.8780964 |
| -1.3152732 | 0.6779967 | -0.5684290 | -0.8780964 |
| -1.1872713 | 0.6779967 | -0.1403396 | 0.3327592 |
| -1.0775553 | 0.2209665 | -0.9965183 | -0.9712391 |
| -0.5289757 | 0.1701854 | -0.7824736 | -0.3813351 |
| -1.5347050 | 0.1701854 | -0.7111254 | -1.2817149 |
| -0.6935496 | 1.4904948 | -0.0689914 | -0.2881924 |
| -1.2787012 | -0.3376259 | -1.1392148 | -1.0022867 |
| -0.7849795 | 0.3733099 | -0.5684290 | 0.0533310 |
| -0.6935496 | -0.0329391 | -0.5684290 | -1.0022867 |
| -0.6204057 | 0.0178420 | -0.9965183 | -0.9091439 |
| -2.1747146 | -0.8454372 | -0.9251701 | -1.4369527 |
| -0.6021197 | -0.0837202 | -0.7824736 | -0.5986682 |
| -1.2238432 | -0.1852825 | -0.6397772 | -1.4990479 |
| -0.9129815 | 0.7795590 | -1.1392148 | -0.6918109 |
| -0.8764095 | 0.7287778 | -0.7824736 | 0.0533310 |
| -1.3518452 | 0.6272156 | -1.2105630 | -0.9091439 |
| -1.4615611 | 0.3225288 | -0.4257325 | -0.9401915 |
| -1.1324133 | 0.4748722 | -0.5684290 | -0.5676206 |
| -1.4615611 | -0.0329391 | -0.9965183 | -0.6297157 |
| -0.4558318 | 0.6779967 | 0.0023568 | -0.2571448 |
| 0.3853236 | -2.0134031 | 0.7158391 | 0.3638067 |
| 1.0984771 | -0.4391881 | 2.0714554 | 1.8540905 |
| 0.8607593 | -1.5563730 | 0.6444909 | 0.3017116 |
| 1.0984771 | -0.9977806 | 1.2152767 | 1.8540905 |
| 0.6596135 | -1.3532485 | 1.0012320 | 1.4815195 |
| 0.4584676 | -1.8610598 | 0.6444909 | 0.4259019 |
| 0.2573217 | -1.3024673 | 0.7158391 | 0.7363777 |
| 0.4950396 | -0.9469994 | 1.2866249 | 1.2331389 |
| -0.1266840 | -1.9118409 | 0.5731427 | 0.2396164 |
| 0.5133256 | -0.8962183 | 1.0012320 | 1.1710438 |
| -0.5655477 | -1.7594975 | 0.9298838 | 0.5500922 |
| 0.9156173 | -0.5407504 | 1.0725802 | 1.6678050 |
| 0.2756077 | -1.7594975 | 0.9298838 | 0.5500922 |
| 0.8059014 | -1.3024673 | 0.8585356 | 2.0403759 |
| 0.3304657 | -1.3024673 | 0.6444909 | -0.0087642 |
| 0.9704752 | -0.7438749 | 1.1439285 | 2.0403759 |
| -0.3644018 | -1.8610598 | 0.6444909 | -0.0708593 |
| 0.9521892 | -0.9977806 | 1.4293214 | 2.5992323 |
| 0.4036096 | -1.3532485 | 0.5731427 | 0.7363777 |
| 0.8607593 | -1.0485617 | 1.5006696 | 1.4194244 |
| 1.1350491 | -1.4548107 | 1.2152767 | 1.8540905 |
| 0.2024638 | -1.3532485 | 1.0012320 | 0.9847583 |
| 0.4584676 | -1.3532485 | 0.8585356 | 0.2396164 |
| 0.4218956 | -0.6930938 | 1.0012320 | 1.0468535 |
| -0.1998280 | -2.0641843 | 1.0012320 | 0.9847583 |
| 0.3853236 | -1.0485617 | 1.0012320 | 1.1089486 |
| 0.6961854 | -1.0993428 | 1.0012320 | 1.7919953 |
| 0.7693294 | -1.4548107 | 0.6444909 | 0.4879971 |
| 1.0984771 | -0.9469994 | 1.3579732 | 1.6678050 |
| 0.6047555 | -0.9469994 | 1.5006696 | 1.2952341 |
| -0.2181139 | -1.5055919 | 0.5731427 | 0.6121874 |
| 0.2024638 | -1.3532485 | 0.4304462 | 1.0468535 |
| 2.8539319 | -0.0837202 | 2.0714554 | 2.2887566 |
| 0.9339033 | -1.2009051 | 1.3579732 | 1.1710438 |
| 0.8059014 | -0.4391881 | 1.3579732 | 1.4815195 |
| -0.2546859 | -1.7594975 | 0.8585356 | 0.9226631 |
| 0.0744619 | 0.0686231 | 1.2866249 | 1.2952341 |
| 0.0013179 | -1.8102786 | 0.5017944 | 0.1775213 |
| 0.8607593 | -0.7438749 | 0.5017944 | 1.4194244 |
| -0.2363999 | -1.7594975 | 0.5017944 | -0.3192400 |
| 1.0253332 | -0.5915315 | 1.7147143 | 1.8540905 |
| 0.2390357 | -1.7594975 | 0.6444909 | 0.1154261 |
| 1.0253332 | -1.0993428 | 1.0725802 | 0.6742825 |
| 1.1899071 | -0.6423127 | 1.5006696 | 1.6678050 |
| -0.0718260 | -1.6579352 | 1.1439285 | 0.8605680 |
| 0.2756077 | -1.6579352 | 0.6444909 | -0.0087642 |
| 1.1899071 | -0.6423127 | 1.7147143 | 1.4815195 |
| 0.1658918 | -1.9626220 | 0.8585356 | 1.1089486 |
| 0.2207498 | -0.6930938 | 1.0012320 | 1.3573292 |
| 0.4767536 | -1.5055919 | 0.6444909 | 0.7984728 |
| 0.8241873 | -1.5563730 | 1.3579732 | 1.3573292 |
| 0.2024638 | -1.4040296 | 0.6444909 | 0.2396164 |
| 1.1167631 | -1.0993428 | 1.7147143 | 0.9847583 |
| 0.4584676 | -1.4040296 | 1.1439285 | 0.8605680 |
| 0.1841778 | -0.8962183 | 1.3579732 | 1.0468535 |
| -0.0352541 | -1.6579352 | 0.5017944 | 0.1154261 |
| 0.2756077 | -1.0993428 | 1.3579732 | 0.9847583 |
| -0.1449700 | -1.3532485 | 0.5017944 | 0.3017116 |
| 1.1716211 | -0.9469994 | 1.6433661 | 1.6678050 |
| 0.2390357 | -1.7087164 | 0.5017944 | -0.0087642 |
| 0.4036096 | -1.1501239 | 1.4293214 | 1.3573292 |
| 0.3121797 | -1.6579352 | 0.9298838 | 0.2396164 |
| 1.8847746 | -0.7438749 | 2.1428037 | 1.7919953 |
| 0.3304657 | -1.5055919 | 1.2866249 | 0.6121874 |
| 1.0619052 | -0.1852825 | 2.0714554 | 1.8540905 |
| 1.0070472 | -0.4899693 | 2.0001072 | 1.9782808 |
| -0.0901120 | -1.5055919 | 1.3579732 | 0.6121874 |
| 1.2264791 | -1.0993428 | 1.5720178 | 1.6678050 |
| 0.6778994 | -1.0993428 | 1.0725802 | 0.6742825 |
| 0.4401816 | -0.7946560 | 1.4293214 | 0.9847583 |
| 0.7693294 | -0.7946560 | 1.4293214 | 1.1089486 |
| 0.4584676 | -1.2009051 | 1.1439285 | 1.2331389 |
| 0.4401816 | -1.0993428 | 1.0725802 | 0.6121874 |
| 0.8424733 | -0.5915315 | 2.0714554 | 1.9782808 |
| 0.6413275 | -1.5055919 | 0.5731427 | 0.4879971 |
| 1.2996230 | -0.4391881 | 1.3579732 | 2.2266614 |
| 0.2207498 | -1.7087164 | 1.0012320 | 0.6742825 |
| 0.2207498 | -0.3884070 | 1.5720178 | 2.1645662 |
| 0.9339033 | -1.3532485 | 0.7871873 | 0.5190446 |
| 1.5556268 | -0.7946560 | 1.4293214 | 1.5436147 |
| 0.6230415 | -1.3024673 | 0.7871873 | 0.6432349 |
| 1.0984771 | -0.6423127 | 1.6433661 | 1.4194244 |
| 0.1658918 | -1.7087164 | 0.7871873 | 0.6742825 |
| 1.2447650 | 0.0686231 | 1.9287590 | 1.7299002 |
| -0.1083980 | -1.4040296 | 1.2152767 | 0.4879971 |
| 1.3361950 | -1.5055919 | 1.2152767 | 1.3573292 |
| 0.6413275 | -1.6071541 | 0.7871873 | 0.8295204 |
| 1.4824829 | -0.0837202 | 2.0714554 | 1.6678050 |
| 0.6413275 | -1.0993428 | 1.2152767 | 0.9226631 |
| 1.5007689 | -0.0329391 | 1.9287590 | 1.4815195 |
| 0.2756077 | -1.3532485 | 0.7871873 | 0.6742825 |
| 1.0070472 | -0.5407504 | 1.6433661 | 1.7919953 |
| 0.0927478 | -1.2516862 | 0.9298838 | 0.7984728 |
| 1.2447650 | -0.7438749 | 1.7860625 | 1.2331389 |
| 0.9887612 | -0.6930938 | 1.0725802 | 0.8916156 |
| 0.5316115 | -1.3024673 | 1.5006696 | 0.8295204 |
| 0.8059014 | -1.4040296 | 0.1450533 | 0.5190446 |
| 1.2996230 | -0.3376259 | 1.7147143 | 1.2952341 |
| 0.8241873 | -1.0993428 | 1.2866249 | 0.7984728 |
| 2.1773504 | -0.0837202 | 1.9287590 | 1.7299002 |
| 0.5864695 | -0.8454372 | 1.0012320 | 0.9537107 |
| 0.9339033 | -1.0993428 | 1.9287590 | 1.6057098 |
| 0.5133256 | -0.5407504 | 1.0012320 | 1.6057098 |
| -0.4192598 | -1.2516862 | 0.6444909 | 0.6121874 |
| 1.7202007 | -0.6930938 | 1.2866249 | 1.6057098 |
| -0.1266840 | -1.6071541 | 0.5017944 | 0.4569495 |
| 0.7510434 | -1.0485617 | 0.5731427 | 1.6057098 |
| 1.1899071 | -0.9977806 | 1.0725802 | 0.9847583 |
| 1.0619052 | -0.6423127 | 2.0001072 | 2.1645662 |
| -0.0901120 | -0.9977806 | 0.8585356 | 0.5500922 |
| 1.3727670 | -0.4391881 | 2.0714554 | 1.6057098 |
| 0.4036096 | -1.5563730 | 1.1439285 | 0.2085689 |
| 2.0310625 | -0.5915315 | 2.0714554 | 2.0403759 |
| 0.8790453 | -0.4899693 | 1.5006696 | 2.2266614 |
| 0.5864695 | -1.7594975 | 0.9298838 | 0.8916156 |
| 0.5133256 | -1.4548107 | 1.0012320 | 0.7984728 |
| 1.1716211 | -0.7438749 | 1.5006696 | 1.9161856 |
| 0.2207498 | -1.2009051 | 0.7871873 | 1.2331389 |
| 1.0801912 | -0.5407504 | 0.8585356 | 1.4815195 |
| 0.4584676 | 0.3733099 | -0.6397772 | -0.8780964 |
| 1.0984771 | 1.1858080 | -0.3543843 | -0.3813351 |
| 1.3361950 | 1.0334646 | -0.5684290 | -0.6918109 |
| 0.2573217 | 0.7795590 | -0.9251701 | -0.8470488 |
| 1.5921988 | 1.3381514 | -0.2830361 | -0.5986682 |
| 0.2207498 | 0.3225288 | -0.2116878 | -0.3192400 |
| 0.3853236 | 0.5256533 | -1.6386524 | -1.1885721 |
| 1.3361950 | 0.5256533 | -0.2830361 | -0.5676206 |
| 0.3670377 | 0.8811212 | -0.4257325 | -0.0708593 |
| 1.3361950 | 1.3889325 | -0.2116878 | -0.6297157 |
| 0.4767536 | 0.3225288 | -0.5684290 | -0.5055254 |
| 1.4093389 | 1.5920570 | -0.4970807 | -0.5365730 |
| 0.5498975 | 0.0686231 | -1.1392148 | -0.6297157 |
| 1.4641969 | 0.4748722 | 0.0023568 | -0.1950496 |
| 0.3487517 | -0.0329391 | -0.7824736 | -0.7849536 |
| 1.1899071 | 1.2365891 | 0.0023568 | -0.1950496 |
| 1.1533351 | 1.4397136 | -0.2830361 | -1.1264770 |
| 2.5613561 | 0.3225288 | -1.4246077 | -0.6297157 |
| 0.4401816 | 0.7287778 | -0.7824736 | -0.9401915 |
| 0.9521892 | 0.5256533 | -0.4257325 | 0.2396164 |
| -0.2912579 | 0.0686231 | -1.4246077 | -0.7539060 |
| 0.8241873 | 0.1701854 | -0.7111254 | -1.0022867 |
| -0.1449700 | -0.2868448 | -0.9965183 | -1.6232382 |
| 1.2081931 | 1.1350269 | -0.5684290 | -0.5055254 |
| 0.4950396 | 0.3733099 | -0.4257325 | -1.1264770 |
| 1.4641969 | 0.9319023 | -0.2830361 | -0.0708593 |
| 1.1899071 | 0.6272156 | -0.0689914 | -1.0022867 |
| 1.0070472 | 0.9319023 | -0.0689914 | -0.5055254 |
| 0.4401816 | 0.3225288 | -0.7111254 | -0.6297157 |
| 1.6104848 | 1.4397136 | 0.2877498 | 0.4259019 |
| -0.5655477 | -0.2868448 | -0.9965183 | -1.2506673 |
| 1.8664886 | 1.8459627 | 0.0023568 | 0.1154261 |
| -0.2729719 | -0.2360636 | -0.9965183 | -1.0643818 |
| 1.2813370 | 0.8303401 | 0.1450533 | -0.1329545 |
| 1.0436192 | 0.7287778 | -0.4257325 | -0.7539060 |
| 0.6413275 | -0.1852825 | -0.1403396 | -0.3813351 |
| 0.6596135 | 0.5764344 | -0.4257325 | -0.4434303 |
| 1.4641969 | 1.7951815 | 0.6444909 | 0.7363777 |
| 0.5316115 | -0.2868448 | -0.6397772 | -1.8716188 |
| 1.7384867 | 1.3889325 | 0.2877498 | 0.3638067 |
| 0.9156173 | 1.1858080 | 0.6444909 | -0.3192400 |
| 0.4036096 | 0.1701854 | -0.9965183 | -0.6918109 |
| 1.2630510 | 0.9826835 | -0.3543843 | -0.8160012 |
| 0.2756077 | -0.0837202 | -0.3543843 | -0.8780964 |
| 1.2630510 | 0.3733099 | -0.3543843 | -0.6607633 |
| 1.2447650 | 0.6779967 | 0.0023568 | 0.3017116 |
| 1.1167631 | 0.3733099 | -0.7824736 | -1.0022867 |
| 0.9156173 | 1.2365891 | 0.7871873 | 0.1154261 |
| 1.3727670 | 0.7795590 | -0.9965183 | -1.1885721 |
| 1.0619052 | 0.0686231 | -0.2116878 | -0.6607633 |
| 0.7510434 | -0.3884070 | -0.1403396 | -1.0954294 |
| 1.3544810 | 0.9319023 | 0.0023568 | -0.3192400 |
| 0.3121797 | 0.0686231 | -0.5684290 | -0.7539060 |
| 1.2264791 | 1.2873702 | 0.1450533 | -0.1950496 |
| -0.2729719 | 0.0686231 | -0.9965183 | -1.0643818 |
| 1.5007689 | 0.8303401 | -0.2830361 | -0.9401915 |
| 0.2207498 | -0.2868448 | -0.7111254 | -1.1885721 |
| 0.9704752 | 1.3889325 | 0.1450533 | -0.1950496 |
| 1.1350491 | 0.8303401 | 0.0737051 | -0.5055254 |
| 0.2938937 | 1.1350269 | -0.4970807 | -0.8470488 |
| 1.4459109 | 1.1858080 | 0.3590980 | -0.3192400 |
| 0.5133256 | -0.3376259 | -0.8538219 | -0.6918109 |
| 0.3121797 | -0.0837202 | -0.4257325 | -0.6918109 |
| 2.1590644 | 1.3381514 | 0.4304462 | -0.2571448 |
| -0.0901120 | 0.4748722 | 0.0737051 | -1.0022867 |
| 1.0253332 | 0.5256533 | -0.5684290 | -0.5365730 |
| 1.2447650 | 0.9319023 | 0.6444909 | -0.1329545 |
| 1.1350491 | 0.7795590 | -0.2116878 | -0.5365730 |
Now we can generate the correlation matrix, which tells us the degree
of correlation between each pair of variables.
df_cor = cor(df_scaled)
kable(df_cor) %>% kable_styling(full_width = FALSE)
| | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| bill_length_mm | 1.0000000 | -0.2286256 | 0.6530956 | 0.5894511 |
| bill_depth_mm | -0.2286256 | 1.0000000 | -0.5777917 | -0.4720157 |
| flipper_length_mm | 0.6530956 | -0.5777917 | 1.0000000 | 0.8729789 |
| body_mass_g | 0.5894511 | -0.4720157 | 0.8729789 | 1.0000000 |
library(ggcorrplot)
ggcorrplot(df_cor,
type = 'full',
method = 'square',
lab = TRUE,
lab_size = 5,
colors = c("tomato2", "white", "springgreen3"),
title="Correlogram",
ggtheme=theme_bw)
The general formula for the correlation between two variables \(X\) and \(Y\) is
\[ corr(X,Y) = \frac{cov(X,Y)}{\sigma_X \sigma_Y} \]
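As a small illustration of this formula, the sketch below computes the correlation by hand and compares it with `cor()`. It uses only the first five bill measurements shown at the top of the data table, so the number it prints is not the correlation of the full dataset; the variable names `x`, `y`, `corr_manual` and `corr_builtin` are chosen here for the example.

```r
# First five bill_length_mm and bill_depth_mm values from the data table
# (a tiny subset, used only to illustrate the formula)
x <- c(39.1, 39.5, 40.3, 36.7, 39.3)
y <- c(18.7, 17.4, 18.0, 19.3, 20.6)

# Correlation as covariance divided by the product of the standard deviations
corr_manual <- cov(x, y) / (sd(x) * sd(y))

# Built-in Pearson correlation, for comparison
corr_builtin <- cor(x, y)

print(c(manual = corr_manual, builtin = corr_builtin))
```

Both numbers should agree up to floating point error, because `cov()` and `sd()` use the same \(n-1\) denominator.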
We can now calculate the eigenvalues
and eigenvectors of this matrix.
For a square matrix \({\bf C}\), a scalar \(\lambda\) (eigenvalue) and a non-zero vector \({\bf v}\) (eigenvector) satisfy the following relationship
\[ {\bf C}{\bf v} = \lambda {\bf v} \]
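The relationship can be checked numerically. The following sketch retypes the correlation matrix from the table above (rounded to seven decimals, so it is an approximation of `df_cor`) and verifies that the first eigenpair returned by `eigen()` satisfies it; `C`, `eig_check` and `residual` are names introduced only for this example.

```r
vars <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")

# Correlation matrix copied from the table above (rounded values)
C <- matrix(c(
   1.0000000, -0.2286256,  0.6530956,  0.5894511,
  -0.2286256,  1.0000000, -0.5777917, -0.4720157,
   0.6530956, -0.5777917,  1.0000000,  0.8729789,
   0.5894511, -0.4720157,  0.8729789,  1.0000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

eig_check <- eigen(C)
lambda1 <- eig_check$values[1]    # largest eigenvalue
v1 <- eig_check$vectors[, 1]      # corresponding eigenvector

# Largest absolute difference between C v and lambda v; should be close to 0
residual <- max(abs(C %*% v1 - lambda1 * v1))
print(residual)
```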
For our correlation matrix, the eigenvalues are
eig = eigen(df_cor)
eig$values
## [1] 2.7453557 0.7781172 0.3686425 0.1078846
You may notice that the sum of these eigenvalues is equal to the
number of variables in our original data matrix: the data are standardized,
so each variable contributes one unit of variance (the diagonal of the
correlation matrix is all ones).
Each eigenvalue represents the amount of variance explained by the
corresponding principal component.
We can also express them as a percentage of the total variance and
put them in a graph known as a scree plot.
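The arithmetic behind these percentages can be sketched directly from the eigenvalues printed above. The vector `eig_values` below simply retypes that output, and `share` and `cum_share` are names used only in this example.

```r
# Eigenvalues as printed by eig$values above
eig_values <- c(2.7453557, 0.7781172, 0.3686425, 0.1078846)

# Total variance: equals the number of standardized variables (4)
total_var <- sum(eig_values)

# Share of variance per component, and cumulative share
share <- round(eig_values / total_var, 4)
cum_share <- cumsum(eig_values) / total_var

print(total_var)
print(share)
print(round(cum_share, 4))
```

The cumulative share is a convenient way to decide how many components to keep.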
values_prc = round((eig$values/sum(eig$values)), digits = 4)
pc = prcomp(df_scaled)
ggscreeplot(pc)+
geom_col(fill = 'cornflowerblue') +
geom_line()+
theme_bw()+
geom_text(label=(paste0(values_prc*100,'%')), nudge_y = 0.05, nudge_x = 0.15)
pc = pca(df_scaled)
Looking at this chart, it’s clear that most of the total variance is
explained by the first principal component (68.63%), and the first two components together account for about 88% of it.
We can therefore represent our original data on a 2-dimensional graph with PC1 and PC2 as the x and y axes, losing little information.
Loadings are the eigenvectors of the correlation matrix and they are normalized, which means that the loadings \(p_{a,j}\) of principal component \(a\) on the \(m\) variables satisfy:
\[ \sum_{j=1}^m p_{a,j}^2 = 1 \]
For our dataset, the first two loading vectors are
load1 = eig$vectors[,1]
load2 = eig$vectors[,2]
load_df = data.frame(load1,load2, row.names = colnames(df[3:6]))
kable(load_df) %>% kable_styling(full_width = FALSE)
| | load1 | load2 |
|---|---|---|
| bill_length_mm | 0.4537532 | -0.6001949 |
| bill_depth_mm | -0.3990472 | -0.7961695 |
| flipper_length_mm | 0.5768250 | -0.0057882 |
| body_mass_g | 0.5496747 | -0.0764637 |
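As a quick sanity check of the normalization, the sketch below retypes the two loading vectors from the table above and confirms that each has unit length and that they are orthogonal to each other, up to the rounding of the printed values. The names `norm1`, `norm2` and `dot12` are introduced only for this example.

```r
# Loadings copied from the table above
# (order: bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g)
load1 <- c( 0.4537532, -0.3990472,  0.5768250,  0.5496747)
load2 <- c(-0.6001949, -0.7961695, -0.0057882, -0.0764637)

# Sum of squared loadings: should be (approximately) 1 for each component
norm1 <- sum(load1^2)
norm2 <- sum(load2^2)

# Dot product between the two loading vectors: should be (approximately) 0
dot12 <- sum(load1 * load2)

print(c(norm1 = norm1, norm2 = norm2, dot12 = dot12))
```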
We can project the original observations (objects) onto the
principal components and call the results scores. To do this,
we multiply each scaled variable value of an observation by the corresponding
loading and sum the products, i.e. we multiply the scaled data matrix by the loading vector.
The loadings can be understood then as the weights for each original
variable when calculating the principal component.
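To make the weighted sum concrete, here is a minimal sketch for a single observation. It takes the last row of the scaled data table and the two loading vectors reported above (all retyped from the printed, rounded values); `x_last` and `score_last` are names introduced only for this example.

```r
# Last row of the scaled data table
# (order: bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g)
x_last <- c(1.1350491, 0.7795590, -0.2116878, -0.5365730)

# First two loading vectors from the loadings table
load1 <- c( 0.4537532, -0.3990472,  0.5768250,  0.5496747)
load2 <- c(-0.6001949, -0.7961695, -0.0057882, -0.0764637)

# Score = sum over variables of (scaled value * loading)
score_last <- c(pc1 = sum(x_last * load1),
                pc2 = sum(x_last * load2))

print(round(score_last, 4))
```

Up to rounding, the result should match the last row of the score table for this dataset (about -0.2131 on PC1 and -1.2597 on PC2).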
In our dataset, the first two score vectors are
score_pc1 = as.matrix(df_scaled) %*% load1
score_pc2 = as.matrix(df_scaled) %*% load2
score = data.frame(score_pc1, score_pc2)
kable(score) %>% kable_styling(full_width = FALSE, fixed_thead = TRUE) %>% scroll_box(height = '500px')
| score_pc1 | score_pc2 |
|---|---|
| -1.8508078 | -0.0320212 |
| -1.3142762 | 0.4428603 |
| -1.3745366 | 0.1609882 |
| -1.8824555 | 0.0123327 |
| -1.9170957 | -0.8163696 |
| -1.7703561 | 0.3656727 |
| -0.8172664 | -0.5004899 |
| -1.7962546 | 0.2450252 |
| -1.9532096 | -0.9967828 |
| -1.5671647 | -0.5772133 |
| -1.7453746 | 0.6093273 |
| -1.5734059 | -0.0867052 |
| -0.8035110 | -1.2916122 |
| -2.3466466 | 0.6442216 |
| -1.0034763 | -1.9694587 |
| -2.4046297 | 0.3085044 |
| -2.1105221 | 0.1362880 |
| -1.8542668 | 0.1089801 |
| -1.5027490 | 0.2886935 |
| -1.5787620 | 0.6030250 |
| -1.9255694 | 0.2969481 |
| -1.7603015 | -0.1380520 |
| -1.7010535 | 0.1875201 |
| -2.7100962 | 0.2008041 |
| -1.6798002 | -0.2851133 |
| -1.8771248 | 0.7814051 |
| -1.9079424 | 0.4060840 |
| -1.6543430 | 0.3277930 |
| -1.5161213 | -0.3259178 |
| -1.4442933 | 0.9862011 |
| -1.4384594 | -1.0575044 |
| -1.6322051 | -0.5473996 |
| -1.7307465 | -0.2719852 |
| -2.4040413 | -0.0672440 |
| -1.1359380 | -0.3572722 |
| -2.2931199 | 0.5929089 |
| -0.9703884 | -0.1173334 |
| -2.3054372 | 0.4487289 |
| -0.5775328 | -1.0530018 |
| -2.0076586 | 0.9957725 |
| -0.8789399 | -0.2117605 |
| -1.9263569 | -0.3423663 |
| -1.7803061 | 0.6564231 |
| -1.4072836 | -1.4360998 |
| -1.5715639 | 0.3390821 |
| -1.1448211 | -0.2777526 |
| -1.8632794 | 0.7661747 |
| -0.7855517 | -0.7100785 |
| -2.4442139 | 0.7936569 |
| -1.2622829 | -0.2434011 |
| -1.5466876 | 0.4810458 |
| -1.2186145 | -0.2467827 |
| -2.2553712 | 1.1878354 |
| -1.5213032 | -0.0344841 |
| -2.0131274 | 1.1242055 |
| -1.1347103 | -1.3113099 |
| -1.5685531 | 0.8325149 |
| -0.9260382 | -0.0824033 |
| -2.2415225 | 0.9954197 |
| -0.9122878 | -0.0469222 |
| -1.3397906 | 1.4060466 |
| -1.2389045 | -0.4493729 |
| -1.7982276 | 1.2309775 |
| -0.5911361 | -0.6848560 |
| -2.1082476 | 0.4718237 |
| -1.2674362 | 0.0054582 |
| -1.0245570 | 0.5323563 |
| -0.4038710 | -0.8928092 |
| -1.5700758 | 0.8492803 |
| -0.5857811 | -0.4105031 |
| -0.9390163 | 0.5392216 |
| -1.9244427 | -0.1219889 |
| -1.4541603 | 1.3539626 |
| -0.9361074 | -0.5525192 |
| -1.9664363 | 1.1172411 |
| -0.0467625 | -0.1007499 |
| -1.7891428 | 0.1837263 |
| -1.5234947 | 0.0762845 |
| -1.6792857 | 0.5632595 |
| -1.5939995 | -0.9067374 |
| -1.8407143 | -0.0566247 |
| -1.8545020 | 0.2702989 |
| -1.5527346 | -0.1686678 |
| -1.6196639 | -0.0399740 |
| -1.2633326 | 0.6344664 |
| -0.2000928 | -0.0710817 |
| -2.0240494 | 1.2061822 |
| -1.0041096 | 0.0871482 |
| -1.8679898 | 0.8925381 |
| -0.2636310 | -0.3628382 |
| -1.5772501 | 0.1191920 |
| -0.6837945 | -0.1460332 |
| -2.5254941 | 1.7596336 |
| -0.7784545 | -0.4389207 |
| -1.5932417 | 0.7392346 |
| -0.3855951 | -0.8678161 |
| -1.7983134 | 1.2765238 |
| -1.5103855 | -0.4661362 |
| -1.9994265 | 0.2134977 |
| -1.8546142 | -0.1609797 |
| -0.8475354 | 0.6218768 |
| -1.7161212 | -0.4768007 |
| -1.9818114 | 0.8196492 |
| -0.2132138 | -0.7072358 |
| -0.7371308 | 0.9530563 |
| -0.6439060 | -1.4771387 |
| -1.4799713 | 0.3537043 |
| -0.7388288 | -0.7521560 |
| -1.7006517 | -0.9138785 |
| -0.6318573 | -0.3024621 |
| -1.8399634 | 0.7879967 |
| -1.6070488 | -0.5720229 |
| -1.7322412 | 1.0631311 |
| -1.6254768 | -0.1740395 |
| -1.9501221 | 0.9472126 |
| -1.6608932 | -0.3063837 |
| -1.8256180 | 0.5651217 |
| -0.6698465 | -0.2241316 |
| -1.8790822 | 1.5924682 |
| -0.8756815 | -0.3491134 |
| -1.5654959 | 0.4866150 |
| -0.6189860 | -0.1917133 |
| -1.6011755 | 0.6881827 |
| 0.0700754 | -0.3334827 |
| -1.6582033 | 0.3939143 |
| -1.1324087 | -0.6560469 |
| -1.6779135 | 0.3200526 |
| -0.7073229 | 0.1481622 |
| -1.6858025 | 0.5508489 |
| -0.9688971 | 0.2156795 |
| -1.8790104 | 0.8877464 |
| -1.1076862 | -0.7479860 |
| -1.6535452 | 1.1195099 |
| -0.8037246 | 0.1731350 |
| -1.1803717 | 0.5224187 |
| -1.3631810 | 0.4334435 |
| -1.9729321 | 2.0935936 |
| -1.0202285 | 0.4783501 |
| -1.6744145 | 1.0003866 |
| -1.7627476 | -0.0132019 |
| -1.1105260 | -0.0537630 |
| -2.0617091 | 0.3885241 |
| -1.5542648 | 0.6947886 |
| -1.3432233 | 0.3482825 |
| -1.5709992 | 0.9573650 |
| -0.6173743 | -0.2465638 |
| 1.5911740 | 1.3397795 |
| 2.8877082 | -0.4633927 |
| 1.5492403 | 0.6957130 |
| 2.6167477 | -0.0137027 |
| 2.2312012 | 0.5624408 |
| 1.5565478 | 1.1702527 |
| 1.4541886 | 0.8220921 |
| 2.0225060 | 0.3551143 |
| 1.1677457 | 1.5765451 |
| 1.8117853 | 0.3101087 |
| 1.2842555 | 1.6928527 |
| 2.1666905 | -0.2527546 |
| 1.6659325 | 1.1879955 |
| 2.5021941 | 0.3923029 |
| 1.0366368 | 0.8355807 |
| 2.5185870 | -0.1528596 |
| 0.9101111 | 1.7021189 |
| 3.0834210 | 0.0158834 |
| 1.4585204 | 0.7755471 |
| 2.4548433 | 0.2009890 |
| 2.8157190 | 0.3282205 |
| 1.7507110 | 0.8748039 |
| 1.3749771 | 0.7789539 |
| 1.6209782 | 0.2127590 |
| 1.8518669 | 1.6822828 |
| 1.7803641 | 0.5129740 |
| 2.3171362 | 0.3145985 |
| 1.5696219 | 0.6554839 |
| 2.5763981 | -0.0407150 |
| 2.2298884 | 0.2832764 |
| 1.1689393 | 1.2794897 |
| 1.4555996 | 0.8733597 |
| 3.7813279 | -1.8332565 |
| 2.3299854 | 0.2981976 |
| 2.1386038 | -0.2551723 |
| 1.5889474 | 1.4781999 |
| 1.4605183 | -0.2058128 |
| 1.1100112 | 1.4240192 |
| 1.7570828 | -0.0358117 |
| 0.7088248 | 1.5642501 |
| 2.7095339 | -0.2961360 |
| 1.2457911 | 1.2448339 |
| 1.8932651 | 0.2020971 |
| 2.5786112 | -0.3389990 |
| 1.7618822 | 1.2906837 |
| 1.1535934 | 1.1515189 |
| 2.5996811 | -0.3259939 |
| 1.9632386 | 1.3732488 |
| 1.7003683 | 0.3097456 |
| 1.6277895 | 0.8477767 |
| 2.5244464 | 0.6328171 |
| 1.1556123 | 0.9742755 |
| 2.4758113 | 0.1197644 |
| 1.9011843 | 0.7702522 |
| 1.7999464 | 0.5150927 |
| 0.9984922 | 1.3294264 |
| 1.8883572 | 0.6266865 |
| 0.9295203 | 1.1384510 |
| 2.7742092 | -0.0862675 |
| 1.0749519 | 1.2147255 |
| 2.2126508 | 0.5613897 |
| 1.4713383 | 1.1089245 |
| 3.3731009 | -0.6884070 |
| 1.8294135 | 0.9461052 |
| 2.7697932 | -0.6435943 |
| 2.8935945 | -0.3771696 |
| 1.6797304 | 1.1981208 |
| 2.8187379 | 0.0025112 |
| 1.7356159 | 0.4106251 |
| 1.8826041 | 0.2849148 |
| 2.1002203 | 0.0778659 |
| 2.0249208 | 0.5800425 |
| 1.5936185 | 0.5580501 |
| 2.9006022 | -0.1979454 |
| 1.4906493 | 0.7731534 |
| 2.7722172 | -0.6084778 |
| 1.7301962 | 1.1705816 |
| 2.3517452 | 0.0021352 |
| 1.7031467 | 0.4726468 |
| 2.6959302 | -0.4273020 |
| 1.6100924 | 0.6092980 |
| 2.4829069 | -0.2659571 |
| 1.5818379 | 1.2047460 |
| 2.6008710 | -0.9451758 |
| 1.4803298 | 1.1385572 |
| 2.6541965 | 0.2859083 |
| 1.8423705 | 0.8266611 |
| 2.8177071 | -0.9626396 |
| 1.9378607 | 0.4127573 |
| 2.6210331 | -0.9989751 |
| 1.4897733 | 0.8558823 |
| 2.6056849 | -0.3204302 |
| 1.5168471 | 0.8744511 |
| 2.5697281 | -0.2594795 |
| 1.8340203 | -0.1160138 |
| 2.0825565 | 0.6457999 |
| 1.2949305 | 0.5936200 |
| 2.4254842 | -0.6201831 |
| 1.9937251 | 0.3120888 |
| 3.0848267 | -1.3836176 |
| 1.7052481 | 0.2423958 |
| 2.8576258 | 0.1807968 |
| 1.9088618 | -0.0061402 |
| 1.0175038 | 1.1976515 |
| 2.6818992 | -0.6108612 |
| 1.1244683 | 1.3177577 |
| 1.9724351 | 0.2579645 |
| 2.0980735 | -0.0012802 |
| 3.0816751 | -0.3030479 |
| 1.1548695 | 0.8014558 |
| 2.8756395 | -0.6090279 |
| 1.5786971 | 0.9743231 |
| 3.4740604 | -0.9160785 |
| 2.6839537 | -0.3164447 |
| 1.9947138 | 0.9753037 |
| 1.8298973 | 0.7833311 |
| 2.7473705 | -0.2661552 |
| 1.7112784 | 0.7247844 |
| 2.0155037 | -0.3360480 |
| -0.7926440 | -0.5015423 |
| -0.3887839 | -1.5721950 |
| -0.5142535 | -1.5686019 |
| -1.1935828 | -0.7049808 |
| -0.3038554 | -1.9736103 |
| -0.3261233 | -0.3636449 |
| -1.6334624 | -0.5494111 |
| -0.0787267 | -1.1754459 |
| -0.4695872 | -0.9139336 |
| -0.4161926 | -1.8584275 |
| -0.5181344 | -0.5009882 |
| -0.5774831 | -2.0695198 |
| -0.7811325 | -0.3299370 |
| 0.3690332 | -1.2419817 |
| -0.7114281 | -0.1185443 |
| -0.0593877 | -1.6838101 |
| -0.8336425 | -1.7507091 |
| -0.1343689 | -1.7377042 |
| -1.0592328 | -0.7680059 |
| 0.1084363 | -1.0058660 |
| -1.3956955 | 0.1860681 |
| -0.6550610 | -0.5494149 |
| -1.4183857 | 0.4452740 |
| -0.5104665 | -1.5868806 |
| -0.7891116 | -0.5057394 |
| 0.0902991 | -1.6136993 |
| -0.3010921 | -1.1365082 |
| -0.2325927 | -1.3073232 |
| -0.6853042 | -0.4687159 |
| 0.5563376 | -2.1470924 |
| -1.4044313 | 0.6692145 |
| 0.1751051 | -2.5987957 |
| -1.1895418 | 0.4389375 |
| 0.2606545 | -1.4208168 |
| -0.4772475 | -1.1464950 |
| 0.0743792 | -0.2074346 |
| -0.4200384 | -0.8184656 |
| 0.7245484 | -2.3681089 |
| -1.0421360 | 0.0561205 |
| 0.6005508 | -2.1787401 |
| 0.1385512 | -1.4729731 |
| -0.8398605 | -0.3190745 |
| -0.4719766 | -1.4760137 |
| -0.5286189 | -0.0295692 |
| -0.1434775 | -1.0027192 |
| 0.4614661 | -1.3099855 |
| -0.6445155 | -0.8863259 |
| 0.4395229 | -1.5474656 |
| -0.9163282 | -1.3479382 |
| -0.0308528 | -0.6402361 |
| -0.1873002 | -0.0569617 |
| 0.0686083 | -1.5305082 |
| -0.6280185 | -0.1810677 |
| 0.0192537 | -1.7470168 |
| -1.3111262 | 0.1963552 |
| -0.3304281 | -1.4883165 |
| -0.8488925 | 0.1908829 |
| -0.1374369 | -1.6742254 |
| -0.0516724 | -1.3041144 |
| -1.0719040 | -1.0124216 |
| 0.2145518 | -1.7896008 |
| -0.5051250 | 0.0185525 |
| -0.4507833 | -0.0653506 |
| 0.5526429 | -2.3440840 |
| -0.7388017 | -0.2477821 |
| -0.3673370 | -0.9895904 |
| 0.4916198 | -1.4826181 |
| -0.2130962 | -1.2596581 |
Let’s look at what happens when we plot these scores
on a scatter plot
score_df = data.frame(df[,c(1,2,7,8)], score)
ggplot(data=score_df, aes(score_pc1, score_pc2, color=species, shape=sex))+
geom_point()+
theme_bw()+
stat_ellipse(level = 0.9)+
xlab(paste('PC1 - ', values_prc[1]*100, '%'))+
ylab(paste('PC2 - ', values_prc[2]*100, '%'))+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
geom_vline(xintercept= 0, linetype="dashed", alpha = 0.3)+
geom_hline(yintercept= 0, linetype="dashed", alpha = 0.3)
It’s immediately clear that the data are grouped into two major
clusters, and that penguins of the Gentoo species seem
to have different characteristics from the others.
A similar result could have been achieved through a classical
cluster analysis. With PCA, however, we’re also able to see which
variables are most responsible for the variation.
Let’s now take the first two loading vectors we calculated
before and plot them against each other.
ggplot(load_df, aes(load1, load2))+
geom_point()+
theme_bw()+
geom_label(label= rownames(load_df),color='white', fill= 'darkolivegreen4', fontface = 'bold')+
xlim(-1,1)+ylim(-1,0.2)+
xlab(paste('PC1 - ', values_prc[1]*100, '%'))+
ylab(paste('PC2 - ', values_prc[2]*100, '%'))+
geom_segment(xend=0, yend=0, color= 'grey')+
geom_vline(xintercept= 0, linetype="dashed", alpha = 0.3)+
geom_hline(yintercept= 0, linetype="dashed", alpha = 0.3)
This graph shows us the direction of each variable along the principal
components.
The vectors of flipper_length_mm and
body_mass_g are almost parallel to each other, which means
they are highly correlated (about 0.87 in the correlation matrix, as we’ve
already seen before with linear regression).
On the other hand, bill_depth_mm and
bill_length_mm have loadings of opposite sign on the
first principal component (the x axis). This suggests there is a weak
negative correlation between them (about -0.23 in the correlation matrix).
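One way to make this reading of the loading plot more quantitative is to rebuild an approximate correlation matrix from the first two components only, as the sum of \(\lambda_a\) times the outer product of each loading vector with itself. This is a sketch under the assumption that the rounded eigenvalues and loadings printed earlier are used; `C_approx` is a name introduced for this example, and the approximation is only as good as the variance retained by PC1 and PC2.

```r
vars <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")

# First two eigenvalues and loading vectors, copied from the outputs above
lambda <- c(2.7453557, 0.7781172)
load1 <- c( 0.4537532, -0.3990472,  0.5768250,  0.5496747)
load2 <- c(-0.6001949, -0.7961695, -0.0057882, -0.0764637)
L <- cbind(load1, load2)

# Rank-2 approximation of the correlation matrix: L diag(lambda) L'
C_approx <- L %*% diag(lambda) %*% t(L)
dimnames(C_approx) <- list(vars, vars)

print(round(C_approx, 3))
```

With these numbers the flipper length and body mass entry comes out at roughly 0.87, close to the full correlation matrix, while the entry for the two bill variables is negative but smaller in magnitude than -0.23, because part of their relationship lies in the discarded components.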
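Here’s what happens when we plot the two previous graphs together (a biplot). The loadings are multiplied by 3 only so that the labels are readable on the scale of the scores; the plot is repeated coloring the points by species, sex and island.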
ggplot(data= score_df, aes(score_pc1, score_pc2, color=species))+
geom_point(size=1)+
theme_bw()+
xlab(paste('PC1 - ', values_prc[1]*100, '%'))+
ylab(paste('PC2 - ', values_prc[2]*100, '%'))+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
geom_label(data=load_df, aes(load1*3,load2*3), label=rownames(load_df), size=3, color='white', fill= 'darkolivegreen4', fontface = 'bold' )+
geom_vline(xintercept= 0, linetype="dashed", alpha = 0.3)+
geom_hline(yintercept= 0, linetype="dashed", alpha = 0.3)
ggplot(data= score_df, aes(score_pc1, score_pc2, color=sex))+
geom_point(size=1)+
theme_bw()+
stat_ellipse(level = 0.9)+
xlab(paste('PC1 - ', values_prc[1]*100, '%'))+
ylab(paste('PC2 - ', values_prc[2]*100, '%'))+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
geom_label(data=load_df, aes(load1*3,load2*3), label=rownames(load_df), size=3, color='white', fill= 'darkolivegreen4', fontface = 'bold' )+
geom_vline(xintercept= 0, linetype="dashed", alpha = 0.3)+
geom_hline(yintercept= 0, linetype="dashed", alpha = 0.3)
ggplot(data= score_df, aes(score_pc1, score_pc2, color=island))+
geom_point(size=1)+
theme_bw()+
xlab(paste('PC1 - ', values_prc[1]*100, '%'))+
ylab(paste('PC2 - ', values_prc[2]*100, '%'))+
scale_color_manual(values=c('cornflowerblue', 'darkorange', 'brown3'))+
geom_label(data=load_df, aes(load1*3,load2*3), label=rownames(load_df), size=3, color='white', fill= 'darkolivegreen4', fontface = 'bold' )+
geom_vline(xintercept= 0, linetype="dashed", alpha = 0.3)+
geom_hline(yintercept= 0, linetype="dashed", alpha = 0.3)