We will analyze data collected from adult penguins found on three islands in the Palmer Archipelago, Antarctica. For each penguin, the data include its species, the island where it was found, its sex, and several quantitative variables (flipper length, body mass, and bill dimensions).
Our goal is to get a general overview of the data and then to analyze the “species” variable in more detail.
We start by looking at the size of the data frame, as well as the names of the variables.
dim(P)
## [1] 344 8
names(P)
## [1] "species" "island" "bill_length_mm"
## [4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
## [7] "sex" "year"
As we can see, the data frame has 344 rows (one per penguin) and 8 columns (one per variable). The variables are species (categorical), island (categorical), bill length (quantitative, in mm), bill depth (quantitative, in mm), flipper length (quantitative, in mm), body mass (quantitative, in grams), sex (categorical), and year of observation (quantitative but discrete; it can also be treated as an ordinal categorical variable).
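One way to check the variable types listed above is to look at the class of each column. The sketch below is only illustrative: `P_demo` is a hypothetical stand-in with a few of the columns and invented values, so that the code runs on its own. Assuming `P` stores its categorical variables as factors, the same calls could be applied to `P` directly.

```r
# Hypothetical stand-in for P: a few of its columns, with invented values.
P_demo <- data.frame(
  species = factor(c("Adelie", "Gentoo", "Chinstrap")),
  island = factor(c("Torgersen", "Biscoe", "Dream")),
  bill_length_mm = c(39.1, 46.1, 46.5),
  body_mass_g = c(3750L, 5000L, 3800L)
)

# Class of each column: "factor" marks a categorical variable,
# "numeric" or "integer" a quantitative one.
col_classes <- sapply(P_demo, class)
print(col_classes)

# Compact overview: dimensions, column types, and the first few values.
str(P_demo)
```

Note that a variable such as year would show up as a number here even if we choose to treat it as categorical in the analysis; the class only tells us how R stores the column.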
Next, we count how many penguins of each of the three species were found on the islands, using a frequency table. We also create a relative frequency table to express these counts as proportions.
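As a minimal sketch of how frequency and relative frequency tables can be built in base R, the example below uses `table()` and `prop.table()` on a small hypothetical data frame (`demo`, with invented rows) rather than on `P`. Whether to count by species alone or by species and island together is a choice; both are shown.

```r
# Hypothetical stand-in data: six invented penguins.
demo <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe", "Biscoe", "Dream", "Dream", "Biscoe")
)

# Frequency table: number of penguins of each species.
freq <- table(demo$species)
print(freq)

# Relative frequency table: each count divided by the total.
rel_freq <- prop.table(freq)
print(round(rel_freq, 3))

# Two-way frequency table: species (rows) by island (columns).
freq_by_island <- table(demo$species, demo$island)
print(freq_by_island)
```

Replacing `demo` with `P` would give the corresponding tables for the full data, assuming `P` has `species` and `island` columns as listed by `names(P)`.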