HDS 2.3-2.4

Author

GI

library(tidyverse)
library(datasets)

We will begin by loading the tidyverse and datasets packages above. The penguins data used below comes from datasets, which includes it in R 4.5.0 and later.

Took a glimpse of the penguins data to determine which of the variables are categorical and which are quantitative:

glimpse(penguins)
Rows: 344
Columns: 8
$ species     <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ad…
$ island      <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Tor…
$ bill_len    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, …
$ bill_dep    <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, …
$ flipper_len <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180,…
$ body_mass   <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, …
$ sex         <fct> male, female, female, NA, female, male, female, male, NA, …
$ year        <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

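Made a list of the categorical variables: species, island, and sex (the three <fct> columns).

Made a list of the quantitative variables: bill_len, bill_dep, flipper_len, and body_mass. The year column is stored as an integer, but because it records the study year it can also be treated as a grouping variable.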
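The categorical/quantitative split can be cross-checked by column type. The sketch below assumes `penguins` is the data frame shown in the glimpse() output (from datasets in R 4.5.0 or later); the names `categorical_vars` and `quantitative_vars` are illustrative and not part of the original exercise.

```r
library(dplyr)

# Assumption: `penguins` is the datasets version with the columns shown by glimpse().

# Factor columns are the candidates for categorical variables
categorical_vars <- penguins |>
  select(where(is.factor)) |>
  names()

# Numeric columns (double or integer) are the candidates for quantitative variables
quantitative_vars <- penguins |>
  select(where(is.numeric)) |>
  names()

categorical_vars
quantitative_vars
```

Column type is only a first pass: year comes back as numeric here, even though it may be more useful as a grouping variable.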

Summarizing Variables

We will now create code chunks to summarize some of our variables.

Created a code chunk that summarizes the number of penguins by species:

count(penguins, species)
    species   n
1    Adelie 152
2 Chinstrap  68
3    Gentoo 124
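Counts are often easier to compare as shares of the total. As a minimal sketch, assuming the same `penguins` data frame, a proportion column can be added to the output of count(); the names `species_counts` and `prop` are illustrative.

```r
library(dplyr)

# Assumption: `penguins` is the datasets version used above.
species_counts <- penguins |>
  count(species) |>
  mutate(prop = n / sum(n))  # share of all penguins belonging to each species

species_counts
```

Because count() returns an ordinary data frame, any further dplyr verb can be chained onto it in the same way.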

Created a code chunk that summarizes the number of penguins by island:

count(penguins, island)
     island   n
1    Biscoe 168
2     Dream 124
3 Torgersen  52

Created a code chunk that summarizes the number of penguins by species and island:

count(penguins, species, island)
    species    island   n
1    Adelie    Biscoe  44
2    Adelie     Dream  56
3    Adelie Torgersen  52
4 Chinstrap     Dream  68
5    Gentoo    Biscoe 124
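The two-variable count lists only the species and island combinations that actually occur, so absent pairs (for example, Chinstrap on Biscoe) do not appear. One possible way to make them visible is to reshape the counts into a two-way table with tidyr's pivot_wider(), filling missing combinations with 0. This is a sketch under the same assumption about `penguins`; `species_by_island` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Assumption: `penguins` is the datasets version used above.
species_by_island <- penguins |>
  count(species, island) |>
  pivot_wider(
    names_from = island,   # one column per island
    values_from = n,       # cell values are the counts
    values_fill = 0        # combinations that never occur become 0
  )

species_by_island
```

The result has one row per species and one column per island, which makes it easier to see which species are found on which islands.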

Created a code chunk that summarizes body_mass. It produces a data frame with the mean, median, minimum, maximum, and standard deviation of body_mass, ignoring missing values:

summarize(penguins,
          mean_mass = mean(body_mass, na.rm = TRUE),
          median_mass = median(body_mass, na.rm = TRUE),
          min_mass = min(body_mass, na.rm = TRUE),
          max_mass = max(body_mass, na.rm = TRUE),
          sd_mass = sd(body_mass, na.rm = TRUE))
  mean_mass median_mass min_mass max_mass  sd_mass
1  4201.754        4050     2700     6300 801.9545
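The overall summary pools all three species together. As a sketch of a natural next step, and not part of the original exercise, the same statistics can be computed per species with group_by(), along with a count of the missing values that na.rm = TRUE drops. The names `mass_by_species` and `n_missing` are illustrative.

```r
library(dplyr)

# Assumption: `penguins` is the datasets version used above.
mass_by_species <- penguins |>
  group_by(species) |>
  summarize(
    n_missing = sum(is.na(body_mass)),              # rows dropped by na.rm = TRUE
    mean_mass = mean(body_mass, na.rm = TRUE),
    median_mass = median(body_mass, na.rm = TRUE),
    sd_mass = sd(body_mass, na.rm = TRUE)
  )

mass_by_species
```

Reporting the number of missing values next to the statistics shows how many observations each summary is actually based on.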