HDS 2.3-2.4

Author

Luke Orgon

install.packages("tidyverse")
install.packages("palmerpenguins")

Begin by installing the tidyverse and palmerpenguins packages with the commands above (needed only once), then load them with library().

Take a glimpse of the penguins data and determine which of the variables are categorical and which are quantitative:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(palmerpenguins)

Attaching package: 'palmerpenguins'

The following objects are masked from 'package:datasets':

    penguins, penguins_raw
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Make a list of the categorical variables:

- species
- island
- sex
- year

Make a list of the quantitative variables:

- bill_length_mm
- bill_depth_mm
- flipper_length_mm
- body_mass_g
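As a rough cross-check of the two lists, the column types can be pulled out programmatically. This is only a sketch: the names `factor_cols` and `numeric_cols` are made up for illustration, and storage type is a hint rather than a rule for deciding what counts as categorical.

```r
library(dplyr)
library(palmerpenguins)

# Columns stored as factors (R's usual type for categorical data)
factor_cols <- penguins %>%
  select(where(is.factor)) %>%
  names()

# Columns stored as numbers (double or integer)
numeric_cols <- penguins %>%
  select(where(is.numeric)) %>%
  names()

factor_cols
numeric_cols
```

Note that year is stored as an integer, so it shows up among the numeric columns. Listing it as categorical is a judgment call based on how it is used (it labels a sampling season), not on how it is stored.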

Summarizing Variables

Create a code chunk that summarizes the number of penguins by species:

count(penguins, species)
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124

Create a code chunk that summarizes the number of penguins by island:

count(penguins, island)
# A tibble: 3 × 2
  island        n
  <fct>     <int>
1 Biscoe      168
2 Dream       124
3 Torgersen    52

Create a code chunk that summarizes the number of penguins by species and island:

count(penguins, species, island)
# A tibble: 5 × 3
  species   island        n
  <fct>     <fct>     <int>
1 Adelie    Biscoe       44
2 Adelie    Dream        56
3 Adelie    Torgersen    52
4 Chinstrap Dream        68
5 Gentoo    Biscoe      124
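The table above has only 5 rows because count() drops species and island combinations that never occur. One possible way to make those empty combinations visible is the `.drop = FALSE` argument, sketched here; `all_combos` is a hypothetical name used only for this example.

```r
library(dplyr)
library(palmerpenguins)

# Keep every combination of the two factors' levels,
# including pairs with no observations (reported as n = 0)
all_combos <- count(penguins, species, island, .drop = FALSE)

all_combos
```

This relies on species and island being factors. For character columns, `.drop = FALSE` has no levels to expand, so a different approach such as tidyr::complete() would be needed.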

Create a code chunk that summarizes body_mass_g. It should produce a data frame with the mean, median, minimum, maximum, and standard deviation of body_mass_g:

penguins %>%
  summarize(
    mean_body_mass = mean(body_mass_g, na.rm = TRUE),
    median_body_mass = median(body_mass_g, na.rm = TRUE),
    min_body_mass = min(body_mass_g, na.rm = TRUE),
    max_body_mass = max(body_mass_g, na.rm = TRUE),
    sd_body_mass = sd(body_mass_g, na.rm = TRUE)
  )
# A tibble: 1 × 5
  mean_body_mass median_body_mass min_body_mass max_body_mass sd_body_mass
           <dbl>            <dbl>         <int>         <int>        <dbl>
1          4202.             4050          2700          6300         802.
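Because `na.rm = TRUE` silently discards missing values, it can help to report how many were dropped alongside the statistics. The sketch below also splits the summary by species as one possible follow-up; the grouping and the names `by_species` and `n_missing` are additions for illustration, not part of the exercise.

```r
library(dplyr)
library(palmerpenguins)

by_species <- penguins %>%
  group_by(species) %>%
  summarize(
    n = n(),                                  # penguins in each species
    n_missing = sum(is.na(body_mass_g)),      # values dropped by na.rm = TRUE
    mean_body_mass = mean(body_mass_g, na.rm = TRUE),
    sd_body_mass = sd(body_mass_g, na.rm = TRUE)
  )

by_species
```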