library(tidyverse)
library(palmerpenguins)
HDS 2.3-2.4
Begin by loading the tidyverse and palmerpenguins packages above.
Take a glimpse of the penguins data and determine which of the variables are categorical and which are quantitative:
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
Make a list of the categorical variables:
- Species
- Island
- Sex
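- Year (stored as an integer, but it takes only a few distinct values, so it works as a grouping variable)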
Make a list of the quantitative variables:
- Bill Length (mm)
- Bill Depth (mm)
- Flipper Length (mm)
- Body Mass (g)
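The two lists above can be cross-checked against the column types that glimpse() reports. The following is a minimal sketch, assuming dplyr 1.0.0 or later so that where() is available inside select(); the names categorical_vars and numeric_vars are made up for illustration. Storage type is only a hint: year is stored as an integer, so it lands with the numeric columns here even though the list above treats it as categorical.

```r
library(dplyr)
library(palmerpenguins)

# Columns stored as factors: candidates for categorical variables
categorical_vars <- penguins %>%
  select(where(is.factor)) %>%
  names()

# Columns stored as numbers (double or integer): candidates for quantitative variables
numeric_vars <- penguins %>%
  select(where(is.numeric)) %>%
  names()

categorical_vars
numeric_vars

# year is numeric by storage type; counting its distinct values
# helps decide whether to treat it as a category instead
n_distinct(penguins$year)
```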
Summarizing Variables
Create a code chunk that summarizes the number of penguins by species:
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
count(penguins, species)
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124
tabyl(penguins, species)
   species   n   percent
    Adelie 152 0.4418605
 Chinstrap  68 0.1976744
    Gentoo 124 0.3604651
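The percent column that tabyl() reports is each count divided by the total number of rows. As a rough sketch of that arithmetic using only dplyr (the name species_counts is hypothetical), the same proportions can be added to the count() result:

```r
library(dplyr)
library(palmerpenguins)

# Count penguins per species, then divide each count by the overall total
species_counts <- penguins %>%
  count(species) %>%
  mutate(percent = n / sum(n))

species_counts
```

The proportions sum to 1 because every penguin has a recorded species.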
Create a code chunk that summarizes the number of penguins by island:
count(penguins, island)
# A tibble: 3 × 2
  island        n
  <fct>     <int>
1 Biscoe      168
2 Dream       124
3 Torgersen    52
tabyl(penguins, island)
    island   n   percent
    Biscoe 168 0.4883721
     Dream 124 0.3604651
 Torgersen  52 0.1511628
Create a code chunk that summarizes the number of penguins by species and island:
count(penguins, species, island)
# A tibble: 5 × 3
  species   island        n
  <fct>     <fct>     <int>
1 Adelie    Biscoe       44
2 Adelie    Dream        56
3 Adelie    Torgersen    52
4 Chinstrap Dream        68
5 Gentoo    Biscoe      124
tabyl(penguins, species, island)
   species Biscoe Dream Torgersen
    Adelie     44    56        52
 Chinstrap      0    68         0
    Gentoo    124     0         0
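The two outputs above differ in shape: count() lists only the five species and island combinations that occur, while tabyl() shows all nine cells, zeros included. The sketch below illustrates two optional follow-ups, assuming current dplyr and janitor releases: the .drop = FALSE argument of count() keeps empty factor combinations, and janitor's adorn_totals() appends row and column totals to the two-way table. The object names all_combinations and with_totals are made up for illustration.

```r
library(dplyr)
library(janitor)
library(palmerpenguins)

# Keep species/island combinations with no penguins (n = 0)
all_combinations <- count(penguins, species, island, .drop = FALSE)
all_combinations

# Add a "Total" row and a "Total" column to the two-way table
with_totals <- penguins %>%
  tabyl(species, island) %>%
  adorn_totals(where = c("row", "col"))
with_totals
```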
Create a code chunk that summarizes body_mass_g. It should produce a data frame with the mean, median, minimum, maximum, and standard deviation of body_mass_g:
summarize(
  penguins,
  mean_body_mass = mean(body_mass_g, na.rm = TRUE),
  median_body_mass = median(body_mass_g, na.rm = TRUE),
  min_body_mass = min(body_mass_g, na.rm = TRUE),
  max_body_mass = max(body_mass_g, na.rm = TRUE),
  sd_body_mass = sd(body_mass_g, na.rm = TRUE)
)
# A tibble: 1 × 5
  mean_body_mass median_body_mass min_body_mass max_body_mass sd_body_mass
           <dbl>            <dbl>         <int>         <int>        <dbl>
1          4202.             4050          2700          6300         802.
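Every summary function above is called with na.rm = TRUE because body_mass_g contains missing values (the NA visible in the glimpse() output). As a small sketch of why that argument matters, the following compares the mean with and without it and counts the missing measurements; the variable names are hypothetical.

```r
library(palmerpenguins)

# Without na.rm = TRUE, any missing value makes the result NA
mean_with_na <- mean(penguins$body_mass_g)
mean_with_na

# Dropping missing values first gives a usable mean
mean_without_na <- mean(penguins$body_mass_g, na.rm = TRUE)
mean_without_na

# Number of penguins with no recorded body mass
n_missing <- sum(is.na(penguins$body_mass_g))
n_missing
```

Reporting n_missing alongside the summary makes clear how many penguins the statistics leave out.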