This document gives you the opportunity to apply the skills developed
in class. This session focuses on tidyr, a package for getting data
into a tidy format in which each variable is its own column and each
observation is its own row.
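As a minimal sketch of what "tidy" means here, the example below reshapes a small table whose column names hold values of a variable (the year) into one row per observation. The table counts_wide and its numbers are invented for illustration and do not come from the penguins data; pivot_longer() is the tidyr function used for the reshaping.

```r
library(tidyr)

# Hypothetical untidy table: the "year" variable is spread across column names.
# The counts are made up for illustration only.
counts_wide <- data.frame(
  species = c("Adelie", "Gentoo"),
  `2007` = c(10, 20),
  `2008` = c(12, 25),
  check.names = FALSE
)

# Move the year columns into a single "year" column, one row per species and year.
counts_tidy <- pivot_longer(
  counts_wide,
  cols = c("2007", "2008"),
  names_to = "year",
  values_to = "n"
)

counts_tidy
```

After the reshape, species, year, and n are each their own column, and each row is one species-year observation.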
For this assignment, we will be using the palmerpenguins
package and data set. Please install the package and load it into your
current R session. Once the package is loaded, the penguins data frame
is available directly; running the command
data(package = 'palmerpenguins') lists the data sets that the
package provides.
library(palmerpenguins)
library(tidyverse)
library(explore)
data(package = 'palmerpenguins')
penguins %>% describe()
## # A tibble: 8 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 species fct 0 0 3 NA NA NA
## 2 island fct 0 0 3 NA NA NA
## 3 bill_length_mm dbl 2 0.6 165 32.1 43.9 59.6
## 4 bill_depth_mm dbl 2 0.6 81 13.1 17.2 21.5
## 5 flipper_length_mm int 2 0.6 56 172 201. 231
## 6 body_mass_g int 2 0.6 95 2700 4202. 6300
## 7 sex fct 11 3.2 3 NA NA NA
## 8 year int 0 0 3 2007 2008. 2009
penguins %>% explore_all()
penguins %>% explore_all(target = sex)
penguins %>%
  select(species, body_mass_g, ends_with("_mm")) %>%
  GGally::ggpairs(aes(color = species)) +
  scale_colour_manual(values = c("darkorange", "purple", "cyan4")) +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4"))
year) per species.
# Maximum bill length of each species of penguin
maxspecies <- penguins %>%
  group_by(species) %>%
  summarise(maxspecies = max(bill_length_mm, na.rm = TRUE))
#b
species_2007 <- penguins %>%
  filter(year == 2007) %>%
  group_by(species) %>%
  count(island, sex, .drop = FALSE)
#OR
penguins %>% filter(year == 2007) %>% count(species, island, sex, .drop = FALSE)
## # A tibble: 21 × 4
## species island sex n
## <fct> <fct> <fct> <int>
## 1 Adelie Biscoe female 5
## 2 Adelie Biscoe male 5
## 3 Adelie Dream female 9
## 4 Adelie Dream male 10
## 5 Adelie Dream <NA> 1
## 6 Adelie Torgersen female 8
## 7 Adelie Torgersen male 7
## 8 Adelie Torgersen <NA> 5
## 9 Chinstrap Biscoe female 0
## 10 Chinstrap Biscoe male 0
## # … with 11 more rows
#c The data has 4 numeric variables (anonymous function)
avg_numeric <- penguins %>%
  group_by(species) %>%
  #select(body_mass_g, ends_with("_mm")) %>%
  summarize(across(c(body_mass_g, ends_with("_mm")), \(x) mean(x, na.rm = TRUE)))
The code chunk below merges the values of several columns of the
penguins data into a single column and saves the result as
penguins_united. Run the code and make sure you understand how it
transforms the data. Then use the separate() function to split the
merged column of penguins_united back into the original columns, and
store the result in an object called penguins_separated. Remember to
replace the string "NA" with a true missing value, NA.
penguins_united <- penguins %>%
  mutate(across(bill_length_mm:body_mass_g, as.character)) %>%
  unite(
    col = "merged",
    bill_length_mm:body_mass_g
  )
penguins_united
## # A tibble: 344 × 5
## species island merged sex year
## <fct> <fct> <chr> <fct> <int>
## 1 Adelie Torgersen 39.1_18.7_181_3750 male 2007
## 2 Adelie Torgersen 39.5_17.4_186_3800 female 2007
## 3 Adelie Torgersen 40.3_18_195_3250 female 2007
## 4 Adelie Torgersen NA_NA_NA_NA <NA> 2007
## 5 Adelie Torgersen 36.7_19.3_193_3450 female 2007
## 6 Adelie Torgersen 39.3_20.6_190_3650 male 2007
## 7 Adelie Torgersen 38.9_17.8_181_3625 female 2007
## 8 Adelie Torgersen 39.2_19.6_195_4675 male 2007
## 9 Adelie Torgersen 34.1_18.1_193_3475 <NA> 2007
## 10 Adelie Torgersen 42_20.2_190_4250 <NA> 2007
## # … with 334 more rows
# Start your answer below
penguins_separated <- penguins_united %>%
  separate(merged, c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"),
           sep = "_", remove = TRUE) %>%
  mutate(across(where(is.character), ~na_if(., "NA")))
penguins_separated
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_…¹ body_…² sex year
## <fct> <fct> <chr> <chr> <chr> <chr> <fct> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
## 2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
## 3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
## 4 Adelie Torgersen <NA> <NA> <NA> <NA> <NA> 2007
## 5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
## 6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
## 7 Adelie Torgersen 38.9 17.8 181 3625 fema… 2007
## 8 Adelie Torgersen 39.2 19.6 195 4675 male 2007
## 9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007
## 10 Adelie Torgersen 42 20.2 190 4250 <NA> 2007
## # … with 334 more rows, and abbreviated variable names ¹flipper_length_mm,
## # ²body_mass_g
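The answer above leaves the four separated columns as character vectors. As a minimal sketch, assuming the original numeric types are also wanted, separate() accepts convert = TRUE, which runs type.convert() on the new columns and, as a side effect, turns the string "NA" into a missing value. The names toy_united and toy_separated are hypothetical; the three merged values are copied from the penguins_united output shown earlier.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row stand-in for penguins_united,
# using merged values that appear in the printed output.
toy_united <- tibble(
  species = c("Adelie", "Adelie", "Adelie"),
  merged  = c("39.1_18.7_181_3750", "NA_NA_NA_NA", "42_20.2_190_4250")
)

# Split the merged column on "_" and let tidyr guess the column types.
# With convert = TRUE, the string "NA" becomes a real NA.
toy_separated <- toy_united %>%
  separate(
    merged,
    into = c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"),
    sep = "_",
    convert = TRUE
  )

toy_separated
```

With this option the explicit na_if() step is not needed for these columns. Note that recent tidyr releases mark separate() as superseded in favor of separate_wider_delim(); separate() still works and is the function this exercise asks for.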