Harold Nelson
2023-02-13
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Use RStudio's Import Dataset control to generate the import code.
boys_and_girls_2021 <- read_delim("Natality, 2016-2021 expanded.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 121 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): Notes, State of Residence, State of Residence Code, Sex of Infant, ...
## dbl (1): Births
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(boys_and_girls_2021)
## Rows: 121
## Columns: 6
## $ Notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ `State of Residence` <chr> "Alabama", "Alabama", "Alaska", "Alaska", "A…
## $ `State of Residence Code` <chr> "01", "01", "02", "02", "04", "04", "05", "0…
## $ `Sex of Infant` <chr> "Female", "Male", "Female", "Male", "Female"…
## $ `Sex of Infant Code` <chr> "F", "M", "F", "M", "F", "M", "F", "M", "F",…
## $ Births <dbl> 170911, 179258, 29296, 31102, 235677, 245676…
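The parsing warning above points to `problems()`. As a minimal sketch of how that check might look, the example below writes a small tab-delimited file to a temporary location and reads it back. The toy file, its short trailing `---` line, and the names `toy_path`, `toy`, and `issues` are all hypothetical; they stand in for the real export, whose problem rows are not shown here.

```r
library(readr)

# Hypothetical toy file: a header, two data rows, and a short trailing line
toy_path <- tempfile(fileext = ".txt")
writeLines(
  c("Notes\tState\tBirths",
    "\tAlabama\t170911",
    "\tAlaska\t29296",
    "---"),
  toy_path
)

toy <- read_delim(toy_path, delim = "\t", show_col_types = FALSE)

# problems() returns a tibble describing anything readr flagged while parsing
issues <- problems(toy)
print(issues)
```

Running `problems(boys_and_girls_2021)` on the real data would be the equivalent check.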
How do you clean up these variable names?
Use janitor::clean_names(). Do a little research!
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
## Rows: 121
## Columns: 6
## $ notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ state_of_residence <chr> "Alabama", "Alabama", "Alaska", "Alaska", "Ari…
## $ state_of_residence_code <chr> "01", "01", "02", "02", "04", "04", "05", "05"…
## $ sex_of_infant <chr> "Female", "Male", "Female", "Male", "Female", …
## $ sex_of_infant_code <chr> "F", "M", "F", "M", "F", "M", "F", "M", "F", "…
## $ births <dbl> 170911, 179258, 29296, 31102, 235677, 245676, …
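The code for the cleaning step is not shown above. A minimal sketch of what it might look like follows, using a two-row stand-in tibble with the same column names as the imported file; the names `raw` and `cleaned` are hypothetical, and the author's actual call may differ.

```r
library(tibble)
library(janitor)

# Stand-in for the imported data: same column names, first two rows only
raw <- tibble(
  Notes = c(NA_character_, NA_character_),
  `State of Residence` = c("Alabama", "Alabama"),
  `State of Residence Code` = c("01", "01"),
  `Sex of Infant` = c("Female", "Male"),
  `Sex of Infant Code` = c("F", "M"),
  Births = c(170911, 179258)
)

# clean_names() converts the column names to lower-case snake_case
cleaned <- clean_names(raw)
print(names(cleaned))
```

On the real data, the same idea would be `boys_and_girls_2021 <- clean_names(boys_and_girls_2021)`.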
Also shorten the variable names, keep only the columns needed, and drop rows with missing values.
boys_and_girls_2021 <- boys_and_girls_2021 %>%
  select(state_of_residence, sex_of_infant_code, births) %>%
  rename(state = state_of_residence,
         sex = sex_of_infant_code) %>%
  drop_na()
glimpse(boys_and_girls_2021)
## Rows: 102
## Columns: 3
## $ state <chr> "Alabama", "Alabama", "Alaska", "Alaska", "Arizona", "Arizona",…
## $ sex <chr> "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F"…
## $ births <dbl> 170911, 179258, 29296, 31102, 235677, 245676, 107765, 112827, 1…
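The row count falls from 121 to 102 because `drop_na()` removes every row that has an `NA` in any of the kept columns. The sketch below illustrates this on a three-row toy tibble; the tibble and the names `toy` and `trimmed` are hypothetical, and the all-`NA` row is only an assumption about what the dropped rows in the real file look like.

```r
library(dplyr)
library(tidyr)

# Toy data: two complete rows and one row with no usable values
toy <- tibble(
  state_of_residence = c("Alabama", "Alabama", NA),
  sex_of_infant_code = c("F", "M", NA),
  births = c(170911, 179258, NA)
)

trimmed <- toy %>%
  select(state_of_residence, sex_of_infant_code, births) %>%
  rename(state = state_of_residence,
         sex = sex_of_infant_code) %>%
  drop_na()  # drops any row containing an NA

print(nrow(toy))
print(nrow(trimmed))
```

Comparing `nrow()` before and after is a quick way to confirm how many rows `drop_na()` removed.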