Harold Nelson
2025-06-24
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
CDCBirths <- read_delim("Provisional Natality, 2023 through Last Month.txt",
                        delim = "\t", escape_double = FALSE, trim_ws = TRUE)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 176 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): Notes, State of Residence, State of Residence Code, Sex of Infant, ...
## dbl (1): Births
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(CDCBirths)
## Rows: 176
## Columns: 6
## $ Notes <chr> NA, NA, "Total", NA, NA, "Total", NA, NA, "T…
## $ `State of Residence` <chr> "Alabama", "Alabama", "Alabama", "Alaska", "…
## $ `State of Residence Code` <chr> "01", "01", "01", "02", "02", "02", "04", "0…
## $ `Sex of Infant` <chr> "Female", "Male", NA, "Female", "Male", NA, …
## $ `Sex of Infant Code` <chr> "F", "M", NA, "F", "M", NA, "F", "M", NA, "F…
## $ Births <dbl> 28223, 29635, 57858, 4320, 4695, 9015, 38208…
str(CDCBirths)
## spc_tbl_ [176 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Notes : chr [1:176] NA NA "Total" NA ...
## $ State of Residence : chr [1:176] "Alabama" "Alabama" "Alabama" "Alaska" ...
## $ State of Residence Code: chr [1:176] "01" "01" "01" "02" ...
## $ Sex of Infant : chr [1:176] "Female" "Male" NA "Female" ...
## $ Sex of Infant Code : chr [1:176] "F" "M" NA "F" ...
## $ Births : num [1:176] 28223 29635 57858 4320 4695 ...
## - attr(*, "spec")=
## .. cols(
## .. Notes = col_character(),
## .. `State of Residence` = col_character(),
## .. `State of Residence Code` = col_character(),
## .. `Sex of Infant` = col_character(),
## .. `Sex of Infant Code` = col_character(),
## .. Births = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
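The parsing warning above is worth following up before cleaning the data. The sketch below uses a small made-up tab-delimited file, not the CDC export, to show how `problems()` lists the values readr could not parse as requested. The file contents, the `toy` and `probs` names, and the explicit `col_types` are assumptions for illustration; the actual cause of the warning in the CDC file is not shown in this document.

```r
library(readr)

# Hypothetical stand-in for the CDC export: the last line has text where
# a number is expected, so readr cannot parse that value as a double.
tmp <- tempfile(fileext = ".txt")
writeLines(c("State\tSex\tBirths",
             "Alabama\tF\t28223",
             "Alabama\tM\t29635",
             "Alabama\tF\tnot_a_number"), tmp)

# Declaring the column types up front quiets the column-specification
# message and makes type mismatches show up as parsing problems.
toy <- suppressWarnings(
  read_delim(tmp, delim = "\t",
             col_types = cols(State  = col_character(),
                              Sex    = col_character(),
                              Births = col_double()))
)

# One row per parsing issue: row, column, expected and actual values.
probs <- problems(toy)
probs
```

Running `problems(CDCBirths)` on the real data in the same way should show which lines of the export triggered the warning.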
We need to remove the rows with no gender code, drop some variables, and make the variable names usable. The task is to create a cleaned dataset with the following variables: State, Sex, and Births. The solution below stores the result back in CDCBirths.
CDCBirths <- CDCBirths %>%
  rename(State = `State of Residence`,
         Sex = `Sex of Infant Code`) %>%
  filter(!is.na(Sex)) %>%
  select(State, Sex, Births)
CDCBirths
## # A tibble: 102 × 3
## State Sex Births
## <chr> <chr> <dbl>
## 1 Alabama F 28223
## 2 Alabama M 29635
## 3 Alaska F 4320
## 4 Alaska M 4695
## 5 Arizona F 38208
## 6 Arizona M 39888
## 7 Arkansas F 17187
## 8 Arkansas M 18077
## 9 California F 195243
## 10 California M 204865
## # ℹ 92 more rows
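The filter drops, among others, the per-state "Total" rows. A quick check that each total equals the sum of the female and male counts can confirm that nothing is lost by dropping them. The sketch below rebuilds the first six raw rows from the `glimpse()` output by hand; the `raw`, `by_sex`, `totals`, and `check` names, and the shortened column names, are assumptions for illustration.

```r
library(dplyr)
library(tibble)

# First six rows of the raw data, copied from the glimpse() output,
# with the columns already given short names.
raw <- tibble(
  Notes  = c(NA, NA, "Total", NA, NA, "Total"),
  State  = c("Alabama", "Alabama", "Alabama", "Alaska", "Alaska", "Alaska"),
  Sex    = c("F", "M", NA, "F", "M", NA),
  Births = c(28223, 29635, 57858, 4320, 4695, 9015)
)

# Sum of the female and male rows for each state.
by_sex <- raw %>%
  filter(!is.na(Sex)) %>%
  group_by(State) %>%
  summarise(Sum_By_Sex = sum(Births), .groups = "drop")

# The reported totals, one row per state.
totals <- raw %>%
  filter(Notes == "Total") %>%
  select(State, Total = Births)

# Compare the two side by side.
check <- inner_join(by_sex, totals, by = "State") %>%
  mutate(Matches = Sum_By_Sex == Total)
check
```

For these two states the sums agree with the reported totals (28223 + 29635 = 57858 and 4320 + 4695 = 9015).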
I asked DeepSeek the following question.
You are an expert in R programming and especially the tidyverse. I have a dataframe CDCBirths with the following variables: State (name), Sex(M or F) and Births ( A number). I want a dataframe with State, Male_Births, and Female_Births. Write R code to do this.
wide_births <- CDCBirths %>%
  pivot_wider(
    names_from = Sex,          # Column to get new column names from
    values_from = Births,      # Column to get values from
    names_prefix = "Births_"   # Optional prefix for new column names
  ) %>%
  rename(
    Male_Births = Births_M,    # Rename for your requested output
    Female_Births = Births_F
  )
wide_births
## # A tibble: 51 × 3
## State Female_Births Male_Births
## <chr> <dbl> <dbl>
## 1 Alabama 28223 29635
## 2 Alaska 4320 4695
## 3 Arizona 38208 39888
## 4 Arkansas 17187 18077
## 5 California 195243 204865
## 6 Colorado 30120 31374
## 7 Connecticut 16968 17591
## 8 Delaware 5101 5326
## 9 District of Columbia 3868 4028
## 10 Florida 108051 113359
## # ℹ 41 more rows
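DeepSeek's answer builds the column names in two steps: a prefix, then a rename. As an alternative sketch, not part of DeepSeek's answer, the `names_glue` argument of `pivot_wider()` can produce the requested names directly once the sex codes are spelled out. The `births_long` and `wide_alt` names are assumptions, and the data are the first four rows of the cleaned table above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# First four rows of the cleaned CDCBirths table shown earlier.
births_long <- tibble(
  State  = c("Alabama", "Alabama", "Alaska", "Alaska"),
  Sex    = c("F", "M", "F", "M"),
  Births = c(28223, 29635, 4320, 4695)
)

wide_alt <- births_long %>%
  # Spell out the codes so they can be used in the column names.
  mutate(Sex = if_else(Sex == "M", "Male", "Female")) %>%
  pivot_wider(
    names_from  = Sex,
    values_from = Births,
    names_glue  = "{Sex}_Births"   # builds Female_Births and Male_Births
  )
wide_alt
```

This avoids the separate `rename()` call at the cost of a `mutate()`; which reads better is a matter of taste.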