PCTBoys

Harold Nelson

2025-06-24

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
CDCBirths <- read_delim("Provisional Natality, 2023 through Last Month.txt", delim = "\t", escape_double = FALSE, trim_ws = TRUE)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 176 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): Notes, State of Residence, State of Residence Code, Sex of Infant, ...
## dbl (1): Births
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(CDCBirths)
## Rows: 176
## Columns: 6
## $ Notes                     <chr> NA, NA, "Total", NA, NA, "Total", NA, NA, "T…
## $ `State of Residence`      <chr> "Alabama", "Alabama", "Alabama", "Alaska", "…
## $ `State of Residence Code` <chr> "01", "01", "01", "02", "02", "02", "04", "0…
## $ `Sex of Infant`           <chr> "Female", "Male", NA, "Female", "Male", NA, …
## $ `Sex of Infant Code`      <chr> "F", "M", NA, "F", "M", NA, "F", "M", NA, "F…
## $ Births                    <dbl> 28223, 29635, 57858, 4320, 4695, 9015, 38208…
# View(CDCBirths)
str(CDCBirths)
## spc_tbl_ [176 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Notes                  : chr [1:176] NA NA "Total" NA ...
##  $ State of Residence     : chr [1:176] "Alabama" "Alabama" "Alabama" "Alaska" ...
##  $ State of Residence Code: chr [1:176] "01" "01" "01" "02" ...
##  $ Sex of Infant          : chr [1:176] "Female" "Male" NA "Female" ...
##  $ Sex of Infant Code     : chr [1:176] "F" "M" NA "F" ...
##  $ Births                 : num [1:176] 28223 29635 57858 4320 4695 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Notes = col_character(),
##   ..   `State of Residence` = col_character(),
##   ..   `State of Residence Code` = col_character(),
##   ..   `Sex of Infant` = col_character(),
##   ..   `Sex of Infant Code` = col_character(),
##   ..   Births = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
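The parsing warning above points to `problems()` for details. A minimal sketch of how that inspection works, using a hypothetical three-line stand-in for the export rather than the real file; the short footer-style last line is an assumption about the kind of row that can trigger this warning, not a statement about what the CDC file actually contains:

```r
library(readr)

# Hypothetical stand-in for the export (assumption, not the real file):
# two well-formed rows followed by a line that has only one field.
demo_text <- "Notes\tState\tBirths\nNA\tAlabama\t28223\nNA\tAlabama\t29635\n---\n"

demo <- read_delim(I(demo_text), delim = "\t", trim_ws = TRUE,
                   show_col_types = FALSE)

# problems() returns a tibble describing the rows readr could not parse cleanly
probs <- problems(demo)
probs
```

Running `problems(CDCBirths)` on the real import would show which rows of the CDC file triggered the warning.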

We need to remove the rows with no sex code, drop some variables, and make the variable names usable. Create a cleaned version of the dataset; the solution below stores it back in CDCBirths, the name the later steps use. It should have the following variables:

  1. State
  2. Sex
  3. Births

Solution

CDCBirths = CDCBirths %>% 
  rename(State = `State of Residence`,
         Sex = `Sex of Infant Code`) %>% 
  filter(!is.na(Sex)) %>% 
  select(State, Sex, Births)

CDCBirths
## # A tibble: 102 × 3
##    State      Sex   Births
##    <chr>      <chr>  <dbl>
##  1 Alabama    F      28223
##  2 Alabama    M      29635
##  3 Alaska     F       4320
##  4 Alaska     M       4695
##  5 Arizona    F      38208
##  6 Arizona    M      39888
##  7 Arkansas   F      17187
##  8 Arkansas   M      18077
##  9 California F     195243
## 10 California M     204865
## # ℹ 92 more rows
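Before reshaping, it can help to confirm that every state has exactly one F row and one M row, since a missing or duplicated row would produce NA or list-columns after widening. A small sketch of such a check, run here on a stand-in tibble built from the first four rows printed above (the names `births_long`, `check`, and `bad_states` are illustrative, not part of the original exercise):

```r
library(dplyr)

# Stand-in data: the first four rows of the cleaned table shown above
births_long <- tibble::tibble(
  State  = c("Alabama", "Alabama", "Alaska", "Alaska"),
  Sex    = c("F", "M", "F", "M"),
  Births = c(28223, 29635, 4320, 4695)
)

# Count rows and distinct sex codes per state
check <- births_long %>%
  group_by(State) %>%
  summarise(n_rows = n(), n_sexes = n_distinct(Sex), .groups = "drop")

# States that do not have exactly two rows with two distinct codes
bad_states <- check %>%
  filter(n_rows != 2 | n_sexes != 2)

bad_states
```

An empty `bad_states` means the table is safe to pivot. The same pipeline should work on the full CDCBirths tibble.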

Wider

I asked DeepSeek the following question.

You are an expert in R programming and especially the tidyverse. I have a dataframe CDCBirths with the following variables: State (name), Sex(M or F) and Births ( A number). I want a dataframe with State, Male_Births, and Female_Births. Write R code to do this.

The Code from DeepSeek

wide_births <- CDCBirths %>%
  pivot_wider(
    names_from = Sex,         # Column to get new column names from
    values_from = Births,     # Column to get values from
    names_prefix = "Births_"  # Optional prefix for new column names
  ) %>%
  rename(
    Male_Births = Births_M,   # Rename for your requested output
    Female_Births = Births_F
  )

wide_births
## # A tibble: 51 × 3
##    State                Female_Births Male_Births
##    <chr>                        <dbl>       <dbl>
##  1 Alabama                      28223       29635
##  2 Alaska                        4320        4695
##  3 Arizona                      38208       39888
##  4 Arkansas                     17187       18077
##  5 California                  195243      204865
##  6 Colorado                     30120       31374
##  7 Connecticut                  16968       17591
##  8 Delaware                      5101        5326
##  9 District of Columbia          3868        4028
## 10 Florida                     108051      113359
## # ℹ 41 more rows
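The prefix-then-rename approach above works. As an alternative sketch, the `names_glue` argument of `pivot_wider()` can build the final column names in one step, provided the sex codes are spelled out first. The stand-in tibble and the name `wide_alt` are illustrative only:

```r
library(dplyr)
library(tidyr)

# Stand-in data: the first four rows of the cleaned table
births_long <- tibble::tibble(
  State  = c("Alabama", "Alabama", "Alaska", "Alaska"),
  Sex    = c("F", "M", "F", "M"),
  Births = c(28223, 29635, 4320, 4695)
)

wide_alt <- births_long %>%
  # Spell out the codes so they can be used directly in column names
  mutate(Sex = if_else(Sex == "M", "Male", "Female")) %>%
  pivot_wider(
    names_from  = Sex,
    values_from = Births,
    names_glue  = "{Sex}_Births"  # yields Female_Births and Male_Births
  )

wide_alt
```

Either version gives the same three columns; this one simply avoids the separate `rename()` call.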

Calculate Percent of Boys

Solution

wide_births = wide_births %>% 
  mutate(pctboys = Male_Births/(Male_Births + Female_Births))

summary(wide_births$pctboys)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.5020  0.5101  0.5117  0.5120  0.5133  0.5257
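Note that pctboys as computed above is a proportion between 0 and 1, not a percentage. A short sketch of converting it to a percent, ranking states, and computing the pooled share across states, again on a two-state stand-in (the names `wide_demo`, `pctboys_100`, and `overall` are illustrative):

```r
library(dplyr)

# Stand-in data: the first two rows of the wide table shown earlier
wide_demo <- tibble::tibble(
  State         = c("Alabama", "Alaska"),
  Female_Births = c(28223, 4320),
  Male_Births   = c(29635, 4695)
)

wide_demo <- wide_demo %>%
  mutate(
    pctboys     = Male_Births / (Male_Births + Female_Births),  # proportion
    pctboys_100 = round(100 * pctboys, 1)                       # percent
  ) %>%
  arrange(desc(pctboys))  # highest share of boys first

# Pooled share: total male births over total births. This weights states by
# size, so it generally differs from the unweighted mean of pctboys.
overall <- sum(wide_demo$Male_Births) /
  sum(wide_demo$Male_Births + wide_demo$Female_Births)

wide_demo
overall
```

The Mean reported by `summary()` treats every state equally, whereas `overall` gives larger states more weight.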