Fertility 1

Harold Nelson

02/27/2022

Setup

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

Birth Data by State and Year

We’re going to get birth data by state from 2003 through 2020.

The following video will show you how to get the basic data from CDC Wonder. Point your browser to https://wonder.cdc.gov/. Then follow along with the video.

Task 1

Video Link: https://www.youtube.com/watch?v=Oiw7bm4GjvQ

Download the data for 2003-2006 from CDC Wonder, following the directions in the video, to obtain by_state_year_0306. Import the data into your R environment using the “Import Dataset” control.

Copy the code created by the control into the chunk below. Run glimpse() on the dataframe to verify that the process worked.

Solution

# Place your code here.
by_state_year_0306 <- read_delim("~/Downloads/Natality, 2003-2006.txt",  delim = "\t", escape_double = FALSE, 
                                 trim_ws = TRUE)

## Rows: 1672 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (7): Notes, State, State Code, Age of Mother 9, Age of Mother 9 Code, Fe...
## dbl (3): Year, Year Code, Births

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(by_state_year_0306)

## Rows: 1,672
## Columns: 10

## Warning: One or more parsing issues, see `problems()` for details

## $ Notes                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ State                  <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Al…
## $ `State Code`           <chr> "01", "01", "01", "01", "01", "01", "01", "01",…
## $ `Age of Mother 9`      <chr> "Under 15 years", "Under 15 years", "Under 15 y…
## $ `Age of Mother 9 Code` <chr> "15", "15", "15", "15", "15-19", "15-19", "15-1…
## $ Year                   <dbl> 2003, 2004, 2005, 2006, 2003, 2004, 2005, 2006,…
## $ `Year Code`            <dbl> 2003, 2004, 2005, 2006, 2003, 2004, 2005, 2006,…
## $ Births                 <dbl> 172, 162, 150, 163, 8095, 8126, 7771, 8537, 186…
## $ `Female Population`    <chr> "Not Available", "Not Available", "Not Availabl…
## $ `Fertility Rate`       <chr> "Not Available", "Not Available", "Not Availabl…
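The glimpse() output above includes a warning about parsing issues. The sketch below shows one way to inspect such issues with problems(). It assumes the cause is footer text at the end of the export, which has fewer columns than the data rows; that cause is an assumption and has not been confirmed for the actual download. The file contents and the name demo are made up for illustration.

```r
library(readr)

# Hypothetical miniature of a tab-delimited export:
# two data rows followed by footer lines that have only one field.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Notes\tState\tYear\tBirths",
  "\tAlabama\t2003\t172",
  "\tAlabama\t2004\t162",
  "\"---\"",
  "\"Dataset: hypothetical footer text\""
), tmp)

demo <- read_delim(tmp, delim = "\t", show_col_types = FALSE)

# List the rows and columns that readr could not parse as expected
problems(demo)

# Rows that came from the footer have no State value
demo[is.na(demo$State), ]
```

If the footer is indeed the cause, those rows have missing values in the data columns, so dropping rows with missing data later on removes them.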

Task 2

Repeat the process above for the data from 2007-2020 to obtain by_state_year_0720. The video refers to 2007-2018; use 2007-2020 instead.

by_state_year_0720 <- read_delim("~/Downloads/Natality, 2007-2020.txt", delim = "\t", escape_double = FALSE, 
                                 trim_ws = TRUE)

## Rows: 5826 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (7): Notes, State, State Code, Age of Mother 9, Age of Mother 9 Code, Fe...
## dbl (3): Year, Year Code, Births

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Task 3

Combine the two data frames into by_state_0320 using rbind().

Solution

by_state_0320 = rbind(by_state_year_0306,by_state_year_0720)

## Warning: One or more parsing issues, see `problems()` for details
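A quick sanity check after stacking data frames with rbind() is to confirm that the row counts add up and that the column names are unchanged. The sketch below uses two small made-up data frames (a and b are hypothetical names, and the values in b are invented), not the CDC data.

```r
# Two toy data frames with identical column names, as rbind() requires
a <- data.frame(State = c("Alabama", "Alaska"),
                Year = c(2003, 2003),
                Births = c(172, 10))
b <- data.frame(State = c("Alabama", "Alaska"),
                Year = c(2007, 2007),
                Births = c(150, 12))

combined <- rbind(a, b)

# The combined frame should have every row from both inputs
nrow(combined) == nrow(a) + nrow(b)

# The columns should be the same as in the inputs
identical(names(combined), names(a))
```

Applied to the real data, the same check would compare nrow(by_state_0320) with the sum of the row counts reported when the two files were read.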

Task 4

Edit the dataframe using dplyr. It should have the following variables.

State
Year
Age (contents of Age of Mother 9 Code)
Fpop (Female Population, renamed and made numeric using as.numeric())
Births (made numeric using as.numeric())
Rate (Fertility Rate, renamed and made numeric using as.numeric(); divide by 1000 to get births per woman rather than per 1,000 women)

Also eliminate the District of Columbia and drop rows with missing data.

Use summary() to check your work.

by_state_0320 = by_state_0320 %>% 
  filter(State != "District of Columbia") %>% 
  select("State","Year",Age = "Age of Mother 9 Code", Fpop = "Female Population", "Births", Rate = "Fertility Rate") %>%
  mutate(Fpop = as.numeric(Fpop),
         Births = as.numeric(Births),
         Rate = as.numeric(Rate)/1000) %>% 
  drop_na()

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion

## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
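The coercion warnings above are expected: as.numeric() cannot convert text such as "Not Available", so it returns NA, and the rows containing NA are then dropped. The base R sketch below illustrates this on made-up values (the numbers are invented, not CDC figures) and uses complete.cases() as a stand-in for drop_na().

```r
# Made-up values mimicking the character columns in the export
raw <- data.frame(
  Fpop = c("Not Available", "160000"),
  Rate = c("Not Available", "50.59"),
  stringsAsFactors = FALSE
)

# as.numeric() turns non-numeric text into NA and warns about it;
# the warning is suppressed here only to keep the output short.
fpop <- suppressWarnings(as.numeric(raw$Fpop))

# Dividing by 1000 converts a rate per 1,000 women to a rate per woman
rate <- suppressWarnings(as.numeric(raw$Rate)) / 1000

cleaned <- data.frame(Fpop = fpop, Rate = rate)

# Keep only rows with no missing values (base R analogue of drop_na())
cleaned <- cleaned[complete.cases(cleaned), ]
print(cleaned)
```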

Task 5

Let’s look at the time series of Rate for the state of Washington. Map color to Age. Use plotly.

Solution

g = by_state_0320 %>% 
  filter(State == "Washington") %>% 
  ggplot(aes(x = Year, y = Rate, color = Age)) +
  geom_point()
ggplotly(g)
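As an optional variation, not part of the task as stated, connecting the points within each age group can make a time series easier to follow. The sketch below uses a small made-up data frame (toy and g_line are hypothetical names, and the rates are invented) so that it runs without the CDC download.

```r
library(ggplot2)
library(plotly)

# Made-up rates for illustration only; these are not CDC values
toy <- data.frame(
  Year = rep(2003:2006, times = 2),
  Age  = rep(c("20-24", "25-29"), each = 4),
  Rate = c(0.100, 0.098, 0.097, 0.095, 0.115, 0.114, 0.113, 0.112)
)

# group = Age draws one line per age group
g_line <- ggplot(toy, aes(x = Year, y = Rate, color = Age, group = Age)) +
  geom_line() +
  geom_point()

ggplotly(g_line)
```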

Task 6

Compare the states Connecticut, Washington, and Utah for birth rates in the 25-29 group. Map color to State and use plotly.

Solution

g1 = by_state_0320 %>% 
  filter(State %in% c("Connecticut", "Washington","Utah") &
           Age == "25-29") %>% 
  ggplot(aes(x = Year, y = Rate, color = State)) +
  geom_point() 

ggplotly(g1)