Fertility 2

Harold Nelson

2/28/2022

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
load("by_state_0320.Rdata")

Births by Year

Make a graph showing the total number of births per year.

Solution

g1 = by_state_0320 %>% 
  group_by(Year) %>% 
  summarise(Total_births = sum(Births)) %>% 
  ggplot(aes(x = Year, y = Total_births)) +
  geom_point() +
  ggtitle("Total Births by Year")

ggplotly(g1)

Women

It is somewhat surprising that the total number of births in the US has declined. Has the number of women been declining over this period? Redo the previous dataframe to include the total number of women, and compute the ratio of births to women. Print this dataframe so we can see all three variables.

Solution

births_by_year = by_state_0320 %>% 
  group_by(Year) %>% 
  summarize(total_births = sum(Births),
            total_women = sum(Fpop)) %>%
  mutate(ratio = total_births/total_women)

births_by_year
## # A tibble: 18 × 4
##     Year total_births total_women  ratio
##    <dbl>        <dbl>       <dbl>  <dbl>
##  1  2003      4069873    61745355 0.0659
##  2  2004      4091267    61826371 0.0662
##  3  2005      4117159    61926703 0.0665
##  4  2006      4243709    62042926 0.0684
##  5  2007      4293868    62142781 0.0691
##  6  2008      4225221    62207012 0.0679
##  7  2009      4108770    62215640 0.0660
##  8  2010      3978056    62212650 0.0639
##  9  2011      3932777    62350967 0.0631
## 10  2012      3932055    62574870 0.0628
## 11  2013      3911686    62765484 0.0623
## 12  2014      3967411    63178997 0.0628
## 13  2015      3957547    63425384 0.0624
## 14  2016      3924784    63429600 0.0619
## 15  2017      3834747    63771779 0.0601
## 16  2018      3771254    63982759 0.0589
## 17  2019      3727127    64134984 0.0581
## 18  2020      3593850    64351519 0.0558
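
The printed table already answers the question. As a quick check, the percentage changes between the first and last rows can be computed directly. The figures below are copied from the 2003 and 2020 rows above; the helper and variable names are only illustrative.

```r
# Values copied from the 2003 and 2020 rows of births_by_year above.
births_2003 = 4069873
births_2020 = 3593850
women_2003  = 61745355
women_2020  = 64351519

# Percentage change from an old value to a new one (illustrative helper).
pct_change = function(old, new) 100 * (new - old) / old

pct_births = pct_change(births_2003, births_2020)
pct_women  = pct_change(women_2003, women_2020)

# Show both changes, rounded to one decimal place.
round(c(births = pct_births, women = pct_women), 1)
```

Births fell by roughly 12 percent while the number of women rose by roughly 4 percent, so the decline in births is not explained by a shrinking female population.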

TFR

The total fertility rate (TFR) is the number of births a woman would have over her lifetime if she experienced the current age-specific rates at every age. The rate data we have is the number of births per woman per year while she is in one of the 5-year age groups. How do we use the rate information to construct the TFR? Do this for the State of Washington, and plot the time series using plotly.

Solution

g2 = by_state_0320 %>% 
  filter(State == "Washington") %>% 
  group_by(Year) %>% 
  summarize(TFR = sum(Rate) * 5) %>% 
  ungroup() %>% 
  ggplot(aes(x = Year, y = TFR)) +
  geom_point()

ggplotly(g2)
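
The factor of 5 in the solution above appears because each rate is births per woman per year, and a woman spends five years in each age group. A minimal sketch with made-up rates makes the arithmetic explicit; the age labels and values below are hypothetical and are not taken from by_state_0320.

```r
library(dplyr)

# Hypothetical age-specific rates: births per woman per year.
toy = data.frame(
  Age  = c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44"),
  Rate = c(0.02, 0.07, 0.10, 0.10, 0.05, 0.01)
)

# A woman spends 5 years in each group, so a group contributes
# 5 * Rate expected births; the TFR is the sum over all groups.
toy_tfr = toy %>% 
  mutate(births_in_group = 5 * Rate) %>% 
  summarize(TFR = sum(births_in_group))

toy_tfr
```

Summing the rates first and multiplying by 5 afterwards, as the solution does, gives the same number.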

All States

Repeat the exercise for all states. Again, use plotly so we will be able to identify states. In the aes(), add “group = State”. Also draw a red horizontal line at 2.1.

Solution

g3 = by_state_0320 %>% 
  group_by(State,Year) %>% 
  summarize(TFR = sum(Rate) * 5) %>% 
  ungroup() %>% 
  ggplot(aes(x = Year, y = TFR, group = State)) +
  geom_point(size = .2) +
  geom_hline(yintercept = 2.1, color = "red")
## `summarise()` has grouped output by 'State'. You can override using the `.groups` argument.
ggplotly(g3)
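
The message printed above notes that the summarized result is still grouped by State, which is why the solution calls ungroup(). As the message suggests, the `.groups` argument of summarize() can do the same job in one step. A small sketch on made-up data follows; the two states and their rates are hypothetical.

```r
library(dplyr)

# Hypothetical data: two states, two years, two age groups each.
toy = data.frame(
  State = rep(c("A", "B"), each = 4),
  Year  = rep(c(2019, 2019, 2020, 2020), times = 2),
  Rate  = c(0.10, 0.20, 0.10, 0.30, 0.20, 0.20, 0.10, 0.10)
)

# .groups = "drop" returns an ungrouped result and silences the message.
tfr_by_state = toy %>% 
  group_by(State, Year) %>% 
  summarize(TFR = sum(Rate) * 5, .groups = "drop")

# Check whether any grouping remains on the result.
is_grouped_df(tfr_by_state)
```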

Whole Country

Create a graph showing the TFR for the US as a whole. We can’t use the state rate data directly, because the states differ in population size. Instead, we need to compute the total number of births and the total number of women for each age group and year. Then compute the TFR by adding the calculated rates and multiplying by 5.

Solution

g4 = by_state_0320 %>% 
  group_by(Year,Age) %>% 
  summarize(Births = sum(Births),
            Fpop = sum(Fpop)) %>% 
  mutate(Rate = Births/Fpop) %>% 
  summarize(TFR = sum(Rate) * 5) %>% 
  ungroup() %>% 
  ggplot(aes(x = Year, y = TFR)) +
  geom_point() +
  geom_hline(yintercept = 2.1, color = "red")
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
ggplotly(g4)
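
To see why the state rates cannot simply be averaged, consider two made-up states of very different size. The pooled rate weights each state by its population, whereas the plain average treats them equally. The numbers below are hypothetical.

```r
library(dplyr)

# Hypothetical single age group in two states of very different size.
toy = data.frame(
  State  = c("Large", "Small"),
  Births = c(60000, 1000),
  Fpop   = c(1000000, 10000)
) %>% 
  mutate(Rate = Births / Fpop)

# Compare the unweighted mean of the state rates with the pooled rate
# obtained from total births and total women.
comparison = toy %>% 
  summarize(mean_of_rates = mean(Rate),
            pooled_rate   = sum(Births) / sum(Fpop))

comparison
```

The pooled rate stays close to the large state's rate, which is what a national figure should do; the unweighted mean is pulled toward the small state.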