Harold Nelson
2/28/2022
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Make a graph showing the total number of births per year.
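One possible approach is sketched below on a small made-up stand-in for by_state_0320. The toy values and the names toy_births, toy_by_year, and g1 are assumptions for illustration; only the column names Year and Births come from the code later in this document.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for by_state_0320; the numbers are made up.
toy_births = data.frame(
  State  = c("A", "A", "B", "B"),
  Year   = c(2003, 2004, 2003, 2004),
  Births = c(100, 110, 200, 180)
)

# Add up births across all rows (states, age groups) within each year.
toy_by_year = toy_births %>%
  group_by(Year) %>%
  summarize(total_births = sum(Births))

# Plot the yearly totals as a connected series.
g1 = ggplot(toy_by_year, aes(x = Year, y = total_births)) +
  geom_line() +
  geom_point()
g1
```

Replacing toy_births with by_state_0320 should give the national series, assuming that dataframe has one row per state, year, and age group.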
It is somewhat surprising that the total number of births in the US has declined. Has the number of women also been declining during this period? Redo the previous dataframe to include the total number of women, and compute the ratio of births to women. Print this dataframe so we can see all three variables.
births_by_year = by_state_0320 %>%
  group_by(Year) %>%
  summarize(total_births = sum(Births),
            total_women = sum(Fpop)) %>%
  mutate(ratio = total_births / total_women)
births_by_year
## # A tibble: 18 × 4
## Year total_births total_women ratio
## <dbl> <dbl> <dbl> <dbl>
## 1 2003 4069873 61745355 0.0659
## 2 2004 4091267 61826371 0.0662
## 3 2005 4117159 61926703 0.0665
## 4 2006 4243709 62042926 0.0684
## 5 2007 4293868 62142781 0.0691
## 6 2008 4225221 62207012 0.0679
## 7 2009 4108770 62215640 0.0660
## 8 2010 3978056 62212650 0.0639
## 9 2011 3932777 62350967 0.0631
## 10 2012 3932055 62574870 0.0628
## 11 2013 3911686 62765484 0.0623
## 12 2014 3967411 63178997 0.0628
## 13 2015 3957547 63425384 0.0624
## 14 2016 3924784 63429600 0.0619
## 15 2017 3834747 63771779 0.0601
## 16 2018 3771254 63982759 0.0589
## 17 2019 3727127 64134984 0.0581
## 18 2020 3593850 64351519 0.0558
The total fertility rate (TFR) is the number of births a woman would have over her lifetime if she experienced the current age-specific rates at every age. The rate data we have is the number of births per woman per year while she is in one of the 5-year age groups. How do we use the rate information to construct the TFR? Do this for the State of Washington. Plot the time series using plotly.
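Because each rate is births per woman per year and a woman spends five years in each age group, she can expect 5 × Rate births while in that group. Summing over the age groups gives the TFR. A minimal sketch on made-up data follows. The names toy_wa, toy_wa_tfr, and g2, the age labels, and the rates are assumptions for illustration; the column names State, Year, and Rate come from the code later in this document.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the Washington rows of by_state_0320.
# Only three age groups are shown, and the rates are made up.
toy_wa = data.frame(
  State = "Washington",
  Year  = rep(c(2003, 2004), each = 3),
  Age   = rep(c("20-24", "25-29", "30-34"), times = 2),
  Rate  = c(0.10, 0.12, 0.09, 0.09, 0.11, 0.10)
)

# TFR for each year: sum the age-specific rates, then multiply by the
# five years a woman spends in each age group.
toy_wa_tfr = toy_wa %>%
  filter(State == "Washington") %>%
  group_by(Year) %>%
  summarize(TFR = sum(Rate) * 5)

g2 = ggplot(toy_wa_tfr, aes(x = Year, y = TFR)) +
  geom_line() +
  geom_point()

# In an interactive session with plotly installed, plotly::ggplotly(g2)
# converts this into a plot with hover labels.
```

With the real data, the same pipeline would start from by_state_0320 in place of toy_wa.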
Repeat the exercise for all states. Again, use plotly so that we can identify individual states. In the aes(), add "group = State". Also draw a red horizontal line at 2.1, the approximate replacement-level TFR.
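g3 = by_state_0320 %>%
  group_by(State, Year) %>%
  # Sum the age-specific rates and multiply by the 5-year group width
  summarize(TFR = sum(Rate) * 5) %>%
  ungroup() %>%
  ggplot(aes(x = Year, y = TFR, group = State)) +
  geom_point(size = 0.2) +
  geom_hline(yintercept = 2.1, color = "red")
# Convert to an interactive plotly graph, as the exercise requests
ggplotly(g3)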
## `summarise()` has grouped output by 'State'. You can override using the `.groups` argument.
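Create a graph showing the TFR for the US as a whole. We can’t use the state-level rate data directly, because states differ in population and their rates cannot simply be added or averaged. Instead, compute the total number of births and the total number of women for each age group and year, and divide to get national age-specific rates. Then compute the TFR by adding these calculated rates and multiplying by 5.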
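g4 = by_state_0320 %>%
  group_by(Year, Age) %>%
  # Pool births and women across states within each year and age group
  summarize(Births = sum(Births),
            Fpop = sum(Fpop)) %>%
  mutate(Rate = Births / Fpop) %>%
  # The result is still grouped by Year, so this gives one TFR per year
  summarize(TFR = sum(Rate) * 5) %>%
  ungroup() %>%
  ggplot(aes(x = Year, y = TFR)) +
  geom_point() +
  geom_hline(yintercept = 2.1, color = "red")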
## `summarise()` has grouped output by 'Year'. You can override using the `.groups` argument.
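The reason for pooling births and women before dividing can be seen in a small made-up example with two states of very different size. The state names and all numbers below are hypothetical; the column names Births, Fpop, and Rate follow the code above.

```r
library(dplyr)

# Two made-up states, one age group, one year.
toy_states = data.frame(
  State  = c("Big", "Small"),
  Births = c(9000, 200),
  Fpop   = c(100000, 1000)
) %>%
  mutate(Rate = Births / Fpop)

# Unweighted mean of the state rates: treats both states as equal in size.
mean_of_rates = mean(toy_states$Rate)

# Pooled rate: total births over total women.
pooled_rate = sum(toy_states$Births) / sum(toy_states$Fpop)

mean_of_rates
pooled_rate
```

The small state's high rate pulls the unweighted mean well above the pooled rate, which is why the national rates are rebuilt from the Births and Fpop totals.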