This assignment builds on what you did in the previous assignment. It uses the dataframe you saved at the end.

It also requires you to submit your work by posting a document on RPubs. This will allow you to create interactive graphs.

Setup

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.3
## Warning: package 'readr' was built under R version 4.4.3
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.3
## Warning: package 'forcats' was built under R version 4.4.3
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## âś” dplyr     1.1.4     âś” readr     2.1.5
## âś” forcats   1.0.0     âś” stringr   1.5.1
## âś” ggplot2   3.5.1     âś” tibble    3.2.1
## âś” lubridate 1.9.4     âś” tidyr     1.3.1
## âś” purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## âś– dplyr::filter() masks stats::filter()
## âś– dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
## Warning: package 'plotly' was built under R version 4.4.3
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
load("by_state_00323.RData")

by_state_00323 <- by_state_00323_new


by_state_00323$Fpop <- as.numeric(gsub(",","", by_state_00323$Fpop))
by_state_00323$Rate <- as.numeric(gsub(",","", by_state_00323$Rate))
by_state_00323$Births <- as.numeric(gsub(",","", by_state_00323$Births))

Task 1

Make a graph showing the total number of births per year. Use plotly.

g = by_state_00323 %>%
  group_by(Year) %>%
  summarize(Total_births = sum(Births)) %>%
  ggplot(aes(x=Year, y= Total_births)) + geom_point() + ggtitle("Total Births per Year")
  
ggplotly(g)

Task 2

It is somewhat surprising that the total number of births in the US has declined. Has the number of women been declining in this time period?

Redo the previous dataframe and include the total number of women. Also compute the ratio of births to women. Print the head and tail of this dataframe so we can see all three variables.

# Place your code here
colnames(by_state_00323)
##  [1] "Notes"           "State"           "State Code"      "Age of Mother 9"
##  [5] "Age"             "Year"            "Year Code"       "Births"         
##  [9] "Fpop"            "Rate"
births_by_year = by_state_00323 %>%
  group_by(Year) %>%
  summarize(total_births = sum(Births), total_women = sum(Fpop, na.rm = TRUE)) %>%
  mutate(ratio = total_births/total_women)

head(births_by_year)
## # A tibble: 6 Ă— 4
##    Year total_births total_women  ratio
##   <dbl>        <dbl>       <dbl>  <dbl>
## 1  2003      4082145    61745355 0.0661
## 2  2004      4103975    61826371 0.0664
## 3  2005      4130210    61926703 0.0667
## 4  2006      4256849    62042926 0.0686
## 5  2007      4307197    62142781 0.0693
## 6  2008      4238410    62207012 0.0681
tail(births_by_year)
## # A tibble: 6 Ă— 4
##    Year total_births total_women    ratio
##   <dbl>        <dbl>       <dbl>    <dbl>
## 1  2018      3782308    63982759   0.0591
## 2  2019      3738303    64134984   0.0583
## 3  2020      3604602    64351519   0.0560
## 4  2021      3655480           0 Inf     
## 5  2022      3659519           0 Inf     
## 6  2023      3587955           0 Inf

Task 3

The total fertility rate (TFR) is the number of births for a woman during her lifetime. The rate data we have is the number of births per woman per year while she is in one of the 5-year age groups.

How do we use the rate information to construct the TFR? We add up all of the rates for the individual age groups and multiply the sum by 5. Do this for the State of Washington. Plot the time series using geom_point() and plotly.

g = by_state_00323 %>%
  filter(State == "Washington") %>%
  group_by(Year) %>%
  summarize(TFR = sum(Rate, na.rm = TRUE) * 5) %>%
  ungroup %>%
  ggplot(aes(x=Year, y=TFR)) + geom_point()
  
ggplotly(g)

Task 4

Repeat the exercise for all states. Again, use plotly so we will be able to identify states. In the aes(), add “group = State”. Also draw a red horizontal line at 2.1. Use the plotly framework I have set up.

Using the interactive graph, which states have the highest and lowest TFR values in 2003 and 2023?

g = by_state_00323 %>%
  group_by(State,Year) %>%
  summarize(TFR = sum(Rate, na.rm = TRUE) * 5) %>%
  ungroup() %>%
  ggplot(aes(x=Year, y=TFR, group=State)) + geom_point() +geom_hline(aes(yintercept =2.1), color = "red")
## `summarise()` has grouped output by 'State'. You can override using the
## `.groups` argument.
ggplotly(g)

Task 5

Create a graph showing the TFR for the US as a whole. We can’t use the rate data directly. We need to compute the total numbers of births and total numbers of women for each age group and year. Then compute the TFR by adding the calculated rates and multiplying by 5. Plot this using plotly.

g = by_state_00323 %>%
  group_by(Year, `Age of Mother 9`) %>%
  summarize(Births=sum(Births, na.rm = TRUE), Fpop = sum(Fpop, na.rm = TRUE)) %>%
  filter(Fpop>0) %>%
  mutate(Rate = Births/Fpop) %>%
  group_by(Year) %>%
  summarize(TFR = sum(Rate, na.rm = TRUE) *5) %>%
  ungroup() %>%
  ggplot(aes(x=Year, y=TFR)) + geom_point()
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
ggplotly(g)