This assignment builds on what you did in the previous assignment. It uses the dataframe you saved at the end.

It also requires you to submit your work by posting a document on RPubs. This will allow you to create interactive graphs.

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
load("by_state_0323.Rdata")

Task 1

Make a graph showing the total number of births per year. Use plotly.

g<- by_state_0323 %>%
  group_by(Year) %>%
  summarize(TotalBirths = sum(Births)) %>%
  ggplot(aes(x = Year, y = TotalBirths)) +
  geom_line() +
  geom_point() +
  labs(title = "Total Number of Births per Year",
       x = "Year",
       y = "Total Births") +
  theme_minimal()

ggplotly(g)

Task 2

It is somewhat surprising that the total number of births in the US has declined. Has the number of women been declining in this time period?

Redo the previous dataframe and include the total number of women. Also compute the ratio of births to women. Print the head and tail of this dataframe so we can see all three variables.

# Place your code here

yearly_summary <- by_state_0323 %>%
  group_by(Year) %>%
  summarize(
    TotalBirths = sum(Births),
    TotalWomen = sum(Fpop),
    BirthsPerWoman = TotalBirths / TotalWomen
  )

# Print the head and tail of the dataframe
print("Head of the dataframe:")
## [1] "Head of the dataframe:"
print(head(yearly_summary))
## # A tibble: 6 × 4
##    Year TotalBirths TotalWomen BirthsPerWoman
##   <dbl>       <dbl>      <dbl>          <dbl>
## 1  2003     4069873   61745355         0.0659
## 2  2004     4091267   61826371         0.0662
## 3  2005     4117159   61926703         0.0665
## 4  2006     4243709   62042926         0.0684
## 5  2007     4293868   62142781         0.0691
## 6  2008     4225221   62207012         0.0679
print("\nTail of the dataframe:")
## [1] "\nTail of the dataframe:"
print(tail(yearly_summary))
## # A tibble: 6 × 4
##    Year TotalBirths TotalWomen BirthsPerWoman
##   <dbl>       <dbl>      <dbl>          <dbl>
## 1  2015     3957547   63425384         0.0624
## 2  2016     3924784   63429600         0.0619
## 3  2017     3834747   63771779         0.0601
## 4  2018     3771254   63982759         0.0589
## 5  2019     3727127   64134984         0.0581
## 6  2020     3593850   64351519         0.0558

Task 3

The total fertility rate (TFR) is the number of births for a woman during her lifetime. The rate data we have is the number of births per woman per year while she is in one of the 5-year age groups.

How do we use the rate information to construct the TFR? We add up all of the rates for the individual age groups and multiply the sum by 5. Do this for the State of Washington. Plot the time series using geom_point() and plotly.

# Calculate TFR for Washington
washington_tfr <- by_state_0323 %>%
  filter(State == "Washington") %>%
  group_by(Year) %>%
  summarize(TFR = sum(Rate) * 5)

# Create the plot
g <- ggplot(washington_tfr, aes(x = Year, y = TFR)) +
  geom_point() +
  labs(title = "Total Fertility Rate in Washington State Over Time",
       x = "Year",
       y = "Total Fertility Rate") +
  theme_minimal()

# Convert to interactive plot
ggplotly(g)

Task 4

Repeat the exercise for all states. Again, use plotly so we will be able to identify states. In the aes(), add “group = State”. Also draw a red horizontal line at 2.1. Use the plotly framework I have set up.

Using the interactive graph, which states have the highest and lowest TFR values in 2003 and 2023?

# Calculate TFR for all states
all_states_tfr <- by_state_0323 %>%
  group_by(State, Year) %>%
  summarize(TFR = sum(Rate) * 5, .groups = 'drop')

# Create the ggplot
g <- ggplot(all_states_tfr, aes(x = Year, y = TFR, color = State, group = State)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 2.1, color = "red", linetype = "dashed") +
  labs(title = "Total Fertility Rate by State Over Time",
       x = "Year",
       y = "Total Fertility Rate") +
  theme_minimal() +
  theme(legend.position = "none")

# Convert to interactive plotly graph
ggplotly(g)

Task 5

Create a graph showing the TFR for the US as a whole. We can’t use the rate data directly. We need to compute the total numbers of births and total numbers of women for each age group and year. Then compute the TFR by adding the calculated rates and multiplying by 5. Plot this using plotly.

# Calculate TFR for the US as a whole
us_tfr <- by_state_0323 %>%
  group_by(Year, Age) %>%
  summarize(
    TotalBirths = sum(Births),
    TotalWomen = sum(Fpop),
    .groups = 'drop'
  ) %>%
  mutate(Rate = TotalBirths / TotalWomen) %>%
  group_by(Year) %>%
  summarize(TFR = sum(Rate) * 5)

# Create the ggplot
g <- ggplot(us_tfr, aes(x = Year, y = TFR)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 2.1, color = "red", linetype = "dashed") +
  labs(title = "Total Fertility Rate in the United States Over Time",
       x = "Year",
       y = "Total Fertility Rate") +
  theme_minimal()

# Convert to interactive plotly graph
ggplotly(g)