Assignment 4

This assignment builds on what you did in the previous assignment. It uses the dataframe you saved at the end.

It also requires you to submit your work by posting a document on RPubs. This will allow you to create interactive graphs.

Setup

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

load("by_state_0323.RData")
ls()

## [1] "by_state_0323_clean"

Task 1

Make a graph showing the total number of births per year. Use plotly.

g <- by_state_0323_clean %>%
  group_by(Year) %>%
  summarise(total_births = sum(Births)) %>%
  ggplot(aes(x = Year, y = total_births)) +
  geom_line() +
  labs(title = "Total Number of Births per Year", x = "Year", y = "Total Births")

ggplotly(g)

Task 2

It is somewhat surprising that the total number of births in the US has declined. Has the number of women been declining in this time period?

Redo the previous dataframe and include the total number of women. Also compute the ratio of births to women. Print the head and tail of this dataframe so we can see all three variables.

# Add the ratio of births to women and print the head and tail
by_state_0323_clean <- by_state_0323_clean %>%
  mutate(birth_to_women_ratio = Births / Fpop)

# Display head and tail of the data
head(by_state_0323_clean)

## # A tibble: 6 × 7
##   State    Year Age     Fpop Births   Rate birth_to_women_ratio
##   <chr>   <dbl> <chr>  <dbl>  <dbl>  <dbl>                <dbl>
## 1 Alabama  2003 15-19 157477   8095 0.0514               0.0514
## 2 Alabama  2004 15-19 159259   8126 0.0510               0.0510
## 3 Alabama  2005 15-19 161701   7771 0.0481               0.0481
## 4 Alabama  2006 15-19 164708   8537 0.0518               0.0518
## 5 Alabama  2003 20-24 163919  18633 0.114                0.114 
## 6 Alabama  2004 20-24 163736  18581 0.113                0.113

tail(by_state_0323_clean)

## # A tibble: 6 × 7
##   State    Year Age    Fpop Births    Rate birth_to_women_ratio
##   <chr>   <dbl> <chr> <dbl>  <dbl>   <dbl>                <dbl>
## 1 Wyoming  2015 40-44 16075    127 0.0079               0.00790
## 2 Wyoming  2016 40-44 15785    135 0.00855              0.00855
## 3 Wyoming  2017 40-44 15958    116 0.00727              0.00727
## 4 Wyoming  2018 40-44 16085    147 0.00914              0.00914
## 5 Wyoming  2019 40-44 16592    134 0.00808              0.00808
## 6 Wyoming  2020 40-44 17327    135 0.00779              0.00779

Task 3

The total fertility rate (TFR) is the number of births for a woman during her lifetime. The rate data we have is the number of births per woman per year while she is in one of the 5-year age groups.

How do we use the rate information to construct the TFR? We add up all of the rates for the individual age groups and multiply the sum by 5. Do this for the State of Washington. Plot the time series using geom_point() and plotly.

# Calculate TFR for Washington
washington_data <- by_state_0323_clean %>%
  filter(State == "Washington") %>%
  group_by(Year) %>%
  summarise(TFR = sum(Rate) * 5)

# Plot the TFR for Washington
g <- washington_data %>%
  ggplot(aes(x = Year, y = TFR)) +
  geom_point() +
  labs(title = "Total Fertility Rate for Washington", x = "Year", y = "TFR")

ggplotly(g)

Task 4

Repeat the exercise for all states. Again, use plotly so we will be able to identify states. In the aes(), add “group = State”. Also draw a red horizontal line at 2.1. Use the plotly framework I have set up.

Using the interactive graph, which states have the highest and lowest TFR values in 2003 and 2023?

# Calculate TFR for all states
tfr_by_state <- by_state_0323_clean %>%
  group_by(State, Year) %>%
  summarise(TFR = sum(Rate) * 5)

## `summarise()` has grouped output by 'State'. You can override using the
## `.groups` argument.

# Plot TFR for all states and add a red line at 2.1
g <- tfr_by_state %>%
  ggplot(aes(x = Year, y = TFR, group = State)) +
  geom_line() +
  geom_hline(yintercept = 2.1, color = "red") +
  labs(title = "TFR for All States", x = "Year", y = "TFR")

ggplotly(g)

Task 5

Create a graph showing the TFR for the US as a whole. We can’t use the rate data directly. We need to compute the total numbers of births and total numbers of women for each age group and year. Then compute the TFR by adding the calculated rates and multiplying by 5. Plot this using plotly.

# Calculate TFR for the US
us_data <- by_state_0323_clean %>%
  group_by(Year) %>%
  summarise(total_births = sum(Births), total_women = sum(Fpop)) %>%
  mutate(TFR = total_births / total_women * 5)

# Plot the TFR for the US
g <- us_data %>%
  ggplot(aes(x = Year, y = TFR)) +
  geom_line() +
  labs(title = "TFR for the US", x = "Year", y = "Total Fertility Rate")

ggplotly(g)