Introduction

This project examines divorce rates across various socio-economic groups in the US. All data come from the Decennial Census (years 1960 to 2000) and the American Community Survey (years 2001-2012), via IPUMS USA. The variable names are summarized below:
Header      Description
all         Total (or all men/women in sex-specific files)
HS          High school graduate or less (EDUCD < 65)
SC          Some college (EDUCD >= 65 & <= 100)
BAp         Bachelor’s degree or more (EDUCD > 100)
BAo         Bachelor’s degree, no graduate degree (EDUCD > 100 & <= 113)
GD          Graduate degree (EDUCD > 113)
White       Non-Hispanic white
Black       Black or African-American
Hisp        Hispanic of any race
NE          New England (REGION == 11)
MA          Mid-Atlantic (REGION == 12)
Midwest     Midwest (REGION == 21-23)
South       South (REGION == 31-34)
Mountain    Mountain West (REGION == 41)
Pacific     Pacific (REGION == 42)
poor        Family income in lowest 25%
mid         Family income in middle 50%
rich        Family income in top 25%
work        Employed 50+ weeks prior year
nowork      Not employed at least 50 weeks prior year
nokids_all  No own children living at home
kids_all    At least one own child living at home
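
The education headers are defined by cut-offs on the IPUMS EDUCD code. As a rough sketch of how those cut-offs fit together, the helper below assigns each code to one of the non-overlapping groups. The name educ_group and the sample codes are hypothetical and are not part of the original analysis or of IPUMS. BAp is simply the union of BAo and GD.

```r
# Hypothetical helper, for illustration only: map IPUMS EDUCD codes to the
# education headers listed in the variable summary.
educ_group <- function(educd) {
  ifelse(educd < 65, "HS",            # high school graduate or less
  ifelse(educd <= 100, "SC",          # some college
  ifelse(educd <= 113, "BAo", "GD"))) # bachelor's only, else graduate degree
}

# Sample codes chosen to sit on either side of each cut-off
codes <- c(63, 65, 100, 101, 113, 114)
groups <- educ_group(codes)
print(groups)

# BAp (bachelor's degree or more) covers both BAo and GD
is_bap <- codes > 100
print(is_bap)
```

The nested ifelse() calls test the cut-offs in ascending order, so each code falls into exactly one group.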
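# Read the divorce data directly from the FiveThirtyEight GitHub repository
df <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/marriage/divorce.csv')
# Preview the first few rows
head(df)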

Use filter() to keep only the rows where the variable all_3544 is at least 0.1 (all_3544 >= 0.1)

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
# Keep only the rows where all_3544 is at least 0.1
df <- df |>
  filter(all_3544 >= 0.1)
df
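
Whether the comparison is written as >= or > only matters for rows that sit exactly on the threshold. The sketch below shows the difference on a small made-up data frame. The object toy and its values are assumptions for illustration, not rows from divorce.csv.

```r
library(dplyr)

# Made-up values, for illustration only (not taken from divorce.csv)
toy <- data.frame(
  year = c(1960, 1980, 2000),
  all_3544 = c(0.05, 0.10, 0.15)
)

at_least <- toy |> filter(all_3544 >= 0.1)  # keeps the row equal to 0.10
strictly <- toy |> filter(all_3544 > 0.1)   # drops the row equal to 0.10

nrow(at_least)
nrow(strictly)
```

filter() also drops rows where the condition evaluates to NA, so missing values of all_3544 would be removed by either version.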

Use mutate() to create a new variable, ratio, defined as rich_4554 divided by poor_4554

# Add the ratio of the top-income value to the bottom-income value
df <- df |>
  mutate(ratio = rich_4554 / poor_4554)
df
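
A plain division returns Inf when the denominator is 0 and NA when either value is missing. If that matters for later steps, one possible guard is sketched below on made-up data. The object toy and its values are assumptions for illustration, and the choice to record NA for a zero denominator is a design decision, not something the original analysis specifies.

```r
library(dplyr)

# Made-up values, for illustration only (not taken from divorce.csv)
toy <- data.frame(
  rich_4554 = c(0.02, 0.03, 0.04),
  poor_4554 = c(0.04, 0, NA)
)

toy <- toy |>
  mutate(
    # Divide only when the denominator is positive; otherwise record NA.
    # A missing denominator makes the condition NA, which also yields NA.
    ratio = if_else(poor_4554 > 0, rich_4554 / poor_4554, NA_real_)
  )
toy
```

With this guard the ratio column contains only finite values or NA, which keeps summaries such as mean(ratio, na.rm = TRUE) from being dominated by Inf.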