Introduction
This project examines divorce rates across various socio-economic
groups in the US. All data come from the Decennial Census (years
1960 to 2000) and the American Community Survey (years 2001 to 2012), via IPUMS
USA. The variable names are summarized below:
| Header | Description |
|---|---|
| all | Total (or all men/women in sex-specific files) |
| HS | High school graduate or less (EDUCD < 65) |
| SC | Some college (EDUCD >= 65 & <= 100) |
| BAp | Bachelor’s degree or more (EDUCD > 100) |
| BAo | Bachelor’s degree, no graduate degree (EDUCD > 100 & <= 113) |
| GD | Graduate degree (EDUCD > 113) |
| White | Non-Hispanic white |
| Black | Black or African-American |
| Hisp | Hispanic of any race |
| NE | New England (REGION == 11) |
| MA | Mid-Atlantic (REGION == 12) |
| Midwest | Midwest (REGION == 21-23) |
| South | South (REGION == 31-34) |
| Mountain | Mountain West (REGION == 41) |
| Pacific | Pacific (REGION == 42) |
| poor | Family income in lowest 25% |
| mid | Family income in middle 50% |
| rich | Family income in top 25% |
| work | Employed 50+ weeks prior year |
| nowork | Not employed at least 50 weeks prior year |
| nokids_all | No own children living at home |
| kids_all | At least one own child living at home |
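The education headers are defined by cutoffs on the IPUMS EDUCD code. As a rough sketch of how those cutoffs partition the codes, the snippet below classifies a few made-up EDUCD values. The data frame `edu` and the column `group` are hypothetical names used only for illustration; the published CSV already contains the aggregated figures, so this step is not needed to use it. Note that BAp overlaps BAo and GD, so it is kept as a separate logical flag.

```r
library(dplyr)

# Hypothetical EDUCD codes for illustration only (not real IPUMS records)
edu <- data.frame(EDUCD = c(30, 64, 65, 100, 101, 113, 114))

edu <- edu |>
  mutate(
    # Mutually exclusive groups, using the cutoffs from the table above
    group = case_when(
      EDUCD < 65   ~ "HS",   # high school graduate or less
      EDUCD <= 100 ~ "SC",   # some college
      EDUCD <= 113 ~ "BAo",  # bachelor's degree, no graduate degree
      TRUE         ~ "GD"    # graduate degree
    ),
    # BAp (bachelor's degree or more) is the union of BAo and GD
    BAp = EDUCD > 100
  )

edu
```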
df <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/marriage/divorce.csv')
head(df)
Use filter() to keep only the rows where ‘all_3544’ is at least 0.1
(the code below uses >=, so rows equal to 0.1 are kept):
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
# Keep rows where all_3544 is 0.1 or higher
df <- df |>
  filter(all_3544 >= 0.1)
df
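To see what the filter step does without downloading the data, here is a minimal sketch on a toy data frame. The values in `toy` are invented for illustration and are not taken from the dataset; only the column name all_3544 mirrors the real file. Assigning the result to a new name (`kept`, also hypothetical) leaves the original data frame available for comparison.

```r
library(dplyr)

# Invented values for illustration only
toy <- data.frame(
  year     = c(1960, 1980, 2000, 2010),
  all_3544 = c(0.05, 0.10, 0.15, NA)
)

# filter() keeps rows where the condition is TRUE;
# rows where it evaluates to NA are dropped as well
kept <- toy |>
  filter(all_3544 >= 0.1)

kept
```

With this toy input, the 1960 row fails the condition and the 2010 row is dropped because its value is missing.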
Use mutate() to add a new variable, ratio, defined as
rich_4554/poor_4554:
# Add the ratio of the top-income to the bottom-income figure
df <- df |>
  mutate(ratio = rich_4554 / poor_4554)
df
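The same mutate step can be sketched on a toy data frame to make the arithmetic visible. The numbers below are invented for illustration; only the column names rich_4554 and poor_4554 come from the dataset. Assuming both columns hold rates for the same age group, a ratio below 1 would mean the figure for the top income quartile is lower than the figure for the bottom quartile.

```r
library(dplyr)

# Invented values for illustration only
toy <- data.frame(
  year      = c(1990, 2000),
  rich_4554 = c(0.1, 0.3),
  poor_4554 = c(0.2, 0.2)
)

# mutate() adds the new column and keeps all existing ones
toy <- toy |>
  mutate(ratio = rich_4554 / poor_4554)

toy
```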