Load packages
if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(data.table, ggplot2, modelsummary, binsreg, knitr, broom, fixest)
Clear memory
rm(list = ls())
Set etable preferences
# The style of the table
my_style = style.tex("aer", model.format = "(i)")
# markdown = TRUE is only useful in Rmarkdown documents
setFixest_etable(style.tex = my_style,
                 page.width = "a4",
                 fitstat = ~ n,
                 markdown = FALSE)
Disable scientific notation
options(scipen=999)
Import dataset (using relative file paths)
atlas = fread("../data/raw/atlas.csv")
I’ve chosen to look at Grove City, OH. The two parts of the data I will be using are “Income at age 35” and “Teenage birth rate.”
Income at age 35 by census tract of residence as a child.
Teenage birthrate by census tract of residence as a child.
As the statistics and maps show, household income tends to be somewhat negatively correlated with the teenage birth rate in Grove City, OH: higher-income tracts tend to have lower teenage birth rates. For example, a wealthier tract like Pinnacle has a teenage birth rate of 14%, while a less wealthy tract like Jackson Homes has a rate of 21%.
us_mean_p25 = unlist(atlas[!is.na(count_pooled), .(kfr_pooled_p25_mean = weighted.mean(kfr_pooled_p25, w = count_pooled, na.rm = TRUE))])
oh_mean_p25 = unlist(atlas[!is.na(count_pooled) & state == 39, .(kfr_pooled_p25_mean = weighted.mean(kfr_pooled_p25, w = count_pooled, na.rm = TRUE))])
grove_city_mean_p25 = unlist(atlas[state == 39 & county == 49 & tract == 9740, .(kfr_pooled_p25)])
The US weighted mean is $34,311.68 and the Ohio weighted mean is $32,301.59; the value for the Grove City tract I chose is $36,438.14.
I lived in one of the tracts above: Pinnacle. Children who grew up there with parents at the 25th percentile of the national income distribution have a mean household income of $36,438 at age 35. The corresponding US weighted mean is lower, at about $34,312, and Ohio's is about $32,302, below both Pinnacle's and the US average. It is reasonable to conclude that a child born to parents at the 25th percentile in Pinnacle has a better chance of earning a higher income than a child born at the 25th percentile in the US as a whole or in Ohio as a whole.
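To make the weighting concrete, here is a minimal sketch of what `weighted.mean()` does, using three made-up tracts. The incomes and counts below are hypothetical, not values from the atlas. Tracts with more children count for more, so the weighted mean is pulled toward the largest tract.

```r
# Hypothetical tract-level data (illustrative only, not from atlas.csv)
income = c(30000, 36000, 40000)   # mean income at age 35 in each tract
count  = c(1000, 3000, 1000)      # number of children in each tract

# Weighted mean by hand: sum(income * count) / sum(count)
wm_manual = sum(income * count) / sum(count)

# Same calculation with the built-in function used in this report
wm_builtin = weighted.mean(income, w = count)

# Unweighted mean for comparison; it treats every tract equally
um = mean(income)

print(c(weighted = wm_builtin, manual = wm_manual, unweighted = um))
```

The weighted and manual values agree, and both differ from the unweighted mean because the middle tract has three times as many children as the others.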
us_mean_p75 = unlist(atlas[!is.na(count_pooled), .(kfr_pooled_p75_mean = weighted.mean(kfr_pooled_p75, w = count_pooled, na.rm = TRUE))])
oh_mean_p75 = unlist(atlas[!is.na(count_pooled) & state == 39, .(kfr_pooled_p75_mean = weighted.mean(kfr_pooled_p75, w = count_pooled, na.rm = TRUE))])
grove_city_mean_p75 = unlist(atlas[state == 39 & county == 49 & tract == 9740, .(kfr_pooled_p75)])
The US weighted mean is $51,284.03 and the Ohio weighted mean is $51,069.48; the value for the Grove City tract I chose is $52,967.35.
Children who grew up in Pinnacle with parents at the 75th percentile of the national income distribution have a mean household income of $52,967 at age 35. The corresponding US weighted mean is lower, at about $51,284, and Ohio's is about $51,069, below both Pinnacle's and the US average. It is reasonable to conclude that a child born to parents at the 75th percentile in Pinnacle has a better chance of earning a higher income than a child born at the 75th percentile in the US as a whole or in Ohio as a whole.
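# Note: this keeps every tract in Franklin County (state 39, county 49), not only Grove City,
# and drops tracts with a missing p25 or p75 outcome
grove_data = atlas[state == 39 & county == 49 & !is.na(kfr_pooled_p25) & !is.na(kfr_pooled_p75)]
# Run linear regression
model = lm(kfr_pooled_p75 ~ kfr_pooled_p25, data = grove_data)
# Print the full model summary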
summary(model)
##
## Call:
## lm(formula = kfr_pooled_p75 ~ kfr_pooled_p25, data = grove_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18537.3 -3282.0 -150.1 3230.9 25115.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18360.64862 1196.33811 15.35 <0.0000000000000002 ***
## kfr_pooled_p25 0.95355 0.03761 25.36 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5638 on 281 degrees of freedom
## Multiple R-squared: 0.6959, Adjusted R-squared: 0.6948
## F-statistic: 642.9 on 1 and 281 DF, p-value: < 0.00000000000000022
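As a rough reading of this output, the fitted line implies that each extra dollar of the 25th-percentile outcome is associated with about $0.95 more of the 75th-percentile outcome, and the model explains about 70% of the variation across tracts. The sketch below plugs the Grove City tract's 25th-percentile value into the fitted line by hand. The coefficients are copied (rounded) from the summary above, and the variable names are illustrative rather than part of the original analysis.

```r
# Coefficients copied from the summary(model) output above (rounded)
intercept = 18360.64862
slope     = 0.95355

# Observed values for the Grove City tract reported earlier in this report
p25_observed = 36438.137
p75_observed = 52967.348

# Fitted 75th-percentile outcome implied by the regression line
p75_fitted = intercept + slope * p25_observed

# Residual: observed minus fitted (negative means the tract sits below the line)
residual = p75_observed - p75_fitted

print(round(c(fitted = p75_fitted, residual = residual), 2))
```

The fitted value comes out at roughly $53,100, so the tract sits slightly below the regression line. With the actual model object, `predict(model, newdata = data.frame(kfr_pooled_p25 = 36438.137))` should give essentially the same fitted value, up to rounding of the copied coefficients.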
# Create scatter plot with regression line
ggplot(grove_data, aes(x = kfr_pooled_p25, y = kfr_pooled_p75)) +
  geom_point(color = "steelblue", alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "darkred") +
  labs(
    title = "Relationship Between 25th and 75th Percentile Child Outcomes",
    x = "Income at Age 35 (25th Percentile)",
    y = "Income at Age 35 (75th Percentile)"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
# Questions 5-8
In the Pinnacle tract, mean income varies by race. White residents have a mean income of $54,000, Black residents have a mean income of $45,000, and Hispanic residents earn much less, at just $36,000.
My hypothesis is that residents of Pinnacle tend to earn more than others across the country and the state of Ohio because the tract sits on the wealthier side of the county. Many neighborhoods in the area are typical of the rest of the city. However, I lived about a mile from one section that had an incredibly wealthy neighborhood, with houses selling for $650,000 to over a million dollars. I believe this wealthier neighborhood pulls up the mean for the area, which is why the data show that people from there tend to have higher incomes at age 35.
I learned several things about economic opportunity in Grove City from this project. I found that the household income a child grows up with tends to be a good indicator of that child's mean income at age 35. I also found that the teenage birth rate appears to be lower in tracts with higher incomes. I formulated a hypothesis to explain why my selected tract tends to offer better outcomes than other places in the state and the United States. While household income does not determine someone’s outcome, it does shape their upbringing. In other words, someone who starts in a disadvantaged position will struggle more to succeed than someone born into a richer family.