Income, Nativity, and Home Ownership in Queens: A Statistical Analysis of ACS 2023 Data

Introduction

Home ownership remains a key indicator of economic stability and social mobility in the United States, yet access to it is shaped by a complex interplay of socioeconomic factors. In Queens, New York, one of the most diverse urban areas in the country, understanding who owns homes and why is especially important for informing housing policy and equity initiatives. This analysis uses data from the 2023 American Community Survey (ACS) to investigate how income and nativity status (foreign-born vs. U.S.-born) relate to home ownership. The variables used here are counts rather than individual-level indicators: the outcome is the number of homeowners, and the predictors are the number of residents earning over $100,000 per year and the number of foreign-born residents. By applying Poisson and negative binomial regression models, the study quantifies how strongly these two counts are associated with the number of homeowners, while also accounting for statistical challenges such as overdispersion. Through a series of regression models, marginal effect estimates, and visualizations, this report offers a clearer picture of the demographic and economic dynamics influencing home ownership in Queens.

rm(list = ls())
gc()
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 526709 28.2    1174690 62.8   660385 35.3
## Vcells 961271  7.4    8388608 64.0  1770057 13.6
options(repos = c(CRAN = "https://cloud.r-project.org/"))



# Set working directory
directory <- "C:/Users/mikem/iCloudDrive/Spring 2025/DATA 712/Data sets"
setwd(directory)

# Set seed for reproducibility
set.seed(123)

# Install any missing packages, then load the ones this analysis uses
pkgs <- c("clarify", "AER", "tidyverse", "pscl", "MASS")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

library(clarify)    # sim(), sim_ame(), sim_adrf()
library(AER)        # dispersiontest()
library(tidyverse)
library(pscl)


acs23 <- read.csv("2023 ACS Queens.csv", stringsAsFactors = FALSE)

For ease of coding, I created shorter-named vectors for the three variables used in this exercise. Each one holds counts, not individual-level yes/no indicators.

Foreign_born <- acs23$Birthplace.type.and.entry_Foreign.born
Own <- acs23$Housing_Own
Income_over_100k <- acs23$Income_.100.000.and.up

Poisson Regression: Income and Home Ownership

m1 <- glm(Own ~ Income_over_100k, family = poisson, data = acs23)
summary(m1)
## 
## Call:
## glm(formula = Own ~ Income_over_100k, family = poisson, data = acs23)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      8.181e+00  3.852e-03  2124.1   <2e-16 ***
## Income_over_100k 8.429e-05  5.541e-07   152.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 70245  on 62  degrees of freedom
## Residual deviance: 48991  on 61  degrees of freedom
## AIC: 49651
## 
## Number of Fisher Scoring iterations: 4

The Poisson regression model examines the relationship between the number of homeowners in Queens in 2023 and the number of residents earning over $100,000 annually. The results indicate a statistically significant positive association. The coefficient for Income_over_100k is 0.00008429, with a very small standard error and a z-value of 152.1, giving a p-value below 2e-16. Because the predictor is a count, the coefficient is the change in the log of the expected number of homeowners for each additional high earner, not the effect of an individual crossing the $100K threshold. Per 1,000 additional high earners, the expected number of homeowners rises by about 8.8%. Including the income variable reduces the deviance from 70,245 to 48,991, although a residual deviance of 48,991 on 61 degrees of freedom is far larger than a well-fitting Poisson model would produce and already hints at overdispersion. Overall, the data support the conclusion that a larger number of residents earning over $100,000 is positively related to home ownership in Queens.
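Because the model uses a log link, a coefficient is easier to read after exponentiating it. The short sketch below converts the reported Income_over_100k coefficient into a multiplicative change in the expected number of homeowners for a few increments of the predictor. Only the coefficient comes from the output above; the increment sizes are arbitrary choices made for illustration.

```r
# Coefficient for Income_over_100k as reported in summary(m1)
beta_income <- 8.429e-05

# Illustrative increments in the high-income count (arbitrary, not from the data)
increments <- c(1, 100, 1000)

# exp(beta * k) is the factor by which the expected count changes
# when the predictor rises by k units
rate_ratio <- exp(beta_income * increments)

print(data.frame(increment  = increments,
                 rate_ratio = round(rate_ratio, 4),
                 pct_change = round(100 * (rate_ratio - 1), 2)))
```

Read this way, a one-unit change is negligible, while a change of 1,000 corresponds to an increase of roughly 9% in the expected count.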

Marginal Effects of Income on Home Ownership (Poisson Model)

library(clarify)

# Simulate coefficients
sim_coefs1 <- sim(m1)

# Simulate average marginal effects
sim_est1 <- sim_ame(sim_coefs1, var = "Income_over_100k", contrast = "rd")

# Summarize the results
summary(sim_est1)
##                           Estimate 2.5 % 97.5 %
## E[dY/d(Income_over_100k)]    0.497 0.491  0.504

The marginal effects analysis gives a more direct interpretation of how income relates to home ownership in Queens in 2023. Using the clarify package, the average marginal effect of Income_over_100k on the expected number of homeowners was estimated to be approximately 0.497. Since both variables are counts, this means that each additional resident earning over $100K is associated with about 0.5 additional homeowners on average, or roughly 497 more homeowners per 1,000 additional high earners. It is not a change of 50 percentage points in an individual's probability of owning. The 95% confidence interval ranges from 0.491 to 0.504, which is narrow and excludes zero. No other factors are held constant here, because income is the only predictor in this model. This result is consistent with the earlier Poisson regression: the number of high earners is a strong predictor of the number of homeowners in Queens.
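For a log-link count model, the point estimate of an average marginal effect for a continuous predictor can be reproduced by hand: the derivative of the expected count with respect to the predictor is the coefficient times the expected count, averaged over rows. The sketch below checks this on simulated data. The variable names (`owners`, `high_income`) and all values are hypothetical, and this is only a check of the point estimate, not a description of how clarify builds its simulation-based intervals.

```r
# Hypothetical simulated data; names and values are illustrative only
set.seed(1)
n <- 63
high_income <- round(runif(n, 500, 40000))
owners <- rpois(n, lambda = exp(8 + 5e-05 * high_income))

fit <- glm(owners ~ high_income, family = poisson)
beta <- coef(fit)[["high_income"]]
mu_hat <- fitted(fit)

# Analytic AME under a log link: d mu_i / d x_i = beta * mu_i, averaged over rows
ame_analytic <- beta * mean(mu_hat)

# Numerical check: nudge the predictor by a small step and difference the predictions
eps <- 1e-3
mu_up <- predict(fit,
                 newdata = data.frame(high_income = high_income + eps),
                 type = "response")
ame_numeric <- mean((mu_up - mu_hat) / eps)

print(c(analytic = ame_analytic, numeric = ame_numeric))
```

One consequence of this formula is that, in a Poisson model with an intercept, the average fitted value equals the average observed count, so the average marginal effect is essentially the coefficient multiplied by the mean of the outcome.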

Poisson Regression Including Foreign-Born Status

m2 <- glm(Own ~ Income_over_100k + Foreign_born, family = poisson, data = acs23)
summary(m2)
## 
## Call:
## glm(formula = Own ~ Income_over_100k + Foreign_born, family = poisson, 
##     data = acs23)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      8.159e+00  3.917e-03 2083.12   <2e-16 ***
## Income_over_100k 6.806e-05  7.069e-07   96.28   <2e-16 ***
## Foreign_born     6.368e-06  1.668e-07   38.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 70245  on 62  degrees of freedom
## Residual deviance: 47562  on 60  degrees of freedom
## AIC: 48223
## 
## Number of Fisher Scoring iterations: 4

In this updated Poisson regression model, the variable Foreign_born was added to assess whether the number of foreign-born residents is associated with home ownership in Queens in 2023, alongside income. The results show that both predictors are statistically significant. The coefficient for Income_over_100k is 0.00006806, with a highly significant p-value (<2e-16). The coefficient for Foreign_born is 0.000006368, with a similarly significant p-value. The foreign-born coefficient is roughly a tenth the size of the income coefficient, but it still indicates that, holding the number of high earners constant, a larger foreign-born population is associated with more homeowners. The residual deviance decreased from 48,991 to 47,562 and the AIC from 49,651 to 48,223, indicating an improved model fit with the inclusion of nativity status. Overall, this model suggests that both a larger number of high earners and a larger foreign-born population are associated with more home ownership in Queens.

Updated Marginal Effect of Income (with Nativity Included)

library(clarify)

# Simulate coefficients
sim_coefs2 <- sim(m2)

# Simulate average marginal effects
sim_est2 <- sim_ame(sim_coefs2, var = "Income_over_100k", contrast = "rd")

# Summarize the results
summary(sim_est2)
##                           Estimate 2.5 % 97.5 %
## E[dY/d(Income_over_100k)]    0.402 0.393  0.410

After incorporating nativity status (Foreign_born) into the regression model, the average marginal effect of Income_over_100k on home ownership in Queens was re-estimated using simulated coefficients. The new marginal effect is approximately 0.402, with a 95% confidence interval ranging from 0.393 to 0.410. This means that, holding the foreign-born count constant, each additional resident earning over $100K is associated with about 0.4 additional homeowners. Compared to the previous model without nativity status (0.497), the income effect is about 19% smaller, suggesting that part of the apparent income effect reflected its correlation with the foreign-born count. Nonetheless, income remains a strong and statistically significant predictor of home ownership. Including nativity gives a more complete picture of the socioeconomic factors associated with home ownership in Queens.

Marginal Effect of Being Foreign-Born

sim_est2a <- sim_ame(sim_coefs2, var = "Foreign_born",
                    contrast = "rd")

summary(sim_est2a)
##                       Estimate  2.5 % 97.5 %
## E[dY/d(Foreign_born)]   0.0376 0.0356 0.0396

The marginal effect of Foreign_born on home ownership in Queens was also estimated using simulated coefficients from the adjusted Poisson regression model. The average marginal effect is approximately 0.0376, with a 95% confidence interval ranging from 0.0356 to 0.0396. This means that, holding the number of high earners constant, each additional foreign-born resident is associated with about 0.038 additional homeowners, or roughly 38 per 1,000. The effect is modest in size but statistically significant in this model. This finding adds nuance to the broader picture, indicating that nativity, though a much weaker predictor than income, is still associated with patterns of home ownership in Queens.

Dose-Response Function: Foreign-Born Status

sim_est2b <- sim_adrf(sim_coefs2, var = "Foreign_born",
                    contrast = "adrf")
plot(sim_est2b)

The plot generated from the sim_adrf() function illustrates the average dose-response function (ADRF) for the Foreign_born variable, showing how the expected number of homeowners changes as the number of foreign-born individuals increases. The graph reveals a clear and nearly linear positive relationship: as the number of foreign-born individuals rises, the expected count of homeowners also increases steadily. The log link makes the curve exponential in principle, but the coefficient is small enough that it appears almost linear over the plotted range. The shaded region around the line represents the 95% confidence interval, which remains relatively narrow, suggesting precise estimates under the Poisson assumptions. This visual is consistent with the earlier marginal effects finding: in this model, expected homeowner counts rise steadily with the size of the foreign-born population.
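The idea behind an average dose-response function can be sketched by hand: set the predictor to the same value for every row, average the model's predictions, and repeat over a grid of values. The example below does this on simulated data. All names and numbers are hypothetical, the grid of 21 points is an arbitrary choice, and the sketch produces point estimates only, without the simulation-based confidence band that clarify draws.

```r
# Hypothetical simulated data; names and values are illustrative only
set.seed(2)
n <- 63
dat <- data.frame(
  high_income  = round(runif(n, 500, 20000)),
  foreign_born = round(runif(n, 2000, 60000))
)
dat$owners <- rpois(n, exp(7.5 + 4e-05 * dat$high_income + 1e-05 * dat$foreign_born))

fit <- glm(owners ~ high_income + foreign_born, family = poisson, data = dat)

# Grid of foreign_born values spanning the observed range
grid <- seq(min(dat$foreign_born), max(dat$foreign_born), length.out = 21)

# For each grid value, give every row that value and average the predictions
adrf <- sapply(grid, function(v) {
  counterfactual <- transform(dat, foreign_born = v)
  mean(predict(fit, newdata = counterfactual, type = "response"))
})

print(head(data.frame(foreign_born = grid, expected_owners = adrf)))
```

Plotting `adrf` against `grid` gives the dose-response curve; the other predictor keeps its observed values in every row, so the curve averages over its distribution.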

Marginal Effects by Population Size (AMEF: Foreign-Born)

sim_est2c <- sim_adrf(sim_coefs2, var = "Foreign_born",
                    contrast = "amef")

plot(sim_est2c)

This plot represents the average marginal effect function (AMEF) for the Foreign_born variable, showing how the marginal effect of the foreign-born count on home ownership varies across values of that count. The y-axis reflects the estimated change in the expected number of homeowners for each additional foreign-born individual, while the x-axis shows the number of foreign-born individuals in the population. The plot reveals a positive and slightly increasing trend, and the shaded area indicates the 95% confidence interval, which remains relatively tight. The upward slope should be read with care: under a log link the marginal effect is proportional to the expected count, so it rises automatically whenever the coefficient is positive. Because the data are a single 2023 cross-section, the trend does not by itself show growing social or economic integration over time.
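The shape of the AMEF follows from the log link itself. As a sketch, write $x_1$ for Income_over_100k, $x_2$ for Foreign_born, and $\beta_0, \beta_1, \beta_2$ for the coefficients of m2 (this notation is introduced here for illustration and does not appear in the original output):

```latex
\begin{aligned}
E[Y \mid x_1, x_2] &= \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2), \\
\frac{\partial E[Y \mid x_1, x_2]}{\partial x_2}
  &= \beta_2 \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)
   = \beta_2 \, E[Y \mid x_1, x_2].
\end{aligned}
```

With $\beta_2 > 0$, the expected count increases in $x_2$, so the marginal effect increases with it. An upward-sloping AMEF is therefore what the model form implies and is not separate evidence of a strengthening relationship.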

Overdispersion Test Results

dispersiontest(m2)
## 
##  Overdispersion test
## 
## data:  m2
## z = 6.5765, p-value = 2.408e-11
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion 
##   679.9466

The results of the overdispersion test for the Poisson regression model (m2) reveal strong evidence of overdispersion in the data. The test produced a z-score of 6.5765 with a highly significant p-value of 2.408e-11, indicating that the variance in the outcome variable (home ownership) is significantly greater than what the Poisson model assumes. The estimated dispersion parameter is approximately 680, far exceeding the ideal value of 1 expected under a true Poisson distribution. This suggests that the model’s assumption of equal mean and variance does not hold, and the standard errors produced by the Poisson model are likely underestimated. As a result, statistical inferences based on this model—such as p-values and confidence intervals—may be unreliable. Given this finding, it would be more appropriate to use a negative binomial regression model, which adjusts for overdispersion and yields more robust and trustworthy results.
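One rough way to see what a dispersion of about 680 does to inference is to inflate the Poisson standard errors by the square root of the dispersion, in the spirit of a quasi-Poisson adjustment. The sketch below applies that to the z-values reported for m2. It is an approximation under stated assumptions: the estimate from dispersiontest() is not identical to the Pearson-based dispersion a quasi-Poisson fit would use, and this is not the negative binomial model.

```r
# Values copied from the output above
dispersion <- 679.9466                          # from dispersiontest(m2)
z_poisson  <- c(Income_over_100k = 96.28,       # from summary(m2)
                Foreign_born     = 38.17)

# Standard errors scale with sqrt(dispersion), so z-values shrink by that factor
z_adjusted <- z_poisson / sqrt(dispersion)

# Two-sided p-values from the standard normal distribution
p_adjusted <- 2 * pnorm(-abs(z_adjusted))

print(round(rbind(z_poisson, z_adjusted, p_adjusted), 4))
```

Under this rough adjustment the income z-value stays well above 1.96, while the foreign-born z-value falls below it.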

Negative Binomial Regression Results

m3 <- MASS::glm.nb(Own ~ Income_over_100k + Foreign_born, data = acs23)
summary(m3)
## 
## Call:
## MASS::glm.nb(formula = Own ~ Income_over_100k + Foreign_born, 
##     data = acs23, init.theta = 6.195295689, link = log)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      8.066e+00  1.221e-01  66.060  < 2e-16 ***
## Income_over_100k 8.508e-05  2.471e-05   3.443 0.000576 ***
## Foreign_born     5.949e-06  5.663e-06   1.051 0.293459    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(6.1953) family taken to be 1)
## 
##     Null deviance: 90.030  on 62  degrees of freedom
## Residual deviance: 64.716  on 60  degrees of freedom
## AIC: 1154.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  6.20 
##           Std. Err.:  1.08 
## 
##  2 x log-likelihood:  -1146.878

This output presents the results of a negative binomial regression model (m3) used to address the overdispersion found in the earlier Poisson model predicting home ownership. The model includes the same two predictors: Income_over_100k and Foreign_born. The coefficient for Income_over_100k remains statistically significant (p = 0.000576), indicating that the number of residents earning over $100,000 is still positively associated with the number of homeowners after accounting for overdispersion. However, the coefficient for Foreign_born is no longer statistically significant (p = 0.293459). Its point estimate barely moved (0.000005949 versus 0.000006368 in the Poisson model), but its standard error is about 34 times larger, so the earlier significance was a product of understated Poisson standard errors rather than a stronger signal. Theta is estimated at 6.20 (standard error 1.08); with expected counts in the thousands, a theta of this size implies a variance far above the mean and confirms the overdispersion that warranted this model. The AIC also falls from 48,223 to 1,154.9. Overall, the negative binomial model provides a more reliable estimation framework, and the results reinforce the importance of income in determining home ownership, while casting doubt on the independent effect of nativity status.
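The role of theta can be made concrete with the variance function of the negative binomial family that MASS::glm.nb fits. As a worked illustration, take the expected count when both predictors are zero, $\mu = e^{8.066} \approx 3184$. This reference point is chosen only for convenience and is not a value the report singles out.

```latex
\begin{aligned}
\operatorname{Var}(Y) &= \mu + \frac{\mu^{2}}{\theta},
\qquad
\frac{\operatorname{Var}(Y)}{\mu} = 1 + \frac{\mu}{\theta}, \\
\mu \approx 3184,\ \theta \approx 6.20
&\;\Longrightarrow\;
\frac{\operatorname{Var}(Y)}{\mu} \approx 1 + \frac{3184}{6.20} \approx 515.
\end{aligned}
```

A Poisson model fixes this ratio at 1, so a ratio in the hundreds at typical count levels is in line with the large dispersion estimate from the overdispersion test.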

Marginal Effect of Income (Negative Binomial Model)

sim_coefs3 <- sim(m3)
sim_est3 <- sim_ame(sim_coefs3, var = "Income_over_100k",
                    contrast = "rd")
summary(sim_est3)
##                           Estimate 2.5 % 97.5 %
## E[dY/d(Income_over_100k)]    0.506 0.223  0.827

The marginal effects output from the negative binomial regression model provides a more robust estimate of the relationship between income and home ownership after correcting for overdispersion. The average marginal effect of Income_over_100k is approximately 0.506, with a 95% confidence interval ranging from 0.223 to 0.827. On average, each additional resident earning over $100,000 is associated with about 0.5 additional homeowners. The confidence interval is far wider than in the earlier Poisson models (a width of about 0.60 versus 0.013 and 0.017), reflecting more realistic standard errors, yet it still excludes zero, so the effect remains substantively large and statistically significant. These results reinforce the conclusion that income is a powerful predictor of home ownership in Queens, and they support using a negative binomial model for inference in the presence of overdispersion.

Marginal Effect of Foreign-Born Status (Negative Binomial Model)

sim_est3a <- sim_ame(sim_coefs3, var = "Foreign_born",
                    contrast = "rd")
summary(sim_est3a)
##                       Estimate   2.5 %  97.5 %
## E[dY/d(Foreign_born)]   0.0354 -0.0260  0.1054

The marginal effect of Foreign_born on home ownership, as estimated from the negative binomial regression model, is approximately 0.0354, or about 35 additional homeowners per 1,000 additional foreign-born residents. However, the 95% confidence interval ranges from -0.0260 to 0.1054, which includes zero. This indicates that the effect is not statistically significant, and we cannot confidently conclude that the size of the foreign-born population has a meaningful association with home ownership once overdispersion is accounted for. These findings suggest that the association between nativity status and home ownership in the earlier models was overstated in its precision, and that income remains the more consistent and significant predictor.

Dose-Response Curve: Foreign-Born (Negative Binomial)

sim_est3b <- sim_adrf(sim_coefs3, var = "Foreign_born",
                    contrast = "adrf")
plot(sim_est3b)

This plot illustrates the average dose-response function (ADRF) for the Foreign_born variable based on the negative binomial model. The graph depicts the expected number of homeowners (E[Y(Foreign_born)]) as the number of foreign-born individuals increases. The central trend line shows a slight upward trajectory, indicating a potential positive relationship between the foreign-born population and home ownership, but the wide and expanding confidence band (shaded area) reflects substantial uncertainty in the estimates, especially at higher levels of the foreign-born population. This visual is consistent with the marginal effect estimate: although there may be a weak positive trend, the relationship between the foreign-born count and home ownership is not statistically robust once overdispersion is addressed. The growing uncertainty at higher population levels also suggests that predictions become less reliable as the number of foreign-born individuals increases.

Marginal Effect Function: Foreign-Born (Negative Binomial)

sim_est3c <- sim_adrf(sim_coefs3, var = "Foreign_born",
                    contrast = "amef")
plot(sim_est3c)

This plot shows the average marginal effect function (AMEF) for the Foreign_born variable using the negative binomial regression model. The y-axis represents the change in the expected number of homeowners for each additional foreign-born individual, while the x-axis reflects the number of foreign-born individuals. The central line shows a slight upward trend, suggesting that the marginal effect may increase slightly as the foreign-born population grows. However, the shaded confidence band is wide and overlaps zero across much of the range, indicating a high degree of uncertainty and a lack of statistical significance. This is consistent with the earlier findings: although there may be a minor positive trend, the association between the foreign-born count and home ownership is not robust and should be interpreted with caution. In essence, the influence of nativity status remains weak and uncertain after accounting for overdispersion in the data.

Conclusion

This study explored the relationship between income, nativity status, and home ownership in Queens using American Community Survey data from 2023. Initial Poisson regression models suggested that the number of residents earning over $100,000 per year was a strong and statistically significant predictor of the number of homeowners, with marginal effects indicating about 0.5 additional homeowners for each additional high earner. Adding nativity status to the model showed that a larger foreign-born population was also associated with a modest increase in home ownership. However, further testing revealed significant overdispersion in the data, indicating that the Poisson model assumptions were violated. This led to the use of a more appropriate negative binomial regression model to obtain more reliable estimates.

Once overdispersion was accounted for, income remained a significant and meaningful predictor of home ownership, with the marginal effect still at about 0.5 additional homeowners per additional high earner. In contrast, the effect of the foreign-born count was no longer statistically significant, and its marginal effect estimate had a wide confidence interval that included zero. Visualization of dose-response and marginal effect functions further reinforced this uncertainty, highlighting that while foreign-born status may show a slight positive trend, it lacks statistical robustness when properly modeled. Overall, the findings underscore the central role of income in shaping home ownership in Queens, while suggesting that nativity status has, at best, a minimal and uncertain influence once overdispersion is properly addressed.