San Antonio is one of the fastest-growing cities in Texas. Between 2015 and 2020, the San Antonio (SA) metro area added 300,000 new residents, reaching a total of 2.6 million, and it is expected to gain an additional 1 million residents by 2035 and another million by 2050, reaching a total of 4.4 million residents.¹ As the population grows, demand for land is expected to increase, which will in turn create neighborhood change pressures in the more desirable locations throughout the SA metro area.
Population growth –(+)–> Housing Demand –(+)–> Housing Prices –(+)–> Neighborhood Change
Assessing neighborhood change is key to identifying signs of potential gentrification and displacement, and ultimately the loss of cultural value in urban communities. The main research question of this study is:
Population growth –(+)–> Housing Demand –(+)–> Housing Prices –(+)–> Neighborhood Change
Mediated path: Housing Demand –> Household Income (Mediator) –> Housing Prices
Income functions as a mediator in this conceptual framework because it helps explain the relationship between housing demand and housing prices. As housing demand rises due to population growth, income levels influence the extent to which this demand translates into rising home prices and thus neighborhood change. Higher-income households can afford more expensive housing, accelerating price growth and shifting neighborhood composition.
Potential biases include:
1. Selection Bias: If income is unevenly distributed across study areas, results may not be generalizable.
2. Measurement Bias: Using median household income may overlook income inequality within neighborhoods.
3. Reverse Causality: Rising home prices may attract wealthier residents, rather than rising incomes driving price increases.
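The measurement-bias concern can be illustrated with two hypothetical tracts whose household incomes share the same median but differ greatly in spread. The values below are invented for illustration and are not ACS data; the sketch only shows that the median cannot tell the two tracts apart, while a spread measure such as the interquartile range can.

```r
# Hypothetical household incomes (in dollars) for two made-up tracts
tract_a <- c(48000, 49000, 50000, 51000, 52000)   # incomes clustered together
tract_b <- c(12000, 20000, 50000, 95000, 180000)  # incomes widely spread

# The median household income is identical for both tracts
median(tract_a)  # 50000
median(tract_b)  # 50000

# The interquartile range exposes the within-tract inequality the median hides
IQR(tract_a)  # 2000
IQR(tract_b)  # 75000
```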
This empirical exercise will test and operationalize the following hypothesis:
This research will use American Community Survey (ACS) data from the United States Census Bureau, gathered using the `tidycensus` R package. A census tract will be categorized as having disproportionate growth if its growth value is at or above the county median value. Each of the hypotheses previously stated will be operationalized using the `B25077_001` variable (Median Home Value in Dollars) and the `B19013_001` variable (Median Household Income in Dollars).

The unit of analysis in this study is the census tract, a small-area geographic unit that allows for neighborhood-level analysis. The timeframe for analysis spans 2017 to 2022, which allows for evaluating changes in home values (`B25077_001`) and household income (`B19013_001`) over a five-year period.
The selected variables from the American Community Survey (ACS) 5-year estimates are:

- Median Home Value (`B25077_001`): measured in dollars, representing the median value of owner-occupied housing units in each census tract.
- Median Household Income (`B19013_001`): measured in dollars, representing the median income of households in each census tract.
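# Loading tidycensus to query the Census API (a Census API key must be set with census_api_key())
library(tidycensus)
# Downloading median home value data
# Geometry is requested only for 2017; the 2022 table is attached to it by GEOID later
# Note: ACS 2022 tracts use 2020 Census boundaries, while ACS 2017 tracts use 2010 boundaries,
# so tracts whose GEOID changed between the two will not match when the years are merged
bexar_homevalue_17 <- get_acs(geography = "tract", variables = "B25077_001",
                              state = "TX", county = "Bexar", geometry = TRUE, year = 2017)
bexar_homevalue_22 <- get_acs(geography = "tract", variables = "B25077_001",
                              state = "TX", county = "Bexar", geometry = FALSE, year = 2022)
# Downloading median household income data
bexar_income_17 <- get_acs(geography = "tract", variables = "B19013_001",
                           state = "TX", county = "Bexar", geometry = TRUE, year = 2017)
bexar_income_22 <- get_acs(geography = "tract", variables = "B19013_001",
                           state = "TX", county = "Bexar", geometry = FALSE, year = 2022)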
The growth value used to identify disproportionate growth is calculated as the proportional change in each variable:
\[ Disp_{c,\,t_f-t_i} = \frac{value_{c,t_f}}{value_{c,t_i}} - 1 \]
where, for any census tract \(c\), the change is calculated from the values of the same variable (\(value\)) in the final year \(t_f\) and the initial year \(t_i\).
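As a quick check on the formula, the sketch below applies it to three hypothetical tracts with made-up median home values. `pct_change` is a helper name introduced here for illustration only; it is not part of the analysis code.

```r
# Proportional change between an initial and a final value:
# Disp = (value_final / value_initial) - 1
pct_change <- function(value_initial, value_final) {
  (value_final / value_initial) - 1
}

# Hypothetical median home values (dollars) for three made-up tracts
mhv_2017 <- c(100000, 80000, 200000)
mhv_2022 <- c(150000, 80000, 180000)

# Rounded to two decimals, as in the analysis
round(pct_change(mhv_2017, mhv_2022), 2)
# 0.5, 0.0, -0.1  (i.e., +50%, no change, -10%)
```

Multiplying the result by 100 gives the change in percent.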
# Renaming the estimate and margin-of-error (moe) columns so each year's values stay distinct after merging
names(bexar_homevalue_17)[names(bexar_homevalue_17) %in% c("estimate", "moe")] <- c("estimate_mhv_17", "moe_mhv_17")
names(bexar_homevalue_22)[names(bexar_homevalue_22) %in% c("estimate", "moe")] <- c("estimate_mhv_22", "moe_mhv_22")
names(bexar_income_17)[names(bexar_income_17) %in% c("estimate", "moe")] <- c("estimate_mhi_17", "moe_mhi_17")
names(bexar_income_22)[names(bexar_income_22) %in% c("estimate", "moe")] <- c("estimate_mhi_22", "moe_mhi_22")
# Merging the two years by tract GEOID (tracts missing from either year are dropped)
bexar_mhv <- merge(bexar_homevalue_17, bexar_homevalue_22, by = "GEOID", sort = FALSE)
bexar_mhi <- merge(bexar_income_17, bexar_income_22, by = "GEOID", sort = FALSE)
# Calculating the proportional change from 2017 to 2022 (e.g., 0.52 = a 52% increase)
bexar_mhv$mhv_per_change <- round((bexar_mhv$estimate_mhv_22 / bexar_mhv$estimate_mhv_17) - 1, 2)
bexar_mhi$mhi_per_change <- round((bexar_mhi$estimate_mhi_22 / bexar_mhi$estimate_mhi_17) - 1, 2)
# Indicator of disproportionate growth: 1 if the change is at or above the county median, 0 otherwise (NA if missing)
county_median_mhv <- quantile(bexar_mhv$mhv_per_change, 0.5, na.rm = TRUE)
county_median_mhi <- quantile(bexar_mhi$mhi_per_change, 0.5, na.rm = TRUE)
bexar_mhv$disp_mhv_pc <- as.numeric(bexar_mhv$mhv_per_change >= county_median_mhv)
bexar_mhi$disp_mhi_pc <- as.numeric(bexar_mhi$mhi_per_change >= county_median_mhi)
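The indicator can be traced on a small, hypothetical vector of proportional changes (invented values, not Bexar County data). The sketch shows two consequences of the `>=` rule: tracts tied with the county median count as disproportionate, and tracts with a missing change stay `NA` rather than falling into either group.

```r
# Hypothetical tract-level proportional changes, including one missing value
# (ACS sometimes suppresses tract medians, which yields an NA change)
per_change <- c(0.10, 0.35, 0.52, 0.52, 0.80, NA)

# County median of the non-missing changes
county_median <- median(per_change, na.rm = TRUE)  # 0.52

# 1 = at or above the county median, 0 = below, NA stays NA
disp <- as.numeric(per_change >= county_median)
disp  # 0 0 1 1 1 NA
```

Because ties go to the disproportionate group, the two groups need not be exactly equal in size.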
Descriptive Statistics Table for Median Home Value (MHV) in Bexar County
`mhv 2017`<-summary(bexar_mhv$estimate_mhv_17) # 2017 summary statistics
`mhv 2022`<-summary(bexar_mhv$estimate_mhv_22) # 2022 summary statistics
`Per. Change` <-summary(bexar_mhv$mhv_per_change)
rbind(`mhv 2017`,`mhv 2022`,`Per. Change`)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## mhv 2017 46400.0 7.865e+04 1.211e+05 1.481041e+05 177850.00 675600.00 8
## mhv 2022 66800.0 1.333e+05 1.908e+05 2.205041e+05 270750.00 845400.00 9
## Per. Change -0.2 4.125e-01 5.150e-01 5.659942e-01 0.67 3.82 9
Descriptive Statistics Table for Median Household Income (MHI) in Bexar County
`mhi 2017` <- summary(bexar_mhi$estimate_mhi_17)
`mhi 2022` <- summary(bexar_mhi$estimate_mhi_22)
`Per. Change MHI` <- summary(bexar_mhi$mhi_per_change)
rbind(`mhi 2017`, `mhi 2022`, `Per. Change MHI`)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## mhi 2017 12098.00 36793.00 49589.00 5.688906e+04 70964.5 207188.00 4
## mhi 2022 16607.00 45324.00 61211.00 6.799392e+04 82537.0 225556.00 4
## Per. Change MHI -0.38 0.06 0.21 2.332565e-01 0.4 1.58 4
# Final merge: combining home value (MHV) and income (MHI) data by tract GEOID
library(sf)
# A spatial join with st_intersects() would also pair each tract with every neighboring tract it touches,
# so an attribute join on GEOID is used instead (geometry is kept from bexar_mhv only)
bexar_final <- merge(bexar_mhv, st_drop_geometry(bexar_mhi), by = "GEOID")
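Because `bexar_mhv` and `bexar_mhi` are built separately, tract rows should be paired on GEOID rather than on row position. The base-R sketch below uses hypothetical GEOIDs and values (not real Bexar County data) to show that `merge()` pairs rows by key regardless of their order, that `match()` gives the same alignment for a single column, and that a tract present in only one table (for example, one whose GEOID changed with the 2020 tract boundaries) is dropped by an inner merge.

```r
# Hypothetical tract tables whose rows are in different orders
mhv <- data.frame(GEOID = c("48029000100", "48029000200", "48029000300"),
                  mhv_per_change = c(0.40, 0.65, 0.20))
mhi <- data.frame(GEOID = c("48029000300", "48029000100", "48029000400"),
                  mhi_per_change = c(0.05, 0.30, 0.25))

# Inner merge on GEOID: only tracts present in both tables survive
merged <- merge(mhv, mhi, by = "GEOID")
merged

# Single-column alignment in the row order of `mhv`; unmatched tracts become NA
aligned_mhi <- mhi$mhi_per_change[match(mhv$GEOID, mhi$GEOID)]
aligned_mhi  # 0.30 NA 0.05
```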
library(classInt)
#defining breaks
bbins_mhv<-classIntervals(var = bexar_mhv$mhv_per_change,n = 5,style = "jenks")
bbins_mhi<-classIntervals(var = bexar_mhi$mhi_per_change,n = 5,style = "jenks")
library(RColorBrewer)
library(mapview)
mapviewOptions(fgb = FALSE)
# Note: the mapview() calls below did not render when knitting this document (possibly because of the length of the referenced file path), but they run successfully in the R console:
mapview(bexar_mhv,zcol="mhv_per_change",col.regions=brewer.pal(9,"Greens"),at=bbins_mhv$brks)
mapview(bexar_mhi,zcol="mhi_per_change",col.regions=brewer.pal(9,"Greens"),at=bbins_mhi$brks)
#troubleshooting that did not work:
#install.packages("htmltools")
#library(htmltools)
#bexar_mhv <- bexar_mhv[!is.na(bexar_mhv$mhv_per_change), ]
#bexar_mhv <- st_transform(bexar_mhv, crs = 4326)
#print(mapview(bexar_mhv, zcol = "mhv_per_change", at = bbins_mhv$brks))
Below are maps showing the percentage change in Median Home Value and Median Household Income by census tract in Bexar County between 2017 and 2022:
# Map Interpretation

The maps depict the percentage change in Median Home Value (MHV) and Median Household Income (MHI) from 2017 to 2022, with shading intensity representing the degree of change. Darker green areas indicate higher growth, while lighter green and gray areas reflect slower, stagnant, or negative changes.
In the first map (MHV change), the highest home value increases are concentrated in the central and northern parts of Bexar County, possibly reflecting rising demand, new development, or gentrification. In contrast, areas on the western and southern edges show more moderate or stagnant growth, suggesting different economic conditions or lower housing market activity.
The second map (MHI change) reveals that income growth does not always align with home value appreciation. While some neighborhoods with high home value growth also experience rising incomes, others show a mismatch, where home values rise significantly but income growth lags behind. This pattern may indicate potential affordability concerns or displacement risks in certain areas. Generally, slower income growth in the southern and western parts of the county could reflect economic stagnation or lower-wage job availability.
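One way to locate the mismatch described above is to compare the two changes tract by tract. The sketch assumes a data frame in which home value and income changes are already aligned by tract; the tract IDs, values, and the `growth_gap` and `value_outpaces_income` column names are hypothetical.

```r
# Hypothetical tract data, one row per tract, changes already aligned
tracts <- data.frame(
  GEOID = c("A", "B", "C", "D"),          # placeholder IDs, not real GEOIDs
  mhv_per_change = c(0.80, 0.30, 0.55, 0.10),
  mhi_per_change = c(0.15, 0.35, 0.50, NA)
)

# Gap between home value growth and income growth (in proportion points)
tracts$growth_gap <- tracts$mhv_per_change - tracts$mhi_per_change

# TRUE where home values grew faster than incomes; NA where income data are missing
tracts$value_outpaces_income <- tracts$growth_gap > 0
tracts
```

A positive gap is only a descriptive flag; on its own it does not establish affordability pressure or displacement.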
Comparing both maps highlights the areas where rising home values may be outpacing income growth, which could signal increasing affordability pressures and potential displacement of lower-income residents.

##### 1.7 Visualizing Data: Boxplot
library(ggplot2)
ggplot(bexar_mhi, aes(x = factor(disp_mhi_pc, labels = c("Low Growth", "High Growth")), y = mhi_per_change)) +
geom_boxplot(fill = "lightblue", color = "black") +
labs(title = "Boxplot of MHI Percentage Change by Disproportionate Status",
x = "Disproportionate Status",
y = "MHI Percentage Change") +
theme_minimal()
# Boxplot of MHI Percentage Change by Disproportionate Status
Below is the boxplot showing MHI percentage change by disproportionate status:
# Boxplot Interpretation

This boxplot illustrates the percentage change in median household income (MHI) by disproportionate status, revealing notable differences in income variability. Because the groups are defined by each tract's own MHI change (High Growth = at or above the county median), the High Growth group necessarily shows a higher median income increase; it also shows greater variability, with several outliers indicating significant income gains. In contrast, Low Growth areas show a smaller median income change with less variability, suggesting more stable but slower economic shifts.
# Chi-Square
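# Align income status to the home value rows by GEOID so each pair refers to the same tract
disp_mhi_aligned <- bexar_mhi$disp_mhi_pc[match(bexar_mhv$GEOID, bexar_mhi$GEOID)]
# Create a contingency table (rows: home value status, columns: income status)
contingency_table <- table(bexar_mhv$disp_mhv_pc, disp_mhi_aligned)
# Perform the Chi-square test (Yates' continuity correction is applied by default to 2x2 tables)
chi_sq <- chisq.test(contingency_table)
chi_sq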
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 9.1696, df = 1, p-value = 0.002461
To test whether there is a significant association between disproportionate income growth and disproportionate home value growth across census tracts, an appropriate method is a Chi-square test of independence.
The contingency table shows how many census tracts fall into each combination of disproportionate home value growth and disproportionate income growth. The chisq.test() function tests whether the two categorical variables are associated, against the null hypothesis that they are independent.
Since the p-value from the Chi-square test (0.0025) is less than 0.05, we reject the null hypothesis of independence: there is a significant association between disproportionate income growth and disproportionate home value growth. Given that census tracts with disproportionate home value growth tend to also have disproportionate income growth, this suggests that areas with high home value appreciation are also experiencing significant income increases, and vice versa. This could also mean that areas with rising home values (which may result from gentrification or increased demand) are attracting higher-income households, although the test itself establishes association, not the direction of causality.
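To make the mechanics of the test concrete, the sketch below computes the Yates-corrected chi-square statistic by hand for a hypothetical 2x2 table and compares it with `chisq.test()`. The counts are invented, since the study's contingency table is not printed here.

```r
# Hypothetical counts of tracts (rows: home value status, columns: income status)
tab <- matrix(c(60, 40,
                40, 60),
              nrow = 2, byrow = TRUE,
              dimnames = list(MHV = c("Low", "High"), MHI = c("Low", "High")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic with Yates' continuity correction: subtract 0.5 from each
# absolute deviation before squaring (every deviation here exceeds 0.5)
yates_stat <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Built-in test; the correction is applied by default to 2x2 tables
res <- chisq.test(tab)
c(by_hand = yates_stat, chisq_test = unname(res$statistic))  # both 7.22
res$p.value
```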
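# T-Test
# Home value growth for each tract (numeric vector)
bexar_mhv_numeric <- bexar_mhv$mhv_per_change
# Align income status to the home value rows by GEOID so each value refers to the same tract
disp_mhi_aligned <- bexar_mhi$disp_mhi_pc[match(bexar_mhv$GEOID, bexar_mhi$GEOID)]
# Subset home value growth into tracts with low (0) and high (1) income growth
low_growth_mhi <- bexar_mhv_numeric[which(disp_mhi_aligned == 0)]
high_growth_mhi <- bexar_mhv_numeric[which(disp_mhi_aligned == 1)]
# Run the t-test (Welch's unequal-variance version is the R default)
t_test_result <- t.test(low_growth_mhi, high_growth_mhi)
t_test_result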
##
## Welch Two Sample t-test
##
## data: low_growth_mhi and high_growth_mhi
## t = -3.8859, df = 283.94, p-value = 0.000127
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.20977044 -0.06870909
## sample estimates:
## mean of x mean of y
## 0.4963743 0.6356140
To test whether average median home value growth differs between tracts with low and high disproportionate median income growth, a two-sample t-test is an appropriate approach; R's t.test() uses Welch's version by default, which does not assume equal variances in the two groups.
The low_growth_mhi and high_growth_mhi subsets hold the home value growth data for the corresponding categories of income growth. The t.test() function compares the means of these two groups and reports a p-value. If the p-value is less than 0.05, the null hypothesis that there is no difference in means can be rejected.
Since the p-value (0.000127) is below 0.05, we can conclude that there is a statistically significant difference in average median home value growth between areas with high and low disproportionate median income growth. Areas with high income growth averaged higher home value growth (about 64%) than areas with low income growth (about 50%). This is consistent with the idea that areas with rising incomes may also see higher demand for housing, contributing to increased property values. It could be a sign of gentrification or economic development in which wealthier populations move into previously lower-income neighborhoods, driving up both incomes and home values, although this test alone cannot establish that mechanism.
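The Welch statistic reported above can be reproduced by hand. The sketch below uses small, made-up vectors of home value growth in place of the study data and checks the manual t statistic and Welch-Satterthwaite degrees of freedom against `t.test()`.

```r
# Hypothetical home value growth for tracts with low vs. high income growth
low  <- c(0.40, 0.45, 0.50, 0.55, 0.60)
high <- c(0.55, 0.60, 0.70, 0.75, 0.90)

# Squared standard errors of each group mean (variances are not pooled)
v_low  <- var(low) / length(low)
v_high <- var(high) / length(high)

# Welch's t statistic: difference in means over the unpooled standard error
t_by_hand <- (mean(low) - mean(high)) / sqrt(v_low + v_high)

# Welch-Satterthwaite approximation for the degrees of freedom
df_by_hand <- (v_low + v_high)^2 /
  (v_low^2 / (length(low) - 1) + v_high^2 / (length(high) - 1))

# Built-in test; Welch's version is the default (var.equal = FALSE)
res <- t.test(low, high)
rbind(by_hand = c(t = t_by_hand, df = df_by_hand),
      t_test  = c(t = unname(res$statistic), df = unname(res$parameter)))
```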
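# Calculating the Pearson correlation between home value growth and income growth
# (income values are aligned to the home value rows by GEOID so each pair refers to the same tract)
mhi_change_aligned <- bexar_mhi$mhi_per_change[match(bexar_mhv$GEOID, bexar_mhi$GEOID)]
correlation_result <- cor(bexar_mhv$mhv_per_change, mhi_change_aligned, method = "pearson", use = "complete.obs")
correlation_result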
## [1] 0.2864498
To calculate the correlation between median home value growth and median income growth, the most appropriate statistical method is to use Pearson’s correlation coefficient. Pearson’s correlation coefficient measures the strength and direction of the linear relationship between two continuous variables. A coefficient of 1 indicates a perfect positive linear relationship; -1 indicates a perfect negative linear relationship; 0 indicates no linear relationship.
The correlation value of 0.29 indicates a weak positive relationship between the two variables: tracts with greater home value growth tend to also have greater income growth, but the relationship is not strong. Squaring the coefficient (0.29² ≈ 0.08) shows that only about 8% of the variation in one measure is linearly associated with the other. This suggests that other factors might be influencing the two variables separately. For example, while higher home value growth may coincide with higher income growth in some areas, other factors such as housing affordability, gentrification, or local economic conditions may play a significant role.
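The sketch below computes Pearson's r directly from its definition for a few hypothetical, tract-aligned changes (invented values, not study data) and checks it against `cor()`; squaring r gives the share of variance the two measures share linearly.

```r
# Hypothetical proportional changes for six made-up tracts, aligned by tract
mhv_change <- c(0.20, 0.35, 0.50, 0.65, 0.80, 0.95)
mhi_change <- c(0.10, 0.05, 0.30, 0.15, 0.40, 0.20)

# Deviations from each variable's mean
dx <- mhv_change - mean(mhv_change)
dy <- mhi_change - mean(mhi_change)

# Pearson's r: sum of cross-products scaled by the root of the sums of squares
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent
r_builtin <- cor(mhv_change, mhi_change, method = "pearson")

# r and r-squared
c(r = r_by_hand, r_squared = r_by_hand^2)
```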
1. 2050 demographic projections. Alamo Area Metropolitan Planning Organization. https://aampo-mobility-2050-atginc.hub.arcgis.com