Blog Post 3: Preliminary Modeling Results

Examining Gentrification and Neighborhood Change in Bexar County, Texas Using Principal Components Analysis

This study examines changing patterns of urban characteristics in Bexar County, Texas in 2019, with a focus on patterns of urban displacement and housing segregation through principal components analysis (PCA). This report pairs geospatial data visualization with the analysis to explore the spatial nature of gentrification in the county during this time period.

Research Questions

Given the multidimensionality of gentrification, this study aims to examine the following research questions:

Which Bexar County census tracts have experienced the greatest effects of gentrification?
How do demographic, socioeconomic, and housing characteristics relate to the gentrification process within Bexar County census tracts?

Data and Methods

This study uses data from the American Community Survey (ACS) and other Census data, made available by the U.S. Census Bureau. Five-year estimates of the following variables were drawn from the 2019 ACS (covering 2015-2019):

the percentage of the total population by Black, Hispanic, and white racial/ethnic identity;
the percentage of the population 25 years or older with a bachelor’s degree or higher;
the percentage of the total population living in poverty;
median house value;
median household income; and
median gross rent

Using the indicators outlined above, this study uses PCA to construct a gentrification index for Bexar County census tracts and to examine how neighborhood gentrification has affected county residents.
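As a minimal sketch of how a PCA-based index can be formed, the example below runs base R's prcomp() on simulated tract-level data rather than the ACS extract. The data frame `toy`, its column names, and all values are hypothetical; the appendix uses PCA() from FactoMineR on the real data instead.

```r
# Simulated tract-level indicators (hypothetical values, not ACS data)
set.seed(42)
n <- 200
latent <- rnorm(n)  # unobserved neighborhood status driving the indicators
toy <- data.frame(
  pct_educ  = 30 + 10 * latent + rnorm(n, sd = 4),
  med_value = 150000 + 40000 * latent + rnorm(n, sd = 15000),
  med_inc   = 55000 + 12000 * latent + rnorm(n, sd = 5000),
  med_rent  = 1000 + 150 * latent + rnorm(n, sd = 80)
)

# Standardize each variable, then extract the principal components
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Tract scores on the first component serve as the index
index <- pca$x[, 1]

# The sign of a principal component is arbitrary; flip it if needed so that
# higher index values correspond to higher median income
if (cor(index, toy$med_inc) < 0) index <- -index

summary(index)
```

Standardizing first matters here because the indicators are on very different scales (percentages versus dollars); without it, the dollar-valued variables would dominate the first component.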

Prior academic research and guidelines set forth by the U.S. Department of Housing and Urban Development indicate that a census tract is at risk of gentrification only if its income level is less than 80% of the surrounding metropolitan area’s median income. This criterion is consistent with the widely accepted definition of gentrification as the process whereby residents of lower-income urban communities are displaced by more affluent residents, specifically through inflated rents or property values brought about by “urban renewal” policies.
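The 80% income criterion can be illustrated with a short sketch. The tract labels, incomes, and metro-area median below are made up for illustration and are not taken from the ACS.

```r
# Hypothetical tract-level median household incomes (not ACS data)
tracts <- data.frame(
  GEOID   = c("A", "B", "C", "D"),
  med_inc = c(32000, 45000, 58000, 90000)
)

# Assumed metro-area median household income, for illustration only
metro_median <- 57000

# A tract is flagged as at risk when its income is below 80% of the metro median
threshold <- 0.8 * metro_median
tracts$eligible <- tracts$med_inc < threshold

tracts
```

With these made-up numbers the cutoff is $45,600, so tracts A and B are flagged and tracts C and D are not.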

Results

According to the summary table of eigenvalues, the first principal component explains 68% of the total variance in the dataset, the second 16.4%, the third 8%, the fourth 4%, the fifth 2%, the sixth 1%, and the seventh 0.2%. Together, the first three components of the PCA model account for almost 93% of the variation in the input variables. The first component has an eigenvalue greater than 4, the second has an eigenvalue greater than 1, and the third has an eigenvalue of about 0.6.
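The percentages and eigenvalues above can be cross-checked against each other. When PCA is run on standardized variables, the eigenvalues sum to the number of input variables (seven in the appendix code), so each eigenvalue is approximately its share of variance multiplied by seven. The sketch below uses only the rounded percentages quoted above, so its results are approximate; the eigenvalue-greater-than-1 rule shown at the end is a commonly used retention heuristic, included here as an illustration rather than as the rule this study applied.

```r
# Rounded percentages of variance reported above, components 1 through 7
pct_var <- c(68, 16.4, 8, 4, 2, 1, 0.2)

# With standardized inputs, the eigenvalues sum to the number of variables
n_vars <- length(pct_var)

# Approximate eigenvalue implied by each percentage
eig <- pct_var / 100 * n_vars
round(eig, 2)

# Cumulative percentage of variance explained
cumsum(pct_var)

# Components with an implied eigenvalue above 1
which(eig > 1)
```

The implied eigenvalues for the first three components are about 4.76, 1.15, and 0.56, which agrees with the values described above. The rounded percentages for those three components sum to 92.4%, slightly below the reported figure, presumably because of rounding.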

According to the summary statistics, the percentage of the population 25 years or older with a bachelor’s degree or higher and the percentage of the total population that is white account for the most variation in the index. Median home value, median household income, and median rent are also strong explanatory variables for the overarching latent variable of gentrification. The percentage of the total population that is Hispanic is negatively correlated with the other factors, indicating a poor fit of this variable for the index. The percentage of the total population that is Black is uncorrelated with the other factors.
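To show how such variable-level patterns can be read from a PCA, the sketch below computes the correlation of each input variable with the first-component scores on simulated data. The variable names echo those in the appendix, but the values are randomly generated and the relationships are built in by assumption; nothing here reproduces the Bexar County results.

```r
set.seed(1)
n <- 300
latent <- rnorm(n)  # simulated latent factor
toy <- data.frame(
  pct_educ  = latent + rnorm(n, sd = 0.5),   # rises with the latent factor
  med_inc   = latent + rnorm(n, sd = 0.5),   # rises with the latent factor
  pct_hisp  = -latent + rnorm(n, sd = 0.5),  # falls as the latent factor rises
  pct_black = rnorm(n)                       # unrelated to the latent factor
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Correlation of each original variable with the first-component scores
cor_pc1 <- cor(toy, pca$x[, 1])[, 1]

# Component signs are arbitrary; orient so pct_educ correlates positively
if (cor_pc1["pct_educ"] < 0) cor_pc1 <- -cor_pc1

round(cor_pc1, 2)
```

In this simulation, a variable that moves against the others shows a large negative correlation with the first component, while an unrelated variable shows a correlation near zero. In FactoMineR, the analogous correlations are what dimdesc() reports in the appendix.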

Below are the scree plot of the eigenvalues and the radial plot of the variables for this PCA.

Appendix

The following section presents the R code used to produce the results examined in this report:

#load libraries
library(tidyverse)

library(tidycensus)
library(dplyr)
library(tigris)

library(car)

library(stargazer)

library(pander)
library(ggplot2)
library(knitr)
library(sf)

library(purrr)

#download data
#replace the placeholder with your own Census API key; do not publish a real key
census_api_key(key = "YOUR_CENSUS_API_KEY", install = TRUE, overwrite = TRUE)

## Your original .Renviron will be backed up and stored in your R HOME directory if needed.

## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY"). 
## To use now, restart R or run `readRenviron("~/.Renviron")`

#load the variable list for the 2019 ACS 5-year data profile tables
v2019_Profile = load_variables(2019,
                               "acs5/profile",
                               cache=T)

#search for variables using grep()
#v2019_Profile[grep(x = v2019_Profile$label,
                  # "value",
                   #ignore.case = T),
              #c("name", "label")]

#retrieve data
bexar_acs19 = get_acs(geography = "tract",
                             year = 2019,
                             state = "TX",
                             county = "Bexar County",
                             #pass variable names without the E/M suffix; get_acs() appends them
                             variables = c("DP05_0078P", "DP05_0071P", "DP05_0077P", 
                                           "DP02_0068P",
                                           "DP04_0089", "DP04_0134", "B19013_001"),
                             output = "wide",
                             geometry = TRUE)

## Getting data from the 2015-2019 5-year ACS

## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.

## Fetching data by table type ("B/C", "S", "DP") and combining the result.

#rename variables and drop tracts with missing values
bexar_acs19.pca = bexar_acs19 %>% 
  select("GEOID", "DP05_0078PE", "DP05_0071PE", "DP05_0077PE", "DP02_0068PE", "DP04_0089E", "DP04_0134E", "B19013_001E") %>% 
  rename(c(pct_black = "DP05_0078PE",
         pct_hisp = "DP05_0071PE",
         pct_white = "DP05_0077PE",
         pct_educ = "DP02_0068PE",
         med.value = "DP04_0089E",
         med.rent = "DP04_0134E",
         med.inc = "B19013_001E")) %>% 
  filter(complete.cases(pct_black, pct_hisp, pct_white, pct_educ, med.value, med.rent, med.inc))

head(bexar_acs19.pca)

library(FactoMineR)

library(factoextra)

bexar_acs19.pca2 = bexar_acs19.pca %>% 
  select(-"GEOID") %>% 
  st_set_geometry(NULL)

#run the PCA on z-scored (standardized) variables
bexar_pca.19.z = PCA(bexar_acs19.pca2[, c(1:7)],
                    scale.unit = TRUE,
                    graph = FALSE)

summary(bexar_pca.19.z)

#eigenvalues
eigenvalues = bexar_pca.19.z$eig
print(eigenvalues)

#variable results: coordinates, correlations, cos2, and contributions
bexar_pca.19.z$var

#correlation summary
desc = dimdesc(bexar_pca.19.z)
desc$Dim.1

#scree plot (factoextra is already loaded above)
fviz_screeplot(bexar_pca.19.z)

#radial plot of the variables (correlation circle), colored by contribution
fviz_pca_var(bexar_pca.19.z,
             col.var="contrib")+
  theme_minimal()