Explanation of this document and its variables

The purpose of this document is to show how the Austin census tracts are shaping up in my research as well as to explore some of the preliminary PCA work I’ve been doing.

I’ve pulled the following variables from the Logan Thomas Database:

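pnhwXX: percent non-Hispanic white (NHW)

peduXX: number of people with 4 years of college / number of people aged 25 or older

mhhincXX: median household income

In each name, XX is the two-digit year suffix (00, 12, or 19).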

I’ve also created five-year “bins” labeled y06_10, y11_15, and y16_20 that hold the total number of building permits issued in a given census tract during that five-year period.
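As a rough illustration of how bins like these could be built, the sketch below counts permits per tract in each five-year window. The `permits` data frame, its `tract` and `year` columns, and the tract labels are all made up for the example; the real permit data and its column names are not shown in this document.

```r
# Hypothetical permit records: one row per issued permit (made-up values)
permits <- data.frame(
  tract = c("T1", "T1", "T2", "T2", "T2"),
  year  = c(2007, 2012, 2010, 2016, 2020)
)

# Assign each permit to a five-year bin; intervals are closed on the right,
# so 2010 falls in y06_10 and 2015 would fall in y11_15
permits$bin <- cut(
  permits$year,
  breaks = c(2005, 2010, 2015, 2020),
  labels = c("y06_10", "y11_15", "y16_20")
)

# Count permits per tract and bin, giving one row per tract
bins <- as.data.frame.matrix(table(permits$tract, permits$bin))
bins$tract <- rownames(bins)
print(bins)
```

Using `cut()` with explicit breaks keeps the bin boundaries visible in one place, which makes it easy to confirm that boundary years such as 2010 land in the intended bin.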

In addition to those variables, I’ve created a few “change” variables for use in the latent profile analysis (LPA) later. I’ll just show the raw code below:

# Filter out the high-income tracts and create the variables for tidyLPA
library(dplyr)

bps_ausw <- bps_ausw %>%
  filter(neig >= 3) %>%
  mutate(
    # Differences between consecutive five-year permit bins.
    # Note: each is computed as earlier minus later, so a positive value
    # means fewer permits in the later period.
    asc12 = y06_10 - y11_15,
    asc19 = y11_15 - y16_20,
    asctot = asc12 + asc19,
    # ifelse() returns a single type, so the fallback is the string "0"
    asc_cat = ifelse(asc12 >= 0 & asc19 <= 0, "12",
                     ifelse(asc19 >= 0 & asc12 <= 0, "19",
                            ifelse(asc12 > 0 & asc19 > 0, "both", "0"))),
    # Changes in percent NHW and in the college-educated share
    ch_wh12 = pnhw12 - pnhw00,
    ch_wh19 = pnhw19 - pnhw12,
    ch_ed12 = pedu12 - pedu00,
    ch_ed19 = pedu19 - pedu12,
    # z-scores; as.numeric() drops the one-column matrix that scale() returns
    asc12z = as.numeric(scale(asc12)),
    asc19z = as.numeric(scale(asc19)),
    asctotz = as.numeric(scale(asctot)),
    ch_wh12z = as.numeric(scale(ch_wh12)),
    ch_wh19z = as.numeric(scale(ch_wh19)),
    ch_ed12z = as.numeric(scale(ch_ed12)),
    ch_ed19z = as.numeric(scale(ch_ed19))
  ) %>%
  na.omit()
# table(bps_ausw$asc_cat)

# Intersection: tracts flagged in both periods (asc12 > 0 and asc19 > 0)
dat_bps_aus <- bps_ausw %>%
  filter(asc12 > 0 & asc19 > 0) %>%
  na.omit()

# Union: tracts flagged in at least one period (asc12 > 0 or asc19 > 0)
dat_bps_aus_or <- bps_ausw %>%
  filter(asc12 > 0 | asc19 > 0) %>%
  na.omit()
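Because asc12 and asc19 are computed as the earlier bin minus the later bin, it is worth checking which tracts each category actually captures. The sketch below applies the same nested `ifelse()` to four made-up tracts, once with the differences as written and once with the later bin minus the earlier bin. The second version is only an assumption about the intended meaning of "ascending"; the toy counts and the `categorise()` helper are hypothetical.

```r
# Made-up permit counts for four hypothetical tracts
toy <- data.frame(
  tract  = c("A", "B", "C", "D"),
  y06_10 = c(10, 30, 50, 10),
  y11_15 = c(20, 20, 40, 20),
  y16_20 = c(30, 25, 30, 15)
)

# Same nested ifelse() as in the code above, wrapped in a helper
categorise <- function(d12, d19) {
  ifelse(d12 >= 0 & d19 <= 0, "12",
         ifelse(d19 >= 0 & d12 <= 0, "19",
                ifelse(d12 > 0 & d19 > 0, "both", "0")))
}

# As written above: earlier bin minus later bin
toy$cat_as_written <- categorise(toy$y06_10 - toy$y11_15,
                                 toy$y11_15 - toy$y16_20)

# Alternative (an assumption about intent): later bin minus earlier bin
toy$cat_later_minus_earlier <- categorise(toy$y11_15 - toy$y06_10,
                                          toy$y16_20 - toy$y11_15)

print(toy)
```

Tract A gains permits in both periods, yet it lands in "0" with the differences as written and in "both" only when the subtraction is reversed. Tract C, which loses permits in both periods, shows the opposite pattern.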

Map 1: Austin tracts with an increasing number of building permits

Mapping the tracts (note that the missing tracts are those in the top 40% of mhhinc):

Map distinguishing the categories of increasing building permits

I’m still working on the legend, but “0” means no increase in building permits, “12” means an increase from 2006-2010 to 2011-2015, “19” means an increase from 2011-2015 to 2016-2020, and “both” means an increase in both periods.

Latent Profile Analysis

Jumping into LPA at this point leads to the following output. The first graph shows the results of the LPA without MplusAutomation, and the second graph shows the results with it. MplusAutomation is an R package that runs models in Mplus; tidyLPA can use it in place of its default mclust backend.

As a reminder, I’m trying to separate ascending neighborhoods into two distinct classes: those ascending due to gentrification and those ascending but not due to gentrification.
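To make the idea concrete, a latent profile model treats the tracts as a mixture of Gaussian classes and assigns each tract to the class with the highest posterior probability. The sketch below is not what tidyLPA or Mplus does internally. It is a bare-bones base R EM routine for the simplest specification (class-specific means, variances shared across classes, zero covariances), run on simulated data. The function name `fit_profiles()`, the k-means starting values, and the simulated data are all assumptions made for the illustration.

```r
# Minimal EM for a k-profile Gaussian mixture: class-specific means,
# variances shared across profiles, covariances fixed at zero
fit_profiles <- function(x, k, n_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  # Starting values from k-means (a choice made for this sketch)
  init   <- kmeans(x, centers = k, nstart = 10)
  mu     <- init$centers                     # k x p profile means
  prop   <- tabulate(init$cluster, k) / n    # profile proportions
  sigma2 <- apply(x, 2, var)                 # one variance per variable

  for (iter in seq_len(n_iter)) {
    # E-step: log(proportion) + log density of every row under each profile
    logd <- sapply(seq_len(k), function(j) {
      dens <- dnorm(x,
                    mean = rep(mu[j, ], each = n),
                    sd   = rep(sqrt(sigma2), each = n),
                    log  = TRUE)
      log(prop[j]) + rowSums(matrix(dens, nrow = n))
    })
    post <- exp(logd - apply(logd, 1, max))  # subtract row max for stability
    post <- post / rowSums(post)             # posterior profile probabilities

    # M-step: update proportions, means, and the shared variances
    nk     <- colSums(post)
    prop   <- nk / n
    mu     <- (t(post) %*% x) / nk
    sigma2 <- sapply(seq_len(p), function(v) {
      sum(post * outer(x[, v], mu[, v], "-")^2) / n
    })
  }
  list(class = max.col(post), means = mu, prop = prop)
}

# Simulated example: two well-separated groups of 30 and 20 observations
set.seed(1)
sim <- rbind(
  matrix(rnorm(30 * 2, mean = 0), ncol = 2),
  matrix(rnorm(20 * 2, mean = 6), ncol = 2)
)
fit <- fit_profiles(sim, k = 2)
print(table(fit$class))
```

With groups this far apart the two classes are recovered cleanly. Real tract data is unlikely to separate so neatly, which is one reason the fitted profiles can look murky.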

Just TidyLPA

Using TidyLPA with MplusAutomation

The MplusAutomation results seem a bit cleaner, but overall the results here are murky, so I’ve turned to PCA to see if I can eliminate some of the existing collinearity.

Principal Component Analysis

I tried several combinations of variables, but the one that seems cleanest and easiest to understand uses the raw variables without any transformation.
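The PCA call itself is not shown in this document. As a minimal sketch of the same kind of analysis, the code below runs a PCA in base R with `prcomp()` and derives the eigenvalues, percentages of variance, and variable coordinates. It uses the built-in `mtcars` data purely as a stand-in for the tract variables. It standardizes the variables because the percentages reported in this section equal each eigenvalue divided by 12 (the number of variables), which is what a PCA on standardized variables produces.

```r
# Built-in data used only as a stand-in for the tract variables
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# PCA on standardized variables (equivalent to using the correlation matrix)
pca <- prcomp(vars, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- pca$sdev^2
pct_var     <- 100 * eigenvalues / sum(eigenvalues)

# Variable coordinates: loadings rescaled by each component's standard
# deviation, which equals the variable-component correlations here
coords <- sweep(pca$rotation, 2, pca$sdev, `*`)

print(round(cbind(eigenvalue = eigenvalues, pct_var = pct_var), 3))
print(round(coords, 3))
```

With standardized inputs the eigenvalues sum to the number of variables, so each percentage is simply the eigenvalue divided by that count.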

The following are the eigenvalues and the coordinates of each variable on the dimensions:

##        eigenvalue percentage of variance
## comp 1  4.2910741              35.758951
## comp 2  3.0360447              25.300373
## comp 3  1.6380112              13.650093
## comp 4  1.2792640              10.660533
## comp 5  0.6686684               5.572237
## comp 6  0.2960917               2.467431

##                 Dim.1      Dim.2       Dim.3        Dim.4        Dim.5
## y06_10   -0.140132886  0.8006512 -0.39754643  0.216503678  0.114568318
## y11_15   -0.005910495  0.8945147 -0.26319145  0.047828745  0.088241002
## y16_20   -0.282247129  0.7797818 -0.38603114  0.172230888  0.066799789
## pnhw00    0.695676090  0.1628920 -0.30325341 -0.555629829 -0.146355529
## pnhw12    0.899775563  0.1348092 -0.06783231 -0.211967886 -0.059843759
## pnhw19    0.309011263  0.2439938  0.57763409 -0.215069297  0.674875483
## pedu00    0.943599526 -0.0360922 -0.18414486 -0.100493058  0.007778948
## pedu12    0.924760054  0.1087139 -0.14744685 -0.082110701  0.032578370
## pedu19    0.130893594  0.6387956  0.65101019  0.045492959 -0.217257446
## mhhinc00  0.683888586 -0.1935121  0.01146400  0.643254800  0.003113459
## mhhinc12  0.736274125 -0.1368964  0.05669803  0.606451629  0.040426305
## mhhinc19  0.173757721  0.6365353  0.59045999  0.007644548 -0.335959897
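One way to read the coordinate table: within a dimension, the squared coordinates add up to that dimension's eigenvalue, so each variable's squared coordinate is its share of the component. The sketch below checks this for Dim.1 using the values printed above; the sum comes to about 4.291, matching the first eigenvalue. The variable names `dim1`, `sum_sq`, and `contrib` are just labels for this example.

```r
# Dim.1 coordinates copied from the table above
dim1 <- c(
  y06_10 = -0.140132886, y11_15 = -0.005910495, y16_20 = -0.282247129,
  pnhw00 = 0.695676090, pnhw12 = 0.899775563, pnhw19 = 0.309011263,
  pedu00 = 0.943599526, pedu12 = 0.924760054, pedu19 = 0.130893594,
  mhhinc00 = 0.683888586, mhhinc12 = 0.736274125, mhhinc19 = 0.173757721
)

# Squared coordinates sum to the component's eigenvalue
sum_sq <- sum(dim1^2)
print(round(sum_sq, 3))

# Each variable's share of Dim.1, in percent
contrib <- 100 * dim1^2 / sum_sq
print(round(sort(contrib, decreasing = TRUE), 1))
```

On these numbers Dim.1 is driven mostly by the 2000 and 2012 education, race, and income variables, while the permit bins contribute very little to it.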

The scree plot:

From the eigenvalues and the scree plot, there appear to be four components worth keeping, with the two largest accounting for just over 60% of the variance.
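The retention decision can be checked directly from the printed eigenvalues. The sketch below recomputes the percentages and cumulative variance, and counts the components with an eigenvalue above 1. That cutoff is a common rule of thumb and is used here as an assumption; the text does not say it was the criterion applied.

```r
# Eigenvalues copied from the output above (first six components)
eig <- c(4.2910741, 3.0360447, 1.6380112, 1.2792640, 0.6686684, 0.2960917)

# Twelve standardized variables went into the PCA, so total variance is 12
n_vars  <- 12
pct     <- 100 * eig / n_vars
cum_pct <- cumsum(pct)

# Rule of thumb: keep components whose eigenvalue exceeds 1
n_keep <- sum(eig > 1)

print(data.frame(
  comp       = seq_along(eig),
  eigenvalue = eig,
  pct        = round(pct, 2),
  cum_pct    = round(cum_pct, 2)
))
print(n_keep)
```

The first two components cover about 61% of the variance and the first four about 85%.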

##               Dim.1     Dim.2       Dim.3       Dim.4       Dim.5
## y06_10 -0.140132886 0.8006512 -0.39754643  0.21650368  0.11456832
## y11_15 -0.005910495 0.8945147 -0.26319145  0.04782875  0.08824100
## y16_20 -0.282247129 0.7797818 -0.38603114  0.17223089  0.06679979
## pnhw00  0.695676090 0.1628920 -0.30325341 -0.55562983 -0.14635553
## pnhw12  0.899775563 0.1348092 -0.06783231 -0.21196789 -0.05984376
## pnhw19  0.309011263 0.2439938  0.57763409 -0.21506930  0.67487548

And as I prefer pictures, I thought this helped with understanding the categories of the dimensions:

This one makes much more sense. Bigger means a higher level of association; blue means a positive association and red a negative one:

Using the four principal components from the PCA in the LPA

Trying the two classes recommended earlier

TidyLPA

##         Dim.1      Dim.2      Dim.3      Dim.4      Dim.5
## 1  3.30057217  1.7414419  1.3628310  0.7031743 -1.1985835
## 2  0.65361441  1.0380128 -1.9558061 -1.9123510 -0.7221169
## 3  0.04070208  1.1473277  0.4249807 -1.7744552 -0.2330693
## 4 -2.00508960  2.6922397 -1.5245914  0.4344906  1.0486206
## 5 -1.08756407 -0.8885329  0.8587591 -0.7473335 -0.7110226
## 6 -3.43059088  0.9649757  1.8040117  0.2781315  1.3700463

TidyLPA with MplusAutomation

Playing around with the LPA models

comparing models

## Compare tidyLPA solutions:
## 
##  Model Classes AIC      BIC      Warnings
##  1     1       1159.352 1178.206         
##  1     2       1160.960 1191.597         
##  1     3       1128.420 1170.841         
##  6     1       1171.352 1204.346         
##  6     2       1116.402 1184.747         
##  6     3       1110.288 1213.983 Warning 
## 
## Best model according to AIC is Model 6 with 3 classes.
## Best model according to BIC is Model 1 with 3 classes.
## 
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 3 classes.
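The comparison can be reproduced from the printed table. The sketch below picks the best rows by AIC and BIC, then uses the identity BIC - AIC = k (log(n) - 2) to back out the number of free parameters k in each model. The sample size n = 78 is an assumption: it is half of the 156 rows tabulated later in this document and is not stated anywhere in the text.

```r
# Fit indices copied from the comparison table above
fits <- data.frame(
  model   = c(1, 1, 1, 6, 6, 6),
  classes = c(1, 2, 3, 1, 2, 3),
  AIC = c(1159.352, 1160.960, 1128.420, 1171.352, 1116.402, 1110.288),
  BIC = c(1178.206, 1191.597, 1170.841, 1204.346, 1184.747, 1213.983)
)

# Lowest value wins for both criteria
best_aic <- fits[which.min(fits$AIC), c("model", "classes")]
best_bic <- fits[which.min(fits$BIC), c("model", "classes")]

# BIC - AIC = k * (log(n) - 2), so the gap gives the parameter count k
n_obs <- 78  # assumption: half of the 156 rows tabulated later on
fits$n_par <- round((fits$BIC - fits$AIC) / (log(n_obs) - 2))

print(best_aic)
print(best_bic)
print(fits)
```

Under that assumption the gaps come out at almost exactly whole numbers: 8, 13, and 18 parameters for Model 1 and 14, 29, and 44 for Model 6. These match four indicators with shared variances and no covariances (Model 1) versus class-specific variances and covariances (Model 6). Estimating 44 parameters from 78 tracts may be part of why the three-class Model 6 carries a warning.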

Looking at recommended model

trying with just the first two dimensions

two categories

extracting data

working with the tidyLPA with MplusAutomation two category output

## 
##   1   2 
## 152   4

Trying to interpret the output: it contains twice as many rows as the number of census tracts I used.

It also looks like only two tracts fall into the second class (the 4 rows in class 2 above, halved). That is not the most promising result so far, but the sample size is pretty small.
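A quick way to confirm the doubling is to count rows per tract and then keep one row per tract before tabulating the classes. The sketch below uses a made-up data frame; `out`, `tract_id`, and the class values are hypothetical, since the extracted output and its column names are not shown here. It also assumes the duplicated rows of a tract carry the same class.

```r
# Hypothetical extracted output in which every tract appears twice
out <- data.frame(
  tract_id = rep(c("T1", "T2", "T3"), each = 2),
  Class    = c(1, 1, 1, 1, 2, 2)
)

# How many rows does each tract have?
rows_per_tract <- table(out$tract_id)
print(rows_per_tract)

# Keep the first row for each tract, then tabulate class membership
out_unique <- out[!duplicated(out$tract_id), ]
print(table(out_unique$Class))
```

If the rows of a tract turn out to differ in class, dropping duplicates would hide that, so it is worth comparing the duplicated rows before discarding them.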

Going to start working on increasing my sample size by geocoding Dallas and San Antonio

coding update

Craig, Scott

2022-05-03


Map 2: Facet by category, exploring change in percent NHW from 2000 to 2019

This was just a brief look at what the Austin data looks like. Obviously there is more to do here.
