Table of Contents


Preface


I. Maps

A. San Antonio

B. Austin

C. Dallas


II. PCA

A. Eigenvalues

B. Screeplot

C. Correlation Table


III. LPA

A. Normality Checks

B. Comparing Models

i. Mclust

ii. Mplus Automation

C. The Actual LPA

i. Mclust

a. Maps of Mclust results

1. San Antonio
2. Austin
3. Dallas

ii. Mplus Automation

a. Maps of Mplus Automation results

1. San Antonio
2. Austin
3. Dallas


IV. Questions

Preface


 

Some notes for this document:
 

Regarding the LPA, the main difference between Mclust and Mplus Automation (both used through the tidyLPA portion of this analysis) is that, while both rely on maximum likelihood estimation, Mclust uses an estimator that is more robust with respect to violations of normality. Mclust is also the default backend in the tidyLPA package.
I had assumed that using the PCA dimension scores would produce variables that pass normality tests; they do not.
I have done some reading on violations of normality in LPA, and from what I have seen, statisticians consider it unacceptable, while social scientists seem to think it adds to the results.
I intentionally left some of the code visible, as I think it helps when viewing the various results.




I. Maps showing increasing numbers of issued building permits



A. San Antonio

Why do the titles for all of my maps end up on the first map? Any ideas?
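My plotting code is not shown in this section, so this is only a guess at the cause, but if the maps are ggplot objects combined with patchwork, a ggtitle() belongs to whichever single panel it was added to, which would put it on the first map. A minimal sketch (dummy data and hypothetical object names) of giving each map its own title and using plot_annotation() for the shared one:

library(ggplot2)
library(patchwork)

## mk_map() builds a dummy stand-in for each city's tract map; the real maps
## presumably use sf layers, but the title behavior is the same
mk_map <- function(city) {
    ggplot(data.frame(x = rnorm(10), y = rnorm(10)), aes(x, y)) +
        geom_point() +
        ggtitle(city)   ## per-panel title stays with its own map
}

p_sa  <- mk_map("San Antonio")
p_aus <- mk_map("Austin")
p_dal <- mk_map("Dallas")

## plot_annotation() supplies one overall title for the combined layout,
## rather than a ggtitle() that attaches only to the first panel
(p_sa | p_aus | p_dal) +
    plot_annotation(title = "Issued building permits")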

B. Austin

C. Dallas

II. PCA

A. Eigenvalues

## keep the eigenvalue and percentage-of-variance columns from the PCA eigenvalue table
eigvar <- as.data.frame(eigenval1[, 1:2])
print(eigvar)
##         eigenvalue percentage of variance
## comp 1  5.14387408             36.7419577
## comp 2  3.32067121             23.7190800
## comp 3  2.16624855             15.4732039
## comp 4  0.73810342              5.2721673
## comp 5  0.61289046              4.3777890
## comp 6  0.55693207              3.9780862
## comp 7  0.44559074              3.1827910
## comp 8  0.26488817              1.8920584
## comp 9  0.24949322              1.7820944
## comp 10 0.17503057              1.2502184
## comp 11 0.14555464              1.0396760
## comp 12 0.07933219              0.5666585
## comp 13 0.06826988              0.4876420
## comp 14 0.03312080              0.2365771
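For reference, this is roughly how an eigenvalue table like eigenval1 comes out of FactoMineR; the names pca1 and pca_vars are placeholders rather than objects taken from this document, and the built-in USArrests data only keeps the sketch runnable.

library(FactoMineR)

## pca_vars stands in for the data frame of tract-level variables fed to the PCA
pca_vars <- USArrests

## run the PCA on standardized variables, suppressing the default plots
pca1 <- PCA(pca_vars, scale.unit = TRUE, graph = FALSE)

## pca1$eig holds the eigenvalues, percentage of variance, and cumulative percentage
eigenval1 <- pca1$eig
eigvar <- as.data.frame(eigenval1[, 1:2])
print(eigvar)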

B. Screeplot

C. Correlation Table

### Examining the top three dimensions

##             Dim.1      Dim.2       Dim.3        Dim.4       Dim.5
## y00_04  0.6489001  0.6082697 -0.05225295  0.141896912 -0.15848383
## y05_09  0.7084368  0.6604710 -0.04494231  0.055408625 -0.11546698
## y10_14  0.7256974  0.6278711 -0.04028562 -0.035566365 -0.08096414
## y15_19  0.6832288  0.6455477 -0.03450893 -0.004409448 -0.03571526
## inc_tot 0.4508568  0.5402738 -0.08119735 -0.072174637  0.57096953
## pnhw00  0.6703959 -0.4592231 -0.01664213 -0.480797623  0.05248946
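The scree plot itself does not carry over into this text version. Assuming the same FactoMineR object as in the sketch above (pca1, a placeholder name), the plot and the variable-dimension correlation table can be produced directly:

library(factoextra)

## scree plot of the percentage of variance explained by each component
fviz_eig(pca1, addlabels = TRUE)

## correlations of the original variables with the retained dimensions
round(pca1$var$cor[, 1:3], 2)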

III. LPA

A. Normality Checks

## 
##  Shapiro-Wilk normality test
## 
## data:  pca1_dim$dim1
## W = 0.56621, p-value < 2.2e-16
## 
##  Shapiro-Wilk normality test
## 
## data:  pca1_dim$dim2
## W = 0.63969, p-value < 2.2e-16
## 
##  Shapiro-Wilk normality test
## 
## data:  pca1_dim$dim3
## W = 0.67501, p-value < 2.2e-16
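For reference, these are base R shapiro.test() calls on the three retained dimension scores; they can also be run in one pass (assuming pca1_dim holds the scores as columns dim1 through dim3, as in the LPA code below):

## Shapiro-Wilk test for each retained PCA dimension
lapply(pca1_dim[, c("dim1", "dim2", "dim3")], shapiro.test)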

B. Comparing Models

i. Mclust

#   LPA

## comparing models
pca1_dim <- as.data.frame(pca1_dim)
pca1_dim%>%
    dplyr::select(dim1,dim2,dim3) %>%
    single_imputation() %>%
    estimate_profiles(1:3, 
                      variances = c("equal", "varying"),
                      covariances = c("zero", "varying")
                      ) %>%
    compare_solutions(statistics = c("AIC", "BIC"))
## Compare tidyLPA solutions:
## 
##  Model Classes AIC      BIC      Warnings
##  1     1       34106.38 34142.03         
##  1     2       33022.87 33082.29         
##  1     3       31553.70 31636.89         
##  6     1       34112.38 34165.85         
##  6     2                         Warning 
##  6     3                         Warning 
## 
## Best model according to AIC is Model 1 with 3 classes.
## Best model according to BIC is Model 1 with 3 classes.
## 
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 3 classes.
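A note on the labels above: in tidyLPA, Model 1 is the equal-variances / zero-covariances specification and Model 6 lets both variances and covariances vary, which are exactly the two combinations requested in the code. To see the full set of fit indices (log-likelihood, entropy, SABIC, class-size diagnostics, and so on) rather than only AIC and BIC, get_fit() can be called on the estimated profiles; a minimal sketch reusing the same pipeline (m1_fits is a placeholder name):

## estimate only the Model 1 (equal variances, zero covariances) solutions for 1-3 classes
m1_fits <- pca1_dim %>%
    dplyr::select(dim1, dim2, dim3) %>%
    single_imputation() %>%
    estimate_profiles(1:3, variances = "equal", covariances = "zero")

## full table of fit statistics for each number of classes
get_fit(m1_fits)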

ii. Mplus Automation

#   LPA




## comparing models

pca1_dim%>%
    dplyr::select(dim1,dim2,dim3) %>%
    single_imputation() %>%
    estimate_profiles(1:3, 
                      variances = c("equal", "varying"),
                      covariances = c("zero", "varying"),
                      package = "MplusAutomation"
                      ) %>%
    compare_solutions(statistics = c("AIC", "BIC"))
## Compare tidyLPA solutions:
## 
##  Model Classes AIC      BIC      Warnings
##  1     1       34106.38 34142.03         
##  1     2       32116.38 32175.79         
##  1     3       31290.20 31373.39         
##  6     1       34112.38 34165.85         
##  6     2                         Warning 
##  6     3                         Warning 
## 
## Best model according to AIC is Model 1 with 3 classes.
## Best model according to BIC is Model 1 with 3 classes.
## 
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 3 classes.
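One practical note: the MplusAutomation route only works when Mplus itself is installed, since tidyLPA writes and runs Mplus input files behind the scenes. To inspect the estimated profile means rather than only the fit comparison, get_estimates() works the same way; a minimal sketch (mplus_fits is a placeholder name):

## estimate the Model 1 solutions for 1-3 classes through the Mplus backend
mplus_fits <- pca1_dim %>%
    dplyr::select(dim1, dim2, dim3) %>%
    single_imputation() %>%
    estimate_profiles(1:3, variances = "equal", covariances = "zero",
                      package = "MplusAutomation")

## class-specific means and variances (with standard errors) for each solution
get_estimates(mplus_fits)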

C. The Actual LPA

i. Mclust

Table with number of classes


## 
##    1    2    3 
## 2417  271  124

Class vs. increasing building permit category: counts and proportions

 

##    
##        0    1    2    3
##   1 1980  241  149   47
##   2   23  103  103   42
##   3    5    9   49   61
##    
##         0     1     2     3
##   1 0.704 0.086 0.053 0.017
##   2 0.008 0.037 0.037 0.015
##   3 0.002 0.003 0.017 0.022
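For reference, the class sizes and the cross-tabulation above come from the class assignments that tidyLPA attaches to the data; a minimal sketch, assuming the chosen 3-class Model 1 fit is stored in lpa_fit and the increasing-permit category sits in a column permit_cat of the tract data frame tracts (all three names are placeholders):

## data with posterior probabilities and the assigned Class column
lpa_data <- get_data(lpa_fit)

## class sizes
table(lpa_data$Class)

## class vs. increasing-building-permit category: counts and overall proportions
tab <- table(lpa_data$Class, tracts$permit_cat)
tab
round(prop.table(tab), 3)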

a. Maps of Mclust results

1. San Antonio

2. Austin

3. Dallas

ii. Mplus Automation results

Table with number of classes
 

## 
##    1    2    3 
##  176 2592   44

Class vs. increasing building permit category: counts and proportions

 

##    
##        0    1    2    3
##   1   16   39   59   62
##   2 1992  312  220   68
##   3    0    2   22   20
##    
##         0     1     2     3
##   1 0.006 0.014 0.021 0.022
##   2 0.708 0.111 0.078 0.024
##   3 0.000 0.001 0.008 0.007

a. Maps of Mplus Automation results

Note: the class labels changed between the two estimations, so the colors on these maps do not represent the same classes as on the previous maps.
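Because the class numbers are arbitrary labels, one way to keep the colors comparable across backends is to recode the Mplus classes so that they line up with the Mclust ones before mapping; a minimal sketch (the tracts_mplus object and the particular 1-to-2 / 2-to-1 mapping are only illustrations, not taken from the actual results):

## illustrative lookup: Mplus class 1 -> 2, 2 -> 1, 3 -> 3; check it against the
## class sizes and profile means before relying on it
relabel <- c(2L, 1L, 3L)

## add an aligned class column to the hypothetical tract data before drawing the maps
tracts_mplus$class_aligned <- relabel[tracts_mplus$Class]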

1. San Antonio

## 
##      austin      dallas san_antonio 
##         529        1748         535

2. Austin

3. Dallas


IV. Questions
   

When examining the San Antonio Mclust map, I noticed that the area up by the Rim showed as ascending rather than gentrifying. I think this is because nothing existed there before that land was developed into stores and apartments. Is there a way I can refine the analysis to exclude these cases?

How does indentation work in the HTML output of R Markdown? I have googled it and still have not found answers that work for me.

Overall, what do you think of the results? I was hesitant to move forward from here before we had a chance to discuss them.