Some notes for this document:
Regarding LPA, the main difference between Mclust and MplusAutomation (seen within the tidyLPA part of this analysis) is that, while both use maximum likelihood estimation, Mclust uses an estimator that is more robust to violations of normality. Mclust is also the default back end in the tidyLPA package.
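For reference, switching between the two back ends in tidyLPA only takes the package argument of estimate_profiles(). A minimal sketch, assuming a data frame of indicator variables named df (a placeholder name):

library(tidyLPA)
# Default back end: mclust
fit_mclust <- estimate_profiles(df, n_profiles = 3)
# Same model estimated through Mplus (requires MplusAutomation and a local Mplus install)
fit_mplus <- estimate_profiles(df, n_profiles = 3, package = "MplusAutomation")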
I had assumed that using PCA would produce results that would pass normality tests; they don't.
I've done some research about violations of normality and LPA, and from what I've seen, statisticians think it's abhorrent, while social scientists seem to think that it adds to the results?
I purposefully left some of the code visible as I think it helps when
viewing the various results.
Why do the titles for all of my maps end up in the first map? Any ideas?
# Keep the eigenvalue and percentage-of-variance columns from the PCA
eigvar <- as.data.frame(eigenval1[, 1:2])
print(eigvar)
## eigenvalue percentage of variance
## comp 1 5.14387408 36.7419577
## comp 2 3.32067121 23.7190800
## comp 3 2.16624855 15.4732039
## comp 4 0.73810342 5.2721673
## comp 5 0.61289046 4.3777890
## comp 6 0.55693207 3.9780862
## comp 7 0.44559074 3.1827910
## comp 8 0.26488817 1.8920584
## comp 9 0.24949322 1.7820944
## comp 10 0.17503057 1.2502184
## comp 11 0.14555464 1.0396760
## comp 12 0.07933219 0.5666585
## comp 13 0.06826988 0.4876420
## comp 14 0.03312080 0.2365771
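The column names and "comp" row labels above match the eigenvalue table that FactoMineR's PCA() returns, so eigenval1 was presumably pulled from that object. A hedged sketch (pca1 and pca_vars are assumed names, not taken from this document):

library(FactoMineR)
# Run the PCA on the input variables (pca_vars is a placeholder data frame)
pca1 <- PCA(pca_vars, graph = FALSE)
# Eigenvalues and percentage of variance explained, as printed above
eigenval1 <- pca1$eig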
### Examining the top three dimensions
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## y00_04 0.6489001 0.6082697 -0.05225295 0.141896912 -0.15848383
## y05_09 0.7084368 0.6604710 -0.04494231 0.055408625 -0.11546698
## y10_14 0.7256974 0.6278711 -0.04028562 -0.035566365 -0.08096414
## y15_19 0.6832288 0.6455477 -0.03450893 -0.004409448 -0.03571526
## inc_tot 0.4508568 0.5402738 -0.08119735 -0.072174637 0.57096953
## pnhw00 0.6703959 -0.4592231 -0.01664213 -0.480797623 0.05248946
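If the table above holds the variable coordinates from that same PCA object (an assumption on my part), it could be reproduced with something like:

# Correlations of each input variable with the first five dimensions
# (pca1 is the hypothetical PCA object sketched earlier)
pca1$var$coord[, 1:5]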
##
## Shapiro-Wilk normality test
##
## data: pca1_dim$dim1
## W = 0.56621, p-value < 2.2e-16
##
## Shapiro-Wilk normality test
##
## data: pca1_dim$dim2
## W = 0.63969, p-value < 2.2e-16
##
## Shapiro-Wilk normality test
##
## data: pca1_dim$dim3
## W = 0.67501, p-value < 2.2e-16
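A minimal sketch of how these tests could be reproduced, assuming pca1_dim holds the individual scores on the first three dimensions:

# Shapiro-Wilk normality tests on each retained dimension's scores
shapiro.test(pca1_dim$dim1)
shapiro.test(pca1_dim$dim2)
shapiro.test(pca1_dim$dim3)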
# LPA
## comparing models
# Compare 1-3 class solutions for model 1 (equal variances, zero covariances)
# and model 6 (varying variances, varying covariances), estimated with mclust
pca1_dim <- as.data.frame(pca1_dim)
pca1_dim %>%
  dplyr::select(dim1, dim2, dim3) %>%
  single_imputation() %>%
  estimate_profiles(1:3,
                    variances = c("equal", "varying"),
                    covariances = c("zero", "varying")) %>%
  compare_solutions(statistics = c("AIC", "BIC"))
## Compare tidyLPA solutions:
##
## Model Classes AIC BIC Warnings
## 1 1 34106.38 34142.03
## 1 2 33022.87 33082.29
## 1 3 31553.70 31636.89
## 6 1 34112.38 34165.85
## 6 2 Warning
## 6 3 Warning
##
## Best model according to AIC is Model 1 with 3 classes.
## Best model according to BIC is Model 1 with 3 classes.
##
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 3 classes.
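Since both criteria point to Model 1 with 3 classes, one way to refit only that solution and pull the class assignments (e.g., for mapping) would be something along these lines; m1_k3 is an assumed object name, and get_data() returns the modeled data with posterior probabilities and a Class column:

# Refit the selected solution: model 1 (equal variances, zero covariances), 3 classes
m1_k3 <- pca1_dim %>%
  dplyr::select(dim1, dim2, dim3) %>%
  single_imputation() %>%
  estimate_profiles(3, variances = "equal", covariances = "zero")
# Class sizes from the modal assignments
m1_k3_data <- get_data(m1_k3)
table(m1_k3_data$Class)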
# LPA
## comparing models
# Same comparison, this time estimated through Mplus via MplusAutomation
# (requires a local Mplus installation)
pca1_dim %>%
  dplyr::select(dim1, dim2, dim3) %>%
  single_imputation() %>%
  estimate_profiles(1:3,
                    variances = c("equal", "varying"),
                    covariances = c("zero", "varying"),
                    package = "MplusAutomation") %>%
  compare_solutions(statistics = c("AIC", "BIC"))
## Compare tidyLPA solutions:
##
## Model Classes AIC BIC Warnings
## 1 1 34106.38 34142.03
## 1 2 32116.38 32175.79
## 1 3 31290.20 31373.39
## 6 1 34112.38 34165.85
## 6 2 Warning
## 6 3 Warning
##
## Best model according to AIC is Model 1 with 3 classes.
## Best model according to BIC is Model 1 with 3 classes.
##
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 1 with 3 classes.
##
## 1 2 3
## 2417 271 124
##
## 0 1 2 3
## 1 1980 241 149 47
## 2 23 103 103 42
## 3 5 9 49 61
##
## 0 1 2 3
## 1 0.704 0.086 0.053 0.017
## 2 0.008 0.037 0.037 0.015
## 3 0.002 0.003 0.017 0.022
#### Tables by class
##
## 1 2 3
## 176 2592 44
##
## 0 1 2 3
## 1 16 39 59 62
## 2 1992 312 220 68
## 3 0 2 22 20
##
## 0 1 2 3
## 1 0.006 0.014 0.021 0.022
## 2 0.708 0.111 0.078 0.024
## 3 0.000 0.001 0.008 0.007
##
## austin dallas san_antonio
## 529 1748 535
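For context, the tables above look like plain cross-tabulations of the LPA classes against a four-level grouping variable and the metro areas. A hedged reconstruction, where dat, class, grp, and city are all placeholder names:

# Class sizes
table(dat$class)
# Counts of each class within the 0-3 grouping variable
table(dat$class, dat$grp)
# The same table as overall proportions, rounded to three decimals
round(prop.table(table(dat$class, dat$grp)), 3)
# Observations per metro area
table(dat$city)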
## Questions?
When examining the SAT Mclust map, I noticed the area up by the Rim was showing as ascending rather than gentrifying. I think this is because nothing existed there before that land was developed into stores and apartments. Is there a way I can refine my analysis to exclude these outcomes?
How does indentation work in the HTML output of R Markdown? I've googled it and still haven't found answers that work for me.
Overall, what do you think of the results? I was hesitant to move forward from here before we had a chance to discuss them.