- Notes for this update
- Had some difficulties coding these last two weeks. Didn’t quite get
as far along as I had hoped, but I at least got some stuff mapped out. I
wrote some more (I’m up to 4-5 pages now, and its helped me with my
coding, I think). I created a new variable that examines how many times
a census tract has had an increasing number of issued building perits
from one 5 yr period to another. so the range is 0-3. This new variable
brought the number of dimensions in the PCA from three to four, which I
think is helpful for the LPA.
- I’ve been having a hard time getting separation of classes in the
LPA. I think I may need to introduce another new variable to help with
this. Hopefully we can talk about that at our meeting.
- Do you know anything about POMS (proportion of maximum)? I’ve found
that when I use this for the LPA instead of scaling the variables, I get
better separation of the classes. I’ve done some searching but I can’t
figure out why its helping me out. Below are some results from both
scaling and poms with the same data.
- I mapped out the results with the best separation in the three
cities I’m examining. The results don’t make sense to me, which is why I
think I may need to introduce another variable; I have some thoughts
about that.
- see you in the morning.
# PCA
pca_orig_inc <- PCA(bpsw_comb_inc[,c("inc","y00_04","y05_09","y10_14","y15_19", "pnhw00", "pnhw12", "pnhw19", "pedu00","pedu12","pedu19","mhhinc00","mhhinc12","mhhinc19","pop00","pop12","pop19")],
scale.unit = T)
## Warning in PCA(bpsw_comb_inc[, c("inc", "y00_04", "y05_09", "y10_14",
## "y15_19", : Missing values are imputed by the mean of the variable: you should
## use the imputePCA function of the missMDA package

## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

eigenval_originc <- pca_orig_inc$eig
eigvar_originc <- as.data.frame(eigenval_originc[,1:3])
print(eigvar_originc)
## eigenvalue percentage of variance cumulative percentage of variance
## comp 1 7.65724991 45.0426465 45.04265
## comp 2 3.09024537 18.1779140 63.22056
## comp 3 2.37743337 13.9849022 77.20546
## comp 4 1.07437952 6.3198795 83.52534
## comp 5 0.75618812 4.4481654 87.97351
## comp 6 0.45351914 2.6677597 90.64127
## comp 7 0.36071605 2.1218591 92.76313
## comp 8 0.31929404 1.8782002 94.64133
## comp 9 0.24562897 1.4448763 96.08620
## comp 10 0.13999370 0.8234923 96.90970
## comp 11 0.13154414 0.7737891 97.68348
## comp 12 0.11017585 0.6480932 98.33158
## comp 13 0.08809204 0.5181885 98.84977
## comp 14 0.06623267 0.3896039 99.23937
## comp 15 0.05044174 0.2967161 99.53609
## comp 16 0.04748785 0.2793403 99.81543
## comp 17 0.03137751 0.1845736 100.00000
#Scree plot
fviz_screeplot(pca_orig_inc,ncp = 10)

# examining first four dimensions
var_originc <- get_pca_var(pca_orig_inc)
head(var_originc$coord)
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## inc 0.2354447 0.3217447 -0.14099205 0.715390629 0.52342456
## y00_04 0.4114169 0.6827655 -0.17700287 -0.337668785 -0.12628804
## y05_09 0.4511454 0.7878772 -0.24364426 -0.171125094 -0.05458167
## y10_14 0.5194472 0.7657250 -0.24082660 0.001056546 -0.01358054
## y15_19 0.4324211 0.7796613 -0.25473639 0.017296643 0.06367327
## pnhw00 0.7890123 -0.2062235 -0.02126662 0.275483692 -0.33086247
corrplot::corrplot(var_originc$coord)

pca_orig_diminc <- as.data.frame(pca_orig_inc$ind$coord[,1:4])
dat_dim <- cbind(bpsw_comb_inc,pca_orig_diminc)
dat_dim <- dat_dim %>%
mutate(
dim1 = Dim.1,
dim2 = Dim.2,
dim3 = Dim.3,
dim4 = Dim.4
)
# comparing models
#pca_orig_diminc <- as.data.frame(pca_orig_diminc)
dat_dim%>%
dplyr::select(dim1,dim2,dim3,dim4) %>%
single_imputation() %>%
estimate_profiles(1:4,
variances = c("equal", "varying"),
covariances = c("zero", "varying")
) %>%
compare_solutions(statistics = c("AIC", "BIC"))
## Compare tidyLPA solutions:
##
## Model Classes AIC BIC Warnings
## 1 1 17771.70 17812.07
## 1 2 17611.48 17677.08
## 1 3 17103.89 17194.72
## 1 4 15459.14 15575.21 Warning
## 6 1 17783.70 17854.35
## 6 2 14495.13 14641.48
## 6 3 13985.51 14207.56
## 6 4 13517.37 13815.13
##
## Best model according to AIC is Model 6 with 4 classes.
## Best model according to BIC is Model 6 with 4 classes.
##
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 6 with 4 classes.
Scale vs POMS
Scale
y1 <- dat_dim %>%
dplyr::select(dim1,dim2,dim3,dim4) %>%
single_imputation() %>%
scale() %>%
estimate_profiles(4,
variances = "varying",
covariances = "varying") %>%
plot_profiles()

y2<- dat_dim %>%
dplyr::select(dim1,dim2,dim3,dim4) %>%
single_imputation() %>%
scale() %>%
estimate_profiles(4,package = "MplusAutomation",
variances = "varying",
covariances = "varying") %>%
plot_profiles()

POMS
y3 <- dat_dim %>%
dplyr::select(dim1,dim2,dim3,dim4) %>%
single_imputation() %>%
poms() %>%
estimate_profiles(4,
variances = "varying",
covariances = "varying") %>%
plot_profiles()

y4<- dat_dim %>%
dplyr::select(dim1,dim2,dim3,dim4) %>%
single_imputation() %>%
poms() %>%
estimate_profiles(4,package = "MplusAutomation",
variances = "varying",
covariances = "varying") %>%
plot_profiles()

Working with model y3
v3 <- dat_dim %>%
dplyr::select(dim1,dim2,dim3,dim4) %>%
single_imputation() %>%
poms() %>%
estimate_profiles(4,
variances = "varying",
covariances = "varying") %>%
get_data()
dat_dim$class <- v3$Class
table with numbers within the classes
table(v3$Class)
##
## 1 2 3 4
## 410 413 286 40
Mapping out the results in the three cities