chptr1_third

Notes for this update
- 1. Had some difficulties coding these last two weeks. Didn’t quite get as far along as I had hoped, but I at least got some stuff mapped out. I wrote some more (I’m up to 4-5 pages now, and its helped me with my coding, I think). I created a new variable that examines how many times a census tract has had an increasing number of issued building perits from one 5 yr period to another. so the range is 0-3. This new variable brought the number of dimensions in the PCA from three to four, which I think is helpful for the LPA.
- 1. I’ve been having a hard time getting separation of classes in the LPA. I think I may need to introduce another new variable to help with this. Hopefully we can talk about that at our meeting.
- 1. Do you know anything about POMS (proportion of maximum)? I’ve found that when I use this for the LPA instead of scaling the variables, I get better separation of the classes. I’ve done some searching but I can’t figure out why its helping me out. Below are some results from both scaling and poms with the same data.
- 1. I mapped out the results with the best separation in the three cities I’m examining. The results don’t make sense to me, which is why I think I may need to introduce another variable; I have some thoughts about that.
- 1. see you in the morning.

# PCA  
pca_orig_inc <- PCA(bpsw_comb_inc[,c("inc","y00_04","y05_09","y10_14","y15_19", "pnhw00", "pnhw12", "pnhw19", "pedu00","pedu12","pedu19","mhhinc00","mhhinc12","mhhinc19","pop00","pop12","pop19")],
           scale.unit = T)

## Warning in PCA(bpsw_comb_inc[, c("inc", "y00_04", "y05_09", "y10_14",
## "y15_19", : Missing values are imputed by the mean of the variable: you should
## use the imputePCA function of the missMDA package

## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

eigenval_originc <- pca_orig_inc$eig
eigvar_originc <- as.data.frame(eigenval_originc[,1:3])
print(eigvar_originc)

##         eigenvalue percentage of variance cumulative percentage of variance
## comp 1  7.65724991             45.0426465                          45.04265
## comp 2  3.09024537             18.1779140                          63.22056
## comp 3  2.37743337             13.9849022                          77.20546
## comp 4  1.07437952              6.3198795                          83.52534
## comp 5  0.75618812              4.4481654                          87.97351
## comp 6  0.45351914              2.6677597                          90.64127
## comp 7  0.36071605              2.1218591                          92.76313
## comp 8  0.31929404              1.8782002                          94.64133
## comp 9  0.24562897              1.4448763                          96.08620
## comp 10 0.13999370              0.8234923                          96.90970
## comp 11 0.13154414              0.7737891                          97.68348
## comp 12 0.11017585              0.6480932                          98.33158
## comp 13 0.08809204              0.5181885                          98.84977
## comp 14 0.06623267              0.3896039                          99.23937
## comp 15 0.05044174              0.2967161                          99.53609
## comp 16 0.04748785              0.2793403                          99.81543
## comp 17 0.03137751              0.1845736                         100.00000

#Scree plot
fviz_screeplot(pca_orig_inc,ncp = 10)

# examining first four dimensions
var_originc <- get_pca_var(pca_orig_inc)
head(var_originc$coord)

##            Dim.1      Dim.2       Dim.3        Dim.4       Dim.5
## inc    0.2354447  0.3217447 -0.14099205  0.715390629  0.52342456
## y00_04 0.4114169  0.6827655 -0.17700287 -0.337668785 -0.12628804
## y05_09 0.4511454  0.7878772 -0.24364426 -0.171125094 -0.05458167
## y10_14 0.5194472  0.7657250 -0.24082660  0.001056546 -0.01358054
## y15_19 0.4324211  0.7796613 -0.25473639  0.017296643  0.06367327
## pnhw00 0.7890123 -0.2062235 -0.02126662  0.275483692 -0.33086247

corrplot::corrplot(var_originc$coord)

pca_orig_diminc <- as.data.frame(pca_orig_inc$ind$coord[,1:4])
dat_dim <- cbind(bpsw_comb_inc,pca_orig_diminc)

dat_dim <- dat_dim %>% 
  mutate(
    dim1 = Dim.1,
    dim2 = Dim.2,
    dim3 = Dim.3,
    dim4 = Dim.4
  )

# comparing models
#pca_orig_diminc <- as.data.frame(pca_orig_diminc)
dat_dim%>%
    dplyr::select(dim1,dim2,dim3,dim4) %>%
    single_imputation() %>%
    estimate_profiles(1:4, 
                      variances = c("equal", "varying"),
                      covariances = c("zero", "varying")
                      ) %>%
    compare_solutions(statistics = c("AIC", "BIC"))

## Compare tidyLPA solutions:
## 
##  Model Classes AIC      BIC      Warnings
##  1     1       17771.70 17812.07         
##  1     2       17611.48 17677.08         
##  1     3       17103.89 17194.72         
##  1     4       15459.14 15575.21 Warning 
##  6     1       17783.70 17854.35         
##  6     2       14495.13 14641.48         
##  6     3       13985.51 14207.56         
##  6     4       13517.37 13815.13         
## 
## Best model according to AIC is Model 6 with 4 classes.
## Best model according to BIC is Model 6 with 4 classes.
## 
## An analytic hierarchy process, based on the fit indices AIC, AWE, BIC, CLC, and KIC (Akogul & Erisoglu, 2017), suggests the best solution is Model 6 with 4 classes.

Scale vs POMS

Scale

y1 <- dat_dim %>% 
  dplyr::select(dim1,dim2,dim3,dim4) %>% 
  single_imputation() %>% 
  scale() %>% 
  estimate_profiles(4,
                    variances = "varying",
                    covariances = "varying") %>% 
  plot_profiles()

y2<- dat_dim %>% 
  dplyr::select(dim1,dim2,dim3,dim4) %>% 
  single_imputation() %>% 
  scale() %>% 
  estimate_profiles(4,package = "MplusAutomation",
                    variances = "varying",
                    covariances = "varying") %>% 
  plot_profiles()

POMS

y3 <- dat_dim %>% 
  dplyr::select(dim1,dim2,dim3,dim4) %>% 
  single_imputation() %>% 
  poms() %>% 
  estimate_profiles(4,
                    variances = "varying",
                    covariances = "varying") %>% 
  plot_profiles()

y4<- dat_dim %>% 
  dplyr::select(dim1,dim2,dim3,dim4) %>% 
  single_imputation() %>% 
  poms() %>% 
  estimate_profiles(4,package = "MplusAutomation",
                    variances = "varying",
                    covariances = "varying") %>% 
  plot_profiles()

Working with model y3

v3 <- dat_dim %>% 
  dplyr::select(dim1,dim2,dim3,dim4) %>% 
  single_imputation() %>% 
  poms() %>% 
  estimate_profiles(4,
                    variances = "varying",
                    covariances = "varying") %>% 
  get_data()


dat_dim$class <- v3$Class

table with numbers within the classes

table(v3$Class)

## 
##   1   2   3   4 
## 410 413 286  40

chptr1_third_update

Craig, Scott

2022-10-02

Scale vs POMS

Scale

POMS

Working with model y3

table with numbers within the classes

Mapping out the results in the three cities

Austin

Dallas

San Antonio