En el presente documento hago analisis factorial con los 14 instrumentos de comportamiento del estudio de CBT en CAEs en Bogotá.
Hago un análisis de correlaciones, para tratar de entender las relaciones que posteriormente estimaremos con el análisis factorial.
| n | mean | sd | median | trimmed | mad | min | max | |
|---|---|---|---|---|---|---|---|---|
| TIME_C | 377 | 0.387 | 0.488 | 0.000 | 0.360 | 0.000 | 0.000 | 1.000 |
| STD_AQ_PT | 377 | 0.011 | 0.993 | 0.173 | 0.060 | 1.036 | -2.622 | 1.781 |
| STD_AQ_VT | 377 | 0.007 | 1.006 | 0.020 | -0.002 | 1.021 | -2.047 | 2.087 |
| STD_AQ_RT | 377 | 0.004 | 1.005 | 0.009 | 0.037 | 1.114 | -2.139 | 1.726 |
| STD_AQ_HT | 377 | -0.003 | 1.002 | -0.001 | 0.009 | 1.153 | -2.667 | 2.220 |
| STD_AQ_T | 377 | -0.009 | 1.005 | -0.014 | -0.022 | 1.047 | -2.505 | 2.887 |
| STD_SEF_T | 377 | 0.008 | 0.980 | 0.111 | 0.067 | 0.976 | -4.122 | 1.805 |
| STD_SE_T | 377 | 0.001 | 1.004 | 0.057 | 0.013 | 1.101 | -3.803 | 2.211 |
| STD_FT_T | 377 | 0.011 | 0.998 | 0.056 | 0.014 | 1.056 | -3.504 | 2.904 |
| STD_SS_T | 377 | -0.007 | 1.006 | 0.117 | 0.093 | 0.985 | -4.125 | 1.395 |
| STD_DT_T | 377 | 0.021 | 0.988 | 0.209 | 0.115 | 1.046 | -2.304 | 1.267 |
| STD_PER_T | 377 | 0.013 | 0.999 | 0.125 | 0.054 | 0.975 | -3.445 | 1.629 |
| STD_CNTRL_T | 377 | 0.008 | 1.000 | 0.043 | 0.020 | 0.892 | -3.088 | 2.330 |
| STD_STRS_T | 377 | -0.006 | 0.994 | 0.120 | 0.007 | 1.086 | -2.144 | 2.051 |
| STD_ANX_T | 377 | -0.008 | 1.006 | 0.050 | 0.007 | 1.039 | -2.180 | 1.834 |
| STD_DEP_T | 377 | -0.010 | 0.999 | 0.045 | 0.022 | 1.053 | -2.086 | 1.643 |
| STD_AGG_T | 377 | 0.005 | 1.003 | 0.031 | 0.054 | 1.179 | -2.241 | 1.617 |
| STD_INH_T | 377 | 0.003 | 1.007 | -0.023 | 0.012 | 0.834 | -2.940 | 2.648 |
| STD_GANG_T | 377 | 0.001 | 1.002 | -0.019 | 0.003 | 1.208 | -1.863 | 1.610 |
Note que por tratarse de datos panel, podemos estar inflando las correlaciones porque observamos al individuo antes y después del programa. Por lo tanto -y siguiendo la metodología de Attanasio-, divido la base por tiempo. Analizo nuevamente la estructura de correlaciones.
La matedología usada por Attanasio es descrita como: “The EFA is performed decomposing the polychoric correlation matrix of the items and using weighted least squares, and the solution is rescaled using oblique factor rotation (oblimin).”
Attanasio usa una “polychoric correlation matrix” porque sus variables son categóricas. Las matrices de esta naturaleza permiten trabajar con variable categóricas de dos o más categoría.
En nuestro caso, uso una matriz de correlación de Pearsons -el tratamiento estándar-, dado que nuestras varaibles han sido estandarizadas y se pueden tratar como continuas.
Siguiendo la metodología de Attanasio, se estiman las estructuras latentes para antes y después de la intervención. Usar la muestra completa violaría el supuesto de independencia y podría inglar las correlaciones entre las varaibles.
Comienzo por definir el número óptimo de estructuras latentes que se pueden extraer de los datos. Attanasio menciona que:
“As pointed out in Conti et al. (2014), there is relatively little agreement among procedures; this is the case especially for the Rutter items in the BCS data, where diferent methods suggest to retain between 1 and 3 factors, while most methods suggest to retain 2 factors in the MCS. In our analysis, we adopt two factors and a dedicated measurement system, where each measure reflects only one factor. This choice is justified both by the child psychology literature cited above, and as compromise to work with the same number of factors in the two cohorts.”
Attanasio escoge 2. Para la base de 1970 tres metodologías recomendaban 3 y en la base de 2000 5 metodologías recomentaron 2.
Para las observaciones antes de la intervención tenemos:
Para las observaciones después de la intervención tenemos:
Conjuntamente se tiene:
| Método | Número Óptimo - Antes | Número Óptimo - Después |
|---|---|---|
| Optimal Coordinates | 2 | 2 |
| Acceleration Factor | 2 | 2 |
| Parallel Analysis | 2 | 2 |
| Eigenvalues (Kaiser Criterion) | 2 | 2 |
| Velicer MAP | 2 | 2 |
| BIC | 3 | 3 |
| Sample Size Adjusted BIC | 6 | 7 |
| VSS Complexity 1 | 1 | 4 |
| VSS Complexity 2 | 2 | 2 |
Dado el resultado anterior, se decide estimar solo dos factores.
Dado el punto anterior, debemos hacer la estimación con dos factores. La estimación se realizará, siguiendo la metodología de Attanasio, por Mínimos Cuadrados Ponderas con el paquete psych. Además, usamos una rotación de tipo oblimin para facilitar el análisis. -Este tipo de rotación hace parte de los metodos oblicuos: “oblique rotation methods assume that the factors are correlated”.
antes_fa <- fa(antes[,c(7:20)], nfactors=2,fm="wls", rotate = "oblimin")
fa.diagram(antes_fa)despues_fa <- fa(despues[,c(7:20)], nfactors=2,fm="wls", rotate = "oblimin")
fa.diagram(despues_fa)x <- loadings(antes_fa,cut=0,digits=3)
x <- as.data.frame.array(x)
y <- loadings(despues_fa,cut=0,digits=3)
y <- as.data.frame.array(y)
tabla_lod <- data.frame(item=c(1:14), Var=c("STD_AQ_T", "STD_SEF_T","STD_SE_T","STD_FT_T","STD_SS_T", "STD_DT_T", "STD_PER_T", "STD_CNTRL_T", "STD_STRS_T", "STD_ANX_T", "STD_DEP_T", "STD_AGG_T", "STD_INH_T", "STD_GANG_T"), fac_1_1=x$WLS1, fac_1_2=x$WLS2, fac_2_1=y$WLS1, fac_2_2=y$WLS2)
tabla_lod%>%
kable(digits = 3, align = c("c", "l", "c", "c", "c", "c"), col.names = c("Item", "Variable", "Factor 1 - Antes", "Factor 2 - Antes", "Factor 1 - Después", "Factor 2 - Después"), caption = "Esta tabla replica la Tabla A7 de Attanasio, y establece los loadings de cada varaible por cohorte de los datos.")%>%
kable_styling()| Item | Variable | Factor 1 - Antes | Factor 2 - Antes | Factor 1 - Después | Factor 2 - Después |
|---|---|---|---|---|---|
| 1 | STD_AQ_T | 0.768 | 0.003 | 0.804 | -0.170 |
| 2 | STD_SEF_T | 0.030 | 0.720 | 0.038 | 0.741 |
| 3 | STD_SE_T | 0.259 | 0.550 | 0.387 | 0.516 |
| 4 | STD_FT_T | 0.449 | 0.313 | 0.605 | 0.150 |
| 5 | STD_SS_T | -0.051 | 0.659 | -0.131 | 0.699 |
| 6 | STD_DT_T | 0.182 | 0.313 | -0.014 | 0.614 |
| 7 | STD_PER_T | -0.208 | 0.742 | -0.097 | 0.681 |
| 8 | STD_CNTRL_T | 0.021 | 0.591 | 0.020 | 0.624 |
| 9 | STD_STRS_T | 0.920 | -0.090 | 0.963 | -0.113 |
| 10 | STD_ANX_T | 0.831 | -0.110 | 0.897 | -0.057 |
| 11 | STD_DEP_T | 0.807 | 0.018 | 0.852 | 0.123 |
| 12 | STD_AGG_T | 0.657 | 0.155 | 0.773 | 0.058 |
| 13 | STD_INH_T | 0.509 | 0.489 | 0.654 | 0.401 |
| 14 | STD_GANG_T | 0.603 | 0.094 | 0.625 | -0.044 |
Siguinedo la metodología de Attanasio, realizamos el análisis factorial confirmatorio por grupo. En nuestro caso tenemos dos diferencias con la metodología de Attanasio:
Nuestras varaibles son continuas, por lo que no necesitamos threshold invariance.
Dado que tenemos varios indicadores observables y solo dos factores, hacemos la estimación con máxima verosimilitud y no con Mínimos Cuadrados Ponderados como hace Attanasio. “Weighted least squares is recommended when you have many factors and not so many factor indicators. Maximum likelihood is recommended when you have few factors and many factor indicators. Results should not differ.”
Así, pues, el modelo configural model es:
Con el propósito de hacer el análisis de Measurement Invariance establecemos 3 modelos en adición al modelo configural:
configural <- cfa(model = modelo1, estimator="ML", data = data, group = "TIME_C", parameterization = "theta",
std.lv=TRUE, orthogonal=F)
parameterEstimates(configural, standardized=TRUE) %>%
filter(op == "=~") %>%
filter(lhs == "factor1") %>%
mutate(stars = ifelse(pvalue < .001, "***",
ifelse(pvalue < .01, "**",
ifelse(pvalue < .05, "*", "")))) %>%
select('Latent Factor'=lhs,
'Grupo'= group,
Indicator=rhs,
B=est,
SE=se, Z=z,
Beta=std.all,
sig=stars) %>%
kable(digits = 3, format="pandoc", caption="Factor Loadings for Factor 1")%>%
kable_styling()
| Latent Factor | Grupo | Indicator | B | SE | Z | Beta | sig |
|---|---|---|---|---|---|---|---|
| factor1 | 1 | STD_AQ_T | 0.714 | 0.058 | 12.290 | 0.716 | *** |
| factor1 | 1 | STD_STRS_T | 0.915 | 0.051 | 17.985 | 0.920 | *** |
| factor1 | 1 | STD_ANX_T | 0.860 | 0.054 | 15.879 | 0.853 | *** |
| factor1 | 1 | STD_DEP_T | 0.843 | 0.054 | 15.670 | 0.846 | *** |
| factor1 | 1 | STD_AGG_T | 0.635 | 0.060 | 10.535 | 0.637 | *** |
| factor1 | 1 | STD_INH_T | 0.617 | 0.064 | 9.695 | 0.596 | *** |
| factor1 | 1 | STD_GANG_T | 0.616 | 0.061 | 10.136 | 0.618 | *** |
| factor1 | 2 | STD_AQ_T | 0.749 | 0.072 | 10.346 | 0.741 | *** |
| factor1 | 2 | STD_STRS_T | 0.936 | 0.061 | 15.274 | 0.948 | *** |
| factor1 | 2 | STD_ANX_T | 0.903 | 0.063 | 14.430 | 0.918 | *** |
| factor1 | 2 | STD_DEP_T | 0.895 | 0.064 | 13.909 | 0.898 | *** |
| factor1 | 2 | STD_AGG_T | 0.750 | 0.071 | 10.497 | 0.748 | *** |
| factor1 | 2 | STD_INH_T | 0.665 | 0.070 | 9.497 | 0.696 | *** |
| factor1 | 2 | STD_GANG_T | 0.601 | 0.077 | 7.852 | 0.599 | *** |
parameterEstimates(configural, standardized=TRUE) %>%
filter(op == "=~") %>%
filter(lhs == "factor2") %>%
mutate(stars = ifelse(pvalue < .001, "***",
ifelse(pvalue < .01, "**",
ifelse(pvalue < .05, "*", "")))) %>%
select('Latent Factor'=lhs,
'Grupo'= group,
Indicator=rhs,
B=est,
SE=se, Z=z,
Beta=std.all,
sig=stars) %>%
kable(digits = 3, format="pandoc", caption="Factor Loadings for Factor 2")%>%
kable_styling()| Latent Factor | Grupo | Indicator | B | SE | Z | Beta | sig |
|---|---|---|---|---|---|---|---|
| factor2 | 1 | STD_SEF_T | 0.726 | 0.057 | 12.682 | 0.783 | *** |
| factor2 | 1 | STD_SS_T | 0.670 | 0.061 | 10.943 | 0.697 | *** |
| factor2 | 1 | STD_PER_T | 0.641 | 0.068 | 9.487 | 0.621 | *** |
| factor2 | 1 | STD_CNTRL_T | 0.588 | 0.070 | 8.373 | 0.560 | *** |
| factor2 | 1 | STD_DT_T | 0.356 | 0.069 | 5.182 | 0.364 | *** |
| factor2 | 1 | STD_SE_T | 0.609 | 0.067 | 9.150 | 0.603 | *** |
| factor2 | 2 | STD_SEF_T | 0.814 | 0.081 | 10.032 | 0.771 | *** |
| factor2 | 2 | STD_SS_T | 0.694 | 0.086 | 8.029 | 0.649 | *** |
| factor2 | 2 | STD_PER_T | 0.602 | 0.076 | 7.925 | 0.642 | *** |
| factor2 | 2 | STD_CNTRL_T | 0.572 | 0.074 | 7.722 | 0.629 | *** |
| factor2 | 2 | STD_DT_T | 0.612 | 0.082 | 7.457 | 0.611 | *** |
| factor2 | 2 | STD_SE_T | 0.578 | 0.080 | 7.178 | 0.592 | *** |
weak <- cfa(model = modelo1, estimator="ML", data = data, group = "TIME_C", parameterization = "theta",
std.lv=TRUE, orthogonal=F, group.equal=c("loadings"))
parameterEstimates(weak, standardized=TRUE) %>%
filter(op == "=~") %>%
filter(lhs == "factor2") %>%
mutate(stars = ifelse(pvalue < .001, "***",
ifelse(pvalue < .01, "**",
ifelse(pvalue < .05, "*", "")))) %>%
select('Latent Factor'=lhs,
'Grupo'= group,
Indicator=rhs,
B=est,
SE=se, Z=z,
Beta=std.all,
sig=stars) %>%
kable(digits = 3, format="pandoc", caption="Factor Loadings for Factor 1")%>%
kable_styling()
| Latent Factor | Grupo | Indicator | B | SE | Z | Beta | sig |
|---|---|---|---|---|---|---|---|
| factor2 | 1 | STD_SEF_T | 0.762 | 0.047 | 16.285 | 0.804 | *** |
| factor2 | 1 | STD_SS_T | 0.674 | 0.050 | 13.460 | 0.696 | *** |
| factor2 | 1 | STD_PER_T | 0.618 | 0.051 | 12.229 | 0.602 | *** |
| factor2 | 1 | STD_CNTRL_T | 0.578 | 0.051 | 11.345 | 0.551 | *** |
| factor2 | 1 | STD_DT_T | 0.468 | 0.053 | 8.844 | 0.459 | *** |
| factor2 | 1 | STD_SE_T | 0.596 | 0.051 | 11.618 | 0.593 | *** |
| factor2 | 2 | STD_SEF_T | 0.762 | 0.047 | 16.285 | 0.746 | *** |
| factor2 | 2 | STD_SS_T | 0.674 | 0.050 | 13.460 | 0.635 | *** |
| factor2 | 2 | STD_PER_T | 0.618 | 0.051 | 12.229 | 0.656 | *** |
| factor2 | 2 | STD_CNTRL_T | 0.578 | 0.051 | 11.345 | 0.635 | *** |
| factor2 | 2 | STD_DT_T | 0.468 | 0.053 | 8.844 | 0.496 | *** |
| factor2 | 2 | STD_SE_T | 0.596 | 0.051 | 11.618 | 0.606 | *** |
parameterEstimates(weak, standardized=TRUE) %>%
filter(op == "=~") %>%
filter(lhs == "factor1") %>%
mutate(stars = ifelse(pvalue < .001, "***",
ifelse(pvalue < .01, "**",
ifelse(pvalue < .05, "*", "")))) %>%
select('Latent Factor'=lhs,
'Grupo'= group,
Indicator=rhs,
B=est,
SE=se, Z=z,
Beta=std.all,
sig=stars) %>%
kable(digits = 3, format="pandoc", caption="Factor Loadings for Factor 2")%>%
kable_styling()
| Latent Factor | Grupo | Indicator | B | SE | Z | Beta | sig |
|---|---|---|---|---|---|---|---|
| factor1 | 1 | STD_AQ_T | 0.729 | 0.045 | 16.094 | 0.724 | *** |
| factor1 | 1 | STD_STRS_T | 0.921 | 0.039 | 23.656 | 0.919 | *** |
| factor1 | 1 | STD_ANX_T | 0.879 | 0.041 | 21.668 | 0.857 | *** |
| factor1 | 1 | STD_DEP_T | 0.865 | 0.041 | 21.076 | 0.853 | *** |
| factor1 | 1 | STD_AGG_T | 0.688 | 0.046 | 14.966 | 0.668 | *** |
| factor1 | 1 | STD_INH_T | 0.639 | 0.047 | 13.625 | 0.610 | *** |
| factor1 | 1 | STD_GANG_T | 0.611 | 0.048 | 12.822 | 0.614 | *** |
| factor1 | 2 | STD_AQ_T | 0.729 | 0.045 | 16.094 | 0.731 | *** |
| factor1 | 2 | STD_STRS_T | 0.921 | 0.039 | 23.656 | 0.947 | *** |
| factor1 | 2 | STD_ANX_T | 0.879 | 0.041 | 21.668 | 0.914 | *** |
| factor1 | 2 | STD_DEP_T | 0.865 | 0.041 | 21.076 | 0.892 | *** |
| factor1 | 2 | STD_AGG_T | 0.688 | 0.046 | 14.966 | 0.717 | *** |
| factor1 | 2 | STD_INH_T | 0.639 | 0.047 | 13.625 | 0.680 | *** |
| factor1 | 2 | STD_GANG_T | 0.611 | 0.048 | 12.822 | 0.605 | *** |
anova(configural, weak) %>% # Model fit index changes are minimal, hence, metric invariance is established.
kable(digits = 3, format="pandoc", caption="Chi Square Difference Test")%>%
kable_styling()| Df | AIC | BIC | Chisq | Chisq diff | Df diff | Pr(>Chisq) | |
|---|---|---|---|---|---|---|---|
| configural | 128 | 11758.74 | 12073.32 | 581.037 | NA | NA | NA |
| weak | 141 | 11742.12 | 12005.58 | 590.416 | 9.38 | 13 | 0.744 |
Los cambios en las medidas de ajuste son mínimas, por lo que se establece loadings invariance.
strong <- cfa(model = modelo1, estimator="ML", data = data, group = "TIME_C", parameterization = "theta", orthogonal=F, group.equal=c("loadings", "intercepts"))
anova(configural, weak, strong)%>% # Model fit index changes are minimal, hence, intercepts invariance is established.
kable(digits = 3, format="pandoc", caption="Chi Square Difference Test")%>%
kable_styling()| Df | AIC | BIC | Chisq | Chisq diff | Df diff | Pr(>Chisq) | |
|---|---|---|---|---|---|---|---|
| configural | 128 | 11758.74 | 12073.32 | 581.037 | NA | NA | NA |
| weak | 141 | 11742.12 | 12005.58 | 590.416 | 9.380 | 13 | 0.744 |
| strong | 150 | 11735.01 | 11963.08 | 601.305 | 10.888 | 9 | 0.283 |
Los cambios en las medidas de ajuste son mínimas, por lo que se establece loadings invariance and intercepts invariance.
strict <- cfa(model = modelo1, estimator="ML", data = data, group = "TIME_C", parameterization = "theta", orthogonal=F, group.equal=c("loadings", "intercepts", "residuals"))
anova(configural, weak, strong, strict)%>% # Model fit index changes are not minimal, hence, residuals invariance is not established.
kable(digits = 3, format="pandoc", caption="Chi Square Difference Test")%>%
kable_styling() | Df | AIC | BIC | Chisq | Chisq diff | Df diff | Pr(>Chisq) | |
|---|---|---|---|---|---|---|---|
| configural | 128 | 11758.74 | 12073.32 | 581.037 | NA | NA | NA |
| weak | 141 | 11742.12 | 12005.58 | 590.416 | 9.380 | 13 | 0.744 |
| strong | 150 | 11735.01 | 11963.08 | 601.305 | 10.888 | 9 | 0.283 |
| strict | 163 | 11755.22 | 11932.17 | 647.519 | 46.214 | 13 | 0.000 |
Los cambios en las medidas de ajuste no son mínimas, por lo que se establece loadings invariance and intercepts invariance solamente.