Structural Equation Modelling: The Effect of Earthquakes on California Housing Prices

Background

The AT2 research tested whether earthquakes reduce housing prices in California. The most populous US state lies on the Pacific “Ring of Fire”, a belt around the Pacific Ocean that is prone to earthquakes and tectonic plate movement.

Quarterly price changes per county were modelled with adjustments for socioeconomic confounders. The final multiple linear regression achieved an adjusted R² of 67.3%. The fault line score was positive and statistically significant, indicating that a higher earthquake risk is associated with higher prices on average, while the occurrence of major earthquakes had a negative but non-significant coefficient (see Appendix A).

While the model had relatively satisfactory explanatory power, it was dominated by macroeconomic variables. The target audience of the research was real estate investors, who are likely to already have a strong understanding of economic factors that are well documented in the academic literature (Reichert, 1990; Baffoe-Bonnie, 1998). Meanwhile, research on the effects of natural disasters on prices remains contradictory. While many papers reported short-term adverse effects (Beracha and Prati, 2008; Fekrazad, 2019; Boustan et al., 2020), others counterintuitively observed price increases, attributed to factors including higher construction costs (Willis and Asgary, 1997) or supply shortages (Murphy and Strobl, 2009).

Therefore, studying the effects of natural disasters is likely to provide more insight than refining a purely economic model. Structural Equation Modelling (SEM) is used to determine whether earthquakes have a causal effect on prices.

Methodology

SEM provides a framework for visualising the structure of relationships between variables and for assessing unobserved (latent) constructs; in other words, it tests a theoretical model of what drives prices against empirical data. It follows a two-stage process. First, the measurement model specifies how the measured variables represent the latent constructs and assesses how well they do so. Second, the structural model specifies the hypothesised causal relationships between those constructs.

The new research objectives are:

Test the structural assumptions of prior research, namely that housing prices are driven by macroeconomics, demographics and earthquakes.
Does the observed data accurately reflect the proposed structure?
Can a more accurate relationship be identified within the dataset?

Traditionally, housing price models stem from the hedonic approach of Rosen (1974), in which attributes such as size and number of rooms determine prices. SEM has been effective in this domain: Bowen et al. (2001) found solutions to the multicollinearity, heteroskedasticity and autocorrelation that hinder hedonic models, while Freeman and Zhao (2018) observed improvements from introducing covariance links between variables.

Directed Acyclic Graph

Directed Acyclic Graphs (DAGs) visualise the theoretical framework. As the name implies, relationships are directed to show causal effects within the model, and there are no feedback loops (i.e. the graph is acyclic), so no variable can be its own descendant. The fit of this hypothesised causal structure is tested on the empirical data through three latent constructs as follows:

Economy - Income, Unemployment Rate and Interest Rate
Demographics - Population and Crime Index (level of crimes reported by the FBI, adjusted for population)
Earthquakes - Frequency, Depth, Strength (by Magnitude), Fault Score (risk defined by slip rates, length and number of sections)
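The acyclicity requirement can be checked mechanically. The R sketch below is illustrative and not part of the original analysis: it encodes the hypothesised DAG as an assumed edge list (each construct points to its indicators and to house_price, following the constructs above and the regression specified later) and applies Kahn's topological-sort algorithm, which repeatedly removes nodes with no incoming edges. The graph is acyclic exactly when every node can be removed.

```r
# Kahn's algorithm: repeatedly remove nodes with no incoming edges.
# If every node is eventually removed, the graph contains no directed cycle.
is_acyclic <- function(from, to) {
  nodes <- unique(c(from, to))
  indegree <- setNames(integer(length(nodes)), nodes)
  for (v in to) indegree[v] <- indegree[v] + 1L
  queue <- names(indegree)[indegree == 0L]
  removed <- 0L
  while (length(queue) > 0) {
    v <- queue[1]
    queue <- queue[-1]
    removed <- removed + 1L
    for (w in to[from == v]) {          # delete each outgoing edge of v
      indegree[w] <- indegree[w] - 1L
      if (indegree[w] == 0L) queue <- c(queue, w)
    }
  }
  removed == length(nodes)
}

# Assumed edge list for the hypothesised DAG: constructs -> indicators,
# and constructs -> house_price
indicators <- list(
  econ  = c("income", "unemployment_rate", "interest"),
  demo  = c("population", "crime_index"),
  quake = c("quake_freq", "quake_strength", "quake_depth", "fault_score")
)
from <- c(rep(names(indicators), lengths(indicators)), names(indicators))
to   <- c(unlist(indicators, use.names = FALSE), rep("house_price", 3))

is_acyclic(from, to)                                # hypothesised model
is_acyclic(c(from, "house_price"), c(to, "econ"))   # with a hypothetical feedback loop
```

The second call adds a hypothetical edge from house_price back to econ; the resulting cycle means econ and house_price can never be removed, so the check returns FALSE.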

The earthquake variables differ from AT2, which used the fault score and a dummy variable indicating whether a severe earthquake had occurred during the past year. The dummy creates a timing problem: an incident in the previous month receives the same weight as one 12 months earlier. Attempts to use counts of “moderate” or “severe” earthquakes were not fruitful. Here, extra variables are introduced in an attempt to describe the earthquake construct more accurately, rather than data mining for statistically significant regression parameters.

Directed Acyclic Graph (DAG) for Californian Housing Prices

Analysis and Results

Confirmatory Factor Analysis

Confirmatory Factor Analysis (CFA) assesses how well the observed variables describe the proposed latent constructs. In this measurement model no directed paths are specified between the constructs; they are only allowed to covary (lavaan's cfa() estimates the factor covariances by default, as shown in Appendix B1). Checking this local fit matters because the later structural model can be sensitive to even a small number of poorly specified components, leading to poor global fit. Using the lavaan package in R, the model is defined as:

econ =~ income + unemployment_rate + interest
demo =~ population + crime_index
quake =~ quake_freq + quake_strength + quake_depth + fault_score

This notation defines each latent construct on the left of the =~ operator by the observed variables on the right. Two further operators are used in later stages: ~ denotes a regression and ~~ a covariance between variables.
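To make the three operators concrete, the toy R function below splits lavaan-style model lines into (lhs, op, rhs) rows, loosely mirroring the parameter table that lavaan builds internally. It is a simplified illustration only: lavaan's own parser (lavaanify()) also handles modifiers such as fixed values, labels and constraints, which this sketch ignores.

```r
# Toy parser: one row per (lhs, op, rhs) term in a lavaan-style model string
parse_model <- function(model) {
  lines <- strsplit(model, "\n", fixed = TRUE)[[1]]
  lines <- trimws(sub("#.*$", "", lines))   # drop comments and surrounding spaces
  lines <- lines[nzchar(lines)]             # drop blank lines
  rows <- lapply(lines, function(ln) {
    # test the two-character operators before the single "~"
    op <- if (grepl("=~", ln, fixed = TRUE)) {
      "=~"
    } else if (grepl("~~", ln, fixed = TRUE)) {
      "~~"
    } else if (grepl("~", ln, fixed = TRUE)) {
      "~"
    } else {
      stop("no operator found in: ", ln)
    }
    parts <- strsplit(ln, op, fixed = TRUE)[[1]]
    rhs <- trimws(strsplit(parts[2], "+", fixed = TRUE)[[1]])
    data.frame(lhs = trimws(parts[1]), op = op, rhs = rhs, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

model <- "
  econ  =~ income + unemployment_rate + interest
  quake =~ quake_freq + quake_strength + quake_depth + fault_score
  house_price ~ econ + quake   # regression
  income ~~ interest           # covariance
"
pt <- parse_model(model)
pt
table(pt$op)
```

Each =~ line expands into one loading per indicator, each ~ line into one regression path per predictor, and each ~~ line into a single covariance.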

Three key measures are the focus here (full results in Appendix B1): the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA) and the Standardized Root Mean Square Residual (SRMR). CFI (Bentler, 1990) is an incremental index: it compares the fit of the target model against a restricted null (baseline) model in which all variables are uncorrelated, so higher values are better. RMSEA (Steiger and Lind, 1980), by contrast, is an absolute index of how closely the target model reproduces the observed covariances, expressed as misfit per degree of freedom, so lower values are better. Both account for degrees of freedom so that, all other things being equal, simpler models with fewer parameters are preferred. SRMR summarises the standardised differences between the observed and model-implied correlations; lower values are better.
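As a sketch of how these indices are computed, the R functions below implement the textbook formulas for CFI, RMSEA and SRMR. Applied to the chi-square statistics in Appendix B1, the CFI and RMSEA functions give approximately 0.755 and 0.150, matching lavaan's output to three decimals. The SRMR example uses made-up 3 x 3 correlation matrices, since the observed correlations are not reproduced in this report; software packages differ in small details (for example n versus n - 1 in RMSEA).

```r
# CFI: 1 minus the target model's non-centrality (chisq - df) relative to the baseline's
cfi <- function(chisq, df, chisq_base, df_base) {
  d_model <- max(chisq - df, 0)
  d_base  <- max(chisq_base - df_base, d_model)
  if (d_base == 0) return(1)
  1 - d_model / d_base
}

# RMSEA: non-centrality per degree of freedom and per observation
# (some programs and texts divide by n - 1 instead of n)
rmsea <- function(chisq, df, n) {
  sqrt(max(chisq - df, 0) / (df * n))
}

# SRMR: root mean square of the residual correlations in the lower triangle
# (diagonal included) of the observed and model-implied correlation matrices
srmr <- function(r_obs, r_implied) {
  res <- (r_obs - r_implied)[lower.tri(r_obs, diag = TRUE)]
  sqrt(mean(res^2))
}

# Chi-square statistics reported in Appendix B1 (base CFA, n = 1564)
cfi(871.969, 24, 3502.681, 36)
rmsea(871.969, 24, 1564)

# Toy 3 x 3 correlation matrices, purely illustrative
r_obs <- matrix(c(1, 0.5, 0.3,  0.5, 1, 0.2,  0.3, 0.2, 1), nrow = 3)
r_imp <- matrix(c(1, 0.4, 0.3,  0.4, 1, 0.1,  0.3, 0.1, 1), nrow = 3)
srmr(r_obs, r_imp)
```

The same two functions applied to Appendix B2 (chi-square 411.697 on 18 df) give approximately 0.886 and 0.118.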

An influential study by Hu and Bentler (1999) suggests the following benchmarks for good fit:

CFI > 0.95
RMSEA < 0.06
SRMR < 0.08

A combination of benchmarks is recommended because SRMR is most sensitive to misspecified factor covariances, whereas the other fit indices are more sensitive to misspecified factor loadings. These benchmarks are widely adopted in SEM practice, but more recent work (Fan and Sivo, 2005) warns that they should be used as guidelines rather than hard cut-offs: the original simulations covered a limited range of models and parameter values, so the cut-offs may not generalise.

Results, CFA
CFI	0.755
RMSEA	0.150
SRMR	0.100

Using the results from lavaan the proposed model can also be visualised as a DAG:

The results fall well outside the generally accepted thresholds. lavaan also warns that “some estimated ov variances are negative” (ov = observed variable), which can indicate model misspecification. However, as noted above, the benchmarks are guidelines, and poor fit does not necessarily mean the model is entirely flawed. A closer look at the results still provides insights.

Factor loadings

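The table below shows how each indicator loads onto its latent construct. Unstandardised loadings (B) can be read much like regression coefficients, except that the direction runs from factor to indicator: because the factors are scaled to unit variance (std.lv = TRUE), a one-standard-deviation increase in the economy factor corresponds to an average increase of 0.36 in income and a decrease of 1.29 in the unemployment rate, as expected. Note that the standardised loadings (Beta) for income (1.40) and quake strength (1.04) exceed 1, which is not possible in a proper solution and mirrors the negative error variances discussed next.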

Factor Loadings, CFA
Latent Factor	Indicator	B	SE	Z	Beta
econ	income	0.3617	0.0258	14.0103	1.3970
econ	unemployment_rate	-1.2882	0.1363	-9.4539	-0.3079
econ	interest	0.0792	0.0147	5.4005	0.1055
demo	population	1.3351	0.0674	19.8024	0.7700
demo	crime_index	-0.2800	0.0248	-11.2713	-0.3161
quake	quake_freq	2.9445	0.1767	16.6597	0.4210
quake	quake_strength	1.4742	0.0392	37.6275	1.0437
quake	quake_depth	3.9329	0.1764	22.2899	0.5772
quake	fault_score	0.6493	0.0396	16.3866	0.4135
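As a check on the Beta column, the sketch below recomputes the standardised loadings from B and the error variances reported in the next table, assuming factor variances fixed at 1 (std.lv = TRUE, as in Appendix B1). It reproduces the reported values and shows that standardised loadings above 1 arise exactly where the error variance is negative.

```r
# With unit factor variance, an indicator's implied variance is
# loading^2 + error variance, so its standardised loading is:
std_loading <- function(loading, resid_var) {
  loading / sqrt(loading^2 + resid_var)
}

# Unstandardised loadings (B) and error variances from the CFA tables
loadings  <- c(income = 0.3617, unemployment_rate = -1.2882,
               population = 1.3351, quake_strength = 1.4742)
resid_var <- c(income = -0.0638, unemployment_rate = 15.8413,
               population = 1.2242, quake_strength = -0.1781)

beta <- std_loading(loadings, resid_var)
round(beta, 3)
names(beta)[abs(beta) > 1]   # loadings that imply a negative error variance
```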

Error Variances

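The table below shows the estimated error (residual) variance of each indicator, followed by the variances and covariances of the latent factors. Non-zero error variances are expected, since the latent factors do not perfectly predict the observed variables. However, two estimates are negative, which is impossible for a variance: income (-0.064, significantly below zero, p < 0.001) and quake strength (-0.178, p = 0.051). These Heywood cases are the source of lavaan's warning and of the standardised loadings above 1. All other error variances are significantly greater than zero.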

Variances and Factor Covariances, CFA
Parameter	B	SE	Z	Pvalue	Beta
income	-0.0638	0.0188	-3.3932	0.0007	-0.9517
unemployment_rate	15.8413	0.6111	25.9245	0.0000	0.9052
interest	0.5582	0.0199	28.0917	0.0000	0.9889
population	1.2242	0.1571	7.7917	0.0000	0.4071
crime_index	0.7065	0.0261	27.0462	0.0000	0.9001
quake_freq	40.2577	1.4721	27.3472	0.0000	0.8228
quake_strength	-0.1781	0.0913	-1.9508	0.0511	-0.0893
quake_depth	30.9669	1.2734	24.3190	0.0000	0.6669
fault_score	2.0436	0.0745	27.4141	0.0000	0.8290
econ	1.0000	0.0000			1.0000
demo	1.0000	0.0000			1.0000
quake	1.0000	0.0000			1.0000
econ ~~ demo	0.5314	0.0497	10.6869	0.0000	0.5314
econ ~~ quake	0.0476	0.0172	2.7682	0.0056	0.0476
demo ~~ quake	0.3121	0.0319	9.7991	0.0000	0.3121
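The sketch below recomputes the z statistics and two-sided p-values from B and SE (small differences from the table come from rounding) and derives each indicator's R², which equals 1 minus its standardised error variance. It flags the two Heywood cases, whose implied R² exceeds 1.

```r
# Error variances from the CFA: estimate (B), standard error (SE), standardised (Beta)
ev <- data.frame(
  indicator = c("income", "unemployment_rate", "interest", "population", "crime_index",
                "quake_freq", "quake_strength", "quake_depth", "fault_score"),
  B    = c(-0.0638, 15.8413, 0.5582, 1.2242, 0.7065, 40.2577, -0.1781, 30.9669, 2.0436),
  SE   = c(0.0188, 0.6111, 0.0199, 0.1571, 0.0261, 1.4721, 0.0913, 1.2734, 0.0745),
  Beta = c(-0.9517, 0.9052, 0.9889, 0.4071, 0.9001, 0.8228, -0.0893, 0.6669, 0.8290),
  stringsAsFactors = FALSE
)

ev$z  <- ev$B / ev$SE              # Wald z statistic
ev$p  <- 2 * pnorm(-abs(ev$z))     # two-sided p-value
ev$r2 <- 1 - ev$Beta               # share of indicator variance explained by its factor
ev$heywood <- ev$B < 0             # a variance cannot be negative

ev[ev$heywood, c("indicator", "z", "p", "r2")]
```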

Residuals

The residual matrix shows the difference between the observed covariances and the covariances implied by the estimated parameters. Larger absolute values indicate relationships between variables that the model does not capture, for example between the unemployment rate and interest, or between earthquake frequency and depth or fault score. Positive values indicate that the model under-predicts the covariance between two variables, while negative values indicate over-prediction. The residuals feed directly into SRMR (after standardisation) and, through the chi-square discrepancy, into RMSEA.

Covariance Residuals, CFA
	Income	Unemployment	Interest	Pop.	Crime	Freq	Strength	Depth	Fault Score
Income	0.0000
Unemployment	-0.0028	0.0000
Interest	0.0176	-1.3376	0.0000
Pop.	-0.0135	0.7932	-0.0476	0.0000
Crime	-0.0089	0.3580	-0.0554	0.0000	0.0000
Freq	-0.1883	5.7608	-0.0795	-0.4797	0.1171	-0.0001
Strength	-0.0020	0.3520	0.0247	0.0365	0.0356	0.0133	0.0000
Depth	-0.0627	0.6509	0.1262	0.2843	-0.3245	-3.4427	-0.0519	0.0000
Fault Score	0.0973	0.9021	-0.0024	0.6052	0.0864	1.8497	0.0082	-0.8906	0.0000
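The residuals above are the observed covariances minus the model-implied covariances. For a CFA the implied matrix is Sigma = Lambda Phi Lambda' + Theta (loadings, factor covariances, error variances). The R sketch below rebuilds it from the base CFA estimates; the observed covariance matrix is not reproduced in this report, so only Sigma is computed. On a fitted lavaan object, fitted() and residuals() return these matrices directly. The implied covariance between unemployment rate and interest is only about -0.10 (-1.2882 x 0.0792), small next to the residual of -1.34 above, which suggests the econ factor reproduces little of the relationship between them.

```r
# Model-implied covariance matrix of a CFA: Sigma = Lambda %*% Phi %*% t(Lambda) + Theta,
# using the base CFA estimates (factor variances fixed to 1 by std.lv = TRUE)
vars <- c("income", "unemployment_rate", "interest", "population", "crime_index",
          "quake_freq", "quake_strength", "quake_depth", "fault_score")
factors <- c("econ", "demo", "quake")

Lambda <- matrix(0, nrow = length(vars), ncol = length(factors),
                 dimnames = list(vars, factors))             # factor loadings
Lambda[1:3, "econ"]  <- c(0.3617, -1.2882, 0.0792)
Lambda[4:5, "demo"]  <- c(1.3351, -0.2800)
Lambda[6:9, "quake"] <- c(2.9445, 1.4742, 3.9329, 0.6493)

Phi <- matrix(c(1,      0.5314, 0.0476,
                0.5314, 1,      0.3121,
                0.0476, 0.3121, 1),
              nrow = 3, byrow = TRUE, dimnames = list(factors, factors))  # factor covariances

Theta <- diag(c(-0.0638, 15.8413, 0.5582, 1.2242, 0.7065,
                40.2577, -0.1781, 30.9669, 2.0436))           # error variances
dimnames(Theta) <- list(vars, vars)

Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta

# Given the observed covariance matrix S, the residual matrix would be S - Sigma
round(Sigma[c("unemployment_rate", "interest"), c("unemployment_rate", "interest")], 3)
```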

Modification Indices

The modification indices list additional parameters that would significantly improve model fit if freed. Each index (mi) approximates the drop in the model chi-square from freeing that single parameter; epc is the expected parameter change, and the sepc columns are its standardised versions. Consistent with the earlier notation, =~ indicates an additional factor loading and ~~ a covariance. Only additions that make theoretical sense for the proposed SEM should be considered; otherwise there is a high risk of overfitting the data and reducing generalisability.

Modification Indices, CFA
lhs	op	rhs	mi	epc	sepc.lv	sepc.all	sepc.nox
econ	=~	fault_score	129.8480	0.2645	0.2645	0.1685	0.1685
demo	=~	income	84.8384	0.8923	0.8923	3.4462	3.4462
demo	=~	unemployment_rate	57.2352	2.2412	2.2412	0.5357	0.5357
demo	=~	quake_strength	20.2773	-0.2141	-0.2141	-0.1516	-0.1516
demo	=~	fault_score	151.2892	0.5407	0.5407	0.3444	0.3444
income	~~	interest	87.2007	-0.1130	-0.1130	-0.5986	-0.5986
income	~~	population	39.8641	0.3244	0.3244	1.1606	1.1606
income	~~	fault_score	114.2742	0.0715	0.0715	0.1981	0.1981
unemployment_rate	~~	interest	316.1659	-1.3362	-1.3362	-0.4494	-0.4494
unemployment_rate	~~	population	100.4972	2.8307	2.8307	0.6428	0.6428
unemployment_rate	~~	crime_index	23.5718	0.4119	0.4119	0.1231	0.1231
unemployment_rate	~~	quake_freq	40.5287	3.7610	3.7610	0.1489	0.1489
unemployment_rate	~~	fault_score	105.9493	1.3706	1.3706	0.2409	0.2409
crime_index	~~	fault_score	31.4893	0.1712	0.1712	0.1425	0.1425
quake_freq	~~	quake_depth	21.3984	-4.9545	-4.9545	-0.1403	-0.1403
quake_freq	~~	fault_score	74.1068	2.1075	2.1075	0.2323	0.2323
quake_strength	~~	quake_depth	78.9651	5.3605	5.3605	2.2826	2.2826
quake_depth	~~	fault_score	27.6453	-1.2560	-1.2560	-0.1579	-0.1579
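Each modification index is referred to a chi-square distribution with 1 degree of freedom (5% critical value about 3.84). With 1,564 observations every suggestion above is far beyond that threshold, so significance alone cannot decide what to add. The R sketch below encodes the table and applies a theory-based filter, keeping only residual covariances between indicators of the same factor; this yields exactly the six covariances adopted in the model adjustment that follows. The filtering code is an illustration, not the author's script.

```r
# Modification indices from the CFA table above
mi <- data.frame(
  lhs = c("econ", "demo", "demo", "demo", "demo",
          "income", "income", "income",
          "unemployment_rate", "unemployment_rate", "unemployment_rate",
          "unemployment_rate", "unemployment_rate",
          "crime_index", "quake_freq", "quake_freq", "quake_strength", "quake_depth"),
  op  = c(rep("=~", 5), rep("~~", 13)),
  rhs = c("fault_score", "income", "unemployment_rate", "quake_strength", "fault_score",
          "interest", "population", "fault_score",
          "interest", "population", "crime_index", "quake_freq", "fault_score",
          "fault_score", "quake_depth", "fault_score", "quake_depth", "fault_score"),
  mi  = c(129.8480, 84.8384, 57.2352, 20.2773, 151.2892, 87.2007, 39.8641, 114.2742,
          316.1659, 100.4972, 23.5718, 40.5287, 105.9493, 31.4893, 21.3984, 74.1068,
          78.9651, 27.6453),
  stringsAsFactors = FALSE
)

# Factor membership of each indicator in the hypothesised model
factor_of <- c(income = "econ", unemployment_rate = "econ", interest = "econ",
               population = "demo", crime_index = "demo",
               quake_freq = "quake", quake_strength = "quake",
               quake_depth = "quake", fault_score = "quake")

# Each MI is approximately chi-square with 1 df if the parameter is truly zero
mi$p_value <- pchisq(mi$mi, df = 1, lower.tail = FALSE)
qchisq(0.95, df = 1)   # 5% critical value

# Theory filter: residual covariances only between indicators of the same factor
same_factor <- mi$op == "~~" & factor_of[mi$lhs] == factor_of[mi$rhs]
selected <- mi[which(same_factor), c("lhs", "op", "rhs", "mi")]
selected[order(-selected$mi), ]
```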

Model Adjustment

The modification indices suggest introducing cross-loadings, where a single variable loads onto multiple latent constructs, such as fault score loading onto the economy and demographic factors. Cross-loadings reduce the distinctiveness of each construct as a separate concept within the model, and adding them would substantially change the hypothesised framework, so they are not incorporated. Cross-loadings are accepted in Exploratory Factor Analysis (EFA), where no a priori theory about the constructs is assumed; EFA is an extensive topic outside the scope of this research.

Among the suggested covariances, only those between indicators of the same factor are introduced. The relationship between the unemployment rate and interest rates, as well as those among the earthquake variables, were expected from prior research and were also problematic in the residual matrix. The new model is restated as:

# Factors
econ =~ income + unemployment_rate + interest
demo =~ population + crime_index
quake =~ quake_freq + quake_strength + quake_depth + fault_score

# Covariance
income ~~ interest
unemployment_rate ~~ interest
quake_freq ~~ quake_depth
quake_freq ~~ fault_score
quake_strength ~~ quake_depth
quake_depth ~~ fault_score

The key fit measures improve markedly (full results in Appendix B2). SRMR (0.068) now meets the < 0.08 benchmark, but CFI and RMSEA remain short of the thresholds suggested in the literature, so the model still falls short of a strong fit.

Results, CFA with Covariance
CFI	0.886
RMSEA	0.118
SRMR	0.068

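Finally, a chi-square difference (likelihood-ratio) test of the two nested measurement models, run with lavaan's anova(), shows that introducing the selected covariances is a statistically significant improvement on the base model; AIC and BIC are also lower.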

ANOVA, CFA
	Df	AIC	BIC	Chisq	Chisq diff	Df diff	Pr(>Chisq)
fit.cov	18	52185.99	52330.58	411.6969
fit.cfa	24	52634.26	52746.72	871.9688	460.2719	6	0
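For nested models estimated by standard maximum likelihood, the difference between their chi-square statistics is itself chi-square distributed, with degrees of freedom equal to the difference in df; this is the likelihood-ratio test that lavaan's anova() reports. The R sketch below reproduces the table from the statistics in Appendices B1 and B2, and also recomputes AIC and BIC from the log-likelihoods and the numbers of free parameters.

```r
# Chi-square difference (likelihood-ratio) test for nested ML models:
# the more restricted model minus the less restricted one
chisq_diff_test <- function(chisq_restricted, df_restricted, chisq_full, df_full) {
  d_chisq <- chisq_restricted - chisq_full
  d_df    <- df_restricted - df_full
  c(chisq_diff = d_chisq, df_diff = d_df,
    p_value = pchisq(d_chisq, df = d_df, lower.tail = FALSE))
}

# fit.cfa (restricted, 24 df) versus fit.cov (18 df)
res <- chisq_diff_test(871.9688, 24, 411.6969, 18)
res

# Information criteria from the log-likelihoods (k = free parameters, n = 1564)
aic <- function(loglik, k) -2 * loglik + 2 * k
bic <- function(loglik, k, n) -2 * loglik + k * log(n)
c(aic_cfa = aic(-26296.131, 21), aic_cov = aic(-26065.995, 27))
c(bic_cfa = bic(-26296.131, 21, 1564), bic_cov = bic(-26065.995, 27, 1564))
```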

Structural Model

The structural model combines the constructs from the CFA stage with regressions, testing the paths hypothesised in the DAG. The SEMs in lavaan are fitted with estimator = "MLM": maximum likelihood estimation with robust standard errors and a mean-adjusted (Satorra-Bentler scaled) test statistic, which is robust to non-normality (Satorra and Bentler, 2001). Log-transformations were applied to the relevant variables to improve the regressions in AT2, but the earthquake data are still likely to be non-normal.

The initial SEM model is defined as:

# Factors
econ =~ income + unemployment_rate + interest
demo =~ population + crime_index
quake =~ quake_freq + quake_strength + quake_depth + fault_score

# Regression
house_price ~ econ + demo + quake
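In equation form, the initial model can be sketched as follows (standard SEM notation, with symbols chosen here for illustration): x is the vector of the nine indicators, Λ the loadings, Φ the factor covariance matrix, Θ the error covariance matrix, γ the regression coefficients and ψ the residual variance of house_price.

```latex
\begin{aligned}
\mathbf{x} &= \Lambda\,\boldsymbol{\eta} + \boldsymbol{\delta},
  \qquad \boldsymbol{\eta} = (\text{econ},\ \text{demo},\ \text{quake})^\top,
  \qquad \operatorname{Cov}(\boldsymbol{\eta}) = \Phi,\quad \operatorname{Cov}(\boldsymbol{\delta}) = \Theta \\
\text{house\_price} &= \gamma_1\,\text{econ} + \gamma_2\,\text{demo} + \gamma_3\,\text{quake} + \zeta,
  \qquad \operatorname{Var}(\zeta) = \psi \\
\Sigma_{xx} &= \Lambda \Phi \Lambda^\top + \Theta,
  \qquad \operatorname{Cov}(\mathbf{x}, \text{house\_price}) = \Lambda \Phi \boldsymbol{\gamma},
  \qquad \operatorname{Var}(\text{house\_price}) = \boldsymbol{\gamma}^\top \Phi\, \boldsymbol{\gamma} + \psi
\end{aligned}
```

The first line is the measurement model carried over from the CFA; the second is the structural regression; the third gives the covariances the model implies, which are compared with the observed ones to assess fit.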

Initial Results

Fit measures fall in the same range as those of the base CFA model (full results in Appendix B3). The “scaled” indices below are the versions adjusted for non-normality.

Results, SEM
CFI.SCALED	0.791
RMSEA.SCALED	0.150
SRMR	0.099

Modification Indices

The covariances identified during the CFA stage resurface in the modification indices. Other covariances and cross-loadings are ignored to remain consistent with the a priori theoretical framework, as stated previously.

Modification Indices, SEM
lhs	op	rhs	mi	epc	sepc.lv	sepc.all	sepc.nox
econ	=~	fault_score	133.9981	0.4411	0.4411	0.2810	0.2810
demo	=~	income	141.8497	0.2915	0.2915	1.1259	1.1259
demo	=~	unemployment_rate	131.5447	2.7797	2.7797	0.6645	0.6645
demo	=~	fault_score	120.0145	0.5019	0.5019	0.3197	0.3197
income	~~	unemployment_rate	40.8202	0.2195	0.2195	0.5356	0.5356
income	~~	population	175.1152	0.1451	0.1451	1.0279	1.0279
income	~~	house_price	98.0872	-0.0756	-0.0756	-2.1323	-2.1323
unemployment_rate	~~	interest	248.5871	-1.0721	-1.0721	-0.4093	-0.4093
unemployment_rate	~~	population	235.8278	2.1726	2.1726	0.4817	0.4817
unemployment_rate	~~	quake_freq	46.6744	4.0002	4.0002	0.1752	0.1752
unemployment_rate	~~	quake_strength	24.7580	-0.4541	-0.4541	-0.4608	-0.4608
unemployment_rate	~~	fault_score	151.4571	1.6236	1.6236	0.3157	0.3157
unemployment_rate	~~	house_price	118.2825	-0.6661	-0.6661	-0.5881	-0.5881
interest	~~	population	64.1721	-0.2137	-0.2137	-0.2369	-0.2369
population	~~	fault_score	28.1391	0.2758	0.2758	0.1557	0.1557
crime_index	~~	fault_score	32.3301	0.1723	0.1723	0.1461	0.1461
quake_freq	~~	quake_depth	29.9526	-5.7693	-5.7693	-0.1661	-0.1661
quake_freq	~~	fault_score	67.6448	1.9931	1.9931	0.2223	0.2223
quake_strength	~~	quake_depth	151.6864	6.7467	6.7467	4.5007	4.5007
quake_depth	~~	fault_score	37.5488	-1.4423	-1.4423	-0.1844	-0.1844
fault_score	~~	house_price	97.4998	0.1378	0.1378	0.3097	0.3097

Model Adjustment

The final SEM model is defined as:

# Factors
econ =~ income + unemployment_rate + interest
demo =~ population + crime_index
quake =~ quake_freq + quake_strength + quake_depth + fault_score

# Regression
house_price ~ econ + demo + quake

# Covariance
income ~~ interest
unemployment_rate ~~ interest
quake_freq ~~ quake_depth
quake_freq ~~ fault_score
quake_strength ~~ quake_depth
quake_depth ~~ fault_score

Results

Fit measures improve markedly (full results in Appendix B4) and the chi-square difference test is significant, but all three measures still fall short of the benchmarks, although SRMR (0.082) is close to its 0.08 cut-off.

Results, SEM with Covariance
CFI.SCALED	0.879
RMSEA.SCALED	0.127
SRMR	0.082
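To see at a glance how each model compares with the Hu and Bentler (1999) benchmarks, the R sketch below collects the three key measures reported for the four models (the scaled versions for the two SEMs) and checks them against the cut-offs, which are treated as guidelines rather than hard rules. Only the SRMR of the CFA with covariances passes.

```r
# Key fit measures reported in this section for the four models
fit <- data.frame(
  model = c("CFA", "CFA with covariance", "SEM", "SEM with covariance"),
  cfi   = c(0.755, 0.886, 0.791, 0.879),   # CFI.SCALED for the two SEMs
  rmsea = c(0.150, 0.118, 0.150, 0.127),   # RMSEA.SCALED for the two SEMs
  srmr  = c(0.100, 0.068, 0.099, 0.082),
  stringsAsFactors = FALSE
)

# Hu and Bentler (1999) benchmarks
fit$cfi_ok   <- fit$cfi > 0.95
fit$rmsea_ok <- fit$rmsea < 0.06
fit$srmr_ok  <- fit$srmr < 0.08
fit
```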

ANOVA, SEM
	Df	AIC	BIC	Chisq	Chisq diff	Df diff	Pr(>Chisq)
fit.sem.cov	24	53756.66	53922.67	781.9378
fit.sem	30	54216.59	54350.46	1253.8640	548.3261	6	0
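Note that the Chisq diff above (548.33) is not the simple difference of the two test statistics (1253.86 - 781.94 = 471.93). This is most likely because, with estimator = "MLM", lavaan's anova() applies the Satorra and Bentler (2001) scaled difference test rather than subtracting the statistics. The R sketch below implements that formula; the scaling correction factors c0 and c1 are not reported in this document, so the values used are hypothetical placeholders (in recent lavaan versions they can be obtained with fitMeasures(fit, "chisq.scaling.factor")).

```r
# Satorra-Bentler (2001) scaled chi-square difference test.
# t0, d0, c0: ML chi-square, df and scaling correction factor of the restricted model
# t1, d1, c1: the same quantities for the less restricted model (d0 > d1)
sb_scaled_diff <- function(t0, d0, c0, t1, d1, c1) {
  cd <- (d0 * c0 - d1 * c1) / (d0 - d1)   # scaling factor for the difference
  if (cd <= 0) stop("non-positive scaling factor; the Satorra-Bentler (2010) variant avoids this")
  trd <- (t0 - t1) / cd
  c(scaled_diff = trd, df = d0 - d1,
    p_value = pchisq(trd, df = d0 - d1, lower.tail = FALSE))
}

# Unscaled statistics from the table: fit.sem (restricted, 30 df) vs fit.sem.cov (24 df)
1253.8640 - 781.9378   # naive difference, not the 548.3261 reported

c0 <- 1.2   # hypothetical placeholder, not an estimate from this analysis
c1 <- 1.1   # hypothetical placeholder, not an estimate from this analysis
sb_scaled_diff(1253.8640, 30, c0, 781.9378, 24, c1)
```

When both scaling factors equal 1 (normal data), the scaled difference reduces to the ordinary chi-square difference.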

Conclusion

Although the final model did not meet accepted thresholds, the results still provide insights not achievable in AT2. The factor loadings and regression results below are consistent with a greater prevalence of earthquakes increasing prices on average: the coefficient of the quake construct in the regression on house prices is positive (B = 0.25, p < 0.001), and all four quake indicators load positively. This inflationary effect contrasts with the literature, which is mixed on short-term price effects, and with AT2, which found weak, non-significant evidence that large-magnitude events lower prices.

Factor Loadings, SEM with Covariance
Latent Factor	Indicator	B	SE	Z	Beta
econ	income	0.2172	0.0074	29.3485	0.8389
econ	unemployment_rate	-2.1582	0.0991	-21.7737	-0.5159
econ	interest	0.1578	0.0190	8.3145	0.2100
demo	population	1.2583	0.0718	17.5237	0.7256
demo	crime_index	-0.2971	0.0236	-12.5884	-0.3354
quake	quake_freq	4.3175	0.2423	17.8219	0.6172
quake	quake_strength	0.8960	0.0412	21.7326	0.6344
quake	quake_depth	1.8988	0.3170	5.9903	0.2785
quake	fault_score	1.1612	0.0500	23.2212	0.7396

Regression, SEM with Covariance
Regression	Latent Factor	B	SE	Z	Pvalue	Beta
house_price	econ	0.8277	0.1374	6.0226	0.0000	1.3031
house_price	demo	-0.3420	0.1574	-2.1734	0.0297	-0.5385
house_price	quake	0.2456	0.0634	3.8760	0.0001	0.3866

Also surprising is that the earthquake construct covaries more strongly with demographics (0.42) than with the economy (0.11). This is an interesting starting point for potential multilevel SEM analysis, as the literature more often attributes indirect effects of natural disasters to the economy than to demographic factors.

Latent Factor Covariance
Latent Factor 1	Latent Factor 2	B	SE	Z	Pvalue	Beta
econ	demo	0.7944	0.0440	18.0613	0.0000	0.7944
econ	quake	0.1146	0.0382	3.0034	0.0027	0.1146
demo	quake	0.4216	0.0468	9.0061	0.0000	0.4216
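The standardised coefficients and factor correlations can be combined to recover the share of house price variance explained by the three constructs, R² = Beta' Phi Beta. The R sketch below, using the values from the two tables above, gives roughly 0.96 for this (poorly fitting) model; lavInspect(fit, "rsquare") reports R² directly in lavaan. The econ coefficient exceeds 1 alongside an econ-demo correlation of 0.79, so the two paths overlap and are best interpreted jointly rather than in isolation. As a consistency check, B / Beta should give the same implied standard deviation of house_price for every path, since the factors have unit variance.

```r
# Standardised regression coefficients (Beta) and latent factor correlations
# from the final SEM (factor variances are 1, so covariances are correlations)
beta <- c(econ = 1.3031, demo = -0.5385, quake = 0.3866)
Phi <- matrix(c(1,      0.7944, 0.1146,
                0.7944, 1,      0.4216,
                0.1146, 0.4216, 1),
              nrow = 3, byrow = TRUE, dimnames = list(names(beta), names(beta)))

# Share of house_price variance explained by the three constructs together
r2 <- drop(t(beta) %*% Phi %*% beta)
round(r2, 3)

# Beta = B / sd(house_price) with unit-variance factors, so B / Beta
# should recover (roughly) the same standard deviation for every path
B <- c(econ = 0.8277, demo = -0.3420, quake = 0.2456)
round(B / beta, 3)
```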

Reflections

Identifying a structural model early in a project helps to define better research questions, assess project complexity and direct data collection. In prior research without a DAG, decisions on which variables to include were often driven by data availability or granularity rather than by suspected causal relationships. SEM also sets a higher bar for a “good” model, but yields considerably more insight from the data.

These findings only scratch the surface of what SEM can do. Models can be extended through multilevel analysis, more advanced temporal and spatial techniques, or cross-loadings using EFA. However, the topic is complex, and the results suggest that further confounding variables may need to be specified, which would expand the data requirements.

Perhaps a more general “disaster” construct is required rather than earthquakes alone; forest fires, for example, are extremely prevalent in California. Dillon-Merrill et al. (2018) also suggest that indirect variables shape the impact of disasters, finding that floods produced the most significant housing price changes because of a lack of insurance coverage. An SEM approach appears better suited to uncovering these additional pathways.

Appendices

References

Baffoe-Bonnie, J. (1998). The Dynamic Impact of Macroeconomic Aggregates on Housing Prices and Stock of Houses: A National and Regional Analysis. The Journal of Real Estate Finance and Economics, 17, 179–197.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Beracha, E., and Prati, R. (2008). How Major Hurricanes Impact Housing Prices and Transaction Volume. Real Estate Issues, 33, 45–57.
Boustan, L. P., Kahn, M. E., Rhode, P. W., and Yanguas, M. L. (2020). The effect of natural disasters on economic activity in US counties: A century of data. Journal of Urban Economics, 118, 103257.
Bowen, W. M., Mikelbank, B. A., and Prestegaard, D. M. (2001). Theoretical and empirical considerations regarding space in hedonic house price estimation. Growth and Change, 32(4), 466–490.
Browne, M. W., and Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen and J. S. Long (Eds.), Testing structural equation models (pp. 136–162).
Dillon-Merrill, R. L., Ge, L., and Gete, P. (2018). Natural Disasters and Housing Markets. The Tenure Choice Channel.
Fan, X., and Sivo, S. (2005). Sensitivity of Fit Indexes to Misspecified Structural or Measurement Model Components: Rationale of Two-Index Strategy Revisited. Structural Equation Modeling: A Multidisciplinary Journal, 12.
Fekrazad, A. (2019). Earthquake-risk salience and housing prices: Evidence from California. Journal of Behavioral and Experimental Economics, 78, 104–113.
Freeman, J., and Zhao, X. (2018). An SEM approach to modelling housing values.
Hu, L., and Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
Murphy, A., and Strobl, E. (2009). The Impact of Hurricanes on Housing Prices: Evidence from US Coastal Cities. MPRA Paper, University Library of Munich, Germany.
Reichert, A. K. (1990). The impact of interest rates, income, and employment upon regional housing prices. The Journal of Real Estate Finance and Economics, 3(4), 373–391.
Rosen, S. (1974). Hedonic prices and implicit markets: product differentiation in pure competition. Journal of Political Economy, 82(1), 34–55.
Satorra, A., and Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507–514.
Steiger, J. H., and Lind, J. C. (1980). Statistically based tests for the number of common factors. Annual Meeting of the Psychometric Society, Iowa City, IA.
Willis, K., and Asgary, A. (1997). The Impact of Earthquake Risk on Housing Markets: Evidence from Tehran Real Estate Agents. Journal of Housing Research, 8(1), 125–136.

Appendix A: AT2 Regression

Multiple linear regression results from prior research.

Summary statistics:

Residual Plots:


Appendix B1: CFA

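# CFA Model
library(lavaan)  # cfa(), sem() and the model syntax used throughout

# Measurement model: three latent factors, each defined by its indicators
model.measure <- "econ =~ income + unemployment_rate + interest
                  demo =~ population + crime_index
                  quake =~ quake_freq + quake_strength + quake_depth + fault_score
                  "
# std.lv = TRUE fixes each factor variance to 1 (instead of the first loading);
# quake_trans is the author's prepared dataset, not reproduced here
fit.cfa <- cfa(model.measure, data = quake_trans, std.lv = TRUE)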

# Results
summary(fit.cfa, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-7 ended normally after 47 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         21
##                                                       
##   Number of observations                          1564
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               871.969
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3502.681
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.755
##   Tucker-Lewis Index (TLI)                       0.633
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -26296.131
##   Loglikelihood unrestricted model (H1)     -25860.147
##                                                       
##   Akaike (AIC)                               52634.263
##   Bayesian (BIC)                             52746.718
##   Sample-size adjusted Bayesian (BIC)        52680.006
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.150
##   90 Percent confidence interval - lower         0.142
##   90 Percent confidence interval - upper         0.159
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.100
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   econ =~                                                               
##     income            0.362    0.026   14.010    0.000    0.362    1.397
##     unemploymnt_rt   -1.288    0.136   -9.454    0.000   -1.288   -0.308
##     interest          0.079    0.015    5.400    0.000    0.079    0.105
##   demo =~                                                               
##     population        1.335    0.067   19.802    0.000    1.335    0.770
##     crime_index      -0.280    0.025  -11.271    0.000   -0.280   -0.316
##   quake =~                                                              
##     quake_freq        2.945    0.177   16.660    0.000    2.945    0.421
##     quake_strength    1.474    0.039   37.627    0.000    1.474    1.044
##     quake_depth       3.933    0.176   22.290    0.000    3.933    0.577
##     fault_score       0.649    0.040   16.387    0.000    0.649    0.414
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   econ ~~                                                               
##     demo              0.531    0.050   10.687    0.000    0.531    0.531
##     quake             0.048    0.017    2.768    0.006    0.048    0.048
##   demo ~~                                                               
##     quake             0.312    0.032    9.799    0.000    0.312    0.312
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .income           -0.064    0.019   -3.393    0.001   -0.064   -0.952
##    .unemploymnt_rt   15.841    0.611   25.924    0.000   15.841    0.905
##    .interest          0.558    0.020   28.092    0.000    0.558    0.989
##    .population        1.224    0.157    7.792    0.000    1.224    0.407
##    .crime_index       0.706    0.026   27.046    0.000    0.706    0.900
##    .quake_freq       40.258    1.472   27.347    0.000   40.258    0.823
##    .quake_strength   -0.178    0.091   -1.951    0.051   -0.178   -0.089
##    .quake_depth      30.967    1.273   24.319    0.000   30.967    0.667
##    .fault_score       2.044    0.075   27.414    0.000    2.044    0.829
##     econ              1.000                               1.000    1.000
##     demo              1.000                               1.000    1.000
##     quake             1.000                               1.000    1.000


Appendix B2: CFA with Covariances

# CFA Model with Covariances
model.cov <- "# Factors 
              econ =~ income + unemployment_rate + interest
              demo =~ population + crime_index 
              quake =~ quake_freq + quake_strength + quake_depth + fault_score 
              
              # Covariance     
              income ~~ interest 
              unemployment_rate ~~ interest 
              quake_freq ~~ quake_depth 
              quake_freq ~~ fault_score 
              quake_strength ~~ quake_depth 
              quake_depth ~~ fault_score 
              "
fit.cov <- cfa(model.cov, data = quake_trans, std.lv = TRUE)

# Results
summary(fit.cov, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-7 ended normally after 143 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         27
##                                                       
##   Number of observations                          1564
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               411.697
##   Degrees of freedom                                18
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              3502.681
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.886
##   Tucker-Lewis Index (TLI)                       0.773
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -26065.995
##   Loglikelihood unrestricted model (H1)     -25860.147
##                                                       
##   Akaike (AIC)                               52185.991
##   Bayesian (BIC)                             52330.576
##   Sample-size adjusted Bayesian (BIC)        52244.803
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.118
##   90 Percent confidence interval - lower         0.108
##   90 Percent confidence interval - upper         0.128
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.068
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   econ =~                                                               
##     income            0.595    0.149    3.995    0.000    0.595    2.297
##     unemploymnt_rt   -0.788    0.224   -3.523    0.000   -0.788   -0.188
##     interest          0.061    0.072    0.849    0.396    0.061    0.081
##   demo =~                                                               
##     population        1.381    0.070   19.749    0.000    1.381    0.797
##     crime_index      -0.271    0.025  -10.906    0.000   -0.271   -0.305
##   quake =~                                                              
##     quake_freq        3.549    0.254   13.967    0.000    3.549    0.507
##     quake_strength    1.195    0.068   17.673    0.000    1.195    0.846
##     quake_depth       4.021    0.501    8.023    0.000    4.021    0.590
##     fault_score       0.828    0.058   14.326    0.000    0.828    0.527
## 
## Covariances:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .income ~~                                                                 
##    .interest              0.010    0.039    0.257    0.797    0.010    0.025
##  .unemployment_rate ~~                                                      
##    .interest             -1.392    0.103  -13.447    0.000   -1.392   -0.452
##  .quake_freq ~~                                                             
##    .quake_depth          -6.323    1.663   -3.803    0.000   -6.323   -0.191
##    .fault_score           0.823    0.364    2.262    0.024    0.823    0.102
##  .quake_strength ~~                                                         
##    .quake_depth           0.939    0.745    1.261    0.207    0.939    0.227
##  .quake_depth ~~                                                            
##    .fault_score          -1.627    0.383   -4.247    0.000   -1.627   -0.222
##   econ ~~                                                                   
##     demo                  0.303    0.082    3.684    0.000    0.303    0.303
##     quake                 0.058    0.019    2.997    0.003    0.058    0.058
##   demo ~~                                                                   
##     quake                 0.399    0.038   10.530    0.000    0.399    0.399
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .income           -0.287    0.178   -1.614    0.107   -0.287   -4.278
##    .unemploymnt_rt   16.880    0.679   24.852    0.000   16.880    0.965
##    .interest          0.561    0.022   25.786    0.000    0.561    0.993
##    .population        1.098    0.170    6.464    0.000    1.098    0.365
##    .crime_index       0.712    0.026   27.133    0.000    0.712    0.907
##    .quake_freq       36.332    1.889   19.235    0.000   36.332    0.743
##    .quake_strength    0.568    0.148    3.844    0.000    0.568    0.285
##    .quake_depth      30.255    3.987    7.588    0.000   30.255    0.652
##    .fault_score       1.780    0.097   18.275    0.000    1.780    0.722
##     econ              1.000                               1.000    1.000
##     demo              1.000                               1.000    1.000
##     quake             1.000                               1.000    1.000
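The CFA output above contains an improper solution. The residual variance of `income` is negative (-0.287), and its standardised loading on `econ` exceeds 1 (2.297). This is a Heywood case. The sketch below shows one way such estimates could be flagged automatically.

It rebuilds a few rows of the printed table by hand, so it runs without the data. With a fitted lavaan object, `parameterEstimates(fit, standardized = TRUE)` returns a data frame with the same `lhs`, `op`, `rhs`, `est` and `std.all` columns, so the same filter should apply. The helper name `flag_heywood` is hypothetical and is not a lavaan function.

```r
# Hypothetical helper: flag improper (Heywood-type) estimates in a
# lavaan-style parameter table with columns lhs, op, rhs, est, std.all.
flag_heywood <- function(pe) {
  # Residual variances are the "x ~~ x" rows; negative values are improper
  neg_var <- pe$op == "~~" & pe$lhs == pe$rhs & pe$est < 0
  # Standardised loadings outside [-1, 1] usually accompany the problem
  big_load <- pe$op == "=~" & abs(pe$std.all) > 1
  # which() drops NA comparisons instead of returning NA rows
  pe[which(neg_var | big_load), c("lhs", "op", "rhs", "est", "std.all")]
}

# A subset of rows transcribed from the CFA output above (not the full table)
pe <- data.frame(
  lhs     = c("econ", "econ", "econ", "income", "unemployment_rate", "interest"),
  op      = c("=~", "=~", "=~", "~~", "~~", "~~"),
  rhs     = c("income", "unemployment_rate", "interest",
              "income", "unemployment_rate", "interest"),
  est     = c(0.595, -0.788, 0.061, -0.287, 16.880, 0.561),
  std.all = c(2.297, -0.188, 0.081, -4.278, 0.965, 0.993),
  stringsAsFactors = FALSE
)

flagged <- flag_heywood(pe)
print(flagged)
```

Both flagged rows concern `income`. The same check would also flag `quake_strength` in the SEM of Appendix B3, which has a residual variance of -0.074 and a standardised loading of 1.018.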


Appendix B3: SEM

# SEM Model
model.sem <- "# Factors 
              econ =~ income + unemployment_rate + interest
              demo =~ population + crime_index 
              quake =~ quake_freq + quake_strength + quake_depth + fault_score 
              
              # Regression 
              house_price ~ econ + demo + quake
              "
fit.sem <- sem(model.sem, data = quake_trans, estimator = "MLM", std.lv = TRUE)

# Results
summary(fit.sem, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-7 ended normally after 43 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         25
##                                                       
##   Number of observations                          1564
##                                                       
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                              1253.864    1081.722
##   Degrees of freedom                                30          30
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.159
##        Satorra-Bentler correction                                 
## 
## Model Test Baseline Model:
## 
##   Test statistic                              5329.283    5066.680
##   Degrees of freedom                                45          45
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.052
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.768       0.791
##   Tucker-Lewis Index (TLI)                       0.653       0.686
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.769
##   Robust Tucker-Lewis Index (TLI)                            0.654
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -27083.295  -27083.295
##   Loglikelihood unrestricted model (H1)     -26456.363  -26456.363
##                                                                   
##   Akaike (AIC)                               54216.590   54216.590
##   Bayesian (BIC)                             54350.465   54350.465
##   Sample-size adjusted Bayesian (BIC)        54271.045   54271.045
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.162       0.150
##   90 Percent confidence interval - lower         0.154       0.143
##   90 Percent confidence interval - upper         0.169       0.157
##   P-value RMSEA <= 0.05                          0.000       0.000
##                                                                   
##   Robust RMSEA                                               0.161
##   90 Percent confidence interval - lower                     0.153
##   90 Percent confidence interval - upper                     0.169
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.099       0.099
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   econ =~                                                               
##     income            0.233    0.008   31.010    0.000    0.233    0.899
##     unemploymnt_rt   -2.099    0.092  -22.894    0.000   -2.099   -0.502
##     interest          0.201    0.019   10.497    0.000    0.201    0.268
##   demo =~                                                               
##     population        1.205    0.066   18.292    0.000    1.205    0.695
##     crime_index      -0.310    0.024  -12.931    0.000   -0.310   -0.350
##   quake =~                                                              
##     quake_freq        3.022    0.160   18.909    0.000    3.022    0.432
##     quake_strength    1.439    0.041   34.869    0.000    1.439    1.018
##     quake_depth       4.017    0.105   38.225    0.000    4.017    0.589
##     fault_score       0.668    0.040   16.729    0.000    0.668    0.425
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   house_price ~                                                         
##     econ              0.614    0.075    8.225    0.000    0.614    0.967
##     demo             -0.089    0.082   -1.079    0.280   -0.089   -0.140
##     quake             0.084    0.027    3.157    0.002    0.084    0.133
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   econ ~~                                                               
##     demo              0.803    0.043   18.870    0.000    0.803    0.803
##     quake             0.058    0.026    2.239    0.025    0.058    0.058
##   demo ~~                                                               
##     quake             0.349    0.038    9.179    0.000    0.349    0.349
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .income            0.013    0.002    6.814    0.000    0.013    0.191
##    .unemploymnt_rt   13.095    1.047   12.501    0.000   13.095    0.748
##    .interest          0.524    0.021   25.424    0.000    0.524    0.928
##    .population        1.553    0.155   10.037    0.000    1.553    0.517
##    .crime_index       0.689    0.059   11.726    0.000    0.689    0.877
##    .quake_freq       39.793   10.304    3.862    0.000   39.793    0.813
##    .quake_strength   -0.074    0.118   -0.627    0.531   -0.074   -0.037
##    .quake_depth      30.300    5.055    5.994    0.000   30.300    0.653
##    .fault_score       2.020    0.066   30.541    0.000    2.020    0.819
##    .house_price       0.098    0.012    7.968    0.000    0.098    0.243
##     econ              1.000                               1.000    1.000
##     demo              1.000                               1.000    1.000
##     quake             1.000                               1.000    1.000
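The fit indices in this summary are simple functions of the model and baseline chi-square statistics. As a check on how they arise, the sketch below recomputes the degrees of freedom, CFI, TLI and RMSEA for the SEM above. It uses the printed standard (non-robust) test statistics.

The formulas are the textbook definitions. lavaan's internal implementation may differ in minor conventions, such as N versus N - 1 in the RMSEA denominator, but this is immaterial at N = 1564. The function and variable names are illustrative only.

```r
# Textbook fit-index formulas applied to printed chi-square statistics.
# fit_indices() is an illustrative helper, not a lavaan function.
fit_indices <- function(t_m, df_m, t_b, df_b, n) {
  cfi   <- 1 - max(t_m - df_m, 0) / max(t_b - df_b, t_m - df_m, 0)
  tli   <- ((t_b / df_b) - (t_m / df_m)) / ((t_b / df_b) - 1)
  rmsea <- sqrt(max(t_m - df_m, 0) / (df_m * (n - 1)))
  c(cfi = cfi, tli = tli, rmsea = rmsea)
}

# Degrees of freedom: 10 observed variables (9 indicators + house_price)
# give p(p + 1) / 2 = 55 sample variances and covariances.
p <- 10
n_moments   <- p * (p + 1) / 2
df_model    <- n_moments - 25  # 25 free parameters reported above -> 30
df_baseline <- n_moments - p   # baseline estimates only the 10 variances -> 45

idx <- fit_indices(t_m = 1253.864, df_m = df_model,
                   t_b = 5329.283, df_b = df_baseline, n = 1564)
print(round(idx, 3))

# Same formulas with the Appendix B4 statistics (31 free parameters, 24 df)
idx4 <- fit_indices(t_m = 781.938, df_m = 24,
                    t_b = 5329.283, df_b = 45, n = 1564)
print(round(idx4, 3))
```

Rounded to three decimals, these agree with the standard column of each summary (CFI, TLI and RMSEA of 0.768, 0.653 and 0.162 for B3, and 0.857, 0.731 and 0.142 for B4). With a fitted object, `fitMeasures(fit.sem, c("cfi", "tli", "rmsea"))` returns lavaan's own values directly.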


Appendix B4: SEM with Covariances

# SEM Model with Covariances
model.sem.cov <- "# Factors 
                  econ =~ income + unemployment_rate + interest
                  demo =~ population + crime_index 
                  quake =~ quake_freq + quake_strength + quake_depth + fault_score 
                  
                  # Regression 
                  house_price ~ econ + demo + quake
                  
                  # Residual covariances between indicators
                  income ~~ interest 
                  unemployment_rate ~~ interest 
                  quake_freq ~~ quake_depth 
                  quake_freq ~~ fault_score 
                  quake_strength ~~ quake_depth 
                  quake_depth ~~ fault_score 
                  "
fit.sem.cov <- sem(model.sem.cov, data = quake_trans, estimator = "MLM", std.lv = TRUE)

# Results
summary(fit.sem.cov, fit.measures = TRUE, standardized = TRUE)

## lavaan 0.6-7 ended normally after 82 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         31
##                                                       
##   Number of observations                          1564
##                                                       
## Model Test User Model:
##                                               Standard      Robust
##   Test Statistic                               781.938     633.787
##   Degrees of freedom                                24          24
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.234
##        Satorra-Bentler correction                                 
## 
## Model Test Baseline Model:
## 
##   Test statistic                              5329.283    5066.680
##   Degrees of freedom                                45          45
##   P-value                                        0.000       0.000
##   Scaling correction factor                                  1.052
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.857       0.879
##   Tucker-Lewis Index (TLI)                       0.731       0.772
##                                                                   
##   Robust Comparative Fit Index (CFI)                         0.858
##   Robust Tucker-Lewis Index (TLI)                            0.733
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -26847.332  -26847.332
##   Loglikelihood unrestricted model (H1)     -26456.363  -26456.363
##                                                                   
##   Akaike (AIC)                               53756.663   53756.663
##   Bayesian (BIC)                             53922.668   53922.668
##   Sample-size adjusted Bayesian (BIC)        53824.188   53824.188
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.142       0.127
##   90 Percent confidence interval - lower         0.134       0.120
##   90 Percent confidence interval - upper         0.151       0.135
##   P-value RMSEA <= 0.05                          0.000       0.000
##                                                                   
##   Robust RMSEA                                               0.142
##   90 Percent confidence interval - lower                     0.132
##   90 Percent confidence interval - upper                     0.151
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.082       0.082
## 
## Parameter Estimates:
## 
##   Standard errors                           Robust.sem
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   econ =~                                                               
##     income            0.217    0.007   29.349    0.000    0.217    0.839
##     unemploymnt_rt   -2.158    0.099  -21.774    0.000   -2.158   -0.516
##     interest          0.158    0.019    8.315    0.000    0.158    0.210
##   demo =~                                                               
##     population        1.258    0.072   17.524    0.000    1.258    0.726
##     crime_index      -0.297    0.024  -12.588    0.000   -0.297   -0.335
##   quake =~                                                              
##     quake_freq        4.317    0.242   17.822    0.000    4.317    0.617
##     quake_strength    0.896    0.041   21.733    0.000    0.896    0.634
##     quake_depth       1.899    0.317    5.990    0.000    1.899    0.278
##     fault_score       1.161    0.050   23.221    0.000    1.161    0.740
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   house_price ~                                                         
##     econ              0.828    0.137    6.023    0.000    0.828    1.303
##     demo             -0.342    0.157   -2.173    0.030   -0.342   -0.538
##     quake             0.246    0.063    3.876    0.000    0.246    0.387
## 
## Covariances:
##                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .income ~~                                                                 
##    .interest              0.012    0.003    4.382    0.000    0.012    0.116
##  .unemployment_rate ~~                                                      
##    .interest             -1.099    0.058  -18.989    0.000   -1.099   -0.418
##  .quake_freq ~~                                                             
##    .quake_depth          -1.709    1.101   -1.552    0.121   -1.709   -0.047
##    .fault_score          -1.252    0.377   -3.321    0.001   -1.252   -0.215
##  .quake_strength ~~                                                         
##    .quake_depth           4.053    0.407    9.957    0.000    4.053    0.567
##  .quake_depth ~~                                                            
##    .fault_score          -0.287    0.222   -1.289    0.197   -0.287   -0.041
##   econ ~~                                                                   
##     demo                  0.794    0.044   18.061    0.000    0.794    0.794
##     quake                 0.115    0.038    3.003    0.003    0.115    0.115
##   demo ~~                                                                   
##     quake                 0.422    0.047    9.006    0.000    0.422    0.422
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .income            0.020    0.002   11.151    0.000    0.020    0.296
##    .unemploymnt_rt   12.843    0.999   12.853    0.000   12.843    0.734
##    .interest          0.540    0.021   25.729    0.000    0.540    0.956
##    .population        1.424    0.176    8.088    0.000    1.424    0.473
##    .crime_index       0.697    0.060   11.691    0.000    0.697    0.888
##    .quake_freq       30.287    9.988    3.032    0.002   30.287    0.619
##    .quake_strength    1.192    0.072   16.590    0.000    1.192    0.598
##    .quake_depth      42.890    5.755    7.453    0.000   42.890    0.922
##    .fault_score       1.117    0.098   11.344    0.000    1.117    0.453
##    .house_price       0.015    0.026    0.589    0.556    0.015    0.037
##     econ              1.000                               1.000    1.000
##     demo              1.000                               1.000    1.000
##     quake             1.000                               1.000    1.000
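The model in Appendix B3 is nested in this one, because B4 only frees six residual covariances. The two can therefore be compared with a chi-square difference test.

Under the MLM estimator, the plain difference of the robust statistics is not itself chi-square distributed. The usual choice is the Satorra-Bentler scaled difference. The sketch below computes it by hand from the printed standard statistics and scaling correction factors.

With both fitted objects in memory, `lavTestLRT(fit.sem, fit.sem.cov)` should perform an equivalent test directly. The helper name `sb_diff_test` is hypothetical. Because the printed scaling factors are rounded to three decimals, the hand-computed result is approximate.

```r
# Satorra-Bentler scaled chi-square difference test for nested models,
# from the standard ML statistics (t) and scaling correction factors (c).
# "0" = more restricted model (B3), "1" = less restricted model (B4).
sb_diff_test <- function(t0, df0, c0, t1, df1, c1) {
  df_diff <- df0 - df1
  # Scaling factor for the difference statistic; a negative value would
  # mean the approximation has broken down
  cd <- (df0 * c0 - df1 * c1) / df_diff
  stat <- (t0 - t1) / cd
  p <- pchisq(stat, df = df_diff, lower.tail = FALSE)
  list(statistic = stat, df = df_diff, scaling = cd, p.value = p)
}

res <- sb_diff_test(t0 = 1253.864, df0 = 30, c0 = 1.159,  # Appendix B3
                    t1 = 781.938,  df1 = 24, c1 = 1.234)  # Appendix B4
str(res)
```

This gives a difference scaling factor of 0.859 and a scaled difference of roughly 549 on 6 degrees of freedom (p < .001). Freeing the six residual covariances therefore improves fit significantly, which is consistent with the lower AIC and BIC reported for B4.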


Structural Equation Modelling: The Effect of Earthquakes on California Housing Prices

Martin Neloe

08/11/2020
