(a) Fit a direct model that resembles the final model on question 4, show the SEM diagram and results table (side by side with those for 4c). Comment on the similarities and differences in your result
First, we create dummy variables for education, since it has more than two levels.
import delimited framingham_clean
regress sysbp i.male age i.education i.bpmeds i.prevalenthyp diabp heartrate glucose totchol, allbase
* Creating the dummy variables for the education category
tab education, gen(educationlevels)
* Fitting the model
sem (diabp -> sysbp, ) (educationlevels2 -> sysbp, ) (educationlevels3 -> sysbp, ) (educationlevels4 -> sysbp, ) (glucose -> sysbp, ) (heartrate -> sysbp, ) (bpmeds -> sysbp, ) (prevalenthyp -> sysbp, ) (male -> sysbp, ) (age -> sysbp, ) (totchol -> sysbp, ), covstructure(e._endogenous, unstructured) nocapslatent
* Education level 2 was removed from the final model because it was insignificant (p-value=0.240).
estat mindices
The linear regression output and the structural equation model output had many similarities and a few differences.
The estimated path coefficients were identical in both outputs for all variables, including the constant (intercept).
The test statistics differed, however: sem reports z values based on maximum likelihood standard errors, whereas regress reports t values based on OLS standard errors.
The significance of the variables at the 5% level was also identical in both outputs.
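One way to lay the two sets of estimates side by side is sketched below. It assumes the dummy variables created by `tab education, gen(educationlevels)` are in memory; the stored-estimate names `ols` and `sem_direct` are hypothetical labels chosen for illustration, not names used in the original analysis.

```stata
* Refit the regression on the same dummy variables so coefficient names match the sem output.
regress sysbp diabp educationlevels2 educationlevels3 educationlevels4 ///
    glucose heartrate bpmeds prevalenthyp male age totchol
estimates store ols          // hypothetical name

* Direct-effects-only SEM with the same predictors.
sem (sysbp <- diabp educationlevels2 educationlevels3 educationlevels4 ///
    glucose heartrate bpmeds prevalenthyp male age totchol)
estimates store sem_direct   // hypothetical name

* Coefficients and standard errors in adjacent columns;
* equations(1) lines up the first equation of each model.
estimates table ols sem_direct, equations(1) b(%9.4f) se(%9.4f)
```

The coefficient columns should agree, while the standard errors may differ slightly because the two commands use different estimators.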
(b) Work on improving the direct model by introducing some indirect pathways based on research knowledge of the field or suggested pathways from "estat mindices". Display the final direct and indirect SEM diagram and explain your approach of the indirect pathways and/or correlations introduced. Hint: Do not make the modifications too complex, make a few alterations that help improve the model.
First, education level 2 was removed because it was not significant (\(p=0.216>0.05\)).
When estat mindices was run in Stata on the initial direct model, it did not suggest any modifications, so I used expert opinion and prior belief to create the indirect pathways.
The direct relationship between diastolic and systolic blood pressure was maintained. This is supported both biologically and statistically, since diastolic blood pressure is known to be related to systolic blood pressure through shared cardiovascular risk factors.
Prevalent hypertension (prevalenthyp) was introduced as the key mediator, since individuals with prevalent hypertension often have elevated diastolic and systolic blood pressure.
Justification for the approach
The result is a more parsimonious model: only a few justified changes were made, to avoid overfitting.
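The mediation structure described above could be specified roughly as follows. This is a sketch under stated assumptions: the text does not list the full set of paths into prevalenthyp, so the three predictors used here (diabp, glucose, heartrate) are taken from the variables that appear in the results, and the model actually reported may include further paths.

```stata
* Direct paths into sysbp plus an indirect route through prevalenthyp.
* The predictors of prevalenthyp are an assumption made for illustration.
sem (sysbp <- diabp prevalenthyp educationlevels3 educationlevels4 male age ///
              glucose heartrate bpmeds totchol)                             ///
    (prevalenthyp <- diabp glucose heartrate)

* Check whether any further paths or covariances are suggested.
estat mindices
```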
Model Structural Output
(c) Perform and comment on all five SEM model goodness of fit procedures and comment on how each performs based on your final SEM model.
Note
The following command was run in Stata to obtain the model goodness-of-fit indices:
estat gof, stats(all)
Comments
Likelihood Ratio Test
The likelihood ratio test (\(p=0.405\)) shows no significant difference between the fitted model and the saturated model, so the model reproduces the observed data structure very well. The null hypothesis that the model fits the data is not rejected, which is the desired outcome in SEM.
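The reported p-value can be cross-checked from the test statistic given in the notes to the results table, \(\chi^2(4)=4.01\). A minimal check, assuming those two numbers are the model-versus-saturated statistic and its degrees of freedom:

```stata
* Upper-tail chi-squared probability for chi2 = 4.01 on 4 degrees of freedom
* (both values taken from the table notes).
display chi2tail(4, 4.01)
```

The displayed value should be close to the reported \(p=0.405\).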
RMSEA (Root Mean Square Error of Approximation)
An RMSEA below 0.05 indicates close model fit. Here \(RMSEA=0.001\), which is essentially zero. In addition, pclose = 1.000 is the p-value for the test of close fit (null hypothesis: RMSEA \(\le 0.05\)), so close fit is not rejected, again indicating excellent fit.
The 90% confidence bounds are also within the expected range, i.e. \(LB<0.05\) and \(UB<0.1\), which likewise suggests a good model fit.
CFI and TLI (Comparative Fit Index & Tucker-Lewis Index)
Both indices are above 0.95 (both equal 1.00), indicating excellent comparative fit. The model fits much better than the baseline model, which assumes no relationships among the variables.
SRMR (Standardized Root Mean Squared Residual)
An SRMR below 0.08 is generally considered good. For this model \(SRMR=0.003\), indicating a near-perfect fit: the model-predicted correlations match the observed ones very closely.
Coefficient of determination
The value is \(CD=0.707\), which is quite high.
The model explains about 70.7% of the variance in the outcome variables, indicating clinically and behaviorally meaningful predictive accuracy.
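Because the coefficient of determination is a single summary for the whole model, it can help to look at each equation separately. A short sketch, assuming the final sem model is the most recent estimation in memory:

```stata
* Equation-level goodness of fit (R-squared for sysbp and for prevalenthyp).
* Must be run directly after the final sem command.
estat eqgof
```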
(d) Draw-up the table of results from the final SEM model and verify numerically the STATA drawn direct effects, indirect effects and total effects for “diabp” on your outcome variable “sysbp”.
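The effect decomposition asked for here can be obtained and checked by hand along the following lines. This is a sketch that assumes the final sem model is in memory and that the only indirect route from diabp to sysbp runs through prevalenthyp; the labels `indirect` and `total` are illustrative.

```stata
* Direct, indirect and total effects as reported by Stata.
estat teffects

* Manual check of the indirect effect:
* (diabp -> prevalenthyp) times (prevalenthyp -> sysbp).
nlcom (indirect: _b[prevalenthyp:diabp] * _b[sysbp:prevalenthyp])

* Total effect = direct effect + indirect effect.
nlcom (total: _b[sysbp:diabp] + _b[prevalenthyp:diabp] * _b[sysbp:prevalenthyp])
```

If the single-mediator assumption holds, the nlcom results should match the diabp rows of the estat teffects output.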
(e) Interpret your final SEM model and comment on whether SEM helped improve the direct model from 4c).
Comments
Final SEM Model
The final model has:
Observed endogenous variables: prevalenthyp and sysbp
This is where we observe the interrelationships.
Observed exogenous variables: educationlevels3 educationlevels4 male glucose heartrate diabp age bpmeds totchol
Summary of results
Direct effects on systolic blood pressure
Prevalent hypertension has a major effect on systolic blood pressure: those with prevalent hypertension have systolic blood pressure about 12.88 units higher than their counterparts, adjusting for the other variables (\(\beta \approx 12.88, p<0.001\)).
Diastolic blood pressure has a positive, significant total effect on systolic blood pressure (\(p<0.001\)): a one-unit increase in diastolic blood pressure is associated with a 1.273-unit increase in systolic blood pressure, counting both the direct path and the path mediated by prevalent hypertension, and controlling for the other variables. About \(21.45\%\) of this effect is indirect, through prevalent hypertension, and the remaining \(78.55\%\) is the direct effect of diastolic blood pressure on systolic blood pressure.
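The decomposition can be checked with a little arithmetic. The figures below are implied by the rounded values quoted in the text (total effect 1.273, indirect share 21.45%, path from prevalent hypertension to systolic pressure 12.88); they are not Stata output, and they assume the indirect effect runs through prevalent hypertension only, with \(a\) denoting the diabp to prevalenthyp path and \(b\) the prevalenthyp to sysbp path.

```latex
\begin{aligned}
\text{total} &= \text{direct} + \text{indirect} = 1.273 \\
\text{indirect} &= a \times b \approx 0.2145 \times 1.273 \approx 0.273 \\
\text{direct} &\approx 1.273 - 0.273 = 1.000 \\
a &\approx \frac{0.273}{b} = \frac{0.273}{12.88} \approx 0.0212
\end{aligned}
```

The direct share is then \(1.000/1.273 \approx 78.55\%\), consistent with the split stated above.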
Model improvement
The SEM improved on the direct model because:
The root mean square error of approximation (\(RMSEA=0.001<0.05\)) indicates a close fit.
CFI and TLI both equal 1, showing a perfect comparative fit.
Overall, the chi-squared test p-value improved from \(0.00\) to \(p=0.405\), indicating that the model is no longer significantly worse than the saturated model; the final model is therefore a substantial improvement.
Additional effects are shown in the table below:
Structural Equation Model Results with Clinical Interpretation

Outcome | Predictor | β | SE | p | Clinical Interpretation
--- | --- | --- | --- | --- | ---
Binary Outcome: Hypertension Status | | | | |
Prevalent Hypertension | Glucose | 0.0004 | 0.0002 | 0.106 | NS: No significant association with hypertension risk
Prevalent Hypertension | Heart Rate | 0.0018 | 0.0005 | <0.001 | Sig: Each 1 bpm increase → 0.18 percentage point higher probability of hypertension (linear path coefficient)
Notes: Model fit: χ²(4)=4.01, p=0.405 (Excellent fit); SRMR=0.003; CD=0.707
NS = Not Significant (p>0.05); Sig = Significant (p<0.05); STRONG = p<0.001 with large effect size