# Packages used below: haven (read_stata, zap_labels), dplyr (%>%, filter), car (Recode)
library(haven)
library(dplyr)
library(car)

# Read the NHIS extract and drop Stata value labels
nhis <- read_stata("C:/Users/maman/OneDrive - University of Texas at San Antonio/Event History Analysis Data/nhis_00002.dta.gz")
nhis <- haven::zap_labels(nhis)

# Keep only respondents eligible for mortality follow-up
nhis <- nhis %>%
  filter(mortelig == 1)

# Recode covariates; unlisted codes become NA
nhis$sex1 <- Recode(nhis$sex, recodes = "1='male'; 2='female'; else=NA")
nhis$cit <- Recode(nhis$citizen, recodes = "1='no'; 2='yes'; else=NA")
nhis$obese <- Recode(nhis$bmicat, recodes = "1:3='no'; 4='yes'; else=NA")
nhis$education <- Recode(nhis$educrec2, recodes = "10:41='less than hs'; 42='hs grad'; 50:53='some college'; 54='college degree'; 60='more than college'; else=NA")
nhis$pov <- Recode(nhis$pooryn, recodes = "1='at or above'; 2='below'; else=NA")
nhis$smokstat <- Recode(nhis$smokfreqnow, recodes = "1='non-smoker'; 2='current smoker'; 3='every day smoker'; else=NA")
nhis$marstat1 <- Recode(nhis$marstat, recodes = "10:13='married'; 20='widowed'; 30='divorced'; 40='separated'; 50='never married'; else=NA")

View(nhis)
Define your outcome as in HW 1. Also consider what covariates are hypothesized to affect the outcome variable.
Define these and construct a parametric model for your outcome. Fit the parametric model of your choosing to the data.
Did you choose an AFT or PH model and why?
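A proportional hazards (PH) model is used because the aim of this project is to examine the mortality risk associated with obesity status across Hispanic subgroups. In a PH model each exponentiated coefficient is a hazard ratio, so the obesity effect can be read directly as a relative mortality risk.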
Justify what parametric distribution you choose.
The Gompertz distribution is used because it is a standard model for adult mortality: its hazard rises exponentially with age.
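As a minimal sketch of what a Gompertz proportional hazards model assumes, the snippet below evaluates a hazard of the form h(t | x) = a * exp(b * t) * exp(beta * x). The function name and the values of `a`, `b`, and `beta` are hypothetical illustration values, not estimates from the NHIS data, and the parameterization is a generic one that may differ from the one used by the fitting software.

```r
# Gompertz proportional hazards: h(t | x) = a * exp(b * t) * exp(beta * x)
# a, b, and beta are hypothetical illustration values, not fitted estimates.
gompertz_ph_hazard <- function(t, a, b, beta = 0, x = 0) {
  a * exp(b * t) * exp(beta * x)
}

ages <- c(40, 60, 80)

# Baseline group (x = 0) and exposed group (x = 1) at the same ages
h_reference <- gompertz_ph_hazard(ages, a = 1e-4, b = 0.08)
h_exposed   <- gompertz_ph_hazard(ages, a = 1e-4, b = 0.08, beta = 0.34, x = 1)

# Under proportional hazards the ratio is the same at every age: exp(beta)
hazard_ratio <- h_exposed / h_reference
print(hazard_ratio)
```

The baseline hazard grows with age while the ratio between the two groups stays fixed, which is the property that lets a single coefficient summarize the obesity effect.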
Covariate               Mean      Coef  Rel.Risk      S.E.      LR p
obese_yn               0.339     0.340     1.406     0.177    0.0596

Events                    144
Total time at risk        422284
Max. log. likelihood      -967.95
LR test statistic         3.55
Degrees of freedom        1
Overall p-value           0.0595613
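The quantities in this output can be cross-checked by hand. The sketch below, using only the rounded values printed above, exponentiates the coefficient to recover the relative risk, recomputes the likelihood ratio p-value from a chi-square distribution, and adds an approximate 95% Wald interval. The Wald interval is an addition for illustration and is not part of the printed output; small differences from the table come from rounding.

```r
# Rounded values copied from the printed output for obese_yn
coef_obese <- 0.340
se_obese   <- 0.177
lr_stat    <- 3.55
lr_df      <- 1

# Relative risk (hazard ratio) is the exponentiated coefficient
rel_risk <- exp(coef_obese)

# Likelihood ratio test p-value from the chi-square distribution
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Approximate 95% Wald interval on the hazard ratio scale (illustrative)
z <- qnorm(0.975)
wald_ci <- exp(coef_obese + c(-1, 1) * z * se_obese)

print(round(c(rel_risk = rel_risk, lr_p = lr_p,
              lower = wald_ci[1], upper = wald_ci[2]), 4))
```

The interval straddles 1, which agrees with the LR p-value sitting just above the 0.05 threshold.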
Covariate               Mean      Coef  Rel.Risk      S.E.      LR p
obese_yn               0.339     0.295     1.343     0.177    0.1017

Events                    144
Total time at risk        422284
Max. log. likelihood      -983.34
LR test statistic         2.68
Degrees of freedom        1
Overall p-value           0.101735
plot(fit.2)
AIC(fit.1)
[1] 1941.894
AIC(fit.2)
[1] 1972.689
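According to AIC, the model with the Gompertz distribution (fit.1, AIC = 1941.894) fits better than the alternative (fit.2, AIC = 1972.689), since a lower AIC indicates a better trade-off between fit and complexity.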
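The AIC values can be reproduced from the reported log-likelihoods with AIC = 2k - 2 logLik. The sketch below assumes k = 3 estimated parameters per model (one covariate coefficient plus two baseline distribution parameters); that count is an inference from the printed numbers, not something stated in the output, and the helper name `aic_from_loglik` is hypothetical.

```r
# AIC = 2 * k - 2 * logLik
aic_from_loglik <- function(loglik, k) 2 * k - 2 * loglik

# Log-likelihoods as printed (rounded to two decimals)
loglik_fit1 <- -967.95
loglik_fit2 <- -983.34

# Assumed parameter count: 1 coefficient + 2 baseline parameters
k <- 3

aic_fit1 <- aic_from_loglik(loglik_fit1, k)
aic_fit2 <- aic_from_loglik(loglik_fit2, k)

# Positive difference means fit.1 has the lower (better) AIC
delta_aic <- aic_fit2 - aic_fit1
print(c(aic_fit1 = aic_fit1, aic_fit2 = aic_fit2, delta_aic = delta_aic))
```

Because both models have the same assumed number of parameters, the AIC gap is simply twice the gap in log-likelihood, roughly 31 points in favor of fit.1.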
Include all main effects in the model.
Test for an interaction between at least two of the predictors.
Covariate               Mean      Coef  Rel.Risk      S.E.      LR p
obese_yn               0.339     0.345     1.412     0.178    0.0571
sex2                   0.436     0.721     2.056     0.169    0.0000
pov1                   0.223     0.218     1.244     0.183    0.2391

Events                    144
Total time at risk        422284
Max. log. likelihood      -958.45
LR test statistic         22.55
Degrees of freedom        3
Overall p-value           5.01399e-05
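Nested models such as these are usually compared with a likelihood ratio test, and an interaction term would be tested the same way by comparing the model with and without the product term. As a sketch using only the printed log-likelihoods, the code below compares this three-covariate model with the obesity-only model whose maximized log-likelihood is -967.95. It assumes both were fit to the same sample with the same baseline distribution, which the matching implied null log-likelihoods suggest but the output does not state.

```r
# Printed values for the obesity-only model and the three-covariate model
loglik_small <- -967.95; lr_small <- 3.55
loglik_large <- -958.45; lr_large <- 22.55

# Implied null log-likelihoods: logLik(model) - LR / 2
null_small <- loglik_small - lr_small / 2
null_large <- loglik_large - lr_large / 2

# Likelihood ratio test for adding sex and poverty status (2 extra parameters)
lr_stat <- 2 * (loglik_large - loglik_small)
df_diff <- 3 - 1
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(null_small = null_small, null_large = null_large))
print(c(lr_stat = lr_stat, df = df_diff, p_value = p_value))
```

A small p-value here indicates that sex and poverty status jointly improve the fit over obesity alone; the individual rows in the table point to sex as the main contributor.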
plot(fit.3)
Interpret your results and write them up.
Provide tabular and graphical output to support your conclusions.
summary(fit.2)
Covariate               Mean      Coef  Rel.Risk      S.E.      LR p
obese_yn               0.339     0.295     1.343     0.177    0.1017

Events                    144
Total time at risk        422284
Max. log. likelihood      -983.34
LR test statistic         2.68
Degrees of freedom        1
Overall p-value           0.101735