1 + 1[1] 2
Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
1 + 1[1] 2
You can add options to executable code like this
[1] 4
The echo: false option disables the printing of code (only output is displayed).
##load libraries
library(tidyverse)── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.8 ✔ dplyr 1.0.9
✔ tidyr 1.2.0 ✔ stringr 1.4.1
✔ readr 2.1.2 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(haven)
library(survival)
library(car)Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
library(survey)Loading required package: grid
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
Attaching package: 'survey'
The following object is masked from 'package:graphics':
dotchart
library(muhaz)
library(eha)##load data
nhis9 <- read_stata("C:/Users/okabe/OneDrive/Pictures/Stats 2/nhis_00010.dta.gz")
nhis9<-zap_labels(nhis9)#recode variables
#poor
nhis9$poor<-Recode(nhis9$pooryn, recodes ="7:9=NA; 1=0;2=1",as.factor=T)
##Alcohol
nhis9$alcohol<-Recode(nhis9$alclife, recodes ="1=1;2=0;7:9=NA;0=NA")
##smoking
nhis9$smoke<-Recode(nhis9$smokev, recodes ="7:9=NA;1=1;2=0; 0=NA")
##usborn
nhis9$born<-Recode(nhis9$usborn, recodes ="96:99=NA; 20=0;10:12=1;else=NA",as.factor=T)
##educ
nhis9$education<-Recode(nhis9$educ, recodes="102='0Prim'; 100:116='1somehs'; 200:202='2hsgrad'; 300:301='3somecol'; 400='4colgrad';500:503='5masteranddoc'; else= NA", as.factor=T)
##Recode
##citizen
nhis9$mycitizen<-Recode(nhis9$citizen, recodes =" 1=1;2=0;else=NA",as.factor=T)
##Marital status
nhis9$marital<-Recode(nhis9$marstat, recodes="10='married'; 30='divorced'; 20='widowed'; 40='separated'; 50='nm'; else=NA", as.factor=T)
nhis9$marital<-relevel(nhis9$marital, ref='married')
#Age cut into intervals
nhis9$agec<-cut(nhis9$age, breaks = c( 30, 40, 50, 60, 70, 80, 100), include.lowest = T)
#race/ethnicity
nhis9$race<-Recode(nhis9$racesr, recodes="100='white'; 200='black'; else='other'", as.factor=T)
nhis9$race<-relevel(nhis9$race, ref = "white")
##Hispanic
nhis9$ethnicity<-Recode(nhis9$hispeth, recodes="10='not hispanic'; 20:70='hispanic'; else=NA", as.factor=T)
##disease
#nhis2$disease<- Recode(nhis2$mortucodld, recodes= "01= 0; 02=1; 03=2; 04=3;05=4; 06=5; 07=6; 08= 7; 09=8; 10=9; else=NA")
#nhis2$disease<- Recode(nhis2$mortucodld, recodes= "01= 'Diseases of heart'; 02='Malignant neoplasms'; 03='Chronic lower respiratory diseases'; 04='Accidents';05='Cerebrovascular disease'; 06='Alzheimers disease'; 07='Diabetes mellitus'; 08= 'Influenza and pneumonia'; 09='Nephritis'; 10='All other causes'; 96= NA; else=NA",as.factor=T )
##sex
nhis9$male<-as.factor(ifelse(nhis9$sex==1, "Male", "Female"))
nhis9$male<-relevel(nhis9$male, ref='Male')
##Earnings
#nhis9$inc<-Recode(brfss_17$incomg, recodes = "9= NA;1='1_lt15k'; 2='2_15-25k';3='3_25-35k';4='4_35-50k';5='5_50kplus'", as.factor = T)
##employment status
nhis9$employed<- Recode(nhis9$empstat, recodes= " 100:122='unemployed'; 200:217='Employed';220='notinlaborforce'; else=NA", as.factor=T)Submit either a link to an Rpub that has your homework published, or as a emailed Word document Please answer the following questions, use appropriate figures and statistical output to answer the questions.
Parametric models 1) Carry out the following analysis:
Define your outcome as in HW 1. Also consider what covariates are hypothesized to affect the outcome variable. Define these and construct a parametric model for your outcome.
The outcome that I will looking at is mortality. To see if citizenship status and SES influence mortality outcomes of immigrants
Fit the parametric model of your choosing to the data.
#aft distribution for hazard
#complete.cases(nhis9$mycitizen, nhis9$education, nhis9$mortstat,nhis9$mortdody,nhis9$poor, nhis9$male)
nhis9 <- nhis9 %>%
filter(mortelig==1)%>%
mutate(death_age = ifelse( mortstat ==1,
mortdody - (year - age),
2018 - (year - age)),
d.event = ifelse(mortstat == 1, 1, 0))%>%
filter(complete.cases(mycitizen,education,poor,male))
library(survival)
age_fit <- survfit(Surv(death_age, d.event)~ 1,
data = nhis9)
library(ggsurvfit)
age_fit %>%
ggsurvfit() +
add_confidence_interval(type = "ribbon") +
add_quantile() ##using WEIBULL
fit.4<-phreg(Surv(death_age, d.event)~mycitizen+education+poor+male,
data=nhis9,
dist="weibull")
summary(fit.4)Covariate Mean Coef Rel.Risk S.E. LR p
mycitizen 0.0001
0 0.881 0 1 (reference)
1 0.119 -0.291 0.747 0.077
education 0.0000
0Prim 0.008 0 1 (reference)
1somehs 0.193 0.487 1.628 0.174
2hsgrad 0.307 0.543 1.720 0.175
3somecol 0.199 0.580 1.785 0.177
4colgrad 0.186 0.210 1.234 0.179
5masteranddoc 0.107 0.035 1.036 0.183
poor 0.0000
0 0.852 0 1 (reference)
1 0.148 0.394 1.483 0.042
male 0.0000
Male 0.477 0 1 (reference)
Female 0.523 -0.401 0.670 0.031
Events 4381
Total time at risk 2638001
Max. log. likelihood -22702
LR test statistic 355.90
Degrees of freedom 8
Overall p-value 0
plot(fit.4)fit.gom<-phreg(Surv(death_age, d.event)~mycitizen+education+poor+male,
data=nhis9,
dist="gompertz")Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
summary(fit.gom)Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Warning in Hgompertz(exit, scale = escale, shape = 1, param = "canonical"): Non-
positive shape or scale
Warning in Hgompertz(enter, scale = escale, shape = 1, param = "canonical"):
Non-positive shape or scale
Covariate Mean Coef Rel.Risk S.E. LR p
mycitizen 0.0001
0 0.881 0 1 (reference)
1 0.119 -0.284 0.753 0.077
education 0.0000
0Prim 0.008 0 1 (reference)
1somehs 0.193 0.505 1.656 0.174
2hsgrad 0.307 0.578 1.782 0.175
3somecol 0.199 0.624 1.867 0.177
4colgrad 0.186 0.252 1.286 0.179
5masteranddoc 0.107 0.098 1.103 0.183
poor 0.0000
0 0.852 0 1 (reference)
1 0.148 0.383 1.467 0.042
male 0.0000
Male 0.477 0 1 (reference)
Female 0.523 -0.421 0.656 0.031
Events 4381
Total time at risk 2638001
Max. log. likelihood -22373
LR test statistic 356.11
Degrees of freedom 8
Overall p-value 0
plot(fit.gom)I choose PH because the data fit and ran best.
The PH model provides various function such as Gompertz and Weibull that provide descriptives and graphics to show how the data fits as well as show mortality outcomes based on predictors used for this homework which include(sex, citizenship status and education). Additionally, it helps measure the probability of failing at any given time. In this case, the probability of experiencing mortality based on citizenship status and SES.
AIC(fit.4)[1] 45423.61
AIC(fit.gom)[1] 44766.79
Gompertz fits best compared to the rest of the models based on its low AIC.
Include all main effects in the model
Test for an interaction between at least two of the predictors
The predictors used are citizenship status, SES (education,poverty status) and sex
Weibull Interpretation:
The results above show that comparing to citizens, non citizens are less likely to experience mortality, similarly,this is similar to females when they are compared to men. However, when looking a the poverty threshold, individuals living below poverty are more likely to experience mortality. Based on education status, although all education groups have an increased risk of experiencing mortality, the probability decreases the high the education status an individual has. With all the predictor variables showing statistical significance
The Weibull graph shows an increased risk in mortality based on time.
Gompertz Interpretation:
The Gompertz interpretation is similar to the Weibull. Where we see that non citizens and females are less likely to experience mortality. Similarly, higher education reduces the risk of experiencing mortality.
g.Provide tabular and graphical output to support your conclusions
emp<-coxreg(Surv(death_age, d.event)~mycitizen+education+poor+male,
data=nhis9)
check.dist(sp=emp,pp=fit.gom, main = " Empirical vs.Gompertz")##check Weibull
check.dist(sp=emp,pp=fit.4, main = " Empirical vs.Weibull")While looking at the two graphs, we see that Gompertz fits the data better compared to Weibull