dat$death.age<-ifelse(dat$b5==1, ((((dat$v008))+1900)-(((dat$b3))+1900)) ,dat$b7)
#censoring indicator for death by age 1, in months (12 months)
dat$d.event<-ifelse(is.na(dat$b7)==T|dat$b7>12,0,1)
dat$d.eventfac<-factor(dat$d.event); levels(dat$d.eventfac)<-c("Alive at 1", "Dead by 1")
table(dat$d.eventfac)
Alive at 1  Dead by 1
      5067        239
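As a minimal sketch of how the two variables above behave, the same two `ifelse()` rules can be applied to a tiny made-up data frame. The data frame `toy` and all of its values are hypothetical; only the column names (`b5`, `v008`, `b3`, `b7`) and the recoding logic come from the code above, with the `+1900` terms dropped because they cancel.

```r
# Hypothetical records, for illustration only
toy <- data.frame(
  b5   = c(1, 1, 0, 0),             # 1 = child alive at interview
  v008 = c(1340, 1340, 1340, 1340), # interview date (century-month code)
  b3   = c(1330, 1300, 1320, 1300), # birth date (century-month code)
  b7   = c(NA, NA, 3, 24)           # age at death in months (NA if alive)
)

# Living children: age at interview in months; dead children: age at death
toy$death.age <- ifelse(toy$b5 == 1, toy$v008 - toy$b3, toy$b7)

# Event = death at or before 12 months; everything else is censored
toy$d.event <- ifelse(is.na(toy$b7) | toy$b7 > 12, 0, 1)

toy[, c("death.age", "d.event")]
# death.age: 10 40  3 24
# d.event:    0  0  1  0
```

Only the third child counts as an infant death. The child who died at 24 months is treated as censored for this outcome, just like the children who were still alive at interview.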
Estimating Survival Time Functions
#Here we see the data
head(dat[,c("death.age","d.event")], n=20)
mort<-survfit(Surv(death.age, d.event)~1, data=dat,conf.type="none")
plot(mort, ylim=c(.9,1), xlim=c(0,12), main="Survival Function for Infant Mortality")
plot(cumsum(haz$haz)~haz$time, main ="Cumulative Hazard function",ylab="H(t)",xlab="Time in Months", type="l",xlim=c(0,12), lwd=2,col=3)
This figure indicates censored ages at death for children under age 1.
Kaplan-Meier survival analysis of the outcome
The grouping variable is place of birth (urban or rural), which is categorical. My research question is: do infants' chances of survival in Nepal differ by place of birth (urban versus rural)? The null hypothesis (H0) states that infant survivorship does not differ between rural and urban areas of Nepal. The alternative hypothesis (H1) states that infant survivorship does differ between rural and urban areas of Nepal.
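The grouped Kaplan-Meier fit and the log-rank test reported below come from calls of the following form. The two formulas are copied from the `Call:` lines in the output; the small data frame `toy` and the object names `km.by.area` and `logrank` are hypothetical and are there only so the sketch runs on its own. With the real data, `data = dat` would replace `data = toy`.

```r
library(survival)

# Hypothetical data, for illustration only (not the survey data)
toy <- data.frame(
  death.age = c(1, 2, 3, 8, 12, 20, 24, 30, 1, 4, 15, 18, 22, 36),
  d.event   = c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  rural     = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)

# Kaplan-Meier survival curves, one per level of the grouping variable
km.by.area <- survfit(Surv(death.age, d.event) ~ rural, data = toy,
                      conf.type = "log")
print(km.by.area)

# Log-rank test of H0: the survival curves of the two groups are equal
logrank <- survdiff(Surv(death.age, d.event) ~ rural, data = toy)
print(logrank)
```

The median and its confidence limits print as `NA` whenever a group's survival curve never drops to 0.5, which is expected when deaths are rare.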
Call: survfit(formula = Surv(death.age, d.event) ~ rural, data = dat,
    conf.type = "log")

           n events median 0.95LCL 0.95UCL
rural=0 4215    199     NA      NA      NA
rural=1 1091     40     NA      NA      NA
Call:
survdiff(formula = Surv(death.age, d.event) ~ rural, data = dat)

           N Observed Expected (O-E)^2/E (O-E)^2/V
rural=0 4215      199    189.7     0.457      2.26
rural=1 1091       40     49.3     1.758      2.26

 Chisq= 2.3  on 1 degrees of freedom, p= 0.1
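The figures in the log-rank table above can be checked by hand. This is a rough sketch that uses only the rounded numbers printed in the output; the object names are hypothetical, and the results match the printed values only up to rounding.

```r
# Observed and expected deaths copied from the survdiff output above
observed <- c(rural0 = 199, rural1 = 40)
expected <- c(rural0 = 189.7, rural1 = 49.3)

# Per-group contributions (O-E)^2/E; small differences from the printed
# 0.457 and 1.758 come from the rounding of the expected counts
contrib <- (observed - expected)^2 / expected
round(contrib, 3)

# p-value for the reported statistic (O-E)^2/V = 2.26 on 1 degree of freedom
p.value <- pchisq(2.26, df = 1, lower.tail = FALSE)
round(p.value, 3)
```

The p-value comes out at roughly 0.13, which `survdiff` prints as 0.1.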
We fail to reject the null hypothesis because the p-value (0.1) is greater than .05. The data therefore provide no statistically significant evidence that survivorship differs between children born in urban and rural parts of Nepal.
The survival curves nevertheless suggest that rural infant deaths are higher at ages 0-1 months and that the most noticeable gap appears at ages 3-5 months, although the log-rank test shows that the overall difference is not statistically significant.