This example will illustrate how to test for differences between survival functions estimated by the Kaplan-Meier product limit estimator. The tests all follow the methods described by Harrington and Fleming (1982) Link.
The first example will use as its outcome variable, the event of a child dying before age 1. The data for this example come from the Haitian Demographic and Health Survey for 2012 children’s recode file. This file contains information for all births in the last 5 years prior to the survey.
The second example, we will examine how to calculate the survival function for a longitudinally collected data set. Here I use data from the ECLS-K. Specifically, we will examine the transition into poverty between kindergarten and fifth grade.
#Example 1
library(foreign)
library(survival)
## Loading required package: splines
library(lattice)
library(car)
haiti<-read.dta("/Users/ozd504/Google Drive/dem7223/data//HTKR61FL.DTA", convert.factors = F)
In the DHS, they record if a child is dead or alive and the age at death if the child is dead. If the child is alive at the time of interview, B5==1, then the age at death is censored. If the age at death is censored, then the age at the date of interview (censored age at death) is the date of the interview - date of birth (in months). If the child is dead at the time of interview, B5!=1, then the age at death in months is the B7. Here we code this:
haiti$death.age<-ifelse(haiti$b5==1,
((((haiti$v008))+1900)-(((haiti$b3))+1900))
,haiti$b7)
#censoring indicator for death by age 1, in months (12 months)
haiti$d.event<-ifelse(is.na(haiti$b7)==T|haiti$b7>12,0,1)
haiti$d.eventfac<-factor(haiti$d.event); levels(haiti$d.eventfac)<-c("Alive at 1", "Dead by 1")
table(haiti$d.eventfac)
##
## Alive at 1 Dead by 1
## 6819 428
We will now test for differences in survival by characteristics of the household. First we will examine whether the survival chances are the same for children in relatively high ses (in material terms) households, compared to those in relatively low-ses households.
haiti$highses<-recode(haiti$v190, recodes ="1:3 = 0; 4:5=1; else=NA")
fit1<-survfit(Surv(death.age, d.event)~highses, data=haiti)
plot(fit1, ylim=c(.9,1), xlim=c(0,14), col=c(1,2), conf.int=F)
title(main="Survival Function for Infant Mortality", sub="Low vs. High SES Households")
legend("topright", legend = c("Low SES", "High SES"), col=c(1,2), lty=1)
summary(fit1)
## Call: survfit(formula = Surv(death.age, d.event) ~ highses, data = haiti)
##
## highses=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 5270 175 0.967 0.00247 0.962 0.972
## 1 5054 15 0.964 0.00257 0.959 0.969
## 2 4960 15 0.961 0.00267 0.956 0.966
## 3 4854 20 0.957 0.00280 0.952 0.963
## 4 4736 9 0.955 0.00286 0.950 0.961
## 5 4618 11 0.953 0.00294 0.947 0.959
## 6 4491 19 0.949 0.00307 0.943 0.955
## 7 4372 15 0.946 0.00317 0.939 0.952
## 8 4255 5 0.945 0.00320 0.938 0.951
## 9 4148 11 0.942 0.00328 0.936 0.949
## 10 4027 2 0.942 0.00330 0.935 0.948
## 11 3941 7 0.940 0.00335 0.933 0.947
## 12 3856 13 0.937 0.00345 0.930 0.944
##
## highses=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1977 57 0.971 0.00376 0.964 0.979
## 1 1901 7 0.968 0.00398 0.960 0.975
## 2 1869 4 0.966 0.00411 0.958 0.974
## 3 1839 3 0.964 0.00420 0.956 0.972
## 4 1791 7 0.960 0.00442 0.952 0.969
## 5 1740 4 0.958 0.00455 0.949 0.967
## 6 1692 7 0.954 0.00477 0.945 0.963
## 7 1655 5 0.951 0.00492 0.942 0.961
## 8 1613 5 0.948 0.00508 0.938 0.958
## 9 1570 5 0.945 0.00524 0.935 0.955
## 10 1534 1 0.945 0.00527 0.934 0.955
## 11 1506 1 0.944 0.00531 0.934 0.954
## 12 1478 5 0.941 0.00548 0.930 0.952
Gives us the basic survival plot. Next we will use survtest() to test for differences between the two or more groups.
#two group compairison
survdiff(Surv(death.age, d.event)~highses, data=haiti)
## Call:
## survdiff(formula = Surv(death.age, d.event) ~ highses, data = haiti)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## highses=0 5270 317 311 0.119 0.445
## highses=1 1977 111 117 0.317 0.445
##
## Chisq= 0.4 on 1 degrees of freedom, p= 0.505
Which is the log-rank test on the survival times. In this case, we see no difference in survival status based on household SES. How about rural vs urban residence?
table(haiti$v025)
##
## 1 2
## 2464 4783
haiti$rural<-recode(haiti$v025, recodes ="2 = 1; 1=0; else=NA")
fit2<-survfit(Surv(death.age, d.event)~rural, data=haiti)
plot(fit2, ylim=c(.9,1), xlim=c(0,14), col=c(1,2), conf.int=F)
title(main="Survival Function for Infant Mortality", sub="Rural vs Urban Residence")
legend("topright", legend = c("Urban","Rural" ), col=c(1,2), lty=1)
summary(fit2)
## Call: survfit(formula = Surv(death.age, d.event) ~ rural, data = haiti)
##
## rural=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 2464 87 0.965 0.00372 0.957 0.972
## 1 2363 11 0.960 0.00394 0.953 0.968
## 2 2316 8 0.957 0.00410 0.949 0.965
## 3 2278 6 0.954 0.00421 0.946 0.963
## 4 2215 9 0.950 0.00439 0.942 0.959
## 5 2155 7 0.947 0.00453 0.939 0.956
## 6 2085 7 0.944 0.00467 0.935 0.953
## 7 2041 11 0.939 0.00489 0.930 0.949
## 8 1992 5 0.937 0.00499 0.927 0.947
## 9 1942 8 0.933 0.00515 0.923 0.943
## 10 1891 1 0.932 0.00517 0.922 0.943
## 11 1859 2 0.931 0.00522 0.921 0.942
## 12 1819 11 0.926 0.00545 0.915 0.937
##
## rural=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 4783 145 0.970 0.00248 0.965 0.975
## 1 4592 11 0.967 0.00257 0.962 0.972
## 2 4513 11 0.965 0.00266 0.960 0.970
## 3 4415 17 0.961 0.00280 0.956 0.967
## 4 4312 7 0.960 0.00286 0.954 0.965
## 5 4203 8 0.958 0.00292 0.952 0.964
## 6 4098 19 0.953 0.00308 0.947 0.960
## 7 3986 9 0.951 0.00316 0.945 0.958
## 8 3876 5 0.950 0.00320 0.944 0.956
## 9 3776 8 0.948 0.00327 0.942 0.955
## 10 3670 2 0.948 0.00329 0.941 0.954
## 11 3588 6 0.946 0.00335 0.939 0.953
## 12 3515 7 0.944 0.00342 0.937 0.951
#Two- sample test
survdiff(Surv(death.age, d.event)~rural, data=haiti)
## Call:
## survdiff(formula = Surv(death.age, d.event) ~ rural, data = haiti)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## rural=0 2464 173 145 5.24 8.09
## rural=1 4783 255 283 2.70 8.09
##
## Chisq= 8.1 on 1 degrees of freedom, p= 0.00446
Which shows a significant difference between children living in rural (higher survival to age 1) versus urban (lower survival to age 1). This may be suggestive that children in urban areas may live in poorer environmental conditions.
Next we illustrate a k-sample test, but this time we don’t dichotomize household SES
table(haiti$v190)
##
## 1 2 3 4 5
## 1994 1590 1686 1208 769
fit3<-survfit(Surv(death.age, d.event)~v190, data=haiti)
summary(fit3)
## Call: survfit(formula = Surv(death.age, d.event) ~ v190, data = haiti)
##
## v190=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1994 60 0.970 0.00383 0.962 0.977
## 1 1915 4 0.968 0.00395 0.960 0.976
## 2 1883 4 0.966 0.00407 0.958 0.974
## 3 1853 11 0.960 0.00440 0.952 0.969
## 4 1803 3 0.958 0.00449 0.950 0.967
## 5 1760 2 0.957 0.00455 0.949 0.966
## 6 1716 5 0.955 0.00470 0.945 0.964
## 7 1677 6 0.951 0.00489 0.942 0.961
## 8 1623 2 0.950 0.00495 0.940 0.960
## 9 1572 5 0.947 0.00512 0.937 0.957
## 11 1487 2 0.946 0.00519 0.936 0.956
## 12 1457 5 0.942 0.00537 0.932 0.953
##
## v190=2
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1590 43 0.973 0.00407 0.965 0.981
## 1 1534 3 0.971 0.00421 0.963 0.979
## 2 1506 5 0.968 0.00443 0.959 0.977
## 3 1464 4 0.965 0.00461 0.956 0.974
## 4 1431 1 0.965 0.00466 0.955 0.974
## 5 1392 3 0.962 0.00480 0.953 0.972
## 6 1355 6 0.958 0.00509 0.948 0.968
## 7 1324 2 0.957 0.00518 0.947 0.967
## 8 1293 1 0.956 0.00523 0.946 0.966
## 9 1268 3 0.954 0.00538 0.943 0.964
## 10 1227 1 0.953 0.00543 0.942 0.964
## 11 1196 3 0.951 0.00559 0.940 0.962
## 12 1172 2 0.949 0.00569 0.938 0.960
##
## v190=3
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1686 72 0.957 0.00492 0.948 0.967
## 1 1605 8 0.953 0.00518 0.942 0.963
## 2 1571 6 0.949 0.00537 0.938 0.959
## 3 1537 5 0.946 0.00553 0.935 0.957
## 4 1502 5 0.943 0.00568 0.932 0.954
## 5 1466 6 0.939 0.00588 0.927 0.950
## 6 1420 8 0.934 0.00613 0.922 0.946
## 7 1371 7 0.929 0.00636 0.916 0.941
## 8 1339 2 0.927 0.00643 0.915 0.940
## 9 1308 3 0.925 0.00653 0.913 0.938
## 10 1275 1 0.924 0.00656 0.912 0.937
## 11 1258 2 0.923 0.00663 0.910 0.936
## 12 1227 6 0.919 0.00685 0.905 0.932
##
## v190=4
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1208 40 0.967 0.00515 0.957 0.977
## 1 1156 6 0.962 0.00551 0.951 0.973
## 2 1141 3 0.959 0.00569 0.948 0.971
## 3 1125 2 0.958 0.00581 0.946 0.969
## 4 1099 7 0.952 0.00621 0.939 0.964
## 5 1066 2 0.950 0.00633 0.937 0.962
## 6 1042 6 0.944 0.00667 0.931 0.957
## 7 1017 3 0.941 0.00684 0.928 0.955
## 8 994 3 0.939 0.00702 0.925 0.953
## 9 969 2 0.937 0.00713 0.923 0.951
## 10 947 1 0.936 0.00719 0.922 0.950
## 12 914 3 0.933 0.00739 0.918 0.947
##
## v190=5
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 769 17 0.978 0.00530 0.968 0.988
## 1 745 1 0.977 0.00545 0.966 0.987
## 2 728 1 0.975 0.00561 0.964 0.986
## 3 714 1 0.974 0.00577 0.963 0.985
## 5 674 2 0.971 0.00610 0.959 0.983
## 6 650 1 0.969 0.00627 0.957 0.982
## 7 638 2 0.966 0.00661 0.954 0.979
## 8 619 2 0.963 0.00695 0.950 0.977
## 9 601 3 0.959 0.00745 0.944 0.973
## 11 578 1 0.957 0.00762 0.942 0.972
## 12 564 2 0.953 0.00796 0.938 0.969
plot(fit3, ylim=c(.9,1), xlim=c(0,14), col=1:5, conf.int=F)
title(main="Survival Function for Infant Mortality", sub="Household SES")
legend("topright", legend = c("Lowest","Low", "Median", "Higher", "Highest" ), col=1:5, lty=1)
#Two- sample test
survdiff(Surv(death.age, d.event)~v190, data=haiti)
## Call:
## survdiff(formula = Surv(death.age, d.event) ~ v190, data = haiti)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## v190=1 1994 109 117.9 0.678 0.953
## v190=2 1590 77 94.0 3.069 4.007
## v190=3 1686 131 99.0 10.356 13.728
## v190=4 1208 78 71.7 0.554 0.678
## v190=5 769 33 45.4 3.385 3.858
##
## Chisq= 18.4 on 4 degrees of freedom, p= 0.00104
Which shows variation in survival when SES is treated as quintiles, as the DHS defines it. The biggest difference we see is between the highest (light blue) and the median (green ) groups.
Lastly, we examine comparing survival across multiple variables, in this case the
haiti$secedu<-recode(haiti$v106, recodes ="2:3 = 1; 0:1=0; else=NA")
fit4<-survfit(Surv(death.age, d.event)~rural+secedu, data=haiti)
summary(fit4)
## Call: survfit(formula = Surv(death.age, d.event) ~ rural + secedu,
## data = haiti)
##
## rural=0, secedu=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1194 43 0.964 0.00539 0.953 0.975
## 1 1147 5 0.960 0.00569 0.949 0.971
## 2 1124 5 0.956 0.00597 0.944 0.967
## 3 1109 3 0.953 0.00614 0.941 0.965
## 4 1078 6 0.948 0.00648 0.935 0.960
## 5 1051 5 0.943 0.00675 0.930 0.956
## 6 1016 4 0.939 0.00698 0.926 0.953
## 7 995 7 0.933 0.00736 0.918 0.947
## 8 972 1 0.932 0.00742 0.917 0.946
## 9 948 3 0.929 0.00759 0.914 0.944
## 10 926 1 0.928 0.00764 0.913 0.943
## 12 898 7 0.921 0.00806 0.905 0.937
##
## rural=0, secedu=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1270 44 0.965 0.00513 0.955 0.975
## 1 1216 6 0.961 0.00546 0.950 0.971
## 2 1192 3 0.958 0.00562 0.947 0.969
## 3 1169 3 0.956 0.00579 0.944 0.967
## 4 1137 3 0.953 0.00595 0.942 0.965
## 5 1104 2 0.951 0.00606 0.940 0.963
## 6 1069 3 0.949 0.00624 0.937 0.961
## 7 1046 4 0.945 0.00647 0.933 0.958
## 8 1020 4 0.941 0.00671 0.928 0.955
## 9 994 5 0.937 0.00700 0.923 0.951
## 11 946 2 0.935 0.00713 0.921 0.949
## 12 921 4 0.931 0.00738 0.916 0.945
##
## rural=1, secedu=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 3734 108 0.971 0.00274 0.966 0.976
## 1 3589 9 0.969 0.00285 0.963 0.974
## 2 3532 10 0.966 0.00297 0.960 0.972
## 3 3462 15 0.962 0.00315 0.956 0.968
## 4 3383 7 0.960 0.00323 0.953 0.966
## 5 3295 6 0.958 0.00331 0.952 0.964
## 6 3217 16 0.953 0.00350 0.946 0.960
## 7 3137 7 0.951 0.00358 0.944 0.958
## 8 3056 2 0.950 0.00360 0.943 0.958
## 9 2977 7 0.948 0.00369 0.941 0.955
## 10 2893 1 0.948 0.00371 0.941 0.955
## 11 2828 5 0.946 0.00378 0.939 0.954
## 12 2771 6 0.944 0.00386 0.937 0.952
##
## rural=1, secedu=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 1049 37 0.965 0.00570 0.954 0.976
## 1 1003 2 0.963 0.00584 0.951 0.974
## 2 981 1 0.962 0.00592 0.950 0.973
## 3 953 2 0.960 0.00608 0.948 0.972
## 5 908 2 0.958 0.00625 0.946 0.970
## 6 881 3 0.954 0.00650 0.942 0.967
## 7 849 2 0.952 0.00668 0.939 0.965
## 8 820 3 0.949 0.00695 0.935 0.962
## 9 799 1 0.948 0.00704 0.934 0.961
## 10 777 1 0.946 0.00714 0.932 0.960
## 11 760 1 0.945 0.00724 0.931 0.959
## 12 744 1 0.944 0.00734 0.930 0.958
plot(fit4, ylim=c(.9,1), xlim=c(0,14), col=c(1,1,2,2),lty=c(1,2,1,2), conf.int=F)
title(main="Survival Function for Infant Mortality", sub="Rural/Urban * Mother's Education")
legend("topright", legend = c("Urban, Low Edu","Urban High Edu", "Rural, Low Edu","Rural High Edu" ), col=c(1,1,2,2),lty=c(1,2,1,2))
# test
survdiff(Surv(death.age, d.event)~rural+secedu, data=haiti)
## Call:
## survdiff(formula = Surv(death.age, d.event) ~ rural + secedu,
## data = haiti)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## rural=0, secedu=0 1194 90 70.7 5.289 6.455
## rural=0, secedu=1 1270 83 74.7 0.915 1.129
## rural=1, secedu=0 3734 199 221.2 2.232 4.708
## rural=1, secedu=1 1049 56 61.4 0.471 0.561
##
## Chisq= 9.1 on 3 degrees of freedom, p= 0.0283
Which shows a significant differenc between at least two of the groups, in this case, I would say that it’s most likely finding differences between the Urban, low Education and the Rural.
In this example, we will examine how to calculate the survival function for a longitudinally collected data set. Here I use data from the ECLS-K. Specifically, we will examine the transition into poverty between kindergarten and third grade.
First we load our data
load("~/Google Drive/dem7903_App_Hier/data/eclsk.Rdata")
names(eclsk)<-tolower(names(eclsk))
library (car)
library(survival)
#get out only the variables I'm going to use for this example
myvars<-c( "childid","gender", "race", "r1_kage","r4age", "r5age", "r6age", "r7age","c1r4mtsc", "c4r4mtsc", "c5r4mtsc", "c6r4mtsc", "c7r4mtsc", "w1povrty","w1povrty","w3povrty", "w5povrty", "w8povrty","wkmomed", "s2_id")
eclsk<-eclsk[,myvars]
eclsk$age1<-ifelse(eclsk$r1_kage==-9, NA, eclsk$r1_kage/12)
eclsk$age2<-ifelse(eclsk$r4age==-9, NA, eclsk$r4age/12)
#for the later waves, the NCES group the ages into ranges of months, so 1= <105 months, 2=105 to 108 months. So, I fix the age at the midpoint of the interval they give, and make it into years by dividing by 12
eclsk$age3<-recode(eclsk$r5age,recodes="1=105; 2=107; 3=109; 4=112; 5=115; 6=117; -9=NA")/12
eclsk$pov1<-ifelse(eclsk$w1povrty==1,1,0)
eclsk$pov2<-ifelse(eclsk$w3povrty==1,1,0)
eclsk$pov3<-ifelse(eclsk$w5povrty==1,1,0)
#Recode race with white, non Hispanic as reference using dummy vars
eclsk$hisp<-recode (eclsk$race, recodes="3:4=1;-9=NA; else=0")
eclsk$black<-recode (eclsk$race, recodes="2=1;-9=NA; else=0")
eclsk$asian<-recode (eclsk$race, recodes="5=1;-9=NA; else=0")
eclsk$nahn<-recode (eclsk$race, recodes="6:7=1;-9=NA; else=0")
eclsk$other<-recode (eclsk$race, recodes="8=1;-9=NA; else=0")
eclsk$male<-recode(eclsk$gender, recodes="1=1; 2=0; -9=NA")
eclsk$mlths<-recode(eclsk$wkmomed, recodes = "1:2=1; 3:9=0; else = NA")
eclsk$mgths<-recode(eclsk$wkmomed, recodes = "1:3=0; 4:9=1; else =NA")
Now, I need to form the transition variable, this is my event variable, and in this case it will be 1 if a child enters poverty between the first wave of the data and the third grade wave, and 0 otherwise. NOTE I need to remove any children who are already in poverty age wave 1, because they are not at risk of experiencing this particular transition.
eclsk<-subset(eclsk, is.na(pov1)==F&is.na(pov2)==F&is.na(pov3)==F&is.na(age1)==F&is.na(age2)==F&is.na(age3)==F&pov1!=1)
eclsk$povtran1<-ifelse(eclsk$pov1==0&eclsk$pov2==0, 0,1)
eclsk$povtran2<-ifelse(eclsk$pov1==0&eclsk$pov3==0, 0,1)
Now we do the entire data set. To analyze data longitudinally, we need to reshape the data from the current “wide” format (repeated measures in columns) to a “long” format (repeated observations in rows). The reshape() function allows us to do this easily. It allows us to specify our repeated measures, time varying covariates as well as time-constant covariates.
e.long<-reshape(eclsk, idvar="childid", varying=list(age=c("age1","age2"), age2=c("age2", "age3"), povtran=c("povtran1", "povtran2")), times=1:2, direction="long" , drop = names(eclsk)[4:20])
e.long<-e.long[order(e.long$childid, e.long$time),]
#find which kids failed in the first time period and remove them from the second risk period risk set
failed1<-which(e.long$povtran1==1&e.long$time==2)
e.long<-e.long[-failed1,]
e.long$age1r<-round(e.long$age1, 0)
e.long$age2r<-round(e.long$age2, 0)
head(e.long, n=10)
## childid gender race pov1 pov2 pov3 hisp black asian nahn other
## 0001002C.1 0001002C 2 1 0 0 0 0 0 0 0 0
## 0001002C.2 0001002C 2 1 0 0 0 0 0 0 0 0
## 0001007C.1 0001007C 1 1 0 0 0 0 0 0 0 0
## 0001007C.2 0001007C 1 1 0 0 0 0 0 0 0 0
## 0001010C.1 0001010C 2 1 0 0 0 0 0 0 0 0
## 0001010C.2 0001010C 2 1 0 0 0 0 0 0 0 0
## 0002004C.1 0002004C 2 1 0 0 0 0 0 0 0 0
## 0002004C.2 0002004C 2 1 0 0 0 0 0 0 0 0
## 0002006C.1 0002006C 2 1 0 1 1 0 0 0 0 0
## 0002008C.1 0002008C 2 1 0 0 0 0 0 0 0 0
## male mlths mgths time age1 age2 povtran1 age1r age2r
## 0001002C.1 0 0 0 1 6.433333 7.894167 0 6 8
## 0001002C.2 0 0 0 2 7.894167 9.750000 0 8 10
## 0001007C.1 1 0 1 1 5.300000 6.825000 0 5 7
## 0001007C.2 1 0 1 2 6.825000 8.916667 0 7 9
## 0001010C.1 0 0 1 1 5.885833 7.385833 0 6 7
## 0001010C.2 0 0 1 2 7.385833 9.333333 0 7 9
## 0002004C.1 0 0 1 1 5.450000 6.977500 0 5 7
## 0002004C.2 0 0 1 2 6.977500 9.083333 0 7 9
## 0002006C.1 0 NA NA 1 5.158333 6.685833 1 5 7
## 0002008C.1 0 0 1 1 5.905833 7.433333 0 6 7
#poverty transition based on mother's education at time 1.
fit<-survfit(Surv(time = age2r, event = povtran1)~mlths, e.long)
summary(fit)
## Call: survfit(formula = Surv(time = age2r, event = povtran1) ~ mlths,
## data = e.long)
##
## 258 observations deleted due to missingness
## mlths=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 6 12599 3 1.000 0.000137 0.999 1.000
## 7 12558 221 0.982 0.001181 0.980 0.984
## 8 7753 83 0.972 0.001638 0.968 0.975
##
## mlths=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 7 756 82 0.892 0.0113 0.870 0.914
## 8 430 25 0.840 0.0147 0.811 0.869
plot(fit, col=c(2,3),ylim=c(.7,1), lwd=2 , main="Survival function for poverty transition, K-5th Grade")
legend(x =0, y=.8,col = c(2,3),lty=1,lwd=2 ,legend=c("Mom HS or more", "Mom < HS"))
survdiff(Surv(time = age2r, event = povtran1)~mlths, e.long)
## Call:
## survdiff(formula = Surv(time = age2r, event = povtran1) ~ mlths,
## data = e.long)
##
## n=13358, 258 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## mlths=0 12599 307 390.9 18 330
## mlths=1 759 107 23.1 306 330
##
## Chisq= 330 on 1 degrees of freedom, p= 0
fit2<-survfit(Surv(time = age2r, event = povtran1)~mlths+black, e.long)
summary(fit2)
## Call: survfit(formula = Surv(time = age2r, event = povtran1) ~ mlths +
## black, data = e.long)
##
## 260 observations deleted due to missingness
## mlths=0, black=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 6 11807 3 1.000 0.000147 0.999 1.000
## 7 11770 176 0.985 0.001128 0.983 0.987
## 8 7313 71 0.975 0.001588 0.972 0.978
##
## mlths=0, black=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 7 786 45 0.943 0.00829 0.927 0.959
## 8 438 12 0.917 0.01091 0.896 0.939
##
## mlths=1, black=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 7 694 72 0.896 0.0116 0.874 0.919
## 8 399 23 0.845 0.0151 0.815 0.875
##
## mlths=1, black=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 7 62 10 0.839 0.0467 0.752 0.935
## 8 31 2 0.785 0.0573 0.680 0.905
plot(fit2, col=c(2,3,2,3),lty=c(1,1,2,2),ylim=c(.7,1), lwd=2 )
title(main="Survival function for poverty transition, K-5th Grade", sub="By Race and Mother's Education")
legend(x =0, y=.8,col=c(2,3,2,3),lty=c(1,1,2,2),lwd=2 ,legend=c("Mom > HS & Not Black", "Mom > HS & Black", "Mom < HS & Not Black", "Mom < HS & Black"))
survdiff(Surv(time = age2r, event = povtran1)~mlths+black, e.long)
## Call:
## survdiff(formula = Surv(time = age2r, event = povtran1) ~ mlths +
## black, data = e.long)
##
## n=13356, 260 observations deleted due to missingness.
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## mlths=0, black=0 11807 250 367.10 37.4 336.5
## mlths=0, black=1 790 57 23.85 46.1 49.9
## mlths=1, black=0 697 95 21.22 256.5 275.9
## mlths=1, black=1 62 12 1.83 56.3 57.8
##
## Chisq= 404 on 3 degrees of freedom, p= 0