Event History - HW 2

Event History Analysis Plan

Rebecca Luttinen

Carry out the following analysis
Kaplan-Meier survival analysis of the outcome
1. Define a grouping variable; this can be dichotomous or categorical.
  1. Women who have experienced emotional IPV and women who have not
    
    -The goal is to create a dichotomous variable separating women who have experienced emotional IPV from those who have not
  2. Do you have a research hypothesis about the survival patterns for the levels of the categorical variable? State it.
    
    -Hypothesis: women who have not experienced emotional IPV tend to have their second birth sooner than women who have
2. Comparison of Kaplan-Meier survival across grouping variables in your data. Interpret your results.
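
For the grouping variable in step 1, a minimal sketch of turning a 0/1 indicator into a labeled two-level factor is given here. The vector is made-up toy data standing in for the emotional IPV variable, and the 0/1 coding with NA for women without an answer is an assumption based on the values printed later in this document.

```r
# Hypothetical toy vector standing in for the emotional IPV indicator
# (assumed coding: 0 = no, 1 = yes, NA = no answer recorded)
emoipv <- c(NA, 0, 1, 1, 0, NA)

# Labeled two-level factor; NA values stay NA and drop out of group comparisons
ipv_group <- factor(emoipv,
                    levels = c(0, 1),
                    labels = c("No emotional IPV", "Emotional IPV"))

# Count women in each group, showing the missing category as well
table(ipv_group, useNA = "ifany")
```

Keeping the missing category visible in the table makes it clear how many women are excluded from the comparison.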

#read in the Uganda DHS 2016 individual recode file (UGIR7BFL)
library(haven)
library(knitr)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.0      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(car)

Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

library(dplyr)


ugtotal <- read_dta("C:/Users/rlutt/Downloads/UGIR7BFL.DTA")

ugtotal<-zap_labels(ugtotal)

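#cut down the data set to the variables of interest and rename them

#emotional IPV and birth dates (century month codes, CMC)
#note: DHS birth histories are indexed from the most recent birth backward,
#so b3_01 (fbir.cmc) is the later of the two birth dates, as the output below shows

sub <- ugtotal %>%
  transmute(int.cmc  = v008,
            emoipv   = d104,
            fbir.cmc = b3_01,
            sbir.cmc = b3_02,
            weight   = v005/1000000,
            psu      = v021,
            strata   = v022)
select(sub, int.cmc, fbir.cmc, sbir.cmc, emoipv, weight, psu, strata)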

# A tibble: 18,506 × 7
   int.cmc fbir.cmc sbir.cmc emoipv weight   psu strata
     <dbl>    <dbl>    <dbl>  <dbl>  <dbl> <dbl>  <dbl>
 1    1400       NA       NA     NA   1.10     1      1
 2    1400     1381     1330     NA   1.10     1      1
 3    1400     1208     1173     NA   1.10     1      1
 4    1400     1389       NA     NA   1.10     1      1
 5    1400       NA       NA     NA   1.10     1      1
 6    1400       NA       NA     NA   1.10     1      1
 7    1400     1380     1358     NA   1.10     1      1
 8    1400     1329     1293      0   1.10     1      1
 9    1400     1324       NA     NA   1.10     1      1
10    1400     1364     1316      1   1.10     1      1
# … with 18,496 more rows

#birth interval in months between the two recorded births (difference in CMC dates)
#the result is stored in sub so it is available for later steps
sub <- sub %>%
  mutate(birthint = fbir.cmc - sbir.cmc)

This project is a survival analysis of the event of having a second birth among a population of Ugandan women, stratified by whether they have experienced emotional IPV.

Define your event variable:
1. Second birth
Define a duration or time variable
1. Interval between births, in months (difference between CMC birth dates)
Define a censoring indicator

-Women who have not had their next (second) birth by the interview date are censored
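
The definitions above can be sketched in code as follows. This is a minimal illustration on made-up records laid out like `sub`, not the analysis actually run in this document. The names `toy`, `toy_dur`, `secbi` and `b2event` are hypothetical, and the sketch assumes that `fbir.cmc` holds the later of the two birth dates when both are present.

```r
library(dplyr)

# Hypothetical toy records in the same layout as `sub` (values are made up)
toy <- tibble(
  int.cmc  = c(1400, 1400, 1400, 1400),
  fbir.cmc = c(1381, 1389, 1380, NA),
  sbir.cmc = c(1330, NA, 1358, NA)
)

toy_dur <- toy %>%
  # women with no recorded birth are not at risk and are dropped
  filter(!is.na(fbir.cmc)) %>%
  mutate(
    # event indicator: 1 if both births are observed, 0 if censored at interview
    b2event = ifelse(is.na(sbir.cmc), 0, 1),
    # duration in months: gap between births, or time from birth to interview
    secbi   = ifelse(is.na(sbir.cmc), int.cmc - fbir.cmc, fbir.cmc - sbir.cmc)
  )

toy_dur
```

Under these assumptions, a woman with only one recorded birth contributes the months from that birth to the interview as a censored duration.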

#women with no recorded birth (bidx_01 is NA) are not at risk for a second birth

table(is.na(ugtotal$bidx_01))


FALSE  TRUE 
13745  4761

Estimate the survival function for your outcome and plot it

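library(survival)
library(ggsurvfit)

#note: Surv() receives only sbir.cmc, so the CMC date itself is treated as the
#time and every woman with a non-missing value is counted as having the event
fit <- survfit(Surv(sbir.cmc) ~ emoipv, data = sub)

fit %>%
  ggsurvfit() +
  labs(title = "Survival Function for Second Birth Interval, Uganda DHS 2016",
       y = "S(t)",
       x = "Months")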

In this plot, women who have and have not experienced emotional IPV appear to be at a similar risk of having a second birth. Next, I will check other types of IPV to see whether the pattern differs.
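
The comparison above is visual only. As a hedged sketch, the code below shows how a duration and an event indicator could be passed to `Surv()` and how a log-rank test from `survdiff()` could compare the two groups formally. The data are made up, and the column names `secbi` and `b2event` are hypothetical; nothing here reproduces the results of this assignment.

```r
library(survival)

# Hypothetical toy data: months to second birth (or to interview if censored),
# event indicator (1 = second birth observed, 0 = censored), and IPV group
toy <- data.frame(
  secbi   = c(51, 11, 22, 35, 18, 40, 27, 60),
  b2event = c(1, 0, 1, 1, 0, 1, 1, 0),
  emoipv  = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# Kaplan-Meier estimate by group, using both duration and event indicator
fit_toy <- survfit(Surv(secbi, b2event) ~ emoipv, data = toy)
summary(fit_toy)

# Log-rank test of the null hypothesis that the two survival curves are equal
lr <- survdiff(Surv(secbi, b2event) ~ emoipv, data = toy)
lr
```

A small p-value from the log-rank test would suggest that the timing of the second birth differs between the groups. This sketch ignores the survey weights and design variables.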