Prepare Data

The whole population data provided by Narjes is in a SQLite database accessible here:

library(dplyr)
library(DBI)
library(ggplot2)

con <- dbConnect(RSQLite::SQLite(),  'DATA/abm1.sqlite')

dbListTables(con)
## [1] "abm_m1"       "sqlite_stat1" "sqlite_stat4"

First, I’ll look at how some life-course parameters (stress, LQ) vary with age and whether that differs by model generations. To do this I aggregate the observations by Age, Race, and Generation and within each stratum calculate mean values of the two parameters.

a1 <- tbl(con, 'abm_m1') %>%
  group_by(Generation, Age, Race) %>%
  summarize(mean_stress = mean(AverageStressScore),
            mean_LQ = mean(LQ)) %>%
  collect() %>%
  mutate(GEN = factor(Generation))


head(a1)

Life course stress trajectories

Now I plot the relationship between agent age and each variable separately by generational status.

g <- ggplot(a1, aes(x=Age, y=mean_stress, group = GEN, 
                    color = GEN))
g + geom_smooth() + facet_grid(Race ~ .)

NOTE: For some reason agents age well past 100. In next iteration of model need to have ‘mortality’ function beginning about 35 with all agents out of pool by 50. For now I will truncate all graphs at Age = 50.

g <- ggplot(a1, aes(x=Age, y=mean_stress, group = GEN, 
                    color = GEN))
g + geom_smooth() +  facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(0, 50))

Interpretation: Generation 0 is most different, which makes sense. Other generations accumulate stress with age. The fact that both race groups have cumulative increasing stress scores with age doesn’t match empirical data. The cumulative stress should vary in the population with some accumulating negative values, others flat, and others positive. That empirical pattern might be apparent when stratifying by Social Class.

Now I will restrict to the ‘middle’ generations (generations 2:4) because they have most complete life-course data, and produce Race x Social Class summaries.

 a2 <- tbl(con, 'abm_m1') %>%
   filter(Generation %in% c(1,2,3,4)) %>%
   group_by(Race, SocialClass, Age) %>%
   summarize(mean_stress = mean(AverageStressScore),
             mean_LQ = mean(LQ)) %>%
  collect()

g <- ggplot(a2, aes(x=Age, y=mean_stress, group = SocialClass, 
                    color = SocialClass))
g + geom_smooth() +  facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(0, 50))

Interpretation: Although the class-stratified stress trajectories do not exactly match the theoretical and empirical data (would expect some to accumulate ‘protective’ negative scores) the pattern is proportionately accurate and thus probably accebtable for now. Another concern is the lack of difference by race. Need to reconsider the hypothesized mechanisms to see whether any changes needed to more closely align with empirical data.

Life course LQ trajectories

Now following same process with mean LQ. First examining by generation:

g <- ggplot(a1, aes(x=Age, y=mean_LQ, group = GEN, 
                    color = GEN))
g + geom_smooth() +  facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(0, 50))

Interpretation: There is more generational difference for LQ – this is likely because of the built-in trans-generational conveyance of LQ. The Generation-0 and Generation-5 are most discrepant, whereas 1:4 are relatively similar.
Questions:
* What is causing the non-linear pattern in Generation 4 for Whites? LQ should be set and constant after age 14. * Why is Generation 5 so different by Race?

g <- ggplot(a2, aes(x=Age, y=mean_LQ, group = SocialClass, 
                    color = SocialClass))
g + geom_smooth() +  facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(0, 50))

Birth Rates & Pregnancy Outcomes

Now I am interested in age-specific birth rate and the risk for Preterm Birth (PTB; <37 weeks) and Very Preterm Birth (VPTB; <32 weeks).

a3 <- tbl(con, 'abm_m1') %>%
  filter(Generation %in% c(1,2,3,4)) %>%
  mutate(Pregnant = ifelse(!is.na(GestationalAge), 1, 0),
         PTB = ifelse(GestationalAge < 37, 1, 0),
         VPTB = ifelse(GestationalAge < 32, 1, 0)) %>%
  group_by(Race, SocialClass, Age) %>%
  summarise(CBR = Pregnant / n(),
            PTBr = mean(PTB, na.rm = T),
            VPTBr = mean(VPTB, na.rm = T)) %>%
  collect()

First plotting Birth Rates by Race, Class, and Age…

g <- ggplot(a3, aes(x = Age, y = CBR, group = SocialClass, color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(15, 50))

Not sure why Low and High SES women have such similar CBR and Middle SES women are nearly zero. Something seems wrong there.

Now plotting Preterm and Very Preterm Birth Rates…

g <- ggplot(a3, aes(x = Age, y = PTBr, group = SocialClass, color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(15, 50))

g <- ggplot(a3, aes(x = Age, y = VPTBr, group = SocialClass, color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(15, 50))

Interpretation: Clearly I have miscalibrated the gestational age function. The overall prevalence of middle and low SES women is way too high – this is probably a function of the cumulative stress score. Similarly the prevalence for high SES women is unreasonably low. Finally the patterns for VPTB suggest that the gestational age function does not produce expected distributions.

a4 <- tbl(con, 'abm_m1') %>%
  filter(Generation %in% c(1,2,3,4)) %>%
  collect()

 hist(a4$GestationalAge)

This histogram looks relatively reasonable…the left tail not quite fat enough and the mode is a little low but actually not bad.

table(a4$Parity, a4$Race)
##    
##     Black White
##   0  1823  4354
##   1 24438 51752
##   2 22159 64097
##   3 12611 35667
##   4  5660 13199
##   5  2140  3560
##   6   636  2083
##   7    91     0
##   8    72     0
prop.table(table(a4$Parity, a4$Race), margin = 2)
##    
##           Black       White
##   0 0.026181244 0.024921013
##   1 0.350969410 0.296213197
##   2 0.318239265 0.366872338
##   3 0.181114462 0.204147397
##   4 0.081286802 0.075547186
##   5 0.030733879 0.020376391
##   6 0.009133994 0.011922478
##   7 0.001306908 0.000000000
##   8 0.001034037 0.000000000

Distribution of parity is similar by race.

Looking at alternate Gestational Age Functions

After playing with continuous Gestational Age for some time it seemed this was an unnecessary level of complexity. Instead of trying to model the continuous GA distribution in weeks, I am primarily interested in the risk of preterm birth. So re-parameterized the empirical model (Georgia data) as log-binomial and now reassigning births as ‘preterm’ (1/0) as a function of same input parameters.

# First create updated dataset with custom probability of PTB
a5 <- a4 %>%
  filter(GestationalAge > 0,
         Age < 44) %>%
  mutate(ModStress = AverageStressScore / 30,
         AGEc = Age - 26.8,
         logPTBr = -2.284 + AGEc* -0.001112 + 
           AGEc*AGEc* 0.0017724 +
           ifelse(Parity == 1, 0.121553, 
                  ifelse(Parity > 1, 0.0611974, 0)) +
           ifelse(PriorVeryPretermBirthCount > 0, 0.9778791, 0) +
           ModStress*AGEc*-0.0001271 +
           ModStress*AGEc*AGEc*-0.0007764 +
           ModStress* 0.2545653, #0.4545653
         PTBprob = exp(logPTBr),
         PTBprob = ifelse(PTBprob > 1, 1, PTBprob)) 

# rbinom() function seemed to fail inside dplyr()
a5$PTB <-unlist(lapply(a5$PTBprob, rbinom, n = 1, size = 1))

# Aggregate outcomes for visualization
a6 <- a5 %>%
  group_by(Race, SocialClass, Age) %>%
  summarise(BIRTHS = n(),
            PTBn = sum(PTB, na.rm = T),
            PTBr = mean(PTB, na.rm = T))

g <- ggplot(a6, aes(x = Age, y = PTBr, group = SocialClass, color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .) +
  scale_x_continuous(limits = c(15, 50))

This is not ‘perfect’ but is reasonable and probably sufficient for our purposes here. Specifically it captures class (and to a lesser extent race) differences, and generally has increasing risk with age. It is not quite the quadratic relationship but I think given focus on PTB as one of many inputs to child performance this adequately captures social stratification of in-utero exposures.