The second round of data prepared by Narjes was loaded into SQLite and is read in here:
library(dplyr)
library(DBI)
library(ggplot2)
con <- dbConnect(RSQLite::SQLite(), 'DATA/abm1.sqlite')
dbListTables(con)
## [1] "abm_m1" "abm_m2" "sqlite_stat1" "sqlite_stat4"
The data frame ‘a1’ holds mean stress and mean LQ by generation, age, and race:
a1 <- tbl(con, 'abm_m2') %>%
  group_by(Generation, Age, Race) %>%
  summarize(mean_stress = mean(AverageStressScore),
            mean_LQ = mean(LQ)) %>%
  collect() %>%
  mutate(GEN = factor(Generation))
head(a1)
Checking whether life course stress trajectories are similar across generations within each race, and confirming that the age limits are now more reasonable. They appear to be:
g <- ggplot(a1, aes(x = Age, y = mean_stress, group = GEN,
                    color = GEN))
g + geom_smooth() + facet_grid(Race ~ .)
Now looking at Social Class x Generation to see whether there are differences:
a1b <- tbl(con, 'abm_m2') %>%
  group_by(Generation, Age, SocialClass) %>%
  summarize(mean_stress = mean(AverageStressScore),
            mean_LQ = mean(LQ)) %>%
  collect() %>%
  mutate(GEN = factor(Generation))
g <- ggplot(a1b, aes(x = Age, y = mean_stress, group = GEN,
                     color = GEN))
g + geom_smooth() + facet_grid(SocialClass ~ .)
Here I am limiting to the middle generations (i.e. pooling generations 1-4) and looking at race x SES stratification:
a2 <- tbl(con, 'abm_m2') %>%
  filter(Generation %in% c(1, 2, 3, 4)) %>%
  group_by(Race, SocialClass, Age) %>%
  summarize(mean_stress = mean(AverageStressScore),
            mean_LQ = mean(LQ)) %>%
  collect()
g <- ggplot(a2, aes(x = Age, y = mean_stress, group = SocialClass,
                    color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .)
Now following the same process with mean LQ. First, examining race x generation:
g <- ggplot(a1, aes(x = Age, y = mean_LQ, group = GEN,
                    color = GEN))
g + geom_smooth() + facet_grid(Race ~ .)
Now examining LQ for SocialClass x generation:
g <- ggplot(a1b, aes(x = Age, y = mean_LQ, group = GEN,
                     color = GEN))
g + geom_smooth() + facet_grid(SocialClass ~ .)
And LQ for race x SES with generations 1-4 pooled:
g <- ggplot(a2, aes(x = Age, y = mean_LQ, group = SocialClass,
                    color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .)
INTERPRETATION: The patterns of LQ across generations by Race look odd, but they look reasonable by Social Class across generations, and they look reasonable when generations 1-4 are pooled. Could this reflect changes in population composition? For instance, if Black (or White) women are ‘dying’, different numbers of agents would contribute to the mean at each age.
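One way to probe the composition question is to report how many agents sit behind each mean. The sketch below uses a small invented data frame in place of ‘abm_m2’; the Race labels and LQ values are assumptions for illustration only, and only the column names come from the queries above.

```r
library(dplyr)

# Toy stand-in for 'abm_m2'; all values are invented for illustration.
toy <- tibble(
  LifeTimeAgentID = c(1, 2, 3, 1, 2, 1),
  Generation      = c(1, 1, 1, 1, 1, 1),
  Race            = c("Black", "Black", "White", "Black", "Black", "Black"),
  Age             = c(30, 30, 30, 31, 31, 32),
  LQ              = c(0.5, 0.7, 0.6, 0.4, 0.8, 0.3)
)

# Report the number of distinct agents alongside each mean, so that a
# shrinking denominator at older ages is visible next to the trajectory.
composition <- toy %>%
  group_by(Generation, Race, Age) %>%
  summarize(mean_LQ  = mean(LQ, na.rm = TRUE),
            n_agents = n_distinct(LifeTimeAgentID),
            .groups  = "drop")

print(composition)
```

If the counts fall sharply with age for one race but not the other, the odd LQ curves may be a composition artifact rather than a real trend. The same `n_agents` column could presumably be added to the `summarize()` calls that build ‘a1’.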
Based on a Skype call with Allen & Narjes (3/30/18), I learned that the variable ‘GestationalAge’ is coded like this:
* 36 = Preterm
* 38 = Term
* 100 = Not pregnant
a3 <- tbl(con, 'abm_m2') %>%
  filter(Generation %in% c(1, 2, 3, 4)) %>%
  # Pregnant = 1 for GestationalAge 36 or 38; 0 for 100 or NA
  mutate(Pregnant = ifelse(is.na(GestationalAge), 0,
                           ifelse(GestationalAge == 100, 0, 1))) %>%
  group_by(Age, Race, SocialClass) %>%
  # CBR here is the share of agent-years flagged as pregnant
  summarize(CBR = mean(Pregnant)) %>%
  collect()
g <- ggplot(a3, aes(x = Age, y = CBR, group = SocialClass, color = SocialClass))
g + geom_smooth() + facet_grid(Race ~ .)
Why is the Crude Birth Rate rising with age when the age-specific birth rates from the parameter table rise through the 20s and then decline? Here I look at a single agent, number ‘6883’ (randomly chosen), across her life course:
x <- tbl(con, 'abm_m2') %>%
  filter(LifeTimeAgentID == 6883) %>%
  select(LifeTimeAgentID, Age, GestationalAge, PregnancyOutcome, IPI, Parity) %>%
  collect()
# as_tibble() replaces the deprecated tbl_df(); n = nrow(x) prints every row
print(as_tibble(x), n = nrow(x))
## # A tibble: 71 x 6
## LifeTimeAgentID Age GestationalAge PregnancyOutcome IPI Parity
## <int> <int> <int> <chr> <chr> <int>
## 1 6883 1 NA <NA> <NA> NA
## 2 6883 2 NA <NA> <NA> NA
## 3 6883 3 NA <NA> <NA> NA
## 4 6883 4 NA <NA> <NA> NA
## 5 6883 5 NA <NA> <NA> NA
## 6 6883 6 NA <NA> <NA> NA
## 7 6883 7 NA <NA> <NA> NA
## 8 6883 8 NA <NA> <NA> NA
## 9 6883 9 NA <NA> <NA> NA
## 10 6883 10 NA <NA> <NA> NA
## 11 6883 11 NA <NA> <NA> NA
## 12 6883 12 NA <NA> <NA> NA
## 13 6883 13 NA <NA> <NA> NA
## 14 6883 14 NA <NA> <NA> NA
## 15 6883 15 NA <NA> <NA> NA
## 16 6883 16 NA <NA> <NA> NA
## 17 6883 17 NA <NA> <NA> NA
## 18 6883 18 NA <NA> <NA> NA
## 19 6883 19 NA <NA> <NA> NA
## 20 6883 20 NA <NA> <NA> NA
## 21 6883 21 NA <NA> <NA> NA
## 22 6883 22 NA <NA> <NA> NA
## 23 6883 23 NA <NA> <NA> NA
## 24 6883 24 NA <NA> <NA> NA
## 25 6883 25 NA <NA> <NA> NA
## 26 6883 26 NA <NA> <NA> NA
## 27 6883 27 NA <NA> <NA> NA
## 28 6883 28 NA <NA> <NA> NA
## 29 6883 29 38 LiveBirth NormalIPI 0
## 30 6883 30 38 LiveBirth NormalIPI 1
## 31 6883 31 38 LiveBirth NormalIPI 1
## 32 6883 32 38 LiveBirth NormalIPI 1
## 33 6883 33 38 LiveBirth NormalIPI 1
## 34 6883 34 38 LiveBirth NormalIPI 1
## 35 6883 35 38 LiveBirth NormalIPI 1
## 36 6883 36 38 LiveBirth NormalIPI 1
## 37 6883 37 38 LiveBirth NormalIPI 1
## 38 6883 38 38 LiveBirth NormalIPI 1
## 39 6883 39 38 LiveBirth NormalIPI 1
## 40 6883 40 38 LiveBirth NormalIPI 1
## 41 6883 41 38 LiveBirth NormalIPI 1
## 42 6883 42 38 LiveBirth NormalIPI 1
## 43 6883 43 38 LiveBirth NormalIPI 1
## 44 6883 44 38 LiveBirth NormalIPI 1
## 45 6883 45 38 LiveBirth NormalIPI 1
## 46 6883 46 38 LiveBirth NormalIPI 1
## 47 6883 47 38 LiveBirth NormalIPI 1
## 48 6883 48 38 LiveBirth NormalIPI 1
## 49 6883 49 38 LiveBirth NormalIPI 1
## 50 6883 50 38 LiveBirth NormalIPI 1
## 51 6883 51 38 LiveBirth NormalIPI 1
## 52 6883 52 38 LiveBirth NormalIPI 1
## 53 6883 53 38 LiveBirth NormalIPI 1
## 54 6883 54 38 LiveBirth NormalIPI 1
## 55 6883 55 38 LiveBirth NormalIPI 1
## 56 6883 56 38 LiveBirth NormalIPI 1
## 57 6883 57 38 LiveBirth NormalIPI 1
## 58 6883 58 38 LiveBirth NormalIPI 1
## 59 6883 59 38 LiveBirth NormalIPI 1
## 60 6883 60 38 LiveBirth NormalIPI 1
## 61 6883 61 38 LiveBirth NormalIPI 1
## 62 6883 62 38 LiveBirth NormalIPI 1
## 63 6883 63 38 LiveBirth NormalIPI 1
## 64 6883 64 38 LiveBirth NormalIPI 1
## 65 6883 65 38 LiveBirth NormalIPI 1
## 66 6883 66 38 LiveBirth NormalIPI 1
## 67 6883 67 38 LiveBirth NormalIPI 1
## 68 6883 68 38 LiveBirth NormalIPI 1
## 69 6883 69 38 LiveBirth NormalIPI 1
## 70 6883 70 38 LiveBirth NormalIPI 1
## 71 6883 71 38 LiveBirth NormalIPI 1
I notice that prior to Age = 29, the GestationalAge variable is NA (missing). So rather than three values (36, 38, and 100), GestationalAge actually takes four distinct values once NA is counted. This matters for aggregation: depending on whether NA rows are dropped or treated as ‘not pregnant’, the denominator of any rate changes.
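To illustrate how NA handling changes a rate, here is a minimal sketch on invented GestationalAge values (not drawn from the database). It tabulates the distinct values, NA included, and then computes the ‘pregnant’ share two ways.

```r
library(dplyr)

# Toy GestationalAge values (invented): NA, 100 = not pregnant, 36/38 = pregnant.
toy <- tibble(GestationalAge = c(NA, NA, 100, 38, 36, NA))

# Tabulate every distinct value; count() keeps NA as its own group.
value_counts <- toy %>% count(GestationalAge)
print(value_counts)

# Option A: drop NA rows, so only non-missing years are in the denominator.
rate_na_dropped <- mean(toy$GestationalAge != 100, na.rm = TRUE)

# Option B: treat NA as 'not pregnant' (the recode used for 'a3'),
# so every row stays in the denominator.
rate_na_as_zero <- mean(ifelse(is.na(toy$GestationalAge), 0,
                               ifelse(toy$GestationalAge == 100, 0, 1)))

print(c(dropped = rate_na_dropped, as_zero = rate_na_as_zero))
```

On these six toy rows the two options give 2/3 and 1/3 respectively, which is why the treatment of NA needs to be an explicit choice.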
The second thing I notice is that PregnancyOutcome becomes LiveBirth at age 29 (presumably her first pregnancy), but she then remains ‘pregnant’ in every subsequent year, through age 71. This is what causes the appearance of birth rates increasing with age: it looks as though, once an agent becomes pregnant, she is recorded as pregnant in every later year!
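A quick way to see whether this pattern holds beyond one agent is to summarize, per agent, whether the pregnancy flag ever switches off after the first pregnancy. The sketch below runs on invented agent-year rows; agent 1 mimics the pattern seen for 6883 and agent 2 is a hypothetical agent whose state does reset.

```r
library(dplyr)

# Toy agent-year rows (invented for illustration).
toy <- tibble(
  LifeTimeAgentID = c(1, 1, 1, 1, 2, 2, 2, 2),
  Age             = c(28, 29, 30, 31, 28, 29, 30, 31),
  GestationalAge  = c(NA, 38, 38, 38, NA, 38, 100, 100)
)

# Note: this assumes every agent has at least one pregnant year;
# agents with none would need to be filtered out first.
persistence <- toy %>%
  mutate(Pregnant = !is.na(GestationalAge) & GestationalAge != 100) %>%
  group_by(LifeTimeAgentID) %>%
  summarize(
    first_preg_age = min(Age[Pregnant]),
    years_flagged  = sum(Pregnant),
    # TRUE when every year from the first pregnancy onward is flagged pregnant
    never_resets   = all(Pregnant[Age >= min(Age[Pregnant])]),
    .groups = "drop"
  )

print(persistence)
```

If `never_resets` were TRUE for nearly all agents in the real table, that would support the reading that the pregnancy state is carried forward rather than re-drawn each year.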
My recommendation:
1. The GestationalAge and PregnancyOutcome variables should be initialized to a ‘not pregnant’ state (so there are no NAs).
2. With each tick forward in time, GestationalAge and PregnancyOutcome should first be reset to ‘not pregnant’; the logic for deciding whether the agent becomes pregnant in that new year should be applied only after the reset.
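The two recommendations can be sketched as a toy per-agent loop. This is an illustration in R, not the model's actual code: the probabilities in `preg_prob()`, the "NotPregnant" outcome label, and the function names are all assumptions, and only the 38/100 coding comes from the notes above.

```r
# Hypothetical age-specific annual pregnancy probabilities (invented numbers,
# not the model's parameter table).
preg_prob <- function(age) {
  if (age < 15 || age > 49) 0 else if (age < 30) 0.10 else 0.05
}

NOT_PREGNANT_GA      <- 100L           # coding from the notes above
NOT_PREGNANT_OUTCOME <- "NotPregnant"  # assumed label

simulate_agent <- function(max_age, seed = 1) {
  set.seed(seed)
  # Recommendation 1: initialize state to 'not pregnant', so there are no NAs.
  ga      <- NOT_PREGNANT_GA
  outcome <- NOT_PREGNANT_OUTCOME
  history <- vector("list", max_age)
  for (age in seq_len(max_age)) {
    # Recommendation 2: reset the state carried over from the previous tick...
    ga      <- NOT_PREGNANT_GA
    outcome <- NOT_PREGNANT_OUTCOME
    # ...and only then decide whether she becomes pregnant this year.
    if (runif(1) < preg_prob(age)) {
      ga      <- 38L           # term birth; preterm (36) omitted for brevity
      outcome <- "LiveBirth"
    }
    history[[age]] <- data.frame(Age = age, GestationalAge = ga,
                                 PregnancyOutcome = outcome)
  }
  do.call(rbind, history)
}

agent <- simulate_agent(71)
```

With the reset in place, a pregnancy at one age no longer carries into later ages, so the share of agent-years flagged pregnant should track the age-specific probabilities rather than accumulate.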