Intro

The purpose of this analysis is to evaluate the impact of coding decisions on the active status indicator for partnerships. This has implications for the for the definition of partnership type and duration, and estimates of respondent degree.

The problem

There are two key issues:

  1. Definition of casl (ptype 2) vs. inst (ptype 3) partners
    The survey asks respondents if they have only had sex with this partner one time. In a cross-sectional survey, a “yes” answer can mean two things: this is the first of what will be more in the future, so this is a casl partner, or this is the one and only act, so this is an inst partner. The respondent’s answer to the active status question may allow us to distinguish between these two things. Whether we decide to use or ignore active status, this determines the distribution of partnership types, and that, in turn, affects the degree distribution and the duration distribution of active partnerships, since inst partners are excluded from momentary degree counts and durations.

    • Current ARTnetData scripts code all one-time partners as “inst” partners, even if the respondent explicitly says they do expect to have sex with this person again. (script here)

  2. Treatment of missing (“DK”) responses to the active status question
    The survey asks respondents whether they expect to have sex with this partner again, and answer options include DK. The DK fraction is typically small for heterosexual partnerships, but large for MSM. The assignment of these responses will affect the estimates of degree, the question is by how much.

    • Current ARTnet scripts code DK as “not active” (script here).

The distributions of partner type, degree and duration are fundamental features of the sexual networks, the ERGM specifications, and the reachable path. So it is important to understand the implications that our coding decisions can have on these outcomes.

Both current ARTnet coding decisions result in the exclusion of first-time casl partners from the casl partner pool. This will lower the estimated mean degree and concurrency fraction, and raise the estimated mean age for active casl partnerships.

The coding alternatives

These two features, active status, and partner type, are interdependent, so there is a decision tree of options. The sequence is:

  • Do we use active=yes to allow one-time partners to be classified as casl instead of inst?

  • Do we impute the “DK” answers, using some rule? This will also reclassify some one-time partners as casl instead of inst.

Raw survey responses

This section presents descriptive stats on the raw survey responses to the “expect to have sex again” (active) and “had sex only once” (once) questions.

Active

The raw responses to the ongoing question, ONGOING.orig are shown below, broken down by the current ARTnet ptype variable (any non-main partnership reported as ONCE is coded as ptype=3).

Plot

Almost 20% of ptype 3 partners are reported as ongoing/active.

# have to construct the overall prev for each yr

activedf <- l %>%
  group_by(ptype, ONGOING.orig) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(activedf, 
       aes(x=factor(ONGOING.orig), 
           y=pct,
           group=factor(ptype), 
           fill=factor(ptype))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "Raw responses to active status by ARTnet ptype",
       x = "ONGOING.orig", y = "Percent",
       fill = "ptype")

Table

sjPlot::tab_xtab(var.row = l$ONGOING.orig, 
                 var.col = l$ptype, 
                 title = "Raw responses to ONGOING.orig x ptype",
                 show.row.prc = T,
                 show.summary = F)
Raw responses to ONGOING.orig x ptype
ONGOING.orig ptype Total
1 2 3
0 402
6.8 %
1084
18.4 %
4406
74.8 %
5892
100 %
1 2110
29.4 %
3659
50.9 %
1414
19.7 %
7183
100 %
88 103
3.4 %
1220
39.7 %
1749
56.9 %
3072
100 %
99 1
4.3 %
4
17.4 %
18
78.3 %
23
100 %
Total 2616
16.2 %
5967
36.9 %
7587
46.9 %
16170
100 %

One-time

The raw responses to the one-time question, ONCE are shown below, broken down by the raw response to the ONGOING.orig question.

Plot

About 20% of those who say the partnership is ongoing/active report having sex only once.

otdf <- l %>%
  mutate(ONCE = ifelse(ONCE==1, "yes", "no")) %>%
  group_by(ONGOING.orig, ONCE) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(otdf,
       aes(x=ONCE, 
           y=pct, 
           group=factor(ONGOING.orig), 
           fill=factor(ONGOING.orig))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "Raw responses to had sex only once by active status",
       x = "Once", y = "Percent",
       fill = "ONGOING.orig")

Table

sjPlot::tab_xtab(var.row = l$ONGOING.orig, 
                 var.col = l$ONCE, 
                 title = "Raw responses to ONCE x ONGOING.orig",
                 show.row.prc = T,
                 show.col.prc = T,
                 show.summary = F)
Raw responses to ONCE x ONGOING.orig
ONGOING.orig ONCE Total
1 2
0 4406
74.8 %
58.1 %
1486
25.2 %
17.3 %
5892
100 %
36.4 %
1 1414
19.7 %
18.6 %
5769
80.3 %
67.2 %
7183
100 %
44.4 %
88 1749
56.9 %
23.1 %
1323
43.1 %
15.4 %
3072
100 %
19 %
99 18
78.3 %
0.2 %
5
21.7 %
0.1 %
23
100 %
0.1 %
Total 7587
46.9 %
100 %
8583
53.1 %
100 %
16170
100 %
100 %

Alternate definitions: comparison

Active

We compare 2 alternatives:

  1. active.strict – the current ARTnet coding, where DK is coded 0.

  2. active.imp – imputes the DK responses by ptype: p=0.5 for ptype=2, p=0.1 for ptype=3. Note that ARTnet already recodes all “DK” to active for main partners, and we don’t change that.

Overall, the imputation results in about 10% of casl and inst partnerships having their status changed to active (see table).

Plot

The effect is larger for ARTnet defined casl partners: 25% of the non-active partners have their status changed to active.

activedf <- l %>%
  group_by(ptype, active.strict, active.imp) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(activedf,
       aes(x=factor(active.strict), y=pct, 
           group=factor(active.imp), fill=factor(active.imp))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  facet_wrap(~ptype) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "active.strict vs. active.imp by ptype",
       x = "active.strict", y = "Percent",
       fill = "active.imp")

Table

sjPlot::tab_xtab(var.row = l$active.strict, 
                 var.col = l$active.imp, 
                 title = "Strict vs. imputed active status indicator",
                 show.row.prc = T,
                 show.summary = F)
Strict vs. imputed active status indicator
active.strict active.imp Total
0 1
0 8106
91 %
803
9 %
8909
100 %
1 0
0 %
7289
100 %
7289
100 %
Total 8106
50 %
8092
50 %
16198
100 %

Partner type

Here there are 3 possible definitions, depending on the use of the active status indicator for ptype classification, and which active status indicator is used.

  1. pype – the original ARTnet coding, where all non-main partners are classified exclusively by the ONCE variable.

  2. ptype.alt – non-main partners are classified by both ONCE and active.strict. One-time partners are classified as casl partners if the respondent said they expect to have sex with them again.

  • ptype==3 if ONCE=1 & active.strict=0
  • ptype==2 if ONCE=1 & active.strict=1.
  1. ptype.imp – uses active.imp in place of active.strict. Only affects the active=DK partners. Changes some of these casl partners to active, and reclassifies a small number of inst partners as active (and therefore) casl partners.

About 19% of inst partners are reclassified to casl if we use the active status indicator (ptype.alt). The additional impact of imputing active status when unknown is minimal.

Plot

ptype v. ptype.alt
partnerdf <- l %>%
  group_by(ptype, ptype.alt) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(partnerdf,
       aes(x=factor(ptype), y=pct, 
           group=factor(ptype.alt), fill=factor(ptype.alt))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "ptype vs. ptype.alt",
       x = "ptype", y = "Percent",
       fill = "ptype.alt")

ptype v. ptype.imp
partnerdf <- l %>%
  group_by(ptype, ptype.imp) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(partnerdf,
       aes(x=factor(ptype), y=pct, 
           group=factor(ptype.imp), fill=factor(ptype.imp))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "ptype vs. ptype.imp",
       x = "ptype", y = "Percent",
       fill = "ptype.imp")

Tables

ptype v. ptype.alt
sjPlot::tab_xtab(var.row = l$ptype, 
                 var.col = l$ptype.alt, 
                 title = "Ptype: Ignoring vs. using active.strict",
                 show.row.prc = T,
                 show.summary = F)
Ptype: Ignoring vs. using active.strict
ptype ptype.alt Total
1 2 3
1 2618
100 %
0
0 %
0
0 %
2618
100 %
2 0
0 %
5978
100 %
0
0 %
5978
100 %
3 0
0 %
1414
18.6 %
6188
81.4 %
7602
100 %
Total 2618
16.2 %
7392
45.6 %
6188
38.2 %
16198
100 %
ptype v. ptype.imp
sjPlot::tab_xtab(var.row = l$ptype, 
                 var.col = l$ptype.imp, 
                 title = "Ptype: Ignoring vs. using active.imp",
                 show.row.prc = T,
                 show.summary = F)
Ptype: Ignoring vs. using active.imp
ptype ptype.imp Total
1 2 3
1 2618
100 %
0
0 %
0
0 %
2618
100 %
2 0
0 %
5978
100 %
0
0 %
5978
100 %
3 0
0 %
1597
21 %
6005
79 %
7602
100 %
Total 2618
16.2 %
7575
46.8 %
6005
37.1 %
16198
100 %

Impact on key measures: Comparison

Degree distributions

The active and once coding alternatives have effects on the deg.casl and deg.tot degree distributions. There is no effect on deg.main.

There are lots of plots here, and the impacts shown in the 1st plot on each tab are summarized in the Mean Degree and Concurrency sections. The key finding from the 2nd plot on each tab in this section is that:

  • Degree shifts are not always from deg to deg+1. We also see shifts from deg0 to deg2, deg2 to deg5, etc.

  • Again, the big impact is from using the active status indicator (rather than ONCE), not the imputation.

Plots

deg.casl v. deg.casl.alt

The difference here is using the strict active status indicator to classify one-time partners as casl. Respondent has answered “yes, i expect to have sex with this partner again”.

# by degree value
orig <- d %>%
  group_by(deg.casl) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num),
         measure = "orig") %>%
  rename(deg = deg.casl) %>%
  select(-num)

alt <- d %>%
  group_by(deg.casl.alt) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num),
         measure = "alt") %>%
  rename(deg = deg.casl.alt) %>%
  select(-num)

bind_rows (orig, alt) %>%
  mutate(measure = forcats::fct_relevel(measure, "orig")) %>%
  ggplot(aes(x=factor(deg), y=pct, 
           group=measure, fill=measure)) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.casl vs. deg.casl.alt",
       x = "degree", y = "Percent",
       fill = "measure")

# by original deg.casl
degdf <- d %>%
  group_by(deg.casl, deg.casl.alt) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(degdf,
       aes(x=factor(deg.casl), y=pct, 
           group=factor(deg.casl.alt), fill=factor(deg.casl.alt))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.casl vs. deg.casl.alt",
       x = "deg.casl", y = "Percent",
       fill = "deg.casl.alt")

deg.casl v. deg.casl.imp

The difference here is using the strict + imputed active status indicator to classify one-time partners as casl. Respondent has answered either “yes, i expect to have sex with this partner again”, or “DK”, and the “DK” have been imputed with p=0.5 or 0.1 as being active, for repeat and one-time partners respectively.

# by degree value

imp <- d %>%
  group_by(deg.casl.imp) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num),
         measure = "imp") %>%
  rename(deg = deg.casl.imp) %>%
  select(-num)

bind_rows (orig, imp) %>%
  mutate(measure = forcats::fct_relevel(measure, "orig")) %>%
  ggplot(aes(x=factor(deg), y=pct, 
           group=measure, fill=measure)) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.casl vs. deg.casl.imp",
       x = "degree", y = "Percent",
       fill = "measure")

# by original deg.casl
degdf <- d %>%
  group_by(deg.casl, deg.casl.imp) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(degdf,
       aes(x=factor(deg.casl), y=pct, 
           group=factor(deg.casl.imp), fill=factor(deg.casl.imp))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.casl vs. deg.casl.imp",
       x = "deg.casl", y = "Percent",
       fill = "deg.casl.imp")

deg.tot v. deg.tot.alt

The difference here is using the strict active status indicator to classify one-time partners as casl. Respondent has answered “yes, i expect to have sex with this partner again”.

# by degree value
orig <- d %>%
  group_by(deg.tot) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num),
         measure = "orig") %>%
  rename(deg = deg.tot) %>%
  select(-num)

alt <- d %>%
  group_by(deg.tot.alt) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num),
         measure = "alt") %>%
  rename(deg = deg.tot.alt) %>%
  select(-num)

bind_rows (orig, alt) %>%
  mutate(measure = forcats::fct_relevel(measure, "orig")) %>%
  ggplot(aes(x=factor(deg), y=pct, 
           group=measure, fill=measure)) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.tot vs. deg.tot.alt",
       x = "degree", y = "Percent",
       fill = "measure")

# by original deg.tot
degdf <- d %>%
  group_by(deg.tot, deg.tot.alt) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(degdf,
       aes(x=factor(deg.tot), y=pct, 
           group=factor(deg.tot.alt), fill=factor(deg.tot.alt))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.tot vs. deg.tot.alt",
       x = "deg.tot", y = "Percent",
       fill = "deg.tot.alt")

deg.tot v. deg.tot.imp

The difference here is using the strict + imputed active status indicator to classify one-time partners as casl. Respondent has answered either “yes, i expect to have sex with this partner again”, or “DK”, and the “DK” have been imputed with p=0.5 or 0.1 as being active, for repeat and one-time partners respectively.

# by degree value

imp <- d %>%
  group_by(deg.tot.imp) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num),
         measure = "imp") %>%
  rename(deg = deg.tot.imp) %>%
  select(-num)

bind_rows (orig, imp) %>%
  mutate(measure = forcats::fct_relevel(measure, "orig")) %>%
  ggplot(aes(x=factor(deg), y=pct, 
           group=measure, fill=measure)) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.tot vs. deg.tot.imp",
       x = "degree", y = "Percent",
       fill = "measure")

# by original deg.tot
degdf <- d %>%
  group_by(deg.tot, deg.tot.imp) %>%
  summarize(num = n()) %>%
  mutate(pct = num/sum(num))

ggplot(degdf,
       aes(x=factor(deg.tot), y=pct, 
           group=factor(deg.tot.imp), fill=factor(deg.tot.imp))) +
  geom_bar(stat="identity", position="dodge", alpha=0.7) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
  labs(title = "deg.tot vs. deg.tot.imp",
       x = "deg.tot", y = "Percent",
       fill = "deg.tot.imp")

Mean degree

Again, only the mean degree for casual and total partnerships is affected.

orig <- d %>% 
  select(deg.casl, deg.tot) %>% 
  sjmisc::descr() 
alt <- d %>% 
  select(deg.casl.alt, deg.tot.alt) %>% 
  sjmisc::descr()
imp <- d %>% 
  select(deg.casl.imp, deg.tot.imp) %>% 
  sjmisc::descr()


degs <- data.frame((bind_rows(orig[1,], alt[1,], imp[1,],
                              orig[2,], alt[2,], imp[2,]))) %>% 
  select(-c("label", "iqr", "skew")) %>%
  mutate(var = c(rep("deg.casl",3), rep("deg.tot",3)),
         type = rep(c("orig", "alt", "imp"),2)
         ) %>%
  rename(status = var,
         measure = type,
         NA.pct = NA.prc,
         median = md,
         tr.mean = trimmed)

This just summarizes the changes in the degree distributions. Bottom line:

  • casl mean degree is 31% higher using the alt measure and 51% higher using the imp measure

  • total mean degree is 18% higher using the alt measure and 30% higher using the imp measure

degs %>%
  mutate(status = rep("", 6)) %>%
  kable(caption= "Mean degree estimate comparisons", 
        digits = c(0,0,2,2,2,2,2,1,2)) %>%
  pack_rows("deg.casl", 1,3) %>%
  pack_rows("deg.tot", 4,6) %>%
  kable_styling(full_width=F, position="center",
                bootstrap_options = c("striped"))
Mean degree estimate comparisons
status measure n NA.pct mean sd se median tr.mean range
deg.casl
orig 4904 0 0.59 1.01 0.01 0 0.36 5 (0-5)
alt 4904 0 0.77 1.17 0.02 0 0.52 5 (0-5)
imp 4904 0 0.89 1.23 0.02 0 0.65 5 (0-5)
deg.tot
orig 4904 0 1.00 1.08 0.02 1 0.81 5 (0-5)
alt 4904 0 1.19 1.21 0.02 1 0.99 5 (0-5)
imp 4904 0 1.30 1.27 0.02 1 1.11 5 (0-5)

Concurrency

Again, only influences concurrency fractions for casual and total partnerships.

orig <- d %>% 
  select(conc.casl, conc.tot) %>% 
  sjmisc::descr() 
alt <- d %>% 
  select(conc.casl.alt, conc.tot.alt) %>% 
  sjmisc::descr()
imp <- d %>% 
  select(conc.casl.imp, conc.tot.imp) %>% 
  sjmisc::descr()


conc <- data.frame((bind_rows(orig[1,], alt[1,], imp[1,],
                              orig[2,], alt[2,], imp[2,]))) %>% 
  select(-c("label", "iqr", "skew")) %>%
  mutate(var = c(rep("conc.casl",3), rep("conc.tot",3)),
         type = rep(c("orig", "alt", "imp"),2)
         ) %>%
  rename(status = var,
         measure = type,
         NA.pct = NA.prc,
         median = md,
         tr.mean = trimmed)

This also just summarizes the changes in the degree distributions. Bottom line:

  • casl concurrency is 41% higher using the alt measure and 64% higher using the imp measure

  • total concurrency is 28% higher using the alt measure and 47% higher using the imp measure

conc %>%
  mutate(status = rep("", 6)) %>%
  kable(caption= "Concurrency estimate comparisons", 
        digits = c(0,0,2,2,2,2,2,1,2)) %>%
  pack_rows("conc.casl", 1,3) %>%
  pack_rows("conc.tot", 4,6) %>%
  kable_styling(full_width=F, position="center",
                bootstrap_options = c("striped"))
Concurrency estimate comparisons
status measure n NA.pct mean sd se median tr.mean range
conc.casl
orig 4904 0 0.15 0.36 0.01 0 0.06 1 (0-1)
alt 4904 0 0.21 0.41 0.01 0 0.14 1 (0-1)
imp 4904 0 0.24 0.43 0.01 0 0.18 1 (0-1)
conc.tot
orig 4904 0 0.21 0.41 0.01 0 0.13 1 (0-1)
alt 4904 0 0.27 0.44 0.01 0 0.21 1 (0-1)
imp 4904 0 0.30 0.46 0.01 0 0.26 1 (0-1)

Duration distn for casl partnerships

For this, we need to contstruct the duration variable for the one-time partners classified as active (either strict, or imputed), because the ARTnet scripts set duration to missing for all one-time partners. The raw variable needed is OFENDMM, as all these cases start and end in the survey year. Units for duration is weeks. This is now done in the make_artnet_data script.

orig <- l %>% 
  filter(ptype==2) %>% select(duration, active.strict) %>% 
  group_by(active.strict) %>%
  sjmisc::descr() 
alt <- l %>% 
  filter(ptype.alt==2) %>% select(duration.alt, active.imp) %>% 
  group_by(active.imp) %>%
  sjmisc::descr()
imp <- l %>% 
  filter(ptype.imp==2) %>% select(duration.imp, active.imp) %>% 
  group_by(active.imp) %>%
  sjmisc::descr()

# NB: alt and imp are the same for ended partnerships
durs <- data.frame((bind_rows(orig[2], alt[2], imp[2],
                              orig[1], alt[1]))) %>% 
  select(-c("label", "iqr", "skew")) %>%
  mutate(var = c(rep("active",3), rep("ended",2)),
         type = c("orig", "alt", "imp", "orig", "alt")
         ) %>%
  rename(status = var,
         measure = type)

Table

The effect of most interest is the impact on the estimated mean age active partnerships:

  • mean age of active partnerships is 28% lower using the alt measure and 30% lower using the imp measure

There is less effect on the estimated duration of ended partnerships.

durs %>%
  mutate(status = rep("", nrow(durs))) %>%
  kable(caption= "Duration estimate comparisons, by active status", 
        digits = c(0,0,2,1,1,1,1,1,0)) %>% 
  pack_rows(durs$status[1], 1,3) %>%
  pack_rows(durs$status[4], 4,nrow(durs)) %>%
  kable_styling(full_width=F, position="center",
                bootstrap_options = c("striped")) %>%
  add_footnote(label="NA.prc range is 0-100%", notation="symbol")
Duration estimate comparisons, by active status
status measure n NA.prc mean sd se md trimmed range
active
orig 3642 0.5 132.8 196.1 3.2 70.6 89 1946 (0.14-1946.14)
alt 5669 0.4 96.0 171.3 2.3 31.9 57 1946.02 (0.13-1946.14)
imp 5852 0.4 93.4 169.2 2.2 30.3 54 1946.02 (0.13-1946.14)
ended
orig 2303 0.7 70.7 139.4 2.9 21.9 41 2143.69 (0.02-2143.71)
alt 1688 0.6 68.4 143.0 3.5 21.6 38 2143.69 (0.02-2143.71)
* NA.prc range is 0-100%

Plots

All
colors <- c("orig"="cadetblue", "alt"="goldenrod", "imp" = "purple")

ggplot() +
  geom_density(data=l %>% filter(ptype.imp==2),
               aes(x=duration.imp, fill="imp")) +
  geom_density(data=l %>% filter(ptype.alt==2),
               aes(x=duration.alt, fill="alt"), show.legend = FALSE) +
  geom_density(data=l %>% filter(ptype==2),
               aes(x=duration, fill="orig"), show.legend = FALSE) +
  xlim(0,200) +
  labs(title = "Duration distribution comparisons: All partnerships",
       caption = "ARTnet Data",
       x = "Weeks",
       fill = "Legend") +
  scale_fill_manual(breaks=c("orig","alt","imp"),
                    values = adjustcolor(colors, alpha.f = 0.3)) +
  guides(fill = guide_legend(override.aes = list(alpha = 0.3)))

Active
ggplot() +
  geom_density(data=l %>% filter(ptype.imp==2 & active.imp==1),
               aes(x=duration.imp, fill="imp")) +
  geom_density(data=l %>% filter(ptype.alt==2 & active.imp==1),
               aes(x=duration.alt, fill="alt"), show.legend = FALSE) +
  geom_density(data=l %>% filter(ptype==2 & active==1),
               aes(x=duration, fill="orig"), show.legend = FALSE) +
  xlim(0,200) +
  labs(title = "Duration distribution comparisons: Active partnerships",
       caption = "ARTnet Data",
       x = "Weeks",
       fill = "Legend") +
  scale_fill_manual(breaks=c("orig","alt","imp"),
                    values = adjustcolor(colors, alpha.f = 0.3)) +
  guides(fill = guide_legend(override.aes = list(alpha = 0.3)))

Ended
ggplot() +
  geom_density(data=l %>% filter(ptype.imp==2 & active.imp==0),
               aes(x=duration.imp, fill="imp")) +
  geom_density(data=l %>% filter(ptype.alt==2 & active.imp==0),
               aes(x=duration.alt, fill="alt"), show.legend = FALSE) +
  geom_density(data=l %>% filter(ptype==2 & active==0),
               aes(x=duration, fill="orig"), show.legend = FALSE) +
  xlim(0,200) +
  labs(title = "Duration distribution comparisons: Ended partnerships",
       caption = "ARTnet Data",
       x = "Weeks",
       fill = "Legend") +
  scale_fill_manual(breaks=c("orig","alt","imp"),
                    values = adjustcolor(colors, alpha.f = 0.3)) +
  guides(fill = guide_legend(override.aes = list(alpha = 0.3)))