Transmission Chain Analysis

Author

Timothy Achala

1 Overview

The epicontacts package (RECON) is the primary R tool for handling, analysing, and visualising transmission chains and contact-tracing data. It combines two data structures:

  • Linelist — one row per case with clinical and demographic variables.
  • Contacts — directed edges recording who infected whom, plus edge-level attributes (location, duration).

2 Data Preparation

2.1 Linelist snapshot

First 8 rows of the case linelist
case_id generation date_onset date_outcome outcome gender age hospital
CASE_001 1 2014-04-08 2014-04-20 Recover f 51 Port Hospital
CASE_002 5 2014-04-08 2014-04-25 Death f 24 Military Hospital
CASE_003 5 2014-04-12 2014-04-22 Recover f 37 Other
CASE_004 3 2014-04-09 2014-04-25 Death m 41 Military Hospital
CASE_005 4 2014-04-19 2014-04-29 Recover f 38 Port Hospital
CASE_006 4 2014-04-22 2014-05-11 Death f 31 Port Hospital
CASE_007 3 2014-04-18 2014-05-01 Recover f 53 Other
CASE_008 4 2014-04-15 2014-05-04 Recover m 31 Central Hospital

2.2 Contacts snapshot

First 6 rows of the contacts (edges) table
infector case_id location duration
CASE_004 CASE_005 Community 10
CASE_004 CASE_006 Nosocomial 6
CASE_007 CASE_008 Community 3
CASE_009 CASE_011 Community 1
CASE_008 CASE_013 Community 8
CASE_012 CASE_014 Community 9

2.3 epicontacts object overview

Metric Value
Unique IDs in linelist 200
Unique IDs in contacts 198
IDs present in both 198
Total transmission links 186
% contacts with both cases in linelist 100 %
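
The overlap metrics above can be reproduced with plain set operations. The sketch below is only an illustration in Python, not the code behind this report: it uses just the case IDs and links visible in the two snapshot tables, so its numbers differ from the full-data table, and the variable names are hypothetical.

```python
# Toy subset: the 8 case IDs and 6 links shown in the snapshot tables above,
# NOT the full dataset, so the results differ from the overview table.
linelist_ids = {f"CASE_{i:03d}" for i in range(1, 9)}  # CASE_001 .. CASE_008
contacts = [  # (infector, infectee)
    ("CASE_004", "CASE_005"),
    ("CASE_004", "CASE_006"),
    ("CASE_007", "CASE_008"),
    ("CASE_009", "CASE_011"),
    ("CASE_008", "CASE_013"),
    ("CASE_012", "CASE_014"),
]

# Every ID that appears at either end of a link.
contact_ids = {case for edge in contacts for case in edge}
ids_in_both = linelist_ids & contact_ids

# A link is complete when both infector and infectee are in the linelist.
complete_links = [
    (a, b) for a, b in contacts if a in linelist_ids and b in linelist_ids
]
pct_complete = 100 * len(complete_links) / len(contacts)

print(len(linelist_ids), len(contact_ids), len(ids_in_both), pct_complete)
# prints: 8 10 5 50.0
```

In the snapshot only half of the links are complete because the linelist is truncated to 8 rows; the full data reports 100 %.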

3 Epidemic Curve

Weekly case counts by outcome. The curve shows exponential rise from April 2014, peaking June–July before declining — consistent with a controlled Ebola outbreak. Deaths (red) are distributed proportionally throughout.

Interpretation: The epidemic curve reveals a single-peaked outbreak with most cases in May–August 2014. The stable proportion of deaths throughout suggests no major shifts in case severity. The tail-off from September onwards is consistent with effective contact tracing and isolation.
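
Weekly counts by outcome are a simple grouping of onset dates. As a rough sketch, assuming weeks start on Monday (the report does not state its week definition) and using only the 8 linelist rows shown in the snapshot, the aggregation could look like this in Python:

```python
from collections import Counter
from datetime import date, timedelta

# (date_onset, outcome) for the 8 cases in the linelist snapshot only.
cases = [
    (date(2014, 4, 8), "Recover"),
    (date(2014, 4, 8), "Death"),
    (date(2014, 4, 12), "Recover"),
    (date(2014, 4, 9), "Death"),
    (date(2014, 4, 19), "Recover"),
    (date(2014, 4, 22), "Death"),
    (date(2014, 4, 18), "Recover"),
    (date(2014, 4, 15), "Recover"),
]

def week_start(d):
    """Return the Monday of the week containing d (assumed week definition)."""
    return d - timedelta(days=d.weekday())

# Count cases per (week, outcome) pair.
weekly = Counter((week_start(onset), outcome) for onset, outcome in cases)

for (week, outcome), n in sorted(weekly.items()):
    print(week, outcome, n)
# prints:
# 2014-04-07 Death 2
# 2014-04-07 Recover 2
# 2014-04-14 Recover 3
# 2014-04-21 Death 1
```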


4 Transmission Network Analysis

4.1 Network structure metrics

Transmission network metrics
Metric Value
Nodes (cases in network) 200
Edges (transmission links) 186
Network density 0.0047
Number of clusters/components 14
Largest cluster size 60
Max out-degree (super-spreader) 18
Mean out-degree 0.93
Median serial interval (days) 55.5
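
Three of these metrics follow directly from the node and edge counts. The short check below is a sketch in Python rather than the report's own code; the forest assumption (no cycles, at most one infector per case) is ours and is labelled as such in the comments.

```python
nodes = 200  # cases in the network (from the table above)
edges = 186  # directed transmission links (from the table above)

# Density of a directed graph without self-loops:
# observed edges divided by the number of possible ordered pairs.
density = edges / (nodes * (nodes - 1))

# Each edge leaves exactly one infector, so mean out-degree = edges / nodes.
mean_out_degree = edges / nodes

# ASSUMPTION: the network is a forest (no cycles, at most one infector per
# case). Each tree then has one fewer edge than nodes, so the number of
# components equals nodes - edges.
components_if_forest = nodes - edges

print(round(density, 4), round(mean_out_degree, 2), components_if_forest)
# prints: 0.0047 0.93 14
```

All three values agree with the table, which is consistent with the network being a set of 14 transmission trees.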

4.2 Out-degree distribution

Out-degree distribution: how many secondary cases each infector generated. Highly right-skewed — most cases infected 0–2 others, while a small number of super-spreaders generated 5 or more.

Interpretation: The strongly right-skewed distribution means a small number of cases drive the majority of transmission, a pattern called heterogeneous transmission or the 20/80 rule (roughly 20% of cases accounting for about 80% of onward transmission). Identifying and isolating high-degree nodes is critical for outbreak control.

4.3 In-degree distribution

In-degree distribution. Almost all cases have exactly one known infector, as expected in a directed transmission tree where each case has a single source. Cases with in-degree 0 have no recorded infector and sit at the root of their cluster.
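
Out-degree and in-degree are simply counts of how often a case appears as infector or infectee. A minimal sketch in Python, using only the six links from the contacts snapshot (so the counts are illustrative, not the report's results):

```python
from collections import Counter

# (infector, infectee) pairs from the contacts snapshot; a toy subset only.
contacts = [
    ("CASE_004", "CASE_005"),
    ("CASE_004", "CASE_006"),
    ("CASE_007", "CASE_008"),
    ("CASE_009", "CASE_011"),
    ("CASE_008", "CASE_013"),
    ("CASE_012", "CASE_014"),
]

# Out-degree: number of secondary cases attributed to each infector.
out_degree = Counter(infector for infector, _ in contacts)
# In-degree: number of recorded infectors for each infectee.
in_degree = Counter(infectee for _, infectee in contacts)

# Cases that never appear as an infectee have in-degree 0 in this subset.
all_cases = {case for edge in contacts for case in edge}
roots = sorted(case for case in all_cases if in_degree[case] == 0)

print(out_degree.most_common(1))  # [('CASE_004', 2)]
print(max(in_degree.values()))    # 1
print(roots)  # ['CASE_004', 'CASE_007', 'CASE_009', 'CASE_012']
```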

5 Serial Interval

The serial interval (SI) — time between infector and infectee symptom onset — is extracted with get_pairwise(), the EpiRHandbook-recommended approach.
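
To make the definition concrete, the sketch below computes onset-to-onset differences for the three snapshot links whose two cases both appear in the linelist snapshot. It is written in plain Python and is not the get_pairwise() call used for the report. Whether the report used signed or absolute differences is not stated, so both are shown.

```python
from datetime import date

# Onset dates copied from the linelist snapshot.
onset = {
    "CASE_004": date(2014, 4, 9),
    "CASE_005": date(2014, 4, 19),
    "CASE_006": date(2014, 4, 22),
    "CASE_007": date(2014, 4, 18),
    "CASE_008": date(2014, 4, 15),
}

# (infector, infectee) links with both onset dates available in the snapshot.
pairs = [
    ("CASE_004", "CASE_005"),
    ("CASE_004", "CASE_006"),
    ("CASE_007", "CASE_008"),
]

# Signed SI: infectee onset minus infector onset, in days.
signed_si = [(onset[b] - onset[a]).days for a, b in pairs]
# Absolute SI: ignores the direction of the difference.
absolute_si = [abs(days) for days in signed_si]

print(signed_si)    # [10, 13, -3]
print(absolute_si)  # [10, 13, 3]
```

A negative signed value, as for CASE_007 to CASE_008, means the infectee's recorded onset precedes the infector's, which is worth checking against the source data.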

Serial interval summary statistics (days)
Statistic Value
N pairs 186
Mean 74.1
Median 55.5
SD 63.0
Min 0.0
Q25 23.0
Q75 106.5
Max 260.0

Serial interval distribution. Right-skewed with a median of 55.5 days, far above published Ebola SI estimates, which are on the order of one to two weeks. The dashed red line marks the median.

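Interpretation: A median SI of 55.5 days is much longer than Ebola’s known incubation period (2–21 days) would allow for direct transmission pairs, so it should not be read as a biological estimate. The long right tail (up to 260 days) likely reflects missed intermediate transmissions or errors in the recorded links or onset dates. SI estimates are essential for calculating the effective reproduction number (Rt), so these links should be verified before the estimate is used.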


6 Pairwise Analysis

6.1 Transmission by setting

Transmission links by exposure setting. Community transmission dominates (64% of links), highlighting markets, funerals, and households as priority intervention targets. Nosocomial transmission (36%) underscores the need for strict hospital infection prevention and control (IPC) measures.
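
The share of links per setting is a frequency count over the location attribute of each link. As an illustration only, the Python sketch below applies that count to the six snapshot links, so its percentages differ from the full-data figures.

```python
from collections import Counter

# Location of each link in the contacts snapshot (toy subset of 6 links).
locations = [
    "Community", "Nosocomial", "Community",
    "Community", "Community", "Community",
]

counts = Counter(locations)
total = sum(counts.values())

# Percentage of links per setting, rounded to one decimal place.
shares = {setting: round(100 * n / total, 1) for setting, n in counts.items()}

print(shares)  # {'Community': 83.3, 'Nosocomial': 16.7}
```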

6.2 Pairwise age difference

Age difference between each infector–infectee pair (positive = infector older). A distribution centred near zero indicates transmission does not strongly prefer any age direction.
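
The sign convention can be checked on the snapshot rows. This is a small Python sketch with hypothetical variable names, limited to the three links whose two cases both appear in the linelist snapshot:

```python
from statistics import median

# Ages copied from the linelist snapshot.
age = {
    "CASE_004": 41,
    "CASE_005": 38,
    "CASE_006": 31,
    "CASE_007": 53,
    "CASE_008": 31,
}

# (infector, infectee) links with both ages available in the snapshot.
pairs = [
    ("CASE_004", "CASE_005"),
    ("CASE_004", "CASE_006"),
    ("CASE_007", "CASE_008"),
]

# Positive value = infector older than infectee.
age_diff = [age[infector] - age[infectee] for infector, infectee in pairs]

print(age_diff)          # [3, 10, 22]
print(median(age_diff))  # 10
```

Three pairs are far too few to say anything about the full distribution; the example only shows how each value is formed.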

7 Cluster Analysis

7.1 Cluster size table

Distribution of transmission cluster sizes
Cluster size   No. clusters  Cases involved
1 (singleton)  2             2
2–4            1             4
5–9            6             37
≥ 10           5             157

Cluster size distribution (log-scale y-axis). Nine of the 14 clusters have fewer than 10 cases, but the five large clusters (≥10 cases) contain 157 of the 200 cases and represent sustained transmission chains requiring intensive contact tracing resources.
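
Clusters here are the connected components of the network when link direction is ignored. One common way to find them is a union-find structure. The Python sketch below is not the method used for this report; it runs on the snapshot cases and links only, and its bin labels are ASCII versions of the table's categories.

```python
from collections import Counter

# Toy subset: snapshot linelist IDs and snapshot links, not the full data.
linelist_ids = [f"CASE_{i:03d}" for i in range(1, 9)]
contacts = [
    ("CASE_004", "CASE_005"), ("CASE_004", "CASE_006"),
    ("CASE_007", "CASE_008"), ("CASE_009", "CASE_011"),
    ("CASE_008", "CASE_013"), ("CASE_012", "CASE_014"),
]

parent = {}

def find(x):
    """Return the representative of x's cluster (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Merge the clusters containing a and b."""
    parent[find(a)] = find(b)

for case in linelist_ids:
    find(case)  # register cases without links as singletons
for infector, infectee in contacts:
    union(infector, infectee)

# Map each cluster representative to its size.
cluster_sizes = Counter(find(case) for case in list(parent))

def size_bin(n):
    """Assign a cluster size to the bins used in the table above."""
    if n == 1:
        return "1 (singleton)"
    if n <= 4:
        return "2-4"
    if n <= 9:
        return "5-9"
    return ">=10"

table = Counter(size_bin(n) for n in cluster_sizes.values())

print(sorted(cluster_sizes.values(), reverse=True))  # [3, 3, 2, 2, 1, 1, 1]
print(dict(table))
```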

8 Cases and CFR by Generation

Case counts (bars) and case fatality ratio (red line) by transmission generation. Generation 1 = index cases. CFR stability across generations suggests consistent access to care throughout the outbreak.

Interpretation: Case counts peak in mid-chain generations (2–4), reflecting geometric growth before interventions take hold. The CFR is stable across generations; a rising CFR would instead have suggested that later cases had reduced access to care as the epidemic overwhelmed health facilities.
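
The CFR for a generation is the number of deaths divided by the number of cases in that generation. A minimal Python sketch on the 8 snapshot rows, which are far too few to be meaningful and serve only to show the calculation:

```python
from collections import defaultdict

# (generation, outcome) for the 8 cases in the linelist snapshot only.
cases = [
    (1, "Recover"), (5, "Death"), (5, "Recover"), (3, "Death"),
    (4, "Recover"), (4, "Death"), (3, "Recover"), (4, "Recover"),
]

totals = defaultdict(int)
deaths = defaultdict(int)
for generation, outcome in cases:
    totals[generation] += 1
    if outcome == "Death":
        deaths[generation] += 1

# CFR per generation = deaths / cases with a recorded outcome.
cfr = {g: deaths[g] / totals[g] for g in sorted(totals)}

for generation, value in cfr.items():
    print(generation, totals[generation], round(value, 2))
# prints:
# 1 1 0.0
# 3 2 0.5
# 4 3 0.33
# 5 2 0.5
```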


9 Key Findings and Recommendations

Summary of key epidemiological findings and recommended actions
Finding                   Detail                           Recommended Action
Transmission generations  5 generations identified         Map full chain to find undetected intermediate cases
Super-spreader events     7 cases infected ≥5 others       Retrospective investigation of high-degree nodes’ exposure contexts
Dominant setting          64% community-based links        Strengthen community isolation, safe burials, and contact tracing
Serial interval           Median 55.5 days (IQR 23–106.5)  Verify links and onset dates, then use SI estimate to parameterise Rt and forecast trajectory
Nosocomial burden         36% of links are nosocomial      Enforce IPC protocols; screen and cohort hospital staff
Large clusters            5 clusters of ≥10 cases          Prioritise contact tracing resources to largest active clusters