0.1 What is AACT?

The AACT (Aggregate Analysis of ClinicalTrials.gov) database is a publicly available relational (PostgreSQL) database containing information on every study registered in ClinicalTrials.gov. It is maintained by the Clinical Trials Transformation Initiative (CTTI).

Website: https://aact.ctti-clinicaltrials.org

0.2 Data needed

  1. studies - Main trial information (one row per trial)
  2. sponsors - Who funded/ran the trial
  3. conditions - What diseases the trial studies
  4. facilities - Where the trial sites are located
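All four tables share the nct_id column (the ClinicalTrials.gov identifier), which is the key used to link them. studies has one row per trial, but conditions and facilities can have several rows per trial, so a plain join repeats the trial-level columns. A minimal sketch with made-up toy data frames (not real AACT rows) shows the effect on a trial-level sum:

```r
# Hypothetical toy data: two trials, the first one studying two conditions
studies <- data.frame(
  nct_id     = c("NCT00000001", "NCT00000002"),
  enrollment = c(120, 60)
)
conditions <- data.frame(
  nct_id        = c("NCT00000001", "NCT00000001", "NCT00000002"),
  downcase_name = c("asthma", "copd", "psoriasis")
)

# Join on the shared nct_id key; one-to-many, so trial 1 now appears twice
joined <- merge(studies, conditions, by = "nct_id")
nrow(joined)            # 3 rows, not 2

# Summing a trial-level column over the joined table double-counts trial 1
sum(joined$enrollment)  # 300 rather than 180
```

Aggregating conditions or facilities to one row per trial before joining avoids this double counting.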

1 Packages

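library(DBI)        # Generic database interface: dbConnect(), dbGetQuery(), dbDisconnect()
library(RPostgres)  # PostgreSQL driver used to connect R to the AACT database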

2 Connect to AACT Database

# AACT login credentials (replace with your own; access requires a free AACT account)
username <- "USERNAME"
password <- "PASSWORD"

# Establish connection
con <- dbConnect(
  RPostgres::Postgres(),
  dbname = "aact",
  host = "aact-db.ctti-clinicaltrials.org",
  port = 5432,
  user = username,
  password = password
)
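Typing the password into the script risks sharing it along with the code. A minimal sketch that reads the credentials from environment variables instead; the variable names AACT_USER and AACT_PASSWORD and the helper aact_credentials() are hypothetical, not part of AACT or RPostgres:

```r
# Hypothetical helper: read AACT credentials from environment variables.
# AACT_USER / AACT_PASSWORD are assumed names, e.g. set in ~/.Renviron.
aact_credentials <- function(user_var = "AACT_USER", pass_var = "AACT_PASSWORD") {
  user <- Sys.getenv(user_var)  # returns "" when the variable is unset
  pass <- Sys.getenv(pass_var)
  if (!nzchar(user) || !nzchar(pass)) {
    stop("Set ", user_var, " and ", pass_var, " before connecting to AACT")
  }
  list(user = user, password = pass)
}

# Usage (needs network access and a valid account, so left commented out):
# creds <- aact_credentials()
# con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "aact",
#                       host = "aact-db.ctti-clinicaltrials.org", port = 5432,
#                       user = creds$user, password = creds$password)
```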

3 Download - 1. studies - Main trial information (one row per trial)

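Contains one row per registered clinical trial. Filtered to:

  1. Phase 2, Phase 1/Phase 2 or Phase 2/Phase 3 trials
  2. Interventional studies
  3. Start date between 1 March 2019 and 31 December 2024 (inclusive)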

query_studies <- "
SELECT 
    nct_id, 
    brief_title, 
    overall_status, 
    phase,
    enrollment, 
    start_date, 
    completion_date, 
    number_of_arms,
    study_type
FROM studies
WHERE 
    phase IN ('PHASE2', 'PHASE1/PHASE2', 'PHASE2/PHASE3')
    AND study_type = 'INTERVENTIONAL'
    AND start_date >= '2019-03-01'
    AND start_date <= '2024-12-31'
"

studies_raw <- dbGetQuery(con, query_studies)
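A quick sanity check after downloading can confirm that studies_raw matches the filters above and really has one row per trial. The helper check_studies_filter() is a hypothetical sketch; it assumes start_date arrives as an R Date, which is how RPostgres normally maps PostgreSQL date columns:

```r
# Hypothetical helper: confirm a data frame satisfies the filters in query_studies
check_studies_filter <- function(df) {
  allowed_phases <- c("PHASE2", "PHASE1/PHASE2", "PHASE2/PHASE3")
  stopifnot(
    !anyDuplicated(df$nct_id),                    # one row per trial
    all(df$phase %in% allowed_phases),            # phase filter
    all(df$study_type == "INTERVENTIONAL"),       # study type filter
    all(df$start_date >= as.Date("2019-03-01")),  # start date lower bound
    all(df$start_date <= as.Date("2024-12-31"))   # start date upper bound
  )
  invisible(TRUE)
}

# Usage on the downloaded data: check_studies_filter(studies_raw)
```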

4 Download - 2. sponsors - Who funded/ran the trial

A trial can have several sponsors (one lead sponsor plus any collaborators), so only the lead sponsor is kept.

query_sponsors <- "
SELECT 
    nct_id,
    agency_class,
    lead_or_collaborator
FROM sponsors
WHERE lead_or_collaborator = 'lead'
"

sponsors_raw <- dbGetQuery(con, query_sponsors)
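Because this query is not joined to the studies filter, sponsors_raw covers every trial in AACT and can be cut down to the trials kept in studies_raw before analysis. A minimal sketch on toy data (the nct_id and agency_class values are made up) that keeps only the filtered trials, checks the assumption of one lead sponsor per trial, and tabulates sponsor class:

```r
# Hypothetical toy stand-ins for studies_raw and sponsors_raw
studies_toy <- data.frame(nct_id = c("NCT00000001", "NCT00000002"))
sponsors_toy <- data.frame(
  nct_id               = c("NCT00000001", "NCT00000002", "NCT00000003"),
  agency_class         = c("INDUSTRY", "OTHER", "INDUSTRY"),
  lead_or_collaborator = "lead"
)

# Keep only lead sponsors of trials that passed the studies filter
lead_sponsors <- sponsors_toy[sponsors_toy$nct_id %in% studies_toy$nct_id, ]

# Assumption: each trial has exactly one lead sponsor
stopifnot(!anyDuplicated(lead_sponsors$nct_id))

# Number of filtered trials per sponsor class
table(lead_sponsors$agency_class)
```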

5 Download - 3. conditions - What diseases the trial studies

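Links each trial to the condition(s) it studies (disease or therapeutic area). A trial can list several conditions, so there can be several rows per trial; downcase_name is the lower-cased condition name.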

query_conditions <- "
SELECT 
    nct_id,
    downcase_name
FROM conditions
"

conditions_raw <- dbGetQuery(con, query_conditions)
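Since a trial can list several conditions, it is often convenient to collapse them to one row per trial before joining to studies_raw. A base R sketch on toy data (values made up for illustration):

```r
# Hypothetical toy stand-in for conditions_raw
conditions_toy <- data.frame(
  nct_id        = c("NCT00000001", "NCT00000001", "NCT00000002"),
  downcase_name = c("asthma", "copd", "psoriasis")
)

# One row per trial: distinct conditions joined into a "; "-separated string
per_trial <- aggregate(
  downcase_name ~ nct_id,
  data = conditions_toy,
  FUN  = function(x) paste(sort(unique(x)), collapse = "; ")
)

# Number of distinct conditions per trial, aligned to per_trial by nct_id
n_cond <- tapply(conditions_toy$downcase_name, conditions_toy$nct_id,
                 function(x) length(unique(x)))
per_trial$n_conditions <- as.integer(n_cond[per_trial$nct_id])

per_trial
```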

6 Download - 4. facilities - Where the trial sites are located

One row per trial site, so counting rows per nct_id gives the number of sites for each trial.

query_facilities <- "
SELECT 
    nct_id,
    name,
    country
FROM facilities
"

facilities_raw <- dbGetQuery(con, query_facilities)
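Counting rows per nct_id gives the number of sites for each trial. A base R sketch on toy data (site names and countries are made up) that also counts distinct countries per trial:

```r
# Hypothetical toy stand-in for facilities_raw
facilities_toy <- data.frame(
  nct_id  = c("NCT00000001", "NCT00000001", "NCT00000001", "NCT00000002"),
  name    = c("Site A", "Site B", "Site C", "Site D"),
  country = c("United States", "United States", "Canada", "Germany")
)

# Sites per trial = number of facility rows per nct_id
n_sites <- as.data.frame(table(nct_id = facilities_toy$nct_id),
                         responseName = "n_sites", stringsAsFactors = FALSE)

# Distinct countries per trial, aligned to n_sites by nct_id
n_countries <- tapply(facilities_toy$country, facilities_toy$nct_id,
                      function(x) length(unique(x)))
n_sites$n_countries <- as.integer(n_countries[n_sites$nct_id])

n_sites
```

Trials with no rows in facilities do not appear in the result; after merging onto studies_raw with all.x = TRUE, their missing counts can be set to 0.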

7 Disconnect from AACT and Save Datasets Locally

# Disconnect from database
dbDisconnect(con)

# Save each table as an .RData file
save(studies_raw, file = "aact_studies_raw.RData")
save(sponsors_raw, file = "aact_sponsors_raw.RData")
save(conditions_raw, file = "aact_conditions_raw.RData")
save(facilities_raw, file = "aact_facilities_raw.RData")
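In a later session the saved files can be reloaded with load(), which restores each object under the name it was saved with. A sketch using a temporary file and a toy data frame (toy_studies is hypothetical), loading into a separate environment so nothing in the current workspace is overwritten:

```r
# Toy data frame saved to a temporary file so the example runs anywhere
toy_studies <- data.frame(nct_id = c("NCT00000001", "NCT00000002"))
path <- tempfile(fileext = ".RData")
save(toy_studies, file = path)

# load() returns the names of the restored objects (invisibly)
env <- new.env()
loaded_names <- load(path, envir = env)
loaded_names             # "toy_studies"
nrow(env$toy_studies)    # 2
```

For single objects, saveRDS() and readRDS() are an alternative that lets you choose the object's name when reading it back.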

8 Datasets Summary

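File                        Description                          Rows (downloaded 30 Nov 2025)
aact_studies_raw.RData      Main trial information (filtered)    25,419
aact_sponsors_raw.RData     Lead sponsor of each trial           559,371
aact_conditions_raw.RData   Conditions studied                   990,427
aact_facilities_raw.RData   Trial site locations                 3,345,178

The sponsors, conditions and facilities queries are not restricted to the filtered studies, so their row counts cover all trials in AACT, not only the 25,419 selected ones.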