Section 2.3 Notes

Notes for Section 2.3: The ADaM Data Structures

2.3.1 The ADaM Subject-Level Analysis Dataset (ADSL)

Although ADSL contains subject-level variables that are also important in other datasets, there is no requirement that every ADSL variable be present in other analysis datasets.

However, at a minimum, any ADSL variable needed to enable analysis (e.g., statistical model covariates, population flags, subgrouping variables) should appear in the analysis dataset.
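
The rule above can be illustrated with a minimal sketch in plain Python. The subject IDs, values, and the helper name `add_adsl_variables` are hypothetical and chosen only for illustration; the point is that only the ADSL variables needed for the analysis are copied onto each analysis record, not the whole of ADSL.

```python
# Hypothetical ADSL: one record per subject, keyed by USUBJID.
adsl = [
    {"USUBJID": "S-001", "AGE": 54, "SEX": "F", "RACE": "ASIAN",
     "FASFL": "Y", "TRT01P": "Drug A"},
    {"USUBJID": "S-002", "AGE": 61, "SEX": "M", "RACE": "WHITE",
     "FASFL": "N", "TRT01P": "Placebo"},
]

# Hypothetical analysis records (possibly several rows per subject).
analysis_rows = [
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVAL": 13.2},
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVAL": 12.8},
    {"USUBJID": "S-002", "PARAM": "Hemoglobin Level", "AVAL": 14.1},
]


def add_adsl_variables(rows, adsl, needed):
    """Copy only the ADSL variables listed in `needed` onto each analysis row."""
    by_subject = {rec["USUBJID"]: rec for rec in adsl}
    merged = []
    for row in rows:
        # Raises KeyError if a subject is missing from ADSL.
        subject = by_subject[row["USUBJID"]]
        merged.append({**row, **{var: subject[var] for var in needed}})
    return merged


# Population flag, planned treatment, and a covariate are copied; SEX and RACE are not.
merged_rows = add_adsl_variables(analysis_rows, adsl, needed=["FASFL", "TRT01P", "AGE"])
```

Here the choice of `FASFL`, `TRT01P`, and `AGE` stands in for whatever population flags, treatment variables, and covariates a given analysis actually needs.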

  1. Purpose of ADSL:
    • ADSL is a subject-level dataset that contains one record per subject, regardless of the clinical trial design.
    • It includes key subject-level variables such as population flags, treatment variables, demographic information, stratification factors, and important dates.
    • ADSL is essential for merging with other datasets (both ADaM and SDTM) and serves as a source for subject-level variables used in other ADaM datasets.
  2. Content of ADSL:
    • Required Variables: ADSL must include certain required variables as specified in the ADaMIG.
    • Optional Variables: Other subject-level variables that describe a subject’s experience in the trial can also be included.
    • Consistency: ADSL variables should be consistent across datasets. Any variable in another ADaM dataset that has the same name as an ADSL variable must be a copy of it, with the same values, type, and label.
  3. Key Points:
    • ADSL is not intended to store all data values from a study. It focuses on key facts that are analysis-enabling or facilitate the interpretation of analysis.
    • ADSL is required in a CDISC-based submission, even if no other ADaM datasets are submitted.
    • The FDA’s Study Data Technical Conformance Guide (sdTCG) requests that “core” subject-level variables be present in all analysis datasets.
  4. Examples of ADSL Variables:
    • Population Flags: Variables like FASFL (Full Analysis Set Population Flag) indicate whether a subject is included in a specific population.
    • Treatment Variables: Variables like TRT01P (Planned Treatment for Period 01) describe the treatment assigned to the subject.
    • Demographic Variables: Variables like AGE, SEX, and RACE provide demographic information about the subject.
  5. Traceability and Metadata:
    • ADSL and its related metadata are required for CDISC-based submissions.
    • The metadata should clearly describe the derivation and source of each variable to ensure traceability.
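
Two of the points above lend themselves to simple checks: ADSL has one record per subject, and a variable that appears in both ADSL and another ADaM dataset carries the same values. The following is a rough sketch only, using hypothetical data and hypothetical helper names (`duplicate_subjects`, `inconsistent_values`); it is not a substitute for a conformance tool.

```python
from collections import Counter


def duplicate_subjects(adsl):
    """Return the USUBJID values that appear more than once in ADSL."""
    counts = Counter(rec["USUBJID"] for rec in adsl)
    return sorted(subject for subject, n in counts.items() if n > 1)


def inconsistent_values(adsl, other, shared_vars):
    """List (row index, variable, value in other, value in ADSL) mismatches."""
    by_subject = {rec["USUBJID"]: rec for rec in adsl}
    problems = []
    for index, row in enumerate(other):
        subject = by_subject.get(row["USUBJID"])
        if subject is None:
            # Subject present in the other dataset but absent from ADSL.
            problems.append((index, "USUBJID", row["USUBJID"], None))
            continue
        for var in shared_vars:
            if var in row and row[var] != subject[var]:
                problems.append((index, var, row[var], subject[var]))
    return problems


# Hypothetical data for illustration only.
adsl = [
    {"USUBJID": "S-001", "TRT01P": "Drug A", "FASFL": "Y"},
    {"USUBJID": "S-002", "TRT01P": "Placebo", "FASFL": "Y"},
]
other = [
    {"USUBJID": "S-001", "TRT01P": "Drug A", "AVAL": 13.2},
    {"USUBJID": "S-002", "TRT01P": "Drug A", "AVAL": 14.1},  # disagrees with ADSL
]

dups = duplicate_subjects(adsl)
mismatches = inconsistent_values(adsl, other, shared_vars=["TRT01P", "FASFL"])
```

In this sketch a variable is compared only when it is present on the other dataset's row, which reflects the point that not every ADSL variable has to be carried into other datasets.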

Section 2.3.1: ADSL Structure diagram

Explanation of the Diagram:

  • ADSL is the central dataset that contains one record per subject.
  • It includes various types of variables such as Population Flags, Treatment Variables, Demographic Information, Stratification Factors, and Important Dates.
  • Each category of variables is further broken down into specific examples, such as FASFL for population flags, TRT01P for treatment variables, and AGE, SEX, and RACE for demographic information.
  • The diagram illustrates the hierarchical structure of ADSL and how it organizes key subject-level data for analysis and traceability.

flowchart LR
    A[ADSL - Subject-Level Analysis Dataset] --> B[1 Record per Subject]
    B --> C[Population Flags]
    B --> D[Treatment Variables]
    B --> E[Demographic Information]
    B --> F[Stratification Factors]
    B --> G[Important Dates]
    C --> H[FASFL - Full Analysis Set Population Flag]
    D --> I[TRT01P - Planned Treatment for Period 01]
    E --> J[AGE - Age]
    E --> K[SEX - Sex]
    E --> L[RACE - Race]
    F --> M[STRATAR - Stratification Factor]
    G --> N[TRTSDT - Date of First Exposure to Treatment]
    G --> O[TRTEDT - Date of Last Exposure to Treatment]


2.3.2 The ADaM Basic Data Structure (BDS)

A BDS dataset contains one or more records per subject, per analysis parameter, per analysis timepoint. The analysis timepoint is conditionally required, depending on the analysis. Where there is no analysis timepoint, the structure is one or more records per subject per analysis parameter.
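
As a rough illustration of this structure, the sketch below groups hypothetical BDS records by subject, parameter, and (when present) analysis timepoint. The data values and the helper name `group_records` are assumptions made for the example; `AVISIT` is used as the timepoint variable, but another timing variable could play that role.

```python
from collections import defaultdict

# Hypothetical BDS records; a key may hold more than one record.
bds = [
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVISIT": "Baseline", "AVAL": 13.2},
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVISIT": "Week 4", "AVAL": 12.8},
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVISIT": "Week 4", "AVAL": 12.6},
    {"USUBJID": "S-002", "PARAM": "Hemoglobin Level", "AVISIT": "Baseline", "AVAL": 14.1},
]


def group_records(rows, timepoint_var="AVISIT"):
    """Group rows by subject and parameter, plus the timepoint when one is used."""
    key_vars = ["USUBJID", "PARAM"]
    if timepoint_var is not None:
        key_vars.append(timepoint_var)
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[var] for var in key_vars)].append(row)
    return dict(groups)


with_timepoint = group_records(bds)                         # subject x parameter x timepoint
without_timepoint = group_records(bds, timepoint_var=None)  # subject x parameter
```

The two Week 4 records for the first subject land in the same group, which is what "one or more records" per key allows.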

Notes on ADaM Basic Data Structure (BDS)

  1. Structure Overview:
    • The BDS dataset is designed to support statistical analysis in clinical trials.
    • It contains one or more records per subject, per analysis parameter, and per analysis timepoint (if applicable).
    • The analysis timepoint is conditionally required and can be represented by variables like AVISIT, ATPT, or other timing variables.
    • If no analysis timepoint is needed, the structure simplifies to one or more records per subject per analysis parameter.
  2. Key Variables:
    • Core Variables:
      • AVAL: The value being analyzed (e.g., a lab result, score, or derived value).
      • PARAM: Describes the analysis parameter (e.g., “Hemoglobin Level”).
    • Additional Variables:
      • BASETYPE: Used when there are multiple baseline definitions for a single analysis parameter.
      • DTYPE: Describes the derivation type (e.g., imputation, calculation).
      • Subject identification and treatment variables are also included to support analysis.
  3. Relationship with ADSL:
    • The ADSL dataset provides subject-level variables that may be used in BDS datasets.
    • However, not all ADSL variables need to be included in the BDS dataset. Only those relevant to the analysis should be added.
  4. Flexibility and Traceability:
    • BDS datasets are highly flexible, allowing the addition of rows and columns to support specific analyses and ensure traceability.
    • Derived columns or rows can be added following the rules outlined in Section 4.2 of the ADaM Implementation Guide.
  5. Comparison with SDTM:
    • BDS datasets do not have direct counterparts in SDTM.
    • While BDS datasets may resemble SDTM Findings class datasets due to their vertical structure, they are more robust and flexible for statistical analysis.
    • BDS datasets can be derived from multiple SDTM domains (Findings, Events, Interventions, Special-purpose) or other ADaM datasets.
  6. Record Types:
    • A record in a BDS dataset can represent:
      • Observed values (e.g., lab results).
      • Derived values (e.g., time-to-event calculations).
      • Imputed values (e.g., missing data imputation).
    • Derived values can be highly complex, such as tumor growth rates calculated from regression models.
  7. Dataset Design:
    • A single study may require multiple BDS datasets, each tailored to specific analyses.
    • The number of BDS datasets should be optimized to avoid overloading a single dataset with unnecessary variables or records.
  8. Best Practices:
    • Avoid forcing all data into a single BDS dataset. Instead, design multiple datasets to support different analyses.
    • Ensure traceability by documenting the derivation of all variables and records.
    • Follow the ADaM Model document for guidance on dataset design and implementation.
  9. Summary:
    • These notes summarize the key aspects of the ADaM Basic Data Structure (BDS) and its role in clinical trial data analysis. The BDS is a critical component of the ADaM standard, enabling flexible and traceable analysis datasets.
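
The record types in these notes (observed and imputed values in particular) can be sketched as follows. This example assumes last observation carried forward (LOCF) as the imputation method purely for illustration; the notes do not prescribe a method. The helper name `add_locf_rows`, the visit names, and the values are hypothetical. Observed rows get a blank DTYPE and imputed rows are added as new records with DTYPE set to "LOCF", so the original records stay untouched and traceable.

```python
def add_locf_rows(rows, visits):
    """Return rows for every scheduled visit, carrying the last observed AVAL forward.

    `rows` holds the observed records for one subject and one parameter;
    `visits` is the ordered list of scheduled analysis visits.
    """
    observed = {row["AVISIT"]: row for row in rows}
    result = []
    last = None
    for visit in visits:
        if visit in observed:
            # Observed record: DTYPE stays blank.
            last = {**observed[visit], "DTYPE": ""}
            result.append(last)
        elif last is not None:
            # Missing visit: add a new record imputed from the last observed one.
            result.append({**last, "AVISIT": visit, "DTYPE": "LOCF"})
    return result


# Hypothetical observed data: the Week 8 value is missing.
observed_rows = [
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVISIT": "Baseline", "AVAL": 13.2},
    {"USUBJID": "S-001", "PARAM": "Hemoglobin Level", "AVISIT": "Week 4", "AVAL": 12.8},
]

completed = add_locf_rows(observed_rows, visits=["Baseline", "Week 4", "Week 8"])
```

A visit that is missing before any observation exists is skipped, since there is nothing to carry forward.
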
The diagram below represents the structure and relationships of the ADaM Basic Data Structure (BDS) described in these notes.

flowchart LR
    A[BDS Dataset] -->|Contains| B[Records]
    B -->|Per Subject| C[Subject ID]
    B -->|Per Analysis Parameter| D[PARAM]
    B -->|Conditionally Required| E[Analysis Timepoint]
    E -->|Variables| F[AVISIT, ATPT, etc.]
    B -->|Additional Variables| G[BASETYPE, DTYPE, etc.]
    A -->|Derived From| H[SDTM Domains]
    H -->|Findings| I[LB, EG, etc.]
    H -->|Events| J[AE, DS, etc.]
    H -->|Interventions| K[EX, CM, SU, etc.]
    H -->|Special-purpose| L[DM, SV, etc.]
    A -->|Supports| M[Statistical Analysis]
    M -->|Includes| N[Observed Values]
    M -->|Includes| O[Derived Values]
    M -->|Includes| P[Imputed Values]
    A -->|Relationship With| Q[ADSL Dataset]
    Q -->|Provides| R[Subject-Level Variables]
    Q -.->|Not All Variables Included| A
    A -->|Flexibility| S[Add Rows/Columns]
    S -->|Traceability| T[Derived Columns/Rows]
    A -->|Optimization| U[Multiple BDS Datasets]
    U -->|Avoid Overloading| V[Single Dataset]

Explanation of the Diagram:

  1. BDS Dataset: The central entity containing records.
  2. Records: Each record is structured per subject, per analysis parameter, and, where the analysis requires it, per analysis timepoint.
  3. Analysis Timepoint: Conditionally required and represented by variables like AVISIT or ATPT.
  4. Additional Variables: Includes BASETYPE, DTYPE, and others to describe the structure and derivation of the data.
  5. Derived From: BDS datasets can be derived from multiple SDTM domains (Findings, Events, Interventions, Special-purpose).
  6. Supports Statistical Analysis: BDS datasets are designed to support analysis, including observed, derived, and imputed values.
  7. Relationship with ADSL: ADSL provides subject-level variables, but not all ADSL variables are included in BDS.
  8. Flexibility: BDS allows adding rows and columns for traceability and analysis support.
  9. Optimization: Multiple BDS datasets should be designed to avoid overloading a single dataset.

These diagrams can be rendered using any Mermaid-compatible tool (e.g., Mermaid Live Editor, Markdown viewers with Mermaid support).