1 M1: Study design

1.1 A study is only as good as the study design

You are the best statistician you know - Brian Healy

  1. Study design
    • Experimental question: define the outcome, data sources, and analysis plan
      • What are we trying to learn? PFS (progression-free survival), OS (overall survival), relapse
      • How will we demonstrate it?
    • Sample population: sample size, type of sample
      • Who are we going to study? Potential question: do results in females apply to males, or results in adults to children?
  2. Data collection
    • What kind of data (demographics [DM], adverse events [AE], lab)? Potential question: lab batch or regional effects
  3. Analysis of data
    • Result: was it effective? (test against a null hypothesis)
    • Conclusion: to whom does it apply? (significance of the effect, generalizability)
      • One-sample tests
      • t-test, Wilcoxon test
      • ANOVA
      • Linear regression
      • Log-rank test
      • CMH (Cochran-Mantel-Haenszel) test
      • Chi-square test
      • Logistic regression
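As a minimal sketch of the t-test from the list above, the Python below computes Welch's two-sample t statistic (the version that does not assume equal variances) with only the standard library. The function name `welch_t` and the example data are illustrative assumptions, not part of the lecture.

```python
import math
import statistics


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and approximate degrees of freedom.

    Null hypothesis: the two population means are equal
    (the two groups may have different variances).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical outcomes in two treatment arms
t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print(f"t = {t:.2f}, df = {df:.1f}")  # prints "t = -1.00, df = 8.0"
```

Turning t into a p-value needs the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.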

Population vs. Sample

Sample: a random, representative subset of the population

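  • Why sample: we can't measure the entire population.
  • Chance and bias: a sample can differ from the population by chance or systematically (bias), e.g. sampling only Massachusetts (MA) instead of the entire country, or only super fans instead of the entire population.
  • Sampling variability: the estimate varies from sample to sample.
  • CI (confidence interval): incorporates the uncertainty in the estimated mean.

Goal: use the sample to make an inference about the population.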
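Sampling variability and the CI can be seen in a small simulation. This is a sketch under stated assumptions: a made-up normal "population" of ages, samples of 100, and an approximate 95% CI using the normal value 1.96 (a t-based interval would be slightly wider for small samples). The helper `mean_with_ci` is hypothetical.

```python
import math
import random
import statistics


def mean_with_ci(sample: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Sample mean with an approximate 95% CI (normal approximation)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    return m, m - z * se, m + z * se


rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical population: 100,000 ages drawn from a normal distribution
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Each random sample gives a slightly different estimate and interval
for _ in range(5):
    sample = rng.sample(population, 100)
    m, lo, hi = mean_with_ci(sample)
    print(f"mean={m:.2f}  95% CI=({lo:.2f}, {hi:.2f})  covers true mean: {lo <= true_mean <= hi}")
```

Repeating this many times, roughly 95% of the intervals contain the population mean; that long-run coverage is what "95% confidence" refers to.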

Description vs. Inference

  1. Describe the data that has been collected

  2. Make statistical inferences: use the sample to draw conclusions about the population

Variable

A variable is something measured on every person in our sample.

Examples:

Continuous variables: Age

Categorical:

  • Binary: sex, event (yes/no)
  • Nominal (unordered categories): race
  • Ordinal (ordered categories): disease grade; mild/moderate/severe; PD-L1 expression level

Time to Event: [[Survival]] Time
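A time-to-event variable has two parts: the follow-up time and whether the event happened (1) or the subject was censored (0). As an illustrative sketch assuming that (time, event) encoding, the Kaplan-Meier estimator below shows how both parts are used: censored subjects count as at risk up to their censoring time and then drop out. The function name and data are hypothetical.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate S(t) at each distinct event time.

    events[i] is 1 if subject i had the event at times[i], 0 if censored.
    """
    data = list(zip(times, events))
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in data if e == 1}):
        n_at_risk = sum(1 for tt, _ in data if tt >= t)  # still under follow-up at t
        n_events = sum(1 for tt, e in data if tt == t and e == 1)
        survival *= 1 - n_events / n_at_risk  # conditional probability of surviving t
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months; one subject at month 3 and the one at month 8 are censored
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
print(curve)
```

Simply dropping censored subjects, or treating them as events, would bias the estimate; the at-risk count is what lets censored follow-up contribute.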

Ways to express data

Distribution:

  • Numerical statistics: describe the data

    • Summary statistics: location (mean, median) and quantiles (min, Q1, Q3, max)
    • Variability: standard deviation (SD), variance
    • Proportions (for categorical variables)
  • Graphics: display the data

    • Scatter plot
    • Bar plot/box plot
    • Histogram: a continuous variable broken into bins; shows symmetry or skewness
    • KM (Kaplan-Meier) curve
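The numerical summaries above can be computed with Python's standard `statistics` module. A minimal sketch with hypothetical ages and event indicators; note that `statistics.stdev` and `statistics.variance` use the sample (n - 1) denominator, while `pstdev` and `pvariance` use n.

```python
import statistics

# Hypothetical ages (a continuous variable)
ages = [34, 45, 29, 61, 52, 47, 38, 70, 41, 55]
# Hypothetical binary event indicators (1 = event, 0 = no event)
events = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.fmean(ages),
    "median": statistics.median(ages),
    "sd": statistics.stdev(ages),            # sample SD, n - 1 denominator
    "variance": statistics.variance(ages),   # square of the sample SD
    "event proportion": sum(events) / len(events),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```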

Why check the data distribution? To ensure data quality. For example, heights recorded partly in meters and partly in feet show up as a separate cluster of implausible values.
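A simple range check is one way to catch the meters-vs-feet problem before summarizing. In this sketch the heights and the plausible range of 1.0 to 2.5 m are illustrative assumptions, not a clinical standard.

```python
# Hypothetical heights meant to be in meters; two look like they were entered in feet
heights = [1.62, 1.75, 1.80, 5.9, 1.68, 1.71, 6.1, 1.55]

# Assumed plausible adult range in meters (an illustrative cutoff, not a standard)
LOW, HIGH = 1.0, 2.5

# Keep the index so each suspect value can be queried with the data source
suspect = [(i, h) for i, h in enumerate(heights) if not LOW <= h <= HIGH]
print(suspect)  # [(3, 5.9), (6, 6.1)]
```

Flagged values are better queried than silently converted, since the right fix (unit error vs. typo) depends on the source record.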

Mean: arithmetic mean (e.g. age) or geometric mean (e.g. PK concentrations) \[ \bar x =\frac{\sum_{i=1}^n x_i}{n} \] \[ \bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = \left(\prod_{i=1}^n x_i\right)^{1/n} \]
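The geometric mean is the n-th root of the product of the n values, equivalently the exponential of the mean of the logs. It is pulled less by a single large value, which suits strictly positive, right-skewed data such as PK concentrations. A small sketch with made-up concentrations:

```python
import math
import statistics

# Hypothetical PK concentrations: positive and right-skewed (one large value)
conc = [1.2, 2.5, 3.1, 4.8, 25.0]

arithmetic = statistics.fmean(conc)          # sum / n
geometric = statistics.geometric_mean(conc)  # n-th root of the product (Python 3.8+)

# Equivalent route: average on the log scale, then back-transform
geometric_via_logs = math.exp(statistics.fmean([math.log(x) for x in conc]))

print(f"arithmetic = {arithmetic:.2f}, geometric = {geometric:.2f}")
```

For positive data the geometric mean never exceeds the arithmetic mean (the AM-GM inequality); the gap grows with skewness.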

Median: the middle value, i.e. the 50th percentile. Interquartile range (IQR): Q3 - Q1, the spread between the 25th and 75th percentiles.
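The median and IQR can be computed with `statistics.median` and `statistics.quantiles`. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: software packages use different interpolation rules, so quartiles of small samples can differ slightly between tools.

```python
import statistics

# Hypothetical measurements (already sorted for readability)
values = [2, 4, 4, 5, 7, 9, 10, 12, 15, 21]

median = statistics.median(values)  # mean of the two middle values when n is even

# quantiles(n=4) returns the three cut points Q1, Q2, Q3.
# method="inclusive" is one interpolation convention; the default
# "exclusive" method gives somewhat different quartiles for small samples.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
print(median, q1, q3, iqr)  # 8.0 4.25 11.5 7.25
```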

Comparison

  • Categorical variable: Contingency Table
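A contingency table just counts each combination of two categorical variables. The sketch below builds a 2x2 table from hypothetical (arm, response) pairs and computes the Pearson chi-square statistic, which compares observed counts with the counts expected if the two variables were independent. The helper name `chi_square_stat` is hypothetical.

```python
from collections import Counter

# Hypothetical paired observations: (treatment arm, responded?)
records = [
    ("drug", "yes"), ("drug", "yes"), ("drug", "no"), ("drug", "yes"),
    ("placebo", "no"), ("placebo", "yes"), ("placebo", "no"), ("placebo", "no"),
]

counts = Counter(records)
rows = sorted({arm for arm, _ in records})
cols = sorted({resp for _, resp in records})
table = [[counts[(r, c)] for c in cols] for r in rows]
print(rows, cols, table)  # ['drug', 'placebo'] ['no', 'yes'] [[1, 3], [3, 1]]


def chi_square_stat(table: list[list[int]]) -> float:
    """Pearson chi-square statistic (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # under independence
            stat += (observed - expected) ** 2 / expected
    return stat


print(chi_square_stat(table))  # 2.0
```

Every expected count here is 2, below the usual rule of thumb of 5, so for a table this small Fisher's exact test (e.g. `scipy.stats.fisher_exact`, if SciPy is available) would normally be preferred.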
  • Continuous variable: correlation. Correlation does not equal causation, and what counts as a strong correlation depends on the field (e.g. about 0.1 in social science, 0.5 in clinical trials, > 0.9 in physics).
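The Pearson correlation coefficient r is the sum of cross-products of deviations divided by the square root of the product of the sums of squares, giving a value between -1 and 1. A minimal sketch on hypothetical paired measurements (`pearson_r` is an illustrative name; Python 3.10+ also ships `statistics.correlation`):

```python
import math
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3))  # 0.775
```

r only captures linear association; a strong r can still come from confounding, which is why correlation alone does not establish causation.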

Basic Analysis Lookup Table

| Outcome | Predictor variable | Analysis |
|---|---|---|
| Continuous | Binary | t-test, Wilcoxon |
| Continuous | Continuous | Correlation, linear regression |
| Binary | Binary | Chi-square, Fisher exact, logistic regression |
| Binary | Continuous | Logistic regression |
| Time to event | Binary | Log-rank test |
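For the time-to-event row, the log-rank test compares observed and expected events in one group at each event time, pooled over time. This is a minimal two-group sketch, assuming events are coded 1 = event, 0 = censored and groups are coded 0/1; the function name and data are hypothetical, and a maintained implementation (e.g. `logrank_test` in the lifelines package) is preferable for real analyses.

```python
import math


def log_rank(times: list[float], events: list[int], groups: list[int]) -> tuple[float, float]:
    """Two-group log-rank test: returns (chi-square statistic, p-value), 1 df.

    events[i]: 1 = event, 0 = censored; groups[i]: 0 or 1.
    """
    subjects = list(zip(times, events, groups))
    observed_1 = expected_1 = variance = 0.0
    for t in sorted({tt for tt, e, _ in subjects if e == 1}):
        at_risk = [(tt, e, g) for tt, e, g in subjects if tt >= t]
        n = len(at_risk)
        n_1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_1 = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 1)
        observed_1 += d_1
        expected_1 += d * n_1 / n  # expected group-1 events under H0
        if n > 1:
            # hypergeometric variance of d_1 at this event time
            variance += d * (n_1 / n) * (1 - n_1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; are both groups represented?")
    stat = (observed_1 - expected_1) ** 2 / variance
    # upper tail of chi-square with 1 df equals P(|Z| > sqrt(stat))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical data: group 0 has events at months 1-3, group 1 at months 4-6
stat, p = log_rank([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```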

2 References

Thanks to 77 for sharing Brian Healy's lecture notes.