You are the best statistician you know - Brian Healy
Population vs. Sample
Sample: a random, representative subset of the population
- Why sample: we can't measure the entire population
- Sources of error: chance and bias, e.g. sampling only MA vs. the entire country, or only super fans vs. the entire population
- Sampling variability: variability from sample to sample
- CI (confidence interval): incorporates the uncertainty in the estimated mean
- Goal: use the sample to make an inference about the population
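Sampling variability and the confidence interval can be seen in a small simulation. This is only a sketch under stated assumptions: the population (100,000 normally distributed ages), the sample size of 200, and the helper `sample_mean_with_ci` are all made up for illustration, and the interval uses the normal approximation (z = 1.96).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 ages (an assumption, not real data).
population = [random.gauss(40, 12) for _ in range(100_000)]
population_mean = statistics.fmean(population)

def sample_mean_with_ci(pop, n, z=1.96):
    """Draw a simple random sample of size n; return its mean and an approximate 95% CI."""
    sample = random.sample(pop, n)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return mean, (mean - z * se, mean + z * se)

# Sampling variability: each new sample gives a slightly different mean.
results = [sample_mean_with_ci(population, 200) for _ in range(1000)]
means = [m for m, _ in results]

# Share of intervals that contain the true population mean.
coverage = sum(lo <= population_mean <= hi for _, (lo, hi) in results) / len(results)

print(f"population mean: {population_mean:.2f}")
print(f"sample means range from {min(means):.2f} to {max(means):.2f}")
print(f"share of intervals covering the population mean: {coverage:.3f}")
```

The sample means scatter around the population mean, and most of the intervals contain it, which is what the CI is meant to express.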
Description vs. Inference
Description: describe the data that has been collected
Inference: use the sample to draw conclusions about the population
Variable
A variable is something that is measured on every person in our sample
Examples:
Continuous variables: Age
Categorical:
Time to Event: [[Survival]] Time
Ways to express data
Distribution:
Numerical statistics: Describe data
Graphics: Display data
Why we check the data distribution: to ensure data quality! Example: height recorded in meters for some people and in feet for others
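A quick look at the range of a variable is often enough to catch a units mix-up like the height example. The sketch below assumes made-up heights and a hypothetical plausible range for adult height in meters; `flag_implausible` is an illustrative helper, not a standard function.

```python
import statistics

# Made-up heights: most are in meters, but two were entered in feet.
heights = [1.62, 1.75, 1.80, 5.6, 1.68, 5.9, 1.71, 1.66]

# Assumed plausible range for adult height in meters (hypothetical cutoffs).
PLAUSIBLE_M = (1.2, 2.3)

def flag_implausible(values, low, high):
    """Return (index, value) pairs that fall outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]

suspects = flag_implausible(heights, *PLAUSIBLE_M)

print("min / median / max:", min(heights), statistics.median(heights), max(heights))
print("values to review:", suspects)
```

Flagged values should be reviewed rather than silently converted, since the cutoffs are only a guess at what is plausible.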
Mean: arithmetic mean (e.g. age), geometric mean (e.g. PK concentration)
\[ \bar{x} = \frac{\sum_{i=1}^n x_i}{n} \]
\[ \bar{x}_{\text{geo}} = \sqrt[n]{x_1 x_2 \cdots x_n} \]
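The two means can be compared on a small example. This is a minimal sketch with made-up concentration values; the geometric mean is computed as the exponential of the mean log, which equals the n-th root of the product and requires all values to be positive.

```python
import math

def arithmetic_mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product, computed via logs for numerical stability.

    Only defined here for strictly positive values.
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Made-up, right-skewed concentrations (illustrative only).
concentrations = [1.0, 10.0, 100.0]

print("arithmetic mean:", arithmetic_mean(concentrations))
print("geometric mean:", geometric_mean(concentrations))
```

For skewed, positive data such as concentrations, the geometric mean is pulled less toward the largest values than the arithmetic mean.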
Median: the middle number, i.e. the 50th percentile
Interquartile range: from the 25th to the 75th percentile
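The median and interquartile range can be computed with Python's standard `statistics` module. The ages below are made up, and the quartiles use the module's default method; other software may use a slightly different quantile convention and give slightly different cut points.

```python
import statistics

# Made-up ages (illustrative only).
ages = [23, 25, 31, 35, 42, 47, 51, 58, 64]

med = statistics.median(ages)

# quantiles(..., n=4) returns the three cut points: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

print("median:", med)
print("25th and 75th percentiles:", q1, q3)
print("interquartile range (width):", iqr)
```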
Comparison
Basic Analysis Lookup Table
| Outcome | Predictor variable | Analysis |
|---|---|---|
| Continuous | Binary | T-test, Wilcoxon |
| Continuous | Continuous | Correlation, Linear Regression |
| Binary | Binary | Chi-square, Fisher exact, Logistic Regression |
| Binary | Continuous | Logistic Regression |
| Time to Event | Binary | Log-rank test |
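
As one worked instance of the lookup table, the case of a continuous outcome compared across a binary variable can be sketched with a two-sample t statistic. The version below is Welch's t-test (which does not assume equal variances), chosen here as an assumption rather than taken from the lecture. The data and the helper `welch_t` are made up for illustration. Only the statistic and approximate degrees of freedom are computed; a p-value would normally come from a statistics library, for example `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean.
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up continuous outcome in two groups (binary variable: treated vs. control).
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.5, 4.8, 4.1, 4.4]

t_stat, df = welch_t(treated, control)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```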
Thanks to 77 for sharing Brian Healy's lecture notes