Designing Population Health Studies

March 31 and April 2, 2025
Eric Delmelle

Chapter Overview

Research design fundamentals in epidemiology and public health
Measurement validity and reliability
Error and bias in population health research
Study design approaches:
- Cross-sectional
- Case-control
- Cohort
- Experimental
Qualitative methods and their integration with quantitative approaches
Ethical considerations in population health research

1 A Matter of Measurement

Primary vs. Secondary Data

Primary data:
- Data collected specifically for the purpose of the study.

Secondary data:
- Data collected for other purposes but reorganized and reanalyzed.

Examples of secondary data:
- Health insurance claims data
- Employment records
- National health surveys

2 Primary vs. Secondary Data

Primary data
- Collected specifically for a new study
- Controlled by the researcher
- Tailored to answer specific research questions
Secondary data
- Pre-existing data collected for another purpose
- Often large-scale and readily available
- May require cleaning or transformation

While primary data offers control and precision, secondary data can save time and resources.

3 Levels of Measurement

Level	Description	Example	Operations
Nominal	Categories with no ranking	Blood type, Sex	Equality/inequality
Ordinal	Ordered categories	Health self-rating: excellent, good, fair, poor	Greater than/less than
Interval	Equal units, no true zero	Temperature in °C	Addition/subtraction
Ratio	Equal units with true zero	Weight, Blood pressure	Multiplication/division

Understanding measurement levels is crucial for selecting appropriate statistical analyses. A variable can always be reduced to a lower level of measurement (continuous to categorical), but not elevated (categorical to continuous).
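As a minimal sketch of reducing a variable to a lower level of measurement, the snippet below collapses a ratio-level variable into ordinal categories. Body mass index and the function name `bmi_category` are illustrative choices, not taken from the slides; the cut-points are the standard WHO adult thresholds.

```python
def bmi_category(bmi: float) -> str:
    """Reduce a ratio-level BMI value (kg/m^2) to an ordinal category."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical BMI values, for illustration only.
bmi_values = [17.9, 22.4, 27.1, 31.6]
categories = [bmi_category(b) for b in bmi_values]
print(categories)
```

The reverse step is not possible: knowing only that someone is "overweight" does not recover the original BMI value.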

4 Ecological Studies and Fallacy

Unit of analysis: Group (e.g., city-level data)
Examples:
- Community fluoride levels and dental caries
- Countries’ smoking rates and lung cancer rates
Ecological fallacy: Attributing group-level associations to individuals
Example:
- Classrooms with more women had higher average grades
- But individual-level analysis showed men had higher grades in each classroom

Classroom A	Classroom B	Classroom C
F 70	F 65	F 65
F 70	F 70	F 70
F 70	F 70	F 80
F 75	F 75	F 80
F 70	F 80	M 70
F 80	F 85	M 75
F 80	M 80	M 75
F 80	M 80	M 80
M 95	M 85	M 85
M 100	M 90	M 90
Class means (F = women, M = men, FM = whole class):
F 74, M 98, FM 79	F 74, M 84, FM 78	F 74, M 79, FM 77
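The classroom table can be checked with a short script. This is a sketch that re-enters the grades from the table above and computes the group-level and individual-level summaries; the variable names are illustrative.

```python
from statistics import mean

# Grades transcribed from the classroom table: (sex, grade).
classrooms = {
    "A": [("F", 70), ("F", 70), ("F", 70), ("F", 75), ("F", 70),
          ("F", 80), ("F", 80), ("F", 80), ("M", 95), ("M", 100)],
    "B": [("F", 65), ("F", 70), ("F", 70), ("F", 75), ("F", 80),
          ("F", 85), ("M", 80), ("M", 80), ("M", 85), ("M", 90)],
    "C": [("F", 65), ("F", 70), ("F", 80), ("F", 80), ("M", 70),
          ("M", 75), ("M", 75), ("M", 80), ("M", 85), ("M", 90)],
}

summary = {}
for name, students in classrooms.items():
    women = [grade for sex, grade in students if sex == "F"]
    men = [grade for sex, grade in students if sex == "M"]
    summary[name] = {
        "n_women": len(women),
        "class_mean": mean(women + men),  # group-level measure
        "women_mean": mean(women),        # individual-level, by sex
        "men_mean": mean(men),
    }

for name, s in summary.items():
    print(f"Classroom {name}: {s['n_women']} women, "
          f"class mean {s['class_mean']:.1f}, "
          f"women {s['women_mean']:.1f}, men {s['men_mean']:.1f}")
```

Across classrooms, the class mean rises with the number of women (77, 78, 79), yet within every classroom the men's mean is higher than the women's. Inferring that individual women score higher from the group-level pattern would be the ecological fallacy.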

5 Variables and Levels of Measurement

Categorical variables:
- Dichotomous (e.g., male/female)
- Polytomous (e.g., blood type)
- Nominal (no implied order)
- Ordinal (ranked, e.g., “good” > “fair”)
Continuous variables:
- Interval scale (e.g., temperature in Celsius)
- Ratio scale (e.g., body weight, height)

Note: Continuous variables can be converted to categorical, but not vice versa.

6 Types of Research Design

Key concepts to distinguish studies:

Purpose: Descriptive vs. analytical
Investigator control: Observational vs. interventional
Directionality: Forward vs. backward
Sample selection: Based on exposure, disease, or neither
Timing: Prospective vs. retrospective

Study types:
- Cross-sectional
- Case-control
- Retrospective cohort
- Prospective cohort
- Randomized controlled trial (RCT)

7 Basic Terminology: Exposure and Disease

E = “Exposure”
- Risk factor (e.g., smoking, occupational hazard)
- Intervention (e.g., drug, prevention program)
D = “Disease” or outcome
- Disease, injury, death
- Any health-related outcome
E₀ and D₀ = Absence of exposure/disease
E₁ and D₁ = Presence of exposure/disease

8 Study Designs Summary

Study Type	Purpose	Control	Directionality	Sample Selection	Timing
Cross-sectional	Descriptive/Analytical	Observational	Concurrent	Representative sample	Retrospective
Case-control	Analytical	Observational	Backward	Based on disease	Retrospective
Retrospective cohort	Analytical	Observational	Forward	Based on exposure	Retrospective
Prospective cohort	Analytical	Observational	Forward	Based on exposure	Prospective
Randomized controlled trial	Analytical	Interventional	Forward	Based on exposure	Prospective

9 Cross-Sectional Studies

Also called prevalence studies
Exposure and outcome assessed simultaneously
Can be descriptive or analytical
Provides a “snapshot” of a population
Relatively quick and inexpensive

Limitations:

Cannot establish temporal sequence
Only includes survivors of disease
Not suitable for rare diseases

A snapshot in time: both exposure and outcome measured simultaneously

10 In-class exercise

11 Case-Control Analysis

Cannot directly compute relative risk
Use odds ratio (OR) as an estimate:

\[OR = \frac{ad}{bc}\]

In a 2×2 table:

\[ \begin{array}{|c|c|c|} \hline & D_1 & D_0 \\ \hline E_1 & a & b \\ \hline E_0 & c & d \\ \hline \end{array} \]

When disease is rare, OR ≈ RR

What is happening here?
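The odds ratio formula and the rare-disease approximation above can be illustrated with a small sketch. All counts are hypothetical and describe a full population in which both measures can be computed; in an actual case-control sample only the odds ratio is available.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = ad / bc for a 2x2 table with cells a, b (E1) and c, d (E0)."""
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a / (a + b)] / [c / (c + d)]; needs full-population counts."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical rare disease: 3% risk in exposed, 1% in unexposed.
rare = (30, 970, 10, 990)
# Hypothetical common disease: 60% risk in exposed, 30% in unexposed.
common = (600, 400, 300, 700)

or_rare, rr_rare = odds_ratio(*rare), relative_risk(*rare)
or_common, rr_common = odds_ratio(*common), relative_risk(*common)

print(f"Rare disease:   OR = {or_rare:.2f}, RR = {rr_rare:.2f}")
print(f"Common disease: OR = {or_common:.2f}, RR = {rr_common:.2f}")
```

With the rare outcome the two measures nearly coincide (about 3.06 versus 3.00), while with the common outcome the odds ratio (3.5) overstates the relative risk (2.0).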

12 Cohort Studies

Start with exposure status (E₁ and E₀)
Follow forward to observe outcome
Two types:
- Prospective: Start now, follow into future
- Retrospective: Look back at historical exposure
Can study multiple outcomes
Directly computes incidence and relative risk

Following groups forward from exposure to outcome

13 Cohort Analysis

Relative Risk (RR) quantifies association between exposure and outcome:

\[RR = \frac{a/(a+b)}{c/(c+d)}\]

In a 2×2 table:

\[ \begin{array}{|c|c|c|} \hline & D_1 & D_0 \\ \hline E_1 & a & b \\ \hline E_0 & c & d \\ \hline \end{array} \]

Allows for:
- Direct incidence calculation
- Assessment of multiple outcomes
- Use with rare exposures

Challenges:

Time-consuming and costly
Potential for loss to follow-up
Not ideal for rare diseases
Diagnostic changes over time may affect results
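The incidence and relative risk calculations from the 2×2 table in this section can be sketched as follows. The cohort counts are hypothetical, and the function name `cohort_measures` is an illustrative choice.

```python
def cohort_measures(a: int, b: int, c: int, d: int) -> dict:
    """Incidence in each exposure group and their ratio (relative risk).

    a, b = exposed with / without disease (E1 row)
    c, d = unexposed with / without disease (E0 row)
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return {
        "incidence_exposed": incidence_exposed,
        "incidence_unexposed": incidence_unexposed,
        "relative_risk": incidence_exposed / incidence_unexposed,
    }


# Hypothetical cohort: 200 exposed (40 cases), 400 unexposed (20 cases).
measures = cohort_measures(a=40, b=160, c=20, d=380)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Here the incidence is 0.20 among the exposed and 0.05 among the unexposed, giving a relative risk of 4: the exposed group developed the outcome at four times the rate of the unexposed group.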

14 Randomized Controlled Trials (RCTs)

Gold standard for assessing causal relationships
Participants randomly allocated to intervention (E₁) or control (E₀)
Minimizes confounding and bias through:
- Randomization
- Blinding (single, double)
Can be conducted at individual or group level

Phases of clinical trials:

Phase I: Safety and dosage (small sample)
Phase II: Effectiveness and side effects
Phase III: Confirm effectiveness, monitor adverse reactions (RCT)
Phase IV: Post-marketing surveillance

Randomization helps balance known and unknown confounders.
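A minimal sketch of simple random allocation follows, assuming 20 hypothetical participant IDs and equal-sized arms. Real trials typically use more elaborate schemes (for example blocked or stratified randomization) with concealed allocation; the function name `randomize` and the seed are illustrative.

```python
import random


def randomize(participant_ids, seed=2025):
    """Randomly split participants into intervention (E1) and control (E0)."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"E1": sorted(shuffled[:half]), "E0": sorted(shuffled[half:])}


allocation = randomize(range(1, 21))
print("Intervention (E1):", allocation["E1"])
print("Control (E0):     ", allocation["E0"])
```

Because assignment depends only on the random shuffle, neither the investigator nor any participant characteristic influences who ends up in which arm.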

15 Validity and Reliability

Measurement Validity

Face validity: Appears reasonable on the surface
Content validity: Covers full scope of concept
Construct validity: Reflects theoretical concept
Criterion validity: Correlates with external standard
Concurrent validity: Correlates with present condition
Predictive validity: Forecasts future outcome

Study Validity

Internal validity: Results are correct for the study sample itself (not distorted by bias or confounding)
External validity: Results generalize to other populations

Reliability

Test-retest: Same test, different times
Inter-observer: Different observers agree
Intra-observer: Same observer consistent over time

16 Reliability vs. Validity

Reliability = consistency of measurement
Validity = accuracy (the tool measures what it is intended to measure)
A tool can be reliable but not valid
A tool cannot be valid if it is not reliable

Target analogy for validity and reliability

17 Types of Error

Random Error

Caused by chance or sampling variation
Affects precision
Can be reduced by increasing sample size
Produces unpredictable fluctuations

Systematic Error (Bias)

Consistent, repeatable error due to flaws in design or measurement
Affects validity
Not reduced by increasing sample size
Must be addressed in study design
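The contrast between the two error types can be illustrated with a small simulation. This is a sketch under stated assumptions: a hypothetical true mean systolic blood pressure of 120 mmHg, a standard deviation of 15, and a miscalibrated instrument that reads 5 mmHg too high. None of these numbers come from the slides.

```python
import random
from statistics import mean

TRUE_MEAN = 120.0  # hypothetical true population mean (mmHg)
SD = 15.0          # hypothetical between-person standard deviation
BIAS = 5.0         # hypothetical instrument error: reads 5 mmHg too high


def sample_mean(n, bias, rng):
    """Mean of n simulated measurements, each shifted by a constant bias."""
    return mean(rng.gauss(TRUE_MEAN, SD) + bias for _ in range(n))


rng = random.Random(42)  # fixed seed for reproducibility
results = {}
for n in (25, 400, 10_000):
    unbiased = sample_mean(n, 0.0, rng)   # random error only
    biased = sample_mean(n, BIAS, rng)    # random + systematic error
    results[n] = (unbiased, biased)
    print(f"n={n:>6}: unbiased error {unbiased - TRUE_MEAN:+.2f}, "
          f"biased error {biased - TRUE_MEAN:+.2f}")
```

As the sample grows, the unbiased estimate settles close to the true mean, while the biased estimate settles close to the true mean plus 5: a larger sample improves precision but leaves the systematic error untouched.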

Selection bias: Systematic differences between those selected and not selected
Examples:
- Low response rate
- Healthy worker effect
- Volunteer bias
- Berkson’s bias (hospital sampling)
- Loss to follow-up
- Survivor bias

Information bias: Measurement error or misclassification
Examples:
- Recall bias
- Observer/interviewer bias
- Social desirability bias
- Instrument bias
- Diagnostic suspicion bias

Confounding: Third variable distorts exposure-outcome relationship
A confounder must be:
1. Associated with exposure
2. Risk factor for the outcome
3. Not an intermediate step in the causal path

19 Controlling for Confounding

At Design Stage

Randomization
- Evenly distributes confounders across groups
Restriction
- Limit study to specific subgroup
Matching
- Pair participants with similar characteristics

At Analysis Stage

Stratification
- Analyze within homogeneous strata
- Mantel-Haenszel summary estimate
Multivariable Modeling
- Include confounders as covariates
- Logistic regression, Cox models
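Stratification with a Mantel-Haenszel summary estimate can be sketched as below, using the standard Mantel-Haenszel odds ratio, the sum of a·d/n over strata divided by the sum of b·c/n. The two strata (say, younger and older participants) and all counts are hypothetical, chosen so that the stratifying variable is related to both exposure and disease.

```python
def odds_ratio(a, b, c, d):
    """OR = ad / bc for one 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across stratum-specific 2x2 tables.

    Each stratum is a tuple (a, b, c, d) laid out as in the 2x2 table:
    a, b = exposed with / without disease; c, d = unexposed with / without.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a confounder (e.g., younger vs. older).
strata = [
    (10, 40, 20, 80),  # stratum 1: low disease risk, mostly unexposed
    (80, 20, 40, 10),  # stratum 2: high disease risk, mostly exposed
]

# Collapsing the strata gives the crude (unadjusted) table.
crude_table = tuple(sum(cells) for cells in zip(*strata))
crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*s) for s in strata]
mh_or = mantel_haenszel_or(strata)

print("Crude OR:           ", round(crude_or, 2))
print("Stratum-specific OR:", [round(x, 2) for x in stratum_ors])
print("Mantel-Haenszel OR: ", round(mh_or, 2))
```

In this constructed example the crude odds ratio is 2.25, but within each stratum the odds ratio is 1, and the Mantel-Haenszel summary is also 1: the crude association is produced entirely by the confounder.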

20 Effect Modification (Interaction)

Occurs when the effect of exposure differs across levels of a third variable
Not the same as confounding
Can be additive or multiplicative
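Additive Model:
\[RR_{EM} = RR_E + RR_M - 1\]
Multiplicative Model:
\[RR_{EM} = RR_E \times RR_M\]
Here RR_E and RR_M are the relative risks for the exposure alone and the modifier alone, and RR_EM is the relative risk when both are present. Each formula gives the joint relative risk expected if there is no interaction on that scale.
Interpretation:
- If observed RR_EM > expected → synergism
- If observed RR_EM < expected → antagonism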

Example: Asbestos and smoking on lung cancer risk

Interaction models
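The additive and multiplicative models can be compared with a short sketch. The relative risks below are round illustrative numbers, not values from the slides: 5 for the exposure alone, 10 for the modifier alone, and 50 for both together.

```python
def expected_joint_rr(rr_e: float, rr_m: float) -> dict:
    """Expected joint relative risk under each no-interaction model."""
    return {
        "additive": rr_e + rr_m - 1,
        "multiplicative": rr_e * rr_m,
    }


def classify(observed: float, expected: float) -> str:
    """Compare the observed joint RR with the model's expectation."""
    if observed > expected:
        return "synergism"
    if observed < expected:
        return "antagonism"
    return "no interaction"


# Illustrative values only (hypothetical).
rr_exposure, rr_modifier, rr_observed = 5.0, 10.0, 50.0

expected = expected_joint_rr(rr_exposure, rr_modifier)
verdicts = {model: classify(rr_observed, value)
            for model, value in expected.items()}

for model in expected:
    print(f"{model}: expected {expected[model]:.0f}, "
          f"observed {rr_observed:.0f} -> {verdicts[model]}")
```

With these numbers the observed joint relative risk (50) far exceeds the additive expectation (14) but exactly matches the multiplicative expectation (50), so whether an interaction is reported depends on the scale chosen.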

21 Qualitative Methods

Originates from the social sciences
Explores perceptions, beliefs, experiences
Often used to:
- Understand lived experiences
- Explore context and meaning
- Inform survey and tool development
Common techniques:
- In-depth interviews
- Focus groups
- Participant observation
- Document analysis

Mixed methods combine qualitative insight with quantitative data.

22 Types of Qualitative Methods

Observation
- Participant observation: Researcher immersed in environment
- Captures real behavior and interactions
- Varies in degree of participation

Interviews
- Structured: Same questions for all
- Semi-structured: Flexible, guided
- In-depth: Deep exploration
- Focus groups: Group discussion dynamics

Document Analysis
- Systematic review of written or visual materials
- Examples: public records, policy reports, personal journals, media, videos

23 Integrating Qualitative & Quantitative Methods

Ways to integrate qualitative methods:

Pre-study: To develop hypotheses or instruments
During study: To explain unexpected results
Post-study: To interpret and validate findings
Standalone: As an alternative or complement to quantitative

Example: Regional Health Needs Assessment

Quantitative: Epidemiologic data, service access
Qualitative: Focus groups with residents, interviews with key informants

Combining methods gives depth and context.

24 Ethical Considerations

Four Ethical Principles:

Autonomy – Respect for individuals
Beneficence – Maximize benefits
Non-maleficence – Do no harm
Justice – Fair treatment and burden distribution

Research Protections:

Informed consent
Confidentiality
Data security
Vulnerable populations
Institutional Review Boards (IRBs)
Research integrity and transparency

25 Key Takeaways

Choose the study design that best answers your research question
Recognize and control for bias and confounding
Understand the role of validity and reliability
Combine methods when appropriate
Always prioritize ethics and participant rights

“No method is inherently superior—it all depends on the research question.” – T. Kue Young