What is Statistical Inference?

  • We use samples to learn about populations
  • Main tools:
    • Point estimation
    • Confidence intervals
    • Hypothesis testing
  • Focus: hypothesis testing and p-values

Hypothesis Testing (Math)

\[ H_0: \text{Null hypothesis (no effect)} \]

\[ H_1: \text{Alternative hypothesis (effect exists)} \]

Decision rule:

  • If p-value < α → Reject \(H_0\)
  • If p-value ≥ α → Fail to reject \(H_0\)

Significance level:

\[ \alpha = 0.05 \]

p-value (Math)

\[ p = P(\text{Data at least as extreme as observed} \mid H_0 \text{ is true}) \]

  • Small p-value → strong evidence against \(H_0\)
  • Large p-value → weak evidence against \(H_0\)
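
The definition above can be made concrete for a two-sided t-test. A minimal sketch, assuming the test statistic follows a t distribution under \(H_0\); the function name two_sided_p and the values 2.5 and 20 are hypothetical, chosen only for illustration:

```r
# Two-sided p-value for a t statistic: P(|T| >= |t_obs|) when H0 is true.
# two_sided_p is a hypothetical helper, not part of base R.
two_sided_p <- function(t_obs, df) {
  2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
}

alpha <- 0.05
p <- two_sided_p(t_obs = 2.5, df = 20)  # made-up statistic and degrees of freedom

# Apply the decision rule: reject H0 only when p < alpha
decision <- if (p < alpha) "Reject H0" else "Fail to reject H0"
print(p)
print(decision)
```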

Example: Does Weight Affect MPG?

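  • Hypotheses: \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\)
  • Estimated slope = -5.344 (mpg per 1000 lbs)
  • p-value = \(1.29 \times 10^{-10}\)

Since p-value < 0.05 → reject \(H_0\): weight is a statistically significant predictor of MPG.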
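
The slope and p-value can be reproduced with a simple linear regression. A sketch, assuming the model is mpg regressed on wt using the built-in mtcars data (the same variables as in the plot code); the object names fit, coefs, slope and p_value are illustrative:

```r
# Simple linear regression of mpg on weight (built-in mtcars data)
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients
slope <- coefs["wt", "Estimate"]
p_value <- coefs["wt", "Pr(>|t|)"]

print(round(slope, 3))
print(signif(p_value, 3))

# Decision at the 5% level: TRUE means reject H0 (slope = 0)
print(p_value < 0.05)
```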

ggplot #1 — Scatter + Regression (Code shown)

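library(ggplot2)

# Scatter plot of mpg against weight with a fitted least-squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = TRUE) +  # se = TRUE adds the confidence band
  labs(title = "MPG vs Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon")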

ggplot #2 — Residual Diagnostics
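
One common diagnostic is a residuals-vs-fitted plot. The sketch below is an assumption about what such a plot could look like for the mpg ~ wt model, not necessarily the plot originally shown here; the names fit, diag_df and p_resid are illustrative:

```r
library(ggplot2)

# Fit the same simple regression used in the example
fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals in a data frame for plotting
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs fitted: points should scatter around 0 with no clear pattern
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point(size = 2) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(title = "Residuals vs Fitted",
       x = "Fitted MPG",
       y = "Residual")
p_resid
```

A visible curve in the smoothed line would suggest that a straight-line fit misses some structure in the data.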

Confidence Interval (Math)

\[ \hat{\beta}_1 \pm t^* \cdot SE(\hat{\beta}_1) \]

where \(t^*\) is the critical value of the t distribution with \(n - 2\) degrees of freedom.

95% CI for slope:

(-6.486, -4.203)
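
The interval can be computed by hand from the formula and checked against confint(). A sketch, assuming the mpg ~ wt model on mtcars; the object names are illustrative:

```r
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- summary(fit)$coefficients

est <- coefs["wt", "Estimate"]    # slope estimate
se  <- coefs["wt", "Std. Error"]  # its standard error

# Critical value: 97.5% quantile of t with n - 2 residual degrees of freedom
t_star <- qt(0.975, df = df.residual(fit))

# Manual 95% CI: estimate plus/minus t* times SE
ci_manual <- c(est - t_star * se, est + t_star * se)

# Built-in equivalent
ci_builtin <- confint(fit, "wt", level = 0.95)

print(round(ci_manual, 3))
print(round(ci_builtin, 3))
```

The interval excludes 0, which agrees with rejecting \(H_0\) at the 5% level.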

Plotly Interactive 3D
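
A minimal sketch of an interactive 3D scatter plot with plotly. The choice of hp (horsepower) as the third axis is an assumption made for illustration; the variables used in the original figure are not stated here:

```r
library(plotly)

# 3D scatter: weight and horsepower against mpg (hp is an assumed third variable)
fig <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
               type = "scatter3d", mode = "markers",
               marker = list(size = 4))

# Label the three axes of the 3D scene
fig <- layout(fig, scene = list(
  xaxis = list(title = "Weight (1000 lbs)"),
  yaxis = list(title = "Horsepower"),
  zaxis = list(title = "Miles per gallon")
))
fig
```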

Final Takeaways

  • Hypothesis testing evaluates claims about parameters
  • p-value measures evidence against \(H_0\)
  • ggplot shows fit & diagnostics
  • plotly provides interactive 3D visualization