Pitfalls of Hypothesis Testing

Regression - MSSC 5780

Navid Mohseni

Introduction

Hypothesis Testing in Regression

Hypothesis testing is a fundamental concept in statistics that allows us to draw conclusions about populations based on sample data.

Pitfalls

  • Statistical Significance vs. Practical Significance

    Just because a predictor is statistically significant does not mean it has a meaningful or practical impact on the response variable.

  • P-values Depend on Sample Size

    With a large enough sample size, even tiny effects can be detected, leading to statistically significant p-values. Conversely, with small samples, you might not find significant effects even if they exist.

  • Omitted Variable Bias

    If a predictor that affects the response and is correlated with the included predictors is left out of the model, the coefficients of the included predictors can be biased, and their tests can be misleading.

  • Multiple Comparisons Problem

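    If you test many hypotheses simultaneously, the chance of at least one significant result arising purely by chance increases. For \(m\) independent tests at level \(\alpha\), this probability is \(1 - (1 - \alpha)^m\), about 0.64 for \(m = 20\) and \(\alpha = 0.05\).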

  • Multicollinearity

    When predictors are highly correlated, the variances of their coefficient estimates are inflated, making the estimates unstable and hard to interpret. It can also make predictors that genuinely affect the response appear non-significant in their individual t-tests, even when the overall F-test is significant.
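The three model-based pitfalls above can be reproduced with a small base-R simulation. This is a sketch under hypothetical settings: the sample size, the correlation structure, and every coefficient are illustrative choices, not values from the slides. Bonferroni correction via `p.adjust` is one standard remedy for multiple comparisons. The variance inflation factor is computed by hand as \(1/(1 - R_j^2)\) so that no extra package is needed.

```r
# Sketch: simulate three pitfalls with base R (all settings are hypothetical).
set.seed(1)
n <- 5000

## 1. Omitted variable bias ---------------------------------------------
z <- rnorm(n)
x <- 0.7 * z + rnorm(n)            # x is correlated with z
y <- 1 + 1 * x + 2 * z + rnorm(n)  # true coefficient on x is 1
b_full    <- unname(coef(lm(y ~ x + z))["x"])  # close to 1
b_omitted <- unname(coef(lm(y ~ x))["x"])      # biased: absorbs part of z's effect
print(c(full = b_full, omitted = b_omitted))

## 2. Multiple comparisons ----------------------------------------------
m  <- 20
X  <- matrix(rnorm(n * m), nrow = n, ncol = m)  # 20 pure-noise predictors
y0 <- rnorm(n)                                  # y0 is unrelated to every column of X
p_raw  <- summary(lm(y0 ~ X))$coefficients[-1, "Pr(>|t|)"]  # drop the intercept row
p_bonf <- p.adjust(p_raw, method = "bonferroni")
fwer   <- 1 - (1 - 0.05)^m  # P(at least one false positive) for 20 independent tests
print(c(raw_hits = sum(p_raw < 0.05), bonf_hits = sum(p_bonf < 0.05), fwer = fwer))

## 3. Multicollinearity -------------------------------------------------
u  <- rnorm(n)
x1 <- u + rnorm(n, sd = 0.05)      # x1 and x2 are nearly identical
x2 <- u + rnorm(n, sd = 0.05)
y3 <- 1 + 0.2 * x1 + 0.2 * x2 + rnorm(n)
se_both <- summary(lm(y3 ~ x1 + x2))$coefficients["x1", "Std. Error"]
se_one  <- summary(lm(y3 ~ x1))$coefficients["x1", "Std. Error"]
vif_x1  <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)  # variance inflation factor of x1
print(c(se_with_x2 = se_both, se_alone = se_one, vif = vif_x1))
```

In the third part, putting both near-duplicate predictors in the model inflates the standard error of `x1` by roughly \(\sqrt{\text{VIF}}\). That inflation is what can push a real effect below the significance threshold.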

Problem

Imagine a weight loss drug trial in which participants are split into two groups: one receives the drug and the other receives a placebo. After a month, the drug group has lost an average of 0.5 pounds more than the placebo group. If the sample size is enormous, this difference may be statistically significant (p < 0.05). From a practical standpoint, however, a half-pound difference may not be meaningful, or worth the potential side effects and cost of the drug.
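The drug-trial scenario can be mimicked with simulated data. This sketch assumes a hypothetical standard deviation of 5 lb for one-month weight change in both groups, with values recorded as pounds lost. It uses R's `t.test`, which runs a Welch two-sample test by default. The same 0.5 lb effect is tested once with 20 participants per group and once with 20,000.

```r
# Sketch of the drug-trial example; the 5 lb SD is a hypothetical assumption.
set.seed(42)

simulate_trial <- function(n_per_group, true_diff = 0.5, sd = 5) {
  placebo <- rnorm(n_per_group, mean = 0, sd = sd)          # pounds lost on placebo
  drug    <- rnorm(n_per_group, mean = true_diff, sd = sd)  # drug group loses 0.5 lb more on average
  t.test(drug, placebo)  # Welch two-sample t-test (R's default)
}

small <- simulate_trial(20)     # 20 participants per group: usually not significant
large <- simulate_trial(20000)  # 20,000 participants per group

print(c(p_small = small$p.value, p_large = large$p.value))
print(large$conf.int)  # 95% CI for drug minus placebo: precise, but still about half a pound
```

The large trial yields a tiny p-value, but its confidence interval shows the effect is still only about half a pound. Judging whether that matters is a practical question, not a statistical one.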

Large Sample Size

  1. Large Sample Size and t-tests: In regression, the t-test checks whether a particular coefficient differs from zero, given the other predictors in the model. Its standard error shrinks as the sample size grows.

  2. Large Sample Size and F-tests: The overall F-test in regression checks whether at least one of the predictors has a non-zero coefficient.

  3. As long as \(\beta_j \neq 0\), the test of \(H_0: \beta_j = 0\) will almost surely be rejected once the sample size is large enough, even if \(x_j\) has a very small effect on \(y\).
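A short derivation makes the role of \(n\) explicit. It uses the standard variance formula for a least-squares coefficient. Here \(S_{jj} = \sum_i (x_{ij} - \bar{x}_j)^2\), \(s_j^2\) is the sample variance of \(x_j\), and \(R_j^2\) is the \(R^2\) from regressing \(x_j\) on the other predictors:

\[ t_j = \frac{\hat\beta_j}{\operatorname{se}(\hat\beta_j)}, \qquad \operatorname{se}(\hat\beta_j) = \frac{\hat\sigma}{\sqrt{S_{jj}\,(1 - R_j^2)}} = \frac{\hat\sigma}{\sqrt{(n-1)\, s_j^2\, (1 - R_j^2)}} \]

\[ \Rightarrow \quad t_j \approx \frac{\beta_j\, s_j \sqrt{1 - R_j^2}}{\sigma}\, \sqrt{n-1} \]

For any fixed \(\beta_j \neq 0\), \(|t_j|\) therefore grows like \(\sqrt{n}\). In the example that follows, \(s_j \approx 1\), \(R_j^2 \approx 0\), \(\sigma = 1\) and \(n = 10^6\). This gives \(t_1 \approx -0.01 \times 1000 = -10\) and \(t_3 \approx 0.02 \times 1000 = 20\), close to the reported t values of -10.787 and 19.537.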

Example

\[ y = 2 - 0.01x_1 + 0.01x_2 + 0.02 x_3 + \epsilon \]

Where

\[ \epsilon \sim N(0,1) \]

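set.seed(123)               # make the simulation reproducible

n <- 1000000                # very large sample size
sigma <- 1                  # standard deviation of the error term

# three independent standard-normal predictors
x1 <- rnorm(n, mean = 0, sd = 1)
x2 <- rnorm(n, mean = 0, sd = 1)
x3 <- rnorm(n, mean = 0, sd = 1)

# response with very small true effects (-0.01, 0.01, 0.02)
epsilon <- rnorm(n, mean = 0, sd = sigma)
y <- 2 - 0.01*x1 + 0.01*x2 + 0.02*x3 + epsilon

# fit the regression and test each coefficient
model <- lm(y ~ x1 + x2 + x3)
summary(model)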

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5738 -0.6755  0.0006  0.6744  4.7527 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  2.000175   0.001001 1998.919   <2e-16 ***
x1          -0.010795   0.001001  -10.787   <2e-16 ***
x2           0.008595   0.001001    8.588   <2e-16 ***
x3           0.019547   0.001001   19.537   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.001 on 999996 degrees of freedom
Multiple R-squared:  0.000572,  Adjusted R-squared:  0.000569 
F-statistic: 190.8 on 3 and 999996 DF,  p-value: < 2.2e-16

anova(model)
Analysis of Variance Table

Response: y
             Df  Sum Sq Mean Sq F value    Pr(>F)    
x1        1e+00     117  116.89 116.745 < 2.2e-16 ***
x2        1e+00      74   74.02  73.926 < 2.2e-16 ***
x3        1e+00     382  382.16 381.680 < 2.2e-16 ***
Residuals 1e+06 1001251    1.00                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
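The dependence on sample size can be seen directly by refitting the same model on growing subsets of one simulated data set. This sketch regenerates data from the slide's model. The subset sizes are illustrative choices. For `x3`, the expected t value is roughly \(0.02\sqrt{n}\), while \(R^2\) stays tiny at every size.

```r
# Sketch: refit the slide's model on growing subsets of the same simulated data.
set.seed(123)
n_max <- 1e6
d <- data.frame(x1 = rnorm(n_max), x2 = rnorm(n_max), x3 = rnorm(n_max))
d$y <- 2 - 0.01 * d$x1 + 0.01 * d$x2 + 0.02 * d$x3 + rnorm(n_max)

sizes <- c(1e3, 1e4, 1e5, 1e6)
res <- t(sapply(sizes, function(m) {
  fit <- summary(lm(y ~ x1 + x2 + x3, data = d[seq_len(m), ]))  # first m rows only
  c(n    = m,
    b_x3 = fit$coefficients["x3", "Estimate"],   # estimate barely changes with n
    t_x3 = fit$coefficients["x3", "t value"],    # t grows roughly like sqrt(n)
    p_x3 = fit$coefficients["x3", "Pr(>|t|)"],
    r2   = fit$r.squared)                        # explained variance stays near zero
}))
print(signif(res, 3))
```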

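Every coefficient is highly significant (all p-values < 2e-16), yet the three predictors together explain only about 0.06% of the variation in y (Multiple R-squared = 0.000572). What should we do?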

Conclusion

  • Statistical Significance vs. Practical Significance

    1. Definition: A result is statistically significant when data at least as extreme as those observed would be unlikely if the null hypothesis were true. It says that an effect is distinguishable from zero, not that it is large.

      Definition: Practical significance refers to the real-world importance or relevance of an observed effect. It is about the magnitude and implications of a difference, not just whether it exists.

    2. P-value: In hypothesis testing, the p-value is the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. If the p-value is less than a predetermined threshold \(\alpha\) (commonly 0.05), the result is deemed statistically significant.

      Effect size: What counts as a meaningful effect depends on the context. Always report and interpret the effect size alongside p-values. Effect sizes measure the magnitude of a relationship or difference, which is what practical significance is about.

  • Confidence Intervals

    Instead of just reporting p-values, provide confidence intervals for estimates. A confidence interval gives a range of plausible values for a parameter. It shows both how precisely the parameter is estimated and whether even its largest plausible values would matter in practice.

  • Bayesian Approach

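    Bayesian statistics offers an alternative to the frequentist approach. Instead of a p-value, it gives a posterior distribution for each parameter, from which one can directly compute quantities such as the probability that an effect exceeds a practically meaningful size.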
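Effect sizes and confidence intervals can be computed for the slide's simulated model. This is a sketch under stated assumptions. Cohen's \(f^2 = R^2/(1 - R^2)\) and its partial version are used as one common regression effect size; the slides do not prescribe a specific measure. The smallest practically meaningful slope, `delta = 0.1`, is a purely hypothetical threshold that would come from subject-matter knowledge in a real analysis.

```r
# Sketch: effect sizes and CIs for the slide's simulated model.
set.seed(123)
n <- 1e6
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 - 0.01 * d$x1 + 0.01 * d$x2 + 0.02 * d$x3 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = d)

# Cohen's f^2 for the whole model, and partial f^2 for x3 (swapped-in effect-size measure)
r2_full  <- summary(full)$r.squared
f2_model <- r2_full / (1 - r2_full)
r2_no_x3 <- summary(lm(y ~ x1 + x2, data = d))$r.squared
f2_x3    <- (r2_full - r2_no_x3) / (1 - r2_full)
print(c(f2_model = f2_model, f2_x3 = f2_x3))  # Cohen's benchmark for a "small" f^2 is 0.02

# 95% confidence intervals for the slopes, compared with a hypothetical threshold
delta <- 0.1                       # hypothetical smallest slope that would matter in practice
ci <- confint(full)[-1, ]          # drop the intercept row
print(ci)
print(all(ci[, 1] > -delta & ci[, 2] < delta))  # TRUE here: every plausible slope is negligible
```

All three slopes are statistically significant, but their intervals lie well inside \((-\delta, \delta)\) and \(f^2\) is far below 0.02. Together these say the effects are real but practically negligible.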

Thanks