I show two examples where the p-value < 0.05 threshold (statistical significance) leads to poor conclusions: in the first, a practically meaningless effect is highly significant; in the second, a real effect fails the test.
set.seed(123)
nobs <- 1000000 # as sample size increases, p-values fall, even if the signal is weak
x <- rnorm(nobs)
y <- 50 + 2*x + rnorm(nobs, 0, 100) # true slope is 2, but the noise sd is 100, so the signal is weak
df <- data.frame(x=x, y=y)
lm.mod <- lm(y ~ x, data=df)
summary(lm.mod)
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -458.92  -67.44    0.03   67.39  455.09 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 49.84441    0.09998  498.57   <2e-16 ***
## x            1.91856    0.09998   19.19   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 99.98 on 999998 degrees of freedom
## Multiple R-squared: 0.0003681, Adjusted R-squared: 0.0003671
## F-statistic: 368.2 on 1 and 999998 DF, p-value: < 2.2e-16
The coefficient for x is highly statistically significant, but practically this doesn’t mean much: the \(R^2\) is less than 0.1%, so x explains almost none of the variation in y. With a million observations, even a signal this weak relative to the noise clears the significance threshold easily.
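As a rough check of practical relevance, we can compare the shift in y produced by a typical change in x against the overall spread of y. A quick sketch, using only base R and the objects already defined above:
confint(lm.mod, "x")           # tight interval around 1.9 -- clearly nonzero
coef(lm.mod)["x"] * sd(df$x)   # shift in y for a 1-SD move in x: roughly 2
sd(df$y)                       # spread of y: roughly 100
A “significant” slope that moves y by about 2 units against a noise level of about 100 is unlikely to matter for any real decision.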
set.seed(123)
nobs <- 20 # as sample size decreases, p-values increase, even if the signal exists
x <- rnorm(nobs)
y <- 50 + 2*x + rnorm(nobs, 0, 5) # same true slope of 2, but the noise sd is only 5
df <- data.frame(x=x, y=y)
lm.mod <- lm(y ~ x, data=df)
summary(lm.mod)
##
## Call:
## lm(formula = y ~ x, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5615 -3.0093 -0.9287  4.1543  6.2955 
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  49.7991     0.9598  51.883   <2e-16 ***
## x             1.6087     1.0013   1.607    0.126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.245 on 18 degrees of freedom
## Multiple R-squared: 0.1254, Adjusted R-squared: 0.07682
## F-statistic: 2.581 on 1 and 18 DF, p-value: 0.1256
The coefficient for x is not statistically significant even at the 90% confidence level (p ≈ 0.13). However, the model has some predictive power: the \(R^2\) is about 12.5%, and the estimated slope of 1.6 is not far from the true value of 2 used in the simulation. Combined with a strong narrative, intuition, and policy implications, this may be a viable model despite being “statistically insignificant”.
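To see why the model may still be worth something, here is a short sketch (again just base R on the objects above): the 95% confidence interval for the slope is wide but contains the true value of 2, and the fit reduces in-sample error relative to an intercept-only baseline:
confint(lm.mod, "x")                 # wide interval, roughly (-0.5, 3.7), containing the true slope of 2
rmse <- function(e) sqrt(mean(e^2))  # in-sample root mean squared error
rmse(resid(lm.mod))                  # model with x
rmse(resid(lm(y ~ 1, data = df)))    # intercept-only baseline: somewhat larger
With only 20 observations there simply isn’t enough evidence to clear an arbitrary p-value cutoff, even though the underlying relationship is real.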