The p-value

Example: t-test, independent populations

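Consider a test of the difference in mean birth weight between babies born to mothers who smoke and babies born to mothers who do not smoke. Here \(\mu(nf)\) and \(\mu(f)\) denote the mean birth weight in the non-smoker and smoker groups, respectively.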

\(H_0: \mu(nf)−\mu(f)=0\)

\(H_a:\mu(nf)−\mu(f)≠0\)

library(dplyr)
library(ggplot2)
library(openintro)  # provides the births data set
smoker    <- births %>% filter(smoke == "smoker") %>% pull(weight)
nonsmoker <- births %>% filter(smoke == "nonsmoker") %>% pull(weight)
mean(nonsmoker) - mean(smoker)
## [1] 0.4005
ggplot(births, aes(x = weight)) + 
  geom_histogram(aes(y = after_stat(density), colour = smoke)) +
  facet_grid(. ~ smoke) +
  theme_bw() + theme(legend.position = "none")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

par(mar = c(2, 2, 2, 2))
par(mfrow = c(1, 2))
qqnorm(nonsmoker, xlab = "", ylab = "",
       main = "nonsmoker", col = "firebrick")
qqline(nonsmoker)
qqnorm(smoker, xlab = "", ylab = "",
       main = "smoker", col = "springgreen4")
qqline(smoker)

shapiro.test(smoker)
## 
##  Shapiro-Wilk normality test
## 
## data:  smoker
## W = 0.89491, p-value = 0.0003276
shapiro.test(nonsmoker)
## 
##  Shapiro-Wilk normality test
## 
## data:  nonsmoker
## W = 0.92374, p-value = 2.234e-05
ggplot(data = births) +
  geom_boxplot(aes(x = smoke, y = weight, colour = smoke)) +
  theme_bw() + theme(legend.position = "none")

require(car)
fligner.test(weight ~ smoke, data = births)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  weight by smoke
## Fligner-Killeen:med chi-squared = 0.56858, df = 1, p-value = 0.4508
leveneTest(weight ~ smoke, data = births, center = "median")
## Levene's Test for Homogeneity of Variance (center = "median")
##        Df F value Pr(>F)
## group   1  0.4442 0.5062
##       148

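Assumption checks: both Shapiro-Wilk tests reject normality (p = 0.0003 for `smoker`, p = 2.2e-05 for `nonsmoker`). The t-test is still commonly applied here because it is fairly robust to departures from normality with moderately large samples. Neither the Fligner-Killeen test (p = 0.4508) nor Levene's test (p = 0.5062) rejects equal variances, which supports using `var.equal = TRUE` below.

Significance level: \(\alpha = 0.05\)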

prueba_t = t.test(
  x           = smoker,
  y           = nonsmoker,
  alternative = "two.sided",
  mu          = 0,
  var.equal   = TRUE,
  conf.level  = 0.95
)

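p_valor <- prueba_t$p.value
p_valor
## [1] 0.1228756
ifelse(p_valor > 0.05, "Fail to reject H0", "Reject H0")
## [1] "Fail to reject H0"

Since the p-value (0.123) is greater than \(\alpha = 0.05\), there is not enough evidence to conclude that mean birth weight differs between babies of smoking and non-smoking mothers.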
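The p-value from `t.test()` can also be reproduced by hand. Doing so shows what `var.equal = TRUE` actually computes. The following is a minimal sketch that assumes equal variances and uses the pooled standard deviation. `pooled_t_test()` is a hypothetical helper written for illustration, and `x` and `y` are made-up vectors, not the births data.

```r
# Hypothetical helper: two-sample t statistic and two-sided p-value,
# assuming equal variances (pooled standard deviation)
pooled_t_test <- function(x, y, mu = 0) {
  n_x <- length(x)
  n_y <- length(y)
  df  <- n_x + n_y - 2
  # Pooled variance: weighted average of the two sample variances
  s2_p  <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / df
  se    <- sqrt(s2_p * (1 / n_x + 1 / n_y))
  t_cal <- (mean(x) - mean(y) - mu) / se
  # Two-sided p-value: probability in both tails beyond |t_cal|
  p_value <- 2 * pt(-abs(t_cal), df = df)
  list(statistic = t_cal, df = df, p.value = p_value)
}

# Made-up example vectors (not from the births data)
x <- c(6.9, 7.4, 5.8, 8.1, 6.2, 7.0)
y <- c(7.6, 7.9, 6.8, 8.4, 7.2, 7.7, 8.0)

manual  <- pooled_t_test(x, y)
builtin <- t.test(x, y, var.equal = TRUE)

# Both approaches should agree
all.equal(manual$p.value, builtin$p.value)
```

Calling `pooled_t_test(smoker, nonsmoker)` should give the same value as `prueba_t$p.value`, because `t.test()` with `var.equal = TRUE` uses this pooled formula.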

Example: paired t-test

datos <- data.frame(
          corredor = c(1:10),
          antes = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3),
          despues = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
        )
head(datos, 4)
##   corredor antes despues
## 1        1  12.9    12.7
## 2        2  13.5    13.6
## 3        3  12.8    12.0
## 4        4  15.6    15.2
diferencia <- datos$antes - datos$despues
datos      <- cbind(datos, diferencia)
head(datos,4)
##   corredor antes despues diferencia
## 1        1  12.9    12.7        0.2
## 2        2  13.5    13.6       -0.1
## 3        3  12.8    12.0        0.8
## 4        4  15.6    15.2        0.4
colMeans(datos[,-1])
##      antes    despues diferencia 
##      14.48      14.53      -0.05

\(H_0: \mu_d = 0\)

\(H_a: \mu_d \neq 0\)

where \(\mu_d\) is the population mean of the differences `antes - despues`.

\(\alpha = 0.05\)

par(mar = c(2, 2, 2, 2))
par(mfrow = c(1, 2))
qqnorm(datos$antes, xlab = "", ylab = "", main = "antes")
qqline(datos$antes)
qqnorm(datos$despues, xlab = "", ylab = "", main = "despues")
qqline(datos$despues)

shapiro.test(datos$antes)
## 
##  Shapiro-Wilk normality test
## 
## data:  datos$antes
## W = 0.94444, p-value = 0.6033
shapiro.test(datos$despues)
## 
##  Shapiro-Wilk normality test
## 
## data:  datos$despues
## W = 0.93638, p-value = 0.5135
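For a paired t-test, the normality assumption applies to the differences, not to each set of measurements separately. A complementary check is therefore a Shapiro-Wilk test on `antes - despues`. The following is a minimal sketch that re-enters the data above so it runs on its own; `shapiro_diff` is just a name chosen here.

```r
# Paired measurements from the example above
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

# The paired t-test works on the differences, so check their normality
diferencia   <- antes - despues
shapiro_diff <- shapiro.test(diferencia)
shapiro_diff$p.value
```

A p-value above \(\alpha\) means normality of the differences is not rejected.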

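Computing the p-value by formula

Mean of the differences: \(\bar{d} = -0.05\)

Sample standard deviation of the differences: \(\hat{s}_{d} = 0.7412\)

Standard error: \(SE = \frac{\hat{s}_{d}}{\sqrt{n}}\), with \(n = 10\) pairs

Test statistic: \(T_{cal} = \frac{\bar{d}}{SE}\)

Two-sided p-value with \(df = n - 1 = 9\): \(p\text{-value} = P(t_{9} < -|T_{cal}|) + P(t_{9} > |T_{cal}|)\)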
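The formula steps above translate directly into R. The following sketch re-enters the data so it runs on its own; the variable names are chosen here for illustration.

```r
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

d <- antes - despues
n <- length(d)

d_bar <- mean(d)          # mean of the differences
s_d   <- sd(d)            # sample standard deviation of the differences
se    <- s_d / sqrt(n)    # standard error of the mean difference
t_cal <- d_bar / se       # test statistic
df    <- n - 1            # degrees of freedom

# Two-sided p-value: area in both tails beyond |t_cal|
p_value <- pt(-abs(t_cal), df) + (1 - pt(abs(t_cal), df))
round(c(d_bar = d_bar, s_d = s_d, se = se, t_cal = t_cal, p_value = p_value), 4)
```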

Continuing with the paired t-test: with \(T_{cal} = -0.2133\) and \(df = 9\), the two-sided p-value is

pt(q = -0.2133085, df = 9) + (1 - pt(q = 0.2133085, df = 9))
## [1] 0.83584
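As a cross-check, R's built-in `t.test()` with `paired = TRUE` performs the same computation directly on the two columns. The following sketch re-enters the example data so the block runs on its own.

```r
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

# Paired two-sided t-test on antes - despues, H0: mean difference = 0
paired_test <- t.test(x = antes, y = despues, alternative = "two.sided",
                      mu = 0, paired = TRUE, conf.level = 0.95)

paired_test$statistic  # t statistic, should match T_cal above
paired_test$p.value    # two-sided p-value
```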

Effect size (Cohen's d for paired data): \(d = \frac{|\bar{d}|}{\hat{s}_{d}}\)

\(d = \frac{|-0.05|}{0.7412} \approx 0.067\), well below the conventional 0.2 threshold for a small effect.
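The same effect size can be computed directly from the data. The name `cohen_d` is just a variable chosen here. This variant divides by the standard deviation of the differences, matching the formula above.

```r
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

d <- antes - despues
# Effect size for paired data: |mean difference| / SD of the differences
cohen_d <- abs(mean(d)) / sd(d)
round(cohen_d, 3)
```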

Since the p-value (0.836) is greater than α = 0.05, there is no significant evidence to reject \(H_0\) in favor of \(H_a\). The athletes' performance cannot be considered to have changed.