P-Value

The p-value

Example: t-test, independent populations

This example tests for a difference in mean birth weight between babies born to non-smoking mothers and babies born to smoking mothers.

$H_0: \mu(nf) - \mu(f) = 0$

$H_a: \mu(nf) - \mu(f) \neq 0$

smoker    <- births %>% filter(smoke == "smoker") %>% pull(weight)
nonsmoker <- births %>% filter(smoke == "nonsmoker") %>% pull(weight)
mean(nonsmoker) - mean(smoker)

## [1] 0.4005

ggplot(births,aes(x = weight)) + 
  geom_histogram(aes(y = after_stat(density), colour = smoke)) +
  facet_grid(.~ smoke) +
  theme_bw() + theme(legend.position = "none")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

par(mar = c(2, 2, 2, 2))
par(mfrow = c(1, 2))
qqnorm(nonsmoker, xlab = "", ylab = "",
       main = "nonsmoker", col = "firebrick")
qqline(nonsmoker)
qqnorm(smoker, xlab = "", ylab = "",
       main = "smoker", col = "springgreen4")
qqline(smoker)

shapiro.test(smoker)

## 
##  Shapiro-Wilk normality test
## 
## data:  smoker
## W = 0.89491, p-value = 0.0003276

shapiro.test(nonsmoker)

## 
##  Shapiro-Wilk normality test
## 
## data:  nonsmoker
## W = 0.92374, p-value = 2.234e-05

ggplot(data = births) +
  geom_boxplot(aes(x = smoke, y = weight, colour = smoke)) +
  theme_bw() + theme(legend.position = "none")

require(car)

fligner.test(weight ~ smoke, data = births)

## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  weight by smoke
## Fligner-Killeen:med chi-squared = 0.56858, df = 1, p-value = 0.4508

leveneTest(weight ~ smoke, data = births, center = "median")

## Levene's Test for Homogeneity of Variance (center = "median")
##        Df F value Pr(>F)
## group   1  0.4442 0.5062
##       148

$\alpha = 0.05$

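# Two-sided pooled-variance t-test; x and y are ordered to match H_0: mu(nf) - mu(f) = 0
prueba_t <- t.test(
  x           = nonsmoker,
  y           = smoker,
  alternative = "two.sided",
  mu          = 0,
  var.equal   = TRUE,
  conf.level  = 0.95
)

# Extract the p-value from the test result
p_valor <- prueba_t$p.value
p_valor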

## [1] 0.1228756

ifelse(p_valor > 0.05, "Fail to reject H_0", "Reject H_0")

## [1] "Fail to reject H_0"
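
The p-value returned by `t.test()` with `var.equal = TRUE` can also be reproduced by hand from the pooled-variance formula. The following is a minimal sketch under stated assumptions: `pooled_t()` is a hypothetical helper written for this illustration, and the two samples are simulated with arbitrary sizes and parameters, because the `births` data are not reproduced here.

```r
# Hypothetical helper (not part of any package): pooled-variance
# two-sample t-test computed step by step.
pooled_t <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  df  <- n_x + n_y - 2
  # Pooled variance: weighted average of the two sample variances
  s2_pooled <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / df
  # Standard error of the difference in means
  se     <- sqrt(s2_pooled * (1 / n_x + 1 / n_y))
  t_stat <- (mean(x) - mean(y)) / se
  # Two-sided p-value: probability in both tails beyond |t|
  p_value <- 2 * pt(-abs(t_stat), df = df)
  list(statistic = t_stat, df = df, p.value = p_value)
}

# Simulated data (an assumption for illustration only)
set.seed(1)
grupo_a <- rnorm(50,  mean = 7.0, sd = 1.5)
grupo_b <- rnorm(100, mean = 7.3, sd = 1.5)

manual  <- pooled_t(grupo_a, grupo_b)
builtin <- t.test(grupo_a, grupo_b, var.equal = TRUE)

# Both approaches should agree up to floating-point error
all.equal(manual$p.value, builtin$p.value)
```

The two-sided p-value does not depend on which group is passed as `x`, since swapping the groups only flips the sign of the statistic.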

Example: t-test, paired populations

datos <- data.frame(
          corredor = c(1:10),
          antes = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3),
          despues = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
        )
head(datos, 4)

##   corredor antes despues
## 1        1  12.9    12.7
## 2        2  13.5    13.6
## 3        3  12.8    12.0
## 4        4  15.6    15.2

diferencia <- datos$antes - datos$despues
datos      <- cbind(datos, diferencia)
head(datos,4)

##   corredor antes despues diferencia
## 1        1  12.9    12.7        0.2
## 2        2  13.5    13.6       -0.1
## 3        3  12.8    12.0        0.8
## 4        4  15.6    15.2        0.4

colMeans(datos[,-1])

##      antes    despues diferencia 
##      14.48      14.53      -0.05

$H_0: \mu(antes) - \mu(despues) = 0$

$H_a: \mu(antes) - \mu(despues) \neq 0$

$\alpha = 0.05$

par(mar = c(2, 2, 2, 2))
par(mfrow = c(1, 2))
qqnorm(datos$antes, xlab = "", ylab = "", main = "antes")
qqline(datos$antes)
qqnorm(datos$despues, xlab = "", ylab = "", main = "despues")
qqline(datos$despues)

shapiro.test(datos$antes)

## 
##  Shapiro-Wilk normality test
## 
## data:  datos$antes
## W = 0.94444, p-value = 0.6033

shapiro.test(datos$despues)

## 
##  Shapiro-Wilk normality test
## 
## data:  datos$despues
## W = 0.93638, p-value = 0.5135
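
In a paired t-test, the normality assumption applies to the paired differences rather than to each sample separately. As a hedged complement to the checks above, the sketch below applies the same Shapiro-Wilk test to `diferencia`. The vectors are copied from the example so the block runs on its own, and `prueba_normalidad` is an illustrative name.

```r
# Data copied from the example above
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

# Paired differences: the quantity the paired t-test actually analyzes
diferencia <- antes - despues

# Shapiro-Wilk normality test on the differences
prueba_normalidad <- shapiro.test(diferencia)
prueba_normalidad$p.value
```

With only 10 observations the test has limited power, so a large p-value here is weak evidence of normality at best.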

How to find the p-value using the formula

$\bar{d} = -0.05$

$\hat{s}_{diferencia} = 0.7412$

$SE = \frac{\hat{s}_{diferencia}}{\sqrt{n}}$

$T_{cal} = \frac{\bar{d}}{SE}$

$p\text{-value} = P(t_{df=9} < -|T_{cal}|) + P(t_{df=9} > |T_{cal}|)$
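
The quantities in these formulas can be computed directly from the data. This is a minimal sketch: the vectors are copied from the example, and the names `n`, `d_bar`, `s_dif`, `se` and `t_cal` are chosen here for illustration only.

```r
# Data copied from the example above
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
diferencia <- antes - despues

n     <- length(diferencia)   # number of pairs
d_bar <- mean(diferencia)     # mean of the paired differences
s_dif <- sd(diferencia)       # sample standard deviation of the differences
se    <- s_dif / sqrt(n)      # standard error of the mean difference
t_cal <- d_bar / se           # calculated t statistic, with n - 1 = 9 df

c(d_bar = d_bar, s_dif = s_dif, se = se, t_cal = t_cal)
```

The resulting `t_cal` is the value (about -0.2133) that is passed to `pt()` in the next step.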

Continuing with the paired t-test

pt(q = -0.2133085, df = 9) + (1 - pt(q = 0.2133085, df = 9))

## [1] 0.83584
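
As a cross-check, the same p-value can be obtained from R's built-in `t.test()` with `paired = TRUE`. This is a sketch that rebuilds the two vectors from the example so it is self-contained; `prueba_pareada` is an illustrative name.

```r
# Data copied from the example above
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

# Paired two-sided t-test on antes - despues
prueba_pareada <- t.test(
  x           = antes,
  y           = despues,
  alternative = "two.sided",
  mu          = 0,
  paired      = TRUE,
  conf.level  = 0.95
)

prueba_pareada$statistic  # t statistic, matches the manual T_cal
prueba_pareada$p.value    # matches the manual tail-probability sum
```

Both values should agree with the manual calculation, since `t.test()` applies the same formula to the differences.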

$d = \tfrac{\left | \bar{d} \right |}{\hat{s}_{diferencia}}$

$d = \tfrac{\left | -0.05 \right |}{0.7412} \approx 0.067$
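
The effect size above can be reproduced in R. This sketch assumes that $d$ here denotes Cohen's d for paired samples (absolute mean difference divided by the standard deviation of the differences); `d_cohen` is an illustrative name.

```r
# Data copied from the example above
antes   <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
despues <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)
diferencia <- antes - despues

# Effect size: |mean difference| / sd of the differences
d_cohen <- abs(mean(diferencia)) / sd(diferencia)
round(d_cohen, 3)
```

A value this close to zero indicates a very small effect, in line with the large p-value.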

Since the p-value (0.836) is greater than α = 0.05, there is no significant evidence to reject H0 in favor of Ha. We cannot conclude that the athletes' performance has changed.