Hypothesis systems

One sample

\[ \begin{array}{|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} \\ \hline H_0: \mu \geq \mu_0 \leftrightarrow \mu - \mu_0 \geq 0 & H_1: \mu < \mu_0 \leftrightarrow \mu - \mu_0 < 0 \\ \hline H_0: \mu = \mu_0 \leftrightarrow \mu - \mu_0 = 0 & H_1: \mu \neq \mu_0 \leftrightarrow \mu - \mu_0 \neq 0 \\ \hline H_0: \mu \leq \mu_0 \leftrightarrow \mu - \mu_0 \leq 0 & H_1: \mu > \mu_0 \leftrightarrow \mu - \mu_0 > 0 \\ \hline \end{array} \]
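
In R, these three systems correspond to the `alternative` argument of `stats::t.test()`, which the examples further down rely on. The following is a minimal sketch of that mapping; the sample `x`, the value `mu0` and the labels in `alternatives` are illustrative assumptions, not values from the text.

```r
# Illustrative sample (assumed values, not taken from the text)
x <- c(9.8, 10.4, 10.1, 9.7, 10.5)
mu0 <- 10

# Each null hypothesis pairs with the `alternative` value of its H1
alternatives <- c(
  "H0: mu >= mu0" = "less",
  "H0: mu = mu0"  = "two.sided",
  "H0: mu <= mu0" = "greater"
)

# p-value of the one-sample t test under each system
p_values <- sapply(
  alternatives,
  function(alt) stats::t.test(x, mu = mu0, alternative = alt)$p.value
)
print(p_values)
```

The two one-sided p-values add up to 1, because they measure opposite tails of the same statistic.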

Test statistics

Known variance

\[ \text{If }X{\sim}N(\mu,\sigma_x^2)\text{ and }\mu=\mu_0\text{, then }\frac{\overline{x}-\mu_0}{\sqrt{\frac{\sigma_x^2}{n}}}{\sim}N(0,1) \]

\[ \text{If }X{\sim}P\left(E_P(X),Var_P(X)\right)\text{ and }E_P(X)=\mu_0\text{, then }\frac{\overline{x}-\mu_0}{\sqrt{\frac{\sigma_x^2}{n}}}\stackrel{n{\rightarrow}\infty}{\sim}N(0,1) \]
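
As a rough illustration, the known-variance statistic can be computed by hand. The helper `z_statistic()` and the data below are hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: z statistic for a mean when the population variance is known
z_statistic <- function(x, mu0, sigma2) {
  (mean(x) - mu0) / sqrt(sigma2 / length(x))
}

# Illustrative data and parameters (assumed values, not taken from the text)
x <- c(9.8, 10.4, 10.1, 9.7, 10.5)
z <- z_statistic(x, mu0 = 10, sigma2 = 0.25)

# Lower-tail p-value for H1: mu < mu0, read from the standard normal distribution
p_less <- pnorm(z)
print(c(z = z, p_less = p_less))
```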

Unknown variance

\[ \text{If }X{\sim}N(\mu,\sigma_x^2)\text{ and }\mu=\mu_0\text{, then }\frac{\overline{x}-\mu_0}{\sqrt{\frac{S_x^2}{n}}}{\sim}t_{(n-1)} \]

\[ \text{If }X{\sim}P\left(E_P(X),Var_P(X)\right)\text{ and }E_P(X)=\mu_0\text{, then }\frac{\overline{x}-\mu_0}{\sqrt{\frac{S_x^2}{n}}}\stackrel{n{\rightarrow}\infty}{\sim}N(0,1) \]
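
When the variance is unknown, the sample variance takes its place. The sketch below computes the statistic by hand and checks it against `stats::t.test()`; the data and `mu0` are assumed values chosen for illustration.

```r
# Illustrative data (assumed values, not taken from the text)
x <- c(9.8, 10.4, 10.1, 9.7, 10.5)
mu0 <- 10
n <- length(x)

# t statistic built from the sample mean and the sample variance
t_manual <- (mean(x) - mu0) / sqrt(var(x) / n)

# The same statistic as reported by the base one-sample t test
t_builtin <- unname(stats::t.test(x, mu = mu0)$statistic)

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two_sided <- 2 * pt(-abs(t_manual), df = n - 1)
print(c(t_manual = t_manual, t_builtin = t_builtin, p_two_sided = p_two_sided))
```

`stats::t.test()` is called with its namespace because the mosaic package loaded later in this document masks `t.test()`.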

Rejection regions for the null hypothesis (\(H_0\))

Known variance

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \geq \mu_0 \leftrightarrow \mu - \mu_0 \geq 0 & H_1: \mu < \mu_0 \leftrightarrow \mu - \mu_0 < 0 & \left( -\infty, \mu_0 - Z_{1-\alpha}\sqrt{\frac{\sigma_x^2}{n}} \right) \\ \hline H_0: \mu = \mu_0 \leftrightarrow \mu - \mu_0 = 0 & H_1: \mu \neq \mu_0 \leftrightarrow \mu - \mu_0 \neq 0 & \left( -\infty, \mu_0 - Z_{1-\frac{\alpha}{2}}\sqrt{\frac{\sigma_x^2}{n}} \right) \cup \left( \mu_0 + Z_{1-\frac{\alpha}{2}}\sqrt{\frac{\sigma_x^2}{n}}, +\infty \right) \\ \hline H_0: \mu \leq \mu_0 \leftrightarrow \mu - \mu_0 \leq 0 & H_1: \mu > \mu_0 \leftrightarrow \mu - \mu_0 > 0 & \left( \mu_0 + Z_{1-\alpha}\sqrt{\frac{\sigma_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]
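
One way to read a rejection region is on the scale of the sample mean: \(H_0\) is rejected when \(\overline{x}\) lands beyond a critical bound centered at \(\mu_0\), which is the same as the standardized statistic falling beyond the matching normal quantile. The sketch below computes those bounds with `qnorm()`. The helper `rejection_bounds_z()` is hypothetical, and the sample size `n = 100` and level `alpha = 0.05` are assumptions.

```r
# Hypothetical helper: critical bounds for the sample mean, known variance
rejection_bounds_z <- function(mu0, sigma2, n, alpha = 0.05) {
  se <- sqrt(sigma2 / n)  # standard error of the sample mean
  list(
    less      = mu0 - qnorm(1 - alpha) * se,                # reject if xbar is below this
    greater   = mu0 + qnorm(1 - alpha) * se,                # reject if xbar is above this
    two_sided = mu0 + c(-1, 1) * qnorm(1 - alpha / 2) * se  # reject if xbar is outside these
  )
}

# Assumed inputs: mu0 and sigma2 echo the example values used later, n is arbitrary
bounds <- rejection_bounds_z(mu0 = 1462000, sigma2 = 162000, n = 100)
print(bounds)
```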

Unknown variance

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \geq \mu_0 \leftrightarrow \mu - \mu_0 \geq 0 & H_1: \mu < \mu_0 \leftrightarrow \mu - \mu_0 < 0 & \left( -\infty, \mu_0 - t_{\left(n-1,1-\alpha\right)}\sqrt{\frac{S_x^2}{n}} \right) \\ \hline H_0: \mu = \mu_0 \leftrightarrow \mu - \mu_0 = 0 & H_1: \mu \neq \mu_0 \leftrightarrow \mu - \mu_0 \neq 0 & \left( -\infty, \mu_0 - t_{\left(n-1,1-\frac{\alpha}{2}\right)}\sqrt{\frac{S_x^2}{n}} \right) \cup \left( \mu_0 + t_{\left(n-1,1-\frac{\alpha}{2}\right)}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline H_0: \mu \leq \mu_0 \leftrightarrow \mu - \mu_0 \leq 0 & H_1: \mu > \mu_0 \leftrightarrow \mu - \mu_0 > 0 & \left( \mu_0 + t_{\left(n-1,1-\alpha\right)}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]

Unknown variance, \(n{\rightarrow}\infty\)

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \geq \mu_0 \leftrightarrow \mu - \mu_0 \geq 0 & H_1: \mu < \mu_0 \leftrightarrow \mu - \mu_0 < 0 & \left( -\infty, \mu_0 - Z_{1-\alpha}\sqrt{\frac{S_x^2}{n}} \right) \\ \hline H_0: \mu = \mu_0 \leftrightarrow \mu - \mu_0 = 0 & H_1: \mu \neq \mu_0 \leftrightarrow \mu - \mu_0 \neq 0 & \left( -\infty, \mu_0 - Z_{1-\frac{\alpha}{2}}\sqrt{\frac{S_x^2}{n}} \right) \cup \left( \mu_0 + Z_{1-\frac{\alpha}{2}}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline H_0: \mu \leq \mu_0 \leftrightarrow \mu - \mu_0 \leq 0 & H_1: \mu > \mu_0 \leftrightarrow \mu - \mu_0 > 0 & \left( \mu_0 + Z_{1-\alpha}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]
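
The large-sample table swaps the t quantile for the normal quantile. A quick numerical comparison suggests why this is reasonable: the two quantiles get closer as the degrees of freedom grow. The level `alpha = 0.05` and the chosen degrees of freedom are assumptions for illustration.

```r
alpha <- 0.05                          # assumed significance level
df <- c(5, 30, 100, 1000, 100000)      # assumed degrees of freedom to compare

# Upper critical values from the t and the standard normal distributions
comparison <- data.frame(
  df = df,
  t_quantile = qt(1 - alpha, df = df),
  z_quantile = qnorm(1 - alpha)
)

# The gap shrinks toward zero as df increases
comparison$difference <- comparison$t_quantile - comparison$z_quantile
print(comparison)
```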

Exercises and examples

Unknown variance

Suppose a population of inhabitants with a minimum wage of 1,462,000 pesos; simulate the population, draw a sample, and test the hypotheses about the mean, in particular that it is in fact 1,462,000 pesos.

  • Loading packages
library(tidyverse)
library(mosaic)
  • Simulating the population
# Set the population parameters
mu <- 1462000

# Population variance, so the standard deviation is sqrt(162000), about 402.5 pesos
sigma2 <- 162000

# Simulate the population
ingresos <- rnorm(
  n=1000000,
  mean=mu,
  sd=sqrt(sigma2)
)

# Create a data frame
INGRESOS <- data.frame(
  ingresos
)
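
As a sanity check, the simulated values should have a mean close to `mu` and a variance close to `sigma2`. The sketch below repeats the simulation on a smaller scale; the seed, the size of 100,000 and the name `ingresos_demo` are assumptions made so the sketch is reproducible and fast, and they are not part of the original simulation.

```r
set.seed(123)  # assumed seed, used only to make this sketch reproducible

mu <- 1462000
sigma2 <- 162000

# Smaller stand-in for the simulated population (assumed size)
ingresos_demo <- rnorm(
  n = 100000,
  mean = mu,
  sd = sqrt(sigma2)
)

# Empirical mean and variance, to compare with mu and sigma2
print(c(mean = mean(ingresos_demo), variance = var(ingresos_demo)))
```
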
  • Distribution of the data
INGRESOS %>% 
  ggplot(
    mapping=aes(
      y=ingresos
    )
  ) +
  geom_boxplot(
    colour="cyan",
    fill="magenta"
  ) +
  labs(
    title="Boxplot of income",
    y="Income",
    x=""
  )

We now test the following hypotheses.

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \geq \mu_0=1462000 & H_1: \mu < \mu_0=1462000 & \left( -\infty, \mu_0 - t_{\left(n-1,1-\alpha\right)}\sqrt{\frac{S_x^2}{n}} \right) \\ \hline \end{array} \]

  • Drawing a random sample
muestra <- INGRESOS %>% 
  sample_frac(
    size=0.3
  )
ggplot(
  muestra,
  aes(
    x="",
    y=ingresos
  )
) +
  geom_jitter(
    width=0.2,
    height=0
  ) +
  stat_summary(
    fun.data="mean_se",
    col="red"
  ) +
  labs(
    title="Jitter plot of the income sample",
    y="Income",
    x=""
  )

  • Testing the null hypothesis
t.test(
  x=muestra,
  alternative="less",
  mu=1462000
)
## 
##  One Sample t-test
## 
## data:  muestra
## t = -0.55063, df = 3e+05, p-value = 0.2909
## alternative hypothesis: true mean is less than 1462000
## 95 percent confidence interval:
##     -Inf 1462001
## sample estimates:
## mean of x 
##   1462000
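
The p-value in the output above can be reproduced from the reported statistic, and the decision can be read from the matching critical value. This is a minimal sketch: the level `alpha = 0.05` is an assumption consistent with the 95 percent interval in the output, and the degrees of freedom are taken as 300,000 minus 1, since the sample is 30% of 1,000,000 rows.

```r
t_reported <- -0.55063       # statistic copied from the output
df_reported <- 300000 - 1    # n - 1 for a sample of 30% of 1,000,000 rows
alpha <- 0.05                # assumed significance level

# Lower-tail p-value for H1: mu < mu0
p_value <- pt(t_reported, df = df_reported)

# Reject H0 only if the statistic falls below the lower critical value
critical_value <- qt(alpha, df = df_reported)
reject_h0 <- t_reported < critical_value

print(c(p_value = p_value, critical_value = critical_value))
print(reject_h0)
```

The statistic sits well above the critical value, so at this level the sample gives no grounds to reject \(H_0: \mu \geq 1462000\).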

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \geq \mu_0=2000000 & H_1: \mu < \mu_0=2000000 & \left( -\infty, \mu_0 - t_{\left(n-1,1-\alpha\right)}\sqrt{\frac{S_x^2}{n}} \right) \\ \hline \end{array} \]

  • Drawing a random sample
muestra <- INGRESOS %>% 
  sample_frac(
    size=0.2
  )
ggplot(
  muestra,
  aes(
    x="",
    y=ingresos
  )
) +
  geom_jitter(
    width=0.2,
    height=0
  ) +
  stat_summary(
    fun.data="mean_se",
    col="red"
  ) +
  labs(
    title="Jitter plot of the income sample",
    y="Income",
    x=""
  )

  • Testing the null hypothesis
t.test(
  x=muestra,
  alternative="less",
  mu=2000000
)
## 
##  One Sample t-test
## 
## data:  muestra
## t = -598101, df = 2e+05, p-value < 2.2e-16
## alternative hypothesis: true mean is less than 2e+06
## 95 percent confidence interval:
##     -Inf 1462002
## sample estimates:
## mean of x 
##   1462000

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu = \mu_0=1462000 & H_1: \mu \neq \mu_0=1462000 & \left( -\infty, \mu_0 - t_{\left(n-1,1-\frac{\alpha}{2}\right)}\sqrt{\frac{S_x^2}{n}} \right) \cup \left( \mu_0 + t_{\left(n-1,1-\frac{\alpha}{2}\right)}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]

  • Drawing a random sample
muestra <- INGRESOS %>% 
  sample_frac(
    size=0.1
  )
ggplot(
  muestra,
  aes(
    x="",
    y=ingresos
  )
) +
  geom_jitter(
    width=0.2,
    height=0
  ) +
  stat_summary(
    fun.data="mean_se",
    col="red"
  ) +
  labs(
    title="Jitter plot of the income sample",
    y="Income",
    x=""
  )

  • Testing the null hypothesis
t.test(
  x=muestra,
  alternative="two.sided",
  mu=1462000
)
## 
##  One Sample t-test
## 
## data:  muestra
## t = 1.7321, df = 99999, p-value = 0.08326
## alternative hypothesis: true mean is not equal to 1462000
## 95 percent confidence interval:
##  1462000 1462005
## sample estimates:
## mean of x 
##   1462002
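
For the two-sided system both tails count, so the p-value doubles the tail area beyond the absolute value of the statistic. The sketch below reproduces the figure in the output above; `alpha = 0.05` is again an assumed level.

```r
t_reported <- 1.7321     # statistic copied from the output
df_reported <- 99999     # degrees of freedom copied from the output
alpha <- 0.05            # assumed significance level

# Two-sided p-value: twice the tail area beyond |t|
p_value <- 2 * pt(-abs(t_reported), df = df_reported)

# Reject H0 only if |t| exceeds the upper alpha/2 critical value
critical_value <- qt(1 - alpha / 2, df = df_reported)
reject_h0 <- abs(t_reported) > critical_value

print(c(p_value = p_value, critical_value = critical_value))
print(reject_h0)
```

With these numbers \(H_0: \mu = 1462000\) is not rejected at the 5% level, although the same p-value would lead to rejection at a 10% level.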

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu = \mu_0=2000000 & H_1: \mu \neq \mu_0=2000000 & \left( -\infty, \mu_0 - t_{\left(n-1,1-\frac{\alpha}{2}\right)}\sqrt{\frac{S_x^2}{n}} \right) \cup \left( \mu_0 + t_{\left(n-1,1-\frac{\alpha}{2}\right)}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]

  • Drawing a random sample
muestra <- INGRESOS %>% 
  sample_frac(
    size=0.03
  )
ggplot(
  muestra,
  aes(
    x="",
    y=ingresos
  )
) +
  geom_jitter(
    width=0.2,
    height=0
  ) +
  stat_summary(
    fun.data="mean_se",
    col="red"
  ) +
  labs(
    title="Jitter plot of the income sample",
    y="Income",
    x=""
  )

  • Testing the null hypothesis
t.test(
  x=muestra,
  alternative="two.sided",
  mu=2000000
)
## 
##  One Sample t-test
## 
## data:  muestra
## t = -230776, df = 29999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 2e+06
## 95 percent confidence interval:
##  1461994 1462003
## sample estimates:
## mean of x 
##   1461998

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \leq \mu_0=1462000 & H_1: \mu > \mu_0=1462000 & \left( \mu_0 + t_{\left(n-1,1-\alpha\right)}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]

  • Drawing a random sample
muestra <- INGRESOS %>% 
  sample_frac(
    size=0.02
  )
ggplot(
  muestra,
  aes(
    x="",
    y=ingresos
  )
) +
  geom_jitter(
    width=0.2,
    height=0
  ) +
  stat_summary(
    fun.data="mean_se",
    col="red"
  ) +
  labs(
    title="Jitter plot of the income sample",
    y="Income",
    x=""
  )

  • Testing the null hypothesis
t.test(
  x=muestra,
  alternative="greater",
  mu=1462000
)
## 
##  One Sample t-test
## 
## data:  muestra
## t = 0.1528, df = 19999, p-value = 0.4393
## alternative hypothesis: true mean is greater than 1462000
## 95 percent confidence interval:
##  1461996     Inf
## sample estimates:
## mean of x 
##   1462000

\[ \begin{array}{|c|c|c|} \hline \text{Null hypothesis ($H_0$)} & \text{Alternative hypothesis ($H_1$)} & \text{Rejection region of $H_0$ for $\overline{x}$} \\ \hline H_0: \mu \leq \mu_0=2000000 & H_1: \mu > \mu_0=2000000 & \left( \mu_0 + t_{\left(n-1,1-\alpha\right)}\sqrt{\frac{S_x^2}{n}}, +\infty \right) \\ \hline \end{array} \]

  • Drawing a random sample
muestra <- INGRESOS %>% 
  sample_frac(
    size=0.01
  )
ggplot(
  muestra,
  aes(
    x="",
    y=ingresos
  )
) +
  geom_jitter(
    width=0.2,
    height=0
  ) +
  stat_summary(
    fun.data="mean_se",
    col="red"
  ) +
  labs(
    title="Jitter plot of the income sample",
    y="Income",
    x=""
  )

  • Testing the null hypothesis
t.test(
  x=muestra,
  alternative="greater",
  mu=2000000
)
## 
##  One Sample t-test
## 
## data:  muestra
## t = -132942, df = 9999, p-value = 1
## alternative hypothesis: true mean is greater than 2e+06
## 95 percent confidence interval:
##  1461995     Inf
## sample estimates:
## mean of x 
##   1462002
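
A p-value of 1 may look odd next to such an extreme statistic, but it follows from the direction of the alternative: the statistic is far in the lower tail, while \(H_1: \mu > 2000000\) looks for evidence in the upper tail. The sketch below contrasts both tail areas for the reported statistic.

```r
t_reported <- -132942    # statistic copied from the output
df_reported <- 9999      # degrees of freedom copied from the output

# Upper-tail area, the p-value for H1: mu > mu0
p_greater <- pt(t_reported, df = df_reported, lower.tail = FALSE)

# Lower-tail area, which would be the p-value for H1: mu < mu0
p_less <- pt(t_reported, df = df_reported)

print(c(p_greater = p_greater, p_less = p_less))
```

The sample therefore gives no evidence against \(H_0: \mu \leq 2000000\), whereas the same statistic would overwhelmingly reject a null of the form \(\mu \geq 2000000\).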