Hypotheses

Null hypothesis

\[H_0:\sigma_h^2\geq\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2\geq0\]

\[H_0:\sigma_h^2=\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2=0\]

\[H_0:\sigma_h^2\leq\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2\leq0\]

Alternative hypothesis

\[H_1:\sigma_h^2<\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2<0\]

\[H_1:\sigma_h^2\neq\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2\neq0\]

\[H_1:\sigma_h^2>\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2>0\]
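Each null hypothesis above pairs with the alternative on the same row, and each pair corresponds to one value of the `alternative` argument of R's `var.test()`. That function works with the ratio \(\sigma_h^2/\sigma_m^2\) compared with 1, which is equivalent to comparing the difference with 0. A minimal sketch of the mapping; the vector name `alternativas` is only illustrative:

```r
# Map each alternative hypothesis H1 to the value of the 'alternative'
# argument accepted by var.test(): "two.sided", "less" or "greater".
# var.test() compares sigma_h^2 / sigma_m^2 with 1, which is equivalent
# to comparing sigma_h^2 - sigma_m^2 with 0.
alternativas <- c(
  "H1: sigma_h^2 < sigma_m^2"  = "less",
  "H1: sigma_h^2 != sigma_m^2" = "two.sided",
  "H1: sigma_h^2 > sigma_m^2"  = "greater"
)
print(alternativas)
```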

Generate the income data

Set the seed, the means and the standard deviations

media.hombres <- 928116
media.mujeres <- 928116
desvest.hombres <-  25000
desvest.mujeres <- 35000

set.seed(12345678)
ingresos.hombres <- rnorm(n=24000, mean=media.hombres, sd=desvest.hombres)
head(ingresos.hombres)

## [1] 956813.3 968576.0 868639.4 921698.5 950425.3 909307.4

Ingresos.hombres <- data.frame(Genero = rep("hombres", 24000), Ingreso = ingresos.hombres,
                               stringsAsFactors = TRUE)
head(Ingresos.hombres)

set.seed(12345678)
ingresos.mujeres <- rnorm(n=26000, mean=media.mujeres, sd=desvest.mujeres)
Ingresos.mujeres <- data.frame(Genero = rep("mujeres", 26000), Ingreso = ingresos.mujeres,
                               stringsAsFactors = TRUE)
head(Ingresos.mujeres)
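Both income vectors are generated right after the same `set.seed(12345678)`. Because `rnorm()` consumes the random-number stream in the same way whatever the mean and standard deviation, the first 24000 women's incomes are an exact linear transformation of the men's. A minimal check of that reading; the names `h`, `m`, `h2`, `m2` are introduced only for illustration, and drawing both vectors after a single `set.seed()` call is one possible way to obtain independent draws, not the author's code:

```r
# Reproduce the two rnorm() calls above, each preceded by the same set.seed().
set.seed(12345678)
h <- rnorm(n = 24000, mean = 928116, sd = 25000)
set.seed(12345678)
m <- rnorm(n = 26000, mean = 928116, sd = 35000)

# Same seed -> same standard normal draws z, so
# h[i] = 928116 + 25000 * z[i] and m[i] = 928116 + 35000 * z[i].
r <- cor(h, m[1:24000])
print(r)  # essentially 1

# Illustrative alternative: set the seed once and draw both vectors in sequence,
# so the women's incomes use different draws from the men's.
set.seed(12345678)
h2 <- rnorm(n = 24000, mean = 928116, sd = 25000)
m2 <- rnorm(n = 26000, mean = 928116, sd = 35000)
print(cor(h2, m2[1:24000]))  # close to 0
```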

Put everything into a single data frame

Ingresos <- rbind.data.frame(Ingresos.hombres, Ingresos.mujeres)
head(Ingresos)

tail(Ingresos)
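Since the population standard deviations are 25000 and 35000, the true variance ratio is \((25000/35000)^2 = 25/49 \approx 0.51\), so in this simulated population it is \(H_1:\sigma_h^2<\sigma_m^2\) that actually holds. A sketch that rebuilds the population data frame as in the code above and compares its realized variance ratio with 25/49; the names `varianzas`, `razon.poblacional` and `razon.teorica` are illustrative:

```r
# Rebuild the population exactly as above (same seed before each rnorm call).
set.seed(12345678)
h <- rnorm(n = 24000, mean = 928116, sd = 25000)
set.seed(12345678)
m <- rnorm(n = 26000, mean = 928116, sd = 35000)
Ingresos <- data.frame(
  Genero  = factor(c(rep("hombres", 24000), rep("mujeres", 26000))),
  Ingreso = c(h, m)
)

# Variance of income within each gender
varianzas <- tapply(Ingresos$Ingreso, Ingresos$Genero, var)
razon.poblacional <- unname(varianzas["hombres"] / varianzas["mujeres"])
# Theoretical ratio sigma_h^2 / sigma_m^2 = (25000 / 35000)^2 = 25/49
razon.teorica <- (25000 / 35000)^2
print(c(poblacional = razon.poblacional, teorica = razon.teorica))
```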

Plot of income by gender

library(ggplot2)

ggplot(data=Ingresos,aes(x=Genero,y=Ingreso, colour=Genero)) + geom_boxplot()

Select a random sample

set.seed(12345678)
muestra <- Ingresos[sample(1:nrow(Ingresos),size=200),]
head(muestra)
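`sample()` draws 200 rows without regard to gender, so the number of men and women in `muestra` is itself random, and those counts fix the degrees of freedom \(n_h-1\) and \(n_m-1\) used in the tests. A sketch that rebuilds the data as in the code above and counts each group; it assumes the same R version and default random-number settings as the original run (R 3.6.0 changed the default algorithm used by `sample()`), and the name `n.grupos` is illustrative:

```r
# Rebuild the population and draw the 200-row sample as above.
set.seed(12345678)
h <- rnorm(n = 24000, mean = 928116, sd = 25000)
set.seed(12345678)
m <- rnorm(n = 26000, mean = 928116, sd = 35000)
Ingresos <- data.frame(
  Genero  = factor(c(rep("hombres", 24000), rep("mujeres", 26000))),
  Ingreso = c(h, m)
)
set.seed(12345678)
muestra <- Ingresos[sample(1:nrow(Ingresos), size = 200), ]

# Simple random sampling does not fix the group sizes n_h and n_m;
# they determine the degrees of freedom n_h - 1 and n_m - 1.
n.grupos <- table(muestra$Genero)
print(n.grupos)
```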

Carrying out the hypothesis test

Test statistic

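\[F=\frac{S_x^2}{S_y^2}=\frac{S_h^2}{S_m^2}\sim F_{(n_h-1,\,n_m-1)}\quad\text{when }\sigma_h^2=\sigma_m^2\]

Here \(S_h^2\) and \(S_m^2\) are the sample variances of men's and women's incomes and \(n_h\), \(n_m\) are the group sizes in the sample. The F distribution of the statistic assumes independent samples drawn from normal populations.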
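The statistic can be computed by hand and checked against `var.test()`. The data below are simulated for illustration only (two normal samples of size 100 with equal variance), not the document's sample. For `alternative = "greater"`, `var.test()` reports the same F and its upper-tail probability under \(F_{(n_x-1,\,n_y-1)}\):

```r
# Illustrative data (not the document's sample): two normal samples of size 100.
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)
y <- rnorm(100, mean = 0, sd = 1)

# Test statistic F = S_x^2 / S_y^2 with (n_x - 1, n_y - 1) degrees of freedom
F.obs <- var(x) / var(y)
gl1 <- length(x) - 1
gl2 <- length(y) - 1

# Right-tailed p-value P(F >= F.obs), computed by hand ...
p.manual <- pf(F.obs, gl1, gl2, lower.tail = FALSE)
# ... and with var.test()
prueba <- var.test(x, y, alternative = "greater")
print(c(manual = p.manual, var.test = prueba$p.value))
```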

Hypotheses

\[H_0:\sigma_h^2\leq\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2\leq0\text{ versus }H_1:\sigma_h^2>\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2>0\]

var.test(x=muestra[muestra$Genero=="hombres","Ingreso"],y=muestra[muestra$Genero=="mujeres","Ingreso"],alternative="greater")

## 
##  F test to compare two variances
## 
## data:  muestra[muestra$Genero == "hombres", "Ingreso"] and muestra[muestra$Genero == "mujeres", "Ingreso"]
## F = 0.61725, num df = 99, denom df = 99, p-value = 0.9914
## alternative hypothesis: true ratio of variances is greater than 1
## 95 percent confidence interval:
##  0.4427721       Inf
## sample estimates:
## ratio of variances 
##          0.6172514
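`var.test()` returns an object of class `"htest"` whose components (`statistic`, `parameter`, `p.value`, `conf.int`) can be used directly. A sketch on simulated data (standard deviations borrowed from the population above, means set to 0 for simplicity), showing that for the right-tailed test the p-value rule and the critical-value rule give the same decision at \(\alpha=0.05\); all variable names are illustrative:

```r
# Illustrative data (assumed, not the document's sample)
set.seed(2)
x <- rnorm(100, sd = 25000)
y <- rnorm(100, sd = 35000)

prueba <- var.test(x, y, alternative = "greater")
alfa <- 0.05  # significance level

# Components of the "htest" object returned by var.test()
estadistico <- unname(prueba$statistic)  # F = S_x^2 / S_y^2
gl <- unname(prueba$parameter)           # c(num df, denom df)
valor.p <- prueba$p.value

# Two equivalent decision rules for the right-tailed test
rechaza.por.p <- valor.p < alfa
rechaza.por.critico <- estadistico > qf(1 - alfa, gl[1], gl[2])
print(c(valor.p = valor.p, rechaza = rechaza.por.p))
```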

Visualizing the hypothesis test

library(visualize)
gl.m <- length(muestra[muestra$Genero=="mujeres","Ingreso"])-1
gl.h <- length(muestra[muestra$Genero=="hombres","Ingreso"])-1
valor.de.tabla <- qf(0.95, df1=gl.h, df2=gl.m)
valor.de.tabla

## [1] 1.394061

F = var(muestra[muestra$Genero=="hombres","Ingreso"])/var(muestra[muestra$Genero=="mujeres","Ingreso"])
F

## [1] 0.6172514

pf(F,gl.h,gl.m,lower.tail=FALSE)

## [1] 0.9914096

visualize.f(stat=valor.de.tabla,df1=gl.h,df2=gl.m,section="upper")
abline(v=F,col="red",lty=2,lwd=3)
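If the `visualize` package is not available, a similar picture can be drawn with base R graphics. A sketch under that assumption, hard-coding the degrees of freedom (99, 99) and the observed F = 0.6172514 printed above; the shaded area is the right-tail rejection region at \(\alpha = 0.05\), and the names `critico` and `xs` are illustrative:

```r
# Values printed above (assumed fixed here so the block is self-contained)
gl.h <- 99
gl.m <- 99
F.obs <- 0.6172514
alfa <- 0.05
critico <- qf(1 - alfa, gl.h, gl.m)  # right-tail critical value

# Density of the F(99, 99) distribution
curve(df(x, gl.h, gl.m), from = 0, to = 2.5, n = 500,
      xlab = "F", ylab = "Density", main = "Right-tailed F test")
# Shade the rejection region from the critical value to the end of the plot
xs <- seq(critico, 2.5, length.out = 200)
polygon(c(critico, xs, 2.5), c(0, df(xs, gl.h, gl.m), 0),
        col = "grey80", border = NA)
# Observed statistic
abline(v = F.obs, col = "red", lty = 2, lwd = 3)
```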

Hypotheses

\[H_0:\sigma_h^2=\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2=0\text{ versus }H_1:\sigma_h^2\neq\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2\neq0\]

var.test(x=muestra[muestra$Genero=="hombres","Ingreso"],y=muestra[muestra$Genero=="mujeres","Ingreso"],alternative="two.sided")

## 
##  F test to compare two variances
## 
## data:  muestra[muestra$Genero == "hombres", "Ingreso"] and muestra[muestra$Genero == "mujeres", "Ingreso"]
## F = 0.61725, num df = 99, denom df = 99, p-value = 0.01718
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4153124 0.9173799
## sample estimates:
## ratio of variances 
##          0.6172514
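For the two-sided alternative, the p-value is twice the smaller tail probability, and the 95% confidence interval for \(\sigma_h^2/\sigma_m^2\) is \(\left[F/F_{0.975},\,F/F_{0.025}\right]\), where \(F_p\) denotes the p-quantile of \(F_{(n_h-1,\,n_m-1)}\). A sketch that recomputes both from the values printed above (F = 0.6172514 with 99 and 99 degrees of freedom), for comparison with the `var.test()` output; the variable names are illustrative:

```r
# Values printed above by var.test(..., alternative = "two.sided")
F.obs <- 0.6172514
gl.h <- 99
gl.m <- 99

# Two-sided p-value: twice the smaller tail probability
p.inferior <- pf(F.obs, gl.h, gl.m)                      # P(F <= F.obs)
p.superior <- pf(F.obs, gl.h, gl.m, lower.tail = FALSE)  # P(F >= F.obs)
p.bilateral <- 2 * min(p.inferior, p.superior)

# 95% confidence interval for sigma_h^2 / sigma_m^2
ic <- F.obs / qf(c(0.975, 0.025), gl.h, gl.m)
print(p.bilateral)
print(ic)
```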

Visualizing the hypothesis test

library(visualize)
gl.m <- length(muestra[muestra$Genero=="mujeres","Ingreso"])-1
gl.h <- length(muestra[muestra$Genero=="hombres","Ingreso"])-1
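valor.de.tabla.1 <- qf(0.975, df1=gl.h, df2=gl.m)  # upper critical value: alpha/2 = 0.025 in the right tail
valor.de.tabla.1
# about 1.486 (the estimate 0.6172514 divided by the lower confidence limit 0.4153124 above)

valor.de.tabla.2 <- qf(0.025, df1=gl.h, df2=gl.m)  # lower critical value: alpha/2 = 0.025 in the left tail
valor.de.tabla.2
# about 0.673 (the estimate 0.6172514 divided by the upper confidence limit 0.9173799 above)

F = var(muestra[muestra$Genero=="hombres","Ingreso"])/var(muestra[muestra$Genero=="mujeres","Ingreso"])
F

## [1] 0.6172514

# two-sided p-value: twice the smaller tail probability (about 0.0172, matching var.test)
2*min(pf(F,gl.h,gl.m), pf(F,gl.h,gl.m,lower.tail=FALSE))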

visualize.f(stat=c(valor.de.tabla.2,valor.de.tabla.1),df1=gl.h,df2=gl.m,section="tails")

## Warning: Abnormal request for tails condition supplied on nonsymmetric distribution.

abline(v=F,col="red",lty=2,lwd=3)
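At significance level \(\alpha\), a two-sided test leaves \(\alpha/2\) in each tail, so its critical values are the \(\alpha/2\) and \(1-\alpha/2\) quantiles. The F distribution also satisfies \(F_p(d_1,d_2)=1/F_{1-p}(d_2,d_1)\), because the reciprocal of an \(F(d_1,d_2)\) variable follows \(F(d_2,d_1)\); this is how lower quantiles are read from printed F tables. A short check with \(\alpha=0.05\) and the degrees of freedom above:

```r
gl.h <- 99
gl.m <- 99
alfa <- 0.05

# Two-sided critical values: alpha/2 in each tail
criticos <- qf(c(alfa / 2, 1 - alfa / 2), gl.h, gl.m)
print(criticos)

# Reciprocal property: qf(p, d1, d2) = 1 / qf(1 - p, d2, d1)
inferior.por.reciproco <- 1 / qf(1 - alfa / 2, gl.m, gl.h)
print(inferior.por.reciproco)
```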

Hypotheses

\[H_0:\sigma_h^2\geq\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2\geq0\text{ versus }H_1:\sigma_h^2<\sigma_m^2\leftrightarrow\sigma_h^2-\sigma_m^2<0\]

var.test(x=muestra[muestra$Genero=="hombres","Ingreso"],y=muestra[muestra$Genero=="mujeres","Ingreso"],alternative="less")

## 
##  F test to compare two variances
## 
## data:  muestra[muestra$Genero == "hombres", "Ingreso"] and muestra[muestra$Genero == "mujeres", "Ingreso"]
## F = 0.61725, num df = 99, denom df = 99, p-value = 0.00859
## alternative hypothesis: true ratio of variances is less than 1
## 95 percent confidence interval:
##  0.0000000 0.8604862
## sample estimates:
## ratio of variances 
##          0.6172514
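For the left-tailed alternative, the p-value is the lower-tail probability \(P(F\le F_{obs})\), and the one-sided 95% interval for \(\sigma_h^2/\sigma_m^2\) is \([0,\,F/F_{0.05}]\). A sketch that recomputes both from the F and degrees of freedom printed above; the variable names are illustrative:

```r
F.obs <- 0.6172514
gl.h <- 99
gl.m <- 99

# Left-tailed test (H1: sigma_h^2 < sigma_m^2): p-value = P(F <= F.obs)
p.izquierda <- pf(F.obs, gl.h, gl.m)
# Upper limit of the one-sided 95% interval [0, F.obs / qf(0.05, ...)]
limite.superior <- F.obs / qf(0.05, gl.h, gl.m)
print(c(valor.p = p.izquierda, limite.superior = limite.superior))
```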

Visualizing the hypothesis test

library(visualize)
gl.m <- length(muestra[muestra$Genero=="mujeres","Ingreso"])-1
gl.h <- length(muestra[muestra$Genero=="hombres","Ingreso"])-1
valor.de.tabla <- qf(0.05, df1=gl.h, df2=gl.m)
valor.de.tabla

## [1] 0.7173286

F = var(muestra[muestra$Genero=="hombres","Ingreso"])/var(muestra[muestra$Genero=="mujeres","Ingreso"])
F

## [1] 0.6172514

# left-tailed test: the p-value is the lower-tail probability P(F <= F_obs),
# about 0.00859, matching var.test
pf(F,gl.h,gl.m)

visualize.f(stat=valor.de.tabla,df1=gl.h,df2=gl.m,section="lower")
abline(v=F,col="red",lty=2,lwd=3)

Exercises

Perform the hypothesis test at a significance level of 0.02
Perform the hypothesis test at a significance level of 0.03
Perform the hypothesis test at a significance level of 0.06
Perform the hypothesis test at a significance level of 0.07
Perform the hypothesis test at a significance level of 0.08

Hypothesis testing on the variance

M.Sc. Mario Gregorio Saavedra Rodríguez

2/5/2020
