StudentsPerformance <- read.csv("C:/Users/user/Desktop/StudentsPerformance.csv")
library(dplyr)
SP <- StudentsPerformance %>%
mutate(.,
gender=as.factor(gender),
race.ethnicity=as.factor(race.ethnicity),
parental.level.of.education=as.factor(parental.level.of.education),
lunch=as.factor(lunch),
test.preparation.course=as.factor(test.preparation.course),
math.score=as.numeric(math.score),
reading.score=as.numeric(reading.score),
writing.score=as.numeric(writing.score)
)
str(SP)
## 'data.frame': 1000 obs. of 8 variables:
## $ gender : Factor w/ 2 levels "female","male": 1 1 1 2 2 1 1 2 2 1 ...
## $ race.ethnicity : Factor w/ 5 levels "group A","group B",..: 2 3 2 1 3 2 2 2 4 2 ...
## $ parental.level.of.education: Factor w/ 6 levels "associate's degree",..: 2 5 4 1 5 1 5 5 3 3 ...
## $ lunch : Factor w/ 2 levels "free/reduced",..: 2 2 2 1 2 2 2 1 1 1 ...
## $ test.preparation.course : Factor w/ 2 levels "completed","none": 2 1 2 2 2 2 1 2 1 2 ...
## $ math.score : num 72 69 90 47 76 71 88 40 64 38 ...
## $ reading.score : num 72 90 95 57 78 83 95 43 64 60 ...
## $ writing.score : num 74 88 93 44 75 78 92 39 67 50 ...
Se evidencia que es una data de 1000 observaciones y 8 columnas. Con 5 variables de factor (cualitativas) y 3 de tipo numerico (cuantitativas)
# Margen de error o de precisión
MargenError <- seq(0, 0.1, by = 0.001) # Nivel de precisión hasta 10% (0.1)
# Cuál es el nivel de confianza de la muestra???
NC <- 0.95 # Nivel de confianza
# El valor del Z para el nivel de confianza
z <- qnorm((1 - NC) / 2, mean = 0, sd = 1) # Valor de Z
# Qué proporción de éxito es necesaria para la toma de la muestra??
P <- 0.5 # Probabilidad de éxito (es en la relacion p*q el valor mas alto posible, es decir la varianza mas alta)
# Cálculo de la muestra para población infinita
n0 <- ((z^2) * P * (1 - P)) / MargenError^2 # Cuando
# Cálculo de la muestra ajustada para población finita
N <- nrow(SP)
n <- n0 / (1 + (n0 / N))
# Creación del dataframe para el gráfico
data <- data.frame(MargenError, n=round(n, digits = 0) )
head(data, n = 10)
## MargenError n
## 1 0.000 NaN
## 2 0.001 999
## 3 0.002 996
## 4 0.003 991
## 5 0.004 984
## 6 0.005 975
## 7 0.006 964
## 8 0.007 951
## 9 0.008 938
## 10 0.009 922
# Gráfico que relaciona el tamaño de la muestra y el error obtenido
library(ggplot2)
ggplot(data, aes(x = n, y = MargenError)) +
geom_line() +
geom_point() +
labs(x = "Tamaño de muestra (n) | ajustado para población finita",
y = "Margen de error") +
ggtitle("Relación entre el tamaño de muestra y el margen de error") +
theme_classic()+
geom_vline(xintercept = 248, colour="red", size=0.5)+
geom_hline(yintercept = 0.054, colour="blue", size=0.5)
A partir de los datos de n y error se selecciono una muestra de 248
elementos, pues se encuentra en el punto de la grafica denominado “codo”
o donde las pendientes sufren el cambio mas drastico
# numero de muestras seleccionado
n <- 248
# Extraemos los ID de las observaciones
sam.MAS <- sample(nrow(SP), size = n, replace = TRUE)
# Extraemos los valores del dataframe a partir de los ID seleccionados
data.MAS <- SP[sam.MAS, ]
head(data.MAS, n = 10)
## gender race.ethnicity parental.level.of.education lunch
## 771 male group B high school standard
## 679 male group D associate's degree free/reduced
## 828 female group C some high school standard
## 621 female group C high school free/reduced
## 20 female group C associate's degree free/reduced
## 821 female group A some high school standard
## 417 male group C bachelor's degree standard
## 810 male group B bachelor's degree standard
## 92 male group C high school free/reduced
## 869 male group E associate's degree free/reduced
## test.preparation.course math.score reading.score writing.score
## 771 none 52 48 49
## 679 none 81 75 78
## 828 none 65 69 76
## 621 none 35 61 54
## 20 none 54 58 61
## 821 completed 85 90 92
## 417 completed 71 74 68
## 810 none 59 54 51
## 92 none 27 34 36
## 869 completed 78 74 72
Seleccione una base de datos de su interés y realice un análisis de muestreo (Prueba piloto) para seleccionar la muestra adecuada para un diseño sistemático y realice los cálculos de interés. ##
library(TeachingSampling)
library(ggplot2)
library(gridExtra)
tamaño <- 0.05 # se tomo 5% (se recomienda entre 5 a 10%)
n <- round(tamaño*N,0) # Numero de muestras a tomar
k <- round(N/n,0) # Cada cuanto debo seleccionar la muestra (depende del investigador)
# Extraccion de la muestra
set.seed(1234)
sam <- S.SY(N,k) # Filas de la muestra
data <- SP[sam,] # Seleccionar de la base de datos original las filas seleccionadas como muestra
str(data) # Revisar la estructura de la data muestreada
## 'data.frame': 50 obs. of 8 variables:
## $ gender : Factor w/ 2 levels "female","male": 1 2 1 2 2 2 2 1 1 2 ...
## $ race.ethnicity : Factor w/ 5 levels "group A","group B",..: 3 5 3 2 3 3 3 3 3 4 ...
## $ parental.level.of.education: Factor w/ 6 levels "associate's degree",..: 6 1 3 1 1 3 2 5 4 1 ...
## $ lunch : Factor w/ 2 levels "free/reduced",..: 2 2 1 1 1 2 2 2 2 2 ...
## $ test.preparation.course : Factor w/ 2 levels "completed","none": 2 1 2 2 1 2 2 1 1 2 ...
## $ math.score : num 69 81 33 44 78 84 58 70 81 61 ...
## $ reading.score : num 75 81 41 41 81 77 55 89 91 55 ...
## $ writing.score : num 78 79 43 38 82 74 48 88 87 52 ...
la muestra piloto cuenta con un total de 50, que equivale al 5% de la data original. Las observaciones seleccionadas de la muestra fueron: 16, 36, 56, 76, 96, 116, 136, 156, 176, 196, 216, 236, 256, 276, 296, 316, 336, 356, 376, 396, 416, 436, 456, 476, 496, 516, 536, 556, 576, 596, 616, 636, 656, 676, 696, 716, 736, 756, 776, 796, 816, 836, 856, 876, 896, 916, 936, 956, 976, 996y se evidencia que presentan un distanciamiento sistematico entre muestras.
# Valores necesarios para calcular tamaño de la muestra
Error.1 <- 0.10*mean(data$math.score) # Para calcular mi muestra admito un 10% de error
Var <- var(data[,6]) # varianza estimada con la columna "Numero de empelados"
MargenError <- seq(0, Error.1, by = Error.1/100)
NC <- 0.95
z <- abs(qnorm((1 - NC)/2, mean = 0, sd = 1))
n.0 <- round(((z^2) * Var) / MargenError^2, digits=0)
# Cálculos previos para población finita
n <- round(n.0 / (1 + (n.0 / N)), digits=0)
# Crear un data frame con los valores de n.0 y MargenError
data_plot_infinita <- data.frame(n = n.0, MargenError = MargenError)
data_plot_infinita
## n MargenError
## 1 Inf 0.00000
## 2 230710 0.06564
## 3 57677 0.13128
## 4 25634 0.19692
## 5 14419 0.26256
## 6 9228 0.32820
## 7 6409 0.39384
## 8 4708 0.45948
## 9 3605 0.52512
## 10 2848 0.59076
## 11 2307 0.65640
## 12 1907 0.72204
## 13 1602 0.78768
## 14 1365 0.85332
## 15 1177 0.91896
## 16 1025 0.98460
## 17 901 1.05024
## 18 798 1.11588
## 19 712 1.18152
## 20 639 1.24716
## 21 577 1.31280
## 22 523 1.37844
## 23 477 1.44408
## 24 436 1.50972
## 25 401 1.57536
## 26 369 1.64100
## 27 341 1.70664
## 28 316 1.77228
## 29 294 1.83792
## 30 274 1.90356
## 31 256 1.96920
## 32 240 2.03484
## 33 225 2.10048
## 34 212 2.16612
## 35 200 2.23176
## 36 188 2.29740
## 37 178 2.36304
## 38 169 2.42868
## 39 160 2.49432
## 40 152 2.55996
## 41 144 2.62560
## 42 137 2.69124
## 43 131 2.75688
## 44 125 2.82252
## 45 119 2.88816
## 46 114 2.95380
## 47 109 3.01944
## 48 104 3.08508
## 49 100 3.15072
## 50 96 3.21636
## 51 92 3.28200
## 52 89 3.34764
## 53 85 3.41328
## 54 82 3.47892
## 55 79 3.54456
## 56 76 3.61020
## 57 74 3.67584
## 58 71 3.74148
## 59 69 3.80712
## 60 66 3.87276
## 61 64 3.93840
## 62 62 4.00404
## 63 60 4.06968
## 64 58 4.13532
## 65 56 4.20096
## 66 55 4.26660
## 67 53 4.33224
## 68 51 4.39788
## 69 50 4.46352
## 70 48 4.52916
## 71 47 4.59480
## 72 46 4.66044
## 73 45 4.72608
## 74 43 4.79172
## 75 42 4.85736
## 76 41 4.92300
## 77 40 4.98864
## 78 39 5.05428
## 79 38 5.11992
## 80 37 5.18556
## 81 36 5.25120
## 82 35 5.31684
## 83 34 5.38248
## 84 33 5.44812
## 85 33 5.51376
## 86 32 5.57940
## 87 31 5.64504
## 88 30 5.71068
## 89 30 5.77632
## 90 29 5.84196
## 91 28 5.90760
## 92 28 5.97324
## 93 27 6.03888
## 94 27 6.10452
## 95 26 6.17016
## 96 26 6.23580
## 97 25 6.30144
## 98 25 6.36708
## 99 24 6.43272
## 100 24 6.49836
## 101 23 6.56400
# Crear un data frame con los valores de n y MargenError
data_plot_finita <- data.frame(n = n, MargenError = MargenError)
data_plot_finita
## n MargenError
## 1 NaN 0.00000
## 2 996 0.06564
## 3 983 0.13128
## 4 962 0.19692
## 5 935 0.26256
## 6 902 0.32820
## 7 865 0.39384
## 8 825 0.45948
## 9 783 0.52512
## 10 740 0.59076
## 11 698 0.65640
## 12 656 0.72204
## 13 616 0.78768
## 14 577 0.85332
## 15 541 0.91896
## 16 506 0.98460
## 17 474 1.05024
## 18 444 1.11588
## 19 416 1.18152
## 20 390 1.24716
## 21 366 1.31280
## 22 343 1.37844
## 23 323 1.44408
## 24 304 1.50972
## 25 286 1.57536
## 26 270 1.64100
## 27 254 1.70664
## 28 240 1.77228
## 29 227 1.83792
## 30 215 1.90356
## 31 204 1.96920
## 32 194 2.03484
## 33 184 2.10048
## 34 175 2.16612
## 35 167 2.23176
## 36 158 2.29740
## 37 151 2.36304
## 38 145 2.42868
## 39 138 2.49432
## 40 132 2.55996
## 41 126 2.62560
## 42 120 2.69124
## 43 116 2.75688
## 44 111 2.82252
## 45 106 2.88816
## 46 102 2.95380
## 47 98 3.01944
## 48 94 3.08508
## 49 91 3.15072
## 50 88 3.21636
## 51 84 3.28200
## 52 82 3.34764
## 53 78 3.41328
## 54 76 3.47892
## 55 73 3.54456
## 56 71 3.61020
## 57 69 3.67584
## 58 66 3.74148
## 59 65 3.80712
## 60 62 3.87276
## 61 60 3.93840
## 62 58 4.00404
## 63 57 4.06968
## 64 55 4.13532
## 65 53 4.20096
## 66 52 4.26660
## 67 50 4.33224
## 68 49 4.39788
## 69 48 4.46352
## 70 46 4.52916
## 71 45 4.59480
## 72 44 4.66044
## 73 43 4.72608
## 74 41 4.79172
## 75 40 4.85736
## 76 39 4.92300
## 77 38 4.98864
## 78 38 5.05428
## 79 37 5.11992
## 80 36 5.18556
## 81 35 5.25120
## 82 34 5.31684
## 83 33 5.38248
## 84 32 5.44812
## 85 32 5.51376
## 86 31 5.57940
## 87 30 5.64504
## 88 29 5.71068
## 89 29 5.77632
## 90 28 5.84196
## 91 27 5.90760
## 92 27 5.97324
## 93 26 6.03888
## 94 26 6.10452
## 95 25 6.17016
## 96 25 6.23580
## 97 24 6.30144
## 98 24 6.36708
## 99 23 6.43272
## 100 23 6.49836
## 101 22 6.56400
# Graficar tamaño de muestra para población infinita
plot_infinita <- ggplot(data = data_plot_infinita, aes(x = n, y = MargenError)) +
geom_line() +
labs(title = "Tamaño de muestra para población infinita",
x = "Tamaño de muestra (n)",
y = "Margen de error") +
theme_classic()
# Graficar tamaño de muestra para población finita
plot_finita <- ggplot(data = data_plot_finita, aes(x = n, y = MargenError)) +
geom_line() +
labs(title = "Tamaño de muestra para población finita",
x = "Tamaño de muestra (n)",
y = "Margen de error") +
theme_classic()+
geom_vline(xintercept = 240, colour="red", size=0.5)+
geom_hline(yintercept = 1.77228, colour="blue", size=0.5)
# Combinar las dos gráficas en una sola imagen
combined_plot <- grid.arrange(plot_infinita, plot_finita, nrow = 1)
## Tomamos la muestra de la base de datos, a partir del tamano de
muestra definitiva
# Seleccionar de la base de datos original las filas seleccionadas en el muestreo sistematico
set.seed(1234)
n.sis <- 240
# Calculo el tamaño del intervalo para cubrir el total de filas (nrow(SP))
# con el tamaño de muestra seleccionado (n.sis)
k.sis <- round( (n.sis/nrow(SP))*100, digits = 0)
# Determino aleatoriamente el valor desdee donde inicia el muestreo
arranque <- sample(1:k.sis, size = 1) # posición inicial
# Realizo el muestreo sistematico
sam.sis <- seq(arranque, arranque+k*(n.sis-1), k)
# Extraigo las observaciones a partir de la muestra sistematica seleccionada
data.sistematico <- SP[sam.sis,]
str(data.sistematico)
## 'data.frame': 240 obs. of 8 variables:
## $ gender : Factor w/ 2 levels "female","male": 1 2 1 2 2 2 2 1 1 2 ...
## $ race.ethnicity : Factor w/ 5 levels "group A","group B",..: 3 5 3 2 3 3 3 3 3 4 ...
## $ parental.level.of.education: Factor w/ 6 levels "associate's degree",..: 6 1 3 1 1 3 2 5 4 1 ...
## $ lunch : Factor w/ 2 levels "free/reduced",..: 2 2 1 1 1 2 2 2 2 2 ...
## $ test.preparation.course : Factor w/ 2 levels "completed","none": 2 1 2 2 1 2 2 1 1 2 ...
## $ math.score : num 69 81 33 44 78 84 58 70 81 61 ...
## $ reading.score : num 75 81 41 41 81 77 55 89 91 55 ...
## $ writing.score : num 78 79 43 38 82 74 48 88 87 52 ...
# Referencia: https://rpubs.com/luis_bolanos/922284
Seleccione una base de su interés donde contenga una variable cualitativa que se pueda estratificar y una variable cuantitativa que este asociada. Realice un análisis de muestreo (Prueba piloto) para seleccionar la muestra adecuada para un diseño estratificado y realice los cálculos de interés.
estratos <- levels(as.factor(SP$race.ethnicity) )
estrato1 <- subset(x=SP, race.ethnicity=="group A"); head(estrato1, n=10)
## gender race.ethnicity parental.level.of.education lunch
## 4 male group A associate's degree free/reduced
## 14 male group A some college standard
## 15 female group A master's degree standard
## 26 male group A master's degree free/reduced
## 47 female group A associate's degree standard
## 62 male group A some high school free/reduced
## 63 male group A associate's degree free/reduced
## 73 female group A associate's degree free/reduced
## 78 male group A bachelor's degree standard
## 83 male group A some college free/reduced
## test.preparation.course math.score reading.score writing.score
## 4 none 47 57 44
## 14 completed 78 72 70
## 15 none 50 53 58
## 26 none 73 74 72
## 47 completed 55 65 62
## 62 none 39 39 34
## 63 none 62 61 55
## 73 none 41 51 48
## 78 completed 80 78 81
## 83 completed 50 47 54
estrato2 <- subset(x=SP, race.ethnicity=="group B"); head(estrato2, n=10)
## gender race.ethnicity parental.level.of.education lunch
## 1 female group B bachelor's degree standard
## 3 female group B master's degree standard
## 6 female group B associate's degree standard
## 7 female group B some college standard
## 8 male group B some college free/reduced
## 10 female group B high school free/reduced
## 13 female group B high school standard
## 18 female group B some high school free/reduced
## 22 female group B some college free/reduced
## 27 male group B some college standard
## test.preparation.course math.score reading.score writing.score
## 1 none 72 72 74
## 3 none 90 95 93
## 6 none 71 83 78
## 7 completed 88 95 92
## 8 none 40 43 39
## 10 none 38 60 50
## 13 none 65 81 73
## 18 none 18 32 28
## 22 completed 65 75 70
## 27 none 69 54 55
estrato3 <- subset(x=SP, race.ethnicity=="group C"); head(estrato3, n=10)
## gender race.ethnicity parental.level.of.education lunch
## 2 female group C some college standard
## 5 male group C some college standard
## 11 male group C associate's degree standard
## 16 female group C some high school standard
## 17 male group C high school standard
## 19 male group C master's degree free/reduced
## 20 female group C associate's degree free/reduced
## 24 female group C some high school standard
## 28 female group C bachelor's degree standard
## 29 male group C high school standard
## test.preparation.course math.score reading.score writing.score
## 2 completed 69 90 88
## 5 none 76 78 75
## 11 none 58 54 52
## 16 none 69 75 78
## 17 none 88 89 86
## 19 completed 46 42 46
## 20 none 54 58 61
## 24 none 69 73 73
## 28 none 67 69 75
## 29 none 70 70 65
estrato4 <- subset(x=SP, race.ethnicity=="group D"); head(estrato4, n=10)
## gender race.ethnicity parental.level.of.education lunch
## 9 male group D high school free/reduced
## 12 male group D associate's degree standard
## 21 male group D high school standard
## 23 male group D some college standard
## 25 male group D bachelor's degree free/reduced
## 30 female group D master's degree standard
## 31 female group D some college standard
## 34 male group D some college standard
## 37 female group D associate's degree standard
## 38 female group D some high school free/reduced
## test.preparation.course math.score reading.score writing.score
## 9 completed 64 64 67
## 12 none 40 52 43
## 21 none 66 69 63
## 23 none 44 54 53
## 25 completed 74 71 80
## 30 none 62 70 75
## 31 none 69 74 74
## 34 none 40 42 38
## 37 none 74 81 83
## 38 none 50 64 59
estrato5 <- subset(x=SP, race.ethnicity=="group E"); head(estrato5, n=10)
## gender race.ethnicity parental.level.of.education lunch
## 33 female group E master's degree free/reduced
## 35 male group E some college standard
## 36 male group E associate's degree standard
## 45 female group E associate's degree free/reduced
## 51 male group E some college standard
## 52 male group E associate's degree free/reduced
## 57 female group E associate's degree standard
## 61 male group E bachelor's degree free/reduced
## 77 male group E some high school standard
## 80 female group E master's degree standard
## test.preparation.course math.score reading.score writing.score
## 33 none 56 72 65
## 35 none 97 87 82
## 36 completed 81 81 79
## 45 none 50 56 54
## 51 none 53 55 48
## 52 completed 77 69 68
## 57 completed 82 85 86
## 61 completed 79 74 72
## 77 none 30 26 22
## 80 none 62 68 68
Se definieron los estratos a partir de la variable cualitativarace.ethnicity porque es tiene mas de dos niveles y es una variable que puede agrupar de forma diferente las respuesta de la poblacion. sin embargo no se realizo una comparacion que determine que es la mejor para estratificar la población.
n_muestra <- nrow(SP)*0.1 #Se toma eñ 10% del total de la poblacion
n1 <- round(n_muestra * nrow(estrato1)/ nrow(SP), digits = 0); n1
## [1] 9
n2 <- round(n_muestra * nrow(estrato2)/ nrow(SP), digits = 0); n2
## [1] 19
n3 <- round(n_muestra * nrow(estrato3)/ nrow(SP), digits = 0); n3
## [1] 32
n4 <- round(n_muestra * nrow(estrato4)/ nrow(SP), digits = 0); n4
## [1] 26
n5 <- round(n_muestra * nrow(estrato5)/ nrow(SP), digits = 0); n5
## [1] 14
library(sampling)
# Tamaño total muestra piloto
sum(n1, n2, n3, n4, n5)
## [1] 100
estratos <- strata(SP,
stratanames = "race.ethnicity", # variable cualitativa que estratifica
size = c(n1, n2, n3, n4, n5), # vector con los tamanos estimados por estrato
method = "srswor") # simple random sampling without replacement
muestreado <- getdata(SP, estratos)
str(muestreado)
## 'data.frame': 100 obs. of 11 variables:
## $ gender : Factor w/ 2 levels "female","male": 1 1 2 2 2 1 2 2 2 1 ...
## $ parental.level.of.education: Factor w/ 6 levels "associate's degree",..: 3 6 1 3 5 1 5 5 5 6 ...
## $ lunch : Factor w/ 2 levels "free/reduced",..: 2 2 2 2 2 1 2 2 2 2 ...
## $ test.preparation.course : Factor w/ 2 levels "completed","none": 2 2 2 2 1 1 1 2 2 2 ...
## $ math.score : num 54 82 48 62 62 76 69 66 58 69 ...
## $ reading.score : num 64 82 43 55 66 94 77 65 50 75 ...
## $ writing.score : num 68 80 45 54 68 87 77 60 45 78 ...
## $ race.ethnicity : Factor w/ 5 levels "group A","group B",..: 2 2 2 2 2 2 2 2 2 3 ...
## $ ID_unit : int 495 506 566 683 685 716 760 775 835 16 ...
## $ Prob : num 0.0474 0.0474 0.0474 0.0474 0.0474 ...
## $ Stratum : int 1 1 1 1 1 1 1 1 1 2 ...
table(muestreado$race.ethnicity)
##
## group A group B group C group D group E
## 32 9 19 26 14
# Para una muestra total de 100 datos, estas son las muestras a tomar por cada estrato.
#Se extrae la media por grupo para la variable math.score
me1 <- mean(muestreado[muestreado$race.ethnicity == "group A","math.score"])
me2 <- mean(muestreado[muestreado$race.ethnicity == "group B","math.score"])
me3 <- mean(muestreado[muestreado$race.ethnicity == "group C","math.score"])
me4 <- mean(muestreado[muestreado$race.ethnicity == "group D","math.score"])
me5 <- mean(muestreado[muestreado$race.ethnicity == "group E","math.score"])
# Calculo de la media estimada total
media.est <- (1/nrow(SP))*(sum(SP$race.ethnicity == "group A")*me1+
sum(SP$race.ethnicity == "group B")*me2+
sum(SP$race.ethnicity == "group C")*me3+
sum(SP$race.ethnicity == "group D")*me4+
sum(SP$race.ethnicity == "group E")*me5)
cbind(me1, me2, me3, me4, me5, media.est)
## me1 me2 me3 me4 me5 media.est
## [1,] 60.09375 64.11111 60.57895 66.61538 74.5 64.73737
#A partir de la muestra estratificada piloto (media.est) calculamos el tamaño de la muestra
E <- 0.1*media.est # Error admitido, en este caso seleccionamos 10%
MargenError2 <- seq(0, E, by= E/20) # diferentes niveles de error que vamos a evaluar
NC2 <- 0.95 # Nivel de confianza
Z2 <- abs(qnorm((1-NC2)/2,mean = 0, sd=1)) # estimacion del valor en la distribucion Z
Nt <- nrow(SP) # Numero total de observaciones
#Calcula la varianza estimada de cada estratos
Var_st <-
((
sum(SP$race.ethnicity == "group A") / Nt) * var(muestreado[muestreado$race.ethnicity == "group A", "math.score"])
+ (sum(SP$race.ethnicity == "group B") / Nt) * var(muestreado[muestreado$race.ethnicity == "group B", "math.score"])
+ (sum(SP$race.ethnicity == "group C") / Nt) * var(muestreado[muestreado$race.ethnicity == "group C", "math.score"])
+ (sum(SP$race.ethnicity == "group D") / Nt) * var(muestreado[muestreado$race.ethnicity == "group D", "math.score"])
+ (sum(SP$race.ethnicity == "group E") / Nt) * var(muestreado[muestreado$race.ethnicity == "group E", "math.score"]) )
n0 <- Var_st * (1 / (MargenError2 / Z2)^2)
n <- n0 / (1 + n0 / Nt)
data <- data.frame(
MargenError2=round(MargenError2, digits = 3),
num.datos=round(n, digits = 0)
)
data
## MargenError2 num.datos
## 1 0.000 NaN
## 2 0.324 867
## 3 0.647 620
## 4 0.971 420
## 5 1.295 290
## 6 1.618 207
## 7 1.942 153
## 8 2.266 118
## 9 2.589 93
## 10 2.913 75
## 11 3.237 61
## 12 3.561 51
## 13 3.884 43
## 14 4.208 37
## 15 4.532 32
## 16 4.855 28
## 17 5.179 25
## 18 5.503 22
## 19 5.826 20
## 20 6.150 18
## 21 6.474 16
# Grafico
library(ggplot2)
ggplot(data, aes(y = MargenError2, x = num.datos) ) +
geom_line() +
geom_point() +
labs(y = "Margen de Error", x = "Tamaño de muestra") +
ggtitle("Relación entre Margen de Error y Tamaño de muestra") +
theme_classic()+
geom_hline(yintercept = 2.331, colour="red", size=0.5)+
geom_vline(xintercept = 130, colour="blue", size=0.5)
n.estra <- 130 # numero de muestras seleccionado
sam.estra <- sample(nrow(SP), size = n.estra, replace = TRUE)
data.estratificada <- SP[sam.estra, ]
str(data.estratificada)
## 'data.frame': 130 obs. of 8 variables:
## $ gender : Factor w/ 2 levels "female","male": 2 1 2 2 1 2 1 2 2 2 ...
## $ race.ethnicity : Factor w/ 5 levels "group A","group B",..: 2 4 4 3 5 4 2 4 3 3 ...
## $ parental.level.of.education: Factor w/ 6 levels "associate's degree",..: 4 6 3 1 3 2 2 3 2 4 ...
## $ lunch : Factor w/ 2 levels "free/reduced",..: 1 2 2 1 1 2 2 2 2 1 ...
## $ test.preparation.course : Factor w/ 2 levels "completed","none": 2 2 2 2 2 1 1 2 2 1 ...
## $ math.score : num 49 59 57 68 57 68 65 76 86 46 ...
## $ reading.score : num 53 67 50 65 58 74 81 73 83 42 ...
## $ writing.score : num 52 61 54 61 57 74 81 68 86 46 ...
# Resumen poblacion
summary(SP)
## gender race.ethnicity parental.level.of.education lunch
## female:518 group A: 89 associate's degree:222 free/reduced:355
## male :482 group B:190 bachelor's degree :118 standard :645
## group C:319 high school :196
## group D:262 master's degree : 59
## group E:140 some college :226
## some high school :179
## test.preparation.course math.score reading.score writing.score
## completed:358 Min. : 0.00 Min. : 17.00 Min. : 10.00
## none :642 1st Qu.: 57.00 1st Qu.: 59.00 1st Qu.: 57.75
## Median : 66.00 Median : 70.00 Median : 69.00
## Mean : 66.09 Mean : 69.17 Mean : 68.05
## 3rd Qu.: 77.00 3rd Qu.: 79.00 3rd Qu.: 79.00
## Max. :100.00 Max. :100.00 Max. :100.00
# Resumen muestra aleatoria simple
summary(data.MAS)
## gender race.ethnicity parental.level.of.education lunch
## female:120 group A:27 associate's degree:50 free/reduced: 77
## male :128 group B:48 bachelor's degree :24 standard :171
## group C:77 high school :47
## group D:56 master's degree :14
## group E:40 some college :52
## some high school :61
## test.preparation.course math.score reading.score writing.score
## completed: 93 Min. : 23.00 Min. : 24.00 Min. : 15.00
## none :155 1st Qu.: 59.00 1st Qu.: 58.00 1st Qu.: 56.75
## Median : 67.00 Median : 70.00 Median : 69.00
## Mean : 67.69 Mean : 69.88 Mean : 68.81
## 3rd Qu.: 78.00 3rd Qu.: 81.00 3rd Qu.: 80.00
## Max. :100.00 Max. :100.00 Max. :100.00
# Resumen muestra sistematica
summary(data.sistematico)
## gender race.ethnicity parental.level.of.education lunch
## female: 26 group A: 3 associate's degree: 12 free/reduced: 19
## male : 24 group B: 11 bachelor's degree : 8 standard : 31
## NA's :190 group C: 20 high school : 9 NA's :190
## group D: 6 master's degree : 3
## group E: 10 some college : 12
## NA's :190 some high school : 6
## NA's :190
## test.preparation.course math.score reading.score writing.score
## completed: 19 Min. :32.00 Min. :34.00 Min. :33.00
## none : 31 1st Qu.:57.25 1st Qu.:57.25 1st Qu.:58.25
## NA's :190 Median :67.50 Median :69.00 Median :69.50
## Mean :65.64 Mean :69.22 Mean :67.82
## 3rd Qu.:78.75 3rd Qu.:81.00 3rd Qu.:78.75
## Max. :97.00 Max. :99.00 Max. :96.00
## NA's :190 NA's :190 NA's :190
# Resumen muestra estratificada
summary(data.estratificada)
## gender race.ethnicity parental.level.of.education lunch
## female:60 group A:17 associate's degree:37 free/reduced:39
## male :70 group B:18 bachelor's degree :19 standard :91
## group C:41 high school :22
## group D:38 master's degree : 5
## group E:16 some college :30
## some high school :17
## test.preparation.course math.score reading.score writing.score
## completed:48 Min. : 28.00 Min. : 23.00 Min. : 19.00
## none :82 1st Qu.: 57.00 1st Qu.: 60.00 1st Qu.: 57.00
## Median : 66.50 Median : 72.00 Median : 70.00
## Mean : 66.48 Mean : 69.76 Mean : 68.27
## 3rd Qu.: 77.50 3rd Qu.: 81.00 3rd Qu.: 79.00
## Max. :100.00 Max. :100.00 Max. :100.00
# Resumen muestra estratificada por estratos
by(data = data.estratificada$math.score,
INDICES = data.estratificada$race.ethnicity,
FUN = summary)
## data.estratificada$race.ethnicity: group A
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 28.00 50.00 63.00 61.53 71.00 97.00
## ------------------------------------------------------------
## data.estratificada$race.ethnicity: group B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 38.00 57.25 65.00 64.83 71.50 88.00
## ------------------------------------------------------------
## data.estratificada$race.ethnicity: group C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 37.00 57.00 65.00 64.37 70.00 94.00
## ------------------------------------------------------------
## data.estratificada$race.ethnicity: group D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 57.25 68.50 66.42 78.75 93.00
## ------------------------------------------------------------
## data.estratificada$race.ethnicity: group E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 76.50 79.00 79.12 86.00 100.00
Seleccione una base de su interés donde contenga una variable cuantitativa y aplique los métodos para generar los estratificados. Seleccione la muestra adecuada y realice los cálculos de interés.