Se trabaja un prueba de modelos avanzados…
require(ggplot2)
require(sandwich)
require(msm)
library("ggplot2")
library("sandwich")
library("msm")
Cargo de datos…
p <- read.csv("https://stats.idre.ucla.edu/stat/data/poisson_sim.csv")
#p
summary(p)
id num_awards prog
Min. : 1.00 Min. :0.00 Min. :1.000
1st Qu.: 50.75 1st Qu.:0.00 1st Qu.:2.000
Median :100.50 Median :0.00 Median :2.000
Mean :100.50 Mean :0.63 Mean :2.025
3rd Qu.:150.25 3rd Qu.:1.00 3rd Qu.:2.250
Max. :200.00 Max. :6.00 Max. :3.000
math
Min. :33.00
1st Qu.:45.00
Median :52.00
Mean :52.65
3rd Qu.:59.00
Max. :75.00
Cambio de tipo de dato para las columnas
p <- within(p, {
prog <- factor(prog, levels=1:3, labels=c("General", "Academic",
"Vocational"))
id <- factor(id)
})
summary(p)
id num_awards prog math
1 : 1 Min. :0.00 General : 45 Min. :33.00
2 : 1 1st Qu.:0.00 Academic :105 1st Qu.:45.00
3 : 1 Median :0.00 Vocational: 50 Median :52.00
4 : 1 Mean :0.63 Mean :52.65
5 : 1 3rd Qu.:1.00 3rd Qu.:59.00
6 : 1 Max. :6.00 Max. :75.00
(Other):194
EDA: La siguiente tabla muestra el número promedio de premios por tipo de programa y otros EDA…
with(p, tapply(num_awards, prog, function(x) {
sprintf("M (SD) = %1.2f (%1.2f)", mean(x), sd(x))
}))
General Academic
"M (SD) = 0.20 (0.40)" "M (SD) = 1.00 (1.28)"
Vocational
"M (SD) = 0.24 (0.52)"
EDA: Histograma: Eje X: Nro de premios, Eje Y: Cantidad de Personas, Color: Programa (prog)
ggplot(p, aes(num_awards, fill = prog)) +
geom_histogram(binwidth=.5, position="dodge")
MODELAR A CONTINUACION.