ZEGEL IPAE
INFORMATION SYSTEMS DEVELOPMENT
Statistics Applied to Computing
Final Report
Author: Benjamin Joseph Huaman Eusebio
2024
The data come from a survey of secondary-school students enrolled in mathematics and Portuguese language courses. The dataset contains social, gender and study-related information about the students.
The proposed analysis covers a specific case related to education at two schools, Gabriel Pereira and Mousinho de Silveira.
This analysis aims to apply everything learned during this academic term, which includes:
Building frequency tables and representing data graphically. Statistical measures of central tendency and position will be calculated, and missing data and outliers will be addressed.
Variables will be transformed and normalized to improve the quality of the analysis. Finally, predictive models, including linear regression and logistic regression, will be applied in order to understand the relationships among the variables analyzed.
Data obtained from the Gabriel Pereira and Mousinho de Silveira schools
Evaluating the Relationship of Educational and Family Support with Students' Final Grades
This study seeks to understand the relationship between students' academic performance in the mathematics and Portuguese courses and three specific variables: extra educational support (schoolsup), family support (famsup), and participation in extra paid classes (paid).
Objective: to evaluate how extra educational support (schoolsup), family support (famsup) and extra paid classes (paid) correlate with final grades.
Population: the secondary-school students of the Gabriel Pereira and Mousinho de Silveira schools.
Sample: the students who took part in the survey, from the mathematics and Portuguese courses, in secondary education at the Gabriel Pereira and Mousinho de Silveira schools.
Unit of analysis: a secondary-school student enrolled in the mathematics course, the Portuguese course, or both.
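One way to quantify the relationship described in the objective is the point-biserial correlation, which is Pearson's correlation between the grade and a 0/1 coding of each yes/no variable. The sketch below is illustrative only: the `toy` data frame and its values are invented for the example and are not survey results; only the column names mirror those of the dataset.

```r
# Toy data for illustration only (not taken from the survey).
toy <- data.frame(
  G3        = c(10, 12, 8, 15, 11, 9, 14, 13),
  schoolsup = c("yes", "no", "yes", "no", "no", "yes", "no", "no"),
  famsup    = c("yes", "yes", "no", "yes", "no", "no", "yes", "yes"),
  paid      = c("no", "no", "no", "yes", "no", "yes", "yes", "no")
)

support_vars <- c("schoolsup", "famsup", "paid")

# Code "yes" as 1 and "no" as 0, then take Pearson's r against G3.
point_biserial <- sapply(support_vars, function(v) {
  cor(toy$G3, as.integer(toy[[v]] == "yes"))
})

print(round(point_biserial, 3))
```

A negative value means that students with that kind of support tend to have lower final grades in the data at hand; it says nothing about whether the support causes the difference.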
These grades relate to the course subject, Mathematics or Portuguese:
# Combine the student data we will use.
# Students in the Portuguese language course
d1 <- read.csv("student-por.csv")
# Students in the Mathematics course
d2 <- read.csv("student-mat.csv")
# Number of rows in the first dataset
nrow(d1)
## [1] 649
# Number of rows in the second dataset
nrow(d2)
## [1] 395
# Names of all the variables
names(d1)
## [1] "school" "sex" "age" "address" "famsize"
## [6] "Pstatus" "Medu" "Fedu" "Mjob" "Fjob"
## [11] "reason" "guardian" "traveltime" "studytime" "failures"
## [16] "schoolsup" "famsup" "paid" "activities" "nursery"
## [21] "higher" "internet" "romantic" "famrel" "freetime"
## [26] "goout" "Dalc" "Walc" "health" "absences"
## [31] "G1" "G2" "G3"
d3 <- rbind(d1, d2)
# Show the first rows
head(d3)
##   school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
## 1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
## 2     GP   F  17       U     GT3       T    1    1  at_home    other     course
## 3     GP   F  15       U     LE3       T    1    1  at_home    other      other
## 4     GP   F  15       U     GT3       T    4    2   health services       home
## 5     GP   F  16       U     GT3       T    3    3    other    other       home
## 6     GP   M  16       U     LE3       T    4    3 services    other reputation
##   guardian traveltime studytime failures schoolsup famsup paid activities
## 1   mother          2         2        0       yes     no   no         no
## 2   father          1         2        0        no    yes   no         no
## 3   mother          1         2        0       yes     no   no         no
## 4   mother          1         3        0        no    yes   no        yes
## 5   father          1         2        0        no    yes   no         no
## 6   mother          1         2        0        no    yes   no        yes
##   nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1     yes    yes       no       no      4        3     4    1    1      3
## 2      no    yes      yes       no      5        3     3    1    1      3
## 3     yes    yes      yes       no      4        3     2    2    3      3
## 4     yes    yes      yes      yes      3        2     2    1    1      5
## 5     yes    yes       no       no      4        3     2    1    2      5
## 6     yes    yes      yes       no      5        4     2    1    2      5
##   absences G1 G2 G3
## 1        4  0 11 11
## 2        2  9 11 11
## 3        6 12 13 12
## 4        0 14 14 14
## 5        0 11 13 13
## 6        6 12 12 13
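Because the `rbind()` call above stacks the two files, a row in `d3` no longer records which course its grades belong to, and a student enrolled in both courses may appear twice. A minimal sketch of one way to keep that information is shown below. The data frames `por` and `mat` are invented stand-ins for `d1` and `d2`, and the `subject` column is a hypothetical name, not something present in the original files.

```r
# Toy stand-ins for d1 (Portuguese) and d2 (Mathematics); values are invented.
por <- data.frame(school = c("GP", "MS"), G3 = c(11, 14))
mat <- data.frame(school = c("GP", "GP", "MS"), G3 = c(6, 10, 15))

# rbind() requires both data frames to have the same columns.
stopifnot(identical(names(por), names(mat)))

# Tag each row with its course before stacking (hypothetical column name).
por$subject <- "portuguese"
mat$subject <- "mathematics"
combined <- rbind(por, mat)

print(table(combined$subject))
```

With such a column in place, later summaries could be split by course, for example with `tapply(combined$G3, combined$subject, mean)`.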
# Load the libraries
# Frequency tables for quantitative and qualitative data
library(epiDisplay)
# Frequency tables for quantitative data
library(agricolae)
tab1(d3$school, graph = FALSE)
## d3$school :
## Frequency Percent Cum. percent
## GP 772 73.9 73.9
## MS 272 26.1 100.0
## Total 1044 100.0 100.0
tab1(d3$age, graph = FALSE)
## d3$age :
## Frequency Percent Cum. percent
## 15 194 18.6 18.6
## 16 281 26.9 45.5
## 17 277 26.5 72.0
## 18 222 21.3 93.3
## 19 56 5.4 98.7
## 20 9 0.9 99.5
## 21 3 0.3 99.8
## 22 2 0.2 100.0
## Total 1044 100.0 100.0
(table.freq(hist(d3$age, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 15.0 15.5 15.25 194 18.6 194 18.6
## 2 15.5 16.0 15.75 281 26.9 475 45.5
## 3 16.0 16.5 16.25 0 0.0 475 45.5
## 4 16.5 17.0 16.75 277 26.5 752 72.0
## 5 17.0 17.5 17.25 0 0.0 752 72.0
## 6 17.5 18.0 17.75 222 21.3 974 93.3
## 7 18.0 18.5 18.25 0 0.0 974 93.3
## 8 18.5 19.0 18.75 56 5.4 1030 98.7
## 9 19.0 19.5 19.25 0 0.0 1030 98.7
## 10 19.5 20.0 19.75 9 0.9 1039 99.5
## 11 20.0 20.5 20.25 0 0.0 1039 99.5
## 12 20.5 21.0 20.75 3 0.3 1042 99.8
## 13 21.0 21.5 21.25 0 0.0 1042 99.8
## 14 21.5 22.0 21.75 2 0.2 1044 100.0
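The empty classes in the table above are a side effect of applying Sturges' rule to a variable that only takes integer values. The rule suggests ceiling(log2(n) + 1) classes, which is 12 for n = 1044; `hist()` treats that number as a suggestion and rounds the breaks to convenient values, which here gives 14 classes of width 0.5, six of which cannot contain any age. The sketch below rebuilds the age vector from the counts in the `tab1` output and compares the histogram classes with a plain `table()`. It uses only base R; the names `ages` and `freq_table` are introduced for the example.

```r
# Rebuild the age vector from the frequencies reported by tab1(d3$age).
age_counts <- c(`15` = 194, `16` = 281, `17` = 277, `18` = 222,
                `19` = 56, `20` = 9, `21` = 3, `22` = 2)
ages <- rep(as.integer(names(age_counts)), times = age_counts)

# Number of classes suggested by Sturges' rule: ceiling(log2(n) + 1).
k <- nclass.Sturges(ages)

# hist() treats k as a suggestion and picks rounded break points.
h <- hist(ages, plot = FALSE, breaks = "Sturges")

# For a discrete variable, one row per distinct value avoids empty classes.
freq <- table(ages)
freq_table <- data.frame(
  age     = as.integer(names(freq)),
  n       = as.vector(freq),
  pct     = round(100 * as.vector(freq) / length(ages), 1),
  cum_pct = round(100 * cumsum(as.vector(freq)) / length(ages), 1)
)

print(k)
print(sum(h$counts == 0))
print(freq_table)
```

For variables with many distinct values, such as `absences` or the grades, the histogram-based table remains the more compact choice.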
tab1(d3$address, graph = FALSE)
## d3$address :
## Frequency Percent Cum. percent
## R 285 27.3 27.3
## U 759 72.7 100.0
## Total 1044 100.0 100.0
tab1(d3$famsize, graph = FALSE)
## d3$famsize :
## Frequency Percent Cum. percent
## GT3 738 70.7 70.7
## LE3 306 29.3 100.0
## Total 1044 100.0 100.0
tab1(d3$Pstatus, graph = FALSE)
## d3$Pstatus :
## Frequency Percent Cum. percent
## A 121 11.6 11.6
## T 923 88.4 100.0
## Total 1044 100.0 100.0
(table.freq(hist(d3$Medu, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0.0 0.5 0.25 9 0.9 9 0.9
## 2 0.5 1.0 0.75 202 19.3 211 20.2
## 3 1.0 1.5 1.25 0 0.0 211 20.2
## 4 1.5 2.0 1.75 289 27.7 500 47.9
## 5 2.0 2.5 2.25 0 0.0 500 47.9
## 6 2.5 3.0 2.75 238 22.8 738 70.7
## 7 3.0 3.5 3.25 0 0.0 738 70.7
## 8 3.5 4.0 3.75 306 29.3 1044 100.0
(table.freq(hist(d3$Fedu, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0.0 0.5 0.25 9 0.9 9 0.9
## 2 0.5 1.0 0.75 256 24.5 265 25.4
## 3 1.0 1.5 1.25 0 0.0 265 25.4
## 4 1.5 2.0 1.75 324 31.0 589 56.4
## 5 2.0 2.5 2.25 0 0.0 589 56.4
## 6 2.5 3.0 2.75 231 22.1 820 78.5
## 7 3.0 3.5 3.25 0 0.0 820 78.5
## 8 3.5 4.0 3.75 224 21.5 1044 100.0
tab1(d3$Mjob, graph = FALSE)
## d3$Mjob :
## Frequency Percent Cum. percent
## at_home 194 18.6 18.6
## health 82 7.9 26.4
## other 399 38.2 64.7
## services 239 22.9 87.5
## teacher 130 12.5 100.0
## Total 1044 100.0 100.0
tab1(d3$Fjob, graph = FALSE)
## d3$Fjob :
## Frequency Percent Cum. percent
## at_home 62 5.9 5.9
## health 41 3.9 9.9
## other 584 55.9 65.8
## services 292 28.0 93.8
## teacher 65 6.2 100.0
## Total 1044 100.0 100.0
tab1(d3$reason, graph = FALSE)
## d3$reason :
## Frequency Percent Cum. percent
## course 430 41.2 41.2
## home 258 24.7 65.9
## other 108 10.3 76.2
## reputation 248 23.8 100.0
## Total 1044 100.0 100.0
tab1(d3$guardian, graph = FALSE)
## d3$guardian :
## Frequency Percent Cum. percent
## father 243 23.3 23.3
## mother 728 69.7 93.0
## other 73 7.0 100.0
## Total 1044 100.0 100.0
(table.freq(hist(d3$traveltime, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.2 1.1 623 59.7 623 59.7
## 2 1.2 1.4 1.3 0 0.0 623 59.7
## 3 1.4 1.6 1.5 0 0.0 623 59.7
## 4 1.6 1.8 1.7 0 0.0 623 59.7
## 5 1.8 2.0 1.9 320 30.7 943 90.3
## 6 2.0 2.2 2.1 0 0.0 943 90.3
## 7 2.2 2.4 2.3 0 0.0 943 90.3
## 8 2.4 2.6 2.5 0 0.0 943 90.3
## 9 2.6 2.8 2.7 0 0.0 943 90.3
## 10 2.8 3.0 2.9 77 7.4 1020 97.7
## 11 3.0 3.2 3.1 0 0.0 1020 97.7
## 12 3.2 3.4 3.3 0 0.0 1020 97.7
## 13 3.4 3.6 3.5 0 0.0 1020 97.7
## 14 3.6 3.8 3.7 0 0.0 1020 97.7
## 15 3.8 4.0 3.9 24 2.3 1044 100.0
(table.freq(hist(d3$studytime, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.2 1.1 317 30.4 317 30.4
## 2 1.2 1.4 1.3 0 0.0 317 30.4
## 3 1.4 1.6 1.5 0 0.0 317 30.4
## 4 1.6 1.8 1.7 0 0.0 317 30.4
## 5 1.8 2.0 1.9 503 48.2 820 78.5
## 6 2.0 2.2 2.1 0 0.0 820 78.5
## 7 2.2 2.4 2.3 0 0.0 820 78.5
## 8 2.4 2.6 2.5 0 0.0 820 78.5
## 9 2.6 2.8 2.7 0 0.0 820 78.5
## 10 2.8 3.0 2.9 162 15.5 982 94.1
## 11 3.0 3.2 3.1 0 0.0 982 94.1
## 12 3.2 3.4 3.3 0 0.0 982 94.1
## 13 3.4 3.6 3.5 0 0.0 982 94.1
## 14 3.6 3.8 3.7 0 0.0 982 94.1
## 15 3.8 4.0 3.9 62 5.9 1044 100.0
(table.freq(hist(d3$failures, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0.0 0.2 0.1 861 82.5 861 82.5
## 2 0.2 0.4 0.3 0 0.0 861 82.5
## 3 0.4 0.6 0.5 0 0.0 861 82.5
## 4 0.6 0.8 0.7 0 0.0 861 82.5
## 5 0.8 1.0 0.9 120 11.5 981 94.0
## 6 1.0 1.2 1.1 0 0.0 981 94.0
## 7 1.2 1.4 1.3 0 0.0 981 94.0
## 8 1.4 1.6 1.5 0 0.0 981 94.0
## 9 1.6 1.8 1.7 0 0.0 981 94.0
## 10 1.8 2.0 1.9 33 3.2 1014 97.1
## 11 2.0 2.2 2.1 0 0.0 1014 97.1
## 12 2.2 2.4 2.3 0 0.0 1014 97.1
## 13 2.4 2.6 2.5 0 0.0 1014 97.1
## 14 2.6 2.8 2.7 0 0.0 1014 97.1
## 15 2.8 3.0 2.9 30 2.9 1044 100.0
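`traveltime`, `studytime` and `failures` take only a handful of integer values, so a labelled factor may be easier to read than fifteen histogram classes that are mostly empty. The sketch below rebuilds `studytime` from the frequencies in its table above. The level labels follow the public description of the dataset (1: under 2 hours, 2: 2 to 5 hours, 3: 5 to 10 hours, 4: over 10 hours per week) and should be treated as an assumption, since this report does not list them.

```r
# Frequencies of studytime codes 1..4, copied from the table above.
studytime_counts <- c(317, 503, 162, 62)
studytime <- rep(1:4, times = studytime_counts)

# Assumed meaning of the codes (weekly study time); not stated in this report.
studytime_f <- factor(
  studytime,
  levels  = 1:4,
  labels  = c("<2 h", "2-5 h", "5-10 h", ">10 h"),
  ordered = TRUE
)

counts <- table(studytime_f)
print(cbind(n = counts, pct = round(100 * prop.table(counts), 1)))
```

On the real data the same conversion could be applied directly to `d3$studytime`, leaving the numeric column untouched for use in regression models.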
tab1(d3$schoolsup, graph = FALSE)
## d3$schoolsup :
## Frequency Percent Cum. percent
## no 925 88.6 88.6
## yes 119 11.4 100.0
## Total 1044 100.0 100.0
tab1(d3$famsup, graph = FALSE)
## d3$famsup :
## Frequency Percent Cum. percent
## no 404 38.7 38.7
## yes 640 61.3 100.0
## Total 1044 100.0 100.0
tab1(d3$paid, graph = FALSE)
## d3$paid :
## Frequency Percent Cum. percent
## no 824 78.9 78.9
## yes 220 21.1 100.0
## Total 1044 100.0 100.0
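Since `schoolsup`, `famsup` and `paid` are the three variables of interest in this study, it can help to see their distributions side by side. The sketch below condenses the three `tab1()` outputs above into a single matrix, using the counts exactly as reported; the object names are introduced for the example.

```r
# "no"/"yes" counts copied from the three tab1() outputs above.
support_counts <- rbind(
  schoolsup = c(no = 925, yes = 119),
  famsup    = c(no = 404, yes = 640),
  paid      = c(no = 824, yes = 220)
)

# Row-wise percentages: share of records without and with each support.
support_pct <- round(100 * prop.table(support_counts, margin = 1), 1)

print(cbind(support_counts, pct_yes = support_pct[, "yes"]))
```

The groups are quite unbalanced for `schoolsup` and `paid`, which is worth keeping in mind when comparing grades between the "yes" and "no" groups.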
tab1(d3$activities, graph = FALSE)
## d3$activities :
## Frequency Percent Cum. percent
## no 528 50.6 50.6
## yes 516 49.4 100.0
## Total 1044 100.0 100.0
tab1(d3$nursery, graph = FALSE)
## d3$nursery :
## Frequency Percent Cum. percent
## no 209 20 20
## yes 835 80 100
## Total 1044 100 100
tab1(d3$higher, graph = FALSE)
## d3$higher :
## Frequency Percent Cum. percent
## no 89 8.5 8.5
## yes 955 91.5 100.0
## Total 1044 100.0 100.0
tab1(d3$internet, graph = FALSE)
## d3$internet :
## Frequency Percent Cum. percent
## no 217 20.8 20.8
## yes 827 79.2 100.0
## Total 1044 100.0 100.0
tab1(d3$romantic, graph = FALSE)
## d3$romantic :
## Frequency Percent Cum. percent
## no 673 64.5 64.5
## yes 371 35.5 100.0
## Total 1044 100.0 100.0
(table.freq(hist(d3$famrel, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.5 1.25 30 2.9 30 2.9
## 2 1.5 2.0 1.75 47 4.5 77 7.4
## 3 2.0 2.5 2.25 0 0.0 77 7.4
## 4 2.5 3.0 2.75 169 16.2 246 23.6
## 5 3.0 3.5 3.25 0 0.0 246 23.6
## 6 3.5 4.0 3.75 512 49.0 758 72.6
## 7 4.0 4.5 4.25 0 0.0 758 72.6
## 8 4.5 5.0 4.75 286 27.4 1044 100.0
(table.freq(hist(d3$freetime, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.5 1.25 64 6.1 64 6.1
## 2 1.5 2.0 1.75 171 16.4 235 22.5
## 3 2.0 2.5 2.25 0 0.0 235 22.5
## 4 2.5 3.0 2.75 408 39.1 643 61.6
## 5 3.0 3.5 3.25 0 0.0 643 61.6
## 6 3.5 4.0 3.75 293 28.1 936 89.7
## 7 4.0 4.5 4.25 0 0.0 936 89.7
## 8 4.5 5.0 4.75 108 10.3 1044 100.0
(table.freq(hist(d3$goout, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.5 1.25 71 6.8 71 6.8
## 2 1.5 2.0 1.75 248 23.8 319 30.6
## 3 2.0 2.5 2.25 0 0.0 319 30.6
## 4 2.5 3.0 2.75 335 32.1 654 62.6
## 5 3.0 3.5 3.25 0 0.0 654 62.6
## 6 3.5 4.0 3.75 227 21.7 881 84.4
## 7 4.0 4.5 4.25 0 0.0 881 84.4
## 8 4.5 5.0 4.75 163 15.6 1044 100.0
(table.freq(hist(d3$Dalc, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.5 1.25 727 69.6 727 69.6
## 2 1.5 2.0 1.75 196 18.8 923 88.4
## 3 2.0 2.5 2.25 0 0.0 923 88.4
## 4 2.5 3.0 2.75 69 6.6 992 95.0
## 5 3.0 3.5 3.25 0 0.0 992 95.0
## 6 3.5 4.0 3.75 26 2.5 1018 97.5
## 7 4.0 4.5 4.25 0 0.0 1018 97.5
## 8 4.5 5.0 4.75 26 2.5 1044 100.0
(table.freq(hist(d3$Walc, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.5 1.25 398 38.1 398 38.1
## 2 1.5 2.0 1.75 235 22.5 633 60.6
## 3 2.0 2.5 2.25 0 0.0 633 60.6
## 4 2.5 3.0 2.75 200 19.2 833 79.8
## 5 3.0 3.5 3.25 0 0.0 833 79.8
## 6 3.5 4.0 3.75 138 13.2 971 93.0
## 7 4.0 4.5 4.25 0 0.0 971 93.0
## 8 4.5 5.0 4.75 73 7.0 1044 100.0
(table.freq(hist(d3$health, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 1.0 1.5 1.25 137 13.1 137 13.1
## 2 1.5 2.0 1.75 123 11.8 260 24.9
## 3 2.0 2.5 2.25 0 0.0 260 24.9
## 4 2.5 3.0 2.75 215 20.6 475 45.5
## 5 3.0 3.5 3.25 0 0.0 475 45.5
## 6 3.5 4.0 3.75 174 16.7 649 62.2
## 7 4.0 4.5 4.25 0 0.0 649 62.2
## 8 4.5 5.0 4.75 395 37.8 1044 100.0
(table.freq(hist(d3$absences, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0 5 2.5 727 69.6 727 69.6
## 2 5 10 7.5 202 19.3 929 89.0
## 3 10 15 12.5 61 5.8 990 94.8
## 4 15 20 17.5 31 3.0 1021 97.8
## 5 20 25 22.5 12 1.1 1033 98.9
## 6 25 30 27.5 5 0.5 1038 99.4
## 7 30 35 32.5 1 0.1 1039 99.5
## 8 35 40 37.5 2 0.2 1041 99.7
## 9 40 45 42.5 0 0.0 1041 99.7
## 10 45 50 47.5 0 0.0 1041 99.7
## 11 50 55 52.5 1 0.1 1042 99.8
## 12 55 60 57.5 1 0.1 1043 99.9
## 13 60 65 62.5 0 0.0 1043 99.9
## 14 65 70 67.5 0 0.0 1043 99.9
## 15 70 75 72.5 1 0.1 1044 100.0
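The absences table shows a long right tail: 929 of the 1044 records (89.0%) have at most 10 absences, while a few exceed 50. A common way to flag such values is Tukey's rule, which marks points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies it to a small invented vector (`x` is not survey data, and `flag_outliers` is a helper written for this example); on the real data the same function could be called on `d3$absences`.

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR] (Tukey's rule, k = 1.5).
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Invented example values, not taken from the survey.
x <- c(0, 2, 4, 4, 6, 8, 10, 75)
is_outlier <- flag_outliers(x)

print(x[is_outlier])
```

Flagged values are candidates for inspection rather than automatic removal; a very high absence count may be a genuine observation.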
(table.freq(hist(d3$G1, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0 2 1 1 0.1 1 0.1
## 2 2 4 3 4 0.4 5 0.5
## 3 4 6 5 45 4.3 50 4.8
## 4 6 8 7 153 14.7 203 19.4
## 5 8 10 9 242 23.2 445 42.6
## 6 10 12 11 247 23.7 692 66.3
## 7 12 14 13 206 19.7 898 86.0
## 8 14 16 15 103 9.9 1001 95.9
## 9 16 18 17 39 3.7 1040 99.6
## 10 18 20 19 4 0.4 1044 100.0
(table.freq(hist(d3$G2, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0 2 1 20 1.9 20 1.9
## 2 2 4 3 1 0.1 21 2.0
## 3 4 6 5 39 3.7 60 5.7
## 4 6 8 7 109 10.4 169 16.2
## 5 8 10 9 251 24.0 420 40.2
## 6 10 12 11 265 25.4 685 65.6
## 7 12 14 13 194 18.6 879 84.2
## 8 14 16 15 110 10.5 989 94.7
## 9 16 18 17 51 4.9 1040 99.6
## 10 18 20 19 4 0.4 1044 100.0
(table.freq(hist(d3$G3, plot = FALSE, breaks = "Sturges")))
## Lower Upper Main Frequency Percentage CF CPF
## 1 0 2 1 54 5.2 54 5.2
## 2 2 4 3 1 0.1 55 5.3
## 3 4 6 5 26 2.5 81 7.8
## 4 6 8 7 86 8.2 167 16.0
## 5 8 10 9 216 20.7 383 36.7
## 6 10 12 11 254 24.3 637 61.0
## 7 12 14 13 203 19.4 840 80.5
## 8 14 16 15 134 12.8 974 93.3
## 9 16 18 17 62 5.9 1036 99.2
## 10 18 20 19 8 0.8 1044 100.0
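A measure of central tendency can be approximated directly from a grouped table by weighting each class midpoint (`Main`) by its frequency. The sketch below does this for `G3` with the values from the table above. Because `hist()` uses right-closed classes such as (8, 10], which hold the integer grades 9 and 10, the midpoint 9 sits below the grades it represents, so the result likely understates the value that `mean(d3$G3)` would give.

```r
# Class midpoints and frequencies copied from the G3 table above.
midpoint  <- seq(1, 19, by = 2)
frequency <- c(54, 1, 26, 86, 216, 254, 203, 134, 62, 8)

n <- sum(frequency)

# Grouped mean: frequency-weighted average of the class midpoints.
grouped_mean <- sum(midpoint * frequency) / n

# Modal class: the class with the highest frequency.
modal_midpoint <- midpoint[which.max(frequency)]

print(c(n = n,
        grouped_mean = round(grouped_mean, 2),
        modal_midpoint = modal_midpoint))
```

The grouped mean comes out at about 10.9 and the modal class is (10, 12]. When the raw data are available, as they are here, computing the statistics on `d3$G3` directly is preferable; the grouped version is mainly useful for checking a published table.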