The dataset used is Student Performance (Mathematics).
This analysis examines differences in student achievement by school, with studytime as the covariate and G1, G2, and G3 as the grade indicators.
library(car)
library(biotools)
This section reads the data, inspects its structure, and sets the types of the variables used in the analysis.
data <- read.csv2(
"E:/FILE SEMESTER 4/DATA MULTIVARIAT/tugas 3_Modul Anova_Manova/student+performance/student/student-mat.csv",
header = TRUE
)
str(data)
## 'data.frame': 395 obs. of 33 variables:
## $ school : chr "GP" "GP" "GP" "GP" ...
## $ sex : chr "F" "F" "F" "F" ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : chr "U" "U" "U" "U" ...
## $ famsize : chr "GT3" "GT3" "LE3" "GT3" ...
## $ Pstatus : chr "A" "T" "T" "T" ...
## $ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
## $ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
## $ Mjob : chr "at_home" "at_home" "at_home" "health" ...
## $ Fjob : chr "teacher" "other" "other" "services" ...
## $ reason : chr "course" "course" "other" "home" ...
## $ guardian : chr "mother" "father" "mother" "mother" ...
## $ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 3 0 0 0 0 0 0 0 ...
## $ schoolsup : chr "yes" "no" "yes" "no" ...
## $ famsup : chr "no" "yes" "no" "yes" ...
## $ paid : chr "no" "no" "yes" "yes" ...
## $ activities: chr "no" "no" "no" "yes" ...
## $ nursery : chr "yes" "no" "yes" "yes" ...
## $ higher : chr "yes" "yes" "yes" "yes" ...
## $ internet : chr "no" "yes" "yes" "yes" ...
## $ romantic : chr "no" "no" "no" "yes" ...
## $ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : int 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
## $ health : int 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 6 4 10 2 4 10 0 6 0 0 ...
## $ G1 : int 5 5 7 15 6 15 12 6 16 14 ...
## $ G2 : int 6 5 8 14 10 15 12 5 18 15 ...
## $ G3 : int 6 6 10 15 10 15 11 6 19 15 ...
head(data)
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason
## 1 GP F 18 U GT3 A 4 4 at_home teacher course
## 2 GP F 17 U GT3 T 1 1 at_home other course
## 3 GP F 15 U LE3 T 1 1 at_home other other
## 4 GP F 15 U GT3 T 4 2 health services home
## 5 GP F 16 U GT3 T 3 3 other other home
## 6 GP M 16 U LE3 T 4 3 services other reputation
## guardian traveltime studytime failures schoolsup famsup paid activities
## 1 mother 2 2 0 yes no no no
## 2 father 1 2 0 no yes no no
## 3 mother 1 2 3 yes no yes no
## 4 mother 1 3 0 no yes yes yes
## 5 father 1 2 0 no yes yes no
## 6 mother 1 2 0 no yes yes yes
## nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1 yes yes no no 4 3 4 1 1 3
## 2 no yes yes no 5 3 3 1 1 3
## 3 yes yes yes no 4 3 2 2 3 3
## 4 yes yes yes yes 3 2 2 1 1 5
## 5 yes yes no no 4 3 2 1 2 5
## 6 yes yes yes no 5 4 2 1 2 5
## absences G1 G2 G3
## 1 6 5 6 6
## 2 4 5 5 6
## 3 10 7 8 10
## 4 2 15 14 15
## 5 4 6 10 10
## 6 10 15 15 15
dim(data)
## [1] 395 33
names(data)
## [1] "school" "sex" "age" "address" "famsize"
## [6] "Pstatus" "Medu" "Fedu" "Mjob" "Fjob"
## [11] "reason" "guardian" "traveltime" "studytime" "failures"
## [16] "schoolsup" "famsup" "paid" "activities" "nursery"
## [21] "higher" "internet" "romantic" "famrel" "freetime"
## [26] "goout" "Dalc" "Walc" "health" "absences"
## [31] "G1" "G2" "G3"
data$school <- as.factor(data$school)
data$studytime <- as.numeric(data$studytime)
data$G1 <- as.numeric(data$G1)
data$G2 <- as.numeric(data$G2)
data$G3 <- as.numeric(data$G3)
str(data)
## 'data.frame': 395 obs. of 33 variables:
## $ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : chr "F" "F" "F" "F" ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : chr "U" "U" "U" "U" ...
## $ famsize : chr "GT3" "GT3" "LE3" "GT3" ...
## $ Pstatus : chr "A" "T" "T" "T" ...
## $ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
## $ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
## $ Mjob : chr "at_home" "at_home" "at_home" "health" ...
## $ Fjob : chr "teacher" "other" "other" "services" ...
## $ reason : chr "course" "course" "other" "home" ...
## $ guardian : chr "mother" "father" "mother" "mother" ...
## $ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : num 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 3 0 0 0 0 0 0 0 ...
## $ schoolsup : chr "yes" "no" "yes" "no" ...
## $ famsup : chr "no" "yes" "no" "yes" ...
## $ paid : chr "no" "no" "yes" "yes" ...
## $ activities: chr "no" "no" "no" "yes" ...
## $ nursery : chr "yes" "no" "yes" "yes" ...
## $ higher : chr "yes" "yes" "yes" "yes" ...
## $ internet : chr "no" "yes" "yes" "yes" ...
## $ romantic : chr "no" "no" "no" "yes" ...
## $ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : int 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
## $ health : int 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 6 4 10 2 4 10 0 6 0 0 ...
## $ G1 : num 5 5 7 15 6 15 12 6 16 14 ...
## $ G2 : num 6 5 8 14 10 15 12 5 18 15 ...
## $ G3 : num 6 6 10 15 10 15 11 6 19 15 ...
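The type conversions above can also be written in a single pass. The following is a minimal sketch on a small hypothetical data frame (`toy`, which is not the real file) whose columns mirror the raw types shown by str(); the name `num_cols` is an assumption introduced only for this illustration.

```r
# Hypothetical stand-in for the real data: same column names and raw types.
toy <- data.frame(
  school    = c("GP", "GP", "MS"),
  studytime = c(2L, 3L, 1L),
  G1        = c(5L, 15L, 12L),
  G2        = c(6L, 14L, 12L),
  G3        = c(6L, 15L, 11L),
  stringsAsFactors = FALSE
)

# The grouping variable becomes a factor; covariate and grades become numeric.
num_cols <- c("studytime", "G1", "G2", "G3")
toy$school <- as.factor(toy$school)
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

str(toy)
```

Keeping the column names in one vector makes it harder to miss a variable when the list of responses changes.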
This section gives an overview of the data before further analysis.
summary(data[, c("studytime", "G1", "G2", "G3")])
## studytime G1 G2 G3
## Min. :1.000 Min. : 3.00 Min. : 0.00 Min. : 0.00
## 1st Qu.:1.000 1st Qu.: 8.00 1st Qu.: 9.00 1st Qu.: 8.00
## Median :2.000 Median :11.00 Median :11.00 Median :11.00
## Mean :2.035 Mean :10.91 Mean :10.71 Mean :10.42
## 3rd Qu.:2.000 3rd Qu.:13.00 3rd Qu.:13.00 3rd Qu.:14.00
## Max. :4.000 Max. :19.00 Max. :19.00 Max. :20.00
table(data$school)
##
## GP MS
## 349 46
aggregate(cbind(G1, G2, G3) ~ school, data = data, mean)
## school G1 G2 G3
## 1 GP 10.93983 10.78223 10.489971
## 2 MS 10.67391 10.19565 9.847826
aggregate(cbind(G1, G2, G3) ~ school, data = data, sd)
## school G1 G2 G3
## 1 GP 3.319109 3.808434 4.625397
## 2 MS 3.347001 3.377175 4.237229
ANOVA is used to test whether the mean final grade (G3) differs between the school groups.
anova_model <- aov(G3 ~ school, data = data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 17 16.76 0.798 0.372
## Residuals 393 8253 21.00
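The F value in the table above can be reproduced from the group sizes, means, and standard deviations already reported in the descriptive statistics. The sketch below only re-does that arithmetic with the printed values, so small differences from the table are due to rounding; the variable names are chosen for illustration.

```r
# Values copied from the table() and aggregate() output for G3.
n <- c(GP = 349, MS = 46)               # group sizes
m <- c(GP = 10.489971, MS = 9.847826)   # group means
s <- c(GP = 4.625397, MS = 4.237229)    # group standard deviations

# Between-group and within-group sums of squares.
grand_mean <- sum(n * m) / sum(n)
ss_between <- sum(n * (m - grand_mean)^2)
ss_within  <- sum((n - 1) * s^2)

# Degrees of freedom, F statistic, and upper-tail p-value.
df_between <- length(n) - 1
df_within  <- sum(n) - length(n)
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within,
        F = f_value, p = p_value), 3)
```

The between-group sum of squares is small relative to the within-group one, which is why the F value stays below 1.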
shapiro.test(residuals(anova_model))
##
## Shapiro-Wilk normality test
##
## data: residuals(anova_model)
## W = 0.93032, p-value = 1.297e-12
leveneTest(G3 ~ school, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.8377 0.3606
## 393
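To show what the median-centred Levene test measures, the sketch below rebuilds it with base R: take each observation's absolute deviation from its group median, then run a one-way ANOVA on those deviations. The data are simulated and hypothetical (not the student data), and the names `toy`, `abs_dev`, and `levene_tab` are assumptions for illustration. To my understanding this matches the computation behind leveneTest() with center = median, but it is meant as an explanation rather than a replacement.

```r
set.seed(1)

# Simulated toy data (hypothetical): two groups of unequal size.
toy <- data.frame(
  group = factor(rep(c("A", "B"), times = c(30, 12))),
  y     = c(rnorm(30, mean = 10, sd = 4), rnorm(12, mean = 10, sd = 4))
)

# Absolute deviation of each observation from its own group median.
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the absolute deviations.
levene_tab <- anova(lm(abs_dev ~ group, data = toy))
levene_tab
```

A large F here would mean that the typical spread around the median differs between groups.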
boxplot(G3 ~ school, data = data,
        col = c("skyblue", "pink"),
        main = "Boxplot of G3 by School",
        xlab = "School",
        ylab = "Final Grade (G3)")
ANCOVA is used to test whether the final grade (G3) differs by school after controlling for the effect of studytime.
ancova_model <- aov(G3 ~ school + studytime, data = data)
summary(ancova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 17 16.76 0.803 0.3707
## studytime 1 73 73.27 3.511 0.0617 .
## Residuals 392 8180 20.87
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
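Note that summary() on an aov fit reports sequential sums of squares, so in G3 ~ school + studytime the school line is tested before studytime enters the model (its Sum Sq of 16.76 is the same as in the one-way ANOVA). One way to obtain a school test that is adjusted for the covariate is to enter the covariate first, or to request marginal tests with drop1(); Anova() from the car package with type = 2 is another option. The sketch below uses simulated, hypothetical data in which only the variable names mirror this report.

```r
set.seed(42)

# Simulated toy data (hypothetical), not the student data.
n <- 120
toy <- data.frame(
  school    = factor(sample(c("GP", "MS"), n, replace = TRUE,
                            prob = c(0.8, 0.2))),
  studytime = sample(1:4, n, replace = TRUE)
)
toy$G3 <- 9 + 0.6 * toy$studytime + rnorm(n, sd = 4)

# Covariate entered first: the school line is now adjusted for studytime.
fit_adjusted <- aov(G3 ~ studytime + school, data = toy)
seq_tab <- summary(fit_adjusted)[[1]]

# Marginal F tests: each term is tested given all the other terms.
marg_tab <- drop1(fit_adjusted, test = "F")

seq_tab
marg_tab
```

In this ordering the school F value from summary() and from drop1() coincide, because school is the last term entered.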
# Refit with the school:studytime interaction to check homogeneity of regression slopes
ancova_check <- aov(G3 ~ school * studytime, data = data)
summary(ancova_check)
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 17 16.76 0.807 0.3696
## studytime 1 73 73.27 3.528 0.0611 .
## school:studytime 1 59 59.43 2.862 0.0915 .
## Residuals 391 8120 20.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
shapiro.test(residuals(ancova_model))
##
## Shapiro-Wilk normality test
##
## data: residuals(ancova_model)
## W = 0.93742, p-value = 7.792e-12
leveneTest(G3 ~ school, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.8377 0.3606
## 393
cols <- c("blue", "red")
plot(data$studytime, data$G3,
     col = cols[data$school],
     pch = 19,
     xlab = "Studytime",
     ylab = "G3",
     main = "Scatter Plot of G3 and Studytime by School")
legend("topleft",
legend = c("GP", "MS"),
col = c("blue", "red"),
pch = 19)
abline(lm(G3 ~ studytime, data = subset(data, school == "GP")),
col = "blue", lwd = 2)
abline(lm(G3 ~ studytime, data = subset(data, school == "MS")),
col = "red", lwd = 2)
MANOVA is used to test whether the combination of G1, G2, and G3 differs between schools.
manova_model <- manova(cbind(G1, G2, G3) ~ school, data = data)
summary(manova_model, test = "Wilks")
## Df Wilks approx F num Df den Df Pr(>F)
## school 1 0.99643 0.46712 3 391 0.7054
## Residuals 393
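The approx F column follows from Wilks' lambda. For an effect with a single hypothesis degree of freedom, such as school with two levels, the conversion reduces to the simple form below. This sketch re-does the arithmetic with the printed values, so the result matches the table only up to the rounding of lambda; the variable names are chosen for illustration.

```r
# Values copied from the MANOVA output for the school effect.
wilks  <- 0.99643
num_df <- 3    # three responses (G1, G2, G3) times one hypothesis df
den_df <- 391  # residual df (393) minus number of responses (3) plus 1

# F transformation of Wilks' lambda for a one-df effect.
f_approx <- (1 - wilks) / wilks * den_df / num_df
p_value  <- pf(f_approx, num_df, den_df, lower.tail = FALSE)

round(c(F = f_approx, p = p_value), 4)
```

A lambda this close to 1 means that school accounts for almost none of the joint variation in the three grades.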
summary.aov(manova_model)
## Response G1 :
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 2.9 2.8739 0.2604 0.6102
## Residuals 393 4337.8 11.0378
##
## Response G2 :
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 14.0 13.984 0.9883 0.3208
## Residuals 393 5560.7 14.149
##
## Response G3 :
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 16.8 16.759 0.798 0.3722
## Residuals 393 8253.1 21.000
cor(data[, c("G1", "G2", "G3")])
## G1 G2 G3
## G1 1.0000000 0.8521181 0.8014679
## G2 0.8521181 1.0000000 0.9048680
## G3 0.8014679 0.9048680 1.0000000
boxM(data[, c("G1", "G2", "G3")], data$school)
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: data[, c("G1", "G2", "G3")]
## Chi-Sq (approx.) = 28.297, df = 6, p-value = 8.261e-05
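The degrees of freedom and p-value in the Box's M output can be checked by hand: with p responses and g groups, the chi-square approximation has p(p + 1)(g - 1)/2 degrees of freedom. The sketch below uses the printed statistic; the variable names are chosen for illustration.

```r
p <- 3  # responses: G1, G2, G3
g <- 2  # groups: GP, MS

# Degrees of freedom of the chi-square approximation.
df_boxm <- p * (p + 1) * (g - 1) / 2

# Statistic copied from the output above.
chi_sq  <- 28.297
p_value <- pchisq(chi_sq, df = df_boxm, lower.tail = FALSE)

c(df = df_boxm, p = signif(p_value, 4))
```

Because the p-value is far below 0.05, the equal-covariance assumption is rejected for these data, which is worth keeping in mind when reading the multivariate tests.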
par(mfrow = c(1, 3))
boxplot(G1 ~ school, data = data,
        col = c("lightblue", "lightpink"),
        main = "G1 by School",
        xlab = "School", ylab = "G1")
boxplot(G2 ~ school, data = data,
        col = c("lightblue", "lightpink"),
        main = "G2 by School",
        xlab = "School", ylab = "G2")
boxplot(G3 ~ school, data = data,
        col = c("lightblue", "lightpink"),
        main = "G3 by School",
        xlab = "School", ylab = "G3")
par(mfrow = c(1, 1))
MANCOVA is used to test whether the combination of G1, G2, and G3 differs between schools after controlling for the effect of studytime.
mancova_model <- manova(cbind(G1, G2, G3) ~ school + studytime, data = data)
summary(mancova_model, test = "Wilks")
## Df Wilks approx F num Df den Df Pr(>F)
## school 1 0.99642 0.4675 3 390 0.705090
## studytime 1 0.96949 4.0908 3 390 0.007053 **
## Residuals 392
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.aov(mancova_model)
## Response G1 :
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 2.9 2.874 0.2664 0.606020
## studytime 1 109.6 109.646 10.1654 0.001546 **
## Residuals 392 4228.2 10.786
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response G2 :
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 14.0 13.984 1.0033 0.317125
## studytime 1 97.0 96.959 6.9564 0.008684 **
## Residuals 392 5463.7 13.938
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response G3 :
## Df Sum Sq Mean Sq F value Pr(>F)
## school 1 16.8 16.759 0.8031 0.3707
## studytime 1 73.3 73.268 3.5112 0.0617 .
## Residuals 392 8179.9 20.867
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxM(data[, c("G1", "G2", "G3")], data$school)
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: data[, c("G1", "G2", "G3")]
## Chi-Sq (approx.) = 28.297, df = 6, p-value = 8.261e-05
par(mfrow = c(1, 3))
plot(data$studytime, data$G1,
col = cols[data$school], pch = 19,
main = "G1 vs Studytime",
xlab = "Studytime", ylab = "G1")
legend("topleft", legend = levels(data$school), col = cols, pch = 19)
plot(data$studytime, data$G2,
col = cols[data$school], pch = 19,
main = "G2 vs Studytime",
xlab = "Studytime", ylab = "G2")
legend("topleft", legend = levels(data$school), col = cols, pch = 19)
plot(data$studytime, data$G3,
col = cols[data$school], pch = 19,
main = "G3 vs Studytime",
xlab = "Studytime", ylab = "G3")
legend("topleft", legend = levels(data$school), col = cols, pch = 19)
par(mfrow = c(1, 1))
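Taken together, the four methods assess whether student achievement differs by school, for a single dependent variable or for several at once, with or without controlling for studytime. In the output above, the school effect is not significant at the 0.05 level in any of the four models (ANOVA p = 0.372, ANCOVA p = 0.371, MANOVA and MANCOVA Wilks p = 0.705), whereas studytime is significant in the MANCOVA (Wilks p = 0.007). These results should be read with caution, because the Shapiro-Wilk and Box's M tests reject the normality and equal-covariance assumptions.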