Introduction

The dataset used is Student Performance (Mathematics).
This analysis examines differences in student performance by school, with studytime as the covariate and G1, G2, and G3 as the grade indicators.

Setup

library(car)       # provides leveneTest()
library(biotools)  # provides boxM()

Reading and Preparing the Data

This section reads the data, inspects its structure, and sets the types of the variables used in the analysis.

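# read.csv2() expects ";" as the field separator, which matches this file.
# The path below is an absolute local path; adjust it to your own machine.
data <- read.csv2(
  "E:/FILE SEMESTER 4/DATA MULTIVARIAT/tugas 3_Modul Anova_Manova/student+performance/student/student-mat.csv",
  header = TRUE
)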

str(data)
## 'data.frame':    395 obs. of  33 variables:
##  $ school    : chr  "GP" "GP" "GP" "GP" ...
##  $ sex       : chr  "F" "F" "F" "F" ...
##  $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
##  $ address   : chr  "U" "U" "U" "U" ...
##  $ famsize   : chr  "GT3" "GT3" "LE3" "GT3" ...
##  $ Pstatus   : chr  "A" "T" "T" "T" ...
##  $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
##  $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
##  $ Mjob      : chr  "at_home" "at_home" "at_home" "health" ...
##  $ Fjob      : chr  "teacher" "other" "other" "services" ...
##  $ reason    : chr  "course" "course" "other" "home" ...
##  $ guardian  : chr  "mother" "father" "mother" "mother" ...
##  $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
##  $ studytime : int  2 2 2 3 2 2 2 2 2 2 ...
##  $ failures  : int  0 0 3 0 0 0 0 0 0 0 ...
##  $ schoolsup : chr  "yes" "no" "yes" "no" ...
##  $ famsup    : chr  "no" "yes" "no" "yes" ...
##  $ paid      : chr  "no" "no" "yes" "yes" ...
##  $ activities: chr  "no" "no" "no" "yes" ...
##  $ nursery   : chr  "yes" "no" "yes" "yes" ...
##  $ higher    : chr  "yes" "yes" "yes" "yes" ...
##  $ internet  : chr  "no" "yes" "yes" "yes" ...
##  $ romantic  : chr  "no" "no" "no" "yes" ...
##  $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
##  $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
##  $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
##  $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
##  $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
##  $ absences  : int  6 4 10 2 4 10 0 6 0 0 ...
##  $ G1        : int  5 5 7 15 6 15 12 6 16 14 ...
##  $ G2        : int  6 5 8 14 10 15 12 5 18 15 ...
##  $ G3        : int  6 6 10 15 10 15 11 6 19 15 ...
head(data)
##   school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
## 1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
## 2     GP   F  17       U     GT3       T    1    1  at_home    other     course
## 3     GP   F  15       U     LE3       T    1    1  at_home    other      other
## 4     GP   F  15       U     GT3       T    4    2   health services       home
## 5     GP   F  16       U     GT3       T    3    3    other    other       home
## 6     GP   M  16       U     LE3       T    4    3 services    other reputation
##   guardian traveltime studytime failures schoolsup famsup paid activities
## 1   mother          2         2        0       yes     no   no         no
## 2   father          1         2        0        no    yes   no         no
## 3   mother          1         2        3       yes     no  yes         no
## 4   mother          1         3        0        no    yes  yes        yes
## 5   father          1         2        0        no    yes  yes         no
## 6   mother          1         2        0        no    yes  yes        yes
##   nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1     yes    yes       no       no      4        3     4    1    1      3
## 2      no    yes      yes       no      5        3     3    1    1      3
## 3     yes    yes      yes       no      4        3     2    2    3      3
## 4     yes    yes      yes      yes      3        2     2    1    1      5
## 5     yes    yes       no       no      4        3     2    1    2      5
## 6     yes    yes      yes       no      5        4     2    1    2      5
##   absences G1 G2 G3
## 1        6  5  6  6
## 2        4  5  5  6
## 3       10  7  8 10
## 4        2 15 14 15
## 5        4  6 10 10
## 6       10 15 15 15
dim(data)
## [1] 395  33
names(data)
##  [1] "school"     "sex"        "age"        "address"    "famsize"   
##  [6] "Pstatus"    "Medu"       "Fedu"       "Mjob"       "Fjob"      
## [11] "reason"     "guardian"   "traveltime" "studytime"  "failures"  
## [16] "schoolsup"  "famsup"     "paid"       "activities" "nursery"   
## [21] "higher"     "internet"   "romantic"   "famrel"     "freetime"  
## [26] "goout"      "Dalc"       "Walc"       "health"     "absences"  
## [31] "G1"         "G2"         "G3"
# school is the grouping factor; studytime and the grades are treated as numeric
data$school <- as.factor(data$school)
data$studytime <- as.numeric(data$studytime)
data$G1 <- as.numeric(data$G1)
data$G2 <- as.numeric(data$G2)
data$G3 <- as.numeric(data$G3)

str(data)
## 'data.frame':    395 obs. of  33 variables:
##  $ school    : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex       : chr  "F" "F" "F" "F" ...
##  $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
##  $ address   : chr  "U" "U" "U" "U" ...
##  $ famsize   : chr  "GT3" "GT3" "LE3" "GT3" ...
##  $ Pstatus   : chr  "A" "T" "T" "T" ...
##  $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
##  $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
##  $ Mjob      : chr  "at_home" "at_home" "at_home" "health" ...
##  $ Fjob      : chr  "teacher" "other" "other" "services" ...
##  $ reason    : chr  "course" "course" "other" "home" ...
##  $ guardian  : chr  "mother" "father" "mother" "mother" ...
##  $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
##  $ studytime : num  2 2 2 3 2 2 2 2 2 2 ...
##  $ failures  : int  0 0 3 0 0 0 0 0 0 0 ...
##  $ schoolsup : chr  "yes" "no" "yes" "no" ...
##  $ famsup    : chr  "no" "yes" "no" "yes" ...
##  $ paid      : chr  "no" "no" "yes" "yes" ...
##  $ activities: chr  "no" "no" "no" "yes" ...
##  $ nursery   : chr  "yes" "no" "yes" "yes" ...
##  $ higher    : chr  "yes" "yes" "yes" "yes" ...
##  $ internet  : chr  "no" "yes" "yes" "yes" ...
##  $ romantic  : chr  "no" "no" "no" "yes" ...
##  $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
##  $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
##  $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
##  $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
##  $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
##  $ absences  : int  6 4 10 2 4 10 0 6 0 0 ...
##  $ G1        : num  5 5 7 15 6 15 12 6 16 14 ...
##  $ G2        : num  6 5 8 14 10 15 12 5 18 15 ...
##  $ G3        : num  6 6 10 15 10 15 11 6 19 15 ...
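
The five conversions above can also be written more compactly. The following is a minimal sketch on a small hypothetical data frame (`toy` and its values are made up for illustration and are not part of the original data):

```r
# Hypothetical stand-in for the real data frame (values are invented).
toy <- data.frame(
  school    = c("GP", "GP", "MS", "MS"),
  studytime = c(2L, 3L, 1L, 2L),
  G1        = c(5L, 15L, 12L, 9L),
  G2        = c(6L, 14L, 12L, 10L),
  G3        = c(6L, 15L, 11L, 10L),
  stringsAsFactors = FALSE
)

# Convert the grouping variable to a factor ...
toy$school <- factor(toy$school)

# ... and all numeric analysis variables in a single step.
num_cols <- c("studytime", "G1", "G2", "G3")
toy[num_cols] <- lapply(toy[num_cols], as.numeric)

str(toy)
```

Keeping the column names in one vector (`num_cols`, a name chosen here) makes it easy to reuse the same list later, for example in `summary()`.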

Data Description

This section gives an overview of the data before any further analysis.

summary(data[, c("studytime", "G1", "G2", "G3")])
##    studytime           G1              G2              G3       
##  Min.   :1.000   Min.   : 3.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.:1.000   1st Qu.: 8.00   1st Qu.: 9.00   1st Qu.: 8.00  
##  Median :2.000   Median :11.00   Median :11.00   Median :11.00  
##  Mean   :2.035   Mean   :10.91   Mean   :10.71   Mean   :10.42  
##  3rd Qu.:2.000   3rd Qu.:13.00   3rd Qu.:13.00   3rd Qu.:14.00  
##  Max.   :4.000   Max.   :19.00   Max.   :19.00   Max.   :20.00
table(data$school)
## 
##  GP  MS 
## 349  46
aggregate(cbind(G1, G2, G3) ~ school, data = data, mean)
##   school       G1       G2        G3
## 1     GP 10.93983 10.78223 10.489971
## 2     MS 10.67391 10.19565  9.847826
aggregate(cbind(G1, G2, G3) ~ school, data = data, sd)
##   school       G1       G2       G3
## 1     GP 3.319109 3.808434 4.625397
## 2     MS 3.347001 3.377175 4.237229
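
Because the two schools differ greatly in size (349 vs 46), it can help to see the mean, standard deviation, and group size side by side. The sketch below does this in one `aggregate()` call on simulated data; `toy` and the simulated grades are assumptions for illustration only.

```r
# Simulated stand-in data with unbalanced groups (not the real dataset).
set.seed(1)
toy <- data.frame(
  school = factor(rep(c("GP", "MS"), times = c(30, 10))),
  G1 = round(rnorm(40, mean = 11, sd = 3)),
  G2 = round(rnorm(40, mean = 11, sd = 3.5)),
  G3 = round(rnorm(40, mean = 10, sd = 4.5))
)

# FUN returns a named vector, so each response becomes a small matrix
# with columns mean, sd, and n in the result.
desc <- aggregate(
  cbind(G1, G2, G3) ~ school,
  data = toy,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)
desc
```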

1. ANOVA

Description

ANOVA is used to determine whether the final grade (G3) differs by school.

Purpose

The purpose of this analysis is to test whether the mean of G3 differs between the school groups.

Analysis Code

anova_model <- aov(G3 ~ school, data = data)
summary(anova_model)
##              Df Sum Sq Mean Sq F value Pr(>F)
## school        1     17   16.76   0.798  0.372
## Residuals   393   8253   21.00
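
The F test says whether a difference is detectable, not how large it is. As a rough complement, eta squared (the share of the total sum of squares attributed to school) can be computed from the table. The sketch below uses the more precise sums of squares that `summary.aov()` prints for G3 later in this document (16.759 and 8253.1); the variable names are chosen here for illustration.

```r
# Sums of squares for G3, copied from the summary.aov() output in this document.
ss_school    <- 16.759
ss_residuals <- 8253.1

# Eta squared: proportion of total variation in G3 associated with school.
eta_sq <- ss_school / (ss_school + ss_residuals)
eta_sq
```

The result is about 0.002, that is, school accounts for roughly 0.2% of the variation in G3 in this sample.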

Assumption Checks

shapiro.test(residuals(anova_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(anova_model)
## W = 0.93032, p-value = 1.297e-12
leveneTest(G3 ~ school, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.8377 0.3606
##       393
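
In the output above, Shapiro-Wilk rejects normality of the residuals (p = 1.297e-12), while Levene's test does not reject equal variances (p = 0.3606). One possible sensitivity check, not part of the original analysis, is a rank-based Kruskal-Wallis test, which does not assume normal residuals. The sketch below runs it on simulated data; `toy` and its values are assumptions.

```r
# Simulated stand-in data (not the real dataset).
set.seed(123)
toy <- data.frame(
  school = factor(rep(c("GP", "MS"), times = c(80, 20))),
  G3 = c(pmax(0, round(rnorm(80, mean = 10.5, sd = 4.6))),
         pmax(0, round(rnorm(20, mean = 9.8,  sd = 4.2))))
)

# Rank-based alternative to one-way ANOVA; with two groups it is
# equivalent to the Wilcoxon rank-sum test.
kw <- kruskal.test(G3 ~ school, data = toy)
kw
```

On the real data the call would be `kruskal.test(G3 ~ school, data = data)`; if it agrees with the ANOVA, the conclusion is less likely to be driven by the non-normal residuals.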

Visualization

boxplot(G3 ~ school, data = data,
        col = c("skyblue", "pink"),
        main = "Boxplot of G3 by School",
        xlab = "School",
        ylab = "Final Grade (G3)")

2. ANCOVA

Description

ANCOVA is used to determine whether the final grade (G3) differs by school after controlling for the effect of studytime.

Purpose

The purpose of this analysis is to test whether the difference in G3 between schools persists once studytime is taken into account.

Analysis Code

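# Note: aov() reports sequential (Type I) sums of squares. With school listed
# first, its F test is not adjusted for studytime; only studytime is adjusted for school.
ancova_model <- aov(G3 ~ school + studytime, data = data)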
summary(ancova_model)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## school        1     17   16.76   0.803 0.3707  
## studytime     1     73   73.27   3.511 0.0617 .
## Residuals   392   8180   20.87                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
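
In the table above the sum of squares for school (16.76) is the same as in the one-way ANOVA, because `aov()` tests terms in the order they are written. To test school after adjusting for studytime, one option is a marginal test with `drop1()`, or equivalently listing the covariate first. The following sketch illustrates this on simulated data; `toy` and its coefficients are assumptions.

```r
# Simulated stand-in data (not the real dataset).
set.seed(42)
toy <- data.frame(
  school    = factor(rep(c("GP", "MS"), times = c(90, 30))),
  studytime = sample(1:4, 120, replace = TRUE)
)
toy$G3 <- 9 + 0.6 * toy$studytime + rnorm(120, sd = 4)

# Marginal F tests: each term is tested after all other terms in the model.
fit <- lm(G3 ~ school + studytime, data = toy)
marginal <- drop1(fit, test = "F")
marginal

# Same adjusted test for school, obtained by entering the covariate first.
cov_first <- summary(aov(G3 ~ studytime + school, data = toy))[[1]]
cov_first
```

The F value for school is identical in both tables. `car::Anova(fit, type = 2)` should give the same marginal tests for this model without an interaction term.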

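Assumption Checks

# Homogeneity of regression slopes: the school:studytime interaction
# should be non-significant for the ANCOVA model to be appropriate.
ancova_check <- aov(G3 ~ school * studytime, data = data)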
summary(ancova_check)
##                   Df Sum Sq Mean Sq F value Pr(>F)  
## school             1     17   16.76   0.807 0.3696  
## studytime          1     73   73.27   3.528 0.0611 .
## school:studytime   1     59   59.43   2.862 0.0915 .
## Residuals        391   8120   20.77                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
shapiro.test(residuals(ancova_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ancova_model)
## W = 0.93742, p-value = 7.792e-12
leveneTest(G3 ~ school, data = data)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.8377 0.3606
##       393
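
The slope-homogeneity check above can also be phrased as a comparison of two nested models: one with a common studytime slope and one with a separate slope per school. This is a sketch on simulated data; `toy` and the model names are assumptions chosen for illustration.

```r
# Simulated stand-in data (not the real dataset).
set.seed(2024)
toy <- data.frame(
  school    = factor(rep(c("GP", "MS"), times = c(90, 30))),
  studytime = sample(1:4, 120, replace = TRUE)
)
toy$G3 <- 9 + 0.6 * toy$studytime + rnorm(120, sd = 4)

# Common slope vs. school-specific slopes.
fit_additive    <- lm(G3 ~ school + studytime, data = toy)
fit_interaction <- lm(G3 ~ school * studytime, data = toy)

# A non-significant F here supports the common-slope (ANCOVA) model.
slopes_test <- anova(fit_additive, fit_interaction)
slopes_test
```

The p-value in the second row matches the `school:studytime` row of the sequential table, since the interaction is the last term entered.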

Visualization

cols <- c("blue", "red")

plot(data$studytime, data$G3,
     col = cols[data$school],
     pch = 19,
     xlab = "Studytime",
     ylab = "G3",
     main = "Scatter Plot G3 dan Studytime berdasarkan School")

legend("topleft",
       legend = c("GP", "MS"),
       col = c("blue", "red"),
       pch = 19)

abline(lm(G3 ~ studytime, data = subset(data, school == "GP")),
       col = "blue", lwd = 2)

abline(lm(G3 ~ studytime, data = subset(data, school == "MS")),
       col = "red", lwd = 2)

3. MANOVA

Description

MANOVA is used to determine whether G1, G2, and G3, taken jointly, differ by school.

Purpose

The purpose of this analysis is to test whether the combination of G1, G2, and G3 differs between schools.

Analysis Code

manova_model <- manova(cbind(G1, G2, G3) ~ school, data = data)
summary(manova_model, test = "Wilks")
##            Df   Wilks approx F num Df den Df Pr(>F)
## school      1 0.99643  0.46712      3    391 0.7054
## Residuals 393
summary.aov(manova_model)
##  Response G1 :
##              Df Sum Sq Mean Sq F value Pr(>F)
## school        1    2.9  2.8739  0.2604 0.6102
## Residuals   393 4337.8 11.0378               
## 
##  Response G2 :
##              Df Sum Sq Mean Sq F value Pr(>F)
## school        1   14.0  13.984  0.9883 0.3208
## Residuals   393 5560.7  14.149               
## 
##  Response G3 :
##              Df Sum Sq Mean Sq F value Pr(>F)
## school        1   16.8  16.759   0.798 0.3722
## Residuals   393 8253.1  21.000
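
Wilks' lambda compares the residual (error) cross-product matrix E with the total H + E, where H is the cross-product matrix for the school effect: lambda = det(E) / det(H + E). Values close to 1, as reported above, indicate that school explains little of the joint variation. The sketch below reproduces the statistic by hand on simulated data; `toy` is an assumption.

```r
# Simulated stand-in data with correlated grades (not the real dataset).
set.seed(7)
n <- 80
base <- rnorm(n, mean = 11, sd = 3)
toy <- data.frame(
  school = factor(rep(c("GP", "MS"), times = c(60, 20))),
  G1 = base + rnorm(n),
  G2 = base + rnorm(n),
  G3 = base + rnorm(n)
)

fit <- manova(cbind(G1, G2, G3) ~ school, data = toy)
s <- summary(fit, test = "Wilks")

# Hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices.
H <- s$SS[["school"]]
E <- s$SS[["Residuals"]]

wilks_manual <- det(E) / det(H + E)
c(manual = wilks_manual, reported = unname(s$stats["school", "Wilks"]))
```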

Assumption Checks

cor(data[, c("G1", "G2", "G3")])
##           G1        G2        G3
## G1 1.0000000 0.8521181 0.8014679
## G2 0.8521181 1.0000000 0.9048680
## G3 0.8014679 0.9048680 1.0000000
boxM(data[, c("G1", "G2", "G3")], data$school)
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  data[, c("G1", "G2", "G3")]
## Chi-Sq (approx.) = 28.297, df = 6, p-value = 8.261e-05
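
Box's M rejects equal covariance matrices here (p = 8.261e-05). Pillai's trace is commonly suggested as the more robust statistic in that situation. With only two groups, however, the four statistics offered by `summary.manova()` lead to the same F test, so switching statistics would not change the conclusion for school. The sketch below checks this on simulated data; `toy` is an assumption.

```r
# Simulated stand-in data (not the real dataset).
set.seed(11)
n <- 80
base <- rnorm(n, mean = 11, sd = 3)
toy <- data.frame(
  school = factor(rep(c("GP", "MS"), times = c(60, 20))),
  G1 = base + rnorm(n),
  G2 = base + rnorm(n),
  G3 = base + rnorm(n)
)

fit <- manova(cbind(G1, G2, G3) ~ school, data = toy)

# Approximate F for the school effect under each multivariate statistic.
tests <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")
f_values <- sapply(tests, function(tt) {
  summary(fit, test = tt)$stats["school", "approx F"]
})
f_values
```

With more than two groups the four F values would generally differ, and the choice of statistic would then matter.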

Visualization

par(mfrow = c(1, 3))

boxplot(G1 ~ school, data = data,
        col = c("lightblue", "lightpink"),
        main = "G1 by School",
        xlab = "School", ylab = "G1")

boxplot(G2 ~ school, data = data,
        col = c("lightblue", "lightpink"),
        main = "G2 by School",
        xlab = "School", ylab = "G2")

boxplot(G3 ~ school, data = data,
        col = c("lightblue", "lightpink"),
        main = "G3 by School",
        xlab = "School", ylab = "G3")

par(mfrow = c(1, 1))

4. MANCOVA

Description

MANCOVA is used to determine whether G1, G2, and G3, taken jointly, differ by school after controlling for the effect of studytime.

Purpose

The purpose of this analysis is to test whether the combination of G1, G2, and G3 differs between schools once studytime is taken into account.

Analysis Code

mancova_model <- manova(cbind(G1, G2, G3) ~ school + studytime, data = data)
summary(mancova_model, test = "Wilks")
##            Df   Wilks approx F num Df den Df   Pr(>F)   
## school      1 0.99642   0.4675      3    390 0.705090   
## studytime   1 0.96949   4.0908      3    390 0.007053 **
## Residuals 392                                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.aov(mancova_model)
##  Response G1 :
##              Df Sum Sq Mean Sq F value   Pr(>F)   
## school        1    2.9   2.874  0.2664 0.606020   
## studytime     1  109.6 109.646 10.1654 0.001546 **
## Residuals   392 4228.2  10.786                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response G2 :
##              Df Sum Sq Mean Sq F value   Pr(>F)   
## school        1   14.0  13.984  1.0033 0.317125   
## studytime     1   97.0  96.959  6.9564 0.008684 **
## Residuals   392 5463.7  13.938                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response G3 :
##              Df Sum Sq Mean Sq F value Pr(>F)  
## school        1   16.8  16.759  0.8031 0.3707  
## studytime     1   73.3  73.268  3.5112 0.0617 .
## Residuals   392 8179.9  20.867                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
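
As with `aov()`, `summary()` on a `manova()` fit tests terms sequentially, so the school rows above are not adjusted for studytime. One way to obtain a multivariate school test that is adjusted for the covariate is to enter studytime first. This is a sketch on simulated data; `toy` and its coefficients are assumptions.

```r
# Simulated stand-in data (not the real dataset).
set.seed(99)
n <- 120
toy <- data.frame(
  school    = factor(rep(c("GP", "MS"), times = c(90, 30))),
  studytime = sample(1:4, n, replace = TRUE)
)
base <- 9 + 0.6 * toy$studytime + rnorm(n, sd = 3)
toy$G1 <- base + rnorm(n)
toy$G2 <- base + rnorm(n)
toy$G3 <- base + rnorm(n)

# Covariate first: the school row is now tested after studytime.
fit_cov_first <- manova(cbind(G1, G2, G3) ~ studytime + school, data = toy)
s_cov_first <- summary(fit_cov_first, test = "Wilks")
s_cov_first
```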

Assumption Checks

boxM(data[, c("G1", "G2", "G3")], data$school)
## 
##  Box's M-test for Homogeneity of Covariance Matrices
## 
## data:  data[, c("G1", "G2", "G3")]
## Chi-Sq (approx.) = 28.297, df = 6, p-value = 8.261e-05

Visualization

par(mfrow = c(1, 3))

plot(data$studytime, data$G1,
     col = cols[data$school], pch = 19,
     main = "G1 vs Studytime",
     xlab = "Studytime", ylab = "G1")
legend("topleft", legend = levels(data$school), col = cols, pch = 19)

plot(data$studytime, data$G2,
     col = cols[data$school], pch = 19,
     main = "G2 vs Studytime",
     xlab = "Studytime", ylab = "G2")
legend("topleft", legend = levels(data$school), col = cols, pch = 19)

plot(data$studytime, data$G3,
     col = cols[data$school], pch = 19,
     main = "G3 vs Studytime",
     xlab = "Studytime", ylab = "G3")
legend("topleft", legend = levels(data$school), col = cols, pch = 19)

par(mfrow = c(1, 1))

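Conclusion

None of the four methods shows a significant difference in student performance between the two schools at the 5% level: ANOVA on G3 (F = 0.798, p = 0.372), ANCOVA (school: p = 0.371), MANOVA on G1, G2, and G3 (Wilks = 0.996, p = 0.705), and MANCOVA (school: p = 0.705). The covariate studytime is not significant for G3 alone (p = 0.062) but is significant for the three grades taken jointly (Wilks = 0.969, p = 0.007). These results should be read with caution: the Shapiro-Wilk tests reject normality of the residuals, Box's M rejects equality of the covariance matrices (p = 8.261e-05), and the groups are strongly unbalanced (349 vs 46 students), although Levene's test does not reject equal variances (p = 0.361). Because school is entered first in the ANCOVA and MANCOVA models, its tests are sequential and therefore not adjusted for studytime.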