Introduction

Understanding factors of student academic performance is something that could not be taken with a grain of salt, and understanding why students leave their programs has always been a complex issue. A Research by by Al Husaini and Ahmad Shukor (2023), has identified several key determinants, including low entry grades, family support, accommodation, gender, previous assessment grades, and e-learning activity, all of which significantly affect students’ academic success.

In order to discover this pattern, I found a dataset that tracks various student attributes — demographics, academic performance, and socioeconomic factors. Since this dataset is usually used for classification algorithm, thus it has a “Target” features within it.

Due to the underlying usage of this dataset, I structured my analysis into two phases. Phase 1 mostly focuses on exploring general patterns within student attributes, such as grades, attendance, and financial status, without immediately tying them to the target outcome. This helps me see which factors naturally group together and how students with similar characteristics behave. Then, in Phase 2, I shift the focus specifically to graduation and dropout outcomes, identifying which combinations of factors are most strongly linked to each.

Libraries Import

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(moments)
library(arulesViz)
library(arulesCBA)
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger

The dataset that I will be using has 36 features and 4424 observations. It has various student attributes, including: demographics, academic performance, and other socioeconomic factors.
Before preceeding, I will check whether the data is well-structure enough to proceed to Association Rules algorithm.
Link to the dataset: https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success

Data Import

data <- read.csv("predict_student_dropout.csv",header=TRUE,sep=";")
head(data, 7)
##   Marital.status Application.mode Application.order Course
## 1              1               17                 5    171
## 2              1               15                 1   9254
## 3              1                1                 5   9070
## 4              1               17                 2   9773
## 5              2               39                 1   8014
## 6              2               39                 1   9991
## 7              1                1                 1   9500
##   Daytime.evening.attendance. Previous.qualification
## 1                           1                      1
## 2                           1                      1
## 3                           1                      1
## 4                           1                      1
## 5                           0                      1
## 6                           0                     19
## 7                           1                      1
##   Previous.qualification..grade. Nacionality Mother.s.qualification
## 1                          122.0           1                     19
## 2                          160.0           1                      1
## 3                          122.0           1                     37
## 4                          122.0           1                     38
## 5                          100.0           1                     37
## 6                          133.1           1                     37
## 7                          142.0           1                     19
##   Father.s.qualification Mother.s.occupation Father.s.occupation
## 1                     12                   5                   9
## 2                      3                   3                   3
## 3                     37                   9                   9
## 4                     37                   5                   3
## 5                     38                   9                   9
## 6                     37                   9                   7
## 7                     38                   7                  10
##   Admission.grade Displaced Educational.special.needs Debtor
## 1           127.3         1                         0      0
## 2           142.5         1                         0      0
## 3           124.8         1                         0      0
## 4           119.6         1                         0      0
## 5           141.5         0                         0      0
## 6           114.8         0                         0      1
## 7           128.4         1                         0      0
##   Tuition.fees.up.to.date Gender Scholarship.holder Age.at.enrollment
## 1                       1      1                  0                20
## 2                       0      1                  0                19
## 3                       0      1                  0                19
## 4                       1      0                  0                20
## 5                       1      0                  0                45
## 6                       1      1                  0                50
## 7                       1      0                  1                18
##   International Curricular.units.1st.sem..credited.
## 1             0                                   0
## 2             0                                   0
## 3             0                                   0
## 4             0                                   0
## 5             0                                   0
## 6             0                                   0
## 7             0                                   0
##   Curricular.units.1st.sem..enrolled. Curricular.units.1st.sem..evaluations.
## 1                                   0                                      0
## 2                                   6                                      6
## 3                                   6                                      0
## 4                                   6                                      8
## 5                                   6                                      9
## 6                                   5                                     10
## 7                                   7                                      9
##   Curricular.units.1st.sem..approved. Curricular.units.1st.sem..grade.
## 1                                   0                          0.00000
## 2                                   6                         14.00000
## 3                                   0                          0.00000
## 4                                   6                         13.42857
## 5                                   5                         12.33333
## 6                                   5                         11.85714
## 7                                   7                         13.30000
##   Curricular.units.1st.sem..without.evaluations.
## 1                                              0
## 2                                              0
## 3                                              0
## 4                                              0
## 5                                              0
## 6                                              0
## 7                                              0
##   Curricular.units.2nd.sem..credited. Curricular.units.2nd.sem..enrolled.
## 1                                   0                                   0
## 2                                   0                                   6
## 3                                   0                                   6
## 4                                   0                                   6
## 5                                   0                                   6
## 6                                   0                                   5
## 7                                   0                                   8
##   Curricular.units.2nd.sem..evaluations. Curricular.units.2nd.sem..approved.
## 1                                      0                                   0
## 2                                      6                                   6
## 3                                      0                                   0
## 4                                     10                                   5
## 5                                      6                                   6
## 6                                     17                                   5
## 7                                      8                                   8
##   Curricular.units.2nd.sem..grade.
## 1                          0.00000
## 2                         13.66667
## 3                          0.00000
## 4                         12.40000
## 5                         13.00000
## 6                         11.50000
## 7                         14.34500
##   Curricular.units.2nd.sem..without.evaluations. Unemployment.rate
## 1                                              0              10.8
## 2                                              0              13.9
## 3                                              0              10.8
## 4                                              0               9.4
## 5                                              0              13.9
## 6                                              5              16.2
## 7                                              0              15.5
##   Inflation.rate   GDP   Target
## 1            1.4  1.74  Dropout
## 2           -0.3  0.79 Graduate
## 3            1.4  1.74  Dropout
## 4           -0.8 -3.12 Graduate
## 5           -0.3  0.79 Graduate
## 6            0.3 -0.92 Graduate
## 7            2.8 -4.06 Graduate

Preliminary Data Check

str(data)
## 'data.frame':    4424 obs. of  37 variables:
##  $ Marital.status                                : int  1 1 1 1 2 2 1 1 1 1 ...
##  $ Application.mode                              : int  17 15 1 17 39 39 1 18 1 1 ...
##  $ Application.order                             : int  5 1 5 2 1 1 1 4 3 1 ...
##  $ Course                                        : int  171 9254 9070 9773 8014 9991 9500 9254 9238 9238 ...
##  $ Daytime.evening.attendance.                   : int  1 1 1 1 0 0 1 1 1 1 ...
##  $ Previous.qualification                        : int  1 1 1 1 1 19 1 1 1 1 ...
##  $ Previous.qualification..grade.                : num  122 160 122 122 100 ...
##  $ Nacionality                                   : int  1 1 1 1 1 1 1 1 62 1 ...
##  $ Mother.s.qualification                        : int  19 1 37 38 37 37 19 37 1 1 ...
##  $ Father.s.qualification                        : int  12 3 37 37 38 37 38 37 1 19 ...
##  $ Mother.s.occupation                           : int  5 3 9 5 9 9 7 9 9 4 ...
##  $ Father.s.occupation                           : int  9 3 9 3 9 7 10 9 9 7 ...
##  $ Admission.grade                               : num  127 142 125 120 142 ...
##  $ Displaced                                     : int  1 1 1 1 0 0 1 1 0 1 ...
##  $ Educational.special.needs                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Debtor                                        : int  0 0 0 0 0 1 0 0 0 1 ...
##  $ Tuition.fees.up.to.date                       : int  1 0 0 1 1 1 1 0 1 0 ...
##  $ Gender                                        : int  1 1 1 0 0 1 0 1 0 0 ...
##  $ Scholarship.holder                            : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ Age.at.enrollment                             : int  20 19 19 20 45 50 18 22 21 18 ...
##  $ International                                 : int  0 0 0 0 0 0 0 0 1 0 ...
##  $ Curricular.units.1st.sem..credited.           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Curricular.units.1st.sem..enrolled.           : int  0 6 6 6 6 5 7 5 6 6 ...
##  $ Curricular.units.1st.sem..evaluations.        : int  0 6 0 8 9 10 9 5 8 9 ...
##  $ Curricular.units.1st.sem..approved.           : int  0 6 0 6 5 5 7 0 6 5 ...
##  $ Curricular.units.1st.sem..grade.              : num  0 14 0 13.4 12.3 ...
##  $ Curricular.units.1st.sem..without.evaluations.: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Curricular.units.2nd.sem..credited.           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Curricular.units.2nd.sem..enrolled.           : int  0 6 6 6 6 5 8 5 6 6 ...
##  $ Curricular.units.2nd.sem..evaluations.        : int  0 6 0 10 6 17 8 5 7 14 ...
##  $ Curricular.units.2nd.sem..approved.           : int  0 6 0 5 6 5 8 0 6 2 ...
##  $ Curricular.units.2nd.sem..grade.              : num  0 13.7 0 12.4 13 ...
##  $ Curricular.units.2nd.sem..without.evaluations.: int  0 0 0 0 0 5 0 0 0 0 ...
##  $ Unemployment.rate                             : num  10.8 13.9 10.8 9.4 13.9 16.2 15.5 15.5 16.2 8.9 ...
##  $ Inflation.rate                                : num  1.4 -0.3 1.4 -0.8 -0.3 0.3 2.8 2.8 0.3 1.4 ...
##  $ GDP                                           : num  1.74 0.79 1.74 -3.12 0.79 -0.92 -4.06 -4.06 -0.92 3.51 ...
##  $ Target                                        : chr  "Dropout" "Graduate" "Dropout" "Graduate" ...
summary(data)
##  Marital.status  Application.mode Application.order     Course    
##  Min.   :1.000   Min.   : 1.00    Min.   :0.000     Min.   :  33  
##  1st Qu.:1.000   1st Qu.: 1.00    1st Qu.:1.000     1st Qu.:9085  
##  Median :1.000   Median :17.00    Median :1.000     Median :9238  
##  Mean   :1.179   Mean   :18.67    Mean   :1.728     Mean   :8857  
##  3rd Qu.:1.000   3rd Qu.:39.00    3rd Qu.:2.000     3rd Qu.:9556  
##  Max.   :6.000   Max.   :57.00    Max.   :9.000     Max.   :9991  
##  Daytime.evening.attendance. Previous.qualification
##  Min.   :0.0000              Min.   : 1.000        
##  1st Qu.:1.0000              1st Qu.: 1.000        
##  Median :1.0000              Median : 1.000        
##  Mean   :0.8908              Mean   : 4.578        
##  3rd Qu.:1.0000              3rd Qu.: 1.000        
##  Max.   :1.0000              Max.   :43.000        
##  Previous.qualification..grade.  Nacionality      Mother.s.qualification
##  Min.   : 95.0                  Min.   :  1.000   Min.   : 1.00         
##  1st Qu.:125.0                  1st Qu.:  1.000   1st Qu.: 2.00         
##  Median :133.1                  Median :  1.000   Median :19.00         
##  Mean   :132.6                  Mean   :  1.873   Mean   :19.56         
##  3rd Qu.:140.0                  3rd Qu.:  1.000   3rd Qu.:37.00         
##  Max.   :190.0                  Max.   :109.000   Max.   :44.00         
##  Father.s.qualification Mother.s.occupation Father.s.occupation Admission.grade
##  Min.   : 1.00          Min.   :  0.00      Min.   :  0.00      Min.   : 95.0  
##  1st Qu.: 3.00          1st Qu.:  4.00      1st Qu.:  4.00      1st Qu.:117.9  
##  Median :19.00          Median :  5.00      Median :  7.00      Median :126.1  
##  Mean   :22.28          Mean   : 10.96      Mean   : 11.03      Mean   :127.0  
##  3rd Qu.:37.00          3rd Qu.:  9.00      3rd Qu.:  9.00      3rd Qu.:134.8  
##  Max.   :44.00          Max.   :194.00      Max.   :195.00      Max.   :190.0  
##    Displaced      Educational.special.needs     Debtor      
##  Min.   :0.0000   Min.   :0.00000           Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000           1st Qu.:0.0000  
##  Median :1.0000   Median :0.00000           Median :0.0000  
##  Mean   :0.5484   Mean   :0.01153           Mean   :0.1137  
##  3rd Qu.:1.0000   3rd Qu.:0.00000           3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.00000           Max.   :1.0000  
##  Tuition.fees.up.to.date     Gender       Scholarship.holder Age.at.enrollment
##  Min.   :0.0000          Min.   :0.0000   Min.   :0.0000     Min.   :17.00    
##  1st Qu.:1.0000          1st Qu.:0.0000   1st Qu.:0.0000     1st Qu.:19.00    
##  Median :1.0000          Median :0.0000   Median :0.0000     Median :20.00    
##  Mean   :0.8807          Mean   :0.3517   Mean   :0.2484     Mean   :23.27    
##  3rd Qu.:1.0000          3rd Qu.:1.0000   3rd Qu.:0.0000     3rd Qu.:25.00    
##  Max.   :1.0000          Max.   :1.0000   Max.   :1.0000     Max.   :70.00    
##  International     Curricular.units.1st.sem..credited.
##  Min.   :0.00000   Min.   : 0.00                      
##  1st Qu.:0.00000   1st Qu.: 0.00                      
##  Median :0.00000   Median : 0.00                      
##  Mean   :0.02486   Mean   : 0.71                      
##  3rd Qu.:0.00000   3rd Qu.: 0.00                      
##  Max.   :1.00000   Max.   :20.00                      
##  Curricular.units.1st.sem..enrolled. Curricular.units.1st.sem..evaluations.
##  Min.   : 0.000                      Min.   : 0.000                        
##  1st Qu.: 5.000                      1st Qu.: 6.000                        
##  Median : 6.000                      Median : 8.000                        
##  Mean   : 6.271                      Mean   : 8.299                        
##  3rd Qu.: 7.000                      3rd Qu.:10.000                        
##  Max.   :26.000                      Max.   :45.000                        
##  Curricular.units.1st.sem..approved. Curricular.units.1st.sem..grade.
##  Min.   : 0.000                      Min.   : 0.00                   
##  1st Qu.: 3.000                      1st Qu.:11.00                   
##  Median : 5.000                      Median :12.29                   
##  Mean   : 4.707                      Mean   :10.64                   
##  3rd Qu.: 6.000                      3rd Qu.:13.40                   
##  Max.   :26.000                      Max.   :18.88                   
##  Curricular.units.1st.sem..without.evaluations.
##  Min.   : 0.0000                               
##  1st Qu.: 0.0000                               
##  Median : 0.0000                               
##  Mean   : 0.1377                               
##  3rd Qu.: 0.0000                               
##  Max.   :12.0000                               
##  Curricular.units.2nd.sem..credited. Curricular.units.2nd.sem..enrolled.
##  Min.   : 0.0000                     Min.   : 0.000                     
##  1st Qu.: 0.0000                     1st Qu.: 5.000                     
##  Median : 0.0000                     Median : 6.000                     
##  Mean   : 0.5418                     Mean   : 6.232                     
##  3rd Qu.: 0.0000                     3rd Qu.: 7.000                     
##  Max.   :19.0000                     Max.   :23.000                     
##  Curricular.units.2nd.sem..evaluations. Curricular.units.2nd.sem..approved.
##  Min.   : 0.000                         Min.   : 0.000                     
##  1st Qu.: 6.000                         1st Qu.: 2.000                     
##  Median : 8.000                         Median : 5.000                     
##  Mean   : 8.063                         Mean   : 4.436                     
##  3rd Qu.:10.000                         3rd Qu.: 6.000                     
##  Max.   :33.000                         Max.   :20.000                     
##  Curricular.units.2nd.sem..grade.
##  Min.   : 0.00                   
##  1st Qu.:10.75                   
##  Median :12.20                   
##  Mean   :10.23                   
##  3rd Qu.:13.33                   
##  Max.   :18.57                   
##  Curricular.units.2nd.sem..without.evaluations. Unemployment.rate
##  Min.   : 0.0000                                Min.   : 7.60    
##  1st Qu.: 0.0000                                1st Qu.: 9.40    
##  Median : 0.0000                                Median :11.10    
##  Mean   : 0.1503                                Mean   :11.57    
##  3rd Qu.: 0.0000                                3rd Qu.:13.90    
##  Max.   :12.0000                                Max.   :16.20    
##  Inflation.rate        GDP               Target         
##  Min.   :-0.800   Min.   :-4.060000   Length:4424       
##  1st Qu.: 0.300   1st Qu.:-1.700000   Class :character  
##  Median : 1.400   Median : 0.320000   Mode  :character  
##  Mean   : 1.228   Mean   : 0.001969                     
##  3rd Qu.: 2.600   3rd Qu.: 1.790000                     
##  Max.   : 3.700   Max.   : 3.510000

Data Preprocessing

I have noticed that some features hasn’t been coded to factors datatype. I will change those accordingly.
I am taking the label provided from the UCI Machine Learning variables table.

marital.status_map <- c(
  "1" = "Single",
  "2" = "Married",
  "3" = "Widower",
  "4" = "Divorced",
  "5" = "Facto Union",
  "6" = "Legally Separated"
)

application.mode_map <- c(
  "1" = "1st Phase - General Contingent",
  "2" = "Ordinance No. 612/93",
  "5" = "1st Phase - Special Contingent (Azores Island)",
  "7" = "Holders of Other Higher Courses",
  "10" = "Ordinance No. 854-B/99",
  "15" = "International Student (Bachelor)",
  "16" = "1st Phase - Special Contingent (Madeira Island)",
  "17" = "2nd Phase - General Contingent",
  "18" = "3rd Phase - General Contingent",
  "26" = "Ordinance No. 533-A/99, Item b2) (Different Plan)",
  "27" = "Ordinance No. 533-A/99, Item b3) (Other Institution)",
  "39" = "Over 23 years old",
  "42" = "Transfer",
  "43" = "Change of Course",
  "44" = "Technological Specialization Diploma Holders",
  "51" = "Change of Institution/Course",
  "53" = "Short Cycle Diploma Holders",
  "57" = "Change of Institution/Course (International)"
)

course_map <- c(
  "33" = "Biofuel Production Technologies",
  "171" = "Animation and Multimedia Design",
  "8014" = "Social Service (Evening Attendance)",
  "9003" = "Agronomy",
  "9070" = "Communication Design",
  "9085" = "Veterinary Nursing",
  "9119" = "Informatics Engineering",
  "9130" = "Equinculture",
  "9147" = "Management",
  "9238" = "Social Service",
  "9254" = "Tourism",
  "9500" = "Nursing",
  "9556" = "Oral Hygiene",
  "9670" = "Advertising and Marketing Management",
  "9773" = "Journalism and Communication",
  "9853" = "Basic Education",
  "9991" = "Management (Evening Attendance)"
)

daytime.evening.attendance_map <- c(
  "1" = "Daytime", "0" = "Evening"
)

previous.qualification_map <- c(
  "1" = "Secondary Education",
  "2" = "Higher Education - Bachelor's Degree",
  "3" = "Higher Education - Degree",
  "4" = "Higher Education - Master's",
  "5" = "Higher Education - Doctorate",
  "6" = "Frequency of Higher Education",
  "9" = "12th Year of Schooling - Not Completed",
  "10" = "11th Year of Schooling - Not Completed",
  "12" = "Other - 11th Year of Schooling",
  "14" = "10th Year of Schooling",
  "15" = "10th Year of Schooling - Not Completed",
  "19" = "Basic Education 3rd Cycle (9th/10th/11th Year) or Equivalent",
  "38" = "Basic Education 2nd Cycle (6th/7th/8th Year) or Equivalent",
  "39" = "Technological Specialization Course",
  "40" = "Higher Education - Degree (1st Cycle)",
  "42" = "Professional Higher Technical Course",
  "43" = "Higher Education - Master (2nd Cycle)"
)

nacionality_map <- c(
  "1" = "Portuguese",
  "2" = "German",
  "6" = "Spanish",
  "11" = "Italian",
  "13" = "Dutch",
  "14" = "English",
  "17" = "Lithuanian",
  "21" = "Angolan",
  "22" = "Cape Verdean",
  "24" = "Guinean",
  "25" = "Mozambican",
  "26" = "Santomean",
  "32" = "Turkish",
  "41" = "Brazilian",
  "62" = "Romanian",
  "100" = "Moldova (Republic of)",
  "101" = "Mexican",
  "103" = "Ukrainian",
  "105" = "Russian",
  "108" = "Cuban",
  "109" = "Colombian"
)

mother.s.qualification_map <- c(
  "1" = "Secondary Education - 12th Year of Schooling or Eq.",
  "2" = "Higher Education - Bachelor's Degree",
  "3" = "Higher Education - Degree",
  "4" = "Higher Education - Master's",
  "5" = "Higher Education - Doctorate",
  "6" = "Frequency of Higher Education 9 - 12th Year of Schooling - Not Completed",
  "9" = "12th Year of Schooling - Not Completed",
  "10" = "11th Year of Schooling - Not Completed",
  "11" = "7th Year (Old)",
  "12" = "Other - 11th Year of Schooling",
  "14" = "10th Year of Schooling",
  "18" = "General Commerce Course",
  "19" = "Basic Education 3rd Cycle (9th/10th/11th Year) or Equiv.",
  "22" = "Technical-Professional Course",
  "26" = "7th Year of Schooling",
  "27" = "2nd Cycle of General High School",
  "29" = "9th Year of Schooling - Not Completed",
  "30" = "8th Year of Schooling",
  "34" = "Unknown",
  "35" = "Can't Read or Write",
  "36" = "Can read without having a 4th year of schooling",
  "37" = "Basic education 1st cycle (4th/5th year) or equiv.",
  "38" = "Basic Education 2nd Cycle (6th/7th/8th Year) or Equiv.",
  "39" = "Technological Specialization Course",
  "40" = "Higher Education - Degree (1st Cycle)",
  "41" = "Specialized higher studies course",
  "42" = "Professional higher technical course",
  "43" = "Higher Education - Master's (2nd Cycle)",
  "44" = "Higher Education - Doctorate (3rd cycle)"
)

father.s.qualification_map <- c(
  "1" = "Secondary Education - 12th Year of Schooling or Eq.",
  "2" = "Higher Education - Bachelor's Degree",
  "3" = "Higher Education - Degree",
  "4" = "Higher Education - Master's",
  "5" = "Higher Education - Doctorate",
  "6" = "Frequency of Higher Education 9 - 12th Year of Schooling - Not Completed",
  "9" = "12th Year of Schooling - Not Completed",
  "10" = "11th Year of Schooling - Not Completed",
  "11" = "7th Year (Old)",
  "12" = "Other - 11th Year of Schooling",
  "14" = "10th Year of Schooling",
  "18" = "General Commerce Course",
  "19" = "Basic Education 3rd Cycle (9th/10th/11th Year) or Equiv.",
  "22" = "Technical-Professional Course",
  "26" = "7th Year of Schooling",
  "27" = "2nd Cycle of General High School",
  "29" = "9th Year of Schooling - Not Completed",
  "30" = "8th Year of Schooling",
  "34" = "Unknown",
  "35" = "Can't Read or Write",
  "36" = "Can read without having a 4th year of schooling",
  "37" = "Basic education 1st cycle (4th/5th year) or equiv.",
  "38" = "Basic Education 2nd Cycle (6th/7th/8th Year) or Equiv.",
  "39" = "Technological Specialization Course",
  "40" = "Higher Education - Degree (1st Cycle)",
  "41" = "Specialized higher studies course",
  "42" = "Professional higher technical course",
  "43" = "Higher Education - Master's (2nd Cycle)",
  "44" = "Higher Education - Doctorate (3rd cycle)",
  "13" = "2nd year complementary high school",
  "20" = "Complementary High School Course",
  "25" = "Complementary High School Course - not concluded",
  "31" = "General Course of Administration and Commerce",
  "33" = "Supplementary Accounting and Administration"
)

mother.s.occupation_map <- c(
  "0" = "Student",
  "1" = "Representatives of the Legislative Power and Executive Bodies, Directors, Directors and Executive Managers",
  "2" = "Specialists in Intellectual and Scientific Activities",
  "3" = "Intermediate Level Technicians and Professions",
  "4" = "Administrative Staff",
  "5" = "Personal Services, Security and Safety Workers and Sellers",
  "6" = "Farmers and Skilled Workers in Agriculture, Fisheries and Forestry",
  "7" = "Skilled Workers in Industry, Construction and Craftsmen",
  "8" = "Installation and Machine Operators and Assembly Workers",
  "9" = "Unskilled Workers",
  "10" = "Armed Forces",
  "90" = "Other Situation",
  "99" = "(Blank)",
  "122" = "Health professionals",
  "123" = "teachers",
  "125" = "Specialists in information and communication technologies (ICT)",
  "131" = "Intermediate level science and engineering technicians and professions",
  "132" = "Technicians and professionals, of intermediate level of health",
  "134" = "Intermediate level technicians from legal, social, sports, cultural and similar services",
  "141" = "Office workers, secretaries in general and data processing operators",
  "143" = "Data, accounting, statistical, financial services and registry-related operators",
  "144" = "Other administrative support staff",
  "151" = "personal service workers",
  "152" = "sellers",
  "153" = "Personal care workers and the like",
  "171" = "Skilled construction workers and the like, except electricians",
  "173" = "Skilled workers in printing, precision instrument manufacturing, jewelers, artisans and the like",
  "175" = "Workers in food processing, woodworking, clothing and other industries and crafts",
  "191" = "cleaning workers",
  "192" = "Unskilled workers in agriculture, animal production, fisheries and forestry",
  "193" = "Unskilled workers in extractive industry, construction, manufacturing and transport",
  "194" = "Meal preparation assistants"
)

father.s.occupation_map <- c(
  "0" = "Student",
  "1" = "Representatives of the Legislative Power and Executive Bodies, Directors, Directors and Executive Managers",
  "2" = "Specialists in Intellectual and Scientific Activities",
  "3" = "Intermediate Level Technicians and Professions",
  "4" = "Administrative Staff",
  "5" = "Personal Services, Security and Safety Workers and Sellers",
  "6" = "Farmers and Skilled Workers in Agriculture, Fisheries and Forestry",
  "7" = "Skilled Workers in Industry, Construction and Craftsmen",
  "8" = "Installation and Machine Operators and Assembly Workers",
  "9" = "Unskilled Workers",
  "10" = "Armed Forces",
  "90" = "Other Situation",
  "99" = "(Blank)",
  "122" = "Health professionals",
  "123" = "teachers",
  "125" = "Specialists in information and communication technologies (ICT)",
  "131" = "Intermediate level science and engineering technicians and professions",
  "132" = "Technicians and professionals, of intermediate level of health",
  "134" = "Intermediate level technicians from legal, social, sports, cultural and similar services",
  "141" = "Office workers, secretaries in general and data processing operators",
  "143" = "Data, accounting, statistical, financial services and registry-related operators",
  "144" = "Other administrative support staff",
  "151" = "personal service workers",
  "152" = "sellers",
  "153" = "Personal care workers and the like",
  "171" = "Skilled construction workers and the like, except electricians",
  "173" = "Skilled workers in printing, precision instrument manufacturing, jewelers, artisans and the like",
  "175" = "Workers in food processing, woodworking, clothing and other industries and crafts",
  "191" = "cleaning workers",
  "192" = "Unskilled workers in agriculture, animal production, fisheries and forestry",
  "193" = "Unskilled workers in extractive industry, construction, manufacturing and transport",
  "194" = "Meal preparation assistants",
  "101" = "Armed Forces Officers",
  "102" = "Armed Forces Sergeants",
  "103" = "Other Armed Forces personnel",
  "112" = "Directors of administrative and commercial services",
  "114" = "Hotel, catering, trade and other services directors",
  "121" = "Specialists in the physical sciences, mathematics, engineering and related techniques",
  "124" = "Specialists in finance, accounting, administrative organization, public and commercial relations",
  "135" = "Information and communication technology technicians",
  "154" = "Protection and security services personnel",
  "161" = "Market-oriented farmers and skilled agricultural and animal production workers",
  "163" = "Farmers, livestock keepers, fishermen, hunters and gatherers, subsistence",
  "172" = "Skilled workers in metallurgy, metalworking and similar",
  "174" = "Skilled workers in electricity and electronics",
  "181" = "Fixed plant and machine operators",
  "182" = "assembly workers",
  "183" = "Vehicle drivers and mobile equipment operators",
  "195" = "Street vendors (except food) and street service providers"
)

binary_map <- c("1" = "Yes", "0" = "No")

gender_map <- c("1" = "Male", "0" = "Female")
# Apply mappings to categorical columns
data$Marital.status <- as.character(marital.status_map[as.character(data$Marital.status)])
data$Application.mode <- as.character(application.mode_map[as.character(data$Application.mode)])
data$Course <- as.character(course_map[as.character(data$Course)])
data$Daytime.evening.attendance. <- as.character(daytime.evening.attendance_map[as.character(data$Daytime.evening.attendance.)])
data$Previous.qualification <- as.character(previous.qualification_map[as.character(data$Previous.qualification)])
data$Nacionality <- as.character(nacionality_map[as.character(data$Nacionality)])
data$Mother.s.qualification <- as.character(mother.s.qualification_map[as.character(data$Mother.s.qualification)])
data$Mother.s.occupation <- as.character(mother.s.occupation_map[as.character(data$Mother.s.occupation)])
data$Father.s.qualification <- as.character(father.s.qualification_map[as.character(data$Father.s.qualification)])
data$Father.s.occupation <- as.character(father.s.occupation_map[as.character(data$Father.s.occupation)])
data$Displaced <- as.character(binary_map[as.character(data$Displaced)])
data$Educational.special.needs <- as.character(binary_map[as.character(data$Educational.special.needs)])
data$Debtor <- as.character(binary_map[as.character(data$Debtor)])
data$Tuition.fees.up.to.date <- as.character(binary_map[as.character(data$Tuition.fees.up.to.date)])
data$Gender <- as.character(gender_map[as.character(data$Gender)])
data$Scholarship.holder <- as.character(binary_map[as.character(data$Scholarship.holder)])
data$International <- as.character(binary_map[as.character(data$International)])

Checking Null Values

Checking null values in all variables within the dataset.

colSums(is.na(data))
##                                 Marital.status 
##                                              0 
##                               Application.mode 
##                                              0 
##                              Application.order 
##                                              0 
##                                         Course 
##                                              0 
##                    Daytime.evening.attendance. 
##                                              0 
##                         Previous.qualification 
##                                              0 
##                 Previous.qualification..grade. 
##                                              0 
##                                    Nacionality 
##                                              0 
##                         Mother.s.qualification 
##                                              0 
##                         Father.s.qualification 
##                                              0 
##                            Mother.s.occupation 
##                                              0 
##                            Father.s.occupation 
##                                              0 
##                                Admission.grade 
##                                              0 
##                                      Displaced 
##                                              0 
##                      Educational.special.needs 
##                                              0 
##                                         Debtor 
##                                              0 
##                        Tuition.fees.up.to.date 
##                                              0 
##                                         Gender 
##                                              0 
##                             Scholarship.holder 
##                                              0 
##                              Age.at.enrollment 
##                                              0 
##                                  International 
##                                              0 
##            Curricular.units.1st.sem..credited. 
##                                              0 
##            Curricular.units.1st.sem..enrolled. 
##                                              0 
##         Curricular.units.1st.sem..evaluations. 
##                                              0 
##            Curricular.units.1st.sem..approved. 
##                                              0 
##               Curricular.units.1st.sem..grade. 
##                                              0 
## Curricular.units.1st.sem..without.evaluations. 
##                                              0 
##            Curricular.units.2nd.sem..credited. 
##                                              0 
##            Curricular.units.2nd.sem..enrolled. 
##                                              0 
##         Curricular.units.2nd.sem..evaluations. 
##                                              0 
##            Curricular.units.2nd.sem..approved. 
##                                              0 
##               Curricular.units.2nd.sem..grade. 
##                                              0 
## Curricular.units.2nd.sem..without.evaluations. 
##                                              0 
##                              Unemployment.rate 
##                                              0 
##                                 Inflation.rate 
##                                              0 
##                                            GDP 
##                                              0 
##                                         Target 
##                                              0

Checking Duplicated Data

check_duplicates <- duplicated(data)
data[check_duplicates, ]
##  [1] Marital.status                                
##  [2] Application.mode                              
##  [3] Application.order                             
##  [4] Course                                        
##  [5] Daytime.evening.attendance.                   
##  [6] Previous.qualification                        
##  [7] Previous.qualification..grade.                
##  [8] Nacionality                                   
##  [9] Mother.s.qualification                        
## [10] Father.s.qualification                        
## [11] Mother.s.occupation                           
## [12] Father.s.occupation                           
## [13] Admission.grade                               
## [14] Displaced                                     
## [15] Educational.special.needs                     
## [16] Debtor                                        
## [17] Tuition.fees.up.to.date                       
## [18] Gender                                        
## [19] Scholarship.holder                            
## [20] Age.at.enrollment                             
## [21] International                                 
## [22] Curricular.units.1st.sem..credited.           
## [23] Curricular.units.1st.sem..enrolled.           
## [24] Curricular.units.1st.sem..evaluations.        
## [25] Curricular.units.1st.sem..approved.           
## [26] Curricular.units.1st.sem..grade.              
## [27] Curricular.units.1st.sem..without.evaluations.
## [28] Curricular.units.2nd.sem..credited.           
## [29] Curricular.units.2nd.sem..enrolled.           
## [30] Curricular.units.2nd.sem..evaluations.        
## [31] Curricular.units.2nd.sem..approved.           
## [32] Curricular.units.2nd.sem..grade.              
## [33] Curricular.units.2nd.sem..without.evaluations.
## [34] Unemployment.rate                             
## [35] Inflation.rate                                
## [36] GDP                                           
## [37] Target                                        
## <0 rows> (or 0-length row.names)

Discreditisation

Check Distribution

Before binning, I am checking the distribution of each numerical variables mentioned above, and also their descriptive statistics.

continuous_vars <- c(
  "Previous.qualification..grade.", 
  "Admission.grade", 
  "Curricular.units.1st.sem..grade.", 
  "Curricular.units.1st.sem..credited.",
  "Curricular.units.1st.sem..enrolled.",
  "Curricular.units.1st.sem..evaluations.",
  "Curricular.units.1st.sem..approved.",
  "Curricular.units.1st.sem..without.evaluations.",
  "Curricular.units.2nd.sem..grade.", 
  "Curricular.units.2nd.sem..credited.",
  "Curricular.units.2nd.sem..enrolled.",
  "Curricular.units.2nd.sem..evaluations.",
  "Curricular.units.2nd.sem..approved.",
  "Curricular.units.2nd.sem..without.evaluations.",
  "Unemployment.rate", 
  "Inflation.rate", 
  "GDP",
  "Age.at.enrollment"
)

# Loop through variables and plot histograms
par(mfrow = c(3, 3))  # Arrange plots in a grid
for (var in continuous_vars) {
  hist(data[[var]], main = paste("Histogram of", var), xlab = var, col = "lightblue", border = "black")
}

for (var in continuous_vars) {
  cat("\nSummary of", var, ":\n")
  print(summary(data[[var]]))
}
## 
## Summary of Previous.qualification..grade. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    95.0   125.0   133.1   132.6   140.0   190.0 
## 
## Summary of Admission.grade :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    95.0   117.9   126.1   127.0   134.8   190.0 
## 
## Summary of Curricular.units.1st.sem..grade. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   11.00   12.29   10.64   13.40   18.88 
## 
## Summary of Curricular.units.1st.sem..credited. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00    0.71    0.00   20.00 
## 
## Summary of Curricular.units.1st.sem..enrolled. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.000   6.000   6.271   7.000  26.000 
## 
## Summary of Curricular.units.1st.sem..evaluations. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   6.000   8.000   8.299  10.000  45.000 
## 
## Summary of Curricular.units.1st.sem..approved. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.000   5.000   4.707   6.000  26.000 
## 
## Summary of Curricular.units.1st.sem..without.evaluations. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.1377  0.0000 12.0000 
## 
## Summary of Curricular.units.2nd.sem..grade. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   10.75   12.20   10.23   13.33   18.57 
## 
## Summary of Curricular.units.2nd.sem..credited. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.5418  0.0000 19.0000 
## 
## Summary of Curricular.units.2nd.sem..enrolled. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   5.000   6.000   6.232   7.000  23.000 
## 
## Summary of Curricular.units.2nd.sem..evaluations. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   6.000   8.000   8.063  10.000  33.000 
## 
## Summary of Curricular.units.2nd.sem..approved. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   5.000   4.436   6.000  20.000 
## 
## Summary of Curricular.units.2nd.sem..without.evaluations. :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.1503  0.0000 12.0000 
## 
## Summary of Unemployment.rate :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.60    9.40   11.10   11.57   13.90   16.20 
## 
## Summary of Inflation.rate :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.800   0.300   1.400   1.228   2.600   3.700 
## 
## Summary of GDP :
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -4.060000 -1.700000  0.320000  0.001969  1.790000  3.510000 
## 
## Summary of Age.at.enrollment :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.00   19.00   20.00   23.27   25.00   70.00
for (var in continuous_vars) {
  cat("\nSkewness of", var, ":", skewness(data[[var]], na.rm = TRUE), "\n")
}
## 
## Skewness of Previous.qualification..grade. : 0.3127614 
## 
## Skewness of Admission.grade : 0.5304199 
## 
## Skewness of Curricular.units.1st.sem..grade. : -1.567614 
## 
## Skewness of Curricular.units.1st.sem..credited. : 4.167635 
## 
## Skewness of Curricular.units.1st.sem..enrolled. : 1.618492 
## 
## Skewness of Curricular.units.1st.sem..evaluations. : 0.9763055 
## 
## Skewness of Curricular.units.1st.sem..approved. : 0.7660026 
## 
## Skewness of Curricular.units.1st.sem..without.evaluations. : 8.20462 
## 
## Skewness of Curricular.units.2nd.sem..grade. : -1.313205 
## 
## Skewness of Curricular.units.2nd.sem..credited. : 4.633248 
## 
## Skewness of Curricular.units.2nd.sem..enrolled. : 0.7878463 
## 
## Skewness of Curricular.units.2nd.sem..evaluations. : 0.3363831 
## 
## Skewness of Curricular.units.2nd.sem..approved. : 0.3061755 
## 
## Skewness of Curricular.units.2nd.sem..without.evaluations. : 7.265236 
## 
## Skewness of Unemployment.rate : 0.2119791 
## 
## Skewness of Inflation.rate : 0.2522898 
## 
## Skewness of GDP : -0.3939346 
## 
## Skewness of Age.at.enrollment : 2.054292

We can see that several features within the dataset have different distributions and varying level of skewness, this does impact on how we categorise them.
For roughly normal distribution variables, I will implement equal width binning, which divides value into fixed intervals.
For skewed variables, I will implement equal frequency binning, which is suited best when some values are more common than the other.
For variables with extreme outliers, I will implement quantile based discreditisation.
Aside from the mentioned, I will implement custom method of discreditisation.

Discreditisation Method

#Previous.qualification..grade. - EQUAL WIDTH
data$Previous.qualification..grade. <- cut(
  data$Previous.qualification..grade., 
  breaks = 3, 
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

#Admission.grade - EQUAL FREQ
data$Admission.grade <- discretize(
  data$Admission.grade, 
  method = "frequency", 
  breaks = 3, 
  labels = c("Low", "Medium", "High")
)

# Age at Enrollment - EQUAL FREQ
data$Age.at.enrollment <- discretize(
  data$Age.at.enrollment, 
  method = "frequency", 
  breaks = 3, 
  labels = c("Young", "Middle-Age", "Old")
)

# # Curricular Units 1st Credited - EQUAL FREQ
data$Curricular.units.1st.sem..credited. <- cut(
  data$Curricular.units.1st.sem..credited., 
  breaks = c(-Inf, 0, 5, Inf), 
  labels = c("None", "Few", "Many"),
  include.lowest = TRUE
)

# Curricular Units 1st Enrolled - EQUAL FREQ
data$Curricular.units.1st.sem..enrolled. <- cut(
  data$Curricular.units.1st.sem..enrolled., 
  breaks = c(-Inf, 0, 6, Inf), 
  labels = c("None", "Few", "Many"),
  include.lowest = TRUE
)


# Curricular Units 1st Evaluations - EQUAL FREQ
data$Curricular.units.1st.sem..evaluations. <- discretize(
  data$Curricular.units.1st.sem..evaluations., 
  method = "frequency", 
  breaks = 3, 
  labels = c("Low", "Medium", "High")
)

# Curricular Units 1st Approved - EQUAL FREQ
data$Curricular.units.1st.sem..approved. <- discretize(
  data$Curricular.units.1st.sem..approved., 
  method = "frequency", 
  breaks = 3, 
  labels = c("Little", "Moderate", "Many")
)

#Curricular.units.1st.sem..grade -  QUANTILE BASED
data$Curricular.units.1st.sem..grade. <- cut(
  data$Curricular.units.1st.sem..grade., 
  breaks = quantile(data$Curricular.units.1st.sem..grade., probs = seq(0, 1, 0.33), na.rm = TRUE),
  include.lowest = TRUE,
  labels = c("Low", "Medium", "High")
)

# #Curricular.units.1st.sem..without eval -  CUSTOM BINNING
data$Curricular.units.1st.sem..without.evaluations. <- cut(
  data$Curricular.units.1st.sem..without.evaluations., 
  breaks = c(-Inf, 0, 5, Inf), 
  labels = c("None", "Few", "Many"),
  include.lowest = TRUE
)

# # Curricular Units 2nd Credited - EQUAL FREQ
data$Curricular.units.2nd.sem..credited. <- cut(
  data$Curricular.units.2nd.sem..credited., 
  breaks = c(-Inf, 0, 5, Inf), 
  labels = c("None", "Few", "Many"),
  include.lowest = TRUE
)

# Curricular Units 2nd Enrolled - EQUAL FREQ
data$Curricular.units.2nd.sem..enrolled. <- cut(
  data$Curricular.units.2nd.sem..enrolled., 
  breaks = c(-Inf, 0, 6, Inf), 
  labels = c("None", "Few", "Many"),
  include.lowest = TRUE
)

# Curricular Units 2nd Evaluations - EQUAL FREQ
data$Curricular.units.2nd.sem..evaluations. <- discretize(
  data$Curricular.units.2nd.sem..evaluations., 
  method = "frequency", 
  breaks = 3, 
  labels = c("Low", "Medium", "High")
)

# Curricular Units 2nd Approved - EQUAL FREQ
data$Curricular.units.2nd.sem..approved. <- discretize(
  data$Curricular.units.2nd.sem..approved., 
  method = "frequency", 
  breaks = 3, 
  labels = c("Little", "Moderate", "Many")
)

# Curricular.units.2nd.sem..grade. - QUANTILE BASED
data$Curricular.units.2nd.sem..grade. <- cut(
  data$Curricular.units.2nd.sem..grade., 
  breaks = quantile(data$Curricular.units.2nd.sem..grade., probs = seq(0, 1, 0.33), na.rm = TRUE),
  include.lowest = TRUE,
  labels = c("Low", "Medium", "High")
)

# # Curricular.units.2nd.sem..without eval - CUSTOM BIN
data$Curricular.units.2nd.sem..without.evaluations. <- cut(
  data$Curricular.units.2nd.sem..without.evaluations., 
  breaks = c(-Inf, 0, 5, Inf), 
  labels = c("None", "Few", "Many"),
  include.lowest = TRUE
)

# Unemployment.rate - EQUAL WIDTH
data$Unemployment.rate <- cut(
  data$Unemployment.rate, 
  breaks = 3, 
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

# Inflation Rate - EQUAL WIDTH
data$Inflation.rate <- cut(
  data$Inflation.rate, 
  breaks = 3, 
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

# GDP - EQUAL WIDTH
data$GDP <- cut(
  data$GDP, 
  breaks = 3, 
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)
# Check the discreditisation
for (i in continuous_vars){
  print(paste("Table for:", i))
  print(table(data[[i]]))
}
## [1] "Table for: Previous.qualification..grade."
## 
##    Low Medium   High 
##   1306   2914    204 
## [1] "Table for: Admission.grade"
## 
##    Low Medium   High 
##   1332   1596   1496 
## [1] "Table for: Curricular.units.1st.sem..grade."
## 
##    Low Medium   High 
##   1466   1554   1363 
## [1] "Table for: Curricular.units.1st.sem..credited."
## 
## None  Few Many 
## 3847  336  241 
## [1] "Table for: Curricular.units.1st.sem..enrolled."
## 
## None  Few Many 
##  180 2967 1277 
## [1] "Table for: Curricular.units.1st.sem..evaluations."
## 
##    Low Medium   High 
##   1206   1494   1724 
## [1] "Table for: Curricular.units.1st.sem..approved."
## 
##   Little Moderate     Many 
##     1274     1156     1994 
## [1] "Table for: Curricular.units.1st.sem..without.evaluations."
## 
## None  Few Many 
## 4130  275   19 
## [1] "Table for: Curricular.units.2nd.sem..grade."
## 
##    Low Medium   High 
##   1472   1586   1326 
## [1] "Table for: Curricular.units.2nd.sem..credited."
## 
## None  Few Many 
## 3894  394  136 
## [1] "Table for: Curricular.units.2nd.sem..enrolled."
## 
## None  Few Many 
##  180 2995 1249 
## [1] "Table for: Curricular.units.2nd.sem..evaluations."
## 
##    Low Medium   High 
##   1322   1355   1747 
## [1] "Table for: Curricular.units.2nd.sem..approved."
## 
##   Little Moderate     Many 
##     1467     1140     1817 
## [1] "Table for: Curricular.units.2nd.sem..without.evaluations."
## 
## None  Few Many 
## 4142  261   21 
## [1] "Table for: Unemployment.rate"
## 
##    Low Medium   High 
##   1472   1803   1149 
## [1] "Table for: Inflation.rate"
## 
##    Low Medium   High 
##   2144    893   1387 
## [1] "Table for: GDP"
## 
##    Low Medium   High 
##   1349   1323   1752 
## [1] "Table for: Age.at.enrollment"
## 
##      Young Middle-Age        Old 
##       1041       1832       1551

Transaction Transformation

Factorisation of All Variables

Before proceeding to converting the dataset into transaction format, I need to

character_columns <- sapply(data, is.character)
data[, character_columns] <- lapply(data[, character_columns], as.factor)

numerical_columns <- sapply(data, is.numeric)
data[, numerical_columns] <- lapply(data[, numerical_columns], as.factor)
## Warning in `[<-.data.frame`(`*tmp*`, , numerical_columns, value =
## list(structure(1L, levels = "5", class = "factor"), : provided 4424 variables
## to replace 1 variables
str(data)
## 'data.frame':    4424 obs. of  37 variables:
##  $ Marital.status                                : Factor w/ 6 levels "Divorced","Facto Union",..: 5 5 5 5 4 4 5 5 5 5 ...
##  $ Application.mode                              : Factor w/ 18 levels "1st Phase - General Contingent",..: 4 10 1 4 15 15 1 5 1 1 ...
##  $ Application.order                             : Factor w/ 1 level "5": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Course                                        : Factor w/ 17 levels "Advertising and Marketing Management",..: 3 16 6 9 15 11 12 16 14 14 ...
##  $ Daytime.evening.attendance.                   : Factor w/ 2 levels "Daytime","Evening": 1 1 1 1 2 2 1 1 1 1 ...
##  $ Previous.qualification                        : Factor w/ 17 levels "10th Year of Schooling",..: 16 16 16 16 16 6 16 16 16 16 ...
##  $ Previous.qualification..grade.                : Factor w/ 3 levels "Low","Medium",..: 1 3 1 1 1 2 2 1 2 2 ...
##  $ Nacionality                                   : Factor w/ 21 levels "Angolan","Brazilian",..: 15 15 15 15 15 15 15 15 16 15 ...
##  $ Mother.s.qualification                        : Factor w/ 29 levels "10th Year of Schooling",..: 11 25 9 10 9 9 11 9 25 25 ...
##  $ Father.s.qualification                        : Factor w/ 34 levels "10th Year of Schooling",..: 27 21 10 10 11 10 11 10 29 12 ...
##  $ Mother.s.occupation                           : Factor w/ 32 levels "(Blank)","Administrative Staff",..: 18 10 29 18 29 29 22 29 29 2 ...
##  $ Father.s.occupation                           : Factor w/ 46 levels "(Blank)","Administrative Staff",..: 42 17 42 17 42 33 3 42 42 33 ...
##  $ Admission.grade                               : Factor w/ 3 levels "Low","Medium",..: 2 3 2 1 3 1 2 1 2 2 ...
##   ..- attr(*, "discretized:breaks")= num [1:4] 95 120 131 190
##   ..- attr(*, "discretized:method")= chr "frequency"
##  $ Displaced                                     : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 1 2 2 1 2 ...
##  $ Educational.special.needs                     : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Debtor                                        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 2 ...
##  $ Tuition.fees.up.to.date                       : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 2 1 2 1 ...
##  $ Gender                                        : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 2 1 2 1 1 ...
##  $ Scholarship.holder                            : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 2 1 ...
##  $ Age.at.enrollment                             : Factor w/ 3 levels "Young","Middle-Age",..: 2 2 2 2 3 3 1 3 2 1 ...
##   ..- attr(*, "discretized:breaks")= num [1:4] 17 19 22 70
##   ..- attr(*, "discretized:method")= chr "frequency"
##  $ International                                 : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 2 1 ...
##  $ Curricular.units.1st.sem..credited.           : Factor w/ 3 levels "None","Few","Many": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Curricular.units.1st.sem..enrolled.           : Factor w/ 3 levels "None","Few","Many": 1 2 2 2 2 2 3 2 2 2 ...
##  $ Curricular.units.1st.sem..evaluations.        : Factor w/ 3 levels "Low","Medium",..: 1 1 1 2 3 3 3 1 2 3 ...
##   ..- attr(*, "discretized:breaks")= num [1:4] 0 7 9 45
##   ..- attr(*, "discretized:method")= chr "frequency"
##  $ Curricular.units.1st.sem..approved.           : Factor w/ 3 levels "Little","Moderate",..: 1 3 1 3 2 2 3 1 3 2 ...
##   ..- attr(*, "discretized:breaks")= num [1:4] 0 4 6 26
##   ..- attr(*, "discretized:method")= chr "frequency"
##  $ Curricular.units.1st.sem..grade.              : Factor w/ 3 levels "Low","Medium",..: 1 3 1 3 2 2 3 1 3 1 ...
##  $ Curricular.units.1st.sem..without.evaluations.: Factor w/ 3 levels "None","Few","Many": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Curricular.units.2nd.sem..credited.           : Factor w/ 3 levels "None","Few","Many": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Curricular.units.2nd.sem..enrolled.           : Factor w/ 3 levels "None","Few","Many": 1 2 2 2 2 2 3 2 2 2 ...
##  $ Curricular.units.2nd.sem..evaluations.        : Factor w/ 3 levels "Low","Medium",..: 1 1 1 3 1 3 2 1 2 3 ...
##   ..- attr(*, "discretized:breaks")= num [1:4] 0 7 9 33
##   ..- attr(*, "discretized:method")= chr "frequency"
##  $ Curricular.units.2nd.sem..approved.           : Factor w/ 3 levels "Little","Moderate",..: 1 3 1 2 3 2 3 1 3 1 ...
##   ..- attr(*, "discretized:breaks")= num [1:4] 0 4 6 20
##   ..- attr(*, "discretized:method")= chr "frequency"
##  $ Curricular.units.2nd.sem..grade.              : Factor w/ 3 levels "Low","Medium",..: 1 3 1 2 2 2 3 1 3 3 ...
##  $ Curricular.units.2nd.sem..without.evaluations.: Factor w/ 3 levels "None","Few","Many": 1 1 1 1 1 2 1 1 1 1 ...
##  $ Unemployment.rate                             : Factor w/ 3 levels "Low","Medium",..: 2 3 2 1 3 3 3 3 3 1 ...
##  $ Inflation.rate                                : Factor w/ 3 levels "Low","Medium",..: 2 1 2 1 1 1 3 3 1 2 ...
##  $ GDP                                           : Factor w/ 3 levels "Low","Medium",..: 3 2 3 1 2 2 1 1 2 3 ...
##  $ Target                                        : Factor w/ 3 levels "Dropout","Enrolled",..: 1 3 1 3 3 3 3 1 3 1 ...

Variables Selection

Since we will be working for two different phases, I will be including all of the features for the phase 2 and omit three variables in phase 1 (which are: application order, nationality, and target). The “target” variables dictates whether student graduate or not.

Variables for Phase 1

data_general <- data[, !colnames(data) %in% c("Application.order", "Nacionality", "Target")]

Variables for Phase 2

data_with_target <- data

Verifying whether or not the required variables are included in the variables for both phases.

names(data_general)
##  [1] "Marital.status"                                
##  [2] "Application.mode"                              
##  [3] "Course"                                        
##  [4] "Daytime.evening.attendance."                   
##  [5] "Previous.qualification"                        
##  [6] "Previous.qualification..grade."                
##  [7] "Mother.s.qualification"                        
##  [8] "Father.s.qualification"                        
##  [9] "Mother.s.occupation"                           
## [10] "Father.s.occupation"                           
## [11] "Admission.grade"                               
## [12] "Displaced"                                     
## [13] "Educational.special.needs"                     
## [14] "Debtor"                                        
## [15] "Tuition.fees.up.to.date"                       
## [16] "Gender"                                        
## [17] "Scholarship.holder"                            
## [18] "Age.at.enrollment"                             
## [19] "International"                                 
## [20] "Curricular.units.1st.sem..credited."           
## [21] "Curricular.units.1st.sem..enrolled."           
## [22] "Curricular.units.1st.sem..evaluations."        
## [23] "Curricular.units.1st.sem..approved."           
## [24] "Curricular.units.1st.sem..grade."              
## [25] "Curricular.units.1st.sem..without.evaluations."
## [26] "Curricular.units.2nd.sem..credited."           
## [27] "Curricular.units.2nd.sem..enrolled."           
## [28] "Curricular.units.2nd.sem..evaluations."        
## [29] "Curricular.units.2nd.sem..approved."           
## [30] "Curricular.units.2nd.sem..grade."              
## [31] "Curricular.units.2nd.sem..without.evaluations."
## [32] "Unemployment.rate"                             
## [33] "Inflation.rate"                                
## [34] "GDP"
names(data_with_target)
##  [1] "Marital.status"                                
##  [2] "Application.mode"                              
##  [3] "Application.order"                             
##  [4] "Course"                                        
##  [5] "Daytime.evening.attendance."                   
##  [6] "Previous.qualification"                        
##  [7] "Previous.qualification..grade."                
##  [8] "Nacionality"                                   
##  [9] "Mother.s.qualification"                        
## [10] "Father.s.qualification"                        
## [11] "Mother.s.occupation"                           
## [12] "Father.s.occupation"                           
## [13] "Admission.grade"                               
## [14] "Displaced"                                     
## [15] "Educational.special.needs"                     
## [16] "Debtor"                                        
## [17] "Tuition.fees.up.to.date"                       
## [18] "Gender"                                        
## [19] "Scholarship.holder"                            
## [20] "Age.at.enrollment"                             
## [21] "International"                                 
## [22] "Curricular.units.1st.sem..credited."           
## [23] "Curricular.units.1st.sem..enrolled."           
## [24] "Curricular.units.1st.sem..evaluations."        
## [25] "Curricular.units.1st.sem..approved."           
## [26] "Curricular.units.1st.sem..grade."              
## [27] "Curricular.units.1st.sem..without.evaluations."
## [28] "Curricular.units.2nd.sem..credited."           
## [29] "Curricular.units.2nd.sem..enrolled."           
## [30] "Curricular.units.2nd.sem..evaluations."        
## [31] "Curricular.units.2nd.sem..approved."           
## [32] "Curricular.units.2nd.sem..grade."              
## [33] "Curricular.units.2nd.sem..without.evaluations."
## [34] "Unemployment.rate"                             
## [35] "Inflation.rate"                                
## [36] "GDP"                                           
## [37] "Target"

Transform Data into Transaction Format

# General dataset
transactions_general <- as(data_general, "transactions")

# Target-specific dataset
transactions_with_target <- as(data_with_target, "transactions")

Checking first five transactions for the phase one dataset.

inspect(head(transactions_general,5))
##     items                                                                              transactionID
## [1] {Marital.status=Single,                                                                         
##      Application.mode=2nd Phase - General Contingent,                                               
##      Course=Animation and Multimedia Design,                                                        
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Mother.s.qualification=Basic Education 3rd Cycle (9th/10th/11th Year) or Equiv.,               
##      Father.s.qualification=Other - 11th Year of Schooling,                                         
##      Mother.s.occupation=Personal Services, Security and Safety Workers and Sellers,                
##      Father.s.occupation=Unskilled Workers,                                                         
##      Admission.grade=Medium,                                                                        
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=Yes,                                                                   
##      Gender=Male,                                                                                   
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=None,                                                      
##      Curricular.units.1st.sem..evaluations.=Low,                                                    
##      Curricular.units.1st.sem..approved.=Little,                                                    
##      Curricular.units.1st.sem..grade.=Low,                                                          
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=None,                                                      
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Little,                                                    
##      Curricular.units.2nd.sem..grade.=Low,                                                          
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=Medium,                                                                      
##      Inflation.rate=Medium,                                                                         
##      GDP=High}                                                                                     1
## [2] {Marital.status=Single,                                                                         
##      Application.mode=International Student (Bachelor),                                             
##      Course=Tourism,                                                                                
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=High,                                                           
##      Mother.s.qualification=Secondary Education - 12th Year of Schooling or Eq.,                    
##      Father.s.qualification=Higher Education - Degree,                                              
##      Mother.s.occupation=Intermediate Level Technicians and Professions,                            
##      Father.s.occupation=Intermediate Level Technicians and Professions,                            
##      Admission.grade=High,                                                                          
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=No,                                                                    
##      Gender=Male,                                                                                   
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=Low,                                                    
##      Curricular.units.1st.sem..approved.=Many,                                                      
##      Curricular.units.1st.sem..grade.=High,                                                         
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Many,                                                      
##      Curricular.units.2nd.sem..grade.=High,                                                         
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=High,                                                                        
##      Inflation.rate=Low,                                                                            
##      GDP=Medium}                                                                                   2
## [3] {Marital.status=Single,                                                                         
##      Application.mode=1st Phase - General Contingent,                                               
##      Course=Communication Design,                                                                   
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Mother.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Father.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Mother.s.occupation=Unskilled Workers,                                                         
##      Father.s.occupation=Unskilled Workers,                                                         
##      Admission.grade=Medium,                                                                        
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=No,                                                                    
##      Gender=Male,                                                                                   
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=Low,                                                    
##      Curricular.units.1st.sem..approved.=Little,                                                    
##      Curricular.units.1st.sem..grade.=Low,                                                          
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Little,                                                    
##      Curricular.units.2nd.sem..grade.=Low,                                                          
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=Medium,                                                                      
##      Inflation.rate=Medium,                                                                         
##      GDP=High}                                                                                     3
## [4] {Marital.status=Single,                                                                         
##      Application.mode=2nd Phase - General Contingent,                                               
##      Course=Journalism and Communication,                                                           
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Mother.s.qualification=Basic Education 2nd Cycle (6th/7th/8th Year) or Equiv.,                 
##      Father.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Mother.s.occupation=Personal Services, Security and Safety Workers and Sellers,                
##      Father.s.occupation=Intermediate Level Technicians and Professions,                            
##      Admission.grade=Low,                                                                           
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=Yes,                                                                   
##      Gender=Female,                                                                                 
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=Medium,                                                 
##      Curricular.units.1st.sem..approved.=Many,                                                      
##      Curricular.units.1st.sem..grade.=High,                                                         
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=High,                                                   
##      Curricular.units.2nd.sem..approved.=Moderate,                                                  
##      Curricular.units.2nd.sem..grade.=Medium,                                                       
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=Low,                                                                         
##      Inflation.rate=Low,                                                                            
##      GDP=Low}                                                                                      4
## [5] {Marital.status=Married,                                                                        
##      Application.mode=Over 23 years old,                                                            
##      Course=Social Service (Evening Attendance),                                                    
##      Daytime.evening.attendance.=Evening,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Mother.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Father.s.qualification=Basic Education 2nd Cycle (6th/7th/8th Year) or Equiv.,                 
##      Mother.s.occupation=Unskilled Workers,                                                         
##      Father.s.occupation=Unskilled Workers,                                                         
##      Admission.grade=High,                                                                          
##      Displaced=No,                                                                                  
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=Yes,                                                                   
##      Gender=Female,                                                                                 
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Old,                                                                         
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=High,                                                   
##      Curricular.units.1st.sem..approved.=Moderate,                                                  
##      Curricular.units.1st.sem..grade.=Medium,                                                       
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Many,                                                      
##      Curricular.units.2nd.sem..grade.=Medium,                                                       
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=High,                                                                        
##      Inflation.rate=Low,                                                                            
##      GDP=Medium}                                                                                   5

Checking first five transactions for the phase two dataset.

inspect(head(transactions_with_target,5))
##     items                                                                              transactionID
## [1] {Marital.status=Single,                                                                         
##      Application.mode=2nd Phase - General Contingent,                                               
##      Application.order=5,                                                                           
##      Course=Animation and Multimedia Design,                                                        
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Nacionality=Portuguese,                                                                        
##      Mother.s.qualification=Basic Education 3rd Cycle (9th/10th/11th Year) or Equiv.,               
##      Father.s.qualification=Other - 11th Year of Schooling,                                         
##      Mother.s.occupation=Personal Services, Security and Safety Workers and Sellers,                
##      Father.s.occupation=Unskilled Workers,                                                         
##      Admission.grade=Medium,                                                                        
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=Yes,                                                                   
##      Gender=Male,                                                                                   
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=None,                                                      
##      Curricular.units.1st.sem..evaluations.=Low,                                                    
##      Curricular.units.1st.sem..approved.=Little,                                                    
##      Curricular.units.1st.sem..grade.=Low,                                                          
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=None,                                                      
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Little,                                                    
##      Curricular.units.2nd.sem..grade.=Low,                                                          
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=Medium,                                                                      
##      Inflation.rate=Medium,                                                                         
##      GDP=High,                                                                                      
##      Target=Dropout}                                                                               1
## [2] {Marital.status=Single,                                                                         
##      Application.mode=International Student (Bachelor),                                             
##      Application.order=5,                                                                           
##      Course=Tourism,                                                                                
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=High,                                                           
##      Nacionality=Portuguese,                                                                        
##      Mother.s.qualification=Secondary Education - 12th Year of Schooling or Eq.,                    
##      Father.s.qualification=Higher Education - Degree,                                              
##      Mother.s.occupation=Intermediate Level Technicians and Professions,                            
##      Father.s.occupation=Intermediate Level Technicians and Professions,                            
##      Admission.grade=High,                                                                          
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=No,                                                                    
##      Gender=Male,                                                                                   
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=Low,                                                    
##      Curricular.units.1st.sem..approved.=Many,                                                      
##      Curricular.units.1st.sem..grade.=High,                                                         
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Many,                                                      
##      Curricular.units.2nd.sem..grade.=High,                                                         
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=High,                                                                        
##      Inflation.rate=Low,                                                                            
##      GDP=Medium,                                                                                    
##      Target=Graduate}                                                                              2
## [3] {Marital.status=Single,                                                                         
##      Application.mode=1st Phase - General Contingent,                                               
##      Application.order=5,                                                                           
##      Course=Communication Design,                                                                   
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Nacionality=Portuguese,                                                                        
##      Mother.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Father.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Mother.s.occupation=Unskilled Workers,                                                         
##      Father.s.occupation=Unskilled Workers,                                                         
##      Admission.grade=Medium,                                                                        
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=No,                                                                    
##      Gender=Male,                                                                                   
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=Low,                                                    
##      Curricular.units.1st.sem..approved.=Little,                                                    
##      Curricular.units.1st.sem..grade.=Low,                                                          
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Little,                                                    
##      Curricular.units.2nd.sem..grade.=Low,                                                          
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=Medium,                                                                      
##      Inflation.rate=Medium,                                                                         
##      GDP=High,                                                                                      
##      Target=Dropout}                                                                               3
## [4] {Marital.status=Single,                                                                         
##      Application.mode=2nd Phase - General Contingent,                                               
##      Application.order=5,                                                                           
##      Course=Journalism and Communication,                                                           
##      Daytime.evening.attendance.=Daytime,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Nacionality=Portuguese,                                                                        
##      Mother.s.qualification=Basic Education 2nd Cycle (6th/7th/8th Year) or Equiv.,                 
##      Father.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Mother.s.occupation=Personal Services, Security and Safety Workers and Sellers,                
##      Father.s.occupation=Intermediate Level Technicians and Professions,                            
##      Admission.grade=Low,                                                                           
##      Displaced=Yes,                                                                                 
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=Yes,                                                                   
##      Gender=Female,                                                                                 
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Middle-Age,                                                                  
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=Medium,                                                 
##      Curricular.units.1st.sem..approved.=Many,                                                      
##      Curricular.units.1st.sem..grade.=High,                                                         
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=High,                                                   
##      Curricular.units.2nd.sem..approved.=Moderate,                                                  
##      Curricular.units.2nd.sem..grade.=Medium,                                                       
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=Low,                                                                         
##      Inflation.rate=Low,                                                                            
##      GDP=Low,                                                                                       
##      Target=Graduate}                                                                              4
## [5] {Marital.status=Married,                                                                        
##      Application.mode=Over 23 years old,                                                            
##      Application.order=5,                                                                           
##      Course=Social Service (Evening Attendance),                                                    
##      Daytime.evening.attendance.=Evening,                                                           
##      Previous.qualification=Secondary Education,                                                    
##      Previous.qualification..grade.=Low,                                                            
##      Nacionality=Portuguese,                                                                        
##      Mother.s.qualification=Basic education 1st cycle (4th/5th year) or equiv.,                     
##      Father.s.qualification=Basic Education 2nd Cycle (6th/7th/8th Year) or Equiv.,                 
##      Mother.s.occupation=Unskilled Workers,                                                         
##      Father.s.occupation=Unskilled Workers,                                                         
##      Admission.grade=High,                                                                          
##      Displaced=No,                                                                                  
##      Educational.special.needs=No,                                                                  
##      Debtor=No,                                                                                     
##      Tuition.fees.up.to.date=Yes,                                                                   
##      Gender=Female,                                                                                 
##      Scholarship.holder=No,                                                                         
##      Age.at.enrollment=Old,                                                                         
##      International=No,                                                                              
##      Curricular.units.1st.sem..credited.=None,                                                      
##      Curricular.units.1st.sem..enrolled.=Few,                                                       
##      Curricular.units.1st.sem..evaluations.=High,                                                   
##      Curricular.units.1st.sem..approved.=Moderate,                                                  
##      Curricular.units.1st.sem..grade.=Medium,                                                       
##      Curricular.units.1st.sem..without.evaluations.=None,                                           
##      Curricular.units.2nd.sem..credited.=None,                                                      
##      Curricular.units.2nd.sem..enrolled.=Few,                                                       
##      Curricular.units.2nd.sem..evaluations.=Low,                                                    
##      Curricular.units.2nd.sem..approved.=Many,                                                      
##      Curricular.units.2nd.sem..grade.=Medium,                                                       
##      Curricular.units.2nd.sem..without.evaluations.=None,                                           
##      Unemployment.rate=High,                                                                        
##      Inflation.rate=Low,                                                                            
##      GDP=Medium,                                                                                    
##      Target=Graduate}                                                                              5
itemFrequencyPlot(transactions_general, topN=20, type="absolute", main="Top 25 Most Frequent Items")

This bar chart above displays the top 25 most frequent items found in the dataset, representing the most common attributes among students. The x-axis lists these attributes, while the y-axis shows their absolute frequency across all transactions. The most frequent attributes include “Educational.special.needs=No”, “International=No”, and “Daytime.evening.attendance.=Daytime”, indicating that the majority of students do not have special educational needs, are not international students, and attend daytime classes. Additionally, features such as “Debtor=No” and “Tuition.fees.up.to.date=Yes” suggest that most students have their tuition paid on time.
On the academic side, “Curricular.units.2nd.sem..enrolled.=Few” and “Curricular.units.1st.sem..enrolled.=Few” are relatively common, which might indicate that many students enroll in only a few courses per semester. This frequency plot helps in understanding the dominant characteristics within the dataset and provides a foundation for exploring patterns related to student performance and outcomes.

Apriori Algorithm

Apriori - First Phase

Since I did not set a specific consequent for the rules, a large number of association rules were generated. To refine the results and focus on more meaningful patterns, I need to adjust the support and confidence parameters. By increasing the minimum support, I can filter out rules that apply to only a small fraction of the dataset, ensuring that the discovered relationships are more widely applicable.
Similarly, by raising the confidence threshold, I can prioritize rules that have a stronger predictive power, reducing the likelihood of weak or coincidental associations.

rules_general <- apriori(
  transactions_general,
  parameter = list(supp = 0.05, conf = 0.8, minlen = 2),
  control = list(verbose = FALSE))
inspect(sort(rules_general, by = "confidence", decreasing = TRUE)[1:20])
##      lhs                                              rhs                                              support confidence   coverage     lift count
## [1]  {Course=Communication Design}                 => {Daytime.evening.attendance.=Daytime}         0.05108499          1 0.05108499 1.122558   226
## [2]  {Curricular.units.1st.sem..credited.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05447559          1 0.05447559 3.464370   241
## [3]  {Curricular.units.1st.sem..credited.=Many}    => {Educational.special.needs=No}                0.05447559          1 0.05447559 1.011662   241
## [4]  {Course=Tourism}                              => {Daytime.evening.attendance.=Daytime}         0.05696203          1 0.05696203 1.122558   252
## [5]  {Course=Tourism}                              => {Educational.special.needs=No}                0.05696203          1 0.05696203 1.011662   252
## [6]  {Course=Management (Evening Attendance)}      => {Daytime.evening.attendance.=Evening}         0.06057866          1 0.06057866 9.159420   268
## [7]  {Course=Advertising and Marketing Management} => {Daytime.evening.attendance.=Daytime}         0.06057866          1 0.06057866 1.122558   268
## [8]  {Course=Journalism and Communication}         => {Daytime.evening.attendance.=Daytime}         0.07481917          1 0.07481917 1.122558   331
## [9]  {Course=Veterinary Nursing}                   => {Daytime.evening.attendance.=Daytime}         0.07617541          1 0.07617541 1.122558   337
## [10] {Course=Social Service}                       => {Daytime.evening.attendance.=Daytime}         0.08024412          1 0.08024412 1.122558   355
## [11] {Course=Management}                           => {Daytime.evening.attendance.=Daytime}         0.08589512          1 0.08589512 1.122558   380
## [12] {Course=Nursing}                              => {Daytime.evening.attendance.=Daytime}         0.17314647          1 0.17314647 1.122558   766
## [13] {Application.mode=Over 23 years old}          => {Age.at.enrollment=Old}                       0.17744123          1 0.17744123 2.852353   785
## [14] {Inflation.rate=Medium}                       => {GDP=High}                                    0.20185353          1 0.20185353 2.525114   893
## [15] {Course=Communication Design,                                                                                                                 
##       International=No}                            => {Daytime.evening.attendance.=Daytime}         0.05063291          1 0.05063291 1.122558   224
## [16] {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.2nd.sem..enrolled.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05357143          1 0.05357143 3.464370   237
## [17] {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.2nd.sem..enrolled.=Many}    => {Curricular.units.1st.sem..evaluations.=High} 0.05357143          1 0.05357143 2.566125   237
## [18] {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.2nd.sem..enrolled.=Many}    => {Educational.special.needs=No}                0.05357143          1 0.05357143 1.011662   237
## [19] {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.1st.sem..evaluations.=High} => {Curricular.units.1st.sem..enrolled.=Many}    0.05424955          1 0.05424955 3.464370   240
## [20] {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.1st.sem..approved.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05402351          1 0.05402351 3.464370   239

The first phase does not us new insight that much since it’s pretty much obvious. Many students in a vert particular courses, especially non-STEM, attend daytime classes - which is expected for them and pretty much confirms the program structure patterns. Moreover, there are other notable patterns highlight academic behaviors, such as students who earn many first-semester credits frequently enrolling in many courses or undergoing more evaluations,which I believe suggests that there’s a link between early academic performance and higher engagement.

A few rules seem less relevant to my analysis, such as the correlation between medium inflation rates and high GDP, which isn’t directly actionable for understanding student performance. Overall, these insights give me a broad overview of student behaviors, but they don’t yet help me identify the key predictors of dropout or graduation. That will be my main focus in Phase 2.

But before we continue to phase 2, I want to see if I could make any relevant rules more apparent by filtering out the weak and irrelevant ones. I’m doing this by introducting lift threshold within the parameter of the apriori rule. Since lift measures how much likely the rule consequent (RHS) occurs when rule antecedent (LHS). Thus by setting the lift to be above 1.2, we’re filtering out those rules who do not produce meaningful relationship.

rules_filtered <- subset(rules_general, lift > 1.2)
inspect(sort(rules_filtered, by = "confidence", decreasing = TRUE)[1:20])
##      lhs                                              rhs                                              support confidence   coverage     lift count
## [1]  {Curricular.units.1st.sem..credited.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05447559          1 0.05447559 3.464370   241
## [2]  {Course=Management (Evening Attendance)}      => {Daytime.evening.attendance.=Evening}         0.06057866          1 0.06057866 9.159420   268
## [3]  {Application.mode=Over 23 years old}          => {Age.at.enrollment=Old}                       0.17744123          1 0.17744123 2.852353   785
## [4]  {Inflation.rate=Medium}                       => {GDP=High}                                    0.20185353          1 0.20185353 2.525114   893
## [5]  {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.2nd.sem..enrolled.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05357143          1 0.05357143 3.464370   237
## [6]  {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.2nd.sem..enrolled.=Many}    => {Curricular.units.1st.sem..evaluations.=High} 0.05357143          1 0.05357143 2.566125   237
## [7]  {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.1st.sem..evaluations.=High} => {Curricular.units.1st.sem..enrolled.=Many}    0.05424955          1 0.05424955 3.464370   240
## [8]  {Curricular.units.1st.sem..credited.=Many,                                                                                                    
##       Curricular.units.1st.sem..approved.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05402351          1 0.05402351 3.464370   239
## [9]  {International=No,                                                                                                                            
##       Curricular.units.1st.sem..credited.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05289331          1 0.05289331 3.464370   234
## [10] {Educational.special.needs=No,                                                                                                                
##       Curricular.units.1st.sem..credited.=Many}    => {Curricular.units.1st.sem..enrolled.=Many}    0.05447559          1 0.05447559 3.464370   241
## [11] {International=No,                                                                                                                            
##       Curricular.units.1st.sem..credited.=Many}    => {Curricular.units.1st.sem..evaluations.=High} 0.05289331          1 0.05289331 2.566125   234
## [12] {Course=Tourism,                                                                                                                              
##       Curricular.units.1st.sem..enrolled.=Few}     => {Curricular.units.2nd.sem..enrolled.=Few}     0.05447559          1 0.05447559 1.477129   241
## [13] {Course=Tourism,                                                                                                                              
##       Curricular.units.1st.sem..credited.=None}    => {Curricular.units.1st.sem..enrolled.=Few}     0.05402351          1 0.05402351 1.491068   239
## [14] {Course=Tourism,                                                                                                                              
##       Curricular.units.2nd.sem..credited.=None}    => {Curricular.units.1st.sem..enrolled.=Few}     0.05424955          1 0.05424955 1.491068   240
## [15] {Course=Tourism,                                                                                                                              
##       Curricular.units.1st.sem..credited.=None}    => {Curricular.units.2nd.sem..enrolled.=Few}     0.05402351          1 0.05402351 1.477129   239
## [16] {Course=Tourism,                                                                                                                              
##       Curricular.units.2nd.sem..credited.=None}    => {Curricular.units.2nd.sem..enrolled.=Few}     0.05424955          1 0.05424955 1.477129   240
## [17] {Course=Management (Evening Attendance),                                                                                                      
##       Age.at.enrollment=Old}                       => {Daytime.evening.attendance.=Evening}         0.05877034          1 0.05877034 9.159420   260
## [18] {Course=Management (Evening Attendance),                                                                                                      
##       Curricular.units.1st.sem..enrolled.=Few}     => {Daytime.evening.attendance.=Evening}         0.05018083          1 0.05018083 9.159420   222
## [19] {Course=Management (Evening Attendance),                                                                                                      
##       Curricular.units.2nd.sem..enrolled.=Few}     => {Daytime.evening.attendance.=Evening}         0.05040687          1 0.05040687 9.159420   223
## [20] {Course=Management (Evening Attendance),                                                                                                      
##       Scholarship.holder=No}                       => {Daytime.evening.attendance.=Evening}         0.05583183          1 0.05583183 9.159420   247

Even after doing the filtering, I can see that some of the trivial relationship are still present. For instance, if a student is enrolled in an Evening Attendance program, of course, they will attend in the evening. But rules that highlight academic behaviours, such as curricular units credited, enrolled, and evaluated for the first and second semester. From here we can say that, the rulee suggest that high-performing students may take on heavier workloads.

plot(rules_filtered[1:10], method = "graph", engine = "htmlwidget")
rules_general_top <- head(sort(rules_filtered, by="confidence"), 50)
plot(rules_general_top, method = "grouped", control = list(k = 10))

Apriori - Second Phase: Target of Dropout

Now we move on to generating rules for the the second phase.

rules_dropout <- apriori(
  transactions_with_target,
  parameter = list(supp = 0.05, conf = 0.8, minlen = 2),
  appearance = list(rhs = "Target=Dropout", default = "lhs"),
  control = list(verbose = FALSE))
length(rules_dropout)
## [1] 73966
inspect(sort(rules_dropout, by = "confidence", decreasing = TRUE)[1:20])
##      lhs                                             rhs                 support confidence   coverage     lift count
## [1]  {Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05244123  0.9830508 0.05334539 3.060533   232
## [2]  {Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05244123  0.9830508 0.05334539 3.060533   232
## [3]  {Application.order=5,                                                                                           
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05244123  0.9830508 0.05334539 3.060533   232
## [4]  {Application.order=5,                                                                                           
##       Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05244123  0.9830508 0.05334539 3.060533   232
## [5]  {Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [6]  {Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [7]  {Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [8]  {Nacionality=Portuguese,                                                                                        
##       Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [9]  {Application.order=5,                                                                                           
##       Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [10] {Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [11] {Application.order=5,                                                                                           
##       Tuition.fees.up.to.date=No,                                                                                    
##       Scholarship.holder=No,                                                                                         
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..approved.=Little} => {Target=Dropout} 0.05108499  0.9826087 0.05198915 3.059156   226
## [12] {Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [13] {Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [14] {Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [15] {Nacionality=Portuguese,                                                                                        
##       Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [16] {Application.order=5,                                                                                           
##       Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [17] {Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [18] {Application.order=5,                                                                                           
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [19] {Nacionality=Portuguese,                                                                                        
##       Educational.special.needs=No,                                                                                  
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
## [20] {Application.order=5,                                                                                           
##       Nacionality=Portuguese,                                                                                        
##       Tuition.fees.up.to.date=No,                                                                                    
##       Age.at.enrollment=Old,                                                                                         
##       International=No,                                                                                              
##       Curricular.units.2nd.sem..grade.=Low}       => {Target=Dropout} 0.05040687  0.9823789 0.05131103 3.058441   223
inspect(sort(rules_dropout, by = "support", decreasing = TRUE)[1:20])
##      lhs                                              rhs                support confidence  coverage     lift count
## [1]  {Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1844485  0.8007851 0.2303345 2.493085   816
## [2]  {Application.order=5,                                                                                          
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1844485  0.8007851 0.2303345 2.493085   816
## [3]  {Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1828662  0.8025794 0.2278481 2.498671   809
## [4]  {Application.order=5,                                                                                          
##       Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1828662  0.8025794 0.2278481 2.498671   809
## [5]  {Scholarship.holder=No,                                                                                        
##       Curricular.units.2nd.sem..credited.=None,                                                                     
##       Curricular.units.2nd.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..grade.=Low}        => {Target=Dropout} 0.1808318  0.8000000 0.2260398 2.490640   800
## [6]  {Application.order=5,                                                                                          
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.2nd.sem..credited.=None,                                                                     
##       Curricular.units.2nd.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..grade.=Low}        => {Target=Dropout} 0.1808318  0.8000000 0.2260398 2.490640   800
## [7]  {Nacionality=Portuguese,                                                                                       
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1803797  0.8020101 0.2249096 2.496898   798
## [8]  {Scholarship.holder=No,                                                                                        
##       International=No,                                                                                             
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1803797  0.8020101 0.2249096 2.496898   798
## [9]  {Nacionality=Portuguese,                                                                                       
##       Scholarship.holder=No,                                                                                        
##       International=No,                                                                                             
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1803797  0.8020101 0.2249096 2.496898   798
## [10] {Application.order=5,                                                                                          
##       Nacionality=Portuguese,                                                                                       
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1803797  0.8020101 0.2249096 2.496898   798
## [11] {Application.order=5,                                                                                          
##       Scholarship.holder=No,                                                                                        
##       International=No,                                                                                             
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1803797  0.8020101 0.2249096 2.496898   798
## [12] {Application.order=5,                                                                                          
##       Nacionality=Portuguese,                                                                                       
##       Scholarship.holder=No,                                                                                        
##       International=No,                                                                                             
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1803797  0.8020101 0.2249096 2.496898   798
## [13] {Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..credited.=None,                                                                     
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1794756  0.8004032 0.2242315 2.491896   794
## [14] {Application.order=5,                                                                                          
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..credited.=None,                                                                     
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1794756  0.8004032 0.2242315 2.491896   794
## [15] {Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.2nd.sem..credited.=None,                                                                     
##       Curricular.units.2nd.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..grade.=Low}        => {Target=Dropout} 0.1790235  0.8016194 0.2233273 2.495682   792
## [16] {Application.order=5,                                                                                          
##       Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.2nd.sem..credited.=None,                                                                     
##       Curricular.units.2nd.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..grade.=Low}        => {Target=Dropout} 0.1790235  0.8016194 0.2233273 2.495682   792
## [17] {Nacionality=Portuguese,                                                                                       
##       Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1787975  0.8038618 0.2224231 2.502663   791
## [18] {Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       International=No,                                                                                             
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1787975  0.8038618 0.2224231 2.502663   791
## [19] {Nacionality=Portuguese,                                                                                       
##       Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       International=No,                                                                                             
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1787975  0.8038618 0.2224231 2.502663   791
## [20] {Application.order=5,                                                                                          
##       Nacionality=Portuguese,                                                                                       
##       Educational.special.needs=No,                                                                                 
##       Scholarship.holder=No,                                                                                        
##       Curricular.units.1st.sem..approved.=Little,                                                                   
##       Curricular.units.2nd.sem..approved.=Little}  => {Target=Dropout} 0.1787975  0.8038618 0.2224231 2.502663   791

The association rules predicting dropout show me that financial instability, academic struggles, and age play a significant role in whether a student is at risk. The most frequent pattern I found indicates that students who aren’t up to date on tuition fees, don’t receive scholarships, are older, and have performed poorly in their second semester are highly likely to drop out. The high confidence values (≥98%) tell me that when these conditions co-occur, dropout is almost certain. I also noticed that nationality (Portuguese students) and having no special educational needs appear in several rules, though I’d need to explore further to fully understand their impact. Since the lift values (~3.06) suggest that these factors increase dropout likelihood well beyond random chance, I can see that financial and academic struggles are clear risk indicators.

I also found that students who applied as their fifth choice, those who weren’t international, and those with low second-semester grades have a strong link to dropping out. This tells me that choosing a program as a lower preference and struggling academically later on are important signals of dropout risk. The insights I’ve gained suggest that financial aid, academic support, and better early engagement strategies could help students at risk.

Moving on, I want to check on how these factors compare to those leading to graduation.

plot(rules_dropout[1:10], method = "graph", engine="htmlwidget")
rules_dropout_top <- head(sort(rules_dropout, by="confidence"), 50)
plot(rules_dropout_top, method = "grouped", control = list(k = 10))

From the plot above, it reveals that students who lack scholarships, have low curricular unit grades, and earn few or no credits in both semesters are highly associated with dropout. Additionally, students who are Portuguese, not international, and have little academic engagement (low enrollment and few approvals) also appear frequently in dropout-related rules. This suggests that academic performance and financial support play a crucial role in student retention, with those struggling academically and lacking financial aid being the most at risk of leaving their studies.

Apriori - Second Phase: Target of Graduate

We are now applying apriori algorithm and setting “Graduate” as the rule consequent.

rules_graduate <- apriori(
  transactions_with_target, 
  parameter = list(supp = 0.05, conf = 0.8, minlen = 2),
  appearance = list(rhs = "Target=Graduate", default = "lhs"),  
  control = list(verbose = FALSE)
)
length(rules_graduate)
## [1] 657750
inspect(sort(rules_graduate, by = "confidence", decreasing = TRUE)[1:20])
##      lhs                                                       rhs                  support confidence   coverage     lift count
## [1]  {Previous.qualification..grade.=Medium,                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.06193490  0.9681979 0.06396926 1.939026   274
## [2]  {Previous.qualification..grade.=Medium,                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.06125678  0.9678571 0.06329114 1.938343   271
## [3]  {Previous.qualification..grade.=Medium,                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.06125678  0.9678571 0.06329114 1.938343   271
## [4]  {Tuition.fees.up.to.date=Yes,                                                                                              
##       Gender=Female,                                                                                                            
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.06690778  0.9673203 0.06916817 1.937268   296
## [5]  {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..approved.=Many,                                                                                 
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.07346293  0.9672619 0.07594937 1.937151   325
## [6]  {Previous.qualification..grade.=Medium,                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High}                => {Target=Graduate} 0.05266727  0.9668050 0.05447559 1.936236   233
## [7]  {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.07820976  0.9664804 0.08092224 1.935586   346
## [8]  {Previous.qualification..grade.=Medium,                                                                                    
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.05176311  0.9662447 0.05357143 1.935114   229
## [9]  {Previous.qualification..grade.=Medium,                                                                                    
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..grade.=High,                                                                                    
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High}                => {Target=Graduate} 0.05153707  0.9661017 0.05334539 1.934827   228
## [10] {Nacionality=Portuguese,                                                                                                   
##       Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.07707957  0.9660057 0.07979204 1.934635   341
## [11] {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       International=No,                                                                                                         
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.07707957  0.9660057 0.07979204 1.934635   341
## [12] {Daytime.evening.attendance.=Daytime,                                                                                      
##       Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.07662749  0.9658120 0.07933996 1.934247   339
## [13] {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..credited.=None,                                                                                 
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.07617541  0.9656160 0.07888788 1.933855   337
## [14] {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..credited.=None,                                                                                 
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.07572333  0.9654179 0.07843580 1.933458   335
## [15] {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..evaluations.=Medium,                                                                            
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.05018083  0.9652174 0.05198915 1.933056   222
## [16] {Tuition.fees.up.to.date=Yes,                                                                                              
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.1st.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.07414105  0.9647059 0.07685353 1.932032   328
## [17] {Tuition.fees.up.to.date=Yes,                                                                                              
##       Gender=Female,                                                                                                            
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..evaluations.=Medium,                                                                            
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.06781193  0.9646302 0.07029837 1.931881   300
## [18] {Previous.qualification..grade.=Medium,                                                                                    
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.06148282  0.9645390 0.06374322 1.931698   272
## [19] {Application.order=5,                                                                                                      
##       Previous.qualification..grade.=Medium,                                                                                    
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.06148282  0.9645390 0.06374322 1.931698   272
## [20] {Previous.qualification..grade.=Medium,                                                                                    
##       Scholarship.holder=Yes,                                                                                                   
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                      
##       Curricular.units.2nd.sem..approved.=Many,                                                                                 
##       Curricular.units.2nd.sem..grade.=High,                                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.06125678  0.9644128 0.06351718 1.931445   271
inspect(sort(rules_graduate, by = "support", decreasing = TRUE)[1:20])
##      lhs                                                       rhs                 support confidence  coverage     lift count
## [1]  {Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3508137  0.8112912 0.4324141 1.624786  1552
## [2]  {Application.order=5,                                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3508137  0.8112912 0.4324141 1.624786  1552
## [3]  {Educational.special.needs=No,                                                                                           
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3471971  0.8114105 0.4278933 1.625025  1536
## [4]  {Application.order=5,                                                                                                    
##       Educational.special.needs=No,                                                                                           
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3471971  0.8114105 0.4278933 1.625025  1536
## [5]  {Nacionality=Portuguese,                                                                                                 
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3431284  0.8117647 0.4226944 1.625734  1518
## [6]  {Tuition.fees.up.to.date=Yes,                                                                                            
##       International=No,                                                                                                       
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3431284  0.8117647 0.4226944 1.625734  1518
## [7]  {Nacionality=Portuguese,                                                                                                 
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       International=No,                                                                                                       
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3431284  0.8117647 0.4226944 1.625734  1518
## [8]  {Application.order=5,                                                                                                    
##       Nacionality=Portuguese,                                                                                                 
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3431284  0.8117647 0.4226944 1.625734  1518
## [9]  {Application.order=5,                                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       International=No,                                                                                                       
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3431284  0.8117647 0.4226944 1.625734  1518
## [10] {Application.order=5,                                                                                                    
##       Nacionality=Portuguese,                                                                                                 
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       International=No,                                                                                                       
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3431284  0.8117647 0.4226944 1.625734  1518
## [11] {Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many,                                                                               
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.3415461  0.8167568 0.4181736 1.635732  1511
## [12] {Application.order=5,                                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many,                                                                               
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.3415461  0.8167568 0.4181736 1.635732  1511
## [13] {Curricular.units.1st.sem..approved.=Many,                                                                               
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.3413201  0.8023379 0.4254069 1.606855  1510
## [14] {Application.order=5,                                                                                                    
##       Curricular.units.1st.sem..approved.=Many,                                                                               
##       Curricular.units.1st.sem..without.evaluations.=None,                                                                    
##       Curricular.units.2nd.sem..without.evaluations.=None}  => {Target=Graduate} 0.3413201  0.8023379 0.4254069 1.606855  1510
## [15] {Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many,                                                                               
##       Curricular.units.1st.sem..without.evaluations.=None}  => {Target=Graduate} 0.3401899  0.8179348 0.4159132 1.638091  1505
## [16] {Application.order=5,                                                                                                    
##       Tuition.fees.up.to.date=Yes,                                                                                            
##       Curricular.units.1st.sem..approved.=Many,                                                                               
##       Curricular.units.1st.sem..without.evaluations.=None}  => {Target=Graduate} 0.3401899  0.8179348 0.4159132 1.638091  1505
## [17] {Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.3397378  0.8271877 0.4107143 1.656622  1503
## [18] {Application.order=5,                                                                                                    
##       Curricular.units.2nd.sem..approved.=Many}             => {Target=Graduate} 0.3397378  0.8271877 0.4107143 1.656622  1503
## [19] {Debtor=No,                                                                                                              
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3397378  0.8111171 0.4188517 1.624437  1503
## [20] {Application.order=5,                                                                                                    
##       Debtor=No,                                                                                                              
##       Curricular.units.1st.sem..approved.=Many}             => {Target=Graduate} 0.3397378  0.8111171 0.4188517 1.624437  1503

The association rules predicting graduation show me that consistent academic performance, financial stability, and prior qualification grades play a key role in student success. The most frequent pattern I found suggests that students who paid their tuition on time, received scholarships, had medium-level previous qualification grades, and performed well in both semesters are highly likely to graduate. The high confidence values (≥96%) tell me that when these conditions are met, graduation is almost guaranteed. Additionally, students who had no missing evaluations, medium first-semester evaluations, and many approved second-semester courses also show strong associations with graduation. Since the lift values (~1.93) suggest that these factors significantly improve graduation likelihood, I can see that academic engagement and financial support are strong predictors of success.

I also noticed that female students who received scholarships, performed consistently, and had no missing coursework evaluations were strongly linked to graduation. This tells us that scholarship programs and structured academic support may be key contributors to student success. Comparing this to the dropout rules, it’s clear that students with financial security and stable academic performance throughout their coursework have a much higher chance of graduating.

The insights I have gained, could suggest that universities should focus on targeted financial aid programs and academic tracking to support students at risk of dropout.

plot(rules_graduate[1:10], method = "graph", engine="htmlwidget")
rules_graduate_top <- head(sort(rules_graduate, by="confidence"), 50)
plot(rules_graduate_top, method = "grouped", control = list(k = 10))

From the plot above, I can see that factors such as tuition fees being up-to-date, previous qualification grades being medium, high curricular unit grades, and scholarship holding play a significant role in graduation outcomes.
Additionally, demographic aspects like being Portuguese, female, and having no educational special needs also appear frequently in rules leading to graduation. This confirms that a mix of financial stability, academic performance, and background characteristics are key indicators of student success.

Conclusion

With all things considered, the project gave insightful information about the variables affecting student outcomes and suggested possible areas for institutions to intervene in order to increase retention rates. Institutions can improve student success by taking proactive measures like financial aid programs, academic support systems, and early dropout risk detection if they have a better grasp of these trends. Another research also shows that institutional support and engagement are essential in reducing university dropout rates. A systematic review highlighted that vocational guidance, academic support, and strong institutional backing are vital for improving student retention and success.

References

Al Husaini, A., & Ahmad Shukor, S. (2023). Factors affecting students’ academic performance: A review. ResearchGate. Retrieved February 21, 2025, from https://www.researchgate.net/publication/367360842_Factors_Affecting_Students%27_Academic_Performance_A_review

Scaler. (n.d.). Binning in data mining. Scaler Topics. Retrieved February 21, 2025, from https://www.scaler.com/topics/binning-in-data-mining

Quincho Apumayta, R., Carrillo Cayllahua, J., Ccencho Pari, A., Inga Choque, V., Cárdenas Valverde, J. C., & Huamán Ataypoma, D. (2024). University dropout: A systematic review of the main determinant factors (2020–2024). F1000Research, 13, 942. https://doi.org/10.12688/f1000research.154263.2