Data Dive-1

Predict students’ dropout and academic success

This dataset was created from a higher education institution and covers students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. It includes information known at the time of student enrollment (academic path, demographics, and socioeconomic factors) and the students’ academic performance at the end of the first and second semesters.

Source of the dataset: https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success

Goals and Objectives

The primary goals of this data analysis project are as follows:

  1. To develop predictive models that can forecast student dropout rates.
  2. To identify key factors influencing student academic success.
  3. To gain a deeper understanding of the relationship between students’ demographics, academic paths, and their performance.

By achieving these objectives, we aim to provide valuable insights that can assist educational institutions in improving student retention and enhancing academic outcomes.

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset (semicolon-delimited; read_delim() comes from readr, attached with tidyverse)
data <- read_delim("data.csv", delim = ";")
## Rows: 4424 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr  (1): Target
## dbl (36): Marital status, Application mode, Application order, Course, Dayti...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The structure of the dataset is:

## cols(
##   `Marital status` = col_double(),
##   `Application mode` = col_double(),
##   `Application order` = col_double(),
##   Course = col_double(),
##   `Daytime/evening attendance` = col_double(),
##   `Previous qualification` = col_double(),
##   `Previous qualification (grade)` = col_double(),
##   Nacionality = col_double(),
##   `Mother's qualification` = col_double(),
##   `Father's qualification` = col_double(),
##   `Mother's occupation` = col_double(),
##   `Father's occupation` = col_double(),
##   `Admission grade` = col_double(),
##   Displaced = col_double(),
##   `Educational special needs` = col_double(),
##   Debtor = col_double(),
##   `Tuition fees up to date` = col_double(),
##   Gender = col_double(),
##   `Scholarship holder` = col_double(),
##   `Age at enrollment` = col_double(),
##   International = col_double(),
##   `Curricular units 1st sem (credited)` = col_double(),
##   `Curricular units 1st sem (enrolled)` = col_double(),
##   `Curricular units 1st sem (evaluations)` = col_double(),
##   `Curricular units 1st sem (approved)` = col_double(),
##   `Curricular units 1st sem (grade)` = col_double(),
##   `Curricular units 1st sem (without evaluations)` = col_double(),
##   `Curricular units 2nd sem (credited)` = col_double(),
##   `Curricular units 2nd sem (enrolled)` = col_double(),
##   `Curricular units 2nd sem (evaluations)` = col_double(),
##   `Curricular units 2nd sem (approved)` = col_double(),
##   `Curricular units 2nd sem (grade)` = col_double(),
##   `Curricular units 2nd sem (without evaluations)` = col_double(),
##   `Unemployment rate` = col_double(),
##   `Inflation rate` = col_double(),
##   GDP = col_double(),
##   Target = col_character()
## )
## spc_tbl_ [4,424 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Marital status                                : num [1:4424] 1 1 1 1 2 2 1 1 1 1 ...
##  $ Application mode                              : num [1:4424] 17 15 1 17 39 39 1 18 1 1 ...
##  $ Application order                             : num [1:4424] 5 1 5 2 1 1 1 4 3 1 ...
##  $ Course                                        : num [1:4424] 171 9254 9070 9773 8014 ...
##  $ Daytime/evening attendance                    : num [1:4424] 1 1 1 1 0 0 1 1 1 1 ...
##  $ Previous qualification                        : num [1:4424] 1 1 1 1 1 19 1 1 1 1 ...
##  $ Previous qualification (grade)                : num [1:4424] 122 160 122 122 100 ...
##  $ Nacionality                                   : num [1:4424] 1 1 1 1 1 1 1 1 62 1 ...
##  $ Mother's qualification                        : num [1:4424] 19 1 37 38 37 37 19 37 1 1 ...
##  $ Father's qualification                        : num [1:4424] 12 3 37 37 38 37 38 37 1 19 ...
##  $ Mother's occupation                           : num [1:4424] 5 3 9 5 9 9 7 9 9 4 ...
##  $ Father's occupation                           : num [1:4424] 9 3 9 3 9 7 10 9 9 7 ...
##  $ Admission grade                               : num [1:4424] 127 142 125 120 142 ...
##  $ Displaced                                     : num [1:4424] 1 1 1 1 0 0 1 1 0 1 ...
##  $ Educational special needs                     : num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Debtor                                        : num [1:4424] 0 0 0 0 0 1 0 0 0 1 ...
##  $ Tuition fees up to date                       : num [1:4424] 1 0 0 1 1 1 1 0 1 0 ...
##  $ Gender                                        : num [1:4424] 1 1 1 0 0 1 0 1 0 0 ...
##  $ Scholarship holder                            : num [1:4424] 0 0 0 0 0 0 1 0 1 0 ...
##  $ Age at enrollment                             : num [1:4424] 20 19 19 20 45 50 18 22 21 18 ...
##  $ International                                 : num [1:4424] 0 0 0 0 0 0 0 0 1 0 ...
##  $ Curricular units 1st sem (credited)           : num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Curricular units 1st sem (enrolled)           : num [1:4424] 0 6 6 6 6 5 7 5 6 6 ...
##  $ Curricular units 1st sem (evaluations)        : num [1:4424] 0 6 0 8 9 10 9 5 8 9 ...
##  $ Curricular units 1st sem (approved)           : num [1:4424] 0 6 0 6 5 5 7 0 6 5 ...
##  $ Curricular units 1st sem (grade)              : num [1:4424] 0 14 0 13.4 12.3 ...
##  $ Curricular units 1st sem (without evaluations): num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Curricular units 2nd sem (credited)           : num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Curricular units 2nd sem (enrolled)           : num [1:4424] 0 6 6 6 6 5 8 5 6 6 ...
##  $ Curricular units 2nd sem (evaluations)        : num [1:4424] 0 6 0 10 6 17 8 5 7 14 ...
##  $ Curricular units 2nd sem (approved)           : num [1:4424] 0 6 0 5 6 5 8 0 6 2 ...
##  $ Curricular units 2nd sem (grade)              : num [1:4424] 0 13.7 0 12.4 13 ...
##  $ Curricular units 2nd sem (without evaluations): num [1:4424] 0 0 0 0 0 5 0 0 0 0 ...
##  $ Unemployment rate                             : num [1:4424] 10.8 13.9 10.8 9.4 13.9 16.2 15.5 15.5 16.2 8.9 ...
##  $ Inflation rate                                : num [1:4424] 1.4 -0.3 1.4 -0.8 -0.3 0.3 2.8 2.8 0.3 1.4 ...
##  $ GDP                                           : num [1:4424] 1.74 0.79 1.74 -3.12 0.79 -0.92 -4.06 -4.06 -0.92 3.51 ...
##  $ Target                                        : chr [1:4424] "Dropout" "Graduate" "Dropout" "Graduate" ...
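
The two listings above look like the output of `spec()` and `str()` on the loaded tibble. A minimal sketch of those calls follows, assuming a tiny inline excerpt in place of `data.csv`; the three rows and the name `toy_csv` are hypothetical and only illustrate the column types.

```r
library(readr)

# Hypothetical three-row excerpt standing in for data.csv (not real records)
toy_csv <- I("Marital status;Admission grade;Target\n1;127.3;Dropout\n1;142.5;Graduate\n2;119.6;Enrolled\n")

# Same delimiter as the real file; show_col_types = FALSE quiets the column message
toy <- read_delim(toy_csv, delim = ";", show_col_types = FALSE)

spec(toy)  # column specification, as in the cols(...) listing
str(toy)   # compact structure, as in the spc_tbl_ listing
```

Only `Target` is read as character; every other column arrives as a double, including coded categories such as `Course`, so those may need converting to factors before modelling.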

Data Exploration

Summary of the data. Here we can see the minimum, maximum, mean, median, and quartiles of each column.

##  Marital status  Application mode Application order     Course    
##  Min.   :1.000   Min.   : 1.00    Min.   :0.000     Min.   :  33  
##  1st Qu.:1.000   1st Qu.: 1.00    1st Qu.:1.000     1st Qu.:9085  
##  Median :1.000   Median :17.00    Median :1.000     Median :9238  
##  Mean   :1.179   Mean   :18.67    Mean   :1.728     Mean   :8857  
##  3rd Qu.:1.000   3rd Qu.:39.00    3rd Qu.:2.000     3rd Qu.:9556  
##  Max.   :6.000   Max.   :57.00    Max.   :9.000     Max.   :9991  
##  Daytime/evening attendance Previous qualification
##  Min.   :0.0000             Min.   : 1.000        
##  1st Qu.:1.0000             1st Qu.: 1.000        
##  Median :1.0000             Median : 1.000        
##  Mean   :0.8908             Mean   : 4.578        
##  3rd Qu.:1.0000             3rd Qu.: 1.000        
##  Max.   :1.0000             Max.   :43.000        
##  Previous qualification (grade)  Nacionality      Mother's qualification
##  Min.   : 95.0                  Min.   :  1.000   Min.   : 1.00         
##  1st Qu.:125.0                  1st Qu.:  1.000   1st Qu.: 2.00         
##  Median :133.1                  Median :  1.000   Median :19.00         
##  Mean   :132.6                  Mean   :  1.873   Mean   :19.56         
##  3rd Qu.:140.0                  3rd Qu.:  1.000   3rd Qu.:37.00         
##  Max.   :190.0                  Max.   :109.000   Max.   :44.00         
##  Father's qualification Mother's occupation Father's occupation Admission grade
##  Min.   : 1.00          Min.   :  0.00      Min.   :  0.00      Min.   : 95.0  
##  1st Qu.: 3.00          1st Qu.:  4.00      1st Qu.:  4.00      1st Qu.:117.9  
##  Median :19.00          Median :  5.00      Median :  7.00      Median :126.1  
##  Mean   :22.28          Mean   : 10.96      Mean   : 11.03      Mean   :127.0  
##  3rd Qu.:37.00          3rd Qu.:  9.00      3rd Qu.:  9.00      3rd Qu.:134.8  
##  Max.   :44.00          Max.   :194.00      Max.   :195.00      Max.   :190.0  
##    Displaced      Educational special needs     Debtor      
##  Min.   :0.0000   Min.   :0.00000           Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000           1st Qu.:0.0000  
##  Median :1.0000   Median :0.00000           Median :0.0000  
##  Mean   :0.5484   Mean   :0.01153           Mean   :0.1137  
##  3rd Qu.:1.0000   3rd Qu.:0.00000           3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.00000           Max.   :1.0000  
##  Tuition fees up to date     Gender       Scholarship holder Age at enrollment
##  Min.   :0.0000          Min.   :0.0000   Min.   :0.0000     Min.   :17.00    
##  1st Qu.:1.0000          1st Qu.:0.0000   1st Qu.:0.0000     1st Qu.:19.00    
##  Median :1.0000          Median :0.0000   Median :0.0000     Median :20.00    
##  Mean   :0.8807          Mean   :0.3517   Mean   :0.2484     Mean   :23.27    
##  3rd Qu.:1.0000          3rd Qu.:1.0000   3rd Qu.:0.0000     3rd Qu.:25.00    
##  Max.   :1.0000          Max.   :1.0000   Max.   :1.0000     Max.   :70.00    
##  International     Curricular units 1st sem (credited)
##  Min.   :0.00000   Min.   : 0.00                      
##  1st Qu.:0.00000   1st Qu.: 0.00                      
##  Median :0.00000   Median : 0.00                      
##  Mean   :0.02486   Mean   : 0.71                      
##  3rd Qu.:0.00000   3rd Qu.: 0.00                      
##  Max.   :1.00000   Max.   :20.00                      
##  Curricular units 1st sem (enrolled) Curricular units 1st sem (evaluations)
##  Min.   : 0.000                      Min.   : 0.000                        
##  1st Qu.: 5.000                      1st Qu.: 6.000                        
##  Median : 6.000                      Median : 8.000                        
##  Mean   : 6.271                      Mean   : 8.299                        
##  3rd Qu.: 7.000                      3rd Qu.:10.000                        
##  Max.   :26.000                      Max.   :45.000                        
##  Curricular units 1st sem (approved) Curricular units 1st sem (grade)
##  Min.   : 0.000                      Min.   : 0.00                   
##  1st Qu.: 3.000                      1st Qu.:11.00                   
##  Median : 5.000                      Median :12.29                   
##  Mean   : 4.707                      Mean   :10.64                   
##  3rd Qu.: 6.000                      3rd Qu.:13.40                   
##  Max.   :26.000                      Max.   :18.88                   
##  Curricular units 1st sem (without evaluations)
##  Min.   : 0.0000                               
##  1st Qu.: 0.0000                               
##  Median : 0.0000                               
##  Mean   : 0.1377                               
##  3rd Qu.: 0.0000                               
##  Max.   :12.0000                               
##  Curricular units 2nd sem (credited) Curricular units 2nd sem (enrolled)
##  Min.   : 0.0000                     Min.   : 0.000                     
##  1st Qu.: 0.0000                     1st Qu.: 5.000                     
##  Median : 0.0000                     Median : 6.000                     
##  Mean   : 0.5418                     Mean   : 6.232                     
##  3rd Qu.: 0.0000                     3rd Qu.: 7.000                     
##  Max.   :19.0000                     Max.   :23.000                     
##  Curricular units 2nd sem (evaluations) Curricular units 2nd sem (approved)
##  Min.   : 0.000                         Min.   : 0.000                     
##  1st Qu.: 6.000                         1st Qu.: 2.000                     
##  Median : 8.000                         Median : 5.000                     
##  Mean   : 8.063                         Mean   : 4.436                     
##  3rd Qu.:10.000                         3rd Qu.: 6.000                     
##  Max.   :33.000                         Max.   :20.000                     
##  Curricular units 2nd sem (grade)
##  Min.   : 0.00                   
##  1st Qu.:10.75                   
##  Median :12.20                   
##  Mean   :10.23                   
##  3rd Qu.:13.33                   
##  Max.   :18.57                   
##  Curricular units 2nd sem (without evaluations) Unemployment rate
##  Min.   : 0.0000                                Min.   : 7.60    
##  1st Qu.: 0.0000                                1st Qu.: 9.40    
##  Median : 0.0000                                Median :11.10    
##  Mean   : 0.1503                                Mean   :11.57    
##  3rd Qu.: 0.0000                                3rd Qu.:13.90    
##  Max.   :12.0000                                Max.   :16.20    
##  Inflation rate        GDP               Target         
##  Min.   :-0.800   Min.   :-4.060000   Length:4424       
##  1st Qu.: 0.300   1st Qu.:-1.700000   Class :character  
##  Median : 1.400   Median : 0.320000   Mode  :character  
##  Mean   : 1.228   Mean   : 0.001969                     
##  3rd Qu.: 2.600   3rd Qu.: 1.790000                     
##  Max.   : 3.700   Max.   : 3.510000
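
When reading this summary, note that for a 0/1 column the mean is simply the share of rows equal to 1. For example, the `Debtor` mean of 0.1137 corresponds to roughly 11.4% of students, assuming 1 means "yes". Means of coded categorical columns such as `Course` have no such interpretation. The sketch below illustrates both points on a hypothetical toy data frame, not on the real data.

```r
# Hypothetical toy columns, not taken from data.csv
toy <- data.frame(
  Debtor = c(0, 0, 1, 0, 0, 0, 1, 0),
  `Age at enrollment` = c(18, 19, 19, 20, 21, 25, 34, 45),
  check.names = FALSE
)

summary(toy)  # min, quartiles, median, mean, max per column

# Mean of a 0/1 indicator = proportion of rows flagged 1
debtor_share <- mean(toy$Debtor)

# Quartiles of a genuinely numeric column
age_quartiles <- quantile(toy$`Age at enrollment`, probs = c(0.25, 0.5, 0.75))
```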

Q1) Correlation of the categorical column with the numerical columns

Calculating the correlation of ‘Target’ with all other numeric columns

##      Marital status Application mode Application order     Course
## [1,]    -0.08980353       -0.2217466        0.08979091 0.03421883
##      Daytime/evening attendance Previous qualification
## [1,]                  0.0751065            -0.05603859
##      Previous qualification (grade) Nacionality Mother's qualification
## [1,]                      0.1037637 -0.01480119            -0.04317772
##      Father's qualification Mother's occupation Father's occupation
## [1,]           -0.001392692        -0.005628565        -0.001898935
##      Admission grade Displaced Educational special needs     Debtor
## [1,]       0.1208892 0.1139856              -0.007353073 -0.2409989
##      Tuition fees up to date     Gender Scholarship holder Age at enrollment
## [1,]               0.4098268 -0.2292696          0.2975953        -0.2434375
##      International Curricular units 1st sem (credited)
## [1,]   0.003933993                          0.04814971
##      Curricular units 1st sem (enrolled) Curricular units 1st sem (evaluations)
## [1,]                            0.155974                             0.04436155
##      Curricular units 1st sem (approved) Curricular units 1st sem (grade)
## [1,]                           0.5291233                        0.4852074
##      Curricular units 1st sem (without evaluations)
## [1,]                                    -0.06870182
##      Curricular units 2nd sem (credited) Curricular units 2nd sem (enrolled)
## [1,]                          0.05400381                           0.1758468
##      Curricular units 2nd sem (evaluations) Curricular units 2nd sem (approved)
## [1,]                             0.09272065                           0.6241575
##      Curricular units 2nd sem (grade)
## [1,]                        0.5668273
##      Curricular units 2nd sem (without evaluations) Unemployment rate
## [1,]                                    -0.09402777       0.008626681
##      Inflation rate        GDP Target
## [1,]    -0.02687406 0.04413469      1
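
A one-row matrix like the one above can be produced by encoding `Target` as a number and passing it to `cor()` together with the full data frame. The sketch below assumes the ordering Dropout = 0, Enrolled = 1, Graduate = 2 and uses a hypothetical toy data frame; neither the ordering nor the values are confirmed by the report.

```r
# Hypothetical toy rows; column names follow the dataset, values are made up
toy <- data.frame(
  `Curricular units 2nd sem (approved)` = c(0, 2, 5, 6, 1, 6),
  `Age at enrollment` = c(30, 25, 19, 18, 40, 20),
  Target = c("Dropout", "Enrolled", "Graduate", "Graduate", "Dropout", "Graduate"),
  check.names = FALSE
)

# Assumed ordinal encoding: Dropout = 0, Enrolled = 1, Graduate = 2
toy$Target <- as.numeric(
  factor(toy$Target, levels = c("Dropout", "Enrolled", "Graduate"))
) - 1

# Pearson correlation of Target with every column (including itself, hence the 1)
target_cor <- cor(toy$Target, toy)
target_cor
```

Pearson correlation on a three-level code treats the classes as evenly spaced, so the values are best read as a rough ranking of features rather than as precise effect sizes.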

Creating a Heatmap of Data
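
The heatmap itself is not reproduced in this text. As a minimal sketch of one way to draw a correlation heatmap with ggplot2, the example below reshapes a correlation matrix to long form and plots it with `geom_tile()`. The toy data frame and its column names are hypothetical, and the original plot may have been built differently.

```r
library(ggplot2)

# Hypothetical toy data; names and values are illustrative only
toy <- data.frame(
  admission_grade   = c(127, 142, 125, 120, 142, 119),
  age_at_enrollment = c(20, 19, 19, 20, 45, 50),
  units_approved    = c(0, 6, 0, 6, 5, 5)
)

# Pairwise Pearson correlations, reshaped to one row per cell
cor_mat <- cor(toy)
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("var_x", "var_y", "r")

# One tile per variable pair, coloured on a diverging scale centred at 0
heatmap_plot <- ggplot(cor_long, aes(x = var_x, y = var_y, fill = r)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Pearson r")
heatmap_plot
```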

Q2) How many students dropped out?

## 
##    0    1    2 
## 1421  794 2209

Q) Count of Different Outcomes in the “Target” Column

We counted the occurrences of each outcome in the “Target” column. Matching the frequency table above to the outcome labels (codes assigned in alphabetical order), the results are:

  • Dropout (code 0): 1421 students
  • Enrolled (code 1): 794 students
  • Graduate (code 2): 2209 students

Of the 4424 students, about half graduated and roughly a third dropped out, so the outcome classes are imbalanced.
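
The frequency table printed earlier (1421, 794, and 2209 for codes 0, 1, and 2) can be turned into percentage shares with a little arithmetic. The sketch below assumes that the codes map to Dropout, Enrolled, and Graduate in that order.

```r
# Counts copied from the frequency table; the label order is an assumption
target_counts <- c(Dropout = 1421, Enrolled = 794, Graduate = 2209)

total_students <- sum(target_counts)  # should match the 4424 rows reported at load time

# Percentage share of each outcome, rounded to one decimal place
target_share <- round(100 * target_counts / total_students, 1)
target_share
```

On the full data frame, the same shares could be obtained with `prop.table(table(data$Target))`.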

## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Q3) How do enrolled students compare with those who dropped out in their curricular-unit performance in the 1st vs 2nd semester?
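
The plot for this question is not reproduced here. One way to tabulate the comparison is sketched below, assuming that "performance" means the average number of approved curricular units per semester. The toy rows are hypothetical, and `Target` is kept as text labels for readability.

```r
library(dplyr)

# Hypothetical toy rows; column names follow the dataset, values are made up
toy <- data.frame(
  Target = c("Dropout", "Dropout", "Enrolled", "Enrolled", "Graduate"),
  `Curricular units 1st sem (approved)` = c(0, 2, 5, 4, 6),
  `Curricular units 2nd sem (approved)` = c(0, 0, 4, 4, 6),
  check.names = FALSE
)

# Keep the two groups being compared, then average approved units per semester
sem_compare <- toy %>%
  filter(Target %in% c("Dropout", "Enrolled")) %>%
  group_by(Target) %>%
  summarise(
    mean_approved_sem1 = mean(`Curricular units 1st sem (approved)`),
    mean_approved_sem2 = mean(`Curricular units 2nd sem (approved)`)
  )
sem_compare
```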

Q4) What is the difference between enrolled curricular units in the 1st and 2nd semesters?
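
In the summary table, the mean number of enrolled units is 6.271 in the 1st semester and 6.232 in the 2nd, a gap of about 0.04 units on average. A per-student view can be sketched as below, using a hypothetical toy data frame and a made-up helper column `enrolled_diff`.

```r
# Hypothetical toy rows; column names follow the dataset, values are made up
toy <- data.frame(
  `Curricular units 1st sem (enrolled)` = c(6, 6, 5, 7, 6),
  `Curricular units 2nd sem (enrolled)` = c(6, 5, 5, 8, 6),
  check.names = FALSE
)

# Per-student change in enrolled units (2nd semester minus 1st)
toy$enrolled_diff <- toy$`Curricular units 2nd sem (enrolled)` -
  toy$`Curricular units 1st sem (enrolled)`

mean_diff <- mean(toy$enrolled_diff)

# Count students who enrolled in fewer, the same, or more units (integer differences)
diff_direction <- table(cut(
  toy$enrolled_diff,
  breaks = c(-Inf, -1, 0, Inf),
  labels = c("fewer in 2nd sem", "same", "more in 2nd sem")
))
diff_direction
```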

Q6) What were the ages of students at enrollment?

## Warning: 'layout' objects don't have these attributes: 'NA'
## Valid attributes include:
## '_deprecated', 'activeshape', 'annotations', 'autosize', 'autotypenumbers', 'calendar', 'clickmode', 'coloraxis', 'colorscale', 'colorway', 'computed', 'datarevision', 'dragmode', 'editrevision', 'editType', 'font', 'geo', 'grid', 'height', 'hidesources', 'hoverdistance', 'hoverlabel', 'hovermode', 'images', 'legend', 'mapbox', 'margin', 'meta', 'metasrc', 'modebar', 'newshape', 'paper_bgcolor', 'plot_bgcolor', 'polar', 'scene', 'selectdirection', 'selectionrevision', 'separators', 'shapes', 'showlegend', 'sliders', 'smith', 'spikedistance', 'template', 'ternary', 'title', 'transition', 'uirevision', 'uniformtext', 'updatemenus', 'width', 'xaxis', 'yaxis', 'barmode', 'bargap', 'mapType'
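
The age plot is not reproduced in this text. As a minimal sketch, assuming an interactive plotly histogram was intended, the example below passes every `layout()` argument by name (`title`, `xaxis`, `yaxis`), all of which appear in the list of valid attributes above. The toy data and the column name `age_at_enrollment` are hypothetical.

```r
library(plotly)

# Hypothetical toy ages, not taken from data.csv
toy <- data.frame(age_at_enrollment = c(18, 18, 19, 19, 20, 21, 22, 25, 34, 45))

# Histogram of ages; every layout() argument is named
age_plot <- plot_ly(toy, x = ~age_at_enrollment, type = "histogram") %>%
  layout(
    title = "Age at enrollment (toy data)",
    xaxis = list(title = "Age at enrollment"),
    yaxis = list(title = "Number of students")
  )
age_plot
```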

What are the fathers’ and mothers’ occupations for scholarship holders?

## `summarise()` has grouped output by 'Scholarship holder'. You can override
## using the `.groups` argument.

## `summarise()` has grouped output by 'Scholarship holder'. You can override
## using the `.groups` argument.
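
The `summarise()` message above indicates that the data were grouped by `Scholarship holder` and one more column. A sketch of such a count follows, assuming the second grouping column is `Father's occupation` and using hypothetical toy rows. Setting `.groups = "drop"` returns an ungrouped result and silences the message.

```r
library(dplyr)

# Hypothetical toy rows; occupation values are the dataset's numeric codes, made up here
toy <- data.frame(
  `Scholarship holder` = c(1, 1, 1, 0, 0, 1),
  `Father's occupation` = c(9, 9, 7, 3, 9, 7),
  check.names = FALSE
)

# Count students per (scholarship status, father's occupation code) pair
father_counts <- toy %>%
  group_by(`Scholarship holder`, `Father's occupation`) %>%
  summarise(n_students = n(), .groups = "drop") %>%
  filter(`Scholarship holder` == 1) %>%
  arrange(desc(n_students))
father_counts
```

The same pattern applies to `Mother's occupation`. The occupation values are numeric codes, so the dataset's code book is needed to turn them into job categories.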