This dataset comes from a higher education institution and covers students enrolled in different undergraduate degrees, such as agronomy, design, education, nursing, journalism, management, social service, and technologies. It includes information known at the time of student enrollment (academic path, demographics, and socio-economic factors) and the students’ academic performance at the end of the first and second semesters.
Source of the dataset: https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success
The primary goals of this data analysis project are as follows:
By achieving these objectives, we aim to provide valuable insights that can assist educational institutions in improving student retention and enhancing academic outcomes.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset: the file is semicolon-delimited, and read_delim() comes from readr (part of the tidyverse)
data <- read_delim("data.csv", delim = ";")
## Rows: 4424 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (1): Target
## dbl (36): Marital status, Application mode, Application order, Course, Dayti...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
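The message above suggests declaring the column types explicitly. A minimal sketch of that, assuming readr 2.0 or later; the three-row inline sample and the names `sample_csv` and `students` are made up for illustration and stand in for `data.csv`:

```r
library(readr)

# Hypothetical three-row sample standing in for data.csv (same ";" delimiter).
sample_csv <- I("Marital status;Admission grade;Target\n1;127.3;Dropout\n1;142.5;Graduate\n2;119.6;Enrolled\n")

# Read Target as a factor and treat every other column as a double.
# Supplying col_types also silences the column specification message.
students <- read_delim(
  sample_csv,
  delim = ";",
  col_types = cols(
    Target = col_factor(levels = c("Dropout", "Enrolled", "Graduate")),
    .default = col_double()
  )
)

levels(students$Target)
```

Reading `Target` as a factor up front fixes the order of its levels, which matters as soon as the outcome is converted to numeric codes.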
The structure of the dataset is:
## cols(
## `Marital status` = col_double(),
## `Application mode` = col_double(),
## `Application order` = col_double(),
## Course = col_double(),
## `Daytime/evening attendance` = col_double(),
## `Previous qualification` = col_double(),
## `Previous qualification (grade)` = col_double(),
## Nacionality = col_double(),
## `Mother's qualification` = col_double(),
## `Father's qualification` = col_double(),
## `Mother's occupation` = col_double(),
## `Father's occupation` = col_double(),
## `Admission grade` = col_double(),
## Displaced = col_double(),
## `Educational special needs` = col_double(),
## Debtor = col_double(),
## `Tuition fees up to date` = col_double(),
## Gender = col_double(),
## `Scholarship holder` = col_double(),
## `Age at enrollment` = col_double(),
## International = col_double(),
## `Curricular units 1st sem (credited)` = col_double(),
## `Curricular units 1st sem (enrolled)` = col_double(),
## `Curricular units 1st sem (evaluations)` = col_double(),
## `Curricular units 1st sem (approved)` = col_double(),
## `Curricular units 1st sem (grade)` = col_double(),
## `Curricular units 1st sem (without evaluations)` = col_double(),
## `Curricular units 2nd sem (credited)` = col_double(),
## `Curricular units 2nd sem (enrolled)` = col_double(),
## `Curricular units 2nd sem (evaluations)` = col_double(),
## `Curricular units 2nd sem (approved)` = col_double(),
## `Curricular units 2nd sem (grade)` = col_double(),
## `Curricular units 2nd sem (without evaluations)` = col_double(),
## `Unemployment rate` = col_double(),
## `Inflation rate` = col_double(),
## GDP = col_double(),
## Target = col_character()
## )
## spc_tbl_ [4,424 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Marital status : num [1:4424] 1 1 1 1 2 2 1 1 1 1 ...
## $ Application mode : num [1:4424] 17 15 1 17 39 39 1 18 1 1 ...
## $ Application order : num [1:4424] 5 1 5 2 1 1 1 4 3 1 ...
## $ Course : num [1:4424] 171 9254 9070 9773 8014 ...
## $ Daytime/evening attendance : num [1:4424] 1 1 1 1 0 0 1 1 1 1 ...
## $ Previous qualification : num [1:4424] 1 1 1 1 1 19 1 1 1 1 ...
## $ Previous qualification (grade) : num [1:4424] 122 160 122 122 100 ...
## $ Nacionality : num [1:4424] 1 1 1 1 1 1 1 1 62 1 ...
## $ Mother's qualification : num [1:4424] 19 1 37 38 37 37 19 37 1 1 ...
## $ Father's qualification : num [1:4424] 12 3 37 37 38 37 38 37 1 19 ...
## $ Mother's occupation : num [1:4424] 5 3 9 5 9 9 7 9 9 4 ...
## $ Father's occupation : num [1:4424] 9 3 9 3 9 7 10 9 9 7 ...
## $ Admission grade : num [1:4424] 127 142 125 120 142 ...
## $ Displaced : num [1:4424] 1 1 1 1 0 0 1 1 0 1 ...
## $ Educational special needs : num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
## $ Debtor : num [1:4424] 0 0 0 0 0 1 0 0 0 1 ...
## $ Tuition fees up to date : num [1:4424] 1 0 0 1 1 1 1 0 1 0 ...
## $ Gender : num [1:4424] 1 1 1 0 0 1 0 1 0 0 ...
## $ Scholarship holder : num [1:4424] 0 0 0 0 0 0 1 0 1 0 ...
## $ Age at enrollment : num [1:4424] 20 19 19 20 45 50 18 22 21 18 ...
## $ International : num [1:4424] 0 0 0 0 0 0 0 0 1 0 ...
## $ Curricular units 1st sem (credited) : num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
## $ Curricular units 1st sem (enrolled) : num [1:4424] 0 6 6 6 6 5 7 5 6 6 ...
## $ Curricular units 1st sem (evaluations) : num [1:4424] 0 6 0 8 9 10 9 5 8 9 ...
## $ Curricular units 1st sem (approved) : num [1:4424] 0 6 0 6 5 5 7 0 6 5 ...
## $ Curricular units 1st sem (grade) : num [1:4424] 0 14 0 13.4 12.3 ...
## $ Curricular units 1st sem (without evaluations): num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
## $ Curricular units 2nd sem (credited) : num [1:4424] 0 0 0 0 0 0 0 0 0 0 ...
## $ Curricular units 2nd sem (enrolled) : num [1:4424] 0 6 6 6 6 5 8 5 6 6 ...
## $ Curricular units 2nd sem (evaluations) : num [1:4424] 0 6 0 10 6 17 8 5 7 14 ...
## $ Curricular units 2nd sem (approved) : num [1:4424] 0 6 0 5 6 5 8 0 6 2 ...
## $ Curricular units 2nd sem (grade) : num [1:4424] 0 13.7 0 12.4 13 ...
## $ Curricular units 2nd sem (without evaluations): num [1:4424] 0 0 0 0 0 5 0 0 0 0 ...
## $ Unemployment rate : num [1:4424] 10.8 13.9 10.8 9.4 13.9 16.2 15.5 15.5 16.2 8.9 ...
## $ Inflation rate : num [1:4424] 1.4 -0.3 1.4 -0.8 -0.3 0.3 2.8 2.8 0.3 1.4 ...
## $ GDP : num [1:4424] 1.74 0.79 1.74 -3.12 0.79 -0.92 -4.06 -4.06 -0.92 3.51 ...
## $ Target : chr [1:4424] "Dropout" "Graduate" "Dropout" "Graduate" ...
## - attr(*, "spec")=
## .. cols(
## .. `Marital status` = col_double(),
## .. `Application mode` = col_double(),
## .. `Application order` = col_double(),
## .. Course = col_double(),
## .. `Daytime/evening attendance` = col_double(),
## .. `Previous qualification` = col_double(),
## .. `Previous qualification (grade)` = col_double(),
## .. Nacionality = col_double(),
## .. `Mother's qualification` = col_double(),
## .. `Father's qualification` = col_double(),
## .. `Mother's occupation` = col_double(),
## .. `Father's occupation` = col_double(),
## .. `Admission grade` = col_double(),
## .. Displaced = col_double(),
## .. `Educational special needs` = col_double(),
## .. Debtor = col_double(),
## .. `Tuition fees up to date` = col_double(),
## .. Gender = col_double(),
## .. `Scholarship holder` = col_double(),
## .. `Age at enrollment` = col_double(),
## .. International = col_double(),
## .. `Curricular units 1st sem (credited)` = col_double(),
## .. `Curricular units 1st sem (enrolled)` = col_double(),
## .. `Curricular units 1st sem (evaluations)` = col_double(),
## .. `Curricular units 1st sem (approved)` = col_double(),
## .. `Curricular units 1st sem (grade)` = col_double(),
## .. `Curricular units 1st sem (without evaluations)` = col_double(),
## .. `Curricular units 2nd sem (credited)` = col_double(),
## .. `Curricular units 2nd sem (enrolled)` = col_double(),
## .. `Curricular units 2nd sem (evaluations)` = col_double(),
## .. `Curricular units 2nd sem (approved)` = col_double(),
## .. `Curricular units 2nd sem (grade)` = col_double(),
## .. `Curricular units 2nd sem (without evaluations)` = col_double(),
## .. `Unemployment rate` = col_double(),
## .. `Inflation rate` = col_double(),
## .. GDP = col_double(),
## .. Target = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
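Every column except `Target` is read as a double, although several of them (for example `Marital status` or `Course`) appear to hold category codes rather than quantities. A hedged sketch of converting such columns to factors with dplyr; which columns are categorical is an assumption based on their names and value ranges, and the toy tibble is invented for illustration:

```r
library(dplyr)

# Invented rows for illustration; not taken from the real file.
toy <- tibble(
  `Marital status` = c(1, 1, 2, 1),
  Course = c(171, 9254, 9070, 9773),
  `Admission grade` = c(127.3, 142.5, 124.8, 119.6)
)

# Assumed list of coded (categorical) columns; adjust to the real codebook.
coded_cols <- c("Marital status", "Course")

# Convert only the coded columns, leaving true measurements numeric.
toy_factors <- toy %>%
  mutate(across(all_of(coded_cols), as.factor))

str(toy_factors)
```

Once a coded column is a factor, `summary()` reports level counts for it instead of a mean and quartiles of the code numbers.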
Summary of the data: for each column we can see the minimum, maximum, mean, median, and quartiles.
## Marital status Application mode Application order Course
## Min. :1.000 Min. : 1.00 Min. :0.000 Min. : 33
## 1st Qu.:1.000 1st Qu.: 1.00 1st Qu.:1.000 1st Qu.:9085
## Median :1.000 Median :17.00 Median :1.000 Median :9238
## Mean :1.179 Mean :18.67 Mean :1.728 Mean :8857
## 3rd Qu.:1.000 3rd Qu.:39.00 3rd Qu.:2.000 3rd Qu.:9556
## Max. :6.000 Max. :57.00 Max. :9.000 Max. :9991
## Daytime/evening attendance Previous qualification
## Min. :0.0000 Min. : 1.000
## 1st Qu.:1.0000 1st Qu.: 1.000
## Median :1.0000 Median : 1.000
## Mean :0.8908 Mean : 4.578
## 3rd Qu.:1.0000 3rd Qu.: 1.000
## Max. :1.0000 Max. :43.000
## Previous qualification (grade) Nacionality Mother's qualification
## Min. : 95.0 Min. : 1.000 Min. : 1.00
## 1st Qu.:125.0 1st Qu.: 1.000 1st Qu.: 2.00
## Median :133.1 Median : 1.000 Median :19.00
## Mean :132.6 Mean : 1.873 Mean :19.56
## 3rd Qu.:140.0 3rd Qu.: 1.000 3rd Qu.:37.00
## Max. :190.0 Max. :109.000 Max. :44.00
## Father's qualification Mother's occupation Father's occupation Admission grade
## Min. : 1.00 Min. : 0.00 Min. : 0.00 Min. : 95.0
## 1st Qu.: 3.00 1st Qu.: 4.00 1st Qu.: 4.00 1st Qu.:117.9
## Median :19.00 Median : 5.00 Median : 7.00 Median :126.1
## Mean :22.28 Mean : 10.96 Mean : 11.03 Mean :127.0
## 3rd Qu.:37.00 3rd Qu.: 9.00 3rd Qu.: 9.00 3rd Qu.:134.8
## Max. :44.00 Max. :194.00 Max. :195.00 Max. :190.0
## Displaced Educational special needs Debtor
## Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :1.0000 Median :0.00000 Median :0.0000
## Mean :0.5484 Mean :0.01153 Mean :0.1137
## 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.00000 Max. :1.0000
## Tuition fees up to date Gender Scholarship holder Age at enrollment
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :17.00
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:19.00
## Median :1.0000 Median :0.0000 Median :0.0000 Median :20.00
## Mean :0.8807 Mean :0.3517 Mean :0.2484 Mean :23.27
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:25.00
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :70.00
## International Curricular units 1st sem (credited)
## Min. :0.00000 Min. : 0.00
## 1st Qu.:0.00000 1st Qu.: 0.00
## Median :0.00000 Median : 0.00
## Mean :0.02486 Mean : 0.71
## 3rd Qu.:0.00000 3rd Qu.: 0.00
## Max. :1.00000 Max. :20.00
## Curricular units 1st sem (enrolled) Curricular units 1st sem (evaluations)
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 5.000 1st Qu.: 6.000
## Median : 6.000 Median : 8.000
## Mean : 6.271 Mean : 8.299
## 3rd Qu.: 7.000 3rd Qu.:10.000
## Max. :26.000 Max. :45.000
## Curricular units 1st sem (approved) Curricular units 1st sem (grade)
## Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.000 1st Qu.:11.00
## Median : 5.000 Median :12.29
## Mean : 4.707 Mean :10.64
## 3rd Qu.: 6.000 3rd Qu.:13.40
## Max. :26.000 Max. :18.88
## Curricular units 1st sem (without evaluations)
## Min. : 0.0000
## 1st Qu.: 0.0000
## Median : 0.0000
## Mean : 0.1377
## 3rd Qu.: 0.0000
## Max. :12.0000
## Curricular units 2nd sem (credited) Curricular units 2nd sem (enrolled)
## Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 5.000
## Median : 0.0000 Median : 6.000
## Mean : 0.5418 Mean : 6.232
## 3rd Qu.: 0.0000 3rd Qu.: 7.000
## Max. :19.0000 Max. :23.000
## Curricular units 2nd sem (evaluations) Curricular units 2nd sem (approved)
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 6.000 1st Qu.: 2.000
## Median : 8.000 Median : 5.000
## Mean : 8.063 Mean : 4.436
## 3rd Qu.:10.000 3rd Qu.: 6.000
## Max. :33.000 Max. :20.000
## Curricular units 2nd sem (grade)
## Min. : 0.00
## 1st Qu.:10.75
## Median :12.20
## Mean :10.23
## 3rd Qu.:13.33
## Max. :18.57
## Curricular units 2nd sem (without evaluations) Unemployment rate
## Min. : 0.0000 Min. : 7.60
## 1st Qu.: 0.0000 1st Qu.: 9.40
## Median : 0.0000 Median :11.10
## Mean : 0.1503 Mean :11.57
## 3rd Qu.: 0.0000 3rd Qu.:13.90
## Max. :12.0000 Max. :16.20
## Inflation rate GDP Target
## Min. :-0.800 Min. :-4.060000 Length:4424
## 1st Qu.: 0.300 1st Qu.:-1.700000 Class :character
## Median : 1.400 Median : 0.320000 Mode :character
## Mean : 1.228 Mean : 0.001969
## 3rd Qu.: 2.600 3rd Qu.: 1.790000
## Max. : 3.700 Max. : 3.510000
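For reference, the quartiles reported by `summary()` can be reproduced with `quantile()`. A small sketch on made-up ages (not taken from the dataset):

```r
# Made-up ages for illustration only.
ages <- c(17, 19, 20, 25, 70)

summary(ages)                          # Min, 1st Qu., Median, Mean, 3rd Qu., Max
quantile(ages, probs = c(0.25, 0.75))  # first and third quartiles
IQR(ages)                              # interquartile range: 3rd Qu. minus 1st Qu.
```

A single large value pulls the mean above the median here, the same pattern that `Age at enrollment` shows in the summary (mean 23.27, median 20).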
Correlation of ‘Target’, encoded numerically as 0, 1, and 2, with all other numeric columns:
## Marital status Application mode Application order Course
## [1,] -0.08980353 -0.2217466 0.08979091 0.03421883
## Daytime/evening attendance Previous qualification
## [1,] 0.0751065 -0.05603859
## Previous qualification (grade) Nacionality Mother's qualification
## [1,] 0.1037637 -0.01480119 -0.04317772
## Father's qualification Mother's occupation Father's occupation
## [1,] -0.001392692 -0.005628565 -0.001898935
## Admission grade Displaced Educational special needs Debtor
## [1,] 0.1208892 0.1139856 -0.007353073 -0.2409989
## Tuition fees up to date Gender Scholarship holder Age at enrollment
## [1,] 0.4098268 -0.2292696 0.2975953 -0.2434375
## International Curricular units 1st sem (credited)
## [1,] 0.003933993 0.04814971
## Curricular units 1st sem (enrolled) Curricular units 1st sem (evaluations)
## [1,] 0.155974 0.04436155
## Curricular units 1st sem (approved) Curricular units 1st sem (grade)
## [1,] 0.5291233 0.4852074
## Curricular units 1st sem (without evaluations)
## [1,] -0.06870182
## Curricular units 2nd sem (credited) Curricular units 2nd sem (enrolled)
## [1,] 0.05400381 0.1758468
## Curricular units 2nd sem (evaluations) Curricular units 2nd sem (approved)
## [1,] 0.09272065 0.6241575
## Curricular units 2nd sem (grade)
## [1,] 0.5668273
## Curricular units 2nd sem (without evaluations) Unemployment rate
## [1,] -0.09402777 0.008626681
## Inflation rate GDP Target
## [1,] -0.02687406 0.04413469 1
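A one-row correlation matrix like the one above can be produced by encoding `Target` as integers and passing it to `cor()` together with the numeric columns. The sketch below is an assumption about how this might be done, not the report's actual code: the toy data frame is invented, and the alphabetical coding (Dropout = 0, Enrolled = 1, Graduate = 2) is assumed.

```r
# Invented toy data; column names are shortened for illustration.
toy <- data.frame(
  approved = c(0, 6, 3, 5, 1, 6),
  age      = c(30, 19, 22, 20, 45, 18),
  Target   = c("Dropout", "Graduate", "Enrolled", "Graduate", "Dropout", "Graduate")
)

# factor() orders levels alphabetically, so subtracting 1 gives codes 0, 1, 2.
toy$target_code <- as.numeric(factor(toy$Target)) - 1

# Pearson correlation of the coded outcome with every numeric column.
numeric_cols <- toy[, c("approved", "age", "target_code")]
cors <- cor(toy$target_code, numeric_cols)
cors
```

Pearson correlation treats the three codes as equally spaced, so the coefficients are best read as a rough ranking of features rather than as precise effect sizes.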
##
## 0 1 2
## 1421 794 2209
We counted the occurrences of each outcome in the “Target” column of our dataset. As the table above shows, the coded values 0, 1, and 2 occur 1,421, 794, and 2,209 times respectively.
The classes are imbalanced: code 2 accounts for about half of the 4,424 students (49.9%), code 0 for about a third (32.1%), and code 1 for 17.9%.
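The class shares can be checked directly from the counts in the table. A short sketch, with the counts typed in by hand rather than recomputed from `data.csv`:

```r
# Counts copied from the table above, keyed by the numeric Target code.
counts <- c(`0` = 1421, `1` = 794, `2` = 2209)

total  <- sum(counts)                      # should match the 4424 rows read
shares <- round(100 * counts / total, 1)   # percentage per class
shares
```

On the full dataset, `table(data$Target)` followed by `prop.table()` gives the same counts and proportions without typing them in.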
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
## Warning: 'layout' objects don't have these attributes: 'NA'
## Valid attributes include:
## '_deprecated', 'activeshape', 'annotations', 'autosize', 'autotypenumbers', 'calendar', 'clickmode', 'coloraxis', 'colorscale', 'colorway', 'computed', 'datarevision', 'dragmode', 'editrevision', 'editType', 'font', 'geo', 'grid', 'height', 'hidesources', 'hoverdistance', 'hoverlabel', 'hovermode', 'images', 'legend', 'mapbox', 'margin', 'meta', 'metasrc', 'modebar', 'newshape', 'paper_bgcolor', 'plot_bgcolor', 'polar', 'scene', 'selectdirection', 'selectionrevision', 'separators', 'shapes', 'showlegend', 'sliders', 'smith', 'spikedistance', 'template', 'ternary', 'title', 'transition', 'uirevision', 'uniformtext', 'updatemenus', 'width', 'xaxis', 'yaxis', 'barmode', 'bargap', 'mapType'
## `summarise()` has grouped output by 'Scholarship holder'. You can override
## using the `.groups` argument.
## `summarise()` has grouped output by 'Scholarship holder'. You can override
## using the `.groups` argument.
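The `summarise()` message above appears whenever a result is still grouped after summarising by more than one variable. A minimal sketch of silencing it with `.groups = "drop"`; the toy tibble is invented, and the second grouping variable (`Target`) is an assumption, since only `Scholarship holder` is named in the message:

```r
library(dplyr)

# Invented rows for illustration.
toy <- tibble(
  `Scholarship holder` = c(0, 0, 1, 1, 0, 1),
  Target = c("Dropout", "Graduate", "Graduate", "Graduate", "Dropout", "Enrolled")
)

# Count outcomes per scholarship status and return an ungrouped tibble.
outcome_counts <- toy %>%
  group_by(`Scholarship holder`, Target) %>%
  summarise(n = n(), .groups = "drop")

outcome_counts
```

Dropping the groups explicitly keeps later steps, such as plotting, from silently operating per group.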