Context:

The data were obtained from a survey of students in math and Portuguese language courses in secondary school. They contain interesting social, gender and study information about the students.

Content:

Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:

school - student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira)
sex - student’s sex (binary: ‘F’ - female or ‘M’ - male)
age - student’s age (numeric: from 15 to 22)
address - student’s home address type (binary: ‘U’ - urban or ‘R’ - rural)
famsize - family size (binary: ‘LE3’ - less than or equal to 3 or ‘GT3’ - greater than 3)
Pstatus - parents’ cohabitation status (binary: ‘T’ - living together or ‘A’ - apart)
Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Mjob - mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
Fjob - father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
reason - reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’)
guardian - student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’)
traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
failures - number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup - extra educational support (binary: yes or no)
famsup - family educational support (binary: yes or no)
paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
activities - extra-curricular activities (binary: yes or no)
nursery - attended nursery school (binary: yes or no)
higher - wants to take higher education (binary: yes or no)
internet - Internet access at home (binary: yes or no)
romantic - in a romantic relationship (binary: yes or no)
famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
freetime - free time after school (numeric: from 1 - very low to 5 - very high)
goout - going out with friends (numeric: from 1 - very low to 5 - very high)
Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
health - current health status (numeric: from 1 - very bad to 5 - very good)
absences - number of school absences (numeric: from 0 to 93)
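Several attributes above are numeric codes that stand for ordered categories. A minimal base-R sketch of decoding the Medu/Fedu scheme into an ordered factor follows; `edu_labels` and `decode_edu()` are hypothetical names introduced here, and the labels are shortened from the attribute descriptions.

```r
# Labels for the 0-4 education codes used by Medu and Fedu
# (shortened from the attribute list; hypothetical naming)
edu_labels <- c("none", "primary (4th grade)", "5th to 9th grade",
                "secondary", "higher")

# Hypothetical helper: map integer codes 0-4 to an ordered factor,
# so comparisons such as "secondary > primary" are meaningful
decode_edu <- function(code) {
  factor(code, levels = 0:4, labels = edu_labels, ordered = TRUE)
}

# Example with a few codes of the kind found in the Medu column
medu <- decode_edu(c(4, 1, 1, 4, 3))
print(medu)
```

Using an ordered factor keeps the ranking of the codes while making tables and plot legends readable.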

Source Information:

P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.

https://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION

library(plyr)       # load plyr before dplyr so dplyr's verbs are not masked; plyr provides mapvalues()
library(tidyverse)  # attaches ggplot2, tibble, tidyr, readr, purrr and dplyr
student_info <- read_csv("~/2 MSSA/463/datasets/student-por.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   age = col_integer(),
##   Medu = col_integer(),
##   Fedu = col_integer(),
##   traveltime = col_integer(),
##   studytime = col_integer(),
##   failures = col_integer(),
##   famrel = col_integer(),
##   freetime = col_integer(),
##   goout = col_integer(),
##   Dalc = col_integer(),
##   Walc = col_integer(),
##   health = col_integer(),
##   absences = col_integer(),
##   G1 = col_integer(),
##   G2 = col_integer(),
##   G3 = col_integer()
## )
## See spec(...) for full column specifications.
# View(student_info)  # opens the RStudio data viewer; only useful in an interactive session
glimpse(student_info)
## Observations: 649
## Variables: 33
## $ school     <chr> "GP", "GP", "GP", "GP", "GP", "GP", "GP", "GP", "GP...
## $ sex        <chr> "F", "F", "F", "F", "F", "M", "M", "F", "M", "M", "...
## $ age        <int> 18, 17, 15, 15, 16, 16, 16, 17, 15, 15, 15, 15, 15,...
## $ address    <chr> "U", "U", "U", "U", "U", "U", "U", "U", "U", "U", "...
## $ famsize    <chr> "GT3", "GT3", "LE3", "GT3", "GT3", "LE3", "LE3", "G...
## $ Pstatus    <chr> "A", "T", "T", "T", "T", "T", "T", "A", "A", "T", "...
## $ Medu       <int> 4, 1, 1, 4, 3, 4, 2, 4, 3, 3, 4, 2, 4, 4, 2, 4, 4, ...
## $ Fedu       <int> 4, 1, 1, 2, 3, 3, 2, 4, 2, 4, 4, 1, 4, 3, 2, 4, 4, ...
## $ Mjob       <chr> "at_home", "at_home", "at_home", "health", "other",...
## $ Fjob       <chr> "teacher", "other", "other", "services", "other", "...
## $ reason     <chr> "course", "course", "other", "home", "home", "reput...
## $ guardian   <chr> "mother", "father", "mother", "mother", "father", "...
## $ traveltime <int> 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 3, 1, 2, 1, 1, 1, ...
## $ studytime  <int> 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 1, 2, 3, 1, 3, ...
## $ failures   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ schoolsup  <chr> "yes", "no", "yes", "no", "no", "no", "no", "yes", ...
## $ famsup     <chr> "no", "yes", "no", "yes", "yes", "yes", "no", "yes"...
## $ paid       <chr> "no", "no", "no", "no", "no", "no", "no", "no", "no...
## $ activities <chr> "no", "no", "no", "yes", "no", "yes", "no", "no", "...
## $ nursery    <chr> "yes", "no", "yes", "yes", "yes", "yes", "yes", "ye...
## $ higher     <chr> "yes", "yes", "yes", "yes", "yes", "yes", "yes", "y...
## $ internet   <chr> "no", "yes", "yes", "yes", "no", "yes", "yes", "no"...
## $ romantic   <chr> "no", "no", "no", "yes", "no", "no", "no", "no", "n...
## $ famrel     <int> 4, 5, 4, 3, 4, 5, 4, 4, 4, 5, 3, 5, 4, 5, 4, 4, 3, ...
## $ freetime   <int> 3, 3, 3, 2, 3, 4, 4, 1, 2, 5, 3, 2, 3, 4, 5, 4, 2, ...
## $ goout      <int> 4, 3, 2, 2, 2, 2, 4, 4, 2, 1, 3, 2, 3, 3, 2, 4, 3, ...
## $ Dalc       <int> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ Walc       <int> 1, 1, 3, 1, 2, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1, 2, 2, ...
## $ health     <int> 3, 3, 3, 5, 5, 5, 3, 1, 1, 5, 2, 4, 5, 3, 3, 2, 2, ...
## $ absences   <int> 4, 2, 6, 0, 0, 6, 0, 2, 0, 0, 2, 0, 0, 0, 0, 6, 10,...
## $ G1         <int> 0, 9, 12, 14, 11, 12, 13, 10, 15, 12, 14, 10, 12, 1...
## $ G2         <int> 11, 11, 13, 14, 13, 12, 12, 13, 16, 12, 14, 12, 13,...
## $ G3         <int> 11, 11, 12, 14, 13, 13, 13, 13, 17, 13, 14, 13, 12,...
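The glimpse output shows that the binary attributes (schoolsup, famsup, paid, activities, nursery, higher, internet, romantic) are read as character "yes"/"no". One possible way to turn them into logicals is sketched below on a small hypothetical data frame `students`, not on the real file.

```r
# Hypothetical toy data standing in for student_info (same column types)
students <- data.frame(
  schoolsup = c("yes", "no", "yes"),
  internet  = c("no", "yes", "yes"),
  age       = c(18L, 17L, 15L),
  stringsAsFactors = FALSE
)

# Flag character columns whose values are only "yes" or "no"
is_yes_no <- vapply(students, function(col) {
  is.character(col) && all(col %in% c("yes", "no"))
}, logical(1))

# Convert the flagged columns to logical (TRUE means "yes")
students[is_yes_no] <- lapply(students[is_yes_no], function(col) col == "yes")
str(students)
```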

I converted the weekend alcohol consumption variable (Walc) from integer to factor and relabeled its values 1 to 5 as "Very Low", "Low", "Medium", "High" and "Very High".

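# Treat weekend alcohol consumption (Walc) as a categorical variable
student_info$Walc <- as.factor(student_info$Walc)
# Relabel the numeric codes 1-5 with descriptive names
student_info$Walc <- plyr::mapvalues(student_info$Walc,
                                     from = 1:5,
                                     to = c("Very Low", "Low", "Medium", "High", "Very High"))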
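The same relabeling can also be done without plyr. The sketch below uses base R on a toy vector `walc` (hypothetical values, not the real column). `factor()` with explicit `levels` and `labels` keeps all five categories in their natural order even when some never occur, whereas `as.factor()` only creates levels that are present in the data.

```r
# Hypothetical sample of Walc codes
walc <- c(1L, 1L, 3L, 1L, 2L, 5L)

# Map codes 1-5 to labels in one step; code 4 is absent but its level is kept
walc_labels <- c("Very Low", "Low", "Medium", "High", "Very High")
walc_f <- factor(walc, levels = 1:5, labels = walc_labels)

table(walc_f)
```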
# Histogram of student ages, one panel per weekend alcohol consumption level
ggplot(student_info, aes(x = age, fill = Walc)) +
  geom_histogram(binwidth = 1, colour = "black") +
  facet_grid(~Walc) +
  theme_minimal() +
  ggtitle("Weekend alcohol consumption by age") +
  xlab("Student's age")
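The facetted histogram shows raw counts, which are hard to compare when age groups differ in size. A cross-tabulation with row proportions is one way to check the same relationship numerically; the sketch below uses a hypothetical toy sample `toy`, not the survey data.

```r
# Hypothetical toy sample with the same two columns as student_info
toy <- data.frame(
  age  = c(15, 15, 16, 16, 16, 17, 18),
  Walc = factor(c("Very Low", "Low", "Very Low", "High", "Low", "Very Low", "Very High"),
                levels = c("Very Low", "Low", "Medium", "High", "Very High"))
)

# Counts of students per age (rows) and Walc level (columns)
counts <- table(toy$age, toy$Walc)
print(counts)

# Row proportions: share of each Walc level within one age group
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```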

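# Free time after school against age, one panel per reason for choosing the school
ggplot(student_info, aes(x = age, y = freetime, color = age)) +
  geom_jitter() +                        # jitter separates overlapping integer points
  facet_grid(~reason, scales = "free") +
  scale_colour_gradientn(colours = rainbow(7))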
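The jittered scatter plot conveys spread but not averages. A base-R sketch of the mean free-time score per school-choice reason is shown below, again on a hypothetical toy sample `toy2`. Averaging 1-5 ordinal codes is a simplification, so medians or full frequency tables may be preferable.

```r
# Hypothetical toy sample with the reason and freetime columns
toy2 <- data.frame(
  reason   = c("course", "course", "home", "home", "reputation", "other"),
  freetime = c(3, 3, 3, 2, 4, 5)
)

# Mean freetime per reason (groups are sorted alphabetically by aggregate)
avg <- aggregate(freetime ~ reason, data = toy2, FUN = mean)
print(avg)
```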