Does alcohol consumption affect final grade?

library(ggplot2)
library(corrplot)
library(DT)
library(knitr)
studentData <- read.csv(file = "student-mat.csv", header = TRUE, sep = ";")
# Original data set source: https://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION

str(studentData)
## 'data.frame':    395 obs. of  33 variables:
##  $ school    : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex       : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
##  $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
##  $ address   : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
##  $ famsize   : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
##  $ Pstatus   : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
##  $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
##  $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
##  $ Mjob      : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
##  $ Fjob      : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
##  $ reason    : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
##  $ guardian  : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
##  $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
##  $ studytime : int  2 2 2 3 2 2 2 2 2 2 ...
##  $ failures  : int  0 0 3 0 0 0 0 0 0 0 ...
##  $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
##  $ famsup    : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
##  $ paid      : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 1 1 2 2 ...
##  $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
##  $ nursery   : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
##  $ higher    : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ internet  : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
##  $ romantic  : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
##  $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
##  $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
##  $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
##  $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
##  $ absences  : int  6 4 10 2 4 10 0 6 0 0 ...
##  $ G1        : int  5 5 7 15 6 15 12 6 16 14 ...
##  $ G2        : int  6 5 8 14 10 15 12 5 18 15 ...
##  $ G3        : int  6 6 10 15 10 15 11 6 19 15 ...
datatable(studentData)

Subsetting the data

studentData <- studentData[, c("Dalc", "Walc", "G3")]
kable(head(studentData))
| Dalc| Walc| G3|
|----:|----:|--:|
|    1|    1|  6|
|    1|    1|  6|
|    2|    3| 10|
|    1|    1| 15|
|    1|    2| 10|
|    1|    2| 15|
names(studentData)[1] <- "WeekdayAlcoholConsumption"
names(studentData)[2] <- "WeekendAlcoholConsumption"
names(studentData)[3] <- "FinalMathGrade"

Adding average alcohol consumption, weighting weekday consumption at twice the weekend consumption because there are more weekdays than weekend days (a rough 2:1 stand-in for the actual 5:2 split)

studentData$AverageAlcoholConsumption <- ((studentData$WeekdayAlcoholConsumption * 2) + studentData$WeekendAlcoholConsumption) / 3
kable(head(studentData))
| WeekdayAlcoholConsumption| WeekendAlcoholConsumption| FinalMathGrade| AverageAlcoholConsumption|
|-------------------------:|-------------------------:|--------------:|-------------------------:|
|                         1|                         1|              6|                  1.000000|
|                         1|                         1|              6|                  1.000000|
|                         2|                         3|             10|                  2.333333|
|                         1|                         1|             15|                  1.000000|
|                         1|                         2|             10|                  1.333333|
|                         1|                         2|             15|                  1.333333|
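
The weighting above can also be written as a small helper, which makes the 2:1 choice explicit and easy to change. This is only a minimal sketch: `weighted_alcohol` and its `w_weekday` / `w_weekend` parameters are hypothetical names that do not appear in the original analysis, and the 5:2 alternative is an assumption based on the number of weekdays and weekend days.

```r
# Hypothetical helper (not in the original analysis): weighted mean of
# weekday and weekend consumption scores.
weighted_alcohol <- function(weekday, weekend, w_weekday = 2, w_weekend = 1) {
  (weekday * w_weekday + weekend * w_weekend) / (w_weekday + w_weekend)
}

# Third row of the table above: weekday = 2, weekend = 3
weighted_alcohol(2, 3)        # 2:1 weighting used in this analysis: 7/3
weighted_alcohol(2, 3, 5, 2)  # assumed 5:2 weighting by days per week: 16/7
```

Because the function is vectorised, it could be applied to the whole column pair, for example `weighted_alcohol(studentData$WeekdayAlcoholConsumption, studentData$WeekendAlcoholConsumption)`.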

Summary Statistics

summary(studentData)
##  WeekdayAlcoholConsumption WeekendAlcoholConsumption FinalMathGrade 
##  Min.   :1.000             Min.   :1.000             Min.   : 0.00  
##  1st Qu.:1.000             1st Qu.:1.000             1st Qu.: 8.00  
##  Median :1.000             Median :2.000             Median :11.00  
##  Mean   :1.481             Mean   :2.291             Mean   :10.42  
##  3rd Qu.:2.000             3rd Qu.:3.000             3rd Qu.:14.00  
##  Max.   :5.000             Max.   :5.000             Max.   :20.00  
##  AverageAlcoholConsumption
##  Min.   :1.000            
##  1st Qu.:1.000            
##  Median :1.333            
##  Mean   :1.751            
##  3rd Qu.:2.333            
##  Max.   :5.000

Histograms (frequency polygons)

ggplot(studentData, aes(FinalMathGrade, colour = as.factor(WeekdayAlcoholConsumption))) +
  geom_freqpoly(binwidth = 1)

ggplot(studentData, aes(FinalMathGrade, colour = as.factor(WeekendAlcoholConsumption))) +
  geom_freqpoly(binwidth = 1)

Boxplots

ggplot(studentData, aes(factor(WeekdayAlcoholConsumption), FinalMathGrade)) +
  geom_boxplot()

ggplot(studentData, aes(factor(WeekendAlcoholConsumption), FinalMathGrade)) +
  geom_boxplot()

Scatterplot

ggplot(studentData, aes(x = FinalMathGrade, y = AverageAlcoholConsumption)) +
  geom_point()

Correlation Matrix Plot

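# Pearson correlation matrix of the four numeric columns
M <- cor(studentData)
# Draw each pairwise correlation as an ellipse
corrplot(M, method = "ellipse")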

While the correlation matrix shows little linear correlation between final grade and alcohol consumption, it does show that alcohol consumption on weekdays is correlated with alcohol consumption on weekends.
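
A correlation plot gives only a visual impression, so it can help to put a number and a p-value on each pair. The sketch below uses base R's `cor.test` with Spearman's rank correlation, which is an assumption on my part: it seems a reasonable fit for ordinal 1 to 5 scores, but it is not the method used above. `rank_correlation` is a hypothetical helper name, and the toy vectors are just the six rows printed by `head()` earlier, far too few to draw conclusions from.

```r
# Hypothetical helper (not in the original analysis): Spearman rank
# correlation with its p-value, suited to ordinal 1-5 scores.
rank_correlation <- function(x, y) {
  # exact = FALSE avoids the exact-p-value warning when ranks are tied
  result <- cor.test(x, y, method = "spearman", exact = FALSE)
  c(rho = unname(result$estimate), p_value = result$p.value)
}

# Toy input: only the six rows shown by head(), for illustration.
weekday <- c(1, 1, 2, 1, 1, 1)
weekend <- c(1, 1, 3, 1, 2, 2)
grade   <- c(6, 6, 10, 15, 10, 15)

rank_correlation(weekday, weekend)
rank_correlation(weekday, grade)
```

On the full data set, the same call would be `rank_correlation(studentData$WeekdayAlcoholConsumption, studentData$FinalMathGrade)`, and likewise for the other column pairs.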

Here is a plot that demonstrates this relationship:

ggplot(studentData, aes(WeekendAlcoholConsumption, colour = as.factor(WeekdayAlcoholConsumption))) +
  geom_freqpoly(binwidth = 1)