Does alcohol consumption affect final grade?
library(ggplot2)
library(corrplot)
library(DT)
library(knitr)
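# stringsAsFactors = TRUE reproduces the Factor columns shown below on R >= 4.0
studentData <- read.csv(file = "student-mat.csv", header = TRUE, sep = ";", stringsAsFactors = TRUE)
# Original data set source: https://archive.ics.uci.edu/ml/datasets/STUDENT+ALCOHOL+CONSUMPTION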
str(studentData)
## 'data.frame': 395 obs. of 33 variables:
## $ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
## $ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
## $ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
## $ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
## $ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
## $ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
## $ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
## $ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
## $ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
## $ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 3 0 0 0 0 0 0 0 ...
## $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
## $ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ paid : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 1 1 2 2 ...
## $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
## $ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
## $ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
## $ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
## $ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : int 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
## $ health : int 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 6 4 10 2 4 10 0 6 0 0 ...
## $ G1 : int 5 5 7 15 6 15 12 6 16 14 ...
## $ G2 : int 6 5 8 14 10 15 12 5 18 15 ...
## $ G3 : int 6 6 10 15 10 15 11 6 19 15 ...
datatable(studentData)
Subsetting the data to the alcohol consumption and final grade columns
studentData <- studentData[, c("Dalc", "Walc", "G3")]
kable(head(studentData))
| Dalc | Walc | G3 |
|---|---|---|
| 1 | 1 | 6 |
| 1 | 1 | 6 |
| 2 | 3 | 10 |
| 1 | 1 | 15 |
| 1 | 2 | 10 |
| 1 | 2 | 15 |
names(studentData)[1] <- "WeekdayAlcoholConsumption"
names(studentData)[2] <- "WeekendAlcoholConsumption"
names(studentData)[3] <- "FinalMathGrade"
Adding average alcohol consumption, weighting weekday consumption at twice the weekend value because there are more weekdays than weekend days
studentData$AverageAlcoholConsumption <- (studentData$WeekdayAlcoholConsumption * 2 + studentData$WeekendAlcoholConsumption) / 3
kable(head(studentData))
| WeekdayAlcoholConsumption | WeekendAlcoholConsumption | FinalMathGrade | AverageAlcoholConsumption |
|---|---|---|---|
| 1 | 1 | 6 | 1.000000 |
| 1 | 1 | 6 | 1.000000 |
| 2 | 3 | 10 | 2.333333 |
| 1 | 1 | 15 | 1.000000 |
| 1 | 2 | 10 | 1.333333 |
| 1 | 2 | 15 | 1.333333 |
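As a quick check of the weighting, the sketch below recomputes the average for the six rows shown by `head()`, once with the same 2:1 formula and once with base R's `weighted.mean()`. The data frame `firstRows` and the helper `weightedAverage` are names introduced here for illustration only; they are not part of the original analysis.

```r
# First six rows of the subset, copied from the head() output above
firstRows <- data.frame(
  WeekdayAlcoholConsumption = c(1, 1, 2, 1, 1, 1),
  WeekendAlcoholConsumption = c(1, 1, 3, 1, 2, 2)
)

# Hypothetical helper: by default weekday counts twice and weekend once
weightedAverage <- function(weekday, weekend, weekdayWeight = 2, weekendWeight = 1) {
  (weekday * weekdayWeight + weekend * weekendWeight) / (weekdayWeight + weekendWeight)
}

firstRows$AverageAlcoholConsumption <- weightedAverage(
  firstRows$WeekdayAlcoholConsumption,
  firstRows$WeekendAlcoholConsumption
)

# The same calculation for the third row (weekday 2, weekend 3) via weighted.mean()
thirdRow <- weighted.mean(c(2, 3), w = c(2, 1))

print(firstRows)
print(thirdRow)
```

Calling the helper with `weekdayWeight = 5, weekendWeight = 2` would weight by the actual number of days. That is an alternative worth comparing, not what this analysis uses.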
Summary Statistics
summary(studentData)
## WeekdayAlcoholConsumption WeekendAlcoholConsumption FinalMathGrade
## Min. :1.000 Min. :1.000 Min. : 0.00
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 8.00
## Median :1.000 Median :2.000 Median :11.00
## Mean :1.481 Mean :2.291 Mean :10.42
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:14.00
## Max. :5.000 Max. :5.000 Max. :20.00
## AverageAlcoholConsumption
## Min. :1.000
## 1st Qu.:1.000
## Median :1.333
## Mean :1.751
## 3rd Qu.:2.333
## Max. :5.000
Histograms
ggplot(studentData, aes(FinalMathGrade, colour = as.factor(WeekdayAlcoholConsumption))) +
  geom_freqpoly(binwidth = 1)
ggplot(studentData, aes(FinalMathGrade, colour = as.factor(WeekendAlcoholConsumption))) +
  geom_freqpoly(binwidth = 1)
Boxplots
ggplot(studentData, aes(factor(WeekdayAlcoholConsumption), FinalMathGrade)) +
  geom_boxplot()
ggplot(studentData, aes(factor(WeekendAlcoholConsumption), FinalMathGrade)) +
  geom_boxplot()
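The boxplots compare grade distributions across consumption levels, and the same comparison can be tabulated. The sketch below is a minimal illustration using only the six rows printed by `head()` earlier, so the numbers it produces say nothing about the full data set; `firstRows`, `groupCount` and `groupMedian` are names assumed here. On the real data, `studentData` would take the place of `firstRows`.

```r
# First six rows of the subset, copied from the head() output above
firstRows <- data.frame(
  WeekdayAlcoholConsumption = c(1, 1, 2, 1, 1, 1),
  FinalMathGrade = c(6, 6, 10, 15, 10, 15)
)

# Number of students at each weekday consumption level
groupCount <- table(firstRows$WeekdayAlcoholConsumption)

# Median final grade at each level (the centre line of each boxplot)
groupMedian <- tapply(
  firstRows$FinalMathGrade,
  firstRows$WeekdayAlcoholConsumption,
  median
)

print(groupCount)
print(groupMedian)
```

Printing the counts next to the medians helps when reading the boxplots, because a box drawn from only a handful of students should be interpreted with caution.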
Scatterplot
ggplot(studentData, aes(x = FinalMathGrade, y = AverageAlcoholConsumption)) +
  geom_point()
Correlation Matrix Plot
M <- cor(studentData)
corrplot(M, method = "ellipse")
While the correlation matrix shows little linear correlation between final grade and alcohol consumption, it does show that alcohol consumption on weekdays is correlated with alcohol consumption on weekends.
Here is a plot that demonstrates that:
ggplot(studentData, aes(WeekendAlcoholConsumption, colour = as.factor(WeekdayAlcoholConsumption))) +
  geom_freqpoly(binwidth = 1)
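To put numbers on the correlations discussed above, the coefficients can be printed and tested rather than read off the ellipse plot. The sketch below is illustrative only: it uses the six rows printed by `head()` earlier (far too few to draw conclusions), and the Spearman variant is an addition assumed here because the consumption scores are whole numbers from 1 to 5. The names `firstRows`, `pearsonMatrix`, `spearmanMatrix` and `weekdayVsWeekend` are introduced for this example.

```r
# First six rows of the subset, copied from the head() output above
firstRows <- data.frame(
  WeekdayAlcoholConsumption = c(1, 1, 2, 1, 1, 1),
  WeekendAlcoholConsumption = c(1, 1, 3, 1, 2, 2),
  FinalMathGrade = c(6, 6, 10, 15, 10, 15)
)

# Pearson correlation matrix, the same call as cor(studentData) above
pearsonMatrix <- cor(firstRows)

# Spearman rank correlation (assumption: treats the 1 to 5 scores as ordinal)
spearmanMatrix <- cor(firstRows, method = "spearman")

# Significance test for one pair; cor.test() returns the estimate and a p-value
weekdayVsWeekend <- cor.test(
  firstRows$WeekdayAlcoholConsumption,
  firstRows$WeekendAlcoholConsumption
)

print(round(pearsonMatrix, 2))
print(round(spearmanMatrix, 2))
print(weekdayVsWeekend$p.value)
```

On the full data, replacing `firstRows` with `studentData` and repeating `cor.test()` for each grade and consumption pair would indicate whether a small coefficient is distinguishable from zero.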