Alcohol is widely consumed, and there is an unresolved debate as to whether it is good for, bad for, or irrelevant to our productivity at work. Here we examine the question in the context of students: does alcohol consumption have a detrimental effect on their grades? We answer it using data from two schools in Portugal.
The data is a public Kaggle dataset (https://www.kaggle.com/uciml/student-alcohol-consumption/data). It records the grades of students in the Portuguese language course at two schools in Portugal, Gabriel Pereira and Mousinho da Silveira. There are 649 students in total, of both sexes, from all kinds of family backgrounds, and with widely varying daily habits. Each record has the following attributes:
school - student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira)
sex - student’s sex (binary: ‘F’ - female or ‘M’ - male)
age - student’s age (numeric: from 15 to 22)
address - student’s home address type (binary: ‘U’ - urban or ‘R’ - rural)
famsize - family size (binary: ‘LE3’ - less or equal to 3 or ‘GT3’ - greater than 3)
Pstatus - parent’s cohabitation status (binary: ‘T’ - living together or ‘A’ - apart)
Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Mjob - mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
Fjob - father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
reason - reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’)
guardian - student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’)
traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
failures - number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup - extra educational support (binary: yes or no)
famsup - family educational support (binary: yes or no)
paid - extra paid classes within the course subject (Portuguese) (binary: yes or no)
activities - extra-curricular activities (binary: yes or no)
nursery - attended nursery school (binary: yes or no)
higher - wants to take higher education (binary: yes or no)
internet - Internet access at home (binary: yes or no)
romantic - with a romantic relationship (binary: yes or no)
famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
freetime - free time after school (numeric: from 1 - very low to 5 - very high)
goout - going out with friends (numeric: from 1 - very low to 5 - very high)
Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
health - current health status (numeric: from 1 - very bad to 5 - very good)
absences - number of school absences (numeric: from 0 to 93 in the dataset description; the maximum in this sample is 32)
G1 - first period grade (numeric: from 0 to 20)
G2 - second period grade (numeric: from 0 to 20)
G3 - final grade (numeric: from 0 to 20, output target)
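Several of the numeric attributes above are ordinal codes rather than measurements. As a minimal sketch (the vector `codes` and the name `studytime_f` are made up for illustration and are not part of the analysis), such codes can be turned into a labelled ordered factor so that tables and plots show the categories from the description:

```r
# Hypothetical example values; the real column would be mydata$studytime
codes <- c(2, 1, 4, 3, 2)

# Labels follow the studytime description in the attribute list
studytime_f <- factor(codes,
                      levels = 1:4,
                      labels = c("<2 hours", "2 to 5 hours",
                                 "5 to 10 hours", ">10 hours"),
                      ordered = TRUE)

table(studytime_f)
```

The regressions in this report keep the codes as plain integers, which treats each step on the scale as equally large.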
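library(car)  # provides some()
# stringsAsFactors = TRUE keeps text columns as factors, as in the str() output below (needed on R >= 4.0)
mydata <- read.csv("student-por.csv", stringsAsFactors = TRUE)
some(mydata)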
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob
## 132 GP F 18 U GT3 T 2 1 services other
## 245 GP F 17 U LE3 T 4 3 health other
## 350 GP F 17 U GT3 T 3 2 health health
## 352 GP M 20 U GT3 A 3 2 services other
## 458 MS M 17 R LE3 T 1 2 at_home services
## 459 MS F 16 R GT3 T 1 1 other other
## 471 MS F 15 R GT3 T 3 3 services other
## 491 MS F 18 R GT3 T 1 1 at_home at_home
## 494 MS F 17 U GT3 T 0 1 other at_home
## 648 MS M 17 U LE3 T 3 1 services services
## reason guardian traveltime studytime failures schoolsup famsup
## 132 reputation mother 1 2 3 no yes
## 245 reputation father 1 2 0 no no
## 350 reputation father 1 4 0 no yes
## 352 course other 1 1 2 no no
## 458 reputation mother 1 1 0 no yes
## 459 home father 4 4 0 no yes
## 471 reputation mother 1 2 0 no yes
## 491 course mother 2 1 1 no no
## 494 course father 2 1 0 no no
## 648 course mother 2 1 0 no no
## paid activities nursery higher internet romantic famrel freetime goout
## 132 no yes yes no yes yes 5 4 5
## 245 no yes yes yes yes yes 3 2 3
## 350 no yes no yes yes no 5 2 2
## 352 no yes yes yes no no 5 5 3
## 458 no yes yes yes yes no 5 5 5
## 459 no no no yes yes no 4 3 2
## 471 no no yes yes yes yes 4 5 4
## 491 no no no no yes yes 3 2 3
## 494 no yes no yes no no 2 4 4
## 648 no no no yes yes no 2 4 5
## Dalc Walc health absences G1 G2 G3
## 132 1 3 5 10 10 9 8
## 245 1 2 3 0 14 12 12
## 350 1 2 5 0 18 18 18
## 352 1 1 5 0 14 15 15
## 458 5 5 3 4 10 11 11
## 459 1 1 1 0 13 10 13
## 471 1 1 1 4 13 12 12
## 491 1 1 2 4 9 11 10
## 494 3 5 5 5 9 9 10
## 648 3 4 2 6 10 10 10
Hypothesis H1: Consumption of alcohol negatively affects grades.
So, the associated null hypothesis (H0) will be,
H0: Consumption of alcohol has no effect on the grades of students.
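# weekly = workday + weekend consumption (matches the weekly column in the data preview further down)
mydata$weekly <- mydata$Dalc + mydata$Walc
m1 <- G3 ~ weekly + studytime + absences + romantic + internet + famrel
fit1 <- lm(m1 , data = mydata)
summary(fit1)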
##
## Call:
## lm(formula = m1, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.1477 -1.6349 0.0674 1.8483 7.4583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.38775 0.69848 14.872 < 2e-16 ***
## weekly -0.26708 0.06234 -4.284 2.11e-05 ***
## studytime 0.81601 0.14789 5.518 4.99e-08 ***
## absences -0.02579 0.02646 -0.975 0.33001
## romanticyes -0.64310 0.24855 -2.587 0.00989 **
## internetyes 1.18731 0.28506 4.165 3.54e-05 ***
## famrel 0.09497 0.12652 0.751 0.45316
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.038 on 642 degrees of freedom
## Multiple R-squared: 0.1239, Adjusted R-squared: 0.1157
## F-statistic: 15.14 on 6 and 642 DF, p-value: 2.999e-16
library(coefplot)  # provides coefplot()
coefplot(fit1 , outerCI = 1.96 , intercept = FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax
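The bars in the coefficient plot span each estimate plus or minus `outerCI` standard errors. A minimal sketch that rebuilds those intervals by hand from the estimates and standard errors printed in the summary above (the object names are illustrative):

```r
# Estimates and standard errors copied from summary(fit1) above
est <- c(weekly = -0.26708, studytime = 0.81601, absences = -0.02579,
         romanticyes = -0.64310, internetyes = 1.18731, famrel = 0.09497)
se  <- c(weekly = 0.06234, studytime = 0.14789, absences = 0.02646,
         romanticyes = 0.24855, internetyes = 0.28506, famrel = 0.12652)

# Approximate 95% interval: estimate +/- 1.96 standard errors
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# TRUE where the interval does not contain zero
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0

print(round(ci, 3))
print(excludes_zero)
```

The intervals for absences and famrel straddle zero, which agrees with their large p-values in the summary.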
Next, we split weekly alcohol consumption into its two components and look separately at the effect of consumption on workdays (Dalc) and on weekends (Walc).
m2 <- G3 ~ Walc + Dalc + studytime + absences + romantic + internet + famrel
fit2 <- lm(m2 , data = mydata)
summary(fit2)
##
## Call:
## lm(formula = m2, data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.0565 -1.6515 0.0656 1.8731 7.5826
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.32483 0.69938 14.763 < 2e-16 ***
## Walc -0.12208 0.12049 -1.013 0.31133
## Dalc -0.48222 0.16522 -2.919 0.00364 **
## studytime 0.83177 0.14821 5.612 2.97e-08 ***
## absences -0.02444 0.02646 -0.924 0.35593
## romanticyes -0.61104 0.24940 -2.450 0.01455 *
## internetyes 1.17653 0.28494 4.129 4.13e-05 ***
## famrel 0.09918 0.12646 0.784 0.43316
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.036 on 641 degrees of freedom
## Multiple R-squared: 0.1266, Adjusted R-squared: 0.1171
## F-statistic: 13.28 on 7 and 641 DF, p-value: 4.834e-16
coefplot(fit2 , outerCI = 1.96 , intercept = FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax
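To judge whether splitting weekly consumption into Dalc and Walc improves the fit, the adjusted R-squared values of the two models can be compared. A short sketch that recomputes them from the printed R-squared values with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1); the helper name `adj_r2` is illustrative:

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 649                          # students in the data
adj1 <- adj_r2(0.1239, n, p = 6)  # first model (weekly)
adj2 <- adj_r2(0.1266, n, p = 7)  # second model (Walc + Dalc)

round(c(m1 = adj1, m2 = adj2), 4)
```

Both values agree with the summaries above. The gain from the split is only about 0.001, so the two models fit almost equally well.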
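We see that, in the first model, the grades obtained by students depend mainly on the following factors (those with coefficients significant at the 5% level): weekly alcohol consumption, study time, being in a romantic relationship, and internet access at home.
Of these, weekly alcohol consumption and being in a romantic relationship have a negative effect on grades, whereas study time and internet access at home have a positive effect.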
So, on the basis of the p-value obtained for weekly alcohol consumption (2.11e-05), we can reject the null hypothesis in favour of the alternative hypothesis.
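The p-value printed by `summary()` is two-sided, whereas H1 is directional. As a sketch, both versions can be recomputed from the t value and residual degrees of freedom printed for `weekly` in the first model:

```r
t_weekly <- -4.284   # t value for weekly in summary(fit1)
df_resid <- 642      # residual degrees of freedom

p_two_sided <- 2 * pt(-abs(t_weekly), df_resid)  # what summary() reports
p_one_sided <- pt(t_weekly, df_resid)            # H1: coefficient < 0

c(two_sided = p_two_sided, one_sided = p_one_sided)
```

Because the estimate lies on the side predicted by H1, the one-sided p-value is half the two-sided one, so the conclusion does not change.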
On further breaking down weekly alcohol consumption into workday consumption (Dalc) and weekend consumption (Walc), we find that drinking on workdays is considerably more harmful than drinking on weekends: the Dalc coefficient is -0.48 (p = 0.004), whereas the Walc coefficient is -0.12 and not statistically significant (p = 0.31).
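To put the two coefficients on the grade scale, here is a small worked example using the estimates from the second model. It compares a student at the lowest consumption level (1) with one at the highest (5), with the other predictors held fixed; the object names are illustrative:

```r
# Coefficients copied from summary(fit2) above
b_dalc <- -0.48222   # workday consumption
b_walc <- -0.12208   # weekend consumption

# Moving from level 1 to level 5 is a 4-step change on either scale
steps <- 5 - 1
implied <- c(workday = steps * b_dalc, weekend = steps * b_walc)

round(implied, 2)
```

On the 0 to 20 grade scale this is a difference of about 1.9 points for workday drinking and about 0.5 points for weekend drinking.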
The parents and teachers of these students (considered the managers in this case) now have statistical evidence that alcohol consumption goes hand in hand with lower grades, which may in turn harm the students' futures. They should help students understand the harmful effects of excessive alcohol consumption and help them overcome any dependence on alcohol.
I obtained this dataset from Kaggle (https://www.kaggle.com/uciml/student-alcohol-consumption). It is a public dataset, and I found it interesting as well as socially and managerially relevant. I would also like to thank Prof. Sameer Mathur for guiding me on this road of data analytics and making it possible for me to create this project.
some(mydata)
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob
## 1 GP F 18 U GT3 A 4 4 at_home teacher
## 103 GP M 15 U GT3 T 4 4 services other
## 138 GP F 16 U GT3 A 2 2 other other
## 156 GP M 17 U GT3 T 2 1 other other
## 246 GP M 17 R GT3 T 2 2 other other
## 250 GP M 16 U GT3 T 3 2 at_home other
## 316 GP F 18 U GT3 T 2 1 other other
## 422 GP F 20 U GT3 T 1 0 other other
## 452 MS M 16 R GT3 T 1 2 other other
## 521 MS F 16 U LE3 T 1 1 at_home other
## reason guardian traveltime studytime failures schoolsup famsup
## 1 course mother 2 2 0 yes no
## 103 course mother 1 1 0 no yes
## 138 home mother 1 1 1 no no
## 156 home mother 1 1 0 no yes
## 246 course father 2 2 0 no yes
## 250 reputation mother 2 3 0 no no
## 316 home mother 1 2 0 no yes
## 422 reputation mother 2 1 1 yes no
## 452 course father 2 2 0 no no
## 521 other mother 3 2 0 no yes
## paid activities nursery higher internet romantic famrel freetime goout
## 1 no no yes yes no no 4 3 4
## 103 yes yes no yes yes no 5 3 3
## 138 no no yes yes no no 5 3 4
## 156 no no yes yes yes no 5 4 5
## 246 no yes yes yes yes no 4 5 2
## 250 no yes yes yes yes yes 5 3 3
## 316 no no yes yes yes yes 4 2 5
## 422 no no yes yes yes yes 5 3 1
## 452 no no yes yes no no 4 3 3
## 521 no no yes yes yes no 4 3 2
## Dalc Walc health absences G1 G2 G3 weekly
## 1 1 1 3 4 0 11 11 2
## 103 1 1 5 2 12 13 12 2
## 138 1 1 5 12 13 11 11 2
## 156 1 2 5 22 9 7 6 3
## 246 1 1 1 0 12 13 13 2
## 250 1 3 2 0 12 12 12 4
## 316 1 2 1 8 14 14 15 3
## 422 1 1 5 5 8 10 10 2
## 452 1 1 5 0 10 11 11 2
## 521 1 3 5 6 6 8 8 4
No. of Rows in the data
nrow(mydata)
## [1] 649
No. of columns in the data
ncol(mydata)
## [1] 34
Description of each column
str(mydata)
## 'data.frame': 649 obs. of 34 variables:
## $ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
## $ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
## $ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
## $ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
## $ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
## $ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
## $ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
## $ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
## $ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
## $ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 0 0 0 0 0 0 0 0 ...
## $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
## $ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ paid : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
## $ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
## $ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
## $ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
## $ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : int 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
## $ health : int 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 4 2 6 0 0 6 0 2 0 0 ...
## $ G1 : int 0 9 12 14 11 12 13 10 15 12 ...
## $ G2 : int 11 11 13 14 13 12 12 13 16 12 ...
## $ G3 : int 11 11 12 14 13 13 13 13 17 13 ...
## $ weekly : int 2 2 5 2 3 3 2 2 2 2 ...
Summarizing the data
summary(mydata)
## school sex age address famsize Pstatus
## GP:423 F:383 Min. :15.00 R:197 GT3:457 A: 80
## MS:226 M:266 1st Qu.:16.00 U:452 LE3:192 T:569
## Median :17.00
## Mean :16.74
## 3rd Qu.:18.00
## Max. :22.00
## Medu Fedu Mjob Fjob
## Min. :0.000 Min. :0.000 at_home :135 at_home : 42
## 1st Qu.:2.000 1st Qu.:1.000 health : 48 health : 23
## Median :2.000 Median :2.000 other :258 other :367
## Mean :2.515 Mean :2.307 services:136 services:181
## 3rd Qu.:4.000 3rd Qu.:3.000 teacher : 72 teacher : 36
## Max. :4.000 Max. :4.000
## reason guardian traveltime studytime
## course :285 father:153 Min. :1.000 Min. :1.000
## home :149 mother:455 1st Qu.:1.000 1st Qu.:1.000
## other : 72 other : 41 Median :1.000 Median :2.000
## reputation:143 Mean :1.569 Mean :1.931
## 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.000
## failures schoolsup famsup paid activities nursery
## Min. :0.0000 no :581 no :251 no :610 no :334 no :128
## 1st Qu.:0.0000 yes: 68 yes:398 yes: 39 yes:315 yes:521
## Median :0.0000
## Mean :0.2219
## 3rd Qu.:0.0000
## Max. :3.0000
## higher internet romantic famrel freetime
## no : 69 no :151 no :410 Min. :1.000 Min. :1.00
## yes:580 yes:498 yes:239 1st Qu.:4.000 1st Qu.:3.00
## Median :4.000 Median :3.00
## Mean :3.931 Mean :3.18
## 3rd Qu.:5.000 3rd Qu.:4.00
## Max. :5.000 Max. :5.00
## goout Dalc Walc health
## Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:2.000
## Median :3.000 Median :1.000 Median :2.00 Median :4.000
## Mean :3.185 Mean :1.502 Mean :2.28 Mean :3.536
## 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3.00 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.00 Max. :5.000
## absences G1 G2 G3
## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.:10.0 1st Qu.:10.00 1st Qu.:10.00
## Median : 2.000 Median :11.0 Median :11.00 Median :12.00
## Mean : 3.659 Mean :11.4 Mean :11.57 Mean :11.91
## 3rd Qu.: 6.000 3rd Qu.:13.0 3rd Qu.:13.00 3rd Qu.:14.00
## Max. :32.000 Max. :19.0 Max. :19.00 Max. :19.00
## weekly
## Min. : 2.000
## 1st Qu.: 2.000
## Median : 3.000
## Mean : 3.783
## 3rd Qu.: 5.000
## Max. :10.000
table(mydata$age)
##
## 15 16 17 18 19 20 21 22
## 112 177 179 140 32 6 2 1
hist(mydata$age , right = FALSE , ylim = c(0,200) , main = "Age distribution of students" , col = rainbow(9) , ylab = "No. of students")
library(plotrix)  # provides pie3D()
tab1 <- table(mydata$sex)
tab2 <- table(mydata$address)
tab3 <- table(mydata$romantic)
tab4 <- table(mydata$internet)
par(mfrow = c(2,2))
lbls <- paste(names(tab1))
pie3D(tab1 , labels = lbls , explode = 0.1 , col = c("blue", "pink")
, labelcex = 1 , main = "Males vs Females in the sample")
pie3D(tab2 , labels = c("Rural" , "Urban") , labelcex = 1
, explode = 0.1 , col = c("green", "yellow") , main = "Living Environment")
pie3D(tab3 , labels = paste(names(tab3)) , labelcex = 1
, explode = 0.1 , col = c("orange", "blue") , main = "Romantically involved")
pie3D(tab4 , labels = paste(names(tab4)) , explode = 0.1 ,
labelcex = 1 , col = c("violet", "peachpuff") , main = "Internet access at home")
Parents’ Education
attach(mydata)  # the chunks below refer to columns (Fedu, famrel, Dalc, ...) by bare name
#par(mfrow = c(1,2))
hist(Fedu , labels = c("None","Primary","","Middle School" ,"", "Secondary" ,"", "Higher Ed.")
, ylim = c(0,300) , xlab = "Educational level" , main = "Father's Education" , col = rev(rainbow(8)))
hist(Medu , labels = c("None","Primary","","Middle School" ,"", "Secondary" ,"", "Higher Ed.")
, ylim = c(0,300) , xlab = "Educational level" , main = "Mother's Education" ,
col = (rainbow(8)))
Parents’ Job
library(lattice)  # provides histogram()
histogram(~Fjob , col = rainbow(6) , xlab = "Type of job" , main = "Father's Job")
histogram(~Mjob , col = rainbow(6) , xlab = "Type of job" , main = "Mother's Job")
Relationships in the family
table(famrel)
## famrel
## 1 2 3 4 5
## 22 29 101 317 180
hist(famrel , col = rev(rainbow(10)) , ylim = c(0,350)
, main = "Type of Relationship among family members"
, xlab = "Strength of relations(1 : bad to 5 : very good)")
Health(current) of the respondents
table(mydata$health)
##
## 1 2 3 4 5
## 90 78 124 108 249
hist(mydata$health , main = "Current health status"
, xlab = "Health(1: poor to 5 : excellent)" , col = rainbow(20))
Comparison of weekday and weekend alcohol consumption
comp <- xtabs(~Dalc + Walc , data = mydata)
ftable(comp)
## Walc 1 2 3 4 5
## Dalc
## 1 241 113 64 28 5
## 2 3 34 43 34 7
## 3 1 1 9 20 12
## 4 1 1 4 5 6
## 5 1 1 0 0 15
library(vcd)  # provides mosaic()
mosaic(comp , shade = TRUE)
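The cross-tabulation can also be summarised numerically. The sketch below re-enters the printed counts as a matrix (the name `comp_counts` is illustrative) and measures how many students drink the same amount, more on weekends, or more on workdays:

```r
# Counts copied from the ftable above: rows = Dalc 1..5, columns = Walc 1..5
comp_counts <- matrix(c(241, 113, 64, 28,  5,
                          3,  34, 43, 34,  7,
                          1,   1,  9, 20, 12,
                          1,   1,  4,  5,  6,
                          1,   1,  0,  0, 15),
                      nrow = 5, byrow = TRUE,
                      dimnames = list(Dalc = as.character(1:5),
                                      Walc = as.character(1:5)))

n_total <- sum(comp_counts)
same    <- sum(diag(comp_counts))                    # Walc == Dalc
more_we <- sum(comp_counts[upper.tri(comp_counts)])  # Walc >  Dalc
more_wd <- sum(comp_counts[lower.tri(comp_counts)])  # Walc <  Dalc

round(100 * c(same = same, weekend_higher = more_we,
              workday_higher = more_wd) / n_total, 1)
```

Only 13 of the 649 students report a higher level on workdays than on weekends, so workday drinking in this sample almost always comes with at least as much weekend drinking.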
Testing for a correlation between weekday and weekend alcohol consumption
cor.test(Dalc,Walc)
##
## Pearson's product-moment correlation
##
## data: Dalc and Walc
## t = 19.92, df = 647, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5664804 0.6621049
## sample estimates:
## cor
## 0.6165614
We can conclude that there is indeed a positive and fairly strong correlation (r = 0.62, p < 2.2e-16), i.e. those who drink more on weekdays also tend to drink more on weekends.
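For readers who want to see where the numbers in the `cor.test()` output come from, here is a sketch that rebuilds the t statistic and the 95% confidence interval from the sample correlation and the sample size alone. It uses the Fisher z-transformation for the interval; the variable names are illustrative:

```r
r <- 0.6165614   # sample correlation reported above
n <- 649         # number of students

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
hw <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - hw, z + hw))

round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4)
```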
Plotting the variation of alcohol consumption in the respondents
table(mydata$weekly)
##
## 2 3 4 5 6 7 8 9 10
## 241 116 99 73 50 32 17 6 15
hist(weekly , ylim = c(0,400) , main = "Total weekly alcohol consumption" , xlab = "Level of consumption(2 : minimal to 10 : very heavy)" , col = rainbow(10))
layout(matrix(c(1,2), 1, 2, byrow = TRUE))
table(mydata$Dalc)
##
## 1 2 3 4 5
## 451 121 43 17 17
hist(Dalc , ylim = c(0,500) , main = "Weekday alcohol consumption"
, xlab = "Consumption level(1 : minimal to 5 : extreme)" , col = rainbow(10))
table(mydata$Walc)
##
## 1 2 3 4 5
## 247 150 120 87 45
hist(Walc , ylim = c(0,300) , main = "Weekend alcohol consumption"
, xlab = "Consumption level(1 : minimal to 5 : extreme)" , col = rainbow(10))
How do various indicators of performance vary with weekly alcohol consumption
aggregate(cbind(failures , absences , G3 , studytime) ~ weekly, data = mydata , FUN = mean)
## weekly failures absences G3 studytime
## 1 2 0.2116183 2.804979 12.36929 2.120332
## 2 3 0.0862069 3.689655 12.62069 1.896552
## 3 4 0.2424242 3.313131 11.66667 1.949495
## 4 5 0.2191781 4.561644 11.91781 1.863014
## 5 6 0.2800000 4.940000 10.50000 1.640000
## 6 7 0.4687500 4.875000 10.59375 1.531250
## 7 8 0.4117647 4.117647 10.52941 1.647059
## 8 9 0.0000000 8.000000 10.16667 1.666667
## 9 10 0.4666667 5.933333 10.20000 1.600000
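As a consistency check on the table above, the group means can be combined with the group sizes from `table(mydata$weekly)`. The sketch below (illustrative names) recovers the overall mean of G3 and the gap between the lightest and heaviest drinkers:

```r
# Group sizes from table(mydata$weekly) and mean G3 from the aggregate output
n_by_level  <- c(241, 116, 99, 73, 50, 32, 17, 6, 15)
g3_by_level <- c(12.36929, 12.62069, 11.66667, 11.91781, 10.50000,
                 10.59375, 10.52941, 10.16667, 10.20000)
names(g3_by_level) <- 2:10

# The size-weighted mean of the group means recovers the overall mean of G3
overall_g3 <- weighted.mean(g3_by_level, n_by_level)

# Gap between the lightest (level 2) and heaviest (level 10) drinkers
gap <- g3_by_level[["2"]] - g3_by_level[["10"]]

round(c(overall = overall_g3, gap = gap), 2)
```

The groups at the top of the scale are small (6 students at level 9 and 15 at level 10), so their means are noisier than those of the lower levels.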
Grades v/s Alcohol
boxplot(G3 ~ weekly , data = mydata , horizontal = TRUE ,
col = c("lightblue", "lightblue3", "lightblue4", "turquoise",
"#CCFF00FF", "#80FF00FF",
"orange", "darkorange", "red"))
How much do students who want to pursue higher education drink?
prop.table(xtabs(~ weekly + higher , data = mydata) , 2) * 100
## higher
## weekly no yes
## 2 30.434783 37.931034
## 3 10.144928 18.793103
## 4 18.840580 14.827586
## 5 10.144928 11.379310
## 6 7.246377 7.758621
## 7 13.043478 3.965517
## 8 4.347826 2.413793
## 9 0.000000 1.034483
## 10 5.797101 1.896552
Visualising it
interaction.plot(weekly , higher , G3 , fun = mean , legend = TRUE , type = "b"
, pch = c(16,18) , col = c("blue","green") ,
main = "Interaction between higher education and weekly alcohol consumption"
, xlab = "Level of Alcohol Consumption" , ylab = "Mean of grade received")
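We see that students who want to pursue higher education tend to drink less: 37.9% of them are at the lowest consumption level, against 30.4% of those who do not, and they are much rarer at the highest levels.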
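The column percentages above can be condensed into a single comparison. As a sketch (illustrative names; taking a weekly level of 7 or more as "heavy" is an assumption made here, not a threshold used in the analysis), the share of heavy drinkers in each group is:

```r
# Column percentages copied from the prop.table output above (weekly = 2..10)
pct_no  <- c(30.434783, 10.144928, 18.840580, 10.144928, 7.246377,
             13.043478, 4.347826, 0.000000, 5.797101)
pct_yes <- c(37.931034, 18.793103, 14.827586, 11.379310, 7.758621,
             3.965517, 2.413793, 1.034483, 1.896552)
weekly_level <- 2:10

# Margin 2 in prop.table() normalises each column, so each sums to 100
col_totals <- c(no = sum(pct_no), yes = sum(pct_yes))

# Share of each group with a weekly level of 7 or more (assumed cut-off)
heavy <- c(no  = sum(pct_no[weekly_level >= 7]),
           yes = sum(pct_yes[weekly_level >= 7]))

round(heavy, 1)
```

Under that cut-off, roughly 23% of the students who do not want higher education are heavy drinkers, against roughly 9% of those who do.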
d2 <- mydata
# Recode binary factors as 0/1 integers (the level coded as 1 is noted on each line)
d2$school     <- as.integer(d2$school != "GP")      # 1 = MS
d2$sex        <- as.integer(d2$sex != "M")          # 1 = F
d2$address    <- as.integer(d2$address != "R")      # 1 = U (urban)
d2$activities <- as.integer(d2$activities != "no")  # 1 = yes
d2$internet   <- as.integer(d2$internet != "no")    # 1 = yes
d2$romantic   <- as.integer(d2$romantic != "no")    # 1 = yes
d2$Pstatus    <- as.integer(d2$Pstatus != "A")      # 1 = T (living together)
cor(d2[,c(7,8,13,14,19,22,23,24,27,28,29,33)])
## Medu Fedu traveltime studytime
## Medu 1.000000000 0.6474766091 -0.265079003 0.097005833
## Fedu 0.647476609 1.0000000000 -0.208287978 0.050399648
## traveltime -0.265079003 -0.2082879785 1.000000000 -0.063153904
## studytime 0.097005833 0.0503996477 -0.063153904 1.000000000
## activities 0.119354338 0.0796997847 -0.033375848 0.070080254
## internet 0.266052298 0.1834826715 -0.190826470 0.037528541
## romantic -0.030992129 -0.0676748136 0.004750636 0.033035960
## famrel 0.024420573 0.0202558848 -0.009521185 -0.004127129
## Dalc -0.007018319 0.0000607749 0.092824284 -0.137584739
## Walc -0.019765786 0.0384447003 0.057007178 -0.214925105
## health 0.004614056 0.0449097884 -0.048261206 -0.056432694
## G3 0.240150757 0.2117996791 -0.127172967 0.249788690
## activities internet romantic famrel Dalc
## Medu 0.11935434 0.26605230 -0.030992129 0.024420573 -0.0070183191
## Fedu 0.07969978 0.18348267 -0.067674814 0.020255885 0.0000607749
## traveltime -0.03337585 -0.19082647 0.004750636 -0.009521185 0.0928242836
## studytime 0.07008025 0.03752854 0.033035960 -0.004127129 -0.1375847394
## activities 1.00000000 0.08237483 0.057516633 0.057597473 0.0225920962
## internet 0.08237483 1.00000000 0.034831900 0.082214307 0.0428111958
## romantic 0.05751663 0.03483190 1.000000000 -0.044919757 0.0620421218
## famrel 0.05759747 0.08221431 -0.044919757 1.000000000 -0.0757672250
## Dalc 0.02259210 0.04281120 0.062042122 -0.075767225 1.0000000000
## Walc 0.03282417 0.06065091 -0.019970702 -0.093510806 0.6165613821
## health 0.01300056 -0.02279223 -0.018024906 0.109559217 0.0590674577
## G3 0.05979145 0.15002485 -0.090582884 0.063361128 -0.2047193972
## Walc health G3
## Medu -0.01976579 0.004614056 0.24015076
## Fedu 0.03844470 0.044909788 0.21179968
## traveltime 0.05700718 -0.048261206 -0.12717297
## studytime -0.21492510 -0.056432694 0.24978869
## activities 0.03282417 0.013000559 0.05979145
## internet 0.06065091 -0.022792225 0.15002485
## romantic -0.01997070 -0.018024906 -0.09058288
## famrel -0.09351081 0.109559217 0.06336113
## Dalc 0.61656138 0.059067458 -0.20471940
## Walc 1.00000000 0.114987972 -0.17661887
## health 0.11498797 1.000000000 -0.09885124
## G3 -0.17661887 -0.098851241 1.00000000
library(corrgram)  # provides corrgram(), panel.pie and panel.minmax
corrgram(d2[,c(7,8,13,14,19,22,23,24,27,28,29,33)] , upper.panel = panel.pie
, diag.panel = panel.minmax)
Scatterplot of Grades v/s Family Background data
scatterplotMatrix(~ G3 + Fedu + Medu + Pstatus + Mjob + Fjob + famrel, data = d2)
Scatterplot of Grades v/s Drinking Habits data
scatterplotMatrix(~ G3 + weekly + health + failures + absences + activities , data = mydata)
c1 <- d2[,c("G3" , "health" , "weekly" , "Dalc" , "Walc"
, "absences" , "activities" , "studytime" , "freetime" , "goout")]
library(Hmisc)  # provides rcorr()
mat1 <- rcorr(as.matrix(c1))
mat1
## G3 health weekly Dalc Walc absences activities studytime
## G3 1.00 -0.10 -0.21 -0.20 -0.18 -0.09 0.06 0.25
## health -0.10 1.00 0.10 0.06 0.11 -0.03 0.01 -0.06
## weekly -0.21 0.10 1.00 0.86 0.93 0.18 0.03 -0.20
## Dalc -0.20 0.06 0.86 1.00 0.62 0.17 0.02 -0.14
## Walc -0.18 0.11 0.93 0.62 1.00 0.16 0.03 -0.21
## absences -0.09 -0.03 0.18 0.17 0.16 1.00 -0.02 -0.12
## activities 0.06 0.01 0.03 0.02 0.03 -0.02 1.00 0.07
## studytime 0.25 -0.06 -0.20 -0.14 -0.21 -0.12 0.07 1.00
## freetime -0.12 0.08 0.13 0.11 0.12 -0.02 0.15 -0.07
## goout -0.09 -0.02 0.36 0.25 0.39 0.09 0.09 -0.08
## freetime goout
## G3 -0.12 -0.09
## health 0.08 -0.02
## weekly 0.13 0.36
## Dalc 0.11 0.25
## Walc 0.12 0.39
## absences -0.02 0.09
## activities 0.15 0.09
## studytime -0.07 -0.08
## freetime 1.00 0.35
## goout 0.35 1.00
##
## n= 649
##
##
## P
## G3 health weekly Dalc Walc absences activities
## G3 0.0117 0.0000 0.0000 0.0000 0.0199 0.1281
## health 0.0117 0.0096 0.1328 0.0034 0.4419 0.7410
## weekly 0.0000 0.0096 0.0000 0.0000 0.0000 0.4209
## Dalc 0.0000 0.1328 0.0000 0.0000 0.0000 0.5656
## Walc 0.0000 0.0034 0.0000 0.0000 0.0000 0.4038
## absences 0.0199 0.4419 0.0000 0.0000 0.0000 0.7007
## activities 0.1281 0.7410 0.4209 0.5656 0.4038 0.7007
## studytime 0.0000 0.1510 0.0000 0.0004 0.0000 0.0025 0.0744
## freetime 0.0017 0.0313 0.0010 0.0051 0.0022 0.6341 0.0001
## goout 0.0256 0.6890 0.0000 0.0000 0.0000 0.0297 0.0240
## studytime freetime goout
## G3 0.0000 0.0017 0.0256
## health 0.1510 0.0313 0.6890
## weekly 0.0000 0.0010 0.0000
## Dalc 0.0004 0.0051 0.0000
## Walc 0.0000 0.0022 0.0000
## absences 0.0025 0.6341 0.0297
## activities 0.0744 0.0001 0.0240
## studytime 0.0797 0.0547
## freetime 0.0797 0.0000
## goout 0.0547 0.0000
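An interesting correlation to note is that family relations and grades are positively related, which suggests that a good environment at home helps students score better. The relationship is weak, however (r = 0.06 in the correlation matrix above), and the famrel coefficient is not significant in either regression.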
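The `rcorr()` table above does not include famrel, so its correlation with G3 has no printed p-value. A minimal sketch, assuming the usual t-based test for a Pearson correlation (the helper `cor_p_value` is illustrative), fills that gap and checks itself against one printed entry:

```r
# Two-sided p-value for a Pearson correlation r observed on n pairs
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

n <- 649
p_health <- cor_p_value(-0.09885124, n)  # G3 vs health; rcorr printed 0.0117
p_famrel <- cor_p_value(0.06336113, n)   # G3 vs famrel, r from the cor() matrix

round(c(health = p_health, famrel = p_famrel), 4)
```

The health entry reproduces the printed value, while the famrel correlation comes out at about 0.11, which is not significant at the 5% level.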