Introduction

Alcohol is consumed regularly by many people. There is an unresolved debate as to whether alcohol is good for, bad for, or has no effect on our productivity at work. Here we try to answer this question in the context of students, i.e. whether or not alcohol has a detrimental effect on students' grades. We answer this question on the basis of data from two schools in Portugal.

Data

The data is a public Kaggle dataset (https://www.kaggle.com/uciml/student-alcohol-consumption/data). It contains the grades of students in the Portuguese language course. The students belong to two schools in Portugal, Gabriel Pereira and Mousinho da Silveira. There are 649 students in total, of both genders, from all kinds of family backgrounds, and with a large variation in daily habits.

Meaning of each column

school - student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira)

sex - student’s sex (binary: ‘F’ - female or ‘M’ - male)

age - student’s age (numeric: from 15 to 22)

address - student’s home address type (binary: ‘U’ - urban or ‘R’ - rural)

famsize - family size (binary: ‘LE3’ - less or equal to 3 or ‘GT3’ - greater than 3)

Pstatus - parent’s cohabitation status (binary: ‘T’ - living together or ‘A’ - apart)

Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

Mjob - mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)

Fjob - father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)

reason - reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’)

guardian - student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’)

traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)

studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)

failures - number of past class failures (numeric: n if 1<=n<3, else 4; in this file the observed values range from 0 to 3)

schoolsup - extra educational support (binary: yes or no)

famsup - family educational support (binary: yes or no)

paid - extra paid classes within the course subject (Portuguese) (binary: yes or no)

activities - extra-curricular activities (binary: yes or no)

nursery - attended nursery school (binary: yes or no)

higher - wants to take higher education (binary: yes or no)

internet - Internet access at home (binary: yes or no)

romantic - with a romantic relationship (binary: yes or no)

famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)

freetime - free time after school (numeric: from 1 - very low to 5 - very high)

goout - going out with friends (numeric: from 1 - very low to 5 - very high)

Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

health - current health status (numeric: from 1 - very bad to 5 - very good)

absences - number of school absences (numeric: from 0 to 93; the maximum observed in this file is 32)

G1 - first period grade (numeric: from 0 to 20)

G2 - second period grade (numeric: from 0 to 20)

G3 - final grade (numeric: from 0 to 20, output target)

Reading the data

library(car)  # provides some(), which prints a random sample of rows
mydata <- read.csv("student-por.csv" , stringsAsFactors = TRUE)
some(mydata)
##     school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob
## 132     GP   F  18       U     GT3       T    2    1 services    other
## 245     GP   F  17       U     LE3       T    4    3   health    other
## 350     GP   F  17       U     GT3       T    3    2   health   health
## 352     GP   M  20       U     GT3       A    3    2 services    other
## 458     MS   M  17       R     LE3       T    1    2  at_home services
## 459     MS   F  16       R     GT3       T    1    1    other    other
## 471     MS   F  15       R     GT3       T    3    3 services    other
## 491     MS   F  18       R     GT3       T    1    1  at_home  at_home
## 494     MS   F  17       U     GT3       T    0    1    other  at_home
## 648     MS   M  17       U     LE3       T    3    1 services services
##         reason guardian traveltime studytime failures schoolsup famsup
## 132 reputation   mother          1         2        3        no    yes
## 245 reputation   father          1         2        0        no     no
## 350 reputation   father          1         4        0        no    yes
## 352     course    other          1         1        2        no     no
## 458 reputation   mother          1         1        0        no    yes
## 459       home   father          4         4        0        no    yes
## 471 reputation   mother          1         2        0        no    yes
## 491     course   mother          2         1        1        no     no
## 494     course   father          2         1        0        no     no
## 648     course   mother          2         1        0        no     no
##     paid activities nursery higher internet romantic famrel freetime goout
## 132   no        yes     yes     no      yes      yes      5        4     5
## 245   no        yes     yes    yes      yes      yes      3        2     3
## 350   no        yes      no    yes      yes       no      5        2     2
## 352   no        yes     yes    yes       no       no      5        5     3
## 458   no        yes     yes    yes      yes       no      5        5     5
## 459   no         no      no    yes      yes       no      4        3     2
## 471   no         no     yes    yes      yes      yes      4        5     4
## 491   no         no      no     no      yes      yes      3        2     3
## 494   no        yes      no    yes       no       no      2        4     4
## 648   no         no      no    yes      yes       no      2        4     5
##     Dalc Walc health absences G1 G2 G3
## 132    1    3      5       10 10  9  8
## 245    1    2      3        0 14 12 12
## 350    1    2      5        0 18 18 18
## 352    1    1      5        0 14 15 15
## 458    5    5      3        4 10 11 11
## 459    1    1      1        0 13 10 13
## 471    1    1      1        4 13 12 12
## 491    1    1      2        4  9 11 10
## 494    3    5      5        5  9  9 10
## 648    3    4      2        6 10 10 10

An OLS Regression Analysis of the given dataset

Hypothesis H1 : Consumption of Alcohol negatively affects the grades.

So, the associated null hypothesis (H0) is:

H0 : Consumption of alcohol has no effect whatsoever on the grades of students

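# weekly combines workday and weekend consumption (ranges from 2 to 10)
mydata$weekly <- mydata$Dalc + mydata$Walc
m1 <- G3 ~ weekly + studytime + absences + romantic + internet + famrel
fit1 <- lm(m1 , data = mydata)
summary(fit1)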
## 
## Call:
## lm(formula = m1, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.1477  -1.6349   0.0674   1.8483   7.4583 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.38775    0.69848  14.872  < 2e-16 ***
## weekly      -0.26708    0.06234  -4.284 2.11e-05 ***
## studytime    0.81601    0.14789   5.518 4.99e-08 ***
## absences    -0.02579    0.02646  -0.975  0.33001    
## romanticyes -0.64310    0.24855  -2.587  0.00989 ** 
## internetyes  1.18731    0.28506   4.165 3.54e-05 ***
## famrel       0.09497    0.12652   0.751  0.45316    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.038 on 642 degrees of freedom
## Multiple R-squared:  0.1239, Adjusted R-squared:  0.1157 
## F-statistic: 15.14 on 6 and 642 DF,  p-value: 2.999e-16

Values of the concerned \(\beta\) coefficients

coefplot(fit1 , outerCI = 1.96 , intercept = FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax
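
The interval that the coefficient plot draws for each term can be reproduced by hand from the regression table. The sketch below does this for the weekly term only, with the estimate and standard error copied from the summary output above. The variable names are illustrative, and the 1.96 multiplier simply mirrors the outerCI argument passed to coefplot.

```r
# Values copied from the summary(fit1) table for the `weekly` term
est <- -0.26708   # coefficient estimate
se  <-  0.06234   # standard error
df  <- 642        # residual degrees of freedom

# Normal-approximation interval, matching outerCI = 1.96
ci_normal <- est + c(-1, 1) * 1.96 * se

# Interval based on the t distribution, as confint(fit1) would compute it
ci_t <- est + c(-1, 1) * qt(0.975, df) * se

print(round(ci_normal, 3))
print(round(ci_t, 3))
```

Both intervals lie entirely below zero, which is consistent with the small p-value reported for this term.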

Next, we split weekly alcohol consumption into its two components and look at the effect of consumption on workdays and on weekends separately.

m2 <- G3 ~ Walc + Dalc + studytime + absences + romantic + internet + famrel
fit2 <- lm(m2 , data = mydata)
summary(fit2)
## 
## Call:
## lm(formula = m2, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.0565  -1.6515   0.0656   1.8731   7.5826 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.32483    0.69938  14.763  < 2e-16 ***
## Walc        -0.12208    0.12049  -1.013  0.31133    
## Dalc        -0.48222    0.16522  -2.919  0.00364 ** 
## studytime    0.83177    0.14821   5.612 2.97e-08 ***
## absences    -0.02444    0.02646  -0.924  0.35593    
## romanticyes -0.61104    0.24940  -2.450  0.01455 *  
## internetyes  1.17653    0.28494   4.129 4.13e-05 ***
## famrel       0.09918    0.12646   0.784  0.43316    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.036 on 641 degrees of freedom
## Multiple R-squared:  0.1266, Adjusted R-squared:  0.1171 
## F-statistic: 13.28 on 7 and 641 DF,  p-value: 4.834e-16

Values of the concerned \(\beta\) coefficients

coefplot(fit2 , outerCI = 1.96 , intercept = FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax

Results :

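We see that the grades obtained by students depend mainly on the following factors, which are the statistically significant terms of the first model:

weekly alcohol consumption (negative effect on grades)

study time (positive effect on grades)

being in a romantic relationship (negative effect on grades)

internet access at home (positive effect on grades)

Absences and the quality of family relationships are not statistically significant in either model.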

So, on the basis of the p-value obtained for weekly alcohol consumption (2.11e-05), we can reject the null hypothesis in favour of the alternative hypothesis.
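
Since H1 is directional (alcohol lowers grades) while summary() prints two-sided p-values, it may help to see how the two relate. The following is a small sketch that starts from the t value reported for weekly in the first model; the variable names are illustrative.

```r
# t statistic and residual degrees of freedom for `weekly`, from summary(fit1)
t_weekly <- -4.284
df_resid <- 642

# Two-sided p-value, which is what summary() prints
p_two_sided <- 2 * pt(-abs(t_weekly), df_resid)

# One-sided p-value for the alternative "coefficient < 0"
p_one_sided <- pt(t_weekly, df_resid)

print(signif(p_two_sided, 3))
print(signif(p_one_sided, 3))
```

Because the estimate is negative, the one-sided p-value is half the two-sided one, so the directional hypothesis is supported at least as strongly as the printed table suggests.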

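On further breakdown of the weekly alcohol consumption into workday consumption (Dalc) and weekend consumption (Walc), we find that workday consumption has the larger and statistically significant negative coefficient (-0.48, p = 0.004), whereas the coefficient for weekend consumption is smaller and not significant (-0.12, p = 0.31). Since the two measures are strongly correlated with each other (r = 0.62), this split should be read with some caution.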
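
One rough way to check whether splitting weekly consumption into Dalc and Walc really improves the model is an F-test on the two nested fits. The sketch below assumes that weekly is the sum of Dalc and Walc (as the sample rows in Appendix 1 suggest), so that the first model is the second one with both alcohol coefficients forced to be equal. It works only from the rounded R-squared values printed above, so the result is approximate; with the data at hand, anova(fit1, fit2) would give the exact version.

```r
# Fit statistics copied from the two summary() outputs
r2_restricted <- 0.1239   # fit1: single `weekly` term
r2_full       <- 0.1266   # fit2: separate Walc and Dalc terms
df_full       <- 641      # residual degrees of freedom of fit2
q             <- 1        # number of restrictions (one extra coefficient)

# F statistic for comparing nested OLS models via R-squared
f_stat  <- ((r2_full - r2_restricted) / q) / ((1 - r2_full) / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(round(f_stat, 2))
print(round(p_value, 2))
```

With these rounded inputs the statistic comes out near 2, with a p-value above 0.05. On this rough check, the gap between the workday and weekend coefficients is not clearly distinguishable from chance.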

Managerial Relevance

The parents and teachers of these students (considered the managers in this case) now have evidence that higher alcohol consumption is associated with lower grades, which may in turn harm the student's future. They need to make their kids understand the harmful effects of excessive alcohol consumption, and should help them overcome any dependence on alcoholic products.

References

I obtained this dataset from Kaggle (https://www.kaggle.com/uciml/student-alcohol-consumption). It is a public dataset and I found it interesting as well as socially and managerially relevant. Also, I would like to thank Prof. Sameer Mathur for guiding me on this road of data analytics and making it possible for me to create this project.

Appendix 1 : Uncovering the data

some(mydata)
##     school sex age address famsize Pstatus Medu Fedu     Mjob    Fjob
## 1       GP   F  18       U     GT3       A    4    4  at_home teacher
## 103     GP   M  15       U     GT3       T    4    4 services   other
## 138     GP   F  16       U     GT3       A    2    2    other   other
## 156     GP   M  17       U     GT3       T    2    1    other   other
## 246     GP   M  17       R     GT3       T    2    2    other   other
## 250     GP   M  16       U     GT3       T    3    2  at_home   other
## 316     GP   F  18       U     GT3       T    2    1    other   other
## 422     GP   F  20       U     GT3       T    1    0    other   other
## 452     MS   M  16       R     GT3       T    1    2    other   other
## 521     MS   F  16       U     LE3       T    1    1  at_home   other
##         reason guardian traveltime studytime failures schoolsup famsup
## 1       course   mother          2         2        0       yes     no
## 103     course   mother          1         1        0        no    yes
## 138       home   mother          1         1        1        no     no
## 156       home   mother          1         1        0        no    yes
## 246     course   father          2         2        0        no    yes
## 250 reputation   mother          2         3        0        no     no
## 316       home   mother          1         2        0        no    yes
## 422 reputation   mother          2         1        1       yes     no
## 452     course   father          2         2        0        no     no
## 521      other   mother          3         2        0        no    yes
##     paid activities nursery higher internet romantic famrel freetime goout
## 1     no         no     yes    yes       no       no      4        3     4
## 103  yes        yes      no    yes      yes       no      5        3     3
## 138   no         no     yes    yes       no       no      5        3     4
## 156   no         no     yes    yes      yes       no      5        4     5
## 246   no        yes     yes    yes      yes       no      4        5     2
## 250   no        yes     yes    yes      yes      yes      5        3     3
## 316   no         no     yes    yes      yes      yes      4        2     5
## 422   no         no     yes    yes      yes      yes      5        3     1
## 452   no         no     yes    yes       no       no      4        3     3
## 521   no         no     yes    yes      yes       no      4        3     2
##     Dalc Walc health absences G1 G2 G3 weekly
## 1      1    1      3        4  0 11 11      2
## 103    1    1      5        2 12 13 12      2
## 138    1    1      5       12 13 11 11      2
## 156    1    2      5       22  9  7  6      3
## 246    1    1      1        0 12 13 13      2
## 250    1    3      2        0 12 12 12      4
## 316    1    2      1        8 14 14 15      3
## 422    1    1      5        5  8 10 10      2
## 452    1    1      5        0 10 11 11      2
## 521    1    3      5        6  6  8  8      4

Viewing the data

No. of Rows in the data

nrow(mydata)
## [1] 649

No. of columns in the data

ncol(mydata)
## [1] 34

Description of each column

str(mydata)
## 'data.frame':    649 obs. of  34 variables:
##  $ school    : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex       : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
##  $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
##  $ address   : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
##  $ famsize   : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
##  $ Pstatus   : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
##  $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
##  $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
##  $ Mjob      : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
##  $ Fjob      : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
##  $ reason    : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
##  $ guardian  : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
##  $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
##  $ studytime : int  2 2 2 3 2 2 2 2 2 2 ...
##  $ failures  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
##  $ famsup    : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
##  $ paid      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
##  $ nursery   : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
##  $ higher    : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ internet  : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
##  $ romantic  : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
##  $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
##  $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
##  $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
##  $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
##  $ absences  : int  4 2 6 0 0 6 0 2 0 0 ...
##  $ G1        : int  0 9 12 14 11 12 13 10 15 12 ...
##  $ G2        : int  11 11 13 14 13 12 12 13 16 12 ...
##  $ G3        : int  11 11 12 14 13 13 13 13 17 13 ...
##  $ weekly    : int  2 2 5 2 3 3 2 2 2 2 ...

Summarizing the data

summary(mydata)
##  school   sex          age        address famsize   Pstatus
##  GP:423   F:383   Min.   :15.00   R:197   GT3:457   A: 80  
##  MS:226   M:266   1st Qu.:16.00   U:452   LE3:192   T:569  
##                   Median :17.00                            
##                   Mean   :16.74                            
##                   3rd Qu.:18.00                            
##                   Max.   :22.00                            
##       Medu            Fedu             Mjob           Fjob    
##  Min.   :0.000   Min.   :0.000   at_home :135   at_home : 42  
##  1st Qu.:2.000   1st Qu.:1.000   health  : 48   health  : 23  
##  Median :2.000   Median :2.000   other   :258   other   :367  
##  Mean   :2.515   Mean   :2.307   services:136   services:181  
##  3rd Qu.:4.000   3rd Qu.:3.000   teacher : 72   teacher : 36  
##  Max.   :4.000   Max.   :4.000                                
##         reason      guardian     traveltime      studytime    
##  course    :285   father:153   Min.   :1.000   Min.   :1.000  
##  home      :149   mother:455   1st Qu.:1.000   1st Qu.:1.000  
##  other     : 72   other : 41   Median :1.000   Median :2.000  
##  reputation:143                Mean   :1.569   Mean   :1.931  
##                                3rd Qu.:2.000   3rd Qu.:2.000  
##                                Max.   :4.000   Max.   :4.000  
##     failures      schoolsup famsup     paid     activities nursery  
##  Min.   :0.0000   no :581   no :251   no :610   no :334    no :128  
##  1st Qu.:0.0000   yes: 68   yes:398   yes: 39   yes:315    yes:521  
##  Median :0.0000                                                     
##  Mean   :0.2219                                                     
##  3rd Qu.:0.0000                                                     
##  Max.   :3.0000                                                     
##  higher    internet  romantic      famrel         freetime   
##  no : 69   no :151   no :410   Min.   :1.000   Min.   :1.00  
##  yes:580   yes:498   yes:239   1st Qu.:4.000   1st Qu.:3.00  
##                                Median :4.000   Median :3.00  
##                                Mean   :3.931   Mean   :3.18  
##                                3rd Qu.:5.000   3rd Qu.:4.00  
##                                Max.   :5.000   Max.   :5.00  
##      goout            Dalc            Walc          health     
##  Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.00   1st Qu.:2.000  
##  Median :3.000   Median :1.000   Median :2.00   Median :4.000  
##  Mean   :3.185   Mean   :1.502   Mean   :2.28   Mean   :3.536  
##  3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:3.00   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.00   Max.   :5.000  
##     absences            G1             G2              G3       
##  Min.   : 0.000   Min.   : 0.0   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 0.000   1st Qu.:10.0   1st Qu.:10.00   1st Qu.:10.00  
##  Median : 2.000   Median :11.0   Median :11.00   Median :12.00  
##  Mean   : 3.659   Mean   :11.4   Mean   :11.57   Mean   :11.91  
##  3rd Qu.: 6.000   3rd Qu.:13.0   3rd Qu.:13.00   3rd Qu.:14.00  
##  Max.   :32.000   Max.   :19.0   Max.   :19.00   Max.   :19.00  
##      weekly      
##  Min.   : 2.000  
##  1st Qu.: 2.000  
##  Median : 3.000  
##  Mean   : 3.783  
##  3rd Qu.: 5.000  
##  Max.   :10.000

Appendix 2 : Exploring the relationships in the data

Distribution of students on the basis of age

table(mydata$age)
## 
##  15  16  17  18  19  20  21  22 
## 112 177 179 140  32   6   2   1
hist(mydata$age , right = FALSE , ylim = c(0,200) , main = "Age distribution of students" , col = rainbow(9) , ylab = "No. of students")

Characteristics of the data

tab1 <- table(mydata$sex)
tab2 <- table(mydata$address)
tab3 <- table(mydata$romantic)
tab4 <- table(mydata$internet)
par(mfrow = c(2,2))
lbls <- paste(names(tab1))
pie3D(tab1 , labels = lbls , explode = 0.1 , col = c("blue", "pink") 
      , labelcex = 1 , main = "Males vs Females in the sample")
pie3D(tab2 , labels = c("Rural" , "Urban") , labelcex = 1 
      , explode = 0.1 , col = c("green", "yellow") , main = "Living Environment")
pie3D(tab3 , labels = paste(names(tab3)) , labelcex = 1 
      , explode = 0.1 , col = c("orange", "blue") , main = "Romantically involved")
pie3D(tab4 , labels = paste(names(tab4)) , explode = 0.1 , 
      labelcex = 1 , col = c("violet", "peachpuff") , main = "Internet access at home")

Family background of the students

Parents’ Education

#par(mfrow = (c(1,2)))
hist(Fedu , labels = c("None","Primary","","Middle School" ,"", "Secondary" ,"", "Higher Ed.") 
     , ylim = c(0,300) , xlab = "Educational level" , main = "Father's Education" , col = rev(rainbow(8)))

hist(Medu , labels = c("None","Primary","","Middle School" ,"", "Secondary" ,"", "Higher Ed.") 
     , ylim = c(0,300) , xlab = "Educational level" , main = "Mother's Education" , 
     col = (rainbow(8)))

Parents’ Job

histogram(~Fjob , col = rainbow(6) , xlab = "Type of job" , main = "Father's Job")

histogram(~Mjob , col = rainbow(6) , xlab = "Type of job" , main = "Mother's Job")

Relationships in the family

table(famrel)
## famrel
##   1   2   3   4   5 
##  22  29 101 317 180
hist(famrel , col = rev(rainbow(10)) , ylim = c(0,350) 
     , main = "Type of Relationship among family members" 
     , xlab = "Strength of relations(1 : bad to 5 : very good)")

Health(current) of the respondents

table(mydata$health)
## 
##   1   2   3   4   5 
##  90  78 124 108 249
hist(mydata$health , main = "Current health status" 
     , xlab = "Health(1: poor to 5 : excellent)" , col = rainbow(20))

Comparison of weekday and weekend alcohol consumption

comp <- xtabs(~Dalc + Walc , data = mydata)
ftable(comp)
##      Walc   1   2   3   4   5
## Dalc                         
## 1         241 113  64  28   5
## 2           3  34  43  34   7
## 3           1   1   9  20  12
## 4           1   1   4   5   6
## 5           1   1   0   0  15
mosaic(comp , shade = TRUE)
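
Assuming mosaic here is the function from the vcd package, shade = TRUE colours each tile by its Pearson residual under independence of the two variables. The sketch below recomputes those residuals from the counts printed above, so the shading can be checked without the plot; the object names are illustrative.

```r
# Counts copied from the ftable(comp) output (rows: Dalc 1-5, columns: Walc 1-5)
counts <- matrix(c(241, 113, 64, 28,  5,
                     3,  34, 43, 34,  7,
                     1,   1,  9, 20, 12,
                     1,   1,  4,  5,  6,
                     1,   1,  0,  0, 15),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(Dalc = 1:5, Walc = 1:5))

# Expected counts if weekday and weekend consumption were independent
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: positive cells are over-represented, negative under-represented
pearson_resid <- (counts - expected) / sqrt(expected)

print(round(pearson_resid, 1))
```

The largest positive residual sits in the cell where both scores equal 5: students who report very high weekday consumption almost always report very high weekend consumption as well.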

Probing the existence of any correlation between weekday and weekend alcohol consumption

cor.test(Dalc,Walc)
## 
##  Pearson's product-moment correlation
## 
## data:  Dalc and Walc
## t = 19.92, df = 647, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5664804 0.6621049
## sample estimates:
##       cor 
## 0.6165614

We can conclude that there is indeed a correlation, and a clearly positive one (r = 0.62), i.e. those who drink more on weekdays also tend to drink more on weekends.
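
The t statistic and confidence interval in the cor.test output follow directly from the sample correlation and the sample size. As a minimal sketch, with both numbers copied from the output above and illustrative variable names:

```r
# Values copied from the cor.test output
r <- 0.6165614   # sample Pearson correlation
n <- 649         # number of students (df = n - 2 = 647)

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 2))
print(round(ci, 4))
```

The results agree with the reported t = 19.92 and the interval from 0.566 to 0.662.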

Plotting the variation of alcohol consumption in the respondents

table(mydata$weekly)
## 
##   2   3   4   5   6   7   8   9  10 
## 241 116  99  73  50  32  17   6  15
hist(mydata$weekly , ylim = c(0,400) , main = "Total weekly alcohol consumption" 
     , xlab = "Level of consumption(2 : minimal to 10 : very heavy)" , col = rainbow(10))

layout(matrix(c(1,2), 1, 2, byrow = TRUE))
table(mydata$Dalc)
## 
##   1   2   3   4   5 
## 451 121  43  17  17
hist(mydata$Dalc , ylim = c(0,500) , main = "Weekday alcohol consumption" 
     , xlab = "Consumption level(1 : very low to 5 : very high)" , col = rainbow(10))
table(mydata$Walc)
## 
##   1   2   3   4   5 
## 247 150 120  87  45
hist(mydata$Walc , ylim = c(0,300) , main = "Weekend alcohol consumption" 
     , xlab = "Consumption level(1 : very low to 5 : very high)" , col = rainbow(10))

How do various indicators of performance vary with weekly alcohol consumption

aggregate(cbind(failures , absences , G3 , studytime) ~ weekly, data = mydata , FUN = mean)
##   weekly  failures absences       G3 studytime
## 1      2 0.2116183 2.804979 12.36929  2.120332
## 2      3 0.0862069 3.689655 12.62069  1.896552
## 3      4 0.2424242 3.313131 11.66667  1.949495
## 4      5 0.2191781 4.561644 11.91781  1.863014
## 5      6 0.2800000 4.940000 10.50000  1.640000
## 6      7 0.4687500 4.875000 10.59375  1.531250
## 7      8 0.4117647 4.117647 10.52941  1.647059
## 8      9 0.0000000 8.000000 10.16667  1.666667
## 9     10 0.4666667 5.933333 10.20000  1.600000

Grades v/s Alcohol

boxplot(G3 ~ weekly , data = mydata , horizontal = TRUE , 
        col = c("lightblue", "lightblue3", "lightblue4", "turquoise", 
                "#CCFF00FF", "#80FF00FF",
                "orange", "darkorange", "red"))

How much do the students who want to pursue higher education drink?

prop.table(xtabs(~ weekly + higher , data = mydata) , 2) * 100
##       higher
## weekly        no       yes
##     2  30.434783 37.931034
##     3  10.144928 18.793103
##     4  18.840580 14.827586
##     5  10.144928 11.379310
##     6   7.246377  7.758621
##     7  13.043478  3.965517
##     8   4.347826  2.413793
##     9   0.000000  1.034483
##     10  5.797101  1.896552

Visualising it

interaction.plot(weekly , higher , G3 , fun = mean , legend = TRUE , type = "b" 
                 , pch = c(16,18) , col = c("blue","green") , 
                 main = "Interaction between higher education and weekly alcohol consumption" 
                 , xlab = "Level of Alcohol Consumption" , ylab = "Mean of grade received")

We see that the students who want to pursue higher education tend to report lower alcohol consumption.
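
To put a number on this, the column percentages printed above can be summed over the upper part of the scale. The sketch below copies those percentages and compares the share of students with a weekly score of 6 or more in each group. The cut-off of 6 is an arbitrary choice made here for illustration, not something defined in the dataset.

```r
# Column percentages copied from the prop.table output (rows: weekly = 2..10)
pct <- cbind(
  no  = c(30.434783, 10.144928, 18.840580, 10.144928, 7.246377,
          13.043478, 4.347826, 0.000000, 5.797101),
  yes = c(37.931034, 18.793103, 14.827586, 11.379310, 7.758621,
          3.965517, 2.413793, 1.034483, 1.896552)
)
rownames(pct) <- 2:10

# Each column sums to 100 because prop.table(..., 2) normalises within columns
print(round(colSums(pct), 2))

# Share of students with a weekly score of 6 or more, by `higher`
heavy_share <- colSums(pct[as.character(6:10), ])
print(round(heavy_share, 1))
```

Roughly 30% of the students who do not want higher education fall in this upper range, against roughly 17% of those who do. The first group contains only 69 students, so its percentages are comparatively noisy.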

Appendix 3 : Analysing Correlations

# Recode the binary factors as 0/1 integers so they can enter cor()
d2 <- mydata
d2$school     <- as.integer(d2$school != "GP")       # 1 = MS, 0 = GP
d2$sex        <- as.integer(d2$sex != "M")           # 1 = F, 0 = M
d2$address    <- as.integer(d2$address != "R")       # 1 = urban, 0 = rural
d2$activities <- as.integer(d2$activities != "no")   # 1 = yes, 0 = no
d2$internet   <- as.integer(d2$internet != "no")     # 1 = yes, 0 = no
d2$romantic   <- as.integer(d2$romantic != "no")     # 1 = yes, 0 = no
d2$Pstatus    <- as.integer(d2$Pstatus != "A")       # 1 = together, 0 = apart
cor(d2[,c(7,8,13,14,19,22,23,24,27,28,29,33)])
##                    Medu          Fedu   traveltime    studytime
## Medu        1.000000000  0.6474766091 -0.265079003  0.097005833
## Fedu        0.647476609  1.0000000000 -0.208287978  0.050399648
## traveltime -0.265079003 -0.2082879785  1.000000000 -0.063153904
## studytime   0.097005833  0.0503996477 -0.063153904  1.000000000
## activities  0.119354338  0.0796997847 -0.033375848  0.070080254
## internet    0.266052298  0.1834826715 -0.190826470  0.037528541
## romantic   -0.030992129 -0.0676748136  0.004750636  0.033035960
## famrel      0.024420573  0.0202558848 -0.009521185 -0.004127129
## Dalc       -0.007018319  0.0000607749  0.092824284 -0.137584739
## Walc       -0.019765786  0.0384447003  0.057007178 -0.214925105
## health      0.004614056  0.0449097884 -0.048261206 -0.056432694
## G3          0.240150757  0.2117996791 -0.127172967  0.249788690
##             activities    internet     romantic       famrel          Dalc
## Medu        0.11935434  0.26605230 -0.030992129  0.024420573 -0.0070183191
## Fedu        0.07969978  0.18348267 -0.067674814  0.020255885  0.0000607749
## traveltime -0.03337585 -0.19082647  0.004750636 -0.009521185  0.0928242836
## studytime   0.07008025  0.03752854  0.033035960 -0.004127129 -0.1375847394
## activities  1.00000000  0.08237483  0.057516633  0.057597473  0.0225920962
## internet    0.08237483  1.00000000  0.034831900  0.082214307  0.0428111958
## romantic    0.05751663  0.03483190  1.000000000 -0.044919757  0.0620421218
## famrel      0.05759747  0.08221431 -0.044919757  1.000000000 -0.0757672250
## Dalc        0.02259210  0.04281120  0.062042122 -0.075767225  1.0000000000
## Walc        0.03282417  0.06065091 -0.019970702 -0.093510806  0.6165613821
## health      0.01300056 -0.02279223 -0.018024906  0.109559217  0.0590674577
## G3          0.05979145  0.15002485 -0.090582884  0.063361128 -0.2047193972
##                   Walc       health          G3
## Medu       -0.01976579  0.004614056  0.24015076
## Fedu        0.03844470  0.044909788  0.21179968
## traveltime  0.05700718 -0.048261206 -0.12717297
## studytime  -0.21492510 -0.056432694  0.24978869
## activities  0.03282417  0.013000559  0.05979145
## internet    0.06065091 -0.022792225  0.15002485
## romantic   -0.01997070 -0.018024906 -0.09058288
## famrel     -0.09351081  0.109559217  0.06336113
## Dalc        0.61656138  0.059067458 -0.20471940
## Walc        1.00000000  0.114987972 -0.17661887
## health      0.11498797  1.000000000 -0.09885124
## G3         -0.17661887 -0.098851241  1.00000000
corrgram(d2[,c(7,8,13,14,19,22,23,24,27,28,29,33)] , upper.panel = panel.pie 
         , diag.panel = panel.minmax)

Scatterplot of Grades v/s Family Background data

scatterplotMatrix(~ G3 + Fedu + Medu + Pstatus + Mjob + Fjob + famrel, data = d2)

Scatterplot of Grades v/s Drinking Habits data

scatterplotMatrix(~ G3 + weekly + health + failures + absences + activities , data = mydata)

c1 <- d2[,c("G3" , "health" , "weekly" , "Dalc" , "Walc" 
            , "absences" , "activities" , "studytime" , "freetime" , "goout")]
mat1 <- rcorr(as.matrix(c1))
mat1
##               G3 health weekly  Dalc  Walc absences activities studytime
## G3          1.00  -0.10  -0.21 -0.20 -0.18    -0.09       0.06      0.25
## health     -0.10   1.00   0.10  0.06  0.11    -0.03       0.01     -0.06
## weekly     -0.21   0.10   1.00  0.86  0.93     0.18       0.03     -0.20
## Dalc       -0.20   0.06   0.86  1.00  0.62     0.17       0.02     -0.14
## Walc       -0.18   0.11   0.93  0.62  1.00     0.16       0.03     -0.21
## absences   -0.09  -0.03   0.18  0.17  0.16     1.00      -0.02     -0.12
## activities  0.06   0.01   0.03  0.02  0.03    -0.02       1.00      0.07
## studytime   0.25  -0.06  -0.20 -0.14 -0.21    -0.12       0.07      1.00
## freetime   -0.12   0.08   0.13  0.11  0.12    -0.02       0.15     -0.07
## goout      -0.09  -0.02   0.36  0.25  0.39     0.09       0.09     -0.08
##            freetime goout
## G3            -0.12 -0.09
## health         0.08 -0.02
## weekly         0.13  0.36
## Dalc           0.11  0.25
## Walc           0.12  0.39
## absences      -0.02  0.09
## activities     0.15  0.09
## studytime     -0.07 -0.08
## freetime       1.00  0.35
## goout          0.35  1.00
## 
## n= 649 
## 
## 
## P
##            G3     health weekly Dalc   Walc   absences activities
## G3                0.0117 0.0000 0.0000 0.0000 0.0199   0.1281    
## health     0.0117        0.0096 0.1328 0.0034 0.4419   0.7410    
## weekly     0.0000 0.0096        0.0000 0.0000 0.0000   0.4209    
## Dalc       0.0000 0.1328 0.0000        0.0000 0.0000   0.5656    
## Walc       0.0000 0.0034 0.0000 0.0000        0.0000   0.4038    
## absences   0.0199 0.4419 0.0000 0.0000 0.0000          0.7007    
## activities 0.1281 0.7410 0.4209 0.5656 0.4038 0.7007             
## studytime  0.0000 0.1510 0.0000 0.0004 0.0000 0.0025   0.0744    
## freetime   0.0017 0.0313 0.0010 0.0051 0.0022 0.6341   0.0001    
## goout      0.0256 0.6890 0.0000 0.0000 0.0000 0.0297   0.0240    
##            studytime freetime goout 
## G3         0.0000    0.0017   0.0256
## health     0.1510    0.0313   0.6890
## weekly     0.0000    0.0010   0.0000
## Dalc       0.0004    0.0051   0.0000
## Walc       0.0000    0.0022   0.0000
## absences   0.0025    0.6341   0.0297
## activities 0.0744    0.0001   0.0240
## studytime            0.0797   0.0547
## freetime   0.0797             0.0000
## goout      0.0547    0.0000

An interesting correlation to note is that family relations and grades are positively, though only weakly, related (r = 0.06), so a good environment at home may help students score better.
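
How much weight this correlation can bear is easy to check from the numbers already shown. The sketch below takes the famrel and G3 entry from the cor() output above and computes the usual t-test for a correlation coefficient; the variable names are illustrative.

```r
# Correlation between famrel and G3, copied from the cor() output above
r <- 0.063361128
n <- 649

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(t_stat, 2))
print(round(p_value, 3))
```

The p-value comes out above 0.05, in line with the non-significant famrel coefficient in both regressions (p = 0.45 and p = 0.43), so the link between family relations and grades is suggestive at best in this sample.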