Introduction

Alcohol is widely consumed, and there is an unresolved debate as to whether it is good for, bad for, or irrelevant to our productivity at work. Here we ask this question in the context of students, i.e. whether alcohol consumption is associated with lower grades. We answer it on the basis of data from two schools in Portugal.

Data

The data is a public Kaggle dataset (https://www.kaggle.com/uciml/student-alcohol-consumption/data). It contains the grades in Portuguese language of students from two schools in Portugal, Gabriel Pereira and Mousinho da Silveira. There are 649 students in total, of both genders, from all kinds of family backgrounds, and with a large variation in daily habits.

Meaning of each column

school - student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira)

sex - student’s sex (binary: ‘F’ - female or ‘M’ - male)

age - student’s age (numeric: from 15 to 22)

address - student’s home address type (binary: ‘U’ - urban or ‘R’ - rural)

famsize - family size (binary: ‘LE3’ - less or equal to 3 or ‘GT3’ - greater than 3)

Pstatus - parent’s cohabitation status (binary: ‘T’ - living together or ‘A’ - apart)

Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

Mjob - mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)

Fjob - father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)

reason - reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’)

guardian - student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’)

traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)

studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)

failures - number of past class failures (numeric: n if 1<=n<3, else 4)

schoolsup - extra educational support (binary: yes or no)

famsup - family educational support (binary: yes or no)

paid - extra paid classes within the course subject (Portuguese) (binary: yes or no)

activities - extra-curricular activities (binary: yes or no)

nursery - attended nursery school (binary: yes or no)

higher - wants to take higher education (binary: yes or no)

internet - Internet access at home (binary: yes or no)

romantic - with a romantic relationship (binary: yes or no)

famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)

freetime - free time after school (numeric: from 1 - very low to 5 - very high)

goout - going out with friends (numeric: from 1 - very low to 5 - very high)

Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

health - current health status (numeric: from 1 - very bad to 5 - very good)

absences - number of school absences (numeric: from 0 to 93 in the dataset description; the maximum in this file is 32)

G1 - first period grade (numeric: from 0 to 20)

G2 - second period grade (numeric: from 0 to 20)

G3 - final grade (numeric: from 0 to 20, output target)

Reading the data

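library(car)  # provides some() and scatterplotMatrix()
# Keep text columns as factors (the default before R 4.0), as in the str() output of Appendix 1
mydata <- read.csv("student-por.csv" , stringsAsFactors = TRUE)
some(mydata)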
##     school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob
## 59      GP   M  15       U     LE3       T    1    2    other  at_home
## 75      GP   F  16       U     GT3       T    3    3    other services
## 130     GP   M  16       U     GT3       T    2    3    other    other
## 271     GP   M  16       U     GT3       T    4    4 services services
## 277     GP   M  16       U     GT3       T    2    1    other    other
## 305     GP   F  18       U     GT3       T    2    2  at_home services
## 424     MS   F  16       U     GT3       T    1    3  at_home    other
## 440     MS   F  15       R     GT3       T    3    3    other services
## 575     MS   M  20       R     GT3       T    1    1    other    other
## 645     MS   F  19       R     GT3       T    2    3 services    other
##     reason guardian traveltime studytime failures schoolsup famsup paid
## 59    home   father          1         2        0       yes    yes   no
## 75    home   mother          1         2        0       yes    yes   no
## 130 course   mother          2         3        0        no    yes   no
## 271 course   mother          1         1        0        no     no  yes
## 277 course   mother          3         1        0        no     no   no
## 305   home   mother          1         3        0        no    yes   no
## 424  other   father          2         1        0        no    yes   no
## 440 course   father          2         1        0        no     no   no
## 575 course    other          2         1        1        no    yes   no
## 645 course   mother          1         3        1        no     no   no
##     activities nursery higher internet romantic famrel freetime goout Dalc
## 59         yes     yes    yes      yes       no      4        3     2    1
## 75         yes     yes    yes      yes       no      4        3     3    2
## 130         no      no    yes      yes      yes      3        2     3    2
## 271        yes     yes    yes      yes       no      5        3     2    1
## 277         no     yes    yes      yes       no      4        3     3    1
## 305        yes     yes    yes      yes      yes      4        3     3    1
## 424         no     yes     no      yes      yes      4        3     3    1
## 440         no      no    yes      yes       no      4        1     3    1
## 575         no     yes     no      yes      yes      4        4     3    2
## 645        yes      no    yes      yes       no      5        4     2    1
##     Walc health absences G1 G2 G3
## 59     1      5        0 14 13 14
## 75     4      5        4 11 11 11
## 130    2      1        4 13 12 13
## 271    2      5        4 14 15 15
## 277    1      4        7 15 16 16
## 305    1      3        0 11 12 13
## 424    3      5       11 10 11 11
## 440    1      4        0 14 16 16
## 575    4      4       12  8 11 10
## 645    2      5        4 10 11 10
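The str() output in Appendix 1 lists the text columns as factors. On R 4.0 and later, read.csv() returns plain character columns unless stringsAsFactors = TRUE is passed, so a minimal sketch of the loading step could look as follows. The inline text is only a stand-in for student-por.csv (an assumption made to keep the example self-contained); its three rows are copied from the sample above and restricted to a few columns.

```r
# Inline stand-in for student-por.csv: rows 59, 75 and 130 of the sample above,
# restricted to a few columns so that the example runs without the file.
csv_text <- "school,sex,age,Dalc,Walc,G3
GP,M,15,1,1,14
GP,F,16,2,4,11
GP,M,16,2,2,13"

# stringsAsFactors = TRUE turns the text columns into factors
demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(demo)
```

With the real file, the same call would take the file name in place of the text argument.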

An OLS Regression Analysis of the given dataset

Hypothesis H1 : Consumption of Alcohol negatively affects the grades.

So, the associated null hypothesis (H0) will be,

H0 : Consumption of alcohol has no effect whatsoever on the grades of students

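# weekly = workday + weekend consumption (scale 2 to 10), consistent with the rows in Appendix 1
mydata$weekly <- mydata$Dalc + mydata$Walc
m1 <- G3 ~ weekly + studytime + absences + romantic + internet + famrel
fit1 <- lm(m1 , data = mydata)
summary(fit1)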
## 
## Call:
## lm(formula = m1, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.1477  -1.6349   0.0674   1.8483   7.4583 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.38775    0.69848  14.872  < 2e-16 ***
## weekly      -0.26708    0.06234  -4.284 2.11e-05 ***
## studytime    0.81601    0.14789   5.518 4.99e-08 ***
## absences    -0.02579    0.02646  -0.975  0.33001    
## romanticyes -0.64310    0.24855  -2.587  0.00989 ** 
## internetyes  1.18731    0.28506   4.165 3.54e-05 ***
## famrel       0.09497    0.12652   0.751  0.45316    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.038 on 642 degrees of freedom
## Multiple R-squared:  0.1239, Adjusted R-squared:  0.1157 
## F-statistic: 15.14 on 6 and 642 DF,  p-value: 2.999e-16
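H1 is directional, whereas the Pr(>|t|) column printed by summary() is two-sided. As a rough sketch, the one-sided p-value for the weekly coefficient can be recomputed from the reported t value and the residual degrees of freedom. The variable names are illustrative, and the inputs are the rounded figures printed above.

```r
t_weekly <- -4.284   # t value of 'weekly' in summary(fit1)
df_resid <- 642      # residual degrees of freedom of fit1

# Two-sided p-value, as printed by summary(): P(|T| >= |t|)
p_two_sided <- 2 * pt(-abs(t_weekly), df = df_resid)

# One-sided p-value for H1 (coefficient < 0): P(T <= t)
p_one_sided <- pt(t_weekly, df = df_resid)

print(c(two_sided = p_two_sided, one_sided = p_one_sided))
```

Because the estimate has the sign predicted by H1, the one-sided p-value is half of the two-sided one.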

Values of the concerned \(\beta\) coefficients

library(coefplot)  # provides coefplot()
coefplot(fit1 , outerCI = 1.96 , intercept = FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax
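As far as I understand the coefplot package, outerCI = 1.96 sets the outer interval to the estimate plus or minus 1.96 standard errors. The same intervals can be sketched in base R from the estimates and standard errors printed in summary(fit1). The two vectors below are typed in by hand from that output.

```r
# Estimates and standard errors copied from summary(fit1), intercept omitted
est <- c(weekly = -0.26708, studytime = 0.81601, absences = -0.02579,
         romanticyes = -0.64310, internetyes = 1.18731, famrel = 0.09497)
se  <- c(weekly = 0.06234, studytime = 0.14789, absences = 0.02646,
         romanticyes = 0.24855, internetyes = 0.28506, famrel = 0.12652)

# Approximate 95% intervals: estimate +/- 1.96 standard errors
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# An interval that excludes zero corresponds to p < 0.05 (approximately)
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(round(cbind(ci, excludes_zero), 3))
```

The intervals for weekly, studytime, romanticyes and internetyes exclude zero, which matches the starred rows in the summary.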

Next, we split weekly alcohol consumption into its weekday (Dalc) and weekend (Walc) components and look at the effect of each.

m2 <- G3 ~ Walc + Dalc + studytime + absences + romantic + internet + famrel
fit2 <- lm(m2 , data = mydata)
summary(fit2)
## 
## Call:
## lm(formula = m2, data = mydata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.0565  -1.6515   0.0656   1.8731   7.5826 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.32483    0.69938  14.763  < 2e-16 ***
## Walc        -0.12208    0.12049  -1.013  0.31133    
## Dalc        -0.48222    0.16522  -2.919  0.00364 ** 
## studytime    0.83177    0.14821   5.612 2.97e-08 ***
## absences    -0.02444    0.02646  -0.924  0.35593    
## romanticyes -0.61104    0.24940  -2.450  0.01455 *  
## internetyes  1.17653    0.28494   4.129 4.13e-05 ***
## famrel       0.09918    0.12646   0.784  0.43316    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.036 on 641 degrees of freedom
## Multiple R-squared:  0.1266, Adjusted R-squared:  0.1171 
## F-statistic: 13.28 on 7 and 641 DF,  p-value: 4.834e-16
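Since weekly appears to equal Dalc + Walc (see the sample rows in Appendix 1), the first model is the second one with the Dalc and Walc coefficients forced to be equal, so the two fits are nested. With the fitted objects at hand, anova(fit1, fit2) would compare them directly. The sketch below only approximates that F test from the rounded residual standard errors printed above, so its result is indicative rather than exact.

```r
rse1 <- 3.038; df1 <- 642   # fit1: G3 ~ weekly + ...
rse2 <- 3.036; df2 <- 641   # fit2: G3 ~ Walc + Dalc + ...

# Residual sums of squares recovered from the residual standard errors
rss1 <- rse1^2 * df1
rss2 <- rse2^2 * df2

# F test for the single restriction "coefficient of Dalc = coefficient of Walc"
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_val  <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)
print(c(F = f_stat, p = p_val))
```

With these rounded inputs the p-value stays above 0.05, which would suggest that the data do not clearly separate the weekday and weekend coefficients.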

Values of the concerned \(\beta\) coefficients

coefplot(fit2 , outerCI = 1.96 , intercept = FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax

Results :

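We see that, in the first model, the grades obtained by students mainly depend on the following factors (those with p < 0.05) :

weekly - total weekly alcohol consumption (negative effect on grades)

studytime - weekly study time (positive effect on grades)

romantic - being in a romantic relationship (negative effect on grades)

internet - Internet access at home (positive effect on grades)

Absences and the quality of family relationships are not statistically significant in this model.

So, on the basis of the p-value obtained for weekly (2.11e-05), we can reject the null hypothesis in favour of the alternative hypothesis. The effect is modest: each extra point of weekly consumption is associated with a final grade about 0.27 points lower, and the model explains about 12% of the variance in grades.

On further breakdown of the weekly alcohol consumption into workday consumption (Dalc) and weekend consumption (Walc), we find that workday consumption has the larger and statistically significant coefficient (-0.48, p = 0.004), whereas the weekend coefficient is smaller and not significant (-0.12, p = 0.31). Since the two measures are correlated (r = 0.62, see Appendix 2), this difference should be read with some caution.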

Managerial Relevance

The parents and teachers of these students (considered managers for this case) now have evidence that higher alcohol consumption goes together with lower grades, which may in turn harm the future of the student. As the analysis is an observational regression, it shows an association and not proof of causation. They should help their children understand the harmful effects of excessive alcohol consumption and help them overcome any dependence on alcohol.

References

I obtained this dataset from Kaggle (https://www.kaggle.com/uciml/student-alcohol-consumption). It is a public dataset, and I found it interesting as well as socially and managerially relevant. I would also like to thank Prof. Sameer Mathur for guiding me on this road of data analytics and making it possible for me to create this project.

Appendix 1 : Uncovering the data

some(mydata)
##     school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob
## 52      GP   F  15       U     LE3       T    4    2   health    other
## 69      GP   F  15       R     LE3       T    2    2   health services
## 71      GP   M  16       U     GT3       T    3    1    other    other
## 73      GP   F  15       R     GT3       T    1    1    other    other
## 122     GP   M  15       U     GT3       T    2    2 services services
## 399     GP   F  18       U     GT3       T    2    3  at_home    other
## 472     MS   F  16       R     GT3       T    2    2  at_home    other
## 483     MS   F  15       R     LE3       T    1    1  at_home    other
## 508     MS   F  17       U     LE3       T    1    1    other services
## 525     MS   F  16       R     LE3       T    3    4  at_home    other
##         reason guardian traveltime studytime failures schoolsup famsup
## 52       other   mother          1         2        0        no    yes
## 69  reputation   mother          2         2        0       yes    yes
## 71  reputation   father          2         4        0        no    yes
## 73  reputation   mother          1         2        0       yes    yes
## 122       home   father          1         4        0        no    yes
## 399     course   mother          1         3        0        no    yes
## 472     course   mother          2         2        1        no    yes
## 483     course   mother          2         1        0        no    yes
## 508     course   father          1         3        0        no    yes
## 525      other   mother          3         2        0        no    yes
##     paid activities nursery higher internet romantic famrel freetime goout
## 52    no         no     yes    yes      yes       no      4        3     3
## 69    no         no     yes    yes      yes       no      4        1     3
## 71    no         no     yes    yes      yes       no      4        3     2
## 73    no         no      no    yes      yes      yes      3        3     4
## 122   no        yes     yes    yes      yes       no      5        5     4
## 399   no         no     yes    yes      yes       no      4        3     3
## 472   no        yes      no    yes       no       no      4        4     4
## 483   no         no     yes     no       no      yes      5        2     1
## 508   no         no     yes    yes       no      yes      4        3     3
## 525   no         no      no    yes       no       no      4        2     1
##     Dalc Walc health absences G1 G2 G3 weekly
## 52     1    1      5        0 16 14 16      2
## 69     1    3      4        0 11 10 11      4
## 71     1    1      5        2 13 11 11      2
## 73     2    4      5        2 13 11 11      6
## 122    1    2      5        6 14 13 13      3
## 399    1    2      3        0 11 12 14      3
## 472    2    3      5        2 12 11 12      5
## 483    1    3      4        0  9 10  9      4
## 508    1    1      3        0 11 11 10      2
## 525    1    1      2        2  7  9  8      2

Viewing the data

No. of Rows in the data

nrow(mydata)
## [1] 649

No. of columns in the data

ncol(mydata)
## [1] 34

Description of each column

str(mydata)
## 'data.frame':    649 obs. of  34 variables:
##  $ school    : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex       : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
##  $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
##  $ address   : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
##  $ famsize   : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
##  $ Pstatus   : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
##  $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
##  $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
##  $ Mjob      : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
##  $ Fjob      : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
##  $ reason    : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
##  $ guardian  : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
##  $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
##  $ studytime : int  2 2 2 3 2 2 2 2 2 2 ...
##  $ failures  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
##  $ famsup    : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
##  $ paid      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
##  $ nursery   : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
##  $ higher    : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ internet  : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
##  $ romantic  : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
##  $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
##  $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
##  $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
##  $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
##  $ absences  : int  4 2 6 0 0 6 0 2 0 0 ...
##  $ G1        : int  0 9 12 14 11 12 13 10 15 12 ...
##  $ G2        : int  11 11 13 14 13 12 12 13 16 12 ...
##  $ G3        : int  11 11 12 14 13 13 13 13 17 13 ...
##  $ weekly    : int  2 2 5 2 3 3 2 2 2 2 ...

Summarizing the data

summary(mydata)
##  school   sex          age        address famsize   Pstatus
##  GP:423   F:383   Min.   :15.00   R:197   GT3:457   A: 80  
##  MS:226   M:266   1st Qu.:16.00   U:452   LE3:192   T:569  
##                   Median :17.00                            
##                   Mean   :16.74                            
##                   3rd Qu.:18.00                            
##                   Max.   :22.00                            
##       Medu            Fedu             Mjob           Fjob    
##  Min.   :0.000   Min.   :0.000   at_home :135   at_home : 42  
##  1st Qu.:2.000   1st Qu.:1.000   health  : 48   health  : 23  
##  Median :2.000   Median :2.000   other   :258   other   :367  
##  Mean   :2.515   Mean   :2.307   services:136   services:181  
##  3rd Qu.:4.000   3rd Qu.:3.000   teacher : 72   teacher : 36  
##  Max.   :4.000   Max.   :4.000                                
##         reason      guardian     traveltime      studytime    
##  course    :285   father:153   Min.   :1.000   Min.   :1.000  
##  home      :149   mother:455   1st Qu.:1.000   1st Qu.:1.000  
##  other     : 72   other : 41   Median :1.000   Median :2.000  
##  reputation:143                Mean   :1.569   Mean   :1.931  
##                                3rd Qu.:2.000   3rd Qu.:2.000  
##                                Max.   :4.000   Max.   :4.000  
##     failures      schoolsup famsup     paid     activities nursery  
##  Min.   :0.0000   no :581   no :251   no :610   no :334    no :128  
##  1st Qu.:0.0000   yes: 68   yes:398   yes: 39   yes:315    yes:521  
##  Median :0.0000                                                     
##  Mean   :0.2219                                                     
##  3rd Qu.:0.0000                                                     
##  Max.   :3.0000                                                     
##  higher    internet  romantic      famrel         freetime   
##  no : 69   no :151   no :410   Min.   :1.000   Min.   :1.00  
##  yes:580   yes:498   yes:239   1st Qu.:4.000   1st Qu.:3.00  
##                                Median :4.000   Median :3.00  
##                                Mean   :3.931   Mean   :3.18  
##                                3rd Qu.:5.000   3rd Qu.:4.00  
##                                Max.   :5.000   Max.   :5.00  
##      goout            Dalc            Walc          health     
##  Min.   :1.000   Min.   :1.000   Min.   :1.00   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.00   1st Qu.:2.000  
##  Median :3.000   Median :1.000   Median :2.00   Median :4.000  
##  Mean   :3.185   Mean   :1.502   Mean   :2.28   Mean   :3.536  
##  3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:3.00   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.00   Max.   :5.000  
##     absences            G1             G2              G3       
##  Min.   : 0.000   Min.   : 0.0   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 0.000   1st Qu.:10.0   1st Qu.:10.00   1st Qu.:10.00  
##  Median : 2.000   Median :11.0   Median :11.00   Median :12.00  
##  Mean   : 3.659   Mean   :11.4   Mean   :11.57   Mean   :11.91  
##  3rd Qu.: 6.000   3rd Qu.:13.0   3rd Qu.:13.00   3rd Qu.:14.00  
##  Max.   :32.000   Max.   :19.0   Max.   :19.00   Max.   :19.00  
##      weekly      
##  Min.   : 2.000  
##  1st Qu.: 2.000  
##  Median : 3.000  
##  Mean   : 3.783  
##  3rd Qu.: 5.000  
##  Max.   :10.000

Appendix 2 : Exploring the relationships in the data

Distribution of students on the basis of age

table(mydata$age)
## 
##  15  16  17  18  19  20  21  22 
## 112 177 179 140  32   6   2   1
hist(mydata$age , right = FALSE , ylim = c(0,200) , main = "Age distribution of students" 
     , col = rainbow(9) , ylab = "No. of students")

Characteristics of the data

library(plotrix)  # provides pie3D()
tab1 <- table(mydata$sex)
tab2 <- table(mydata$address)
tab3 <- table(mydata$romantic)
tab4 <- table(mydata$internet)
par(mfrow = c(2,2))
lbls <- paste(names(tab1))
pie3D(tab1 , labels = lbls , explode = 0.1 , col = c("blue", "pink") 
      , labelcex = 1 , main = "Males vs Females in the sample")
pie3D(tab2 , labels = c("Rural" , "Urban") , labelcex = 1 
      , explode = 0.1 , col = c("green", "yellow") , main = "Living Environment")
pie3D(tab3 , labels = paste(names(tab3)) , labelcex = 1 
      , explode = 0.1 , col = c("orange", "blue") , main = "Romantically involved")
pie3D(tab4 , labels = paste(names(tab4)) , explode = 0.1 , 
      labelcex = 1 , col = c("violet", "peachpuff") , main = "Internet access at home")

Family background of the students

Parents’ Education

hist(mydata$Fedu , labels = c("None","Primary","","Middle School" ,"", "Secondary" ,"", "Higher Ed.") 
     , ylim = c(0,300) , xlab = "Educational level" , main = "Father's Education"
     , col = rev(rainbow(8)))

hist(mydata$Medu , labels = c("None","Primary","","Middle School" ,"", "Secondary" ,"", "Higher Ed.") 
     , ylim = c(0,300) , xlab = "Educational level" , main = "Mother's Education" , 
     col = (rainbow(8)))

Parents’ Job

library(lattice)  # provides histogram()
histogram(~Fjob , data = mydata , col = rainbow(6) , xlab = "Type of job" , main = "Father's Job")

histogram(~Mjob , data = mydata , col = rainbow(6) , xlab = "Type of job" , main = "Mother's Job")

Relationships in the family

with(mydata , table(famrel))
## famrel
##   1   2   3   4   5 
##  22  29 101 317 180
hist(mydata$famrel , col = rev(rainbow(10)) , ylim = c(0,350) 
     , main = "Type of Relationship among family members" 
     , xlab = "Quality of relations(1 : very bad to 5 : excellent)")

Health(current) of the respondents

table(mydata$health)
## 
##   1   2   3   4   5 
##  90  78 124 108 249
hist(mydata$health , main = "Current health status" 
     , xlab = "Health(1 : very bad to 5 : very good)" , col = rainbow(20))

Comparison of Weekday and weekend alcohol consumptions

comp <- xtabs(~Dalc + Walc , data = mydata)
ftable(comp)
##      Walc   1   2   3   4   5
## Dalc                         
## 1         241 113  64  28   5
## 2           3  34  43  34   7
## 3           1   1   9  20  12
## 4           1   1   4   5   6
## 5           1   1   0   0  15
library(vcd)  # provides mosaic()
mosaic(comp , shade = TRUE)

Probing the existence of any correlation between weekday and weekend alcohol consumption

with(mydata , cor.test(Dalc , Walc))
## 
##  Pearson's product-moment correlation
## 
## data:  Dalc and Walc
## t = 19.92, df = 647, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5664804 0.6621049
## sample estimates:
##       cor 
## 0.6165614

We can conclude that there is indeed a clear positive correlation (r = 0.62), i.e. those who drink more on weekdays also tend to drink more on weekends.
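The cross-tabulation printed by ftable(comp) determines this correlation completely, so it can be checked without the original file. The sketch below rebuilds one (Dalc, Walc) pair per student from the counts and recomputes the Pearson coefficient. It also reports Spearman's rank correlation, which is not part of the original analysis and is added here only as a common alternative for ordinal 1 to 5 scales.

```r
# Counts copied from ftable(comp): rows are Dalc 1..5, columns are Walc 1..5
counts <- matrix(c(241, 113, 64, 28,  5,
                     3,  34, 43, 34,  7,
                     1,   1,  9, 20, 12,
                     1,   1,  4,  5,  6,
                     1,   1,  0,  0, 15),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(Dalc = as.character(1:5),
                                 Walc = as.character(1:5)))

# Expand the table into one row per student
long <- as.data.frame(as.table(counts))            # columns: Dalc, Walc, Freq
idx  <- rep(seq_len(nrow(long)), times = long$Freq)
dalc <- as.integer(as.character(long$Dalc))[idx]
walc <- as.integer(as.character(long$Walc))[idx]

pearson  <- cor(dalc, walc)                        # same statistic as cor.test()
spearman <- cor(dalc, walc, method = "spearman")   # rank-based alternative
print(c(n = length(dalc), pearson = pearson, spearman = spearman))
```

The Pearson value reproduces the 0.6166 reported by cor.test() for the 649 students.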

Plotting the variation of alcohol consumption in the respondents

table(mydata$weekly)
## 
##   2   3   4   5   6   7   8   9  10 
## 241 116  99  73  50  32  17   6  15
hist(mydata$weekly , ylim = c(0,400) , main = "Total weekly alcohol consumption" 
     , xlab = "Level of consumption(2 : minimal to 10 : very heavy)" 
     , col = rainbow(10))

layout(matrix(c(1,2), 1, 2, byrow = TRUE))
table(mydata$Dalc)
## 
##   1   2   3   4   5 
## 451 121  43  17  17
hist(mydata$Dalc , ylim = c(0,500) , main = "Weekday alcohol consumption" 
     , xlab = "Consumption level(1 : very low to 5 : very high)" 
     , col = rainbow(10))
table(mydata$Walc)
## 
##   1   2   3   4   5 
## 247 150 120  87  45
hist(mydata$Walc , ylim = c(0,300) , main = "Weekend alcohol consumption" 
     , xlab = "Consumption level(1 : very low to 5 : very high)" 
     , col = rainbow(10))

How do various indicators of performance vary with weekly alcohol consumption

aggregate(cbind(failures , absences , G3 , studytime) ~ weekly, data = mydata , FUN = mean)
##   weekly  failures absences       G3 studytime
## 1      2 0.2116183 2.804979 12.36929  2.120332
## 2      3 0.0862069 3.689655 12.62069  1.896552
## 3      4 0.2424242 3.313131 11.66667  1.949495
## 4      5 0.2191781 4.561644 11.91781  1.863014
## 5      6 0.2800000 4.940000 10.50000  1.640000
## 6      7 0.4687500 4.875000 10.59375  1.531250
## 7      8 0.4117647 4.117647 10.52941  1.647059
## 8      9 0.0000000 8.000000 10.16667  1.666667
## 9     10 0.4666667 5.933333 10.20000  1.600000

Grades v/s Alcohol

boxplot(G3 ~ weekly , data = mydata , horizontal = TRUE , 
        col = c("lightblue", "lightblue3", "lightblue4", "turquoise", 
                "#CCFF00FF", "#80FF00FF",
 "orange", "darkorange", "red"))

How much do the students who want to pursue higher education drink?

prop.table(xtabs(~ weekly + higher , data = mydata) , 2) * 100
##       higher
## weekly        no       yes
##     2  30.434783 37.931034
##     3  10.144928 18.793103
##     4  18.840580 14.827586
##     5  10.144928 11.379310
##     6   7.246377  7.758621
##     7  13.043478  3.965517
##     8   4.347826  2.413793
##     9   0.000000  1.034483
##     10  5.797101  1.896552

Visualising it

with(mydata , interaction.plot(weekly , higher , G3 , fun = mean , legend = TRUE , type = "b" 
                 , pch = c(16,18) , col = c("blue","green") , 
                 main = "Interaction between higher education and weekly alcohol consumption" 
                 , xlab = "Level of Alcohol Consumption" , ylab = "Mean of grade received"))

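We see that the students who want to pursue higher education tend to drink less alcohol: about 9% of them report a weekly consumption level of 7 or more, against about 23% of the other students.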
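The comparison can be made concrete by adding up the shares of heavier drinkers in each group. The sketch below re-enters the column percentages printed by prop.table() above. The cut-off of 7 on the weekly scale is an arbitrary choice made for illustration and is not part of the original analysis.

```r
# Column percentages copied from the prop.table() output (rows: weekly 2..10)
pct <- matrix(c(30.434783, 37.931034,
                10.144928, 18.793103,
                18.840580, 14.827586,
                10.144928, 11.379310,
                 7.246377,  7.758621,
                13.043478,  3.965517,
                 4.347826,  2.413793,
                 0.000000,  1.034483,
                 5.797101,  1.896552),
              ncol = 2, byrow = TRUE,
              dimnames = list(weekly = as.character(2:10),
                              higher = c("no", "yes")))

# Share of students with weekly consumption of 7 or more, per group
heavy <- colSums(pct[as.character(7:10), ])
print(round(heavy, 1))
```

Roughly 23% of the students who do not want higher education fall in this range, compared with roughly 9% of those who do.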

Appendix 3 : Analysing Correlations

d2 <- mydata
# Recode the binary factors as 0/1 integers for the correlation matrix
d2$school     <- as.integer(d2$school != "GP")      # 1 = MS, 0 = GP
d2$sex        <- as.integer(d2$sex != "M")          # 1 = F, 0 = M
d2$address    <- as.integer(d2$address != "R")      # 1 = urban, 0 = rural
d2$activities <- as.integer(d2$activities != "no")  # 1 = yes, 0 = no
d2$internet   <- as.integer(d2$internet != "no")    # 1 = yes, 0 = no
d2$romantic   <- as.integer(d2$romantic != "no")    # 1 = yes, 0 = no
d2$Pstatus    <- as.integer(d2$Pstatus != "A")      # 1 = together, 0 = apart
cor(d2[,c(7,8,13,14,19,22,23,24,27,28,29,33)])
##                    Medu          Fedu   traveltime    studytime
## Medu        1.000000000  0.6474766091 -0.265079003  0.097005833
## Fedu        0.647476609  1.0000000000 -0.208287978  0.050399648
## traveltime -0.265079003 -0.2082879785  1.000000000 -0.063153904
## studytime   0.097005833  0.0503996477 -0.063153904  1.000000000
## activities  0.119354338  0.0796997847 -0.033375848  0.070080254
## internet    0.266052298  0.1834826715 -0.190826470  0.037528541
## romantic   -0.030992129 -0.0676748136  0.004750636  0.033035960
## famrel      0.024420573  0.0202558848 -0.009521185 -0.004127129
## Dalc       -0.007018319  0.0000607749  0.092824284 -0.137584739
## Walc       -0.019765786  0.0384447003  0.057007178 -0.214925105
## health      0.004614056  0.0449097884 -0.048261206 -0.056432694
## G3          0.240150757  0.2117996791 -0.127172967  0.249788690
##             activities    internet     romantic       famrel          Dalc
## Medu        0.11935434  0.26605230 -0.030992129  0.024420573 -0.0070183191
## Fedu        0.07969978  0.18348267 -0.067674814  0.020255885  0.0000607749
## traveltime -0.03337585 -0.19082647  0.004750636 -0.009521185  0.0928242836
## studytime   0.07008025  0.03752854  0.033035960 -0.004127129 -0.1375847394
## activities  1.00000000  0.08237483  0.057516633  0.057597473  0.0225920962
## internet    0.08237483  1.00000000  0.034831900  0.082214307  0.0428111958
## romantic    0.05751663  0.03483190  1.000000000 -0.044919757  0.0620421218
## famrel      0.05759747  0.08221431 -0.044919757  1.000000000 -0.0757672250
## Dalc        0.02259210  0.04281120  0.062042122 -0.075767225  1.0000000000
## Walc        0.03282417  0.06065091 -0.019970702 -0.093510806  0.6165613821
## health      0.01300056 -0.02279223 -0.018024906  0.109559217  0.0590674577
## G3          0.05979145  0.15002485 -0.090582884  0.063361128 -0.2047193972
##                   Walc       health          G3
## Medu       -0.01976579  0.004614056  0.24015076
## Fedu        0.03844470  0.044909788  0.21179968
## traveltime  0.05700718 -0.048261206 -0.12717297
## studytime  -0.21492510 -0.056432694  0.24978869
## activities  0.03282417  0.013000559  0.05979145
## internet    0.06065091 -0.022792225  0.15002485
## romantic   -0.01997070 -0.018024906 -0.09058288
## famrel     -0.09351081  0.109559217  0.06336113
## Dalc        0.61656138  0.059067458 -0.20471940
## Walc        1.00000000  0.114987972 -0.17661887
## health      0.11498797  1.000000000 -0.09885124
## G3         -0.17661887 -0.098851241  1.00000000
library(corrgram)  # provides corrgram() and its panel functions
corrgram(d2[,c(7,8,13,14,19,22,23,24,27,28,29,33)] , upper.panel = panel.pie 
         , diag.panel = panel.minmax)

Scatterplot of Grades v/s Family Background data

scatterplotMatrix(~ G3 + Fedu + Medu + Pstatus + Mjob + Fjob + famrel, data = d2)

Scatterplot of Grades v/s Drinking Habits data

scatterplotMatrix(~ G3 + weekly + health + failures + absences + activities , data = mydata)

c1 <- d2[,c("G3" , "health" , "weekly" , "Dalc" , "Walc" 
            , "absences" , "activities" , "studytime" , "freetime" , "goout")]
library(Hmisc)  # provides rcorr()
mat1 <- rcorr(as.matrix(c1))
mat1
##               G3 health weekly  Dalc  Walc absences activities studytime
## G3          1.00  -0.10  -0.21 -0.20 -0.18    -0.09       0.06      0.25
## health     -0.10   1.00   0.10  0.06  0.11    -0.03       0.01     -0.06
## weekly     -0.21   0.10   1.00  0.86  0.93     0.18       0.03     -0.20
## Dalc       -0.20   0.06   0.86  1.00  0.62     0.17       0.02     -0.14
## Walc       -0.18   0.11   0.93  0.62  1.00     0.16       0.03     -0.21
## absences   -0.09  -0.03   0.18  0.17  0.16     1.00      -0.02     -0.12
## activities  0.06   0.01   0.03  0.02  0.03    -0.02       1.00      0.07
## studytime   0.25  -0.06  -0.20 -0.14 -0.21    -0.12       0.07      1.00
## freetime   -0.12   0.08   0.13  0.11  0.12    -0.02       0.15     -0.07
## goout      -0.09  -0.02   0.36  0.25  0.39     0.09       0.09     -0.08
##            freetime goout
## G3            -0.12 -0.09
## health         0.08 -0.02
## weekly         0.13  0.36
## Dalc           0.11  0.25
## Walc           0.12  0.39
## absences      -0.02  0.09
## activities     0.15  0.09
## studytime     -0.07 -0.08
## freetime       1.00  0.35
## goout          0.35  1.00
## 
## n= 649 
## 
## 
## P
##            G3     health weekly Dalc   Walc   absences activities
## G3                0.0117 0.0000 0.0000 0.0000 0.0199   0.1281    
## health     0.0117        0.0096 0.1328 0.0034 0.4419   0.7410    
## weekly     0.0000 0.0096        0.0000 0.0000 0.0000   0.4209    
## Dalc       0.0000 0.1328 0.0000        0.0000 0.0000   0.5656    
## Walc       0.0000 0.0034 0.0000 0.0000        0.0000   0.4038    
## absences   0.0199 0.4419 0.0000 0.0000 0.0000          0.7007    
## activities 0.1281 0.7410 0.4209 0.5656 0.4038 0.7007             
## studytime  0.0000 0.1510 0.0000 0.0004 0.0000 0.0025   0.0744    
## freetime   0.0017 0.0313 0.0010 0.0051 0.0022 0.6341   0.0001    
## goout      0.0256 0.6890 0.0000 0.0000 0.0000 0.0297   0.0240    
##            studytime freetime goout 
## G3         0.0000    0.0017   0.0256
## health     0.1510    0.0313   0.6890
## weekly     0.0000    0.0010   0.0000
## Dalc       0.0004    0.0051   0.0000
## Walc       0.0000    0.0022   0.0000
## absences   0.0025    0.6341   0.0297
## activities 0.0744    0.0001   0.0240
## studytime            0.0797   0.0547
## freetime   0.0797             0.0000
## goout      0.0547    0.0000

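An interesting correlation to note is that family relations and grades are positively related (r = 0.06), which suggests that a good environment at home may help students score better. The relationship is weak, however, and famrel is not significant in the regressions above.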
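How strong this relationship is can be gauged with the usual t test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies it to the famrel and G3 entry of the cor() output above. The variable names are illustrative.

```r
r <- 0.063361128   # cor(famrel, G3) from the correlation matrix above
n <- 649           # number of students

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)
print(c(t = t_stat, p = p_val))
```

The resulting p-value is above 0.05, so this correlation on its own is not statistically significant at the 5% level.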