This dataset, found on Kaggle under the name Student Alcohol Consumption, covers secondary school (high school) students at two schools in Portugal (‘GP’ - Gabriel Pereira and ‘MS’ - Mousinho da Silveira). Besides alcohol consumption, it also records information about the students’ social and family lives and their education.

I will use the dataset to determine whether there is a relationship between how much time a student spends studying per week and their family life, with the student’s sex and age as additional predictors.

The variables I will be examining are:
1. studytime: amount of time the student spends studying per week (1: <2 hours, 2: 2-5 hours, 3: 5-10 hours, 4: >10 hours)
2. sex: sex of student (M or F)
3. Pstatus: parents’ cohabitation status (‘T’ = together, ‘A’ = apart)
4. famsize: size of student’s family (‘LE3’ = less than or equal to 3, ‘GT3’ = greater than 3)
5. age: age of the student
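Because studytime is an ordinal code (1 to 4) rather than a count of hours, it can help to keep the category labels next to the numbers. The sketch below uses a short hypothetical vector of codes, not values from the dataset, and attaches the labels from the variable description above.

```r
# Hypothetical studytime codes (1-4), not taken from student-mat.csv
studytime_codes <- c(2, 1, 3, 2, 4)

# Attach the category labels from the variable description;
# ordered = TRUE records that the categories have a natural order
studytime_labels <- factor(
  studytime_codes,
  levels  = 1:4,
  labels  = c("<2 hours", "2-5 hours", "5-10 hours", ">10 hours"),
  ordered = TRUE
)

# Count how many codes fall into each labelled category
print(table(studytime_labels))
```

The models below still use the numeric codes; the labels only make it explicit that a one-step change is a move between hour ranges, not one extra hour.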

Reading in the Dataset

library(dplyr)
library(Zelig)
library(ZeligChoice)
library(texreg)
library(visreg)
library(readr)
pt_students <- read_csv("/Users/rachel_ramphal/Documents/student-mat.csv")
head(pt_students)
## # A tibble: 6 x 33
##   school sex     age address famsize Pstatus  Medu  Fedu Mjob  Fjob  reason
##   <chr>  <chr> <dbl> <chr>   <chr>   <chr>   <dbl> <dbl> <chr> <chr> <chr> 
## 1 GP     F        18 U       GT3     A           4     4 at_h… teac… course
## 2 GP     F        17 U       GT3     T           1     1 at_h… other course
## 3 GP     F        15 U       LE3     T           1     1 at_h… other other 
## 4 GP     F        15 U       GT3     T           4     2 heal… serv… home  
## 5 GP     F        16 U       GT3     T           3     3 other other home  
## 6 GP     M        16 U       LE3     T           4     3 serv… other reput…
## # … with 22 more variables: guardian <chr>, traveltime <dbl>,
## #   studytime <dbl>, failures <dbl>, schoolsup <chr>, famsup <chr>,
## #   paid <chr>, activities <chr>, nursery <chr>, higher <chr>,
## #   internet <chr>, romantic <chr>, famrel <dbl>, freetime <dbl>,
## #   goout <dbl>, Dalc <dbl>, Walc <dbl>, health <dbl>, absences <dbl>,
## #   G1 <dbl>, G2 <dbl>, G3 <dbl>

Creating Models

Models 1-3
# Nested Poisson models: each one adds a family-life variable to the previous model
m1 <- zelig(studytime ~ sex + age, model = "poisson", data = pt_students, cite = FALSE)
m2 <- zelig(studytime ~ sex + age + Pstatus, model = "poisson", data = pt_students, cite = FALSE)
m3 <- zelig(studytime ~ sex + age + Pstatus + famsize, model = "poisson", data = pt_students, cite = FALSE)
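Statistical models
=======================================================
                  Model 1      Model 2      Model 3
-------------------------------------------------------
(Intercept)          0.89         0.86         0.87
                    (0.47)       (0.48)       (0.48)
sexM                -0.26 ***    -0.26 ***    -0.25 ***
                    (0.07)       (0.07)       (0.07)
age                 -0.00        -0.00        -0.00
                    (0.03)       (0.03)       (0.03)
PstatusT                          0.04         0.04
                                 (0.12)       (0.12)
famsizeLE3                                    -0.04
                                              (0.08)
-------------------------------------------------------
AIC               1137.30      1139.16      1140.92
BIC               1149.24      1155.08      1160.81
Log Likelihood    -565.65      -565.58      -565.46
Deviance           121.14       121.00       120.76
Num. obs.             395          395          395
=======================================================
***p < 0.001; **p < 0.01; *p < 0.05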
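The AIC and BIC rows in the table follow directly from each model's log-likelihood and its number of estimated coefficients (a Poisson model has no separate dispersion parameter): AIC = 2k - 2 logLik and BIC = k log(n) - 2 logLik. As a check, the sketch below recomputes them from the rounded values in the table; the names `loglik`, `k`, and `n` are just local variables for this illustration.

```r
# Log-likelihoods copied from the table, and the number of estimated
# coefficients per model (intercept + predictors)
loglik <- c(m1 = -565.65, m2 = -565.58, m3 = -565.46)
k      <- c(m1 = 3,       m2 = 4,       m3 = 5)
n      <- 395  # Num. obs.

# Both criteria reward fit (higher logLik) and penalize extra parameters
aic <- 2 * k - 2 * loglik
bic <- log(n) * k - 2 * loglik

print(round(rbind(AIC = aic, BIC = bic), 2))
```

The log-likelihood improves by less than 0.2 between Model 1 and Model 3, which is not enough to offset the penalty for the extra coefficients, so both criteria favour Model 1.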

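Model 1 is the best of the three, since it has the lowest AIC and BIC. Because these are Poisson models, the coefficients are on the log scale: exponentiating a coefficient gives the multiplicative change in the expected studytime score. Students whose parents live together have a slightly higher expected score (PstatusT = 0.04, about 4% higher), and students from families of three or fewer have a slightly lower one (famsizeLE3 = -0.04, about 4% lower), but neither estimate is statistically significant. Sex is the only predictor with a significant relationship to study time, and its estimate is nearly identical in all three models: male students’ expected score is about 23% lower than female students’ (sexM = -0.26, exp(-0.26) ≈ 0.77, p < 0.001). All three models consistently show no relationship between age and weekly study time.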
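A minimal sketch of the log-scale interpretation, assuming (as Models 1-3 do) that the 1-4 studytime code is treated as a Poisson count. It exponentiates the rounded table coefficients and builds a 95% Wald interval for sexM from the Model 1 estimate and standard error printed by `summary(m1)`.

```r
# Poisson coefficients (log scale) copied from the table
coefs <- c(sexM = -0.26, PstatusT = 0.04, famsizeLE3 = -0.04)

# exp() turns each coefficient into a rate ratio: the factor by which
# the expected studytime code is multiplied
rate_ratio <- exp(coefs)
print(round(rate_ratio, 3))

# 95% Wald interval for sexM: estimate +/- 1.96 * SE on the log scale,
# then exponentiated (values from the Model 1 summary output)
est <- -0.255998
se  <- 0.071724
ci  <- exp(est + c(-1, 1) * qnorm(0.975) * se)
print(round(ci, 3))
```

The whole interval for sexM lies below 1, which is another way of saying the male/female difference is significant; the intervals for PstatusT and famsizeLE3 (standard errors 0.12 and 0.08) would comfortably include 1.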

The relationships between study time and parents’ cohabitation status, family size, and age are not statistically significant, so I will focus on the relationship between study time and sex.
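Because the three models are nested, the decision to drop Pstatus and famsize can also be checked with a likelihood-ratio test: the drop in residual deviance is compared with a chi-squared distribution whose degrees of freedom equal the number of added coefficients. The sketch uses the rounded deviances from the table. If the models were refit with base R's `glm(..., family = poisson)` under hypothetical names such as `fit1` to `fit3`, `anova(fit1, fit2, fit3, test = "Chisq")` should perform the same kind of comparison on unrounded deviances.

```r
# Residual deviances copied from the table (rounded to two decimals)
dev <- c(m1 = 121.14, m2 = 121.00, m3 = 120.76)

# Likelihood-ratio statistics for the nested comparisons
lr_m2 <- dev[["m1"]] - dev[["m2"]]  # Model 2 adds PstatusT (1 df)
lr_m3 <- dev[["m1"]] - dev[["m3"]]  # Model 3 adds PstatusT and famsizeLE3 (2 df)

# Upper-tail chi-squared p-values
p_m2 <- pchisq(lr_m2, df = 1, lower.tail = FALSE)
p_m3 <- pchisq(lr_m3, df = 2, lower.tail = FALSE)
print(round(c(p_m2 = p_m2, p_m3 = p_m3), 3))
```

Both p-values are far above 0.05, consistent with the conclusion that the family-life variables add nothing beyond sex and age in these Poisson models.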

Creating Factors

pt_students$sex <- as.factor(pt_students$sex)
pt_students$Pstatus <- as.factor(pt_students$Pstatus)
pt_students$famsize <- as.factor(pt_students$famsize)
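The coefficient names in the table (sexM, PstatusT, famsizeLE3) come from how R codes factors: `as.factor()` sorts the levels, the first level becomes the baseline, and each dummy column is named after a non-baseline level. So the baselines here are F, A, and GT3. The sketch below shows this on a hypothetical three-row data frame (not rows from the dataset), and how `relevel()` would change the baseline.

```r
# Hypothetical three-row data frame, not rows from student-mat.csv
toy <- data.frame(
  sex     = c("M", "F", "F"),
  Pstatus = c("T", "A", "T"),
  famsize = c("GT3", "LE3", "GT3")
)

# as.factor() sorts the levels; the first one is the baseline
toy$sex     <- as.factor(toy$sex)      # levels: "F" "M"
toy$Pstatus <- as.factor(toy$Pstatus)  # levels: "A" "T"
toy$famsize <- as.factor(toy$famsize)  # levels: "GT3" "LE3"

# Dummy columns in the design matrix are named after the non-baseline level
cols_default <- colnames(model.matrix(~ sex + Pstatus + famsize, data = toy))
print(cols_default)

# relevel() makes LE3 the baseline, so the dummy becomes famsizeGT3
toy$famsize <- relevel(toy$famsize, ref = "LE3")
cols_releveled <- colnames(model.matrix(~ sex + Pstatus + famsize, data = toy))
print(cols_releveled)
```

Changing the baseline only flips the sign and label of the dummy coefficient; it does not change the fitted values.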
Model 4
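# Multinomial logit: treats the four studytime codes as unordered categories
m4 <- zelig(studytime ~ sex + Pstatus + famsize, model = "mlogit", data = pt_students, cite = FALSE)
# Detailed summary of the Poisson Model 1 (sex + age) for reference
summary(m1)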
## Model: 
## 
## Call:
## z5$zelig(formula = studytime ~ sex + age, data = pt_students)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9640  -0.6151  -0.1807   0.1811   1.4497  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.894620   0.469385   1.906 0.056658
## sexM        -0.255998   0.071724  -3.569 0.000358
## age         -0.004242   0.027931  -0.152 0.879299
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 134.04  on 394  degrees of freedom
## Residual deviance: 121.14  on 392  degrees of freedom
## AIC: 1137.3
## 
## Number of Fisher Scoring iterations: 4
## 
## Next step: Use 'setx' method
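Model 4 treats studytime as four unordered categories, which is why `sim()` below reports one probability per category, Pr(Y=1) to Pr(Y=4). A multinomial logit gives each non-baseline category its own linear predictor, fixes the baseline's predictor at 0, and converts them into probabilities with a softmax. The sketch below uses made-up linear predictors (not estimates from Model 4) with category 1 as the baseline.

```r
# Made-up linear predictors, NOT estimates from Model 4;
# category 1 is the baseline, so its predictor is fixed at 0
eta <- c(0, 1.5, 0.5, -0.6)
names(eta) <- paste0("Pr(Y=", 1:4, ")")

# Softmax: exponentiate and normalize so the probabilities sum to 1
probs <- exp(eta) / sum(exp(eta))
print(round(probs, 3))

# Re-expressing everything relative to category 4 shifts all predictors
# by the same constant, which leaves the probabilities unchanged
eta_shifted <- eta - eta[4]
probs_shifted <- exp(eta_shifted) / sum(exp(eta_shifted))
print(isTRUE(all.equal(probs, probs_shifted)))
```

The second half illustrates why the choice of baseline category affects how the coefficients read but not the predicted probabilities that `sim()` summarizes.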
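# Two scenarios for Model 4: parents together vs. apart, other covariates at setx() defaults
x.together <- setx(m4, Pstatus = "T")
x.apart <- setx(m4, Pstatus = "A")
# x = parents apart (baseline scenario), x1 = parents together
student.multi <- sim(m4, x = x.apart, x1 = x.together)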
summary(student.multi)
## 
##  sim x :
##  -----
## ev
##               mean         sd        50%       2.5%     97.5%
## Pr(Y=1) 0.12257269 0.04825453 0.11505807 0.05185953 0.2319644
## Pr(Y=2) 0.61058680 0.08911170 0.61734292 0.42357147 0.7663122
## Pr(Y=3) 0.19876428 0.08631875 0.18867553 0.07142296 0.4089628
## Pr(Y=4) 0.06807623 0.04806086 0.05699117 0.01228072 0.1979564
## pv
##         1     2    3     4
## [1,] 0.12 0.576 0.22 0.084
## 
##  sim x1 :
##  -----
## ev
##               mean         sd        50%       2.5%     97.5%
## Pr(Y=1) 0.13203009 0.02412663 0.12970395 0.08933221 0.1846665
## Pr(Y=2) 0.48399826 0.03764825 0.48220772 0.41141233 0.5562289
## Pr(Y=3) 0.29983758 0.03605161 0.29922574 0.23186890 0.3734656
## Pr(Y=4) 0.08413407 0.02244846 0.08216416 0.04652813 0.1347041
## pv
##          1     2     3     4
## [1,] 0.136 0.463 0.315 0.086
## fd
##                mean         sd         50%        2.5%      97.5%
## Pr(Y=1)  0.00945740 0.04645138  0.01389580 -0.09124674 0.09155302
## Pr(Y=2) -0.12658854 0.09173130 -0.13001736 -0.29827035 0.06439327
## Pr(Y=3)  0.10107330 0.09237550  0.11119307 -0.10659064 0.24851107
## Pr(Y=4)  0.01605784 0.04952497  0.02472408 -0.11390474 0.08930134
par(mar = c(2, 2, 2, 2))  # narrow plot margins
plot(student.multi)

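Whether a student’s parents live together or apart does not appear to affect how much the student studies per week: for every studytime category, the 95% interval of the simulated first difference includes zero (for example, the probability of studying 2-5 hours changes by -0.13, with a 95% interval of -0.30 to 0.06).
Of the variables I examined, only sex has a statistically significant relationship with how much a student studies per week.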
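The conclusion about parents’ cohabitation status rests on the fd (first difference) table: each entry is the change in a category's probability when moving from parents apart (x) to parents together (x1). The sketch below, using numbers copied from the `sim()` output, shows that the fd means are simply the differences of the two ev means, and that every 95% interval spans zero.

```r
# Expected-value means copied from the sim() output
ev_apart    <- c(0.12257269, 0.61058680, 0.19876428, 0.06807623)  # x:  Pstatus = "A"
ev_together <- c(0.13203009, 0.48399826, 0.29983758, 0.08413407)  # x1: Pstatus = "T"

# First difference = Pr(Y = k | together) - Pr(Y = k | apart)
fd_mean <- ev_together - ev_apart
print(round(fd_mean, 4))

# 2.5% and 97.5% bounds of the simulated first differences (fd table)
fd_lower <- c(-0.09124674, -0.29827035, -0.10659064, -0.11390474)
fd_upper <- c( 0.09155302,  0.06439327,  0.24851107,  0.08930134)

# An interval that contains zero means no clear effect for that category
print(fd_lower < 0 & fd_upper > 0)  # TRUE for all four categories
```

The largest shift, about -0.13 for the 2-5 hour category, is also the least precisely estimated, which is why it is not treated as evidence of an effect.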