This dataset, found on Kaggle, is called Student Alcohol Consumption. It covers secondary school (high school) students in Portugal from two schools (‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira). Besides alcohol use, the dataset also contains information about the students’ social and family life and their education.
I will be using the dataset to determine if there is a relationship between how much time a student spends studying per week and their family life.
The variables I will be examining are:
1. studytime: amount of time the student spends studying per week (1: <2 hours, 2: 2-5 hours, 3: 5-10 hours, 4: >10 hours)
2. sex: sex of student (M or F)
3. Pstatus: parent’s cohabitation status (‘T’ = together, ‘A’ = apart)
4. famsize: size of student’s family (‘LE3’ = less than or equal to 3, ‘GT3’ = greater than 3)
5. age: age of the student
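Because studytime is stored as a numeric code rather than as hours, it can help to see the coding written out. The snippet below is a minimal sketch in base R; the vector `studytime_codes` holds made-up example values (an assumption for illustration, not data from the file), and the labels are taken from the variable list above.

```r
# Hypothetical example values; the real column is pt_students$studytime
studytime_codes <- c(1, 2, 2, 3, 4)

# Attach the category labels from the variable list to the numeric codes
studytime_labels <- factor(
  studytime_codes,
  levels  = 1:4,
  labels  = c("<2 hours", "2-5 hours", "5-10 hours", ">10 hours"),
  ordered = TRUE
)

# Count how many example students fall in each category
table(studytime_labels)
```

The same `factor()` call should work on the real column, assuming it contains only the codes 1 to 4.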
library(dplyr)
library(Zelig)
library(ZeligChoice)
library(texreg)
library(visreg)
library(readr)
pt_students <- read_csv("/Users/rachel_ramphal/Documents/student-mat.csv")
head(pt_students)
## # A tibble: 6 x 33
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason
## <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 GP F 18 U GT3 A 4 4 at_h… teac… course
## 2 GP F 17 U GT3 T 1 1 at_h… other course
## 3 GP F 15 U LE3 T 1 1 at_h… other other
## 4 GP F 15 U GT3 T 4 2 heal… serv… home
## 5 GP F 16 U GT3 T 3 3 other other home
## 6 GP M 16 U LE3 T 4 3 serv… other reput…
## # … with 22 more variables: guardian <chr>, traveltime <dbl>,
## # studytime <dbl>, failures <dbl>, schoolsup <chr>, famsup <chr>,
## # paid <chr>, activities <chr>, nursery <chr>, higher <chr>,
## # internet <chr>, romantic <chr>, famrel <dbl>, freetime <dbl>,
## # goout <dbl>, Dalc <dbl>, Walc <dbl>, health <dbl>, absences <dbl>,
## # G1 <dbl>, G2 <dbl>, G3 <dbl>
m1 <- zelig(studytime ~ sex + age, model = "poisson", data = pt_students, cite = FALSE)
m2 <- zelig(studytime ~ sex + age + Pstatus, model = "poisson", data = pt_students, cite = FALSE)
m3 <- zelig(studytime ~ sex + age + Pstatus + famsize, model = "poisson", data = pt_students, cite = FALSE)
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | 0.89 | 0.86 | 0.87 |
| | (0.47) | (0.48) | (0.48) |
| sexM | -0.26*** | -0.26*** | -0.25*** |
| | (0.07) | (0.07) | (0.07) |
| age | -0.00 | -0.00 | -0.00 |
| | (0.03) | (0.03) | (0.03) |
| PstatusT | | 0.04 | 0.04 |
| | | (0.12) | (0.12) |
| famsizeLE3 | | | -0.04 |
| | | | (0.08) |
| AIC | 1137.30 | 1139.16 | 1140.92 |
| BIC | 1149.24 | 1155.08 | 1160.81 |
| Log Likelihood | -565.65 | -565.58 | -565.46 |
| Deviance | 121.14 | 121.00 | 120.76 |
| Num. obs. | 395 | 395 | 395 |
| ***p < 0.001, **p < 0.01, *p < 0.05 | | | |
Model 1 is the best of the three models, since it has the lowest AIC and BIC. The coefficients are on the log scale of the Poisson model. The table shows only a small, non-significant difference in study time for students whose parents are together (coefficient 0.04) and for students whose family has three or fewer members (coefficient -0.04). The only significant relationship is between study time and sex: in Model 1, male students have a lower expected study time than female students (coefficient -0.26, p < 0.001). All the models consistently show that age does not affect how much a student studies per week.
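The AIC and BIC comparison can be reproduced by hand from the log likelihoods in the table. This is a rough sketch that assumes the usual definitions, AIC = 2k - 2·logLik and BIC = k·log(n) - 2·logLik, where k counts the estimated coefficients including the intercept; the vector names are my own.

```r
# Values copied from the regression table
log_lik <- c(m1 = -565.65, m2 = -565.58, m3 = -565.46)
k       <- c(m1 = 3, m2 = 4, m3 = 5)  # coefficients per model, incl. intercept
n       <- 395                        # Num. obs.

# Standard information criteria: lower is better
aic <- 2 * k - 2 * log_lik
bic <- log(n) * k - 2 * log_lik

round(rbind(AIC = aic, BIC = bic), 2)

# Name of the model with the smallest value of each criterion
names(which.min(aic))
names(which.min(bic))
```

Each extra predictor barely changes the log likelihood, so the penalty term dominates and both criteria favour the smallest model.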
The relationships between study time and the parents’ cohabitation status, family size, and age all seem to be insignificant, so I will focus on the relationship between study time and sex.
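To make the sex coefficient easier to read, it can be exponentiated into a rate ratio. The sketch below uses the sexM estimate and standard error reported by summary(m1) and assumes the log link that Poisson regression uses by default; the interval is a simple Wald approximation, and the object names are my own.

```r
# Estimate and standard error for sexM, copied from the summary(m1) output
b_sex  <- -0.255998
se_sex <- 0.071724

# Exponentiating a Poisson coefficient gives a multiplicative effect
rate_ratio <- exp(b_sex)

# Approximate 95% Wald interval, computed on the log scale and then exponentiated
ci_95 <- exp(b_sex + c(-1, 1) * qnorm(0.975) * se_sex)

round(c(rate_ratio = rate_ratio, lower = ci_95[1], upper = ci_95[2]), 3)
```

A ratio of roughly 0.77 means the expected studytime code for male students is about 23% lower than for female students of the same age. Since studytime is a coded category from 1 to 4, this applies to the code, not to hours.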
pt_students$sex <- as.factor(pt_students$sex)
pt_students$Pstatus <- as.factor(pt_students$Pstatus)
pt_students$famsize <- as.factor(pt_students$famsize)
m4 <- zelig(studytime ~ sex + Pstatus + famsize, model = "mlogit", data = pt_students, cite = FALSE)
summary(m1)
## Model:
##
## Call:
## z5$zelig(formula = studytime ~ sex + age, data = pt_students)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9640 -0.6151 -0.1807 0.1811 1.4497
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.894620 0.469385 1.906 0.056658
## sexM -0.255998 0.071724 -3.569 0.000358
## age -0.004242 0.027931 -0.152 0.879299
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 134.04 on 394 degrees of freedom
## Residual deviance: 121.14 on 392 degrees of freedom
## AIC: 1137.3
##
## Number of Fisher Scoring iterations: 4
##
## Next step: Use 'setx' method
x.together <- setx(m4, Pstatus = "T")
x.apart <- setx(m4, Pstatus = "A")
student.multi <- sim(m4, x = x.apart, x1 = x.together)
summary(student.multi)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## Pr(Y=1) 0.12257269 0.04825453 0.11505807 0.05185953 0.2319644
## Pr(Y=2) 0.61058680 0.08911170 0.61734292 0.42357147 0.7663122
## Pr(Y=3) 0.19876428 0.08631875 0.18867553 0.07142296 0.4089628
## Pr(Y=4) 0.06807623 0.04806086 0.05699117 0.01228072 0.1979564
## pv
## 1 2 3 4
## [1,] 0.12 0.576 0.22 0.084
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## Pr(Y=1) 0.13203009 0.02412663 0.12970395 0.08933221 0.1846665
## Pr(Y=2) 0.48399826 0.03764825 0.48220772 0.41141233 0.5562289
## Pr(Y=3) 0.29983758 0.03605161 0.29922574 0.23186890 0.3734656
## Pr(Y=4) 0.08413407 0.02244846 0.08216416 0.04652813 0.1347041
## pv
## 1 2 3 4
## [1,] 0.136 0.463 0.315 0.086
## fd
## mean sd 50% 2.5% 97.5%
## Pr(Y=1) 0.00945740 0.04645138 0.01389580 -0.09124674 0.09155302
## Pr(Y=2) -0.12658854 0.09173130 -0.13001736 -0.29827035 0.06439327
## Pr(Y=3) 0.10107330 0.09237550 0.11119307 -0.10659064 0.24851107
## Pr(Y=4) 0.01605784 0.04952497 0.02472408 -0.11390474 0.08930134
par(mar=c(2,2,2,2))
plot(student.multi)
Whether a student’s parents are living together or apart does not seem to affect how much the student studies per week: the 95% intervals of all four first differences include zero.
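This reading of the simulation can be checked directly against the fd block of the output. The sketch below copies the 2.5% and 97.5% quantiles from that block into a small data frame (the data frame and column names are my own) and flags whether each interval contains zero.

```r
# 2.5% and 97.5% quantiles of the first differences, copied from the fd output
fd <- data.frame(
  category = c("<2 hours", "2-5 hours", "5-10 hours", ">10 hours"),
  lower    = c(-0.09124674, -0.29827035, -0.10659064, -0.11390474),
  upper    = c( 0.09155302,  0.06439327,  0.24851107,  0.08930134)
)

# An interval that spans zero gives no clear evidence of a difference
fd$includes_zero <- fd$lower < 0 & fd$upper > 0

fd
```

The largest simulated shifts are in the 2-5 hours and 5-10 hours categories, but even those intervals cross zero.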
It appears that, of my selected variables, only sex has a significant influence on how much a student studies per week.