The Ann Arbor Public Schools (AAPS) Board of Education would like to understand the effect that student absences have on mathematics performance. As a preliminary analysis, they would like to examine the Student Performance data set from the UCI Machine Learning Repository on secondary school student achievement. This work will be used to inform further research. AAPS would like you to analyze these data to assess the impact of three or more absences versus fewer than three on the final math grade. Additionally, they would like to identify student attributes that may contribute to absences.
Data and Data Dictionary are available at: http://archive.ics.uci.edu/ml/machine-learning-databases/00320/
Variables/Data Dictionary:
school - student’s school (binary: “GP” - Gabriel Pereira or “MS” - Mousinho da Silveira)
sex - student’s sex (binary: “F” - female or “M” - male)
age - student’s age (numeric: from 15 to 22)
address - student’s home address type (binary: “U” - urban or “R” - rural)
famsize - family size (binary: “LE3” - less or equal to 3 or “GT3” - greater than 3)
Pstatus - parent’s cohabitation status (binary: “T” - living together or “A” - apart)
Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
Mjob - mother’s job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)
Fjob - father’s job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)
reason - reason to choose this school (nominal: close to “home”, school “reputation”, “course” preference or “other”)
guardian - student’s guardian (nominal: “mother”, “father” or “other”)
traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
failures - number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup - extra educational support (binary: yes or no)
famsup - family educational support (binary: yes or no)
paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
activities - extra-curricular activities (binary: yes or no)
nursery - attended nursery school (binary: yes or no)
higher - wants to take higher education (binary: yes or no)
internet - Internet access at home (binary: yes or no)
romantic - with a romantic relationship (binary: yes or no)
famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
freetime - free time after school (numeric: from 1 - very low to 5 - very high)
goout - going out with friends (numeric: from 1 - very low to 5 - very high)
Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
health - current health status (numeric: from 1 - very bad to 5 - very good)
absences - number of school absences (numeric: from 0 to 93)
G1 - first period grade (numeric: from 0 to 20)
G2 - second period grade (numeric: from 0 to 20)
G3 - final grade (numeric: from 0 to 20, output target)
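library(ggplot2)  # ggplot(), geom_bar(), geom_density()
library(twang)    # ps(), get.weights(), bal.table()
setwd("C:/Users/sweeneys/Desktop/")
# stringsAsFactors=TRUE: character columns must be factors for summary() counts and ps() (not the default since R 4.0)
data=read.csv("student-mat.csv",sep=",",header=TRUE,stringsAsFactors=TRUE)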
head(data)
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason
## 1 GP F 18 U GT3 A 4 4 at_home teacher course
## 2 GP F 17 U GT3 T 1 1 at_home other course
## 3 GP F 15 U LE3 T 1 1 at_home other other
## 4 GP F 15 U GT3 T 4 2 health services home
## 5 GP F 16 U GT3 T 3 3 other other home
## 6 GP M 16 U LE3 T 4 3 services other reputation
## guardian traveltime studytime failures schoolsup famsup paid activities
## 1 mother 2 2 0 yes no no no
## 2 father 1 2 0 no yes no no
## 3 mother 1 2 3 yes no yes no
## 4 mother 1 3 0 no yes yes yes
## 5 father 1 2 0 no yes yes no
## 6 mother 1 2 0 no yes yes yes
## nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1 yes yes no no 4 3 4 1 1 3
## 2 no yes yes no 5 3 3 1 1 3
## 3 yes yes yes no 4 3 2 2 3 3
## 4 yes yes yes yes 3 2 2 1 1 5
## 5 yes yes no no 4 3 2 1 2 5
## 6 yes yes yes no 5 4 2 1 2 5
## absences G1 G2 G3
## 1 6 5 6 6
## 2 4 5 5 6
## 3 10 7 8 10
## 4 2 15 14 15
## 5 4 6 10 10
## 6 10 15 15 15
dim(data)
## [1] 395 33
# Correcting appropriate variables to factors
data$Medu<-as.factor(data$Medu)
data$Fedu<-as.factor(data$Fedu)
data$traveltime<-as.factor(data$traveltime)
data$studytime<-as.factor(data$studytime)
data$famrel<-as.factor(data$famrel)
data$freetime<-as.factor(data$freetime)
data$goout<-as.factor(data$goout)
data$Dalc<-as.factor(data$Dalc)
data$Walc<-as.factor(data$Walc)
data$health<-as.factor(data$health)
#Assign Treatment Groups
data$treat <- ifelse(data$absences < 3, 0,1)
# Check for missing data
dat = data[complete.cases(data),]
head(dat)
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason
## 1 GP F 18 U GT3 A 4 4 at_home teacher course
## 2 GP F 17 U GT3 T 1 1 at_home other course
## 3 GP F 15 U LE3 T 1 1 at_home other other
## 4 GP F 15 U GT3 T 4 2 health services home
## 5 GP F 16 U GT3 T 3 3 other other home
## 6 GP M 16 U LE3 T 4 3 services other reputation
## guardian traveltime studytime failures schoolsup famsup paid activities
## 1 mother 2 2 0 yes no no no
## 2 father 1 2 0 no yes no no
## 3 mother 1 2 3 yes no yes no
## 4 mother 1 3 0 no yes yes yes
## 5 father 1 2 0 no yes yes no
## 6 mother 1 2 0 no yes yes yes
## nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1 yes yes no no 4 3 4 1 1 3
## 2 no yes yes no 5 3 3 1 1 3
## 3 yes yes yes no 4 3 2 2 3 3
## 4 yes yes yes yes 3 2 2 1 1 5
## 5 yes yes no no 4 3 2 1 2 5
## 6 yes yes yes no 5 4 2 1 2 5
## absences G1 G2 G3 treat
## 1 6 5 6 6 1
## 2 4 5 5 6 1
## 3 10 7 8 10 1
## 4 2 15 14 15 0
## 5 4 6 10 10 1
## 6 10 15 15 15 1
dim(dat)
## [1] 395 34
# No Missing Data in the dataset
Most students attend Gabriel Pereira (349) rather than Mousinho da Silveira (46)
There are moderately more females in the data set (208 v 187)
Most students live in an urban community (307 urban v 88 rural)
More than twice as many students live in a family of more than 3 members (281) as in a family of 3 or fewer (114)
Most parents are living together (354 v 41)
Mothers and fathers have similar education breakdowns, with relatively equal numbers completing 5th-9th grade, secondary education, and higher education
Most parent jobs are in the “other” or “services” categories
The mother is most often the identified guardian (273 of 395)
Reasons for choosing a school are spread fairly evenly across “course” (145), “home” (109), and “reputation” (105), with “other” (36) much less common
The majority of students live within 30 minutes of their school
Most students study 5 or fewer hours a week
Most students have no previous failures; among those who have failed, a single failed course is most common
Most students do not receive additional support for school (344 v 51)
Only about 61% of students (242 of 395) receive family support for their education
Nearly half of students (181 of 395) are receiving additional paid classes in Math
81 students did not attend nursery school
Most students want to pursue higher education
66 students do not have internet access
1/3 of students are in a romantic relationship
While most students report good relationships with their family, others report very poor relationships
Many students consume alcohol on the weekends, but a small cohort also drinks heavily on weekdays
The median number of absences for students is 4 but the max is 75 - wide range
Grades appear to follow a generally normal distribution
A fair number of students received a 0 for the final grade (G3). These data points should be reviewed with those familiar with the data collection to confirm that they are true scores of 0 rather than missing data. For this analysis, those scores are assumed to be true scores.
#Review of Grade Distribution
summary(data)
## school sex age address famsize Pstatus Medu Fedu
## GP:349 F:208 Min. :15.0 R: 88 GT3:281 A: 41 0: 3 0: 2
## MS: 46 M:187 1st Qu.:16.0 U:307 LE3:114 T:354 1: 59 1: 82
## Median :17.0 2:103 2:115
## Mean :16.7 3: 99 3:100
## 3rd Qu.:18.0 4:131 4: 96
## Max. :22.0
## Mjob Fjob reason guardian traveltime
## at_home : 59 at_home : 20 course :145 father: 90 1:257
## health : 34 health : 18 home :109 mother:273 2:107
## other :141 other :217 other : 36 other : 32 3: 23
## services:103 services:111 reputation:105 4: 8
## teacher : 58 teacher : 29
##
## studytime failures schoolsup famsup paid activities nursery
## 1:105 Min. :0.0000 no :344 no :153 no :214 no :194 no : 81
## 2:198 1st Qu.:0.0000 yes: 51 yes:242 yes:181 yes:201 yes:314
## 3: 65 Median :0.0000
## 4: 27 Mean :0.3342
## 3rd Qu.:0.0000
## Max. :3.0000
## higher internet romantic famrel freetime goout Dalc Walc health
## no : 20 no : 66 no :263 1: 8 1: 19 1: 23 1:276 1:151 1: 47
## yes:375 yes:329 yes:132 2: 18 2: 64 2:103 2: 75 2: 85 2: 45
## 3: 68 3:157 3:130 3: 26 3: 80 3: 91
## 4:195 4:115 4: 86 4: 9 4: 51 4: 66
## 5:106 5: 40 5: 53 5: 9 5: 28 5:146
##
## absences G1 G2 G3
## Min. : 0.000 Min. : 3.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 8.00 1st Qu.: 9.00 1st Qu.: 8.00
## Median : 4.000 Median :11.00 Median :11.00 Median :11.00
## Mean : 5.709 Mean :10.91 Mean :10.71 Mean :10.42
## 3rd Qu.: 8.000 3rd Qu.:13.00 3rd Qu.:13.00 3rd Qu.:14.00
## Max. :75.000 Max. :19.00 Max. :19.00 Max. :20.00
## treat
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5367
## 3rd Qu.:1.0000
## Max. :1.0000
hist(data$G3)
# Review of Absences
hist(data$absences)
The following is an attempt at a DAG based on initial beliefs about where causal relationships may exist, used to graphically review possible confounders and colliders.
#digraph G {
# Address -> Travel Time
# Address -> Reason
# Address -> School
# Address -> Nursery School
# Address -> Internet
# Address -> Going Out
# Family Size -> Address
# Family Size -> Family Support
# Family Size -> Family Relationship
# Parent Living Arrangement -> Family Support
# Parent Living Arrangement -> Guardian
# Parent Living Arrangement -> Family Relationship
# Parent Education -> Parent Job
# Parent Education -> Family Support
# Parent Education -> Higher Education Plans
# Parent Job -> Address
# Parent Job -> Family Size
# Parent Job -> Guardian
# Parent Job -> Family Support
# Parent Job -> Paid Course
# Parent Job -> Higher Education Plans
# Parent Job -> Internet at Home
# Parent Job -> Family Relationship
# Parent Job -> Going Out
# Travel Time -> Absences
# Travel Time -> Study Time
# Travel Time -> Activities
# Travel Time -> Free Time
# Study Time -> Failures
# Study Time -> Activities
# Study Time -> Romantic Relationship
# Study Time -> Free Time
# Study Time -> Final Grade
# Failures -> Absences
# Failures -> School Support
# Failures -> Activities
# Failures -> Higher Education Plans
# School Support -> Failures
# School Support -> Activities
# School Support -> Free Time
# School Support -> Final Grade
# Family Support -> Failures
# Family Support -> Paid Course
# Family Support -> Nursery School
# Family Support -> Higher Education Plans
# Family Support -> Family Relationship
# Paid Course -> Failures
# Paid Course -> Free Time
# Paid Course -> Final Grade
# Paid Course -> Study Time
# Paid Course -> Activities
# Activities -> Study Time
# Activities -> Free Time
# Activities -> Health
# Higher Education Plans -> Study Time
# Higher Education Plans -> Activities
# Internet -> Study Time
# Internet -> Going Out
# Romantic Relationship -> Free Time
# Romantic Relationship -> Going Out
# Family Relationship -> Family Support
# Family Relationship -> Going Out
# Free Time -> Study Time
# Free Time -> Activities
# Free Time -> Going Out
# Going Out -> Free Time
# Going Out -> Alcohol Consumption
# Alcohol Consumption -> Health
# Health -> Absences
# Absences -> Final Grade
#}
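A causal diagram must be acyclic for back-door reasoning to apply, and several arrows listed above run in both directions (for example Failures -> School Support with School Support -> Failures, and Free Time -> Going Out with Going Out -> Free Time). The base-R sketch below flags such pairs so that one direction can be chosen. It uses only a small excerpt of the edge list, `mutual_edges()` is a hypothetical helper, and it catches two-node cycles only; longer cycles would need a full topological sort.

```r
# Excerpt of the hypothesized edges above (not the full list), as from/to pairs
edges <- data.frame(
  from = c("Failures", "School Support", "Free Time", "Going Out",
           "Health", "Absences"),
  to   = c("School Support", "Failures", "Going Out", "Free Time",
           "Absences", "Final Grade"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the edges whose reverse is also in the list
mutual_edges <- function(e) {
  key     <- paste(e$from, e$to, sep = " -> ")
  rev_key <- paste(e$to, e$from, sep = " -> ")
  e[rev_key %in% key, , drop = FALSE]
}

mutual_edges(edges)
```

Any pair returned here means the graph, as written, is not yet a DAG.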
data$treat2 <- data$treat == 1  # logical flag used as the fill aesthetic in the plots
#school
schoolcounts<-table(data$school, data$treat)
schoolcounts
##
## 0 1
## GP 161 188
## MS 22 24
schoolperc<-prop.table(schoolcounts,1)
schoolperc
##
## 0 1
## GP 0.4613181 0.5386819
## MS 0.4782609 0.5217391
chisq.test(data$school,data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$school and data$treat
## X-squared = 0.00352, df = 1, p-value = 0.9527
ggplot(data, aes(x=school, fill=treat2)) + geom_bar(position = 'dodge')
#sex
sexcounts<-table(data$sex, data$treat)
sexperc<-prop.table(sexcounts,1)
sexperc
##
## 0 1
## F 0.4663462 0.5336538
## M 0.4598930 0.5401070
chisq.test(data$sex,data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$sex and data$treat
## X-squared = 0.00074923, df = 1, p-value = 0.9782
ggplot(data, aes(x=sex, fill=treat2)) + geom_bar(position = 'dodge')
#age
tapply(data$age, data$treat, mean)
## 0 1
## 16.46448 16.89623
tapply(data$age, data$treat, sd)
## 0 1
## 1.216912 1.294752
t.test(data$age ~ data$treat)
##
## Welch Two Sample t-test
##
## data: data$age by data$treat
## t = -3.4133, df = 390.14, p-value = 0.0007091
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.6804325 -0.1830585
## sample estimates:
## mean in group 0 mean in group 1
## 16.46448 16.89623
ggplot(data, aes(x=age, fill=treat2)) + geom_density(alpha=0.25)
#address
addresscounts<-table(data$address, data$treat)
addressperc<-prop.table(addresscounts,1)
addressperc
##
## 0 1
## R 0.4318182 0.5681818
## U 0.4723127 0.5276873
chisq.test(data$address,data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$address and data$treat
## X-squared = 0.30289, df = 1, p-value = 0.5821
ggplot(data, aes(x=address, fill=treat2)) + geom_bar(position = 'dodge')
#famsize
famsizecounts<-table(data$famsize, data$treat)
famsizeperc<-prop.table(famsizecounts,1)
famsizeperc
##
## 0 1
## GT3 0.4804270 0.5195730
## LE3 0.4210526 0.5789474
chisq.test(data$famsize,data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$famsize and data$treat
## X-squared = 0.92341, df = 1, p-value = 0.3366
ggplot(data, aes(x=famsize, fill=treat2)) + geom_bar(position = 'dodge')
#Pstatus
Pstatuscounts<-table(data$Pstatus, data$treat)
Pstatusperc<-prop.table(Pstatuscounts,1)
Pstatusperc
##
## 0 1
## A 0.3170732 0.6829268
## T 0.4802260 0.5197740
chisq.test(data$Pstatus,data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$Pstatus and data$treat
## X-squared = 3.3048, df = 1, p-value = 0.06908
ggplot(data, aes(x=Pstatus, fill=treat2)) + geom_bar(position = 'dodge')
#Medu
Meducounts<-table(data$Medu, data$treat)
Meduperc<-prop.table(Meducounts,1)
Meduperc
##
## 0 1
## 0 1.0000000 0.0000000
## 1 0.5593220 0.4406780
## 2 0.4854369 0.5145631
## 3 0.4040404 0.5959596
## 4 0.4351145 0.5648855
chisq.test(data$Medu,data$treat)
## Warning in chisq.test(data$Medu, data$treat): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: data$Medu and data$treat
## X-squared = 7.6828, df = 4, p-value = 0.1039
ggplot(data, aes(x=Medu, fill=treat2)) + geom_bar(position = 'dodge')
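The warning above arises because some expected cell counts are small (only 3 students have Medu = 0). As a sketch of one way to check this, the counts below are reconstructed from the row proportions printed above and the level totals in summary(data). chisq.test() can then report the expected counts and, through its simulate.p.value argument, a Monte Carlo p-value that does not rely on the chi-squared approximation. The seed and the number of replicates B are arbitrary choices.

```r
# Medu x treat counts reconstructed from the printed proportions and totals
medu_tab <- matrix(c( 3,  0,
                     33, 26,
                     50, 53,
                     40, 59,
                     57, 74),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(Medu = as.character(0:4),
                                   treat = c("0", "1")))

# Expected counts under independence; any value below 5 triggers the warning
expected <- suppressWarnings(chisq.test(medu_tab))$expected
round(expected, 2)

# Monte Carlo p-value based on simulated tables with the same margins
set.seed(1)  # arbitrary seed, for reproducibility only
sim_test <- chisq.test(medu_tab, simulate.p.value = TRUE, B = 10000)
sim_test
```

The same check applies to the other variables that produce this warning (Fedu, traveltime, famrel, Dalc).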
#Fedu
Feducounts<-table(data$Fedu, data$treat)
Feduperc<-prop.table(Feducounts,1)
Feduperc
##
## 0 1
## 0 0.5000000 0.5000000
## 1 0.4878049 0.5121951
## 2 0.4695652 0.5304348
## 3 0.4200000 0.5800000
## 4 0.4791667 0.5208333
chisq.test(data$Fedu,data$treat)
## Warning in chisq.test(data$Fedu, data$treat): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: data$Fedu and data$treat
## X-squared = 1.0782, df = 4, p-value = 0.8977
ggplot(data, aes(x=Fedu, fill=treat2)) + geom_bar(position = 'dodge')
#Mjob
Mjobcounts<-table(data$Mjob, data$treat)
Mjobperc<-prop.table(Mjobcounts,1)
Mjobperc
##
## 0 1
## at_home 0.5254237 0.4745763
## health 0.5000000 0.5000000
## other 0.4539007 0.5460993
## services 0.4368932 0.5631068
## teacher 0.4482759 0.5517241
chisq.test(data$Mjob,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$Mjob and data$treat
## X-squared = 1.4915, df = 4, p-value = 0.8281
ggplot(data, aes(x=Mjob, fill=treat2)) + geom_bar(position = 'dodge')
#Fjob
Fjobcounts<-table(data$Fjob, data$treat)
Fjobperc<-prop.table(Fjobcounts,1)
Fjobperc
##
## 0 1
## at_home 0.5000000 0.5000000
## health 0.5000000 0.5000000
## other 0.4423963 0.5576037
## services 0.4954955 0.5045045
## teacher 0.4482759 0.5517241
chisq.test(data$Fjob,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$Fjob and data$treat
## X-squared = 1.0762, df = 4, p-value = 0.898
ggplot(data, aes(x=Fjob, fill=treat2)) + geom_bar(position = 'dodge')
#reason
reasoncounts<-table(data$reason, data$treat)
reasonperc<-prop.table(reasoncounts,1)
reasonperc
##
## 0 1
## course 0.5517241 0.4482759
## home 0.4311927 0.5688073
## other 0.3888889 0.6111111
## reputation 0.4000000 0.6000000
chisq.test(data$reason,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$reason and data$treat
## X-squared = 7.5051, df = 3, p-value = 0.05743
ggplot(data, aes(x=reason, fill=treat2)) + geom_bar(position = 'dodge')
#guardian
guardiancounts<-table(data$guardian, data$treat)
guardianperc<-prop.table(guardiancounts,1)
guardianperc
##
## 0 1
## father 0.5333333 0.4666667
## mother 0.4652015 0.5347985
## other 0.2500000 0.7500000
chisq.test(data$guardian,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$guardian and data$treat
## X-squared = 7.6344, df = 2, p-value = 0.02199
ggplot(data, aes(x=guardian, fill=treat2)) + geom_bar(position = 'dodge')
#traveltime
traveltimecounts<-table(data$traveltime, data$treat)
traveltimeperc<-prop.table(traveltimecounts,1)
traveltimeperc
##
## 0 1
## 1 0.4513619 0.5486381
## 2 0.4953271 0.5046729
## 3 0.4347826 0.5652174
## 4 0.5000000 0.5000000
chisq.test(data$traveltime,data$treat)
## Warning in chisq.test(data$traveltime, data$treat): Chi-squared approximation
## may be incorrect
##
## Pearson's Chi-squared test
##
## data: data$traveltime and data$treat
## X-squared = 0.70726, df = 3, p-value = 0.8715
ggplot(data, aes(x=traveltime, fill=treat2)) + geom_bar(position = 'dodge')
#studytime
studytimecounts<-table(data$studytime, data$treat)
studytimeperc<-prop.table(studytimecounts,1)
studytimeperc
##
## 0 1
## 1 0.4190476 0.5809524
## 2 0.4848485 0.5151515
## 3 0.4923077 0.5076923
## 4 0.4074074 0.5925926
chisq.test(data$studytime,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$studytime and data$treat
## X-squared = 1.7559, df = 3, p-value = 0.6246
ggplot(data, aes(x=studytime, fill=treat2)) + geom_bar(position = 'dodge')
#failures
tapply(data$failures, data$treat, mean)
## 0 1
## 0.3114754 0.3537736
tapply(data$failures, data$treat, sd)
## 0 1
## 0.760 0.730
t.test(data$failures ~ data$treat)
##
## Welch Two Sample t-test
##
## data: data$failures by data$treat
## t = -0.56152, df = 379.57, p-value = 0.5748
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1904093 0.1058130
## sample estimates:
## mean in group 0 mean in group 1
## 0.3114754 0.3537736
ggplot(data, aes(x=failures, fill=treat2)) + geom_density(alpha=0.25)
#schoolsup
schoolsupcounts<-table(data$schoolsup, data$treat)
schoolsupperc<-prop.table(schoolsupcounts,1)
schoolsupperc
##
## 0 1
## no 0.4651163 0.5348837
## yes 0.4509804 0.5490196
chisq.test(data$schoolsup, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$schoolsup and data$treat
## X-squared = 0.00148, df = 1, p-value = 0.9693
ggplot(data, aes(x=schoolsup, fill=treat2)) + geom_bar(position = 'dodge')
#famsup
famsupcounts<-table(data$famsup, data$treat)
famsupperc<-prop.table(famsupcounts,1)
famsupperc
##
## 0 1
## no 0.4575163 0.5424837
## yes 0.4669421 0.5330579
chisq.test(data$famsup, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$famsup and data$treat
## X-squared = 0.0063114, df = 1, p-value = 0.9367
ggplot(data, aes(x=famsup, fill=treat2)) + geom_bar(position = 'dodge')
#paid
paidcounts<-table(data$paid, data$treat)
paidperc<-prop.table(paidcounts,1)
paidperc
##
## 0 1
## no 0.4766355 0.5233645
## yes 0.4475138 0.5524862
chisq.test(data$paid,data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$paid and data$treat
## X-squared = 0.22759, df = 1, p-value = 0.6333
ggplot(data, aes(x=paid, fill=treat2)) + geom_bar(position = 'dodge')
#activities
activitiescounts<-table(data$activities, data$treat)
activitiesperc<-prop.table(activitiescounts,1)
activitiesperc
##
## 0 1
## no 0.4793814 0.5206186
## yes 0.4477612 0.5522388
chisq.test(data$activities, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$activities and data$treat
## X-squared = 0.27997, df = 1, p-value = 0.5967
ggplot(data, aes(x=activities, fill=treat2)) + geom_bar(position = 'dodge')
#nursery
nurserycounts<-table(data$nursery, data$treat)
nurseryperc<-prop.table(nurserycounts,1)
nurseryperc
##
## 0 1
## no 0.4814815 0.5185185
## yes 0.4585987 0.5414013
chisq.test(data$nursery, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$nursery and data$treat
## X-squared = 0.059182, df = 1, p-value = 0.8078
ggplot(data, aes(x=nursery, fill=treat2)) + geom_bar(position = 'dodge')
#higher
highercounts<-table(data$higher, data$treat)
higherperc<-prop.table(highercounts,1)
higherperc
##
## 0 1
## no 0.600 0.400
## yes 0.456 0.544
chisq.test(data$higher, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$higher and data$treat
## X-squared = 1.0573, df = 1, p-value = 0.3038
ggplot(data, aes(x=higher, fill=treat2)) + geom_bar(position = 'dodge')
#internet
internetcounts<-table(data$internet, data$treat)
internetperc<-prop.table(internetcounts,1)
internetperc
##
## 0 1
## no 0.5151515 0.4848485
## yes 0.4528875 0.5471125
chisq.test(data$internet, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$internet and data$treat
## X-squared = 0.62497, df = 1, p-value = 0.4292
ggplot(data, aes(x=internet, fill=treat2)) + geom_bar(position = 'dodge')
#romantic
romanticcounts<-table(data$romantic, data$treat)
romanticperc<-prop.table(romanticcounts,1)
romanticperc
##
## 0 1
## no 0.4790875 0.5209125
## yes 0.4318182 0.5681818
chisq.test(data$romantic, data$treat)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: data$romantic and data$treat
## X-squared = 0.6111, df = 1, p-value = 0.4344
ggplot(data, aes(x=romantic, fill=treat2)) + geom_bar(position = 'dodge')
#famrel
famrelcounts<-table(data$famrel, data$treat)
famrelperc<-prop.table(famrelcounts,1)
famrelperc
##
## 0 1
## 1 0.2500000 0.7500000
## 2 0.3333333 0.6666667
## 3 0.4411765 0.5588235
## 4 0.4871795 0.5128205
## 5 0.4716981 0.5283019
chisq.test(data$famrel, data$treat)
## Warning in chisq.test(data$famrel, data$treat): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: data$famrel and data$treat
## X-squared = 3.2977, df = 4, p-value = 0.5093
ggplot(data, aes(x=famrel, fill=treat2)) + geom_bar(position = 'dodge')
#freetime
freetimecounts<-table(data$freetime, data$treat)
freetimeperc<-prop.table(freetimecounts,1)
freetimeperc
##
## 0 1
## 1 0.4210526 0.5789474
## 2 0.4218750 0.5781250
## 3 0.5159236 0.4840764
## 4 0.4434783 0.5565217
## 5 0.4000000 0.6000000
chisq.test(data$freetime,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$freetime and data$treat
## X-squared = 3.1529, df = 4, p-value = 0.5326
ggplot(data, aes(x=freetime, fill=treat2)) + geom_bar(position = 'dodge')
#goout
gooutcounts<-table(data$goout, data$treat)
gooutperc<-prop.table(gooutcounts,1)
gooutperc
##
## 0 1
## 1 0.6956522 0.3043478
## 2 0.5533981 0.4466019
## 3 0.4230769 0.5769231
## 4 0.4302326 0.5697674
## 5 0.3396226 0.6603774
chisq.test(data$goout,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$goout and data$treat
## X-squared = 12.841, df = 4, p-value = 0.01208
ggplot(data, aes(x=goout, fill=treat2)) + geom_bar(position = 'dodge')
#Dalc
Dalccounts<-table(data$Dalc, data$treat)
Dalcperc<-prop.table(Dalccounts,1)
Dalcperc
##
## 0 1
## 1 0.5000000 0.5000000
## 2 0.4933333 0.5066667
## 3 0.2692308 0.7307692
## 4 0.0000000 1.0000000
## 5 0.1111111 0.8888889
chisq.test(data$Dalc,data$treat)
## Warning in chisq.test(data$Dalc, data$treat): Chi-squared approximation may be
## incorrect
##
## Pearson's Chi-squared test
##
## data: data$Dalc and data$treat
## X-squared = 17.964, df = 4, p-value = 0.001254
ggplot(data, aes(x=Dalc, fill=treat2)) + geom_bar(position = 'dodge')
#Walc
Walccounts<-table(data$Walc, data$treat)
Walcperc<-prop.table(Walccounts,1)
Walcperc
##
## 0 1
## 1 0.5695364 0.4304636
## 2 0.5058824 0.4941176
## 3 0.3625000 0.6375000
## 4 0.3529412 0.6470588
## 5 0.2500000 0.7500000
chisq.test(data$Walc,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$Walc and data$treat
## X-squared = 18.364, df = 4, p-value = 0.001048
ggplot(data, aes(x=Walc, fill=treat2)) + geom_bar(position = 'dodge')
#health
healthcounts<-table(data$health, data$treat)
healthperc<-prop.table(healthcounts,1)
healthperc
##
## 0 1
## 1 0.4680851 0.5319149
## 2 0.3333333 0.6666667
## 3 0.4505495 0.5494505
## 4 0.3939394 0.6060606
## 5 0.5410959 0.4589041
chisq.test(data$health,data$treat)
##
## Pearson's Chi-squared test
##
## data: data$health and data$treat
## X-squared = 7.9513, df = 4, p-value = 0.09338
ggplot(data, aes(x=health, fill=treat2)) + geom_bar(position = 'dodge')
#G3
tapply(data$G3, data$treat, mean)
## 0 1
## 9.748634 10.990566
tapply(data$G3, data$treat, sd)
t.test(data$G3 ~ data$treat)
##
## Welch Two Sample t-test
##
## data: data$G3 by data$treat
## t = -2.6088, df = 280.02, p-value = 0.009575
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.179046 -0.304818
## sample estimates:
## mean in group 0 mean in group 1
## 9.748634 10.990566
ggplot(data, aes(x=G3, fill=treat2)) + geom_density(alpha=0.25)
## Next time I would try to operationalize this code as a function that runs each variable without explicitly listing each
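One way this could look, as a minimal sketch: a hypothetical helper `compare_by_treat()` that runs a Welch t-test for numeric covariates and a chi-squared test otherwise, returning one row per variable. The demo data frame is synthetic and only stands in for the student data.

```r
# Hypothetical helper: test each covariate against a binary treatment column
compare_by_treat <- function(df, vars, treat = "treat") {
  rows <- lapply(vars, function(v) {
    x <- df[[v]]
    g <- df[[treat]]
    if (is.numeric(x)) {
      res  <- t.test(x ~ g)                      # Welch two-sample t-test
      test <- "t-test"
    } else {
      res  <- suppressWarnings(chisq.test(x, g)) # Pearson chi-squared test
      test <- "chi-squared"
    }
    data.frame(variable = v, test = test,
               statistic = unname(res$statistic),
               p.value = res$p.value,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Synthetic stand-in data (not the student data set)
set.seed(42)
demo <- data.frame(
  treat = rep(c(0, 1), each = 50),
  age   = round(rnorm(100, mean = 16.7, sd = 1.3)),
  sex   = factor(sample(c("F", "M"), 100, replace = TRUE))
)
compare_by_treat(demo, c("age", "sex"))
```

On the student data the call would be along the lines of compare_by_treat(data, c("school", "sex", "age")), with the plots handled in a similar loop.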
Noted patterns on initial exploration:
Students younger than about 16.5 more often have <3 absences; older students more often have \(\geq\) 3 absences.
Address does not seem to have the impact that I expected before the analysis.
A mother’s education and job status may be more strongly associated with absence level than the father’s. This may be partially explained by more students being in the guardianship of their mother, but it needs further exploration because this graphical impression is not supported by the Chi-squared results.
Among students who chose their school for its course offerings, fewer have \(\geq\) 3 absences than <3; in every other category, more students have \(\geq\) 3 absences.
Travel time also shows a different pattern than I expected. Students with longer travel times are split about evenly between the absence groups (<3 and \(\geq\) 3), while students with the shortest travel time more often have \(\geq\) 3 absences than <3, though this is not statistically significant based on Chi-squared testing.
As initially suspected, students who go out more often and report consuming more alcohol are generally more likely to have \(\geq\) 3 absences than <3.
Because this is an observational (rather than randomized) study, causation cannot be inferred without accounting for confounding. To emulate a randomized experimental design, we want the treatment groups to behave as though they came from the same distribution (i.e. students in each group have similar distributions of all characteristics that affect absences). We accomplish this by assigning inverse probability weights, derived from propensity scores, to the observations, effectively removing the influence of measured confounders on the relationship between absences and final grades.
These weights up- or down-weight observations based on the propensity (or likelihood) of being assigned to the treatment group.
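For the ATE, the inverse probability weight is 1/e(x) for a treated observation and 1/(1 - e(x)) for a control, where e(x) is the estimated propensity score. The sketch below illustrates this on synthetic data, using logistic regression purely for simplicity (it is not the boosted model used in this analysis). All variable names and the simulated effect size are assumptions made for the example.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)                             # a single synthetic confounder
treat <- rbinom(n, 1, plogis(0.8 * x))    # treatment probability depends on x
y <- 2 + 1 * treat + 1.5 * x + rnorm(n)   # simulated treatment effect is 1

# Propensity score e(x) = P(treat = 1 | x), estimated by logistic regression
e_hat <- fitted(glm(treat ~ x, family = binomial))

# ATE weights: 1/e for treated observations, 1/(1 - e) for controls
w <- ifelse(treat == 1, 1 / e_hat, 1 / (1 - e_hat))

# Unweighted versus weighted difference in group means
naive <- mean(y[treat == 1]) - mean(y[treat == 0])
ipw   <- weighted.mean(y[treat == 1], w[treat == 1]) -
         weighted.mean(y[treat == 0], w[treat == 0])
c(naive = naive, ipw = ipw)
```

The unweighted difference absorbs the effect of x, while the weighted difference should sit closer to the simulated effect because the weights balance x across the groups.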
Two assumptions in this analysis are important to acknowledge:
1. There is sufficient overlap in covariates (and hence in propensity scores) between the two absence groups
2. There are no unmeasured confounders left out of the analysis
In this analysis, a nonparametric boosting model is used to assign propensity scores to the observations. The benefit of this method is that it makes no assumptions about the distribution of the covariates or the functional form of their relationship to treatment, and variable selection is automatic. In other words, the boosting method can determine the most important characteristics and account for them appropriately. It also reduces the risk of model misspecification compared with parametric methods such as logistic regression.
boosted.mod <- ps(treat ~ school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health,
data=data,
estimand = "ATE",
n.trees = 5000,
interaction.depth=2,
perm.test.iters=0,
verbose=FALSE,
stop.method = c("es.mean"))
summary(boosted.mod)
## n.treat n.ctrl ess.treat ess.ctrl max.es mean.es max.ks
## unw 212 183 212.0000 183.0000 0.3383472 0.10068783 0.1633416
## es.mean.ATE 212 183 196.0893 167.0735 0.2052356 0.06013057 0.0768722
## max.ks.p mean.ks iter
## unw NA 0.03704350 NA
## es.mean.ATE NA 0.02195048 1727
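The ess.treat and ess.ctrl columns report the effective sample size in each group after weighting. A common definition (Kish's), which I am assuming is the one behind these columns, is the squared sum of the weights divided by the sum of the squared weights. A minimal sketch with made-up weights, where `ess()` is a hypothetical helper:

```r
# Hypothetical helper: Kish effective sample size for a vector of weights
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 212))          # equal weights: ESS equals the group size
ess(c(rep(1, 200), 10))   # one large weight pulls the ESS well below n = 201
```

The closer the weighted ESS stays to the raw group size, the less precision is lost to weighting; here the drop from 212 to about 196 and from 183 to about 167 is modest.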
summary(boosted.mod$gbm.obj,
n.trees=boosted.mod$desc$es.mean.ATE$n.trees,
plot=FALSE)
## var rel.inf
## Walc Walc 11.99046432
## health health 11.74255811
## famrel famrel 8.81694334
## reason reason 8.46741674
## Dalc Dalc 8.11038236
## Fjob Fjob 6.87361354
## guardian guardian 4.73744587
## age age 4.67030306
## goout goout 4.26779338
## Medu Medu 4.06831347
## studytime studytime 3.83554156
## freetime freetime 3.65336362
## Mjob Mjob 3.54948239
## failures failures 2.90116318
## paid paid 2.59354325
## Pstatus Pstatus 2.23254502
## higher higher 1.38448948
## Fedu Fedu 1.24213066
## internet internet 0.86643004
## schoolsup schoolsup 0.77508373
## activities activities 0.77014634
## romantic romantic 0.75397231
## school school 0.53193651
## traveltime traveltime 0.50330671
## famsup famsup 0.39092021
## nursery nursery 0.17605828
## famsize famsize 0.09465254
## sex sex 0.00000000
## address address 0.00000000
data$boosted <- get.weights(boosted.mod, stop.method = "es.mean")
hist(data$boosted)
plot(boosted.mod)
plot(boosted.mod, plots=2)
plot(boosted.mod, plots=3)
bal.table(boosted.mod)
## $unw
## tx.mn tx.sd ct.mn ct.sd std.eff.sz stat p ks
## school:GP 0.887 0.317 0.880 0.325 0.022 0.047 0.829 0.007
## school:MS 0.113 0.317 0.120 0.325 -0.022 NA NA 0.007
## sex:F 0.524 0.499 0.530 0.499 -0.013 0.016 0.898 0.006
## sex:M 0.476 0.499 0.470 0.499 0.013 NA NA 0.006
## age 16.896 1.295 16.464 1.217 0.338 3.418 0.001 0.141
## address:R 0.236 0.425 0.208 0.406 0.068 0.450 0.503 0.028
## address:U 0.764 0.425 0.792 0.406 -0.068 NA NA 0.028
## famsize:GT3 0.689 0.463 0.738 0.440 -0.108 1.147 0.285 0.049
## famsize:LE3 0.311 0.463 0.262 0.440 0.108 NA NA 0.049
## Pstatus:A 0.132 0.339 0.071 0.257 0.200 3.924 0.048 0.061
## Pstatus:T 0.868 0.339 0.929 0.257 -0.200 NA NA 0.061
## Medu:0 0.000 0.000 0.016 0.127 -0.189 1.919 0.105 0.016
## Medu:1 0.123 0.328 0.180 0.384 -0.162 NA NA 0.058
## Medu:2 0.250 0.433 0.273 0.446 -0.053 NA NA 0.023
## Medu:3 0.278 0.448 0.219 0.413 0.138 NA NA 0.060
## Medu:4 0.349 0.477 0.311 0.463 0.080 NA NA 0.038
## Fedu:0 0.005 0.069 0.005 0.074 -0.011 0.269 0.898 0.001
## Fedu:1 0.198 0.399 0.219 0.413 -0.050 NA NA 0.020
## Fedu:2 0.288 0.453 0.295 0.456 -0.016 NA NA 0.007
## Fedu:3 0.274 0.446 0.230 0.421 0.101 NA NA 0.044
## Fedu:4 0.236 0.425 0.251 0.434 -0.036 NA NA 0.016
## Mjob:at_home 0.132 0.339 0.169 0.375 -0.105 0.372 0.829 0.037
## Mjob:health 0.080 0.272 0.093 0.290 -0.045 NA NA 0.013
## Mjob:other 0.363 0.481 0.350 0.477 0.028 NA NA 0.013
## Mjob:services 0.274 0.446 0.246 0.431 0.063 NA NA 0.028
## Mjob:teacher 0.151 0.358 0.142 0.349 0.025 NA NA 0.009
## Fjob:at_home 0.047 0.212 0.055 0.227 -0.034 0.268 0.898 0.007
## Fjob:health 0.042 0.202 0.049 0.216 -0.032 NA NA 0.007
## Fjob:other 0.571 0.495 0.525 0.499 0.093 NA NA 0.046
## Fjob:services 0.264 0.441 0.301 0.458 -0.081 NA NA 0.036
## Fjob:teacher 0.075 0.264 0.071 0.257 0.017 NA NA 0.004
## reason:course 0.307 0.461 0.437 0.496 -0.271 2.495 0.058 0.131
## reason:home 0.292 0.455 0.257 0.437 0.080 NA NA 0.036
## reason:other 0.104 0.305 0.077 0.266 0.095 NA NA 0.027
## reason:reputation 0.297 0.457 0.230 0.421 0.153 NA NA 0.068
## guardian:father 0.198 0.399 0.262 0.440 -0.153 3.808 0.023 0.064
## guardian:mother 0.689 0.463 0.694 0.461 -0.011 NA NA 0.005
## guardian:other 0.113 0.317 0.044 0.204 0.255 NA NA 0.069
## traveltime:1 0.665 0.472 0.634 0.482 0.065 0.235 0.872 0.031
## traveltime:2 0.255 0.436 0.290 0.454 -0.079 NA NA 0.035
## traveltime:3 0.061 0.240 0.055 0.227 0.029 NA NA 0.007
## traveltime:4 0.019 0.136 0.022 0.146 -0.021 NA NA 0.003
## studytime:1 0.288 0.453 0.240 0.427 0.107 0.584 0.626 0.047
## studytime:2 0.481 0.500 0.525 0.499 -0.087 NA NA 0.043
## studytime:3 0.156 0.363 0.175 0.380 -0.052 NA NA 0.019
## studytime:4 0.075 0.264 0.060 0.238 0.061 NA NA 0.015
## failures 0.354 0.730 0.311 0.760 0.057 0.562 0.574 0.056
## schoolsup:no 0.868 0.339 0.874 0.331 -0.019 0.036 0.850 0.006
## schoolsup:yes 0.132 0.339 0.126 0.331 0.019 NA NA 0.006
## famsup:no 0.392 0.488 0.383 0.486 0.018 0.033 0.855 0.009
## famsup:yes 0.608 0.488 0.617 0.486 -0.018 NA NA 0.009
## paid:no 0.528 0.499 0.557 0.497 -0.058 0.334 0.564 0.029
## paid:yes 0.472 0.499 0.443 0.497 0.058 NA NA 0.029
## activities:no 0.476 0.499 0.508 0.500 -0.064 0.396 0.530 0.032
## activities:yes 0.524 0.499 0.492 0.500 0.064 NA NA 0.032
## nursery:no 0.198 0.399 0.213 0.410 -0.037 0.135 0.713 0.015
## nursery:yes 0.802 0.399 0.787 0.410 0.037 NA NA 0.015
## higher:no 0.038 0.191 0.066 0.248 -0.127 1.579 0.210 0.028
## higher:yes 0.962 0.191 0.934 0.248 0.127 NA NA 0.028
## internet:no 0.151 0.358 0.186 0.389 -0.093 0.855 0.356 0.035
## internet:yes 0.849 0.358 0.814 0.389 0.093 NA NA 0.035
## romantic:no 0.646 0.478 0.689 0.463 -0.090 0.788 0.375 0.042
## romantic:yes 0.354 0.478 0.311 0.463 0.090 NA NA 0.042
## famrel:1 0.028 0.166 0.011 0.104 0.123 0.822 0.511 0.017
## famrel:2 0.057 0.231 0.033 0.178 0.114 NA NA 0.024
## famrel:3 0.179 0.384 0.164 0.370 0.041 NA NA 0.015
## famrel:4 0.472 0.499 0.519 0.500 -0.095 NA NA 0.047
## famrel:5 0.264 0.441 0.273 0.446 -0.020 NA NA 0.009
## freetime:1 0.052 0.222 0.044 0.204 0.038 0.786 0.534 0.008
## freetime:2 0.175 0.380 0.148 0.355 0.073 NA NA 0.027
## freetime:3 0.358 0.480 0.443 0.497 -0.172 NA NA 0.084
## freetime:4 0.302 0.459 0.279 0.448 0.051 NA NA 0.023
## freetime:5 0.113 0.317 0.087 0.282 0.085 NA NA 0.026
## goout:1 0.033 0.179 0.087 0.282 -0.232 3.202 0.012 0.054
## goout:2 0.217 0.412 0.311 0.463 -0.215 NA NA 0.094
## goout:3 0.354 0.478 0.301 0.458 0.113 NA NA 0.053
## goout:4 0.231 0.422 0.202 0.402 0.070 NA NA 0.029
## goout:5 0.165 0.371 0.098 0.298 0.196 NA NA 0.067
## Dalc:1 0.651 0.477 0.754 0.431 -0.225 4.503 0.001 0.103
## Dalc:2 0.179 0.384 0.202 0.402 -0.058 NA NA 0.023
## Dalc:3 0.090 0.286 0.038 0.192 0.207 NA NA 0.051
## Dalc:4 0.042 0.202 0.000 0.000 0.285 NA NA 0.042
## Dalc:5 0.038 0.191 0.005 0.074 0.216 NA NA 0.032
## Walc:1 0.307 0.461 0.470 0.499 -0.336 4.579 0.001 0.163
## Walc:2 0.198 0.399 0.235 0.424 -0.090 NA NA 0.037
## Walc:3 0.241 0.427 0.158 0.365 0.204 NA NA 0.082
## Walc:4 0.156 0.363 0.098 0.298 0.171 NA NA 0.057
## Walc:5 0.099 0.299 0.038 0.192 0.237 NA NA 0.061
## health:1 0.118 0.323 0.120 0.325 -0.007 1.983 0.095 0.002
## health:2 0.142 0.349 0.082 0.274 0.187 NA NA 0.060
## health:3 0.236 0.425 0.224 0.417 0.028 NA NA 0.012
## health:4 0.189 0.391 0.142 0.349 0.125 NA NA 0.047
## health:5 0.316 0.465 0.432 0.495 -0.240 NA NA 0.116
## ks.pval
## school:GP 0.829
## school:MS 0.829
## sex:F 0.898
## sex:M 0.898
## age 0.036
## address:R 0.503
## address:U 0.503
## famsize:GT3 0.285
## famsize:LE3 0.285
## Pstatus:A 0.048
## Pstatus:T 0.048
## Medu:0 0.105
## Medu:1 0.105
## Medu:2 0.105
## Medu:3 0.105
## Medu:4 0.105
## Fedu:0 0.898
## Fedu:1 0.898
## Fedu:2 0.898
## Fedu:3 0.898
## Fedu:4 0.898
## Mjob:at_home 0.829
## Mjob:health 0.829
## Mjob:other 0.829
## Mjob:services 0.829
## Mjob:teacher 0.829
## Fjob:at_home 0.898
## Fjob:health 0.898
## Fjob:other 0.898
## Fjob:services 0.898
## Fjob:teacher 0.898
## reason:course 0.058
## reason:home 0.058
## reason:other 0.058
## reason:reputation 0.058
## guardian:father 0.023
## guardian:mother 0.023
## guardian:other 0.023
## traveltime:1 0.872
## traveltime:2 0.872
## traveltime:3 0.872
## traveltime:4 0.872
## studytime:1 0.626
## studytime:2 0.626
## studytime:3 0.626
## studytime:4 0.626
## failures 0.900
## schoolsup:no 0.850
## schoolsup:yes 0.850
## famsup:no 0.855
## famsup:yes 0.855
## paid:no 0.564
## paid:yes 0.564
## activities:no 0.530
## activities:yes 0.530
## nursery:no 0.713
## nursery:yes 0.713
## higher:no 0.210
## higher:yes 0.210
## internet:no 0.356
## internet:yes 0.356
## romantic:no 0.375
## romantic:yes 0.375
## famrel:1 0.511
## famrel:2 0.511
## famrel:3 0.511
## famrel:4 0.511
## famrel:5 0.511
## freetime:1 0.534
## freetime:2 0.534
## freetime:3 0.534
## freetime:4 0.534
## freetime:5 0.534
## goout:1 0.012
## goout:2 0.012
## goout:3 0.012
## goout:4 0.012
## goout:5 0.012
## Dalc:1 0.001
## Dalc:2 0.001
## Dalc:3 0.001
## Dalc:4 0.001
## Dalc:5 0.001
## Walc:1 0.001
## Walc:2 0.001
## Walc:3 0.001
## Walc:4 0.001
## Walc:5 0.001
## health:1 0.095
## health:2 0.095
## health:3 0.095
## health:4 0.095
## health:5 0.095
##
## $es.mean.ATE
## tx.mn tx.sd ct.mn ct.sd std.eff.sz stat p ks
## school:GP 0.893 0.309 0.883 0.321 0.032 0.098 0.755 0.010
## school:MS 0.107 0.309 0.117 0.321 -0.032 NA NA 0.010
## sex:F 0.537 0.499 0.529 0.499 0.016 0.022 0.881 0.008
## sex:M 0.463 0.499 0.471 0.499 -0.016 NA NA 0.008
## age 16.801 1.270 16.598 1.220 0.159 1.567 0.118 0.066
## address:R 0.227 0.419 0.201 0.401 0.062 0.364 0.546 0.026
## address:U 0.773 0.419 0.799 0.401 -0.062 NA NA 0.026
## famsize:GT3 0.693 0.461 0.737 0.440 -0.098 0.873 0.351 0.044
## famsize:LE3 0.307 0.461 0.263 0.440 0.098 NA NA 0.044
## Pstatus:A 0.119 0.323 0.080 0.271 0.127 1.468 0.226 0.039
## Pstatus:T 0.881 0.323 0.920 0.271 -0.127 NA NA 0.039
## Medu:0 0.000 0.000 0.015 0.120 -0.168 0.919 0.452 0.015
## Medu:1 0.137 0.344 0.157 0.364 -0.057 NA NA 0.020
## Medu:2 0.274 0.446 0.294 0.456 -0.045 NA NA 0.020
## Medu:3 0.257 0.437 0.235 0.424 0.051 NA NA 0.022
## Medu:4 0.331 0.471 0.299 0.458 0.069 NA NA 0.032
## Fedu:0 0.006 0.079 0.004 0.066 0.028 0.126 0.973 0.002
## Fedu:1 0.207 0.405 0.206 0.404 0.005 NA NA 0.002
## Fedu:2 0.300 0.458 0.319 0.466 -0.042 NA NA 0.019
## Fedu:3 0.265 0.441 0.238 0.426 0.062 NA NA 0.027
## Fedu:4 0.221 0.415 0.233 0.423 -0.028 NA NA 0.012
## Mjob:at_home 0.138 0.345 0.156 0.363 -0.051 0.180 0.948 0.018
## Mjob:health 0.073 0.261 0.090 0.286 -0.060 NA NA 0.017
## Mjob:other 0.377 0.485 0.354 0.478 0.048 NA NA 0.023
## Mjob:services 0.271 0.445 0.259 0.438 0.028 NA NA 0.012
## Mjob:teacher 0.140 0.347 0.140 0.347 0.000 NA NA 0.000
## Fjob:at_home 0.047 0.212 0.050 0.218 -0.013 0.068 0.991 0.003
## Fjob:health 0.040 0.195 0.046 0.210 -0.031 NA NA 0.007
## Fjob:other 0.588 0.492 0.571 0.495 0.034 NA NA 0.017
## Fjob:services 0.262 0.440 0.275 0.447 -0.028 NA NA 0.013
## Fjob:teacher 0.063 0.243 0.058 0.233 0.021 NA NA 0.005
## reason:course 0.330 0.470 0.395 0.489 -0.135 0.587 0.623 0.065
## reason:home 0.288 0.453 0.268 0.443 0.047 NA NA 0.021
## reason:other 0.097 0.296 0.078 0.268 0.067 NA NA 0.019
## reason:reputation 0.284 0.451 0.259 0.438 0.056 NA NA 0.025
## guardian:father 0.212 0.409 0.247 0.431 -0.083 1.309 0.271 0.035
## guardian:mother 0.695 0.460 0.702 0.457 -0.015 NA NA 0.007
## guardian:other 0.092 0.289 0.051 0.219 0.152 NA NA 0.041
## traveltime:1 0.659 0.474 0.635 0.481 0.051 0.165 0.916 0.024
## traveltime:2 0.269 0.443 0.294 0.456 -0.056 NA NA 0.025
## traveltime:3 0.058 0.234 0.053 0.225 0.021 NA NA 0.005
## traveltime:4 0.013 0.115 0.018 0.133 -0.032 NA NA 0.005
## studytime:1 0.273 0.446 0.228 0.420 0.102 0.381 0.767 0.045
## studytime:2 0.493 0.500 0.537 0.499 -0.087 NA NA 0.044
## studytime:3 0.159 0.366 0.166 0.372 -0.019 NA NA 0.007
## studytime:4 0.075 0.263 0.069 0.253 0.023 NA NA 0.006
## failures 0.325 0.706 0.297 0.735 0.038 0.383 0.702 0.041
## schoolsup:no 0.867 0.339 0.884 0.320 -0.050 0.250 0.618 0.017
## schoolsup:yes 0.133 0.339 0.116 0.320 0.050 NA NA 0.017
## famsup:no 0.395 0.489 0.381 0.486 0.028 0.069 0.793 0.013
## famsup:yes 0.605 0.489 0.619 0.486 -0.028 NA NA 0.013
## paid:no 0.530 0.499 0.540 0.498 -0.019 0.032 0.857 0.009
## paid:yes 0.470 0.499 0.460 0.498 0.019 NA NA 0.009
## activities:no 0.486 0.500 0.517 0.500 -0.063 0.356 0.551 0.031
## activities:yes 0.514 0.500 0.483 0.500 0.063 NA NA 0.031
## nursery:no 0.203 0.402 0.210 0.407 -0.017 0.026 0.873 0.007
## nursery:yes 0.797 0.402 0.790 0.407 0.017 NA NA 0.007
## higher:no 0.043 0.204 0.061 0.240 -0.081 0.563 0.454 0.018
## higher:yes 0.957 0.204 0.939 0.240 0.081 NA NA 0.018
## internet:no 0.152 0.359 0.176 0.380 -0.064 0.400 0.527 0.024
## internet:yes 0.848 0.359 0.824 0.380 0.064 NA NA 0.024
## romantic:no 0.652 0.476 0.691 0.462 -0.083 0.625 0.430 0.039
## romantic:yes 0.348 0.476 0.309 0.462 0.083 NA NA 0.039
## famrel:1 0.025 0.157 0.012 0.109 0.093 0.388 0.815 0.013
## famrel:2 0.045 0.208 0.030 0.170 0.074 NA NA 0.015
## famrel:3 0.183 0.387 0.177 0.382 0.014 NA NA 0.005
## famrel:4 0.474 0.499 0.503 0.500 -0.058 NA NA 0.029
## famrel:5 0.273 0.445 0.277 0.448 -0.011 NA NA 0.005
## freetime:1 0.050 0.218 0.043 0.203 0.033 0.335 0.854 0.007
## freetime:2 0.161 0.368 0.145 0.353 0.042 NA NA 0.016
## freetime:3 0.383 0.486 0.438 0.496 -0.112 NA NA 0.055
## freetime:4 0.299 0.458 0.287 0.452 0.026 NA NA 0.012
## freetime:5 0.107 0.309 0.087 0.282 0.067 NA NA 0.020
## goout:1 0.039 0.193 0.070 0.255 -0.133 0.794 0.529 0.031
## goout:2 0.245 0.430 0.276 0.447 -0.070 NA NA 0.031
## goout:3 0.348 0.476 0.329 0.470 0.040 NA NA 0.019
## goout:4 0.224 0.417 0.220 0.414 0.008 NA NA 0.003
## goout:5 0.144 0.351 0.105 0.306 0.116 NA NA 0.039
## Dalc:1 0.694 0.461 0.728 0.445 -0.072 1.377 0.246 0.033
## Dalc:2 0.178 0.382 0.203 0.402 -0.064 NA NA 0.025
## Dalc:3 0.071 0.256 0.052 0.222 0.076 NA NA 0.019
## Dalc:4 0.031 0.172 0.000 0.000 0.205 NA NA 0.031
## Dalc:5 0.026 0.161 0.018 0.131 0.060 NA NA 0.009
## Walc:1 0.347 0.476 0.424 0.494 -0.158 0.817 0.512 0.077
## Walc:2 0.214 0.410 0.230 0.421 -0.038 NA NA 0.016
## Walc:3 0.223 0.416 0.183 0.387 0.100 NA NA 0.040
## Walc:4 0.138 0.345 0.109 0.312 0.086 NA NA 0.029
## Walc:5 0.077 0.267 0.053 0.225 0.092 NA NA 0.024
## health:1 0.116 0.320 0.118 0.322 -0.007 0.362 0.836 0.002
## health:2 0.130 0.336 0.098 0.297 0.101 NA NA 0.032
## health:3 0.232 0.422 0.219 0.413 0.031 NA NA 0.013
## health:4 0.177 0.382 0.172 0.377 0.015 NA NA 0.006
## health:5 0.345 0.475 0.394 0.489 -0.101 NA NA 0.049
## ks.pval
## school:GP 0.755
## school:MS 0.755
## sex:F 0.881
## sex:M 0.881
## age 0.792
## address:R 0.546
## address:U 0.546
## famsize:GT3 0.351
## famsize:LE3 0.351
## Pstatus:A 0.226
## Pstatus:T 0.226
## Medu:0 0.452
## Medu:1 0.452
## Medu:2 0.452
## Medu:3 0.452
## Medu:4 0.452
## Fedu:0 0.973
## Fedu:1 0.973
## Fedu:2 0.973
## Fedu:3 0.973
## Fedu:4 0.973
## Mjob:at_home 0.948
## Mjob:health 0.948
## Mjob:other 0.948
## Mjob:services 0.948
## Mjob:teacher 0.948
## Fjob:at_home 0.991
## Fjob:health 0.991
## Fjob:other 0.991
## Fjob:services 0.991
## Fjob:teacher 0.991
## reason:course 0.623
## reason:home 0.623
## reason:other 0.623
## reason:reputation 0.623
## guardian:father 0.271
## guardian:mother 0.271
## guardian:other 0.271
## traveltime:1 0.916
## traveltime:2 0.916
## traveltime:3 0.916
## traveltime:4 0.916
## studytime:1 0.767
## studytime:2 0.767
## studytime:3 0.767
## studytime:4 0.767
## failures 0.996
## schoolsup:no 0.618
## schoolsup:yes 0.618
## famsup:no 0.793
## famsup:yes 0.793
## paid:no 0.857
## paid:yes 0.857
## activities:no 0.551
## activities:yes 0.551
## nursery:no 0.873
## nursery:yes 0.873
## higher:no 0.454
## higher:yes 0.454
## internet:no 0.527
## internet:yes 0.527
## romantic:no 0.430
## romantic:yes 0.430
## famrel:1 0.815
## famrel:2 0.815
## famrel:3 0.815
## famrel:4 0.815
## famrel:5 0.815
## freetime:1 0.854
## freetime:2 0.854
## freetime:3 0.854
## freetime:4 0.854
## freetime:5 0.854
## goout:1 0.529
## goout:2 0.529
## goout:3 0.529
## goout:4 0.529
## goout:5 0.529
## Dalc:1 0.246
## Dalc:2 0.246
## Dalc:3 0.246
## Dalc:4 0.246
## Dalc:5 0.246
## Walc:1 0.512
## Walc:2 0.512
## Walc:3 0.512
## Walc:4 0.512
## Walc:5 0.512
## health:1 0.836
## health:2 0.836
## health:3 0.836
## health:4 0.836
## health:5 0.836
The variables with highest relative influence on the model are:
Walc
health
famrel
reason
Dalc
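Overall, the model appears to have done a sufficient job of creating balance between the absence groups using propensity scores:
Approximately 1800 iterations were needed to achieve maximum balance between the absence groups.
There is moderate overlap between the propensity score distributions of the two absence groups.
After weighting, no covariate differs significantly between the groups (smallest p-value 0.118, for age), whereas age, Pstatus, guardian, goout, Dalc and Walc all differed at p < 0.05 before weighting. The largest remaining absolute standardized effect size is 0.205 (Dalc:4).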
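The standardized effect sizes in the balance tables can be illustrated with a small sketch. It assumes the ATE convention in which the difference in (weighted) group means is divided by the unweighted standard deviation of the pooled sample, which is consistent with the age rows above (0.432 / 0.338 and 0.203 / 0.159 both give a denominator of about 1.28). The data are simulated, the function name `std_eff_size` is hypothetical, and a logistic regression stands in for the boosted propensity model, so this is not the analysis itself.

```r
# Standardized effect size for one covariate, optionally weighted.
# Assumption: the denominator is the unweighted SD of the pooled sample.
std_eff_size <- function(x, treat, w = rep(1, length(x))) {
  tx_mean <- weighted.mean(x[treat == 1], w[treat == 1])
  ct_mean <- weighted.mean(x[treat == 0], w[treat == 0])
  (tx_mean - ct_mean) / sd(x)
}

set.seed(1)
n <- 2000
age <- rnorm(n, mean = 16.7, sd = 1.3)    # simulated covariate, not the real data
p_true <- plogis(0.5 * (age - 16.7))      # older students are more likely "treated"
treat <- rbinom(n, 1, p_true)

# Stand-in propensity model: logistic regression instead of the boosted model.
p_hat <- fitted(glm(treat ~ age, family = binomial))

# ATE weights: 1/p for the treated group, 1/(1 - p) for the control group.
w_ate <- ifelse(treat == 1, 1 / p_hat, 1 / (1 - p_hat))

es_unweighted <- std_eff_size(age, treat)
es_weighted   <- std_eff_size(age, treat, w_ate)
print(round(c(unweighted = es_unweighted, weighted = es_weighted), 3))
```

If the weights are doing their job, the weighted effect size should sit much closer to zero than the unweighted one, which is the pattern seen when comparing the `$unw` and `$es.mean.ATE` tables.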
This is where I would have used the available information to update the visual representation of the DAG, had the code been working properly.
library(survey)
design <- svydesign(ids=~1, weights=~boosted, data=data)
glm1 <- svyglm(G3 ~ treat, design=design)
summary(glm1)
##
## Call:
## svyglm(formula = G3 ~ treat, design = design)
##
## Survey design:
## svydesign(ids = ~1, weights = ~boosted, data = data)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6554 0.4379 22.051 < 2e-16 ***
## treat 1.2882 0.4937 2.609 0.00941 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 21.06651)
##
## Number of Fisher Scoring iterations: 2
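To read the two coefficients above: in a weighted regression of the outcome on a single 0/1 indicator, the intercept is the weighted mean of the group coded 0 and the treat coefficient is the difference between the two weighted group means. The sketch below checks this on simulated data. It uses base R `lm()` with a `weights` argument as a stand-in for `svyglm()`; the point estimates coincide, but the standard errors do not, since `svyglm()` uses design-based standard errors. All variable names and values here are hypothetical.

```r
set.seed(42)
n <- 200
treat <- rbinom(n, 1, 0.5)                 # hypothetical 0/1 absence indicator
w <- runif(n, 1, 3)                        # hypothetical propensity weights
g3 <- 10 + 1.3 * treat + rnorm(n, sd = 4)  # simulated grades, not the real G3

# Weighted least squares on the treatment indicator alone
fit <- lm(g3 ~ treat, weights = w)

# The same quantities computed directly as weighted group means
ct_mean <- weighted.mean(g3[treat == 0], w[treat == 0])
tx_mean <- weighted.mean(g3[treat == 1], w[treat == 1])

intercept_matches <- isTRUE(all.equal(unname(coef(fit)["(Intercept)"]), ct_mean))
slope_matches     <- isTRUE(all.equal(unname(coef(fit)["treat"]), tx_mean - ct_mean))
print(c(intercept = intercept_matches, slope = slope_matches))
```

Applied to the output above, 9.6554 is the weighted mean of G3 in the group with treat = 0 and 9.6554 + 1.2882 = 10.9436 is the weighted mean in the group with treat = 1. Which absence group each value corresponds to depends on how treat was coded earlier in the analysis.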
summary(lm(G3 ~ treat + school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health, data=data))
##
## Call:
## lm(formula = G3 ~ treat + school + sex + age + address + famsize +
## Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian +
## traveltime + studytime + failures + schoolsup + famsup +
## paid + activities + nursery + higher + internet + romantic +
## famrel + freetime + goout + Dalc + Walc + health, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1355 -1.8760 0.2887 2.5208 8.1363
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.151948 6.144831 3.279 0.00115 **
## treat 1.502487 0.460329 3.264 0.00121 **
## schoolMS 0.532316 0.807679 0.659 0.51031
## sexM 1.131111 0.508792 2.223 0.02689 *
## age -0.362450 0.219431 -1.652 0.09954 .
## addressU 0.487091 0.600682 0.811 0.41801
## famsizeLE3 0.928643 0.502577 1.848 0.06554 .
## PstatusT -0.128065 0.744580 -0.172 0.86355
## Medu1 -5.544647 2.545040 -2.179 0.03007 *
## Medu2 -5.086842 2.547288 -1.997 0.04666 *
## Medu3 -4.575419 2.568162 -1.782 0.07574 .
## Medu4 -3.458266 2.651455 -1.304 0.19305
## Fedu1 -1.561613 3.049864 -0.512 0.60898
## Fedu2 -2.136014 3.054245 -0.699 0.48482
## Fedu3 -2.031520 3.059328 -0.664 0.50713
## Fedu4 -1.911090 3.109291 -0.615 0.53922
## Mjobhealth 0.258299 1.153241 0.224 0.82292
## Mjobother -0.372113 0.719372 -0.517 0.60531
## Mjobservices 0.536011 0.812412 0.660 0.50986
## Mjobteacher -1.822923 1.077738 -1.691 0.09171 .
## Fjobhealth 0.734201 1.454764 0.505 0.61412
## Fjobother -0.015496 1.040852 -0.015 0.98813
## Fjobservices 0.226038 1.071624 0.211 0.83307
## Fjobteacher 1.409956 1.339151 1.053 0.29318
## reasonhome 0.327844 0.559582 0.586 0.55837
## reasonother 0.812056 0.820637 0.990 0.32313
## reasonreputation 0.706095 0.580919 1.215 0.22506
## guardianmother -0.133738 0.551167 -0.243 0.80843
## guardianother 0.353286 1.008572 0.350 0.72635
## traveltime2 -0.496755 0.517630 -0.960 0.33793
## traveltime3 0.436326 1.007957 0.433 0.66539
## traveltime4 -0.617111 1.684344 -0.366 0.71432
## studytime2 0.633603 0.561360 1.129 0.25985
## studytime3 1.810980 0.776981 2.331 0.02037 *
## studytime4 0.550546 1.003203 0.549 0.58353
## failures -1.759718 0.336435 -5.230 3.02e-07 ***
## schoolsupyes -1.137042 0.675290 -1.684 0.09318 .
## famsupyes -0.835687 0.480092 -1.741 0.08268 .
## paidyes 0.345747 0.491419 0.704 0.48220
## activitiesyes -0.550787 0.452164 -1.218 0.22406
## nurseryyes 0.002477 0.561081 0.004 0.99648
## higheryes 0.862466 1.104274 0.781 0.43535
## internetyes 0.665971 0.622520 1.070 0.28550
## romanticyes -1.264604 0.480198 -2.634 0.00885 **
## famrel2 0.634126 1.862254 0.341 0.73369
## famrel3 0.541825 1.629220 0.333 0.73967
## famrel4 0.929551 1.582494 0.587 0.55734
## famrel5 0.963462 1.608358 0.599 0.54956
## freetime2 1.501133 1.142114 1.314 0.18965
## freetime3 0.211635 1.082754 0.195 0.84515
## freetime4 0.925131 1.122519 0.824 0.41045
## freetime5 2.733815 1.279865 2.136 0.03342 *
## goout2 1.123072 1.003618 1.119 0.26395
## goout3 0.447399 1.003043 0.446 0.65586
## goout4 -0.689451 1.049833 -0.657 0.51182
## goout5 -1.317822 1.141852 -1.154 0.24930
## Dalc2 -1.138735 0.662004 -1.720 0.08635 .
## Dalc3 -0.858622 1.048376 -0.819 0.41338
## Dalc4 -2.936450 1.623896 -1.808 0.07148 .
## Dalc5 -1.539840 1.872343 -0.822 0.41144
## Walc2 -0.529700 0.621374 -0.852 0.39458
## Walc3 0.617525 0.700787 0.881 0.37886
## Walc4 -0.007190 0.900554 -0.008 0.99363
## Walc5 2.364031 1.344224 1.759 0.07957 .
## health2 -1.991989 0.918020 -2.170 0.03074 *
## health3 -1.335354 0.808179 -1.652 0.09943 .
## health4 -1.422606 0.847843 -1.678 0.09432 .
## health5 -1.251048 0.755965 -1.655 0.09890 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.01 on 327 degrees of freedom
## Multiple R-squared: 0.3641, Adjusted R-squared: 0.2338
## F-statistic: 2.795 on 67 and 327 DF, p-value: 7.803e-10
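The final analysis shows a causal relationship between \(\geq\) 3 absences and final math scores, under the assumption that the measured covariates capture all confounding. The estimated effect is significant at the 0.01 level both with the causal design (propensity score weighting: 1.29, p = 0.009) and without it (covariate-adjusted linear model: 1.50, p = 0.001).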
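As a rough check on how precisely the effect is estimated, the treat coefficients and standard errors from the two model summaries above can be turned into approximate 95% intervals. This is a sketch using a normal approximation; the intervals reported by `confint()` on the fitted models may differ slightly because they can use a t reference distribution.

```r
# Estimates and standard errors copied from the two model summaries above
est <- c(weighted = 1.2882, linear = 1.502487)
se  <- c(weighted = 0.4937, linear = 0.460329)

z <- qnorm(0.975)   # normal approximation for a two-sided 95% interval
ci <- cbind(lower = est - z * se, upper = est + z * se)
print(round(ci, 2))
```

Both intervals exclude zero, with lower bounds of roughly 0.32 and 0.60 grade points, so the two approaches agree on the sign of the effect and broadly on its size.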
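Interestingly, the variables with the highest relative influence in the boosting model used to assign weights do not align with the variables that reach significance in the linear model (for example, failures and romantic). This is not a contradiction: the boosting model predicts membership in the absence groups, whereas the linear model predicts the final grade G3.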