Assignment/Background

The Ann Arbor Public Schools (AAPS) Board of Education would like to understand the effect of student absences on educational performance in mathematics. As a preliminary analysis, they would like to examine the Student Performance data set from the UCI Machine Learning Repository, which covers secondary school student achievement. This work will be used to inform further research. AAPS would like you to analyze these data to assess the impact of three or more absences versus fewer than three on the final math grade. Additionally, they would like to identify student attributes that may contribute to absences.

Data and Data Dictionary

The data and data dictionary are available at: http://archive.ics.uci.edu/ml/machine-learning-databases/00320/

Variables/Data Dictionary:

  1. school - student’s school (binary: “GP” - Gabriel Pereira or “MS” - Mousinho da Silveira)

  2. sex - student’s sex (binary: “F” - female or “M” - male)

  3. age - student’s age (numeric: from 15 to 22)

  4. address - student’s home address type (binary: “U” - urban or “R” - rural)

  5. famsize - family size (binary: “LE3” - less than or equal to 3 or “GT3” - greater than 3)

  6. Pstatus - parent’s cohabitation status (binary: “T” - living together or “A” - apart)

  7. Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

  8. Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

  9. Mjob - mother’s job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)

  10. Fjob - father’s job (nominal: “teacher”, “health” care related, civil “services” (e.g. administrative or police), “at_home” or “other”)

  11. reason - reason to choose this school (nominal: close to “home”, school “reputation”, “course” preference or “other”)

  12. guardian - student’s guardian (nominal: “mother”, “father” or “other”)

  13. traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)

  14. studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)

  15. failures - number of past class failures (numeric: n if 1<=n<3, else 4)

  16. schoolsup - extra educational support (binary: yes or no)

  17. famsup - family educational support (binary: yes or no)

  18. paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)

  19. activities - extra-curricular activities (binary: yes or no)

  20. nursery - attended nursery school (binary: yes or no)

  21. higher - wants to take higher education (binary: yes or no)

  22. internet - Internet access at home (binary: yes or no)

  23. romantic - in a romantic relationship (binary: yes or no)

  24. famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)

  25. freetime - free time after school (numeric: from 1 - very low to 5 - very high)

  26. goout - going out with friends (numeric: from 1 - very low to 5 - very high)

  27. Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

  28. Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

  29. health - current health status (numeric: from 1 - very bad to 5 - very good)

  30. absences - number of school absences (numeric: from 0 to 93)

  31. G1 - first period grade (numeric: from 0 to 20)

  32. G2 - second period grade (numeric: from 0 to 20)

  33. G3 - final grade (numeric: from 0 to 20, output target)

setwd("C:/Users/sweeneys/Desktop/")
data=read.csv("student-mat.csv",sep=",",header=TRUE)
head(data)
##   school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
## 1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
## 2     GP   F  17       U     GT3       T    1    1  at_home    other     course
## 3     GP   F  15       U     LE3       T    1    1  at_home    other      other
## 4     GP   F  15       U     GT3       T    4    2   health services       home
## 5     GP   F  16       U     GT3       T    3    3    other    other       home
## 6     GP   M  16       U     LE3       T    4    3 services    other reputation
##   guardian traveltime studytime failures schoolsup famsup paid activities
## 1   mother          2         2        0       yes     no   no         no
## 2   father          1         2        0        no    yes   no         no
## 3   mother          1         2        3       yes     no  yes         no
## 4   mother          1         3        0        no    yes  yes        yes
## 5   father          1         2        0        no    yes  yes         no
## 6   mother          1         2        0        no    yes  yes        yes
##   nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1     yes    yes       no       no      4        3     4    1    1      3
## 2      no    yes      yes       no      5        3     3    1    1      3
## 3     yes    yes      yes       no      4        3     2    2    3      3
## 4     yes    yes      yes      yes      3        2     2    1    1      5
## 5     yes    yes       no       no      4        3     2    1    2      5
## 6     yes    yes      yes       no      5        4     2    1    2      5
##   absences G1 G2 G3
## 1        6  5  6  6
## 2        4  5  5  6
## 3       10  7  8 10
## 4        2 15 14 15
## 5        4  6 10 10
## 6       10 15 15 15
dim(data)
## [1] 395  33
# Convert ordinal-coded variables from numeric to factors
ordinal.vars <- c("Medu", "Fedu", "traveltime", "studytime", "famrel",
                  "freetime", "goout", "Dalc", "Walc", "health")
data[ordinal.vars] <- lapply(data[ordinal.vars], as.factor)

# Assign treatment groups: 1 = three or more absences, 0 = fewer than three
data$treat <- ifelse(data$absences < 3, 0, 1)

# Check for missing data: drop incomplete rows and compare the row count with dim(data)
dat = data[complete.cases(data),]
head(dat)
##   school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
## 1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
## 2     GP   F  17       U     GT3       T    1    1  at_home    other     course
## 3     GP   F  15       U     LE3       T    1    1  at_home    other      other
## 4     GP   F  15       U     GT3       T    4    2   health services       home
## 5     GP   F  16       U     GT3       T    3    3    other    other       home
## 6     GP   M  16       U     LE3       T    4    3 services    other reputation
##   guardian traveltime studytime failures schoolsup famsup paid activities
## 1   mother          2         2        0       yes     no   no         no
## 2   father          1         2        0        no    yes   no         no
## 3   mother          1         2        3       yes     no  yes         no
## 4   mother          1         3        0        no    yes  yes        yes
## 5   father          1         2        0        no    yes  yes         no
## 6   mother          1         2        0        no    yes  yes        yes
##   nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1     yes    yes       no       no      4        3     4    1    1      3
## 2      no    yes      yes       no      5        3     3    1    1      3
## 3     yes    yes      yes       no      4        3     2    2    3      3
## 4     yes    yes      yes      yes      3        2     2    1    1      5
## 5     yes    yes       no       no      4        3     2    1    2      5
## 6     yes    yes      yes       no      5        4     2    1    2      5
##   absences G1 G2 G3 treat
## 1        6  5  6  6     1
## 2        4  5  5  6     1
## 3       10  7  8 10     1
## 4        2 15 14 15     0
## 5        4  6 10 10     1
## 6       10 15 15 15     1
dim(dat)
## [1] 395  34
# No missing data: all 395 rows are complete
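
The row-count comparison above shows whether any rows were dropped, but not which variables were responsible. As a minimal sketch on a hypothetical toy data frame (the `toy` object and its values are illustrative, not part of the assignment data), missing values can also be counted per column:

```r
# Hypothetical toy data with two missing values (not the assignment data)
toy <- data.frame(age = c(15, NA, 17),
                  sex = c("F", "M", NA),
                  G3  = c(10, 12, 14))

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# Number of rows with no missing values at all
n_complete <- sum(complete.cases(toy))
```

Applied to the real data, `colSums(is.na(data))` should return zero for every column if the data set is complete.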

Initial Data Exploration

# Summary of all variables, followed by the distribution of the final grade (G3)
summary(data)
##  school   sex          age       address famsize   Pstatus Medu    Fedu   
##  GP:349   F:208   Min.   :15.0   R: 88   GT3:281   A: 41   0:  3   0:  2  
##  MS: 46   M:187   1st Qu.:16.0   U:307   LE3:114   T:354   1: 59   1: 82  
##                   Median :17.0                             2:103   2:115  
##                   Mean   :16.7                             3: 99   3:100  
##                   3rd Qu.:18.0                             4:131   4: 96  
##                   Max.   :22.0                                            
##        Mjob           Fjob            reason      guardian   traveltime
##  at_home : 59   at_home : 20   course    :145   father: 90   1:257     
##  health  : 34   health  : 18   home      :109   mother:273   2:107     
##  other   :141   other   :217   other     : 36   other : 32   3: 23     
##  services:103   services:111   reputation:105                4:  8     
##  teacher : 58   teacher : 29                                           
##                                                                        
##  studytime    failures      schoolsup famsup     paid     activities nursery  
##  1:105     Min.   :0.0000   no :344   no :153   no :214   no :194    no : 81  
##  2:198     1st Qu.:0.0000   yes: 51   yes:242   yes:181   yes:201    yes:314  
##  3: 65     Median :0.0000                                                     
##  4: 27     Mean   :0.3342                                                     
##            3rd Qu.:0.0000                                                     
##            Max.   :3.0000                                                     
##  higher    internet  romantic  famrel  freetime goout   Dalc    Walc    health 
##  no : 20   no : 66   no :263   1:  8   1: 19    1: 23   1:276   1:151   1: 47  
##  yes:375   yes:329   yes:132   2: 18   2: 64    2:103   2: 75   2: 85   2: 45  
##                                3: 68   3:157    3:130   3: 26   3: 80   3: 91  
##                                4:195   4:115    4: 86   4:  9   4: 51   4: 66  
##                                5:106   5: 40    5: 53   5:  9   5: 28   5:146  
##                                                                                
##     absences            G1              G2              G3       
##  Min.   : 0.000   Min.   : 3.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 0.000   1st Qu.: 8.00   1st Qu.: 9.00   1st Qu.: 8.00  
##  Median : 4.000   Median :11.00   Median :11.00   Median :11.00  
##  Mean   : 5.709   Mean   :10.91   Mean   :10.71   Mean   :10.42  
##  3rd Qu.: 8.000   3rd Qu.:13.00   3rd Qu.:13.00   3rd Qu.:14.00  
##  Max.   :75.000   Max.   :19.00   Max.   :19.00   Max.   :20.00  
##      treat       
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :1.0000  
##  Mean   :0.5367  
##  3rd Qu.:1.0000  
##  Max.   :1.0000
hist(data$G3)

# Review of Absences
hist(data$absences)

Bayesian Network/DAG for Causal Pathways

This is an attempt to create a DAG from initial beliefs about where causal relationships may exist, in order to review possible confounders and colliders graphically.

#digraph G {
#  Address -> Travel Time
#  Address -> Reason
#  Address -> School
#  Address -> Nursery School
#  Address -> Internet
#  Address -> Going Out
#  Family Size -> Address
#  Family Size -> Family Support
#  Family Size -> Family Relationship
#  Parent Living Arrangement -> Family Support
#  Parent Living Arrangement -> Guardian
#  Parent Living Arrangement -> Family Relationship
#  Parent Education -> Parent Job
#  Parent Education -> Family Support
#  Parent Education -> Higher Education Plans
#  Parent Job -> Address
#  Parent Job -> Family Size
#  Parent Job -> Guardian
#  Parent Job -> Family Support
#  Parent Job -> Paid Course
#  Parent Job -> Higher Education Plans
#  Parent Job -> Internet
#  Parent Job -> Family Relationship
#  Parent Job -> Going Out
#  Travel Time -> Absences
#  Travel Time -> Study Time
#  Travel Time -> Activities
#  Travel Time -> Free Time
#  Study Time -> Failures
#  Study Time -> Activities
#  Study Time -> Romantic Relationship
#  Study Time -> Free Time
#  Study Time -> Final Grade
#  Failures -> Absences
#  Failures -> School Support
#  Failures -> Activities
#  Failures -> Higher Education Plans
#  School Support -> Failures
#  School Support -> Activities
#  School Support -> Free Time
#  School Support -> Final Grade
#  Family Support -> Failures
#  Family Support -> Paid Course
#  Family Support -> Nursery School
#  Family Support -> Higher Education Plans
#  Family Support -> Family Relationship
#  Paid Course -> Failures
#  Paid Course -> Free Time
#  Paid Course -> Final Grade
#  Paid Course -> Study Time
#  Paid Course -> Activities
#  Activities -> Study Time
#  Activities -> Free Time
#  Activities -> Health
#  Higher Education Plans -> Study Time
#  Higher Education Plans -> Activities
#  Internet -> Study Time
#  Internet -> Going Out
#  Romantic Relationship -> Free Time
#  Romantic Relationship -> Going Out
#  Family Relationship -> Family Support
#  Family Relationship -> Going Out
#  Free Time -> Study Time
#  Free Time -> Activities
#  Free Time -> Going Out
#  Going Out -> Free Time
#  Going Out -> Alcohol Consumption
#  Alcohol Consumption -> Health
#  Health -> Absences
#  Absences -> Final Grade
#}
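
A graph only qualifies as a DAG if it contains no directed cycles, and several of the edges listed above point in both directions (for example Failures -> School Support and School Support -> Failures). The following is a minimal base-R sketch for checking an edge list for cycles. The function name `is_acyclic` and the four-edge subset are illustrative choices, not part of the original analysis.

```r
# Returns TRUE when the directed graph given by edges from[i] -> to[i]
# has no cycle (Kahn's algorithm: repeatedly remove nodes with no incoming edge).
is_acyclic <- function(from, to) {
  remaining <- unique(c(from, to))   # nodes not yet removed
  keep <- rep(TRUE, length(from))    # edges still present
  repeat {
    sources <- remaining[!(remaining %in% to[keep])]
    if (length(sources) == 0) break
    remaining <- setdiff(remaining, sources)
    keep <- keep & !(from %in% sources)
  }
  length(remaining) == 0             # leftover nodes all sit on or behind a cycle
}

# A small subset of the edges listed in the graph above
edges <- data.frame(
  from = c("Failures", "School Support", "Health", "Absences"),
  to   = c("School Support", "Failures", "Absences", "Final Grade"),
  stringsAsFactors = FALSE
)

with_cycle    <- is_acyclic(edges$from, edges$to)           # FALSE
without_cycle <- is_acyclic(edges$from[-2], edges$to[-2])   # TRUE
```

Dropping one direction of each two-way relationship, based on which direction is believed to dominate, is one way to make the graph acyclic.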

Graphical Review of Differences between students with <3 and \(\geq\) 3 absences (i.e. non-treatment vs. treatment groups)

library(ggplot2)

# Logical copy of the treatment indicator, used as the fill aesthetic in the plots
data$treat2 <- data$treat == 1

#school
schoolcounts<-table(data$school, data$treat)
schoolcounts
##     
##        0   1
##   GP 161 188
##   MS  22  24
schoolperc<-prop.table(schoolcounts,1)
schoolperc
##     
##              0         1
##   GP 0.4613181 0.5386819
##   MS 0.4782609 0.5217391
chisq.test(data$school,data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$school and data$treat
## X-squared = 0.00352, df = 1, p-value = 0.9527
ggplot(data, aes(x=school, fill=treat2)) +  geom_bar(position = 'dodge')

#sex
sexcounts<-table(data$sex, data$treat)
sexperc<-prop.table(sexcounts,1)
sexperc
##    
##             0         1
##   F 0.4663462 0.5336538
##   M 0.4598930 0.5401070
chisq.test(data$sex,data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$sex and data$treat
## X-squared = 0.00074923, df = 1, p-value = 0.9782
ggplot(data, aes(x=sex, fill=treat2)) +  geom_bar(position = 'dodge')

#age
tapply(data$age, data$treat, mean)
##        0        1 
## 16.46448 16.89623
tapply(data$age, data$treat, sd)
##        0        1 
## 1.216912 1.294752
t.test(data$age ~ data$treat)
## 
##  Welch Two Sample t-test
## 
## data:  data$age by data$treat
## t = -3.4133, df = 390.14, p-value = 0.0007091
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6804325 -0.1830585
## sample estimates:
## mean in group 0 mean in group 1 
##        16.46448        16.89623
ggplot(data, aes(x=age, fill=treat2)) +  geom_density(alpha=0.25)

#address
addresscounts<-table(data$address, data$treat)
addressperc<-prop.table(addresscounts,1)
addressperc
##    
##             0         1
##   R 0.4318182 0.5681818
##   U 0.4723127 0.5276873
chisq.test(data$address,data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$address and data$treat
## X-squared = 0.30289, df = 1, p-value = 0.5821
ggplot(data, aes(x=address, fill=treat2)) +  geom_bar(position = 'dodge')

#famsize
famsizecounts<-table(data$famsize, data$treat)
famsizeperc<-prop.table(famsizecounts,1)
famsizeperc
##      
##               0         1
##   GT3 0.4804270 0.5195730
##   LE3 0.4210526 0.5789474
chisq.test(data$famsize,data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$famsize and data$treat
## X-squared = 0.92341, df = 1, p-value = 0.3366
ggplot(data, aes(x=famsize, fill=treat2)) +  geom_bar(position = 'dodge')

#Pstatus
Pstatuscounts<-table(data$Pstatus, data$treat)
Pstatusperc<-prop.table(Pstatuscounts,1)
Pstatusperc
##    
##             0         1
##   A 0.3170732 0.6829268
##   T 0.4802260 0.5197740
chisq.test(data$Pstatus,data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$Pstatus and data$treat
## X-squared = 3.3048, df = 1, p-value = 0.06908
ggplot(data, aes(x=Pstatus, fill=treat2)) +  geom_bar(position = 'dodge')

#Medu
Meducounts<-table(data$Medu, data$treat)
Meduperc<-prop.table(Meducounts,1)
Meduperc
##    
##             0         1
##   0 1.0000000 0.0000000
##   1 0.5593220 0.4406780
##   2 0.4854369 0.5145631
##   3 0.4040404 0.5959596
##   4 0.4351145 0.5648855
chisq.test(data$Medu,data$treat)
## Warning in chisq.test(data$Medu, data$treat): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$Medu and data$treat
## X-squared = 7.6828, df = 4, p-value = 0.1039
ggplot(data, aes(x=Medu, fill=treat2)) +  geom_bar(position = 'dodge')

#Fedu
Feducounts<-table(data$Fedu, data$treat)
Feduperc<-prop.table(Feducounts,1)
Feduperc
##    
##             0         1
##   0 0.5000000 0.5000000
##   1 0.4878049 0.5121951
##   2 0.4695652 0.5304348
##   3 0.4200000 0.5800000
##   4 0.4791667 0.5208333
chisq.test(data$Fedu,data$treat)
## Warning in chisq.test(data$Fedu, data$treat): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$Fedu and data$treat
## X-squared = 1.0782, df = 4, p-value = 0.8977
ggplot(data, aes(x=Fedu, fill=treat2)) +  geom_bar(position = 'dodge')

#Mjob
Mjobcounts<-table(data$Mjob, data$treat)
Mjobperc<-prop.table(Mjobcounts,1)
Mjobperc
##           
##                    0         1
##   at_home  0.5254237 0.4745763
##   health   0.5000000 0.5000000
##   other    0.4539007 0.5460993
##   services 0.4368932 0.5631068
##   teacher  0.4482759 0.5517241
chisq.test(data$Mjob,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$Mjob and data$treat
## X-squared = 1.4915, df = 4, p-value = 0.8281
ggplot(data, aes(x=Mjob, fill=treat2)) +  geom_bar(position = 'dodge')

#Fjob
Fjobcounts<-table(data$Fjob, data$treat)
Fjobperc<-prop.table(Fjobcounts,1)
Fjobperc
##           
##                    0         1
##   at_home  0.5000000 0.5000000
##   health   0.5000000 0.5000000
##   other    0.4423963 0.5576037
##   services 0.4954955 0.5045045
##   teacher  0.4482759 0.5517241
chisq.test(data$Fjob,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$Fjob and data$treat
## X-squared = 1.0762, df = 4, p-value = 0.898
ggplot(data, aes(x=Fjob, fill=treat2)) +  geom_bar(position = 'dodge')

#reason
reasoncounts<-table(data$reason, data$treat)
reasonperc<-prop.table(reasoncounts,1)
reasonperc
##             
##                      0         1
##   course     0.5517241 0.4482759
##   home       0.4311927 0.5688073
##   other      0.3888889 0.6111111
##   reputation 0.4000000 0.6000000
chisq.test(data$reason,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$reason and data$treat
## X-squared = 7.5051, df = 3, p-value = 0.05743
ggplot(data, aes(x=reason, fill=treat2)) +  geom_bar(position = 'dodge')

#guardian
guardiancounts<-table(data$guardian, data$treat)
guardianperc<-prop.table(guardiancounts,1)
guardianperc
##         
##                  0         1
##   father 0.5333333 0.4666667
##   mother 0.4652015 0.5347985
##   other  0.2500000 0.7500000
chisq.test(data$guardian,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$guardian and data$treat
## X-squared = 7.6344, df = 2, p-value = 0.02199
ggplot(data, aes(x=guardian, fill=treat2)) +  geom_bar(position = 'dodge')

#traveltime
traveltimecounts<-table(data$traveltime, data$treat)
traveltimeperc<-prop.table(traveltimecounts,1)
traveltimeperc
##    
##             0         1
##   1 0.4513619 0.5486381
##   2 0.4953271 0.5046729
##   3 0.4347826 0.5652174
##   4 0.5000000 0.5000000
chisq.test(data$traveltime,data$treat)
## Warning in chisq.test(data$traveltime, data$treat): Chi-squared approximation
## may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$traveltime and data$treat
## X-squared = 0.70726, df = 3, p-value = 0.8715
ggplot(data, aes(x=traveltime, fill=treat2)) +  geom_bar(position = 'dodge')

#studytime
studytimecounts<-table(data$studytime, data$treat)
studytimeperc<-prop.table(studytimecounts,1)
studytimeperc
##    
##             0         1
##   1 0.4190476 0.5809524
##   2 0.4848485 0.5151515
##   3 0.4923077 0.5076923
##   4 0.4074074 0.5925926
chisq.test(data$studytime,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$studytime and data$treat
## X-squared = 1.7559, df = 3, p-value = 0.6246
ggplot(data, aes(x=studytime, fill=treat2)) +  geom_bar(position = 'dodge')

#failures
tapply(data$failures, data$treat, mean)
##         0         1 
## 0.3114754 0.3537736
tapply(data$failures, data$treat, sd)
t.test(data$failures ~ data$treat)
## 
##  Welch Two Sample t-test
## 
## data:  data$failures by data$treat
## t = -0.56152, df = 379.57, p-value = 0.5748
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1904093  0.1058130
## sample estimates:
## mean in group 0 mean in group 1 
##       0.3114754       0.3537736
ggplot(data, aes(x=failures, fill=treat2)) +  geom_density(alpha=0.25)

#schoolsup
schoolsupcounts<-table(data$schoolsup, data$treat)
schoolsupperc<-prop.table(schoolsupcounts,1)
schoolsupperc
##      
##               0         1
##   no  0.4651163 0.5348837
##   yes 0.4509804 0.5490196
chisq.test(data$schoolsup, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$schoolsup and data$treat
## X-squared = 0.00148, df = 1, p-value = 0.9693
ggplot(data, aes(x=schoolsup, fill=treat2)) +  geom_bar(position = 'dodge')

#famsup
famsupcounts<-table(data$famsup, data$treat)
famsupperc<-prop.table(famsupcounts,1)
famsupperc
##      
##               0         1
##   no  0.4575163 0.5424837
##   yes 0.4669421 0.5330579
chisq.test(data$famsup, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$famsup and data$treat
## X-squared = 0.0063114, df = 1, p-value = 0.9367
ggplot(data, aes(x=famsup, fill=treat2)) +  geom_bar(position = 'dodge')

#paid
paidcounts<-table(data$paid, data$treat)
paidperc<-prop.table(paidcounts,1)
paidperc
##      
##               0         1
##   no  0.4766355 0.5233645
##   yes 0.4475138 0.5524862
chisq.test(data$paid,data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$paid and data$treat
## X-squared = 0.22759, df = 1, p-value = 0.6333
ggplot(data, aes(x=paid, fill=treat2)) +  geom_bar(position = 'dodge')

#activities
activitiescounts<-table(data$activities, data$treat)
activitiesperc<-prop.table(activitiescounts,1)
activitiesperc
##      
##               0         1
##   no  0.4793814 0.5206186
##   yes 0.4477612 0.5522388
chisq.test(data$activities, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$activities and data$treat
## X-squared = 0.27997, df = 1, p-value = 0.5967
ggplot(data, aes(x=activities, fill=treat2)) +  geom_bar(position = 'dodge')

#nursery
nurserycounts<-table(data$nursery, data$treat)
nurseryperc<-prop.table(nurserycounts,1)
nurseryperc
##      
##               0         1
##   no  0.4814815 0.5185185
##   yes 0.4585987 0.5414013
chisq.test(data$nursery, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$nursery and data$treat
## X-squared = 0.059182, df = 1, p-value = 0.8078
ggplot(data, aes(x=nursery, fill=treat2)) +  geom_bar(position = 'dodge')

#higher
highercounts<-table(data$higher, data$treat)
higherperc<-prop.table(highercounts,1)
higherperc
##      
##           0     1
##   no  0.600 0.400
##   yes 0.456 0.544
chisq.test(data$higher, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$higher and data$treat
## X-squared = 1.0573, df = 1, p-value = 0.3038
ggplot(data, aes(x=higher, fill=treat2)) +  geom_bar(position = 'dodge')

#internet
internetcounts<-table(data$internet, data$treat)
internetperc<-prop.table(internetcounts,1)
internetperc
##      
##               0         1
##   no  0.5151515 0.4848485
##   yes 0.4528875 0.5471125
chisq.test(data$internet, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$internet and data$treat
## X-squared = 0.62497, df = 1, p-value = 0.4292
ggplot(data, aes(x=internet, fill=treat2)) +  geom_bar(position = 'dodge')

#romantic
romanticcounts<-table(data$romantic, data$treat)
romanticperc<-prop.table(romanticcounts,1)
romanticperc
##      
##               0         1
##   no  0.4790875 0.5209125
##   yes 0.4318182 0.5681818
chisq.test(data$romantic, data$treat)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data$romantic and data$treat
## X-squared = 0.6111, df = 1, p-value = 0.4344
ggplot(data, aes(x=romantic, fill=treat2)) +  geom_bar(position = 'dodge')

#famrel
famrelcounts<-table(data$famrel, data$treat)
famrelperc<-prop.table(famrelcounts,1)
famrelperc
##    
##             0         1
##   1 0.2500000 0.7500000
##   2 0.3333333 0.6666667
##   3 0.4411765 0.5588235
##   4 0.4871795 0.5128205
##   5 0.4716981 0.5283019
chisq.test(data$famrel, data$treat)
## Warning in chisq.test(data$famrel, data$treat): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$famrel and data$treat
## X-squared = 3.2977, df = 4, p-value = 0.5093
ggplot(data, aes(x=famrel, fill=treat2)) +  geom_bar(position = 'dodge')

#freetime
freetimecounts<-table(data$freetime, data$treat)
freetimeperc<-prop.table(freetimecounts,1)
freetimeperc
##    
##             0         1
##   1 0.4210526 0.5789474
##   2 0.4218750 0.5781250
##   3 0.5159236 0.4840764
##   4 0.4434783 0.5565217
##   5 0.4000000 0.6000000
chisq.test(data$freetime,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$freetime and data$treat
## X-squared = 3.1529, df = 4, p-value = 0.5326
ggplot(data, aes(x=freetime, fill=treat2)) +  geom_bar(position = 'dodge')

#goout
gooutcounts<-table(data$goout, data$treat)
gooutperc<-prop.table(gooutcounts,1)
gooutperc
##    
##             0         1
##   1 0.6956522 0.3043478
##   2 0.5533981 0.4466019
##   3 0.4230769 0.5769231
##   4 0.4302326 0.5697674
##   5 0.3396226 0.6603774
chisq.test(data$goout,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$goout and data$treat
## X-squared = 12.841, df = 4, p-value = 0.01208
ggplot(data, aes(x=goout, fill=treat2)) +  geom_bar(position = 'dodge')

#Dalc
Dalccounts<-table(data$Dalc, data$treat)
Dalcperc<-prop.table(Dalccounts,1)
Dalcperc
##    
##             0         1
##   1 0.5000000 0.5000000
##   2 0.4933333 0.5066667
##   3 0.2692308 0.7307692
##   4 0.0000000 1.0000000
##   5 0.1111111 0.8888889
chisq.test(data$Dalc,data$treat)
## Warning in chisq.test(data$Dalc, data$treat): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  data$Dalc and data$treat
## X-squared = 17.964, df = 4, p-value = 0.001254
ggplot(data, aes(x=Dalc, fill=treat2)) +  geom_bar(position = 'dodge')

#Walc
Walccounts<-table(data$Walc, data$treat)
Walcperc<-prop.table(Walccounts,1)
Walcperc
##    
##             0         1
##   1 0.5695364 0.4304636
##   2 0.5058824 0.4941176
##   3 0.3625000 0.6375000
##   4 0.3529412 0.6470588
##   5 0.2500000 0.7500000
chisq.test(data$Walc,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$Walc and data$treat
## X-squared = 18.364, df = 4, p-value = 0.001048
ggplot(data, aes(x=Walc, fill=treat2)) +  geom_bar(position = 'dodge')

#health
healthcounts<-table(data$health, data$treat)
healthperc<-prop.table(healthcounts,1)
healthperc
##    
##             0         1
##   1 0.4680851 0.5319149
##   2 0.3333333 0.6666667
##   3 0.4505495 0.5494505
##   4 0.3939394 0.6060606
##   5 0.5410959 0.4589041
chisq.test(data$health,data$treat)
## 
##  Pearson's Chi-squared test
## 
## data:  data$health and data$treat
## X-squared = 7.9513, df = 4, p-value = 0.09338
ggplot(data, aes(x=health, fill=treat2)) +  geom_bar(position = 'dodge')

#G3
tapply(data$G3, data$treat, mean)
##         0         1 
##  9.748634 10.990566
tapply(data$G3, data$treat, sd)
t.test(data$G3 ~ data$treat)
## 
##  Welch Two Sample t-test
## 
## data:  data$G3 by data$treat
## t = -2.6088, df = 280.02, p-value = 0.009575
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.179046 -0.304818
## sample estimates:
## mean in group 0 mean in group 1 
##        9.748634       10.990566
ggplot(data, aes(x=G3, fill=treat2)) +  geom_density(alpha=0.25)

## Next time I would try to operationalize this code as a function that runs each variable without explicitly listing each
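
One possible way to do that is sketched below. It assumes that numeric variables should get a Welch t-test and all other variables a chi-squared test, as in the code above. The function names `compare_by_treat` and `compare_all` and the toy data are hypothetical.

```r
# Compare one covariate across treatment groups:
# Welch t-test for numeric variables, chi-squared test otherwise.
compare_by_treat <- function(df, var, treat = "treat") {
  x <- df[[var]]
  g <- df[[treat]]
  if (is.numeric(x)) {
    test <- t.test(x ~ g)
    kind <- "t-test"
    sparse <- NA
  } else {
    # The approximation warning is suppressed here and reported
    # in the small.expected column instead.
    test <- suppressWarnings(chisq.test(table(x, g)))
    kind <- "chi-squared"
    sparse <- any(test$expected < 5)
  }
  data.frame(variable = var, test = kind,
             p.value = unname(test$p.value),
             small.expected = sparse,
             stringsAsFactors = FALSE)
}

# Run the comparison for several variables and stack the results
compare_all <- function(df, vars, treat = "treat") {
  do.call(rbind, lapply(vars, compare_by_treat, df = df, treat = treat))
}

# Toy data for illustration only (not the assignment data)
set.seed(1)
toy <- data.frame(
  treat = rep(c(0, 1), each = 20),
  age   = c(rnorm(20, 16.5), rnorm(20, 16.9)),
  sex   = factor(rep(c("F", "M"), times = 20))
)
result <- compare_all(toy, c("age", "sex"))
```

On the real data the call would be along the lines of `compare_all(data, c("school", "sex", "age"))`, extended to whichever variables are of interest. The plots could be generated in a similar loop.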

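Noted patterns on initial exploration: at the 0.05 level, the two groups differ on age (p = 0.0007), guardian (p = 0.022), goout (p = 0.012), Dalc (p = 0.001), Walc (p = 0.001) and G3 (p = 0.010). Pstatus (p = 0.069), reason (p = 0.057) and health (p = 0.093) are borderline. Students with three or more absences are slightly older on average (16.9 vs. 16.5 years) and, before any adjustment, have a higher mean final grade (10.99 vs. 9.75). The chi-squared approximation for Dalc may be unreliable because some cells are sparse.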

Assigning Inverse Probability Treatment Weighting

Because this is an observational (rather than randomized) study, causation cannot be inferred without accounting for confounding. To emulate a randomized experimental design, we want the treatment groups to behave as though they came from the same population (i.e. students in each group have a similar distribution of all characteristics that affect absences). We accomplish this by assigning inverse probability weights, derived from propensity scores, to the observations, effectively removing the influence of measured confounders on the relationship between absences and final grades.

A propensity score is the estimated probability that a student belongs to the treatment group, given their observed characteristics. The inverse probability weights up- or down-weight observations based on this propensity.
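
As a minimal sketch of how such weights are usually defined for the average treatment effect (ATE), a treated student with propensity score p receives weight 1/p and a control student receives weight 1/(1 - p). The function name `iptw_ate` and the example values are hypothetical. This is the standard textbook definition and is assumed, not confirmed here, to match what the fitting function used later in this analysis produces.

```r
# ATE weights: 1/p for treated units, 1/(1 - p) for control units
iptw_ate <- function(treat, p) {
  ifelse(treat == 1, 1 / p, 1 / (1 - p))
}

# Hypothetical students: two treated, two controls
treat <- c(1, 1, 0, 0)
p     <- c(0.8, 0.2, 0.8, 0.2)   # estimated probability of treatment

w <- iptw_ate(treat, p)          # approximately 1.25, 5, 5, 1.25
```

Students who ended up in a group that was unlikely for them (the treated student with p = 0.2, the control with p = 0.8) receive the largest weights, because they stand in for similar students who are under-represented in that group.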

Two assumptions in this analysis are important to acknowledge:

  1. There is sufficient overlap between the treatment groups, i.e. students with similar characteristics appear in both groups.

  2. There are no unmeasured confounders.
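
The overlap assumption can be examined roughly by comparing the range of propensity scores in each group. The sketch below uses hypothetical scores and a hypothetical helper, `common_support`. It is one simple diagnostic among several and was not part of the original analysis.

```r
# Range of propensity scores that is covered by both groups
common_support <- function(p, treat) {
  c(lower = max(min(p[treat == 1]), min(p[treat == 0])),
    upper = min(max(p[treat == 1]), max(p[treat == 0])))
}

# Hypothetical propensity scores for three treated and three control students
p     <- c(0.30, 0.55, 0.80, 0.20, 0.45, 0.70)
treat <- c(1,    1,    1,    0,    0,    0)

cs <- common_support(p, treat)                  # lower = 0.30, upper = 0.70
outside <- p < cs["lower"] | p > cs["upper"]    # students with no counterpart
```

Here the treated student with a score of 0.80 and the control student with a score of 0.20 fall outside the shared range, so conclusions about students like them rest on extrapolation.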

In this analysis, a nonparametric boosting model is used to estimate the propensity scores. The benefit of this method is that it makes no assumptions about the distribution of the variables in the model, and variable selection is automatic. In other words, the boosting method can determine the most important characteristics and account for them appropriately. There is also less risk of model misspecification than with parametric methods such as logistic regression.
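
For comparison only, the parametric alternative mentioned above can be sketched with base R's `glm`. The data are simulated, and the variable names merely echo the data dictionary. This is not the method used in this analysis.

```r
# Simulated data for illustration (not the assignment data)
set.seed(42)
n <- 200
toy <- data.frame(age   = round(runif(n, 15, 22)),
                  goout = sample(1:5, n, replace = TRUE))
toy$treat <- rbinom(n, 1, plogis(-6 + 0.3 * toy$age + 0.3 * toy$goout))

# Logistic regression propensity model: the analyst must choose the terms
logit.mod <- glm(treat ~ age + goout, data = toy, family = binomial)

# Fitted probabilities of treatment are the estimated propensity scores
toy$pscore <- fitted(logit.mod)
```

With this approach, any interaction or nonlinear term has to be specified by hand, which is the source of the misspecification risk noted above.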

library(twang)

# Estimate propensity scores with gradient boosting, using all baseline covariates
boosted.mod <- ps(treat ~ school + sex + age + address + famsize + Pstatus +
                    Medu + Fedu + Mjob + Fjob + reason + guardian +
                    traveltime + studytime + failures + schoolsup + famsup +
                    paid + activities + nursery + higher + internet +
                    romantic + famrel + freetime + goout + Dalc + Walc + health,
                  data = data,
                  estimand = "ATE",
                  n.trees = 5000,
                  interaction.depth = 2,
                  perm.test.iters = 0,
                  verbose = FALSE,
                  stop.method = c("es.mean"))
summary(boosted.mod)
##             n.treat n.ctrl ess.treat ess.ctrl    max.es    mean.es    max.ks
## unw             212    183  212.0000 183.0000 0.3383472 0.10068783 0.1633416
## es.mean.ATE     212    183  196.0893 167.0735 0.2052356 0.06013057 0.0768722
##             max.ks.p    mean.ks iter
## unw               NA 0.03704350   NA
## es.mean.ATE       NA 0.02195048 1727
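
The ess.treat and ess.ctrl columns in the summary above report effective sample sizes, which show how much information is lost to unequal weights. The sketch below assumes the usual Kish formula, (sum of weights)^2 / (sum of squared weights). The helper name `ess` and the example weights are hypothetical.

```r
# Effective sample size for a vector of weights (Kish's formula)
ess <- function(w) sum(w)^2 / sum(w^2)

equal_weights  <- ess(rep(1, 10))      # equal weights lose nothing: 10
uneven_weights <- ess(c(1, 1, 1, 5))   # one dominant weight: 64 / 28, about 2.29
```

In the summary above, weighting reduces the treated group from 212 to about 196 effective observations and the control group from 183 to about 167, a modest loss.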
summary(boosted.mod$gbm.obj,
        n.trees=boosted.mod$desc$es.mean.ATE$n.trees, 
        plot=FALSE)
##                   var     rel.inf
## Walc             Walc 11.99046432
## health         health 11.74255811
## famrel         famrel  8.81694334
## reason         reason  8.46741674
## Dalc             Dalc  8.11038236
## Fjob             Fjob  6.87361354
## guardian     guardian  4.73744587
## age               age  4.67030306
## goout           goout  4.26779338
## Medu             Medu  4.06831347
## studytime   studytime  3.83554156
## freetime     freetime  3.65336362
## Mjob             Mjob  3.54948239
## failures     failures  2.90116318
## paid             paid  2.59354325
## Pstatus       Pstatus  2.23254502
## higher         higher  1.38448948
## Fedu             Fedu  1.24213066
## internet     internet  0.86643004
## schoolsup   schoolsup  0.77508373
## activities activities  0.77014634
## romantic     romantic  0.75397231
## school         school  0.53193651
## traveltime traveltime  0.50330671
## famsup         famsup  0.39092021
## nursery       nursery  0.17605828
## famsize       famsize  0.09465254
## sex               sex  0.00000000
## address       address  0.00000000
data$boosted <- get.weights(boosted.mod)
## Warning in get.weights(boosted.mod): No stop.method specified.  Using es.mean.ATE
hist(data$boosted)

plot(boosted.mod)

plot(boosted.mod, plots=2)

plot(boosted.mod, plots=3)

bal.table(boosted.mod)
## $unw
##                    tx.mn tx.sd  ct.mn ct.sd std.eff.sz  stat     p    ks
## school:GP          0.887 0.317  0.880 0.325      0.022 0.047 0.829 0.007
## school:MS          0.113 0.317  0.120 0.325     -0.022    NA    NA 0.007
## sex:F              0.524 0.499  0.530 0.499     -0.013 0.016 0.898 0.006
## sex:M              0.476 0.499  0.470 0.499      0.013    NA    NA 0.006
## age               16.896 1.295 16.464 1.217      0.338 3.418 0.001 0.141
## address:R          0.236 0.425  0.208 0.406      0.068 0.450 0.503 0.028
## address:U          0.764 0.425  0.792 0.406     -0.068    NA    NA 0.028
## famsize:GT3        0.689 0.463  0.738 0.440     -0.108 1.147 0.285 0.049
## famsize:LE3        0.311 0.463  0.262 0.440      0.108    NA    NA 0.049
## Pstatus:A          0.132 0.339  0.071 0.257      0.200 3.924 0.048 0.061
## Pstatus:T          0.868 0.339  0.929 0.257     -0.200    NA    NA 0.061
## Medu:0             0.000 0.000  0.016 0.127     -0.189 1.919 0.105 0.016
## Medu:1             0.123 0.328  0.180 0.384     -0.162    NA    NA 0.058
## Medu:2             0.250 0.433  0.273 0.446     -0.053    NA    NA 0.023
## Medu:3             0.278 0.448  0.219 0.413      0.138    NA    NA 0.060
## Medu:4             0.349 0.477  0.311 0.463      0.080    NA    NA 0.038
## Fedu:0             0.005 0.069  0.005 0.074     -0.011 0.269 0.898 0.001
## Fedu:1             0.198 0.399  0.219 0.413     -0.050    NA    NA 0.020
## Fedu:2             0.288 0.453  0.295 0.456     -0.016    NA    NA 0.007
## Fedu:3             0.274 0.446  0.230 0.421      0.101    NA    NA 0.044
## Fedu:4             0.236 0.425  0.251 0.434     -0.036    NA    NA 0.016
## Mjob:at_home       0.132 0.339  0.169 0.375     -0.105 0.372 0.829 0.037
## Mjob:health        0.080 0.272  0.093 0.290     -0.045    NA    NA 0.013
## Mjob:other         0.363 0.481  0.350 0.477      0.028    NA    NA 0.013
## Mjob:services      0.274 0.446  0.246 0.431      0.063    NA    NA 0.028
## Mjob:teacher       0.151 0.358  0.142 0.349      0.025    NA    NA 0.009
## Fjob:at_home       0.047 0.212  0.055 0.227     -0.034 0.268 0.898 0.007
## Fjob:health        0.042 0.202  0.049 0.216     -0.032    NA    NA 0.007
## Fjob:other         0.571 0.495  0.525 0.499      0.093    NA    NA 0.046
## Fjob:services      0.264 0.441  0.301 0.458     -0.081    NA    NA 0.036
## Fjob:teacher       0.075 0.264  0.071 0.257      0.017    NA    NA 0.004
## reason:course      0.307 0.461  0.437 0.496     -0.271 2.495 0.058 0.131
## reason:home        0.292 0.455  0.257 0.437      0.080    NA    NA 0.036
## reason:other       0.104 0.305  0.077 0.266      0.095    NA    NA 0.027
## reason:reputation  0.297 0.457  0.230 0.421      0.153    NA    NA 0.068
## guardian:father    0.198 0.399  0.262 0.440     -0.153 3.808 0.023 0.064
## guardian:mother    0.689 0.463  0.694 0.461     -0.011    NA    NA 0.005
## guardian:other     0.113 0.317  0.044 0.204      0.255    NA    NA 0.069
## traveltime:1       0.665 0.472  0.634 0.482      0.065 0.235 0.872 0.031
## traveltime:2       0.255 0.436  0.290 0.454     -0.079    NA    NA 0.035
## traveltime:3       0.061 0.240  0.055 0.227      0.029    NA    NA 0.007
## traveltime:4       0.019 0.136  0.022 0.146     -0.021    NA    NA 0.003
## studytime:1        0.288 0.453  0.240 0.427      0.107 0.584 0.626 0.047
## studytime:2        0.481 0.500  0.525 0.499     -0.087    NA    NA 0.043
## studytime:3        0.156 0.363  0.175 0.380     -0.052    NA    NA 0.019
## studytime:4        0.075 0.264  0.060 0.238      0.061    NA    NA 0.015
## failures           0.354 0.730  0.311 0.760      0.057 0.562 0.574 0.056
## schoolsup:no       0.868 0.339  0.874 0.331     -0.019 0.036 0.850 0.006
## schoolsup:yes      0.132 0.339  0.126 0.331      0.019    NA    NA 0.006
## famsup:no          0.392 0.488  0.383 0.486      0.018 0.033 0.855 0.009
## famsup:yes         0.608 0.488  0.617 0.486     -0.018    NA    NA 0.009
## paid:no            0.528 0.499  0.557 0.497     -0.058 0.334 0.564 0.029
## paid:yes           0.472 0.499  0.443 0.497      0.058    NA    NA 0.029
## activities:no      0.476 0.499  0.508 0.500     -0.064 0.396 0.530 0.032
## activities:yes     0.524 0.499  0.492 0.500      0.064    NA    NA 0.032
## nursery:no         0.198 0.399  0.213 0.410     -0.037 0.135 0.713 0.015
## nursery:yes        0.802 0.399  0.787 0.410      0.037    NA    NA 0.015
## higher:no          0.038 0.191  0.066 0.248     -0.127 1.579 0.210 0.028
## higher:yes         0.962 0.191  0.934 0.248      0.127    NA    NA 0.028
## internet:no        0.151 0.358  0.186 0.389     -0.093 0.855 0.356 0.035
## internet:yes       0.849 0.358  0.814 0.389      0.093    NA    NA 0.035
## romantic:no        0.646 0.478  0.689 0.463     -0.090 0.788 0.375 0.042
## romantic:yes       0.354 0.478  0.311 0.463      0.090    NA    NA 0.042
## famrel:1           0.028 0.166  0.011 0.104      0.123 0.822 0.511 0.017
## famrel:2           0.057 0.231  0.033 0.178      0.114    NA    NA 0.024
## famrel:3           0.179 0.384  0.164 0.370      0.041    NA    NA 0.015
## famrel:4           0.472 0.499  0.519 0.500     -0.095    NA    NA 0.047
## famrel:5           0.264 0.441  0.273 0.446     -0.020    NA    NA 0.009
## freetime:1         0.052 0.222  0.044 0.204      0.038 0.786 0.534 0.008
## freetime:2         0.175 0.380  0.148 0.355      0.073    NA    NA 0.027
## freetime:3         0.358 0.480  0.443 0.497     -0.172    NA    NA 0.084
## freetime:4         0.302 0.459  0.279 0.448      0.051    NA    NA 0.023
## freetime:5         0.113 0.317  0.087 0.282      0.085    NA    NA 0.026
## goout:1            0.033 0.179  0.087 0.282     -0.232 3.202 0.012 0.054
## goout:2            0.217 0.412  0.311 0.463     -0.215    NA    NA 0.094
## goout:3            0.354 0.478  0.301 0.458      0.113    NA    NA 0.053
## goout:4            0.231 0.422  0.202 0.402      0.070    NA    NA 0.029
## goout:5            0.165 0.371  0.098 0.298      0.196    NA    NA 0.067
## Dalc:1             0.651 0.477  0.754 0.431     -0.225 4.503 0.001 0.103
## Dalc:2             0.179 0.384  0.202 0.402     -0.058    NA    NA 0.023
## Dalc:3             0.090 0.286  0.038 0.192      0.207    NA    NA 0.051
## Dalc:4             0.042 0.202  0.000 0.000      0.285    NA    NA 0.042
## Dalc:5             0.038 0.191  0.005 0.074      0.216    NA    NA 0.032
## Walc:1             0.307 0.461  0.470 0.499     -0.336 4.579 0.001 0.163
## Walc:2             0.198 0.399  0.235 0.424     -0.090    NA    NA 0.037
## Walc:3             0.241 0.427  0.158 0.365      0.204    NA    NA 0.082
## Walc:4             0.156 0.363  0.098 0.298      0.171    NA    NA 0.057
## Walc:5             0.099 0.299  0.038 0.192      0.237    NA    NA 0.061
## health:1           0.118 0.323  0.120 0.325     -0.007 1.983 0.095 0.002
## health:2           0.142 0.349  0.082 0.274      0.187    NA    NA 0.060
## health:3           0.236 0.425  0.224 0.417      0.028    NA    NA 0.012
## health:4           0.189 0.391  0.142 0.349      0.125    NA    NA 0.047
## health:5           0.316 0.465  0.432 0.495     -0.240    NA    NA 0.116
##                   ks.pval
## school:GP           0.829
## school:MS           0.829
## sex:F               0.898
## sex:M               0.898
## age                 0.036
## address:R           0.503
## address:U           0.503
## famsize:GT3         0.285
## famsize:LE3         0.285
## Pstatus:A           0.048
## Pstatus:T           0.048
## Medu:0              0.105
## Medu:1              0.105
## Medu:2              0.105
## Medu:3              0.105
## Medu:4              0.105
## Fedu:0              0.898
## Fedu:1              0.898
## Fedu:2              0.898
## Fedu:3              0.898
## Fedu:4              0.898
## Mjob:at_home        0.829
## Mjob:health         0.829
## Mjob:other          0.829
## Mjob:services       0.829
## Mjob:teacher        0.829
## Fjob:at_home        0.898
## Fjob:health         0.898
## Fjob:other          0.898
## Fjob:services       0.898
## Fjob:teacher        0.898
## reason:course       0.058
## reason:home         0.058
## reason:other        0.058
## reason:reputation   0.058
## guardian:father     0.023
## guardian:mother     0.023
## guardian:other      0.023
## traveltime:1        0.872
## traveltime:2        0.872
## traveltime:3        0.872
## traveltime:4        0.872
## studytime:1         0.626
## studytime:2         0.626
## studytime:3         0.626
## studytime:4         0.626
## failures            0.900
## schoolsup:no        0.850
## schoolsup:yes       0.850
## famsup:no           0.855
## famsup:yes          0.855
## paid:no             0.564
## paid:yes            0.564
## activities:no       0.530
## activities:yes      0.530
## nursery:no          0.713
## nursery:yes         0.713
## higher:no           0.210
## higher:yes          0.210
## internet:no         0.356
## internet:yes        0.356
## romantic:no         0.375
## romantic:yes        0.375
## famrel:1            0.511
## famrel:2            0.511
## famrel:3            0.511
## famrel:4            0.511
## famrel:5            0.511
## freetime:1          0.534
## freetime:2          0.534
## freetime:3          0.534
## freetime:4          0.534
## freetime:5          0.534
## goout:1             0.012
## goout:2             0.012
## goout:3             0.012
## goout:4             0.012
## goout:5             0.012
## Dalc:1              0.001
## Dalc:2              0.001
## Dalc:3              0.001
## Dalc:4              0.001
## Dalc:5              0.001
## Walc:1              0.001
## Walc:2              0.001
## Walc:3              0.001
## Walc:4              0.001
## Walc:5              0.001
## health:1            0.095
## health:2            0.095
## health:3            0.095
## health:4            0.095
## health:5            0.095
## 
## $es.mean.ATE
##                    tx.mn tx.sd  ct.mn ct.sd std.eff.sz  stat     p    ks
## school:GP          0.893 0.309  0.883 0.321      0.032 0.098 0.755 0.010
## school:MS          0.107 0.309  0.117 0.321     -0.032    NA    NA 0.010
## sex:F              0.537 0.499  0.529 0.499      0.016 0.022 0.881 0.008
## sex:M              0.463 0.499  0.471 0.499     -0.016    NA    NA 0.008
## age               16.801 1.270 16.598 1.220      0.159 1.567 0.118 0.066
## address:R          0.227 0.419  0.201 0.401      0.062 0.364 0.546 0.026
## address:U          0.773 0.419  0.799 0.401     -0.062    NA    NA 0.026
## famsize:GT3        0.693 0.461  0.737 0.440     -0.098 0.873 0.351 0.044
## famsize:LE3        0.307 0.461  0.263 0.440      0.098    NA    NA 0.044
## Pstatus:A          0.119 0.323  0.080 0.271      0.127 1.468 0.226 0.039
## Pstatus:T          0.881 0.323  0.920 0.271     -0.127    NA    NA 0.039
## Medu:0             0.000 0.000  0.015 0.120     -0.168 0.919 0.452 0.015
## Medu:1             0.137 0.344  0.157 0.364     -0.057    NA    NA 0.020
## Medu:2             0.274 0.446  0.294 0.456     -0.045    NA    NA 0.020
## Medu:3             0.257 0.437  0.235 0.424      0.051    NA    NA 0.022
## Medu:4             0.331 0.471  0.299 0.458      0.069    NA    NA 0.032
## Fedu:0             0.006 0.079  0.004 0.066      0.028 0.126 0.973 0.002
## Fedu:1             0.207 0.405  0.206 0.404      0.005    NA    NA 0.002
## Fedu:2             0.300 0.458  0.319 0.466     -0.042    NA    NA 0.019
## Fedu:3             0.265 0.441  0.238 0.426      0.062    NA    NA 0.027
## Fedu:4             0.221 0.415  0.233 0.423     -0.028    NA    NA 0.012
## Mjob:at_home       0.138 0.345  0.156 0.363     -0.051 0.180 0.948 0.018
## Mjob:health        0.073 0.261  0.090 0.286     -0.060    NA    NA 0.017
## Mjob:other         0.377 0.485  0.354 0.478      0.048    NA    NA 0.023
## Mjob:services      0.271 0.445  0.259 0.438      0.028    NA    NA 0.012
## Mjob:teacher       0.140 0.347  0.140 0.347      0.000    NA    NA 0.000
## Fjob:at_home       0.047 0.212  0.050 0.218     -0.013 0.068 0.991 0.003
## Fjob:health        0.040 0.195  0.046 0.210     -0.031    NA    NA 0.007
## Fjob:other         0.588 0.492  0.571 0.495      0.034    NA    NA 0.017
## Fjob:services      0.262 0.440  0.275 0.447     -0.028    NA    NA 0.013
## Fjob:teacher       0.063 0.243  0.058 0.233      0.021    NA    NA 0.005
## reason:course      0.330 0.470  0.395 0.489     -0.135 0.587 0.623 0.065
## reason:home        0.288 0.453  0.268 0.443      0.047    NA    NA 0.021
## reason:other       0.097 0.296  0.078 0.268      0.067    NA    NA 0.019
## reason:reputation  0.284 0.451  0.259 0.438      0.056    NA    NA 0.025
## guardian:father    0.212 0.409  0.247 0.431     -0.083 1.309 0.271 0.035
## guardian:mother    0.695 0.460  0.702 0.457     -0.015    NA    NA 0.007
## guardian:other     0.092 0.289  0.051 0.219      0.152    NA    NA 0.041
## traveltime:1       0.659 0.474  0.635 0.481      0.051 0.165 0.916 0.024
## traveltime:2       0.269 0.443  0.294 0.456     -0.056    NA    NA 0.025
## traveltime:3       0.058 0.234  0.053 0.225      0.021    NA    NA 0.005
## traveltime:4       0.013 0.115  0.018 0.133     -0.032    NA    NA 0.005
## studytime:1        0.273 0.446  0.228 0.420      0.102 0.381 0.767 0.045
## studytime:2        0.493 0.500  0.537 0.499     -0.087    NA    NA 0.044
## studytime:3        0.159 0.366  0.166 0.372     -0.019    NA    NA 0.007
## studytime:4        0.075 0.263  0.069 0.253      0.023    NA    NA 0.006
## failures           0.325 0.706  0.297 0.735      0.038 0.383 0.702 0.041
## schoolsup:no       0.867 0.339  0.884 0.320     -0.050 0.250 0.618 0.017
## schoolsup:yes      0.133 0.339  0.116 0.320      0.050    NA    NA 0.017
## famsup:no          0.395 0.489  0.381 0.486      0.028 0.069 0.793 0.013
## famsup:yes         0.605 0.489  0.619 0.486     -0.028    NA    NA 0.013
## paid:no            0.530 0.499  0.540 0.498     -0.019 0.032 0.857 0.009
## paid:yes           0.470 0.499  0.460 0.498      0.019    NA    NA 0.009
## activities:no      0.486 0.500  0.517 0.500     -0.063 0.356 0.551 0.031
## activities:yes     0.514 0.500  0.483 0.500      0.063    NA    NA 0.031
## nursery:no         0.203 0.402  0.210 0.407     -0.017 0.026 0.873 0.007
## nursery:yes        0.797 0.402  0.790 0.407      0.017    NA    NA 0.007
## higher:no          0.043 0.204  0.061 0.240     -0.081 0.563 0.454 0.018
## higher:yes         0.957 0.204  0.939 0.240      0.081    NA    NA 0.018
## internet:no        0.152 0.359  0.176 0.380     -0.064 0.400 0.527 0.024
## internet:yes       0.848 0.359  0.824 0.380      0.064    NA    NA 0.024
## romantic:no        0.652 0.476  0.691 0.462     -0.083 0.625 0.430 0.039
## romantic:yes       0.348 0.476  0.309 0.462      0.083    NA    NA 0.039
## famrel:1           0.025 0.157  0.012 0.109      0.093 0.388 0.815 0.013
## famrel:2           0.045 0.208  0.030 0.170      0.074    NA    NA 0.015
## famrel:3           0.183 0.387  0.177 0.382      0.014    NA    NA 0.005
## famrel:4           0.474 0.499  0.503 0.500     -0.058    NA    NA 0.029
## famrel:5           0.273 0.445  0.277 0.448     -0.011    NA    NA 0.005
## freetime:1         0.050 0.218  0.043 0.203      0.033 0.335 0.854 0.007
## freetime:2         0.161 0.368  0.145 0.353      0.042    NA    NA 0.016
## freetime:3         0.383 0.486  0.438 0.496     -0.112    NA    NA 0.055
## freetime:4         0.299 0.458  0.287 0.452      0.026    NA    NA 0.012
## freetime:5         0.107 0.309  0.087 0.282      0.067    NA    NA 0.020
## goout:1            0.039 0.193  0.070 0.255     -0.133 0.794 0.529 0.031
## goout:2            0.245 0.430  0.276 0.447     -0.070    NA    NA 0.031
## goout:3            0.348 0.476  0.329 0.470      0.040    NA    NA 0.019
## goout:4            0.224 0.417  0.220 0.414      0.008    NA    NA 0.003
## goout:5            0.144 0.351  0.105 0.306      0.116    NA    NA 0.039
## Dalc:1             0.694 0.461  0.728 0.445     -0.072 1.377 0.246 0.033
## Dalc:2             0.178 0.382  0.203 0.402     -0.064    NA    NA 0.025
## Dalc:3             0.071 0.256  0.052 0.222      0.076    NA    NA 0.019
## Dalc:4             0.031 0.172  0.000 0.000      0.205    NA    NA 0.031
## Dalc:5             0.026 0.161  0.018 0.131      0.060    NA    NA 0.009
## Walc:1             0.347 0.476  0.424 0.494     -0.158 0.817 0.512 0.077
## Walc:2             0.214 0.410  0.230 0.421     -0.038    NA    NA 0.016
## Walc:3             0.223 0.416  0.183 0.387      0.100    NA    NA 0.040
## Walc:4             0.138 0.345  0.109 0.312      0.086    NA    NA 0.029
## Walc:5             0.077 0.267  0.053 0.225      0.092    NA    NA 0.024
## health:1           0.116 0.320  0.118 0.322     -0.007 0.362 0.836 0.002
## health:2           0.130 0.336  0.098 0.297      0.101    NA    NA 0.032
## health:3           0.232 0.422  0.219 0.413      0.031    NA    NA 0.013
## health:4           0.177 0.382  0.172 0.377      0.015    NA    NA 0.006
## health:5           0.345 0.475  0.394 0.489     -0.101    NA    NA 0.049
##                   ks.pval
## school:GP           0.755
## school:MS           0.755
## sex:F               0.881
## sex:M               0.881
## age                 0.792
## address:R           0.546
## address:U           0.546
## famsize:GT3         0.351
## famsize:LE3         0.351
## Pstatus:A           0.226
## Pstatus:T           0.226
## Medu:0              0.452
## Medu:1              0.452
## Medu:2              0.452
## Medu:3              0.452
## Medu:4              0.452
## Fedu:0              0.973
## Fedu:1              0.973
## Fedu:2              0.973
## Fedu:3              0.973
## Fedu:4              0.973
## Mjob:at_home        0.948
## Mjob:health         0.948
## Mjob:other          0.948
## Mjob:services       0.948
## Mjob:teacher        0.948
## Fjob:at_home        0.991
## Fjob:health         0.991
## Fjob:other          0.991
## Fjob:services       0.991
## Fjob:teacher        0.991
## reason:course       0.623
## reason:home         0.623
## reason:other        0.623
## reason:reputation   0.623
## guardian:father     0.271
## guardian:mother     0.271
## guardian:other      0.271
## traveltime:1        0.916
## traveltime:2        0.916
## traveltime:3        0.916
## traveltime:4        0.916
## studytime:1         0.767
## studytime:2         0.767
## studytime:3         0.767
## studytime:4         0.767
## failures            0.996
## schoolsup:no        0.618
## schoolsup:yes       0.618
## famsup:no           0.793
## famsup:yes          0.793
## paid:no             0.857
## paid:yes            0.857
## activities:no       0.551
## activities:yes      0.551
## nursery:no          0.873
## nursery:yes         0.873
## higher:no           0.454
## higher:yes          0.454
## internet:no         0.527
## internet:yes        0.527
## romantic:no         0.430
## romantic:yes        0.430
## famrel:1            0.815
## famrel:2            0.815
## famrel:3            0.815
## famrel:4            0.815
## famrel:5            0.815
## freetime:1          0.854
## freetime:2          0.854
## freetime:3          0.854
## freetime:4          0.854
## freetime:5          0.854
## goout:1             0.529
## goout:2             0.529
## goout:3             0.529
## goout:4             0.529
## goout:5             0.529
## Dalc:1              0.246
## Dalc:2              0.246
## Dalc:3              0.246
## Dalc:4              0.246
## Dalc:5              0.246
## Walc:1              0.512
## Walc:2              0.512
## Walc:3              0.512
## Walc:4              0.512
## Walc:5              0.512
## health:1            0.836
## health:2            0.836
## health:3            0.836
## health:4            0.836
## health:5            0.836

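The variables with highest relative influence on the model are Walc (12.0%), health (11.7%), famrel (8.8%), reason (8.5%), Dalc (8.1%), and Fjob (6.9%). By contrast, sex and address have zero relative influence.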

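Overall, the model appears to have done a sufficient job of creating balance between the absence groups using propensity scores: after weighting, the maximum standardized effect size falls from 0.338 to 0.205, the mean standardized effect size from 0.101 to 0.060, and the maximum KS statistic from 0.163 to 0.077. The largest remaining imbalance is in Dalc level 4 (standardized effect size 0.205), a level that no student in the control group reports.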
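
The ess.treat and ess.ctrl columns in the summary output above indicate how much information survives the weighting. A minimal sketch of the usual effective sample size calculation follows, assuming the standard formula (sum of weights squared, divided by the sum of squared weights). The helper `ess` and the two weight vectors are hypothetical illustrations, not objects from the analysis.

```r
# Effective sample size of a weighted group:
# (sum of weights)^2 / (sum of squared weights).
# `ess` is a hypothetical helper written for this illustration.
ess <- function(w) sum(w)^2 / sum(w^2)

# Hypothetical weights for five students in one group (illustration only).
w_equal   <- c(1, 1, 1, 1, 1)
w_unequal <- c(1, 1, 1, 1, 6)

ess(w_equal)    # equal weights: ESS equals the group size
ess(w_unequal)  # one dominant weight: 10^2 / 40, well below the group size

# Share of the nominal sample size retained after weighting,
# using the ess and n values printed in the summary output above.
retained <- c(treat = 196.0893 / 212, ctrl = 167.0735 / 183)
round(retained, 3)
```

On these figures, both groups keep a little over 90% of their nominal sample size, which suggests that no small set of students dominates the weighted comparison.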

Causal Analysis using Propensity Scores of Absences compared to Observational Analysis using a Linear Model without weighting

library(survey)
design <- svydesign(ids=~1, weights=~boosted, data=data)
glm1 <- svyglm(G3 ~ treat, design=design)
summary(glm1)
## 
## Call:
## svyglm(formula = G3 ~ treat, design = design)
## 
## Survey design:
## svydesign(ids = ~1, weights = ~boosted, data = data)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6554     0.4379  22.051  < 2e-16 ***
## treat         1.2882     0.4937   2.609  0.00941 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 21.06651)
## 
## Number of Fisher Scoring iterations: 2
summary(lm(G3 ~ treat + school + sex + age + address + famsize + Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian + traveltime + studytime + failures + schoolsup + famsup + paid + activities + nursery + higher + internet + romantic + famrel + freetime + goout + Dalc + Walc + health, data=data))
## 
## Call:
## lm(formula = G3 ~ treat + school + sex + age + address + famsize + 
##     Pstatus + Medu + Fedu + Mjob + Fjob + reason + guardian + 
##     traveltime + studytime + failures + schoolsup + famsup + 
##     paid + activities + nursery + higher + internet + romantic + 
##     famrel + freetime + goout + Dalc + Walc + health, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1355  -1.8760   0.2887   2.5208   8.1363 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      20.151948   6.144831   3.279  0.00115 ** 
## treat             1.502487   0.460329   3.264  0.00121 ** 
## schoolMS          0.532316   0.807679   0.659  0.51031    
## sexM              1.131111   0.508792   2.223  0.02689 *  
## age              -0.362450   0.219431  -1.652  0.09954 .  
## addressU          0.487091   0.600682   0.811  0.41801    
## famsizeLE3        0.928643   0.502577   1.848  0.06554 .  
## PstatusT         -0.128065   0.744580  -0.172  0.86355    
## Medu1            -5.544647   2.545040  -2.179  0.03007 *  
## Medu2            -5.086842   2.547288  -1.997  0.04666 *  
## Medu3            -4.575419   2.568162  -1.782  0.07574 .  
## Medu4            -3.458266   2.651455  -1.304  0.19305    
## Fedu1            -1.561613   3.049864  -0.512  0.60898    
## Fedu2            -2.136014   3.054245  -0.699  0.48482    
## Fedu3            -2.031520   3.059328  -0.664  0.50713    
## Fedu4            -1.911090   3.109291  -0.615  0.53922    
## Mjobhealth        0.258299   1.153241   0.224  0.82292    
## Mjobother        -0.372113   0.719372  -0.517  0.60531    
## Mjobservices      0.536011   0.812412   0.660  0.50986    
## Mjobteacher      -1.822923   1.077738  -1.691  0.09171 .  
## Fjobhealth        0.734201   1.454764   0.505  0.61412    
## Fjobother        -0.015496   1.040852  -0.015  0.98813    
## Fjobservices      0.226038   1.071624   0.211  0.83307    
## Fjobteacher       1.409956   1.339151   1.053  0.29318    
## reasonhome        0.327844   0.559582   0.586  0.55837    
## reasonother       0.812056   0.820637   0.990  0.32313    
## reasonreputation  0.706095   0.580919   1.215  0.22506    
## guardianmother   -0.133738   0.551167  -0.243  0.80843    
## guardianother     0.353286   1.008572   0.350  0.72635    
## traveltime2      -0.496755   0.517630  -0.960  0.33793    
## traveltime3       0.436326   1.007957   0.433  0.66539    
## traveltime4      -0.617111   1.684344  -0.366  0.71432    
## studytime2        0.633603   0.561360   1.129  0.25985    
## studytime3        1.810980   0.776981   2.331  0.02037 *  
## studytime4        0.550546   1.003203   0.549  0.58353    
## failures         -1.759718   0.336435  -5.230 3.02e-07 ***
## schoolsupyes     -1.137042   0.675290  -1.684  0.09318 .  
## famsupyes        -0.835687   0.480092  -1.741  0.08268 .  
## paidyes           0.345747   0.491419   0.704  0.48220    
## activitiesyes    -0.550787   0.452164  -1.218  0.22406    
## nurseryyes        0.002477   0.561081   0.004  0.99648    
## higheryes         0.862466   1.104274   0.781  0.43535    
## internetyes       0.665971   0.622520   1.070  0.28550    
## romanticyes      -1.264604   0.480198  -2.634  0.00885 ** 
## famrel2           0.634126   1.862254   0.341  0.73369    
## famrel3           0.541825   1.629220   0.333  0.73967    
## famrel4           0.929551   1.582494   0.587  0.55734    
## famrel5           0.963462   1.608358   0.599  0.54956    
## freetime2         1.501133   1.142114   1.314  0.18965    
## freetime3         0.211635   1.082754   0.195  0.84515    
## freetime4         0.925131   1.122519   0.824  0.41045    
## freetime5         2.733815   1.279865   2.136  0.03342 *  
## goout2            1.123072   1.003618   1.119  0.26395    
## goout3            0.447399   1.003043   0.446  0.65586    
## goout4           -0.689451   1.049833  -0.657  0.51182    
## goout5           -1.317822   1.141852  -1.154  0.24930    
## Dalc2            -1.138735   0.662004  -1.720  0.08635 .  
## Dalc3            -0.858622   1.048376  -0.819  0.41338    
## Dalc4            -2.936450   1.623896  -1.808  0.07148 .  
## Dalc5            -1.539840   1.872343  -0.822  0.41144    
## Walc2            -0.529700   0.621374  -0.852  0.39458    
## Walc3             0.617525   0.700787   0.881  0.37886    
## Walc4            -0.007190   0.900554  -0.008  0.99363    
## Walc5             2.364031   1.344224   1.759  0.07957 .  
## health2          -1.991989   0.918020  -2.170  0.03074 *  
## health3          -1.335354   0.808179  -1.652  0.09943 .  
## health4          -1.422606   0.847843  -1.678  0.09432 .  
## health5          -1.251048   0.755965  -1.655  0.09890 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.01 on 327 degrees of freedom
## Multiple R-squared:  0.3641, Adjusted R-squared:  0.2338 
## F-statistic: 2.795 on 67 and 327 DF,  p-value: 7.803e-10

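The final analysis indicates a relationship between \(\geq\) 3 absences and final math scores. With the causal design (propensity score weights), the estimated coefficient on treat is 1.29 grade points (SE 0.49, p = 0.009). Without it (linear model adjusting for all covariates), the estimate is similar at 1.50 grade points (SE 0.46, p = 0.001). Only the weighted estimate supports a causal interpretation, and only under the overlap and no-unmeasured-confounding assumptions stated earlier. The linear model estimate is an adjusted association.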
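
Because the weighted model contains only an intercept and treat, its treat coefficient is a weighted difference in group means. A sketch of that estimator, using illustrative notation that is not in the original (\(w_i\) for the boosted weight, \(T_i\) for treat, \(Y_i\) for G3):

\[
\hat{\tau} = \frac{\sum_i w_i T_i Y_i}{\sum_i w_i T_i} - \frac{\sum_i w_i (1 - T_i) Y_i}{\sum_i w_i (1 - T_i)}
\]

Read against the svyglm output above, the intercept 9.6554 is the weighted mean of G3 in the control group, and 9.6554 + 1.2882 = 10.9436 is the weighted mean in the treatment group.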

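Interestingly, the variables with the highest relative influence in the boosting model used to assign weights do not align with the variables found significant in the linear model. This is not a contradiction: the boosting model predicts absence group, while the linear model predicts final grade, so each highlights a different set of predictors.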